April 8, 2013

First look at QlikView Direct Discovery

Direct Discovery is one of the recent and long awaited features of QlikView which became available starting from version 11.2. Direct Discovery allows mixing data from external data sources and from QlikView's in-memory data engine using the same QlikView scripting language and expressions. Aim of Direct Discovery is to go beyond RAM limitations and enable real-time querying against large volumes of data outside of QlikView, while presenting the same associative experience as for regular in-memory data. Here are two good documents that describe it in more details: QlikView Direct Discovery Datasheet and QlikView Direct Discovery FAQ.

From development standpoint, Direct Discovery can be employed by two ways: using DIRECT SELECT statement in loading script and SQL( ) function in expressions.

Example of DIRECT statement:

Tab1:
DIRECT (Stale after 1 minute) SELECT
    FieldA,
    FieldB
IMPLICIT
    Metric1

FROM DBTable;

After this statement is executed, Tab1 becomes associated with DBTable. If you look at data model in QlikView (by pressing Ctrl+T) you see that Tab1 is empty -- it has no rows. However, if you build a straight table with, say, FieldA and Metric1 -- it is filled with values that are pulled dynamically from the database.

By default all fields form DBTable are implicitly available for all expressions, even if they are not mentioned in SELECT statement. And here bizarre things start -- you can't use these fields in non-dimensional objects (i.e. text boxes). You should use IMPLICIT keyword to explicitly declare these fields as implicit, otherwise they're implicitly implicit (weird, right?). Only then they become available for text objects, chart titles, etc. Second effect of using IMPLICIT is that only the fields after IMPLICIT keyword will be available for expressions, and not all fields from DBTable, as it would be without IMPLICIT part.

DBTable will be queried every time when an object with one of the fields needs to be redrawn. To reduce number of queries STALE condition can be used to cache query results. Also keep in mind that if source data in DBTable has changed QlikView UI objects do not change automatically (contrary to Dynamic Updates) until some user action forces redraw.

SQL function allows the use of native SQL functions in QlikView expressions. E.g.:

SQL(‘count(Orders)’)
SQL(‘checksum_agg(OrderKey)’)


Here 'count(Orders)' is an SQL function and is calculated by source database, not QlikView. Second example has checksum_agg -- this function itself is barely useful for anything in QlikView, but I'm showing it here just to give an example of non-QlikView function.

Direct Discovery as it is now has lots of restrictions and limitations. Some of the limitations:
  • No Section Access and data reduction based on DD tables
  • No Binary Load of DD tables
  • No GROUP BY, WHERE in DIRECT SELECT SQL queries (use views as target for queries)
  • No Pivot Charts, Table Boxes and mini-charts 
  • No Calculated Dimensions 
  • No Alternate States
I also noticed when a chart is based on DD table only, no Set Analysis in expressions possible -- it's simply ignored. Things change when DD table is linked with in-memory table, although I would avoid using DD fields as chart dimensions at all.

Unfortunately, it's not the limitations that are the saddest part of the story. It's reliability and performance. I'll elaborate:

For testing I took QV 11.2 SR1 and fresh installation of SQL Server with AdventureWorksDW2012 sample database. All ran on 8GB 2CPU Amazon instance. Data model consisted of three tables -- a fact table loaded using DIRECT, the same fact table loaded as usually in-memory (as data island),  and one more in-memory dimension table linked to the first table. Fact table contained about 60'000 records. UI part consisted of three charts and three text boxes. I compared the same set of expressions on both fact tables to find out differences in behavior.



First, I constantly experienced crashes. When interacting with objects with DD fields, QlikView hung so often, that I was never sure if my next selection would hang the application or not. I had troubles demonstrating Direct Discovery to my colleagues because of constant crashes. I'm happy I didn't show it to customers.

Second, as we know 60K rows is nothing for QlikView. Even 6 millions is nothing -- response rate is still sub-second and that's what we love QlikView for. But not in case of Direct Discovery -- I've experienced substantial delays with changing selections in the test application. I haven't measured them, but 5-10 seconds would be not far from truth. Sometimes faster, sometimes even longer. And it's only 60K rows. Database hardly was a bottleneck as it executed manually entered queries (via SQL Server Management Studio) reasonably fast.

Who knows, maybe it was something wrong with my test environment. Maybe, I had to tweak some settings or something. Maybe, someone had better experience with DD, but it wasn't my case. Again, I took the most recent version of QlikView and made fresh install of MS SQL Server. I expected Direct Discovery to work reliably out of the box in this simple test, but it didn't. And that was disappointing.

Therefore, my first impression is as follows: Direct Discovery is slow, unreliable and is not ready for production use. Which makes me think that among major features introduced in QlikView during the last 3 years, perhaps only Alternate States works fine. Extensions slowly becoming more or less reliable, but JS API is still very limited and buggy (extension developers know it).  Annotations are flawed by design (that's a separate story). Dynamic Updates and now Direct Discovery are unstable and barely usable. I'm not talking about things that need to be redesigned (or removed), like ugly UI object properties forms a-la 1990s or overly basic reports -- this should have been fixed long time ago.

I've strong impression that QlikTech was quickly blinded by early success and praise from market analysts. QlikTech pretended to "disrupt BI industry" and be "the Apple for BI", as we remember. It released sarcastic promo videos claiming "we outclassed the competition", "we left mammoths of yesterday behind", "we're the amazing ones".

Not so fast, gentlemen -- from the outside the picture is different. In order to "outclass" the competition, QlikTech needs to at least keep software quality and innovation pace on a par with that competition, however it's not the case (look at Tableau!). Any innovation is useless if it simply doesn't work. Alas, quality of software development at QlikTech leaves much to be desired nowadays. QlikView has accumulated a lot of technical debt and keeps adding to it further. Current state of Direct Discovery seems to only be in line with this trend.