January 27, 2018

Automation server for Qlik Sense

Qlik Sense is in many ways a more advanced platform than its predecessor, QlikView. Scalability, rich APIs, enterprise-level administration -- there are many features of a good architecture in it. However, what can be challenging for Qlik Sense customers (besides dealing with rudimentary data visualization) is automation. QlikView had embedded VBA scripting engine, which let designing automation scenarios initiated by users, but Qlik Sense doesn't have it. Disabled in the standard mode EXECUTE statement only aggravates the situation.

In this article, I'm proposing to extend Qlik Sense's capabilities with an automation server, based on EasyMorph Server. Such extension significantly simplifies a whole range of automation scenarios initiated by Qlik Sense user, that are difficult or non-trivial to implement otherwise. For instance:
  • Database writeback based on current selection in Qlik Sense.
  • One-click export of a subset of data from a Qlik Sense app into an external system or a disk folder.
  • Sending personalized emails with attached customized data extracts from Qlik Sense.
  • Downloading a particular file from a web-site and dynamically adding its content to a Qlik Sense app.
  • Automated data quality checks of incoming source data with rule-based email notifications.
The integration mechanism between Qlik Sense and EasyMorph Server is based on REST API and Websockets (see the diagram below):

Click to zoom

Actions are initiated by a Qlik Sense user by clicking a dynamically generated hyperlink, or an extension button. This triggers an EasyMorph Server task which runs an EasyMorph project with specified parameters (passed through the hyperlink or extension). The project performs required actions with external files and systems. Finally, the task status is reported back into the Qlik Sense app that initiated it. Alternatively, the task initiates a full or partial reload of the app using the REST API. 

A few benefits of such integration scheme:
  • In one Qlik Sense application there can be multiple action buttons that initiate different actions.
  • It works well with tiered Qlik Sense apps, where one app is for ETL, another for building a data model, and another one for the UI objects.
  • Closed loop feedback: task status and result are reported back to the user. If the task fails the errors will be reported to the user as well.
  • Task parameters can be assigned dynamically using Qlik Sense variables and expressions.
  • The action server can be hosted on a different machine thus reducing exposure of the Qlik Sense sever.
At this point we're half-way to implementing the integration scheme described above. This means that some of its elements are already in place, while other are in active development and will be released soon. As of version 3.7.1 already available:
The tool set described above is already suitable for adding automation capabilities to Qlik Sense apps. For instance database writebacks, extract generation or email sendouts are already possible by using dynamic hyperlinks in Qlik Sense applications. For better user experience and more advanced automation capabilities, a few more features are being developed and planned for release in version 3.8:
  • An interactive Qlik Sense app extension for triggering EM Server tasks and monitoring task status and errors in real-time.
  • The Qlik Sense Command transformation for triggering reloading Qlik Sense apps and QMC tasks right from EasyMorph projects.
  • Fetching emails and processing attachments.
With the addition of these features, the full integration scenario described in this article becomes possible. Besides that, EasyMorph will be able to work as a visual data transformation tool for Qlik Sense:

Click to zoom
In this case, a user triggers (through a link or extension) an EasyMorph task that generates QVD files (one file per one table in data model) and initiates reloading of the Qlik Sense app that called it. The app loads the generated QVDs.

If you would like to talk about about automation for Qlik Sense, send me an email (you can find my address in the upper right area of this blog or here).

To receive future updates on EasyMorph and its integrations with Qlik and other systems, subscribe to our newsletter on the download page.

January 4, 2018

EasyQlik QViewer acquired by Rob Wunderlich



Effective January 1st, 2018 EasyQlik QViewer has been acquired by Rob Wunderlich. 

I believe it's a great outcome for the product, its users and customers. It was a bit challenging for me to keep focus on QViewer and EasyMorph simultaneously, which resulted in a slower development pace for QViewer. It's hard to imagine a better new owner than Rob who is well known in the Qlik community and who surely has a great vision on what would make QViewer even more useful.

From now on, the existing licensed QViewer customers should contact support@panalyticsinc.com for all questions related to QViewer. The website http://easyqlik.com keeps operating as usually.

I, from now on, focus solely on EasyMorph.

Read also Rob's statement on the acquisition.

October 19, 2017

Tableau Maestro vs 3rd Party Data Prep Tools

If you haven't seen Tableau Maestro -- you should. I've seen the demo shown at the Tableau Conference 2017 and it's pretty cool (sorry, can't find a publicly available video). It's obvious that someone from the product management team has done a good job trying to address common challenges of data preparation (such as incorrect joins) in a visual way. Of course, Maestro is still in its infancy, but its introduction raises interesting questions. First of all, what does it mean for 3rd party data preparations tools, that target Tableau users?

Tableau Maestro. This screenshot belongs to tableau.com

Before I go further let me classify the existing data transformation offerings:

Personal Data Preparation Tools
These are rather simple applications that allow performing basic operations such as cleansing, filtering, merging in a linear and non-parameterized way. While they're visual and target non-technical audience their applicability is usually pretty limited as they don't support non-linear workflows (a must have for anything non-trivial), have no means of automation and integration (e.g. running external applications) and have a limited set of available transforms. On the positive side, they're usually reasonably priced and easy to start with.

Departmental Data Transformation (ETL) Applications
Applications in this category are full-featured, rather capable ETL programs that allow designing non-linear workflows (that typically look like a block diagram where inputs and outputs of blocks are connected with arrows), integrate with external applications, and run parameterized tasks on schedule. They are way more capable than the personal data prep tool described above, while still remaining rather affordable. However the vast majority of them have one big flaw that renders them barely useful for Tableau audience -- they are too IT/DBA/SQL-centric and therefore simply not suitable for an average Tableau user. Unless s/he wants to dive into topics such as the nuances of differences between CHAR, VARCHAR and NVARCHAR data types on a daily basis (hint: it's not much fun).

EasyMorph, the data transformation tool I've designed and produced, technically also belongs to the Departmental ETL category. However unlike most ETL tools, it's designed from the ground up for non-technical users first of all, which required walking away from the traditional approach to ETL and re-thinking data transformation from scratch.

Enterprise ETL platforms
These are mastodons. In terms of features, scale, and of course, price. Big guns with a big price tag. Most of them are also heavily IT-centric, however some Enterprise ETL platforms (e.g. Alteryx and Lavastorm) have managed to become closer to non-technical users than the rest of the group. The exorbitant cost of licenses in this category severely restricts number of people that can use it for self-service data transformation within an organization. Especially, taking into account that in many cases they are used for departmental (and even personal) ETL, not enterprise, which is clearly overkill. After all, having 75-80% of revenue reinvested into sales and marketing allows hiring very skilled sales people :)

Now, where does Maestro fit in this classification? While it's still in beta, and no final product has been demonstrated yet I probably wouldn't be terribly off base if I assume that Maestro is a personal data preparation tool (probably with a non-linear workflow). Which means that Maestro, once released, would leave very little room for 3rd party software vendors in this category, especially if offered for free. Many will have simply to leave the market.

OK, what about EasyMorph then? I believe Maestro is a good thing for EasyMorph. While some our potential users might not realize at first that the two tools are in different categories, the introduction of Maestro actually makes EasyMorph a big favor:

1. It proves that good data analysis requires good data preparation. Tableau developers are incredibly creative people. It never ceases to amaze me what kinds of hacks and workarounds they use in Tableau in order to bring the source data into necessary shape. However, in many cases a decent non-linear data transformation tool would make this task straight forward, maintainable and debuggable.

2. It introduces the idea of a dedicated data transformation tool for wide audience. When pitching EasyMorph to various organizations I noticed that the idea of a specialized data transformation tool is not familiar to the non-technical audience. Developers understand the idea of a dedicated ETL tool and the benefits such a tool can provide. But business users (who comprise the biggest part of the Tableau user base) usually have hard times understanding the whole idea of visual data transformation. Maestro solves this task for us. With the power of Tableau's marketing :)

Someone said, that it's Apple and Steve Jobs who have taught smartphone users to buy apps and music instead of pirating it from somewhere. Apple's AppStore and iTunes have changed the mindset. I believe that Maestro will discover to many Tableau fans the convenience and power of visual self-service data preparation.

3. It makes it easier for us to explain to Tableau audience what EasyMorph is.  Now it's plain simple: "EasyMorph is Maestro on steroids". The more people will use Maestro, the more people will buy into the benefits and convenience of visual programming (yes, Ladies and Gentlemen, it's visual programming), so that EasyMorph would be a logical next step for Maestro users once the complexity of required calculations grows beyond trivial.

PS. It's interesting to see that the "data kitchen" concept that I wrote about almost 2 years ago has been materializing more and more.

September 17, 2017

Excel is incredibly successful application development platform



The more I think about Excel the more it looks to me as an application development platform, probably the most successful ever.

My first programming language was BASIC. Therefore the user interface of my first programs was basically a sequence of questions, which required typing in answers. PRINT this, INPUT that, IF/THEN, GOTO here. I started reading books on computer languages, learned Pascal, C++, ASM and eventually went to study Computer Science. A rather standard story of a software developer.

For many people far from software development their first programs looked radically different. They didn't even know they were programs. Because they were spreadsheets.

Modern Computer Science identifies 3 programming concepts:
  • Imperative (e.g. BASIC, C++, Python)
  • Functional (e.g. List, ML, Haskell, OCaml, F#)
  • Declarative (e.g. SQL, Prolog)
While there are heated debates among CS enthusiasts whether Excel falls into the declarative or functional programming concept, or it's a separate category on its own, most people never think of spreadsheets as of any kind of programming at all. But that didn't stop them from creating their first application without even knowing about it, because it was not a program in traditional sense, it's a spreadsheet. Millions of people develop applications and don't even realize it. Isn't it amazing?

If you think about it for a moment, a spreadsheet is a great way of creating applications. No need to deal with UI libraries, dialogs and forms -- just write in a cell what you need where you need it and you're done. You don't even have to bother with aligning objects -- everything is already aligned. A spreadsheet is visual and reactive -- whatever change you make, the result is immediately visible. It has only two abstractions: a cell, and a formula, and therefore is very easy to learn.

Yes, spreadsheet apps don't have the polished, productionalized look of "real" applications. But my first BASIC program didn't look that either. Yet it did its job. One can argue that spreadsheets are not suitable for general purpose programming. But neither is SQL, yet it's great for its purpose.

Spreadsheets allowed millions of people without programming skills create practical, useful applications that solve real life problems. If this is not amazing success for an application platform, then what is?

May 30, 2017

Understanding bar charts in QViewer listboxes

Starting from version 3 listboxes in QViewer have green bars beneath counts, like in the image below:



It may look as the bars are proportional to counts with 100% equal to the max count in the column. However, it's not the case. If you look closely at the screenshot above you can notice that values 02 and 08 have full bars (i.e. 100%) but their counts are not equal (2,236 and 2,922 respectively). More than that -- when no filter applied, all bars are at 100% regardless of the counts. OK, what do they mean then?

Each bar shows the selection ratio for each value and its value is calculated as count for current selection / total count of that particular value.

The best way to understand the bars is to think of them as of an inclusion indicator. They show the degree to which the current selection includes particular value: 0 means current selection doesn't include the value at all, while 100% means that current selection includes all rows with the value. That's why when no filters applied all bars are at 100%. For instance if dataset has field "Gender" with two possible values: "Male" and "Female", the bars would show to what degree the current selection includes males and females present in the dataset.

April 17, 2017

QViewer ver.3.3 is out

QViewer version 3.3 is now available for downloading. Here is what's new and exciting in the new version:

Support for QVD files larger than 2GB
Now you can open QVD files basically of any size. It was tested on QVD files up to 20GBs. Technically, at this point the only limitation is 2 billion rows per file. QViewer aggressively parallelize calculations so you may want to give it some time to perform necessary counts after opening a large file, because during that time CPU utilization can be near 100% which would make the application less responsive.

New Partial Load workflow
In the new version the workflow for partial load was significantly changed. The Partial Load button has been removed. Instead, when a file larger than certain threshold is opened QViewer suggests  performing a partial load (see the screenshot below). The threshold can be set at 512MB, 1GB, 2GB, 4GB or 8GB.



Indication of metrics affected by partial load
When a partial load is performed some metrics calculated by QViewer can be distorted because the loaded dataset is incomplete. To avoid confusion QViewer now shows which metrics are affected by partial load (shown red) and which are not.



Comments in generated script
When generating a LOAD statement, QViewer can now insert field comments obtained from the XML header. These comments are typically created using COMMENT FIELD or COMMENT TABLE statements in QlikView / Qlik Sense before exporting a table into QVD file.

Download QViewer

UPDATE 4/23/2017
I'm considering further improvements to QViewer. The ideas are floating around a few topics: table inspection for Qlik Sense Server, [shareable] selection bookmarks, support for other file formats (e.g. QVX, CSV, or XLSX), aggregation with built-in pivot tables (with selections applied). However, I'm not sure what would be the most useful for the Qlik dev community. I would appreciate hearing it from you -- what would you like to see in future versions of QViewer? Feel free to send me your suggestions (see my email in the upper right corner of this blog), or just leave a comment below.

April 9, 2017

Websites should offer downloading micromodels instead of CSV dumps

Many websites such as Google Analytics or PayPal allow users to download records (transactions) as CSV files or, sometimes, Excel spreadsheets. While it's more convenient than copy-pasting-parsing HTML from web-pages it's still not optimal because such dumps usually lack many details. For instance, when I download PayPal transactions I would also want to see product items, so that I could make a break down of sales by products. Or see customer addresses, so that I could analyze them from a geospatial perspective. Since the CSV dumps are denormalized, adding all the details and attributes would dramatically increase file sizes and also clutter them.

What I'm suggesting is that instead of CSV dumps websites should allow downloading micromodels -- user-related subsets of a bigger data model used by the web-service itself. Such micromodels would contain several linked normalized tables with only data relevant to the user who requested it. From a technical standpoint it can be one SQLite file generated on the fly. The SQLite file format allows packing multiple tables into one file which can hold millions of records.

Having a relational micromodel would allow more meaningful and interesting data analysis. It would play well with popular data analysis tools (except Excel which is poorly suited to work with relational data by design). Support for SQL queries would immediately make it compatible with vast amount of systems.

For information providers, an SQLite file with micromodel would be of size similar to current CSV dumps so it won't increase workload and traffic. Also generating micromodels can be even faster than generating CSV dumps since it won't require joining multiple tables in order obtain a denormalized view.

Below is an example of a possible micromodel schema for PayPal.