So by now you’ve heard all about Yellowfin’s in memory database (IMDB) and how it may quickly and painlessly improve the speed of your reporting and data analysis, but how does it actually work? As one Yellowfin programmer would say: “Computers. It works in the computers.” If you’re interested in a bit more of a thorough breakdown than that, though, we have it here.
First things first, we need to understand how “view caching” works in Yellowfin 5. The query for an entire view is run on the source database, and then all of the data is dumped into a flat table in a target database, essentially the view data has been completely denormalised and moved to a new database. Reports which are run on this view will now reference the flat table.
For some views we may get significant performance gains through the caching process simply because of the elimination of the joins. Obviously there’s a reason we normalised our data in the first place, though: flattening the data will create a lot of redundancy and potentially massively blow out the storage space required. However, making this process available at the view level allows us to denormalise small subsets of the data.
So now we’ve flattened some views and the reports written off them are chugging along running reasonably quickly. What could be better than that? Wait a second, how about we improve the performance again by having the view data loaded into memory? Now there’s a good idea!
Yellowfin allows you to set up a connection to its embedded IMDB the same way as you would set up any other data source. The IMDB can then be selected as the target when you’re setting up caching on a view. When Yellowfin sees it’s loading data into memory the work is passed off to the IMDB’s bulk loader, this may be a lot quicker than caching data into a normal DBMS where Yellowfin has to generate and run each INSERT statement individually.
The IMDB goes some way to reducing the issues of space blowout caused by redundant data in flat tables by run length encoding each column. This means that if a column contained the word “Apple” 100 consecutive times the word itself would only be stored once along with the fact that it was reoccurring in the next 99 rows. Strings of a certain length are also stored in a lookup table, so when they appear in the data they can be replaced by their index within the table. This can result in very good compression when long strings occur repeatedly.
The open source Java SQL parser JSqlParser (http://jsqlparser.sourceforge.net/) is used as a front end for the IMDB, so reports based off views which are cached in memory can be run exactly the same way as reports written off any other cached view.
At this point the in memory database has been used by a few different clients, and in several cases the performance improvement has far exceeded even our expectations. If you’re having performance issues and don’t want the headache of rethinking your DB schema then give it a try, it might surprise you too!