Boris Evelson serves Application Development & Delivery Professionals. See the full Analyst bio.
Visit Forrester.com to learn how we make Application Development & Delivery Professionals successful every day.
Follow Boris on Twitter.
Boris Evelson serves Application Development & Delivery Professionals. See the full Analyst bio.
Visit Forrester.com to learn how we make Application Development & Delivery Professionals successful every day.
Follow Boris on Twitter.
Posted by Boris Evelson on March 31, 2010
By Boris Evelson
In-memory analytics are all abuzz for multiple reasons. Speed of querying, reporting and analysis is just one. Flexibility, agility, rapid prototyping is another. While there are many more reasons, not all in-memory approaches are created equal. Let’s look at the 5 options buyers have today:
1. In-memory OLAP. Classic MOLAP cube loaded entirely in memory
Vendors: IBM Cognos TM1, Actuate BIRT
Pros
Cons
2. In-memory ROLAP. ROLAP metadata loaded entirely in memory.
Vendors: MicroStrategy
Pros
Cons
3. In memory inverted index. Index (with data) loaded into memory
Vendors: SAP BusinessObjects (BI Accelerator), Endeca
Pros
Cons
4. In memory associative index. An array/index with every entity/attribute correlated to every other entity/attribute
Vendors: QlikView, TIBCO Spotfire, SAS JMP, Advizor Solutions (also OEMd by Information Builders)
Pros
Cons
5. In memory spreadsheet. Spreadsheet like array loaded entirely into memory
Vendors: Microsoft (PowerPivot)
Pros
Cons
What did I miss and what did I get wrong - comments?
Attend the complimentary Webinar Provide Next Generation Services To Your Customers June 5, 2013, 1:00–2:00 p.m. EST
Attend the complimentary Webinar Strategies For The Mobile Mind Shift June 5, 2013, 1:00–2:00 p.m. UK time
Comments
Another for your list
Hi Boris
you mentioned that your list was complete
what about Yellowfin
www.yellowfin.bi
Comment
Yellowfin, are you serious, that is not even a real BI solution
some
Very good article that helps to clasify the technologies. I would say that inverted indices have proved to manage large volumes of data, because they allow compression (same for associative index) and parallelisation (see search technologies
And there are already live examples of big volumes of data with BW accelerator
I also feel that inverted indexes or associative enable new user experiences similar to search such as the one you get in qlikview, Spotfire or Polestar
Last, I fully agree when you say that associative technolgies is easier to model, which is very important for self service BI
In-memory analytics
Hi Boris
I have always found the analytics in BI tools to be very thin. There is very little ability to inspect causality or consequence. There is no doubt that drill down and drill through are powerful, but they don't really get to causality and consequence. They are much earlier in the discovery process, perhaps at the "symptom" level.
Are you aware of any process of industry specific tools that can do causality and/or consequence analysis? Perhaps it is just ignorance on my part.
Regards
Trevor
Trevor, there are two ways to
Trevor, there are two ways to look at it: probabilistic and deterministic causality. Probabilistic causality is the realm of statistical analysis, data mining, predictive modeling, etc tools which can find a relationship (or not) between one set of entities and attributes and another set (SAS, IBM SPSS, etc). Deterministic causality - is where among thousands or millions of entities, attributes or occurrences only one is related to another – is a much tougher challenge. The real problem here is that in order to find that one occurrence you have to have a data model that allows you to examine the relationship between these two elements or occurrences. But since you-don't-know-what-you-don't know that is always a problem. Therefore we call traditional analysis techniques that require data models “pre-discovery” vs. analysis that does not require any data modeling “post-discovery” (search for my blog with these keywords). Traditional relational data bases don’t do this well at all. Typical approaches that are used for this (still with limited success, but you can get better results than you’d get from relational models) are in memory arrays/indexes (QlikTech, TIBCO Spotfire, SAP BI Accelerator, Microsoft PowerPivot), inverted index DBMS (Endeca, Attivio), and associative DBMS (Saffron Tech, illuminate).
Hi Travor, I would recommend
Hi Travor,
I would recommend checking System Dynamics modelling tools for deterministic cause&effect modelling techniques. Stella and Vensim are the typical tools used with such modelling methodologies. Obviously BI tools are far behind in such advanced analytical modelling approaches.
Regards,
Mucahit
In-Moemory Analytics
Thanks for this Boris. Being an engineer who ended up studying OR and applied statistics at the graduate level, I get the difference between probabalistic and deterministic causality. Here's I blog I wrote along these lines:
http://blog.kinaxis.com/2009/12/i-am-adamant-that-an-accurate-forecast-d...
There is no doubt that probabalistic causality is the right way to go when a deterministic model is difficult to construct.
Working in supply chain management I find that "pre-discovery" is crucial for effective performance and risk management. Not only that, but predictive analytics are absolutely paramount, not just the analysis of history. In other words that a supplier has had a 75% on-time delivery over the past 3 months may be very useful to a contracts manager, but is of very little value to a production manager trying to satisfy customer demand. What the production manager needs to know is that a particular delivery from that supplier is late, but even that is of limited value. What the organization really needs is to be able to predict which customer orders/forecast will be delayed as a consequence and and the intermediate stes of production and distribution that need to be altered as a consequence. Not only that, the customer will want a revised promise date. This is the causality and consequence that I believe adds real value to an organization.
Don't get me wrong, I think there is tremendous value added by "post-discovery" analytics too, especially in determinign markets shift and buying patterns.
Regards
Trevor
What is and what should be
Trevor,
I appreciate your insight on the applicability of getting to the predictive side of things, anticipating the changes in the business processes given the anticipated blip in expectations so we can be value-add customer (and supplier) focused to forewarn of changes in original expectations made. This IS where we need to continue to drive the value-add of insights gained from effectively designed predictive models.
The historical-centric models of-course are a foundational basis from which prospective predictive models can be validated. Naturally the continual review of historical trends are important for "learning from history" and therefore need to continue to be considered significant (but not sufficient).
Refreshing times
What about refreshing times?
I mentioned IMA in power Pivot is very very slow.
It has to rebuild "the array" (isn't it the hidden OLAP?) every time it is refreshed.
Result is:
30 s for ODBC + OLAP
110 s for Power Pivot
Have You got the times (comparison) for the another applications?
http://afin.net/webcasts/Demo_RefreshData_ADO&AfinNet_vs_Excel2010&Power...
(real time - long)
Good point. Watch out for a
Good point. Watch out for a new blog I am working on with 5-6 criteria to evaluate when comparing in-memory analytics vendors like QlikTech, Spotfire and PowerPivot.
one item I'd add for the SAP
one item I'd add for the SAP BI Accelerator is that you still need to do scripting to get the data loaded into the in-memory structures from your source.
Boris, This is an interesting
Boris,
This is an interesting summary. While you are at this nice simple rubric, I would like you to also add: Cloud9 Analytics, PivotLink, Tableau, SSAS+Vertipaq, Oracle Exadata, Oracle OLAP 11g, Oracle Essbase, Vertica, Greenplum, HP Neoview, IBM DB2/DW on Power7+ Platform, Netezza and Birst into this mix as well. Almost all of them have in-memory caches for query acceleration. Some of them are multidimensional data stores and some of them are Relational with SQL extensions.
Lastly, it will be nice if there is a simple decision tree that can be derived from these various Pros & Cons. I realize that Forrester would not like to take sides, but a nice well written decision tree may be valuable to your readers.
Madhukar, I don't consider
Madhukar, I don't consider data caching directly related to the new trend of in-memory analytics. What's different is data caching basically moves the same data structures from disk to memory, but access, analysis methods are still designed as if data is on a disk. New in-memory techniques completely redesigned the way data is stored, managed and analyzed. Also, take a look at my latest blog - just posted yeterday - on more decision criteria.
Using Solid State Storage for in-memory
Boris,
Are any vendors using flash boards to support the caching of data from disk? This would allow more data to be analyzed economically.
Yes, some, like Teradata, but
Yes, some, like Teradata, but I do not equate that to in-memory analytics. Caching just moves data from disk to memory, but still uses memory the same way as disk in terms of data structures, reads, writes, updates, etc. So it's not as efficient or as "next gen" as true in-memory analytical databases. Check with my colleague, Jim Kobielus, who covers DW on which DW vendors are supporting flash.
difference of Caching vs in-memory
I'd be interested to understand the "secret sauce" that fundamentally distinquishes a memory caching solution vs an in-memory database. As a Data Architect practitioner, really feels like the same thing.
Mike, caching disk based DBMS
Mike, caching disk based DBMS data in memory still requires manipulating that data using methods optimized for disk access, and not necesserily the most optimal methods for when all the data is in memory all the time. The front or middle tier program doesn't know whether the data it's accessing is on disk or in memory. Programs for in memory databases are specifically written for optimized in memory data manipulation. Actually, in some cases a program that manipulates the data and the database itself is one and the same (this is true for QlikView, TIBCO Spotfire, AdvizorSolutions)