When ROLAP Is Not A ROLAP

I get many inquiries on the differences and pros and cons of MOLAP versus ROLAP architectures for analytics and BI. In the old days, the differences between MOLAP, DOLAP, HOLAP, and ROLAP were pretty clear. Today, given the modern scalability requirements, DOLAP has all but disappeared, and the lines between MOLAP, ROLAP, and HOLAP are getting murkier and murkier. Here are some of the reasons:

  • Some RDBMSes (Oracle, DB2, Microsoft) offer built-in OLAP engines, often eliminating a need to have a separate OLAP engine in BI tools.
  • Some of the DW-optimized DBMSes like Teradata, SybaseIQ, and Netezza partially eliminate the need for an OLAP engine with aggregate indexes, columnar architecture, or brute force table scans.
  • MOLAP engines like Microsoft SSAS and Oracle Essbase can do drill-throughs to detailed transactions.
  • Semantic layers like SAP BusinessObjects Universe have some OLAP-like functionality.

For multiple reasons, outlined in OLAP and BI self-service documents including the main one — drill anywhere — I feel that ROLAP is a superior approach to MOLAP. Many vendors support that notion and are rushing to invent ROLAP architecture, or, alas, in many cases “markitechture”. So, here’s my first attempt to identify a few ROLAP features that can clearly differentiate a true ROLAP engine from a fake. True ROLAP has to:

  • Support multipass SQL. This is required to answer analytical questions that cannot be answered in a single pass of SQL.
  • Minimize the amount of data movement between RDBMS and client app by:
    • Storing (temp tables, derived tables) intermediate query results from multipass SQL and joining them in RDBMS.
    • Pushing all calculations to RDBMS.
    • Pushing all filtering and grouping to RDBMS.
    • Pushing all filtering and grouping on calculations to RDBMS.
  • Be aggregate-aware. That means that adding or dropping aggregate tables should have no effect on reports and SQL.
  • Optimize SQL based on RDBMS type.
  • Support heterogeneous joins to join data from multiple RDBMSes in a single report. Again, to minimize data movement, this should be done in RDBMS. Putting in a separate EII/data federation layer/tool does not count.
  • Handle normalized and denormalized data models equally efficiently.
  • Automatically resolve typical start/snowflake query errors and conflicts, such as trying to join multiple fact tables.

Before I publish a document on this subject, I’d like to collect everyone’s feedback on:

  • Did I get it right?
  • Am I missing any other key ROLAP differentiators?
  • What current OLAP engines qualify as ROLAP given these criteria? MicroStrategy and OBIEE immediately come to mind. What about Mondrian? Others?

Comments

Boris, great post and

Boris, great post and interesting topic. I would probably argue that in-memory products, such as Qlikview and Spotfire, as well as columnar file based apps, like Lyzasoft and MS PowerPivot, would be classified as DOLAP. Also, many people still use Cognos Powerplay client, which would be considered DOLAP. So, I'm not sure that you could say that DOLAP has all but disappeared.

With regard to ROLAP, I think you have most of it right. I would only question a few of your points. First, I'm not sure that heterogeneous joins would be considered a requirement for true ROLAP. While it's a nice feature, it usually requires collation at the BI server layer and I haven't seen many vendors pull that off without embedding an EII tool (i.e. Composite with Cognos). Second, I would probably add hierarchy support (i.e. drill down) as a requirement of true ROLAP. Third, I'm not sure that all calculations get pushed to the RDBMS by the vendors we would consider offering ROLAP. Given the advanced analytic functions now offered, I don't believe that they are all being pushed to the database.

Steve

Steve, excellent suggestions,

Steve, excellent suggestions, thank you.

Great summary

Great summary! I would also add the scalability benefits of a ROLAP approach. ROLAP is more scalable when dimensions have very large cardinality. Also, ROLAP eliminates data redundancy as it leaves the data in the RDBMS vs. MOLAP that creates a second storage of data.

And yes, Mondrian is an ROLAP engine. Here in Pentaho we have tightly integrated Mondrian into our Pentaho Analysis Services layer. Pentaho Analyzer, a web-bases ad hoc analysis tool, sits on top of this powerful services layer to give business users an intuitive interface for analyzing data and building charts and interactive graphs.

ROLAP

Boris,

Thanks for the post.

1. I'm not sure why you tie multi-pass SQL to (R)OLAP.

2. I agree with most of Steve's comments. I would say, though, the Qlik themselves would probably disagree with the DOLAP categorization. They are very clearly moving away from this approach. This trend started happening when Microsoft announced plans for PowerPivot. Qlik reacted by giving away their "desktop analytic tool" for free. That is not where they make their money. Also, there is really nothing in common between QlikView's data model and a classic cube.

3. Finally, in response to Farnaz, I personally think that saying ROLAP and Scalability in the same sentence is very strange :)

Keep up the good work.

Elad

Elad1. Some OLAP operations,

Elad

1. Some OLAP operations, like joining two fact tables can only be done with multi-pass SQL. I am not aware of any MOLAP products that do that, but I may be wrong.
2. As you know, I am a big fan of in-memory technology, I just don't equate it with OLAP, since it (in-memory) can do so much more. IBM Cognos TM1 is an exception - it is an in-memory MOLAP.
3. If we take anything other than OLAP options out of the equation/discussion, don't you agree that ROLAP is indeed more scalable than MOLAP?

Cheers

In regards to 1, I'm pretty

In regards to 1, I'm pretty sure that's not true. I'll try to find some references.

As for 3, I do not necessarily agree with that but it has a lot to do with how you define 'scalability'. My experience shows that you have a much better chance of having a solution hold solid for an increased time frame and increased usage patterns with MOLAP than with ROLAP.

Doubt ...

Hi
Boris ..

Good Post ...

I have one doubt .. Consider There's an sql server Database around 8-10 GB of data
with all the relation for tables .. now suppose if we want to implement Data warehousing Concept, which one u prefer to use (ROLAP or MOLAP).. and why ..?

and also what will the MAX size of single cube ... ?

Help me ..