OLAP's Cube Is Crumbling Around The Edges

By James Kobielus

BI is essentially a set of best practices for building models to answer business questions. However, today's BI best practices may be suboptimal for many enterprises' decision-support requirements.

For most users, BI is a journey that's been modeled and mapped out in advance by others, following a well-marked path through vast data sets. Data models, which must often be pre-built by specialists, generate or shape the design of such key BI artifacts as queries, reports, and dashboards. Essentially, every BI application is some data modeler's prediction of the types of questions that users will want to ask of the underlying data marts. Sometimes, those predictions are little more than an educated guess -- and are not always on the mark.

BI's most ubiquitous data-modeling approach is the online analytical processing (OLAP) data structure known as a "cube." The OLAP cube -- essentially a denormalized relational database -- sits at the heart of most BI data marts. OLAP cubes, usually implemented as multidimensional "star" or "snowflake" schemas, allow large recordsets to be quickly and efficiently summarized, sorted, queried, and analyzed. However, no matter how well designed the dimensional data models within any particular cube, users eventually outgrow the constraints of those models and demand the ability to drill down, up, and across tabular recordsets in ways not built into the underlying data structures.

The chief disadvantage of multidimensional OLAP cubes is their inflexibility. Cubes are built by pre-joining relational data tables into fixed, subject-specific structures. One way of getting around these constraints is the approach known as relational OLAP, which retains the underlying normalized relational storage approach while speeding multidimensional query access through "projections." However, relational OLAP also suffers from the need for explicit, upfront modeling of relationships within and among the underlying tabular data structures.
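The inflexibility described above can be sketched in a few lines. This is a toy illustration, not any vendor's cube engine: the `SALES` rows, the column layout, and the `build_cube` and `query_cube` helpers are all hypothetical names invented for the example. It assumes a modeler who pre-aggregated revenue by region and product only.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, channel, revenue)
SALES = [
    ("East", "Widget", "Web", 100),
    ("East", "Gadget", "Store", 150),
    ("West", "Widget", "Store", 200),
    ("West", "Widget", "Web", 50),
]


def build_cube(rows, dims):
    """Pre-aggregate revenue over a fixed tuple of dimension column indexes."""
    cube = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in dims)
        cube[key] += row[-1]  # revenue is the last column
    return dict(cube)


# The modeler chose region (0) and product (1) up front; channel (2) was left out.
cube = build_cube(SALES, dims=(0, 1))


def query_cube(cube, region=None, product=None):
    """Sum the cells matching region/product; None means 'all values'."""
    return sum(
        value
        for (r, p), value in cube.items()
        if (region is None or r == region) and (product is None or p == product)
    )
```

Questions along the modeled dimensions are cheap: `query_cube(cube, region="West")` sums two pre-built cells. A question about revenue by channel, however, cannot be answered from `cube` at all, because the channel column was aggregated away; in this sketch the only remedy is to rebuild with `dims=(0, 1, 2)`, which is the small-scale analogue of remodeling the data mart.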

From the average end user's point of view, all of this is mere plumbing -- invisible and boring -- until it prevents them from obtaining the new query tools, structured reports, and dashboards needed to do their jobs. One unfortunate consequence of OLAP cubes' inflexibility is that requests for new BI applications inevitably wind up in a backlog of IT projects that can take weeks or months to deliver. What might seem a trivial thing to the end user -- such as adding a new field or new calculation to an existing report -- might represent a time-consuming technical exercise for the data modeling professional. Behind the scenes, this simple decision-support request might, beyond the front-end BI tweaks, also require remodeling of the data mart's OLAP star schema, re-indexing of the data warehouse, revision of extract-transform-load (ETL) scripts, and retrieval of data from different transactional applications.

No one expects the OLAP cube to vanish completely from the BI landscape, but its role in many decision-support environments has been declining over the past several years. Increasingly, vendors are emphasizing new approaches that, when examined in a broader context, appear to be loosening OLAP's lockhold on mainstream BI and data warehousing. The emerging paradigm for ad-hoc, flexible, multi-dimensional, user-driven decision support includes the following important approaches:

  • Automated discovery and normalization of dispersed, heterogeneous data sets through a pervasive metadata layer
  • Semantic virtualization middleware, which supports on-demand, logically integrated viewing and query of data from heterogeneous, distributed data sources without need for a data warehouse or any other centralized persistence node
  • On-the-fly report, query, and dashboard creation, which relies on dynamic aggregation of data, organization of that data within relevant hierarchies, and presentation of metrics that have been customized to the user or session context
  • Interactive data visualization tools, which enable user-driven exploration of the full native dimensionality of heterogeneous data sets, thereby eliminating the need for manual modeling and transformation of data to a common schema
  • Guided analytics tools, which support user-driven, ad-hoc creation of sharable, extensible models containing data, visualization, and navigation models for customizable decision-support scenarios
  • Inverted indexing storage engines, which support more flexible, on-the-fly assembly of structured data in response to ad-hoc queries than is possible with traditional row-based or column-based data warehousing persistence layers
  • Distributed in-memory processing, which enables continuous delivery of intelligence being extracted in real-time from millions of rows of data that originates in myriad, distributed data sources
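Two of the approaches listed above, inverted indexing and on-the-fly aggregation, can be illustrated together. What follows is a minimal sketch under stated assumptions, not how any particular storage engine works: the `ROWS` data and the `build_inverted_index` and `ad_hoc_sum` helpers are hypothetical. The index maps each (column, value) pair to the set of row ids containing it, so a query can filter and group on any column without a schema designed in advance.

```python
from collections import defaultdict

# Hypothetical records; no column is declared a "dimension" or "fact" up front.
ROWS = [
    {"region": "East", "product": "Widget", "channel": "Web", "revenue": 100},
    {"region": "East", "product": "Gadget", "channel": "Store", "revenue": 150},
    {"region": "West", "product": "Widget", "channel": "Store", "revenue": 200},
    {"region": "West", "product": "Widget", "channel": "Web", "revenue": 50},
]


def build_inverted_index(rows):
    """Map every (column, value) pair to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            index[(column, value)].add(row_id)
    return index


def ad_hoc_sum(rows, index, measure, group_by, **filters):
    """Sum `measure` grouped by `group_by` over rows matching all filters."""
    matching = set(range(len(rows)))
    for column, value in filters.items():
        # Intersect posting sets; an unknown value matches no rows.
        matching &= index.get((column, value), set())
    totals = defaultdict(int)
    for row_id in sorted(matching):
        row = rows[row_id]
        totals[row[group_by]] += row[measure]
    return dict(totals)


INDEX = build_inverted_index(ROWS)
```

With this structure, `ad_hoc_sum(ROWS, INDEX, "revenue", "channel", product="Widget")` groups Widget revenue by channel even though nobody modeled a channel hierarchy beforehand. The trade-off is that aggregation happens at query time rather than at load time, which is why such designs tend to be paired with in-memory processing.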

Unfortunately, this new decision-support paradigm has no pithy name or coherent best practices. If we were to call it the "post-OLAP" paradigm, that would give the false impression that OLAP cubes are obsolete, when in fact they are simply being virtualized and embedded within a more flexible Web 2.0 and SOA framework. We could call this the new "hypercube" paradigm, but that might give the mathematical purists among us a case of indigestion. Boris Evelson's "BI workspaces" is a good, non-geeky name for this new wave.

Whatever we choose to call this new era, look around you. It has already arrived. We can see this trend in the growing adoption of all of these constituent approaches in production BI environments everywhere. However, to date, few enterprises have combined these post-OLAP approaches in a coherent BI architectural framework.

But that day is rapidly coming to mainstream BI and data warehousing environments everywhere. OLAP's hard-and-fast, cube-based approach is slowly but surely dissolving in this new era of more flexible, user-centric decision support.


re: OLAP's Cube Is Crumbling Around The Edges

I'd love to see some examples of what you mean. I agree with some of your points about classical OLAP cubes; however, I don't understand exactly what you suggest replacing them with.

re: OLAP's Cube Is Crumbling Around The Edges

I agree with the comment from Massimo. I would like to see some examples. Are there vendors in this space? Please provide a couple of sample organizations that have been successful with this approach. I am a software product manager responsible for our reporting and data warehouse strategies, so this is very on topic for me, and I am interested mostly in the metadata layer approach. Thanks

re: OLAP's Cube Is Crumbling Around The Edges

Massimo and Karl:

This is all very much a work in progress, industry-wide. See my later post on database virtualization. In particular:

"One of the chief trends driving database virtualization is users' need for more robust middleware fabric in support of real-time BI. In order to ensure guaranteed subsecond latency for BI, the infrastructure must incorporate a policy-driven, latency-agile, distributed-caching memory grid from end-to-end. However, the convergence of real-time business-intelligence approaches onto a unified, in-memory, distributed-caching infrastructure may take more than a decade to come to fruition because of the immaturity of the technology; lack of multivendor standards; and spotty, fragmented implementation of its enabling technologies among today's BI/DW vendors.

Nevertheless, all signs point to this trend's inevitability -- most notably, Microsoft's recent announcement that it is developing its own information fabric platform, codenamed "Project Velocity," to beef up its real-time analytic and transactional computing capabilities. Bear in mind that no BI/DW vendor has clearly spelled out its approach for supporting the full range of physical and logical data-persistence models across its real-time information fabrics. But it's quite clear that the industry is moving toward a new paradigm wherein the optimal data-persistence model will be provisioned automatically to each node based on its deployment role (including EDW, ODS, staging, data mart) -- and in which data will be written to whatever blend of virtualized memory and disk best suits applications' real-time requirements.

It would be ridiculous to imagine this evolution will take place overnight. Even if solution vendors suddenly converged on a common information-fabric framework -- which is highly doubtful -- I&KM managers have too much invested in their enterprises' current data environments to justify migrating them to a virtualized architecture overnight."

I'm co-authoring an upcoming Forrester report with Boris Evelson that spells out our perspectives on this in greater detail. For here-and-now harbingers of real-world post-OLAP platforms, check out:

  • OLAP against in-memory data sets with static dimensionality: This involves BI where interactive visualization, manipulation, and (optionally) cube write-back are required on data in static multidimensional models. Who has this? IBM Cognos TM1.
  • OLAP against in-memory data sets with fluid dimensionality and non-cube data: This involves BI where interactive visualization, manipulation, and (optionally) write-back are required in fluidly multidimensional data models that do not conform with OLAP cubing; i.e., every element can be instantaneously used as a fact or a dimension. Who has it now? QlikTech and TIBCO Spotfire.

Wait for the doc, coming in the next month or so, for more details.

Jim