- Forrester Councils
- Councils Overview
- log in
Posted by James Kobielus on March 19, 2010
What’s the perfect database management system (DBMS) architecture, where analytics is concerned? It seems as if everybody in business intelligence (BI), data warehousing (DW), and related areas has their own opinion on this topic. In fact, it’s more than just opinions. In the database wars, we have huge communities of vendors and users with substantial investments in one or more approaches—from traditional relational DBMSs to column-oriented, in-memory, dimensional, inverted indexing, and other approaches.
In the analytic database wars, new architectures are springing up everywhere, each with its own devotees and differentiators, and each with a go-to-market message that takes potshots at established approaches. The “No SQL” movement is just the latest coalition to emerge in this titanic struggle, and is essentially a loose coalition of diverse approaches rallying around a common theme: that traditional RDBMSs and their SQL-based access languages are unfit, or at least, maladapted to the new world of cloud, social networking, and Web-oriented analytics applications.
DW industry analysts such as myself are of course embedded in these wars—we’re the neutral observers often caught in the crossfire. Of course, this is not a new battle. The columnar database industry has been around for many years and positioned itself as the chief high-performance, low-footprint analytics contender to traditional row-based RDBMSs. And columnar has steadily encroached on relational’s turf, not only through established columnar-based DW vendors such as Sybase, Vertica, and ParAccel, but also through recent adoption (albeit in limited fashion) by the likes of Oracle, SAP, and others into their analytics architectures.
Query performance is king in the analytics wars, and columnar has important advantages, especially when used to support structured, repeatable scan and queries against very large aggregated tables. Query speed advantages have also driven adoption of dimensional (i.e., denormalized relational) databases for online analytical processing (OLAP). It’s also what’s driving the push to in-memory cache-based data persistence and solid-state drives, which ramp up the input-output operations per second by an order of magnitude.
So, the million-dollar question (actually, the stakes are far higher) is: Will columnar, or any of the other contenders, eventually drive relational/row-based databases to extinction in the realm of BI, DW, and advanced analytics? Or, if the outlook isn’t that extreme, will columnar and other approaches doom relational databases to a greatly reduced role in analytics. My educated hunch on the first question is “no,” but a full-throated “yes” on the second.
Relational databases will be with us for a long time to come.No one’s expecting non-relational DBMSs to gain any significant traction in online transaction processing (OLTP), where it serves as the basis for most mission-critical business applications. Even if there were some critical flaw to relational where transaction processing is concerned, year upon year of massive corporate investment in the technology, tools, and applications would argue against any hasty migration. Where analytics is concerned, let’s not forget that many of the business world’s reports and queries are against these transaction-oriented relational stores. Most of the world’s business data is stored in row-based repositories and most applications have been written to access and manipulate this data through SQL and other standard approaches.
But even in the analytics-optimized world of stand-alone BI and DW solutions, columnar won’t completely push relational databases over the parapet and into the moat. Let’s not forget that most of the world’s deployed DWs are built on relational technology, and vendors such as Teradata, Oracle, IBM, Microsoft, Netezza, Greenplum, Aster Data, and others continue to scale up their platforms quite well through parallel processing, indexes, partitioning, materialized views, query predicate pushdown, hardware acceleration, and other approaches. Nevertheless, we are already seeing traditional RDBMS vendors evolve into hybrid architectures that add columnar, in-memory, and other approaches. Oracle’s recent addition of hybrid columnar compression to its Exadata DW architecture is a noteworthy example, and SAP’s adoption of in-memory columnar in its Business Warehouse Accelerator.
Let’s not mince words though: relational’s deployment roles aren’t growing. The familiarity, maturity, and standardization of relational technology are its strengths, but also its chief weaknesses, specifically as related to DW and BI. In a world where unstructured and semi-structured information is growing in importance, relational databases seem as if they are pushing their limits. Even in their traditional strongholds—such as DW and BI—relational databases are increasingly becoming marginalized to specific roles, such as master data management hubs. All the while, newer, more agile species—such as dimensional, columnar, inverted index, in-memory, and even flat file databases—slowly but surely come to the surface.
What explains this somewhat gloomy outlook are the following traditional weaknesses of relational databases:
It’s clear that analytics-optimized databases are the wave of the future, and that the role of transactions-optimized databases for BI will be circumscribed more narrowly as time wears on. So what of the challengers to relational? How are they likely to fare in this new order of information management? Here are some Forrester quick takes on their respective roles:
There’s a contrarian view too. None of these non-traditional databases are optimal for all roles in a traditional EDW ecosystem, such as handling mixed transactional/analytic workloads. So while Forrester sees a steady trickling of these new technologies in departmental and specific use-case scenarios, these are mostly installed and used in coexistence with, not as outright replacements for, your EDW.
Clearly, there are ongoing roles for relational databases, as well as for their denormalized cousins—star and snowflake OLAP cubes--but they must be justified for each new application. Likewise, you should do the same—tailor the deployment to your specific requirements—for the non-traditional databases we’ve discussed in this post. Keep in mind that no one data storage, persistence, or structuring approach is optimal for all deployment roles and workloads.
For example, no matter how well-designed the dimensional data model is within an OLAP environment, users eventually outgrow these constraints and demand more flexible decision support. By requiring that relational data be denormalized and prejoined into star schemas and other fixed, subject-specific structures, traditional multidimensional OLAP denies users the flexibility to drill down, up, and across data sets in ways that were not designed into the underlying cubes.
The leading non-traditional database approaches--such as columnar, in-memory, and inverted indexing—may be more appropriate for the new generation of advanced analytics. However, just like their relational grand-daddy, no one of these upstarts is a one-size-fits-all solution. None are inherently optimized for all analytic applications, or transactions for that matter.
In the big sprawling empire of your data management environment, each of these fiefdoms has its proper sphere. Columnar, in-memory, and dimensional make great sense in the front-end, as the enabler for fast query processing in data marts. Row-based relational is often optimal for multi-subject hub-and-spoke EDWs with complex, conformed hierarchies. Inverted indexing and file-based storage are ideal for storing, processing, and manipulating the complex information sets of the Web 2.0 world.
What do you think? We at Forrester would love to hear your thoughts as we push inexorably into this new frontier.