Posted by James Kobielus on April 8, 2008
If you're an analyst, one of the nice things about having a blog is that you can provide out-of-band commentary on, or elaboration of, points raised in your formally published reports. That way, you don't need to clutter up the body or endnotes of the published reports with digressive — albeit important — discussions.
This post elaborates on the discussion of enterprise data warehouses (EDW) in my latest research report: "Appliance Power: Crunching Data Warehousing Workloads Faster And Cheaper Than Ever." As I was writing that document, it occurred to me that a formal, nuanced definition of EDW was important, though outside the proper scope of that particular report.
In that report, I implicitly defined EDW not in its own right, but in contrast to the notion of a DW appliance. In other words, I implicitly contrasted a "custom-built" DW (i.e., EDW) with a pre-configured/pre-optimized DW (i.e., appliance-based DW). In the context of that particular report, that EDW definition worked well, helping me clearly spell out the added value that appliance-based packaging brings to the DW arena.
But there's a risk here: that the reader will think I'm arguing that EDWs and DW appliances are two distinct, mutually exclusive species that can never overlap. Far from it. Fundamentally, I regard DW appliances as an alternate deployment model for building EDWs, subject-specific data marts, operational data stores (ODSs), real-time DWs, or any other server-side analytic repository. I call your attention to the following key excerpt:
"Appliances are rapidly scaling and maturing into enterprise-grade DW platforms. Though they still primarily address tactical data mart requirements, DW appliances from established pure plays such as DATAllegro, Dataupia, Greenplum, and Netezza are rapidly surmounting the scalability curve. Increasingly, enterprises are offloading more and more of their analytics processing to robust DW appliances, paving the way for these solutions to someday address high-end enterprise DW (EDW) requirements."
Notice the qualifier "someday" in that last sentence. If you read through the entire document, you'll also find the following critical observations in "What It Means":
"Given the immaturity of the DW appliance market, the burden of proof is still on vendors to show that these pre-integrated offerings can scale in capacity and broaden in scope to support users' most demanding ETL, OLAP, BI, and other requirements" and "No DW appliance pure play can offer a commercial platform as scalable (in database capacity, mixed-workload support, concurrent usage, and fast data loading) or as flexible in deployment, optimization, and administration features as DW market leader Teradata, which has long had an appliance-like value proposition though it has never chosen to set sail under that marketing banner."
All of this demands that I define EDW more crisply. Here now is that formal EDW definition, which forms the conceptual backbone of my evolving coverage of DW solutions and best practices:
EDW refers to the infrastructure, tooling, and deployment flexibility that enable a DW platform to support the evolving business analytic requirements of large, complex, dynamic enterprises for performance scaling, multi-domain and multi-entity scoping, high availability, user-driven extensibility, application and middleware integration, and life-cycle data governance.
That said, it's clear that some DW appliance vendors, most notably IBM, already provide EDW-grade solution portfolios. Likewise, Teradata, though it keeps its distance from the "appliance" label, also clearly fits that description. So does Oracle, with its growing range of partner-sold "Oracle Optimized Warehouse" appliances. HP, with Neoview, also fits that description. The DW appliance pure-plays (e.g., Netezza, Greenplum, DATAllegro, Dataupia) are rapidly evolving in that direction, and I certainly won't draw a line and say that their solutions won't attain "EDW-grade" status in the next 1-2 years. I think they will.
But the high bar of "EDW-grade" continues to evolve, and it remains to be seen which of the current DW appliance vendors — big and small, declared and reluctant — will keep up with the times. The concept of an EDW, as an intermediary persistence layer in the business-analytics fabric, is evolving to encompass several growing enterprise requirements:
- Real-time DW (the subject of the current report I'm working on for Q2 delivery)
- Federation of DWs (the subject of a report for Q3 delivery)
- Convergence of structured and unstructured data in the DW (the subject of a report for Q4 delivery)
Fundamentally, then, an EDW is a data-services platform that is extensible and sturdy enough to run the business on and to feed all your BI and analytic applications. If an appliance-based solution delivers those benefits, then it can be considered EDW-grade.
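For readers who like to see a definition as a checklist, here is a minimal Python sketch of the "EDW-grade" test. The six flags come straight from the requirements named in the EDW definition above; everything else (the `EdwCriteria` class, the helper functions, and the `example_appliance` values) is hypothetical and purely illustrative. It does not describe any real vendor's product, and a real evaluation would be a matter of degree rather than yes/no flags.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class EdwCriteria:
    """One yes/no flag per requirement named in the EDW definition (illustrative only)."""
    performance_scaling: bool
    multi_domain_multi_entity_scoping: bool
    high_availability: bool
    user_driven_extensibility: bool
    application_and_middleware_integration: bool
    life_cycle_data_governance: bool


def missing_criteria(platform: EdwCriteria) -> List[str]:
    """Return the names of the requirements the platform does not yet meet."""
    return [f.name for f in fields(platform) if not getattr(platform, f.name)]


def is_edw_grade(platform: EdwCriteria) -> bool:
    """A platform counts as EDW-grade here only if it meets every requirement."""
    return not missing_criteria(platform)


# Hypothetical appliance with made-up values: strong on scaling, weak on extensibility.
example_appliance = EdwCriteria(
    performance_scaling=True,
    multi_domain_multi_entity_scoping=True,
    high_availability=True,
    user_driven_extensibility=False,
    application_and_middleware_integration=True,
    life_cycle_data_governance=True,
)

if __name__ == "__main__":
    print("EDW-grade:", is_edw_grade(example_appliance))
    print("Gaps:", missing_criteria(example_appliance))
```

The point of the sketch is only that the deployment model (appliance or custom-built) never appears among the criteria: what matters is whether the platform meets the requirements.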