The Forrester Wave™: Enterprise Data Warehousing (EDW) Platforms Q1 2009: The Key Takeaway

By James Kobielus

Today we published the first Forrester Wave™ specifically focused on Enterprise Data Warehousing (EDW) Platforms. The final published report is now available on Forrester’s website to clients. Information and knowledge management (I&KM) professionals will find it a timely and actionable study of the leading EDW platform vendors: Teradata, Oracle, IBM, Microsoft, SAP, Sybase, and Netezza. I urge you to download and read it, and then engage me, the author-analyst, in inquiries and advisories to help you apply it to your EDW initiatives.

The key takeaway from this Wave is that scalability, flexibility, and affordability are the dominant requirements in today’s budget-stressed EDW platforms market. I&KM professionals are under the gun, trying to keep EDW and business intelligence (BI) costs under tight control while preserving the flexibility to grow and repurpose these investments to support an ever-changing array of decision-support requirements. Hence, an EDW platform--to score well in the Wave--should address the following high-bar requirements:

  • Extremely scalable: The EDW platform should be scalable to support petabytes of usable data; thousand-plus distributed compute/storage nodes; tens of thousands of concurrent users and queries; many terabytes of daily or continuous data loads; and expanding mixed workloads of reporting, query, OLAP, in-database analytics, real-time analytics, ETL, data cleansing, and other transactions. It should support this extreme scalability through scale-out, shared-nothing MPP, optimized appliances, optimized storage, dynamic query optimization, and mixed workload management technologies.
  • Extremely flexible: The EDW platform should be flexible to support diverse applications, including business intelligence, online analytical processing, data mining, predictive analytics, text analytics, closed-loop business process management, and complex event processing; and various deployment roles, including multi-domain data hubs, subject-specific data marts, operational data stores, master data management hubs, staging nodes, analytic data marts, multi-temperature hierarchical storage management and archiving, and source and/or target repository in data federation environments. It should support this extreme flexibility by being fluid, adaptive, and virtualized; enabling data to be transparently persisted, in diverse physical and logical formats, to an abstract, seamless grid of interconnected memory and disk resources, and delivered with subsecond delay to consuming applications; and ensuring application service levels through an end-to-end, policy-driven, latency-agile, distributed-caching and dynamic query-optimization memory grid.
  • Extremely affordable: The EDW platform should be affordable for all customer segments and use cases. It should support this extreme affordability through flexible packaging/pricing, including licensed software, modular appliances, and “pay as you go” subscription-based SaaS/cloud offerings.

EDW platform vendors that can’t address these key requirements--now or in their enhancement road maps over the coming 2-3 years--will not survive in this very competitive arena.

As noted above and in my blog post last week, scalability, performance, and optimization are perhaps the most important criteria in today’s EDW market. And, of course, they are quite difficult to nail down into a single yardstick that does justice to different vendors’ approaches. Nevertheless, I believe this Wave accomplishes that. I have boiled down “scalability, performance, and optimization” (SPO) into a single criterion that defines five profiles (from 5 = most scalable to 1 = least scalable), focusing on the degree of parallelism in the underlying architecture.

For each of the vendors in this Wave, I got a deep dive on their SPO architecture, but I didn’t stop there. I asked each vendor for reference customers, and conducted a structured interview with each. I asked each for a list and description of their largest production customer deployments. And I asked each for published benchmarks, plus all the supporting info on the test environments, scenarios, and criteria. In other words, I applied the standard Forrester Wave methodology.

Essentially, the customer deployment and benchmark data corroborated whether a vendor in fact earned the particular SPO score associated with their architectural approach. Clearly, there were plenty of gray areas. Also, quite clearly, vendors had plenty of comments on the definitions of the SPO scales, and on where they fell on this spectrum. And, of course, many pointed out that being scored, say, a “2” rather than a “4” or “5” didn’t necessarily mean they were slower, less efficient, or incapable of processing various EDW and BI workloads. It also didn’t mean that they couldn’t, in practice and in customer deployments, push the scalability and speed envelope that one would associate with their architecture. Architecture isn’t destiny, but it definitely sets SPO constraints, which is the whole point of the scoring on this criterion in this Wave.

All the vendor feedback was excellent and helped me tweak and tune the scale to fit the EDW market’s current and emerging state of the art. With that said, here are the final SPO scales in this Wave:

  • 5 = scale out through shared-nothing massively parallel processing (MPP), up to 100-1000+ storage/compute nodes in single-tier grid of compute/storage nodes, and well beyond 1000s of terabytes (TBs) of online, usable production data across distributed deployment
  • 4 = scale out in the storage tier to 100-1000+ nodes and/or up to around 1000 TBs of online, usable production data, but lacking support for single-tier-grid shared-nothing MPP and/or lacking the ability to scale out to 100-1000+ nodes in the compute tier
  • 3 = scale-out through shared-nothing MPP and/or clustering, up to 2-100 storage and/or compute nodes and up to 100s of TBs of online, usable production data across distributed deployment
  • 2 = scale-up through symmetric multiprocessing (SMP), and up to 10s of TBs of online, usable production data, and scale-out in a clustered deployment of 2-99 compute nodes
  • 1 = scale-up through SMP and up to 10s of TBs of online, usable production data on a single-node deployment
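
As a rough illustration only, the five profiles above can be read as a small decision function. This is a sketch under stated assumptions, not the Wave's scoring method: the actual scores were corroborated against customer deployments and benchmarks, with plenty of gray areas. The `Architecture` fields, the `spo_profile` name, and the numeric cutoffs are all hypothetical, chosen to approximate the ranges in the list (for example, treating "100-1000+" nodes as 100 or more).

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration; the scale gives ranges, not hard thresholds.
LARGE_NODE_COUNT = 100     # lower end of the "100-1000+" node range
PETABYTE_SCALE_TB = 1000   # "around 1000 TBs" of online, usable production data
HUNDREDS_OF_TB = 100       # lower end of "100s of TBs"


@dataclass
class Architecture:
    """Assumed summary of a platform's scaling characteristics."""
    shared_nothing_mpp: bool   # scales out through shared-nothing MPP
    single_tier_grid: bool     # compute and storage nodes form one tier
    max_compute_nodes: int
    max_storage_nodes: int
    max_usable_tb: float       # online, usable production data


def spo_profile(arch: Architecture) -> int:
    """Return an SPO profile from 5 (most scalable) to 1 (least scalable)."""
    multi_node = arch.max_compute_nodes >= 2 or arch.max_storage_nodes >= 2

    # 5: single-tier shared-nothing MPP grid, 100+ nodes, beyond petabyte scale.
    if (arch.shared_nothing_mpp and arch.single_tier_grid
            and arch.max_compute_nodes >= LARGE_NODE_COUNT
            and arch.max_storage_nodes >= LARGE_NODE_COUNT
            and arch.max_usable_tb > PETABYTE_SCALE_TB):
        return 5
    # 4: large storage-tier scale-out and/or roughly 1000 TB, but not profile 5.
    if (arch.max_storage_nodes >= LARGE_NODE_COUNT
            or arch.max_usable_tb >= PETABYTE_SCALE_TB):
        return 4
    # 3: smaller scale-out via MPP and/or clustering, up to 100s of TB.
    # Assumption: a non-MPP cluster needs 100+ TB to land here rather than in 2.
    if multi_node and (arch.shared_nothing_mpp
                       or arch.max_usable_tb >= HUNDREDS_OF_TB):
        return 3
    # 2: SMP scale-up deployed as a small cluster, 10s of TB.
    if multi_node:
        return 2
    # 1: SMP scale-up on a single node.
    return 1


if __name__ == "__main__":
    example = Architecture(True, False, 40, 40, 300.0)
    print(spo_profile(example))
```

The checks run from the top profile down, so each platform lands in the highest profile whose conditions it meets. That ordering mirrors the point that architecture sets the SPO ceiling, while the real scoring also weighed what customers had actually deployed.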

To see how the vendors ranked, you’ll need to read the Wave. Or engage me in an inquiry or advisory. Or, preferably, both.

Comments

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

what is the reason you do not even mention HP Neoview???

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

Because HP Neoview did not qualify for inclusion in the Wave.

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

aha, and can I know which is the qualifying criteria???

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

Luis: Sure. Here were the inclusion criteria for the EDW Platforms Wave Q1 09, which were sent to a broad range of EDW vendors this past summer (including HP):

**********************

1. Report at least $30 million (US dollars) in DW-specific revenues in the latest fiscal or calendar year, where at least 80% of revenues are from DW solutions, not professional services.

2. Offer at least the following functional components, tools, or features in at least one DW solution that is generally available by the time of Forrester’s hands-on evaluation:
   a. Storage engine for persisting and managing structured data
   b. Database management system that has been designed or tuned for processing query, reporting, and other analytical workloads
   c. Query acceleration, optimization, or tuning
   d. Loading acceleration, optimization, or tuning

3. Show that these commercially available components, tools, or features:
   a. Support query, reporting, and analytics against relational or multidimensional data directly by Structured Query Language (SQL), Multidimensional Expressions (MDX), and/or XML Query Language (XQuery)
   b. Constitute generic DW offerings that are not technologically or functionally tied or limited to particular functional or horizontal applications, such as ERP or CRM, or to a particular business intelligence (BI), business performance solution (BPS), predictive analytics, extract transform load (ETL), or middleware stack
   c. Are being marketed, sold, and implemented as integral features of a self-sufficient, general-purpose DW environment or platform that does not need to be embedded in other applications

4. Substantiate at least 100 in-production customers that:
   a. Span more than one major geographical region (Americas, Europe/Middle East/Africa, and Asia-Pacific)
   b. Represent five or more industry verticals
   c. Collectively have more than 10 percent of installations with more than 100 users (developers, power users, and/or end users) each
   d. Collectively have more than 10 percent of installations that serve applications that cross business group/departmental boundaries

5. Sufficient interest from Forrester Information and Knowledge Management (I&KM) clients, with at least 10 percent of DW-related customer inquiries and/or advisory/consulting projects mentioning, addressing, or concerning the vendor’s solutions

**********************

HP did not meet criteria 4 and 5, nor, with Neoview, did it meet criterion 1. Nor did such other EDW platform vendors as Greenplum, Vertica, Aster, Dataupia, Kickfire, Kalido, Kognitio, ParAccel, SAND, and 1010data.

Nevertheless, I think HP and most of these other EDW platform vendors have their respective strengths and differentiators. I'm certainly continuing to follow them all closely, and will no doubt discuss them in upcoming Forrester studies.

In the EDW Platforms Wave, I only mention the vendors that met the inclusion criteria and agreed to be evaluated in the Wave. It's not designed or intended to be a broad market survey.

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

By the way, those inclusion criteria were published in the Wave doc. At Forrester, we maintain total transparency on our inclusion and evaluation criteria. In the Wave, we do not specifically call out why this or that vendor failed to make the cut.

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

James, really appreciate your comments and explanations

re: The Forrester Wave™: Enterprise Data Warehousing (EDW) Plat

Luis: Sure. No problem. Initially, I wasn't sure how much detail you wanted. I've been faulted for being long-winded, but when I withhold details in the interest of brevity, I often find that I shouldn't. Somebody somewhere might find those details important.

Jim