Posted by James Kobielus on June 6, 2011
Enterprises have options. One of the questions I asked the firms I interviewed as Hadoop case studies for my upcoming Forrester report is whether they considered using the tried and true approach of a petabyte-scale enterprise data warehouse (EDW). It’s not a stretch, unless you are a Hadoop bigot and have willfully ignored the commercial platforms that already offer shared-nothing massively parallel processing for in-database advanced analytics and high-performance data management. If you need to brush up, check out my recent Forrester Wave™ for EDW platforms.
Many of the case study companies did in fact consider an EDW like those from Teradata and Oracle. But they chose to build out their Big Data initiatives on Hadoop for many good reasons. Most of those are the same reasons any user adopts any open-source platform: By using Apache Hadoop, they could avoid paying expensive software licenses; give themselves the flexibility to modify source code to meet their evolving needs; and avail themselves of leading-edge innovations coming from the worldwide Hadoop community.
But the basic fact is that Hadoop is not a radically new approach to extremely scalable data analytics. You can use a high-end EDW to do most of what you can do with Hadoop, with all the core features (petabyte scale-out, in-database analytics, mixed-workload support, cloud-based deployment, and complex data sources) that characterize most real-world Hadoop deployments. And the open-source Apache Hadoop code base, by its devotees’ own admission, still lacks such critical features as the real-time integration and robust high availability you find in EDWs everywhere.
So: apart from being an open-source community with broad industry momentum, what is Hadoop good for that you can’t get elsewhere? The answer is a mouthful, but a powerful one. Essentially, Hadoop is vendor-agnostic in-database analytics in the cloud: an open, comprehensive, extensible framework for building complex advanced analytics and data management functions and deploying them into cloud computing architectures. At the heart of that framework is MapReduce, a programming model for developing statistical analysis, predictive modeling, data mining, natural-language processing, sentiment analysis, machine learning, and other advanced analytics. Another linchpin of Hadoop, Pig, is a versatile language for building data-integration logic.
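To make the MapReduce model mentioned above concrete, here is a minimal sketch in plain Python that mimics the three phases of a MapReduce job (map, shuffle, reduce) using the canonical word-count example. The function names and sample documents are illustrative assumptions, not part of any Hadoop API; in a real Hadoop job the framework handles the shuffle and distributes the map and reduce tasks across the cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle step: group all intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Illustrative input; in Hadoop these records would live in HDFS.
docs = ["big data on hadoop", "hadoop scales big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["hadoop"])  # prints 2
```

The appeal of the model is that the map and reduce functions contain only application logic; partitioning, distribution, and fault tolerance are the framework's job.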
The bottom line is that Hadoop is the future of the cloud EDW, and its footprint in companies’ core EDW architectures is likely to keep growing throughout this decade. The roles that Hadoop is likely to assume in your EDW strategy are the dominant applications it’s being used for here and now as a petabyte-scalable:
Hadoop is not a religion, any more than traditional EDWs are a religion — although some people seem to think of the latter as such, aligning themselves with this or that EDW architectural school. To promote itself to enterprise prime time, the Hadoop industry needs to focus on what its up-and-coming approach does better than EDWs or does best within the context of a traditional EDW architecture.