Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data. Forrester believes that Hadoop will become must-have infrastructure for large enterprises. If you have lots of data, there is a sweet spot for Hadoop in your organization. Here are five reasons firms should adopt Hadoop today:
Build a data lake with the Hadoop file system (HDFS). Firms leave potentially valuable data on the cutting-room floor. A core component of Hadoop is its distributed file system, which can store huge files and many files to scale linearly across three, 10, or 1,000 commodity nodes. Firms can use Hadoop data lakes to break down data silos across the enterprise and commingle data from CRM, ERP, clickstreams, system logs, mobile GPS, and just about any other structured or unstructured data that might contain previously undiscovered insights. Why limit yourself to wading in multiple kiddie pools when you can dive for treasure chests at the bottom of the data lake?
Enjoy cheap, quick processing with MapReduce. You’ve poured all of your data into the lake — now you have to process it. Hadoop MapReduce is a distributed data processing framework that brings the processing to the data in a highly parallel fashion to process and analyze data. Instead of serially reading data from files, MapReduce pushes the processing out to the individual Hadoop nodes where the data resides. The result: Large amounts of data can be processed in parallel in minutes or hours rather than in days. Now you know why Hadoop’s origins stem from monstrous data processing use cases at Google and Yahoo.
Big data gurus have said that data quality isn’t important for big data. Good enough is good enough. However, business stakeholders still complain about poor data quality. In fact, when Forrester surveyed customer intelligence professionals, the ability to integrate data and manage data quality are the top two factors holding customer intelligence back.
So, do big data gurus have it wrong? Sort of . . .
I had the chance to attend and present at a marketing event put on by MITX last week in Boston that focused on data science for marketing and customer experience. I recommend all data and big data professionals do this. Here is why. How marketers and agencies talk about big data and data science is different than how IT talks about it. This isn’t just a language barrier, it’s a philosophy barrier. Let’s look at this closer:
Data is totals. When IT talks about data, it’s talking of the physical elements stored in systems. When marketing talks about data, it’s referring to the totals and calculation outputs from analysis.
Quality is completeness. At the MITX event, Panera Bread was asked, how do they understand customers that pay cash? This lack of data didn’t hinder analysis. Panera looked at customers in their loyalty program and promotions that paid cash to make assumptions about this segment and their behavior. Analytics was the data quality tool that completed the customer picture.
Data rules are algorithms. When rules are applied to data, these are more aligned to segmentation and status that would be input into personalized customer interaction. Data rules are not about transformation to marketers.