In the past three decades, management information systems, data integration, data warehouses (DWs), BI, and other relevant technologies and processes only scratched the surface of turning data into useful information and actionable insights:
Organizations leverage less than half of their structured data for insights. The latest Forrester data and analytics survey finds that organizations use on average only 40% of their structured data for strategic decision-making.
Unstructured data remains largely untapped. Organizations are even less mature in their use of unstructured data. They tap only about a third of their unstructured data sources (28% of semistructured and 31% of unstructured) for strategic decision-making. And these percentages don’t include more recent components of a 360-degree view of the customer, such as voice of the customer (VoC), social media, and the Internet of Things.
BI architectures continue to become more complex. The intricacies of earlier-generation and many current business intelligence (BI) architectural stacks, which usually require the integration of dozens of components from different vendors, are just one reason it takes so long and costs so much to deliver a single version of the truth with a seamlessly integrated, centralized enterprise BI environment.
Existing BI architectures are not flexible enough. Most organizations take too long to get to the ultimate goal of a centralized BI environment, and by the time they think they are done, there are new data sources, new regulations, and new customer needs, which all require more changes to the BI environment.
The explosion of data and fast-changing customer needs have led many companies to a realization: They must constantly improve their capabilities, competencies, and culture in order to turn data into business value. But how do Business Intelligence (BI) professionals know whether they must modernize their platforms or whether their main challenges are mostly about culture, people, and processes?
"Our BI environment is only used for reporting — we need big data for analytics."
"Our data warehouse takes very long to build and update — we were told we can replace it with Hadoop."
These are just some of the conversations that Forrester clients initiate, believing they require a big data solution. But after a few probing questions, companies realize that they may need to upgrade their outdated BI platform, switch to a different database architecture, add extra nodes to their data warehouse (DW) servers, improve their data quality and data governance processes, or other commonsense solutions to their challenges, where new big data technologies may be one of the options, but not the only one, and sometimes not the best. Rather than incorrectly assuming that big data is the panacea for all issues associated with poorly architected and deployed BI environments, BI pros should follow the guidelines in the Forrester recent report to decide whether their BI environment needs a healthy dose of upgrades and process improvements or whether it requires different big data technologies. Here are some of the findings and recommendations from the full research report:
The business has an insatiable appetite for data and insights. Even in the age of big data, the number one issue of business stakeholders and analysts is getting access to the data. If access is achieved, the next step is "wrangling" the data into a usable data set for analysis. The term "wrangling" itself creates a nervous twitch, unless you enjoy the rodeo. But, the goal of the business isn't to be an adrenalin junky. The goal is to get insight that helps them smartly navigate through increasingly complex business landscapes and customer interactions. Those that get this have introduced a softer term, "blending." Another term dreamed up by data vendor marketers to avoid the dreaded conversation of data integration and data governance.
The reality is that you can't market message your way out of the fundamental problem that big data is creating data swamps even in the best intentioned efforts. (This is the reality of big data's first principle of a schema-less data.) Data governance for big data is primarily relegated to cataloging data and its lineage which serve the data management team but creates a new kind of nightmare for analysts and data scientist - working with a card catalog that will rival the Library of Congress. Dropping a self-service business intelligence tool or advanced analytic solution doesn't solve the problem of familiarizing the analyst with the data. Analysts will still spend up to 80% of their time just trying to create the data set to draw insights.
Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later it’s looking obsolete, not so much because it was wrong for traditional (and yes, it does seem funny to use a word like “traditional” to describe a technology that itself is still rapidly evolving and only in mainstream use for a handful of years) Hadoop, but because the universe of analytics technology and tools has been evolving at light-speed.
If your analytics are anchored by Hadoop and its underlying map reduce processing, then the mainstream architecture described in the document, that of clusters of servers each with their own compute and storage, may still be appropriate. On the other hand, if, like many enterprises, you are adding additional analysis tools such as NoSQL databases, SQL on Hadoop (Impala, Stinger, Vertica) and particularly Spark, an in-memory-based analytics technology that is well suited for real-time and streaming data, it may be necessary to begin reassessing the supporting infrastructure in order to build something that can continue to support Hadoop as well as cater to the differing access patterns of other tools sets. This need to rethink the underlying analytics plumbing was brought home by a recent demonstration by HP of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
At the China Hadoop Summit 2015 in Beijing this past weekend, I talked with various big data players, including large consumers of big data China Unicom, Baidu.com, JD.com, and Ctrip.com; Hadoop platform solution providers Hortonworks, RedHadoop, BeagleData, and Transwarp; infrastructure software vendors like Sequotia.com; and Agile BI software vendors like Yonghong Tech.
The summit was well-attended — organizers planned for 1,000 attendees and double that number attended — and from the presentations and conversations it’s clear that big data ecosystems are making substantial progress. Here are some of my key takeaways:
Telcos are focusing on optimizing internal operations with big data.Take China Unicom, one of China’s three major telcos, for example. China Unicom has completed a comprehensive business scenario analysis of related data across each segment of internal business operations, including business and operations support systems, Internet data centers, and networks (fixed, mobile, and broadband). It has built a Hadoop-based big data platform to process trillions of mobile access records every day within the mobile network to provide practical guidelines and progress monitoring on the construction of base stations.
By now you have at least seen the cute little elephant logo or you may have spent serious time with the basic components of Hadoop like HDFS, MapReduce, Hive, Pig and most recently YARN. But do you have a handle on Kafka, Rhino, Sentry, Impala, Oozie, Spark, Storm, Tez… Giraph? Do you need a Zookeeper? Apache has one of those too! For example, the latest version of Hortonworks Data Platform has over 20 Apache packages and reflects the chaos of the open source ecosystem. Cloudera, MapR, Pivotal, Microsoft and IBM all have their own products and open source additions while supporting various combinations of the Apache projects.
After hearing the confusion between Spark and Hadoop one too many times, I was inspired to write a report, The Hadoop Ecosystem Overview, Q4 2104. For those that have day jobs that don’t include constantly tracking Hadoop evolution, I dove in and worked with Hadoop vendors and trusted consultants to create a framework. We divided the complex Hadoop ecosystem into a core set of tools that all work closely with data stored in Hadoop File System and extended group of components that leverage but do not require it.
In the past, enterprise architects could afford to think big picture and that meant treating Hadoop as a single package of tools. Not any more – you need to understand the details to keep up in the age of the customer. Use our framework to help, but please read the report if you can as I include a lot more there.
Hadoop adoption and innovation is moving forward at a fast pace, playing a critical role in today's data economy. But, how fast and far will Hadoop go heading into 2015?
Prediction 1: Hadooponomics makes enterprise adoption mandatory. The jury is in. Hadoop has been found not guilty of being an over-hyped open source platform. Hadoop has proven real enterprise value in any number of use cases including data lakes, traditional and advanced analytics, ETL-less ETL, active-archive, and even some transactional applications. All these use cases are powered by what Forrester calls “Hadooponomics” — its ability to linearly scale both data storage and data processing.
What it means: The remaining minority of dazed and confused CIOs will make Hadoop a priority for 2015.
The data economy — or the system that provides for the exchange of digitized information for the purpose of creating insights and value — grew in 2014, but in 2015 we’ll see it leap forward significantly. It will grow from a phenomenon that mainstream enterprises view at arm’s length as interesting to one that they embrace as a part of business as usual. The number of business and technology leaders telling us that external data is important to their business strategy has been growing rapidly -- from one-third in 2012 to almost half in 2014.
Why? It’s a supply-driven phenomenon made possible by widespread digitization, mobile technology, the Internet of Things (IoT), and Hadooponomics. With countless new data sources and powerful new tools to wrest insights from their depths, organizations will scramble to use them to know their customers better and to optimize their operations beyond anything they could have done before. And while the exploding data supply will spur demand, it will also spur additional supply. Firms will be taking a hard look at their “data exhaust” and wondering if there is a market for new products and services based on their unique set of data. But in many cases, the value in the data is not that people will be willing to pay money for bulk downloads or access to raw data, but in data products that complement a firm’s existing offerings.
But Avoid Ending Up With A Zoo Of Individual Big Data Solutions
We are beyond the point of struggling over the definition of big data. That doesn’t mean that we've resolved all of the confusion that surrounds the term, but companies today are instead struggling with the question of how to actually get started with big data.
28% of all companies are planning a big data project in 2014.
According to Forrester's Business Technographics™ Global Data And Analytics Survey, 2014, 28% of the more than 1600 responding companies globally are planning a Big Data project this year. More details and how this splits between IT and Business driven projects can be found in our new Forrester Report ‘Reset On Big Data’.
Or join our Forrester Forum For Technology Leaders in London, June 12&13, 2014 to hear and discuss with us directly what Big Data projects your peers are planning, what challenges they are facing and what goals they target to achieve.