In the past three decades, management information systems, data integration, data warehouses (DWs), business intelligence (BI), and other relevant technologies and processes have only scratched the surface of turning data into useful information and actionable insights:
  • Organizations leverage less than half of their structured data for insights. The latest Forrester data and analytics survey finds that organizations use on average only 40% of their structured data for strategic decision-making. 
  • Unstructured data remains largely untapped. Organizations are even less mature in their use of unstructured data. They tap only about a third of these sources (28% of semistructured and 31% of unstructured data) for strategic decision-making. And these percentages don’t include more recent components of a 360-degree view of the customer, such as voice of the customer (VoC), social media, and the Internet of Things.
  • BI architectures continue to become more complex. The intricacies of earlier-generation and many current BI architectural stacks, which usually require integrating dozens of components from different vendors, are just one reason it takes so long and costs so much to deliver a single version of the truth in a seamlessly integrated, centralized enterprise BI environment.
  • Existing BI architectures are not flexible enough. Most organizations take too long to reach the ultimate goal of a centralized BI environment, and by the time they think they are done, new data sources, new regulations, and new customer needs have emerged, all requiring further changes to that environment.
  • Partial solutions — Agile BI and big data — address only parts of the challenges. Agile BI addresses the challenges of complexity and timeliness, and big data makes more data available to decision-makers. But only a few leading organizations leverage the best of both worlds by using their Agile BI tools to seamlessly access, process, and analyze data staged in Hadoop-based data hubs.
Six Steps To Analyzing Data On Hadoop
 
New technologies and best-practice processes that combine Agile BI with big data into systems of insight (SOI) will help enterprises get closer to the 360-degree view of their customers. To get there, BI pros should stage all of their structured, unstructured, and semistructured data that is not yet ready for the enterprise data warehouse (EDW) into a Hadoop-based data hub, and then make that data available for reporting, analysis, and exploration with their current enterprise BI platform or Hadoop-specific BI tools. But note that exploring, querying, and analyzing Hadoop-based data sets calls for different technologies and approaches than BI based on highly structured, integrated, and cleansed relational database management systems (RDBMSes). To make Hadoop-based data ready to be explored and analyzed, BI pros must follow six steps: ingest, discover, enrich, explore, move, and analyze.
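
As a rough illustration of this pattern (not a prescription from the report), the sketch below uses PySpark on a Hadoop cluster to stage a semistructured export as Parquet in an HDFS-based data hub and to register it as a table that SQL-on-Hadoop engines and BI tools can query. The paths, database, and table names are hypothetical.

```python
# Minimal sketch: stage raw JSON in a Hadoop data hub and expose it to BI tools via SQL.
# Assumes PySpark with Hive support; all paths and names below are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("stage-to-data-hub")
         .enableHiveSupport()   # register tables in the Hive metastore so BI tools can find them
         .getOrCreate())

# 1. Ingest a semistructured export into the hub as columnar Parquet.
raw = spark.read.json("hdfs:///landing/crm/customer_interactions/*.json")
raw.write.mode("overwrite").parquet("hdfs:///hub/crm/customer_interactions")

# 2. Register the staged files as a table so SQL-on-Hadoop engines and BI tools can query them.
spark.sql("CREATE DATABASE IF NOT EXISTS hub")
spark.sql("""
    CREATE TABLE IF NOT EXISTS hub.customer_interactions
    USING PARQUET
    LOCATION 'hdfs:///hub/crm/customer_interactions'
""")
```

The same staged files remain available to every other engine and tool on the cluster, which is what makes the data hub useful beyond any single BI front end.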
 
Step 1 — Ingest: Copy Your Data Into A Hadoop Platform
 
The differences between analyzing data in an RDBMS-based data warehouse and analyzing it in Hadoop start with the ingestion process. While Hadoop is often used as a data management platform, it is not a database, so many of the typical database management mechanisms don’t come out of the box with an open source community version of Apache Hadoop. BI pros looking to automate and control the Hadoop data ingestion process will need to (see the delta-update sketch after this list):
 
  • Manage delta updates with custom coding or commercial ETL platforms.
  • Consider commercial SQL-for-Hadoop engines to handle referential integrity.
  • Deploy commercial platforms to supplement missing rollback capabilities.
  • Recognize that OLTP is not a native capability of any file system, including HDFS.
  • Custom-code transactional controls.
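
Because HDFS offers none of these controls natively, teams often emulate delta handling with an incremental job that merges new extracts with the current hub copy and writes the result to a staging path before promoting it. The sketch below shows one way to do that with PySpark; the key column (customer_id), timestamp column (updated_at), and paths are assumptions for illustration only.

```python
# Minimal sketch of a hand-rolled delta update on HDFS, assuming PySpark.
# HDFS has no UPDATE or rollback, so the job writes the merged result to a
# staging path; the current hub copy is replaced only after the job succeeds.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("delta-ingest").getOrCreate()

existing = spark.read.parquet("hdfs:///hub/crm/customers")           # current hub copy
changes = spark.read.parquet("hdfs:///landing/crm/customers_delta")  # new source extract

# Keep only the most recent record per business key (columns are hypothetical).
merged = existing.unionByName(changes)
latest = (merged
          .withColumn("rn", F.row_number().over(
              Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())))
          .filter("rn = 1")
          .drop("rn"))

# Write to staging first; promoting it (for example, with `hdfs dfs -mv`) happens
# outside the job, so a failed run never corrupts the copy that BI tools read.
latest.write.mode("overwrite").parquet("hdfs:///hub/crm/customers_staging")
```

Commercial ETL and SQL-for-Hadoop platforms package this kind of logic, along with referential integrity and rollback, so the bullets above are largely a build-versus-buy decision.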

Step 2 — Discover: Find What Data Exists In Your Source Systems And Hadoop

This step is optional if you are dealing with only a few data sources or if all data sources are well documented. But most large global organizations deal with thousands of data sources, many of which are poorly documented and carry little or no descriptive metadata. Additionally, many modern applications based on NoSQL databases bury business logic in the application, not the data layer. As a result, many data source discovery methods lack the automation and self-service capabilities needed to effectively profile a large and expanding number of data sources. Typical methods for discovering and profiling data sources include asking subject matter experts or reviewing source system documentation; this dependency on others to identify the right data for context-specific analytics requirements causes huge delays in the discovery process, as it often takes multiple opinions and trial and error to find the best data fit. AD&D pros working on BI-on-Hadoop initiatives can help their business colleagues accelerate discovery by deploying tools that can (see the profiling sketch after this list):
 
  • Discover and profile your data sources before they are ingested into Hadoop.
  • Discover and profile HDFS-based files.
  • Provide data lineage and impact analysis.
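
As a starting point for the second bullet, the sketch below profiles files already landed in HDFS with PySpark, reporting row counts, null rates, and distinct counts per column. The input path is hypothetical, and dedicated discovery tools go much further (sampling, classification, lineage).

```python
# Minimal profiling sketch for HDFS-based files, assuming PySpark; the path is illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hdfs-profiling").getOrCreate()
df = spark.read.parquet("hdfs:///hub/crm/customer_interactions")

total = df.count()
profile = []
for col in df.columns:
    nulls = df.filter(F.col(col).isNull()).count()   # missing values per column
    distinct = df.select(col).distinct().count()     # cardinality per column
    profile.append((col, total, nulls, round(nulls / total, 4) if total else 0.0, distinct))

spark.createDataFrame(
    profile, ["column", "rows", "nulls", "null_rate", "distinct_values"]
).show(truncate=False)
```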

For more detail on the six steps to analyzing Hadoop-based data, an overview of specific use cases such as building new BI apps on Hadoop versus porting existing BI apps to Hadoop, and an evaluation of key BI-on-Hadoop vendors such as Alation, Apache Kylin, Arcadia Data, AtScale, Attivio, Datameer, DataTorrent, JethroData, Kyvos Insights, Oracle Big Data Discovery, Platfora, SAP Lumira, Splunk, Tamr, Teradata Loom, Waterline Data, and Zoomdata, please see the detailed research report.