Continuing Trend Of Data Exploration - Tableau Closes One Of The Remaining Gaps In Its Portfolio, Acquires HyPer

Boris Evelson
One reason that only a portion of enterprise and external data (about a third of structured and a quarter of unstructured) is available for insights is the restrictive architecture of SQL databases. In SQL databases, data and metadata (data models, aka schemas) are tightly bound and inseparable (aka early binding, or schema on write). Changing the model often requires, at best, rebuilding an index or an aggregate and, at worst, reloading entire columns and tables. Therefore, many analysts start their work from data sets based on these tightly bound models, into which DBAs and data architects have already built business requirements that may be outdated or incomplete. The data delivered to end users thus already contains inherent biases, which are opaque to the user and can strongly influence their analysis. As part of the natural evolution of Business Intelligence (BI) platforms, data exploration now addresses this challenge. How? BI pros can now take advantage of ALL raw data available in their enterprises by:
 
Read more
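
To make the early-binding contrast above concrete, here is a minimal sketch of schema on write versus schema on read, assuming PySpark; the table, schema, and file path are hypothetical.

# A minimal sketch (PySpark assumed; the table, schema, and path are
# hypothetical) contrasting early binding (schema on write) with late
# binding (schema on read).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Schema on write: the model is fixed before any data is loaded, so changing
# it later means altering the table and reloading its data.
spark.sql("CREATE TABLE IF NOT EXISTS orders (order_id STRING, amount DOUBLE)")

# Schema on read: the raw files stay as-is, and a model is bound only at
# query time, so changing the model requires no reload.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
])
orders = spark.read.schema(schema).json("hdfs:///raw/orders/")
orders.groupBy("order_id").sum("amount").show()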

What Qualifies A BI Vendor As A Native Hadoop BI Platform?

Boris Evelson

With the incredible popularity of big data and Hadoop, every Business Intelligence (BI) vendor also wants to be known as a "BI on Hadoop" vendor. But what most of them can really do is limited to a) querying HDFS data organized in Hive tables using HiveQL or b) ingesting a flat file into memory and analyzing the data there. Basically, to most BI vendors Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform." If we assume that all BI platforms have to have data extraction/integration, persistence, analytics, and visualization layers, then "Native Hadoop/Spark BI Platforms" should be able to (OK, yes, I just had to add Spark):

 

  • Use Hadoop/Spark as the primary processing platform for MOST of the aforementioned functionality. The only exception is the visualization layer, which is not what Hadoop/Spark are built for.
  • Use distributed processing frameworks natively, such as:
    • Generation of MapReduce and/or Spark jobs
    • Management of distributed processing framework jobs by YARN, etc.
    • Note: generating Hive or Spark SQL queries does not qualify
  • Have declarative work done in the product's main user interface interpreted and executed directly on Hadoop/Spark, not via a "pass through" mode (see the sketch after this list).
  • Natively support Apache Sentry and Apache Ranger security
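
To illustrate the distinction between native job generation and pass-through querying, here is a minimal PySpark sketch; the application name, paths, and table are hypothetical, and the "yarn" master assumes an already configured Hadoop cluster.

# A minimal sketch (PySpark assumed; app name, paths, and table are
# hypothetical) of native execution versus a pass-through query.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("native-bi-sketch")
         .master("yarn")  # the generated jobs are scheduled and managed by YARN
         .getOrCreate())

# Native: read raw HDFS files and express the work as Spark transformations,
# which Spark compiles into distributed jobs itself.
events = spark.read.parquet("hdfs:///data/events")
events.groupBy("event_date").count().write.parquet("hdfs:///data/daily_counts")

# Pass-through (does NOT qualify by the criteria above): the BI tool merely
# hands a SQL string to an engine such as Hive and waits for the result set.
events.createOrReplaceTempView("events")  # stands in for a Hive table here
spark.sql("SELECT event_date, COUNT(*) AS n FROM events GROUP BY event_date")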
 
Did I miss anything?

Hadoop, Spark, and the emerging big data landscape

Paul Miller

Not very long ago, it would have been almost inconceivable to consider a new large-scale data analysis project in which the open source Apache Hadoop did not play a pivotal role.

Every Hadoop blog post needs a picture of an elephant. (Source: Paul Miller)

Then, as so often happens, the gushing enthusiasm became more nuanced. Hadoop, some began (wrongly) to mutter, was "just about MapReduce." Hadoop, others (not always correctly) suggested, was "slow."

Then newer tools came along. Hadoop, a growing cacophony (inaccurately) trumpeted, was "not as good as Spark."

But, in the real world, Hadoop continues to be great at what it's good at. It's just not good at everything people tried throwing in its direction. We really shouldn't be surprised by this. And yet, it seems, so many of us are.

For CIOs asked to drive new programmes of work in which big data plays a part (and few are not), the competing claims in this space are both unhelpful and confusing. Hadoop and Spark are not, despite some suggestions, directly equivalent. In many cases, asking "Hadoop or Spark" is simply the wrong question.

Read more

Make Your BI Environment More Agile With BI on Hadoop

Boris Evelson
In the past three decades, management information systems, data integration, data warehouses (DWs), BI, and other relevant technologies and processes have only scratched the surface of turning data into useful information and actionable insights:
  • Organizations leverage less than half of their structured data for insights. The latest Forrester data and analytics survey finds that organizations use on average only 40% of their structured data for strategic decision-making. 
  • Unstructured data remains largely untapped. Organizations are even less mature in their use of unstructured data. They tap only about a third of their unstructured data sources (28% of semistructured and 31% of unstructured) for strategic decision-making. And these percentages don’t include more recent components of a 360-degree view of the customer, such as voice of the customer (VoC), social media, and the Internet of Things. 
  • BI architectures continue to become more complex. The intricacies of earlier-generation and many current business intelligence (BI) architectural stacks, which usually require the integration of dozens of components from different vendors, are just one reason it takes so long and costs so much to deliver a single version of the truth with a seamlessly integrated, centralized enterprise BI environment.
  • Existing BI architectures are not flexible enough. Most organizations take too long to get to the ultimate goal of a centralized BI environment, and by the time they think they are done, there are new data sources, new regulations, and new customer needs, which all require more changes to the BI environment.
Read more

Don't Throw Hadoop At Every BI Challenge

Boris Evelson

The explosion of data and fast-changing customer needs have led many companies to a realization: They must constantly improve their capabilities, competencies, and culture in order to turn data into business value. But how do Business Intelligence (BI) professionals know whether they must modernize their platforms or whether their main challenges are mostly about culture, people, and processes?

"Our BI environment is only used for reporting — we need big data for analytics."

"Our data warehouse takes very long to build and update — we were told we can replace it with Hadoop."

These are just some of the conversations that Forrester clients initiate, believing they require a big data solution. But after a few probing questions, companies realize that they may instead need to upgrade their outdated BI platform, switch to a different database architecture, add extra nodes to their data warehouse (DW) servers, improve their data quality and data governance processes, or apply other commonsense solutions to their challenges; new big data technologies may be one of the options, but they are not the only one and sometimes not the best. Rather than incorrectly assuming that big data is the panacea for all issues associated with poorly architected and deployed BI environments, BI pros should follow the guidelines in the recent Forrester report to decide whether their BI environment needs a healthy dose of upgrades and process improvements or whether it requires different big data technologies. Here are some of the findings and recommendations from the full research report:

1) Hadoop won't solve your cultural challenges

Read more

Rethinking Analytics Infrastructure

Richard Fichera

Last year I published a reasonably well-received research document on Hadoop infrastructure, “Building the Foundations for Customer Insight: Hadoop Infrastructure Architecture”. Now, less than a year later, it's looking obsolete, not so much because it was wrong for traditional Hadoop (and yes, it does seem funny to use a word like “traditional” to describe a technology that is itself still rapidly evolving and has been in mainstream use for only a handful of years), but because the universe of analytics technology and tools has been evolving at light speed.

If your analytics are anchored by Hadoop and its underlying MapReduce processing, then the mainstream architecture described in the document, that of clusters of servers, each with their own compute and storage, may still be appropriate. On the other hand, if, like many enterprises, you are adding analysis tools such as NoSQL databases, SQL on Hadoop (Impala, Stinger, Vertica), and particularly Spark, an in-memory analytics technology well suited to real-time and streaming data, it may be necessary to reassess the supporting infrastructure in order to build something that can continue to support Hadoop while also catering to the differing access patterns of these other toolsets. This need to rethink the underlying analytics plumbing was brought home by a recent demonstration by HP of a reference architecture for analytics, publicly referred to as the HP Big Data Reference Architecture.
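
As a concrete illustration of those differing access patterns, here is a minimal sketch, assuming PySpark and a hypothetical path: Spark pins a working set in cluster memory so that iterative and interactive queries avoid the repeated disk scans typical of MapReduce-style jobs.

# A minimal sketch (PySpark assumed; the path is hypothetical) of the
# in-memory access pattern that differs from MapReduce's disk-bound scans.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("in-memory-sketch").getOrCreate()

events = spark.read.parquet("hdfs:///data/events")
events.cache()  # keep the working set resident in memory across queries

events.filter("status = 'error'").count()  # the first action populates the cache
events.groupBy("region").count().show()    # later passes run at memory speed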

Read more

Time To Reset Your Knowledge Of Big Data Ecosystems In China

Charlie Dai

At the China Hadoop Summit 2015 in Beijing this past weekend, I talked with various big data players, including large consumers of big data such as China Unicom, Baidu.com, JD.com, and Ctrip.com; Hadoop platform solution providers Hortonworks, RedHadoop, BeagleData, and Transwarp; infrastructure software vendors like Sequotia.com; and Agile BI software vendors like Yonghong Tech.

The summit was well-attended — organizers planned for 1,000 attendees and double that number attended — and from the presentations and conversations it’s clear that big data ecosystems are making substantial progress. Here are some of my key takeaways:

  • Telcos are focusing on optimizing internal operations with big data. Take China Unicom, one of China’s three major telcos, for example. China Unicom has completed a comprehensive business scenario analysis of related data across each segment of its internal business operations, including business and operations support systems, Internet data centers, and networks (fixed, mobile, and broadband). It has built a Hadoop-based big data platform to process trillions of mobile access records within its mobile network every day, providing practical guidelines and progress monitoring for the construction of base stations.
Read more

Elephants, Pigs, Rhinos and Giraphs; Oh My! – It's Time To Get A Handle On Hadoop

Brian Hopkins

By now you have at least seen the cute little elephant logo or you may have spent serious time with the basic components of Hadoop like HDFS, MapReduce, Hive, Pig and most recently YARN. But do you have a handle on Kafka, Rhino, Sentry, Impala, Oozie, Spark, Storm, Tez… Giraph? Do you need a Zookeeper? Apache has one of those too! For example, the latest version of Hortonworks Data Platform has over 20 Apache packages and reflects the chaos of the open source ecosystem. Cloudera, MapR, Pivotal, Microsoft and IBM all have their own products and open source additions while supporting various combinations of the Apache projects.

After hearing the confusion between Spark and Hadoop one too many times, I was inspired to write a report, The Hadoop Ecosystem Overview, Q4 2014. For those whose day jobs don’t include constantly tracking Hadoop’s evolution, I dove in and worked with Hadoop vendors and trusted consultants to create a framework. We divided the complex Hadoop ecosystem into a core set of tools that all work closely with data stored in the Hadoop Distributed File System (HDFS) and an extended group of components that leverage but do not require it.
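
As a toy illustration of that core/extended split, here is a short Python sketch; the grouping below is my paraphrase of the idea, not the report's actual taxonomy.

# A toy illustration (the grouping is an illustrative paraphrase, not the
# report's actual taxonomy) of dividing the ecosystem into HDFS-coupled core
# tools and extended components that leverage Hadoop but can run without it.
HADOOP_ECOSYSTEM = {
    "core (works closely with data stored in HDFS)":
        ["HDFS", "YARN", "MapReduce", "Hive", "Pig", "Impala", "Oozie"],
    "extended (leverages but does not require HDFS)":
        ["Spark", "Kafka", "Storm", "ZooKeeper"],
}

for group, projects in HADOOP_ECOSYSTEM.items():
    print(f"{group}: {', '.join(projects)}")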

In the past, enterprise architects could afford to think big picture and that meant treating Hadoop as a single package of tools. Not any more – you need to understand the details to keep up in the age of the customer. Use our framework to help, but please read the report if you can as I include a lot more there.

Read more

Ignore Digital Experience Delivery Technologies At Your Own Peril

Stephen Powers
Ignore digital experience delivery platforms in 2015, and you’ll spend all of 2016 playing catch up.
 
Since 2013, no fewer than eight vendors have announced enterprise-class solutions vying to offer integrated, business-centric tools to create, deliver, measure, and optimize digital experiences. Just this week, French advertising giant Publicis Groupe acquired Sapient for $3.7 billion, and the second bullet of its press release announced Publicis.Sapient, a new platform “focused exclusively on digital transformation and the dynamics of an always-on world across marketing, omni-channel commerce, consulting and technology.”
 
In our new document, “Predictions 2015: Digital Experience Delivery Platforms Become Flexible Or Lose Momentum,” we share why we think 2015 is the year that application development and delivery (AD&D) professionals’ and digital marketers’ worlds collide: shared platforms, customer data, budgets, and priorities will emerge within B2C and progressive B2B enterprises. Now is the time for progressive digital customer experience technology leadership, from all corners of the organization, to come together to end the patchwork strategies of the past.
 
Read more

Forrester’s Hadoop Predictions 2015

Mike Gualtieri
Hadoop adoption and innovation are moving forward at a fast pace, playing a critical role in today's data economy. But how fast and how far will Hadoop go heading into 2015?
 
Prediction 1: Hadooponomics makes enterprise adoption mandatory. The jury is in. Hadoop has been found not guilty of being an over-hyped open source platform. Hadoop has proven real enterprise value in any number of use cases including data lakes, traditional and advanced analytics, ETL-less ETL, active-archive, and even some transactional applications. All these use cases are powered by what Forrester calls “Hadooponomics” — its ability to linearly scale both data storage and data processing.
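
As a back-of-the-envelope illustration of that linear scaling, here is a small Python sketch; the per-node figures are illustrative assumptions, not benchmarks.

# "Hadooponomics" in miniature: on a shared-nothing cluster, raw storage
# capacity and aggregate scan throughput both grow roughly linearly with
# the node count. Per-node figures below are assumptions for illustration.
def cluster_estimates(nodes, tb_per_node=24, scan_mb_per_s_per_node=500):
    capacity_tb = nodes * tb_per_node                      # storage scales with nodes
    scan_gb_per_s = nodes * scan_mb_per_s_per_node / 1000  # so does scan throughput
    return capacity_tb, scan_gb_per_s

for n in (10, 20, 40):
    cap, rate = cluster_estimates(n)
    print(f"{n} nodes: ~{cap} TB raw capacity, ~{rate:.0f} GB/s aggregate scan")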
 
What it means: The remaining minority of dazed and confused CIOs will make Hadoop a priority for 2015.
 
Predictions 2 and 3: Forrester clients can read the full text of all 8 Hadoop Predictions.
 
Read more