Cloudera IPO Highlights The Big Data And Hadoop Opportunity

Jennifer Adams

Last week, Cloudera successfully completed an IPO, raising $259 million of equity capital, including the over-allotment option. Shares were priced at $15 per share and traded up to over $18 per share on the first day of trading, giving investors a 20%+ return.

Cloudera describes itself as a company that “empowers organizations to become data‑driven enterprises in the newly hyperconnected world.” Cloudera, founded in 2008, was the first commercial Hadoop player and is a Leader in Mike Gualtieri and Noel Yuhanna’s The Forrester Wave™: Big Data Hadoop Distributions, Q1 2016.

Last August, Forrester published its first Big Data Management Solutions Forecast, 2016 To 2021 (Global). In our forecast, we highlighted Hadoop as the fastest-growing sector, at a 32.9% CAGR over the 2016 to 2021 period. We estimate that firms will spend nearly $800 million on Hadoop and Hadoop-related services in 2017 and that this will grow to $2.3 billion by 2021.

In its S-1 filing, Cloudera reported revenues of $109 million, $166 million, and $261 million in the years ending January 31, 2015, 2016, and 2017, respectively. This represents 52% year-over-year growth in 2016, accelerating to 57% year-over-year growth in 2017. Cloudera’s customer base is primarily Global 8000 companies, accounting for 73% of revenues.

Read more

The Cloud Is Disrupting Hadoop

Brian  Hopkins

Forrester has seen unprecedented adoption of Hadoop in the last three years. We estimate that firms will spend $800 million on Hadoop software and related services in 2017. Not surprisingly, Hadoop vendors have capitalized on this: Cloudera, Hortonworks, and MapR have gone from "Who?" to household brands in the same period.

But, as with any good run, times change. And the major force exerting pressure on Hadoop is the cloud. In a recent report, The Cloudy Future Of Hadoop, Mike Gualtieri and I examine the impact the cloud is having on Hadoop. Here are a few highlights:

●     Firms want to use more public cloud for big data, and Hadoop seems like a natural fit. We cover the reasons in the report, but the match seems made in heaven. Until you look deeper . . .

●     Hadoop wasn’t designed for the cloud, so vendors are scurrying to make it relevant. In the words of one insider, “Had we really understood cloud, we would not have designed Hadoop the way we did.” As a result, all the Hadoop vendors have strategies, and very different ones, to make Hadoop relevant in the cloud, where object stores and abstract “services” rule.
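
To make that last point concrete, here is a minimal, hypothetical sketch (in PySpark) of the shift the vendors are wrestling with: the same job reading from co-located HDFS storage versus a cloud object store through the s3a connector. The bucket, paths, and credential configuration are illustrative assumptions, not taken from the report.

```python
# Hypothetical sketch: the same Spark job pointed at HDFS versus a cloud object
# store. Bucket, paths, and credential configuration are illustrative only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-vs-object-store")
    # The s3a connector lets Hadoop/Spark treat an object store like a file
    # system, though listing, renames, and consistency behave differently.
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    )
    .getOrCreate()
)

# On-premises Hadoop: data lives in HDFS, co-located with the compute nodes.
events_on_prem = spark.read.parquet("hdfs:///data/events/2017/")

# Cloud deployment: the same logic, but storage is a remote object store that
# is decoupled from (and billed separately from) the compute cluster.
events_in_cloud = spark.read.parquet("s3a://example-bucket/data/events/2017/")

print(events_on_prem.count(), events_in_cloud.count())
```

The underlying tension is that cloud deployments decouple storage from compute, which changes the data-locality assumptions Hadoop was originally built on.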

Read more

On-Premise Hadoop Just Got Easier With These 8 Hadoop-Optimized Systems

Mike Gualtieri

Enterprises agree that speedy deployment of big data Hadoop platforms has been critical to their success, especially as use cases expand and proliferate. However, deploying Hadoop systems is often difficult, especially when supporting complex workloads and dealing with hundreds of terabytes or petabytes of data. Architects need considerable time and effort to install, tune, and optimize Hadoop. Hadoop-optimized systems (aka appliances) make on-premises deployments virtually instant and blazing fast to boot. Unlike generic hardware infrastructure, Hadoop-optimized systems are preconfigured, integrated hardware and software components designed to deliver optimal performance and support various big data workloads. They also support one or more of the major distributions, such as Cloudera, Hortonworks, IBM BigInsights, and MapR. As a result, organizations spend less time installing, tuning, troubleshooting, patching, upgrading, and dealing with integration- and scale-related issues.

Choose From Among 8 Hadoop-Optimized Systems Vendors

Noel Yuhanna and I published The Forrester Wave™: Big Data Hadoop-Optimized Systems, Q2 2016, in which we evaluated seven of the eight options in the market. Hewlett Packard Enterprise's solution was not evaluated in this Wave, but Forrester also considers HPE a key player in the Hadoop-optimized systems market, along with the seven vendors we did evaluate.

Read more

Big Data Vendors See The Internet Of Things (IoT) Opportunity, Pivot Tech And Message To Compete

Paul Miller

A stream flowing over boulders. (Source: http://www.publicdomainpictures.net/pictures/90000/velka/waterfall-stream-over-boulders.jpg)

Open source big data technologies like Hadoop have done much to begin the transformation of analytics. We're moving from expensive and specialist analytics teams towards an environment in which processes, workflows, and decision-making throughout an organisation can - in theory at least - become usefully data-driven. Established providers of analytics, BI and data warehouse technologies liberally sprinkle Hadoop, Spark and other cool project names throughout their products, delivering real advantages and real cost-savings, as well as grabbing some of the Hadoop glow for themselves. Startups, often closely associated with shepherding one of the newer open source projects, also compete for mindshare and custom.

And the opportunity is big. Hortonworks, for example, has described the global big data market as a $50 billion opportunity. But that pales into insignificance next to what Hortonworks (again) describes as a $1.7 trillion opportunity. Other companies and analysts have their own numbers, which do differ, but the step change is clear and significant. Hadoop, and the vendors gravitating to that community, mostly address 'data at rest': data that has already been collected from some process, interaction, or query. The bigger opportunity relates to 'data in motion' and to the internet of things that will be responsible for generating so much of it.
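
As a rough illustration of that distinction, here is a hedged PySpark sketch contrasting a batch job over data at rest with a Structured Streaming job over data in motion. The paths, Kafka broker, topic, and field names are illustrative assumptions, not drawn from the post.

```python
# Hypothetical sketch: batch analysis of "data at rest" versus a continuous
# query over "data in motion." Paths, broker, topic, and fields are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("rest-vs-motion").getOrCreate()

# Data at rest: a bounded batch job over readings that were already collected.
readings_at_rest = spark.read.json("hdfs:///iot/readings/2017/")
readings_at_rest.groupBy("device_id").avg("temperature").show()

# Data in motion: an unbounded stream of sensor events processed as they arrive.
readings_in_motion = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "sensor-readings")
    .load()
)

# Count events arriving in each one-minute window, updating results continuously.
per_minute = (
    readings_in_motion
    .groupBy(window(col("timestamp"), "1 minute"))
    .count()
)

query = (
    per_minute.writeStream
    .outputMode("update")
    .format("console")
    .start()
)
query.awaitTermination()
```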

Read more

Continuing Trend Of Data Exploration - Tableau Closes One Of The Remaining Gaps In Its Portfolio, Acquires HyPer

Boris Evelson
One of the reasons that only a portion of enterprise and external data (about a third of structured and a quarter of unstructured data) is available for insights is the restrictive architecture of SQL databases. In SQL databases, data and metadata (data models, aka schemas) are tightly bound and inseparable (aka early binding, or schema-on-write). Changing the model often requires, at best, rebuilding an index or an aggregate and, at worst, reloading entire columns and tables. Therefore, many analysts start their work from data sets based on these tightly bound models, where DBAs and data architects have already built business requirements (which may be outdated or incomplete) into the models. The data delivered to end users thus already contains inherent biases, which are opaque to the user and can strongly influence their analysis. As part of the natural evolution of business intelligence (BI) platforms, data exploration now addresses this challenge. How? BI pros can now take advantage of ALL raw data available in their enterprises by:
 
Read more
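
To make the schema-on-write versus schema-on-read contrast above concrete, here is a minimal, hypothetical PySpark sketch: a fixed DDL that binds the model up front versus reading raw JSON and applying structure at query time. Table, path, and column names are illustrative assumptions, not taken from the post.

```python
# Hypothetical sketch of early binding (schema-on-write) versus applying the
# schema at read time (schema-on-read). Names and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-on-write-vs-read").getOrCreate()

# Schema-on-write: the model is fixed up front; data that doesn't fit the DDL
# has to be remodeled (and possibly reloaded) before analysts ever see it.
spark.sql("""
    CREATE TABLE IF NOT EXISTS customer_profile (
        customer_id    BIGINT,
        segment        STRING,
        lifetime_value DOUBLE
    ) USING parquet
""")

# Schema-on-read: land the raw JSON as-is and let the engine infer structure at
# query time, so new or unmodeled attributes remain visible to the analyst.
raw = spark.read.json("hdfs:///landing/customer_events/")
raw.printSchema()                           # whatever fields actually arrived
raw.createOrReplaceTempView("customer_events_raw")
spark.sql("""
    SELECT customer_id, COUNT(*) AS events
    FROM customer_events_raw
    GROUP BY customer_id
""").show()
```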

What Qualifies A BI Vendor As A Native Hadoop BI Platform?

Boris Evelson

With the incredible popularity of big data and Hadoop, every business intelligence (BI) vendor wants to also be known as a "BI on Hadoop" vendor. But what most of them can really do is limited to a) querying HDFS data organized in Hive tables using HiveQL or b) ingesting any flat file into memory and analyzing the data there. Basically, to most BI vendors Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform." If we assume that all BI platforms have to have data extraction/integration, persistence, analytics, and visualization layers, then "Native Hadoop/Spark BI Platforms" should be able to (OK, yes, I just had to add Spark):

 

  • Use Hadoop/Spark as the primary processing platform for MOST of the aforementioned functionality. The only exception is the visualization layer, which is not what Hadoop/Spark are designed for.
  • Use distributed processing frameworks natively, such as:
    • Generation of MapReduce and/or Spark jobs
    • Management of distributed processing framework jobs by YARN, etc.
    • Note: generating Hive or Spark SQL queries does not qualify
  • Have the declarative work done in the product’s main user interface interpreted and executed on Hadoop/Spark directly, not via a "pass-through" mode (see the sketch below)
  • Natively support Apache Sentry and Apache Ranger security
 
Did I miss anything?
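
To illustrate the distinction the criteria above draw, here is a minimal, hypothetical PySpark sketch: (a) a "pass-through" HiveQL query, which on its own would not qualify, versus (b) work executed natively as distributed Spark jobs scheduled by YARN. The table and column names are illustrative assumptions.

```python
# Hypothetical sketch: (a) a "pass-through" HiveQL/SparkSQL query versus
# (b) work executed natively as distributed Spark jobs under YARN.
# Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("native-vs-pass-through")
    .master("yarn")                 # jobs scheduled by YARN, per the criteria
    .enableHiveSupport()
    .getOrCreate()
)

# (a) Pass-through: the BI tool simply hands a query string to the engine.
#     Per the post, generating queries alone does not qualify as native.
passthrough = spark.sql("SELECT region, SUM(revenue) FROM sales GROUP BY region")
passthrough.show()

# (b) Native: the declarative work a user does in the tool's UI is translated
#     into distributed Spark jobs (here, a map/reduceByKey over the raw records)
#     that run directly on the cluster.
records = spark.table("sales").rdd
revenue_by_region = (
    records.map(lambda r: (r["region"], r["revenue"]))
           .reduceByKey(lambda a, b: a + b)
)
print(revenue_by_region.collect())
```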

Hadoop, Spark, and the emerging big data landscape

Paul Miller

Not very long ago, it would have been almost inconceivable to consider a new large-scale data analysis project in which the open source Apache Hadoop did not play a pivotal role.

Every Hadoop blog post needs a picture of an elephant. (Source: Paul Miller)

Then, as so often happens, the gushing enthusiasm became more nuanced. Hadoop, some began (wrongly) to mutter, was "just about MapReduce." Hadoop, others (not always correctly) suggested, was "slow."

Then newer tools came along. Hadoop, a growing cacophony (inaccurately) trumpeted, was "not as good as Spark."

But, in the real world, Hadoop continues to be great at what it's good at. It's just not good at everything people tried throwing in its direction. We really shouldn't be surprised by this. And yet, it seems, so many of us are.

For CIOs asked to drive new programmes of work in which big data plays a part (and few are not), the competing claims in this space are both unhelpful and confusing. Hadoop and Spark are not, despite some suggestions, directly equivalent. In many cases, asking "Hadoop or Spark" is simply the wrong question.
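
As a concrete, hedged illustration of why the two are complementary rather than equivalent: Spark is a processing engine that very often runs on a Hadoop cluster, using HDFS for storage and YARN for resource management. The paths below are illustrative assumptions.

```python
# Hypothetical sketch of why "Hadoop or Spark" is often the wrong question:
# Spark commonly runs *on* a Hadoop cluster, using HDFS for storage and YARN
# for resource management. Paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-on-hadoop")
    .master("yarn")                     # Hadoop's resource manager
    .getOrCreate()
)

# Hadoop supplies the distributed storage layer ...
logs = spark.read.text("hdfs:///logs/web/2017/04/")

# ... while Spark supplies the in-memory, general-purpose processing on top.
error_count = logs.filter(logs.value.contains("ERROR")).count()
print(error_count)
```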

Read more

Make Your BI Environment More Agile With BI on Hadoop

Boris Evelson
In the past three decades, management information systems, data integration, data warehouses (DWs), BI, and other relevant technologies and processes have only scratched the surface of turning data into useful information and actionable insights:
  • Organizations leverage less than half of their structured data for insights. The latest Forrester data and analytics survey finds that organizations use on average only 40% of their structured data for strategic decision-making. 
  • Unstructured data remains largely untapped. Organizations are even less mature in their use of unstructured data. They tap only about a third of their unstructured data sources (28% of semistructured and 31% of unstructured) for strategic decision-making. And these percentages don’t include more recent components of a 360-degree view of the customer, such as voice of the customer (VoC), social media, and the Internet of Things. 
  • BI architectures continue to become more complex. The intricacies of earlier-generation and many current business intelligence (BI) architectural stacks, which usually require the integration of dozens of components from different vendors, are just one reason it takes so long and costs so much to deliver a single version of the truth with a seamlessly integrated, centralized enterprise BI environment.
  • Existing BI architectures are not flexible enough. Most organizations take too long to get to the ultimate goal of a centralized BI environment, and by the time they think they are done, there are new data sources, new regulations, and new customer needs, which all require more changes to the BI environment.
Read more

Don't Throw Hadoop At Every BI Challenge

Boris Evelson

The explosion of data and fast-changing customer needs have led many companies to a realization: They must constantly improve their capabilities, competencies, and culture in order to turn data into business value. But how do Business Intelligence (BI) professionals know whether they must modernize their platforms or whether their main challenges are mostly about culture, people, and processes?

"Our BI environment is only used for reporting — we need big data for analytics."

"Our data warehouse takes very long to build and update — we were told we can replace it with Hadoop."

These are just some of the conversations that Forrester clients initiate, believing they require a big data solution. But after a few probing questions, companies often realize that they may instead need to upgrade an outdated BI platform, switch to a different database architecture, add extra nodes to their data warehouse (DW) servers, improve their data quality and data governance processes, or apply other commonsense solutions to their challenges; new big data technologies may be one of the options, but not the only one, and sometimes not the best. Rather than incorrectly assuming that big data is the panacea for all issues associated with poorly architected and deployed BI environments, BI pros should follow the guidelines in the recent Forrester report to decide whether their BI environment needs a healthy dose of upgrades and process improvements or whether it requires different big data technologies. Here are some of the findings and recommendations from the full research report:

1) Hadoop won't solve your cultural challenges

Read more

3 Ways Data Preparation Tools Help You Get Ahead Of Big Data

Michele Goetz

The business has an insatiable appetite for data and insights. Even in the age of big data, the number one issue of business stakeholders and analysts is getting access to the data. Once access is achieved, the next step is "wrangling" the data into a usable data set for analysis. The term "wrangling" itself creates a nervous twitch, unless you enjoy the rodeo. But the goal of the business isn't to be an adrenaline junkie. The goal is to get insight that helps it smartly navigate increasingly complex business landscapes and customer interactions. Those that get this have introduced a softer term, "blending," another term dreamed up by data vendor marketers to avoid the dreaded conversation about data integration and data governance.

The reality is that you can't market-message your way out of the fundamental problem that big data is creating data swamps, even with the best-intentioned efforts. (This is the reality of big data's first principle of schema-less data.) Data governance for big data is primarily relegated to cataloging data and its lineage, which serves the data management team but creates a new kind of nightmare for analysts and data scientists: working with a card catalog that rivals the Library of Congress. Dropping in a self-service business intelligence tool or advanced analytics solution doesn't solve the problem of familiarizing the analyst with the data. Analysts will still spend up to 80% of their time just trying to create the data sets from which to draw insights.

Read more