The Year Ahead In Big Data? Big, Cool, New Stuff Looms Large!

Big data was inescapable in 2011. Without a doubt, it was the paramount banner story in data management, advanced analytics, and business intelligence (BI). The hype has been relentless, but it’s been driven by substantial innovations on many fronts.

The big data mania will intensify even further in the coming year. Here are some of the highlights that Forrester foresees in this exciting space in 2012:

  • Enterprise Hadoop deployments will expand at a rapid clip. Many enterprises have spent the past year or two kicking the tires of Hadoop, the emerging open source approach for scaling data analytics into the stratosphere of petabyte volumes, real-time velocities, and polystructured varieties. The market for enterprise-grade Hadoop solutions has grown by leaps and bounds and now includes several dozen vendors. Users all over the world and in most industries have invested aggressively in the technology and stand poised to bring their Hadoop clusters on line in the coming year. The size of the in-deployment clusters will almost certainly grow at least tenfold in 2012 as companies roll new data sources, new analytic challenges, and new business applications into their Hadoop initiatives.
  • In-memory analytics platforms will grow their footprint. Both startups and established vendors are rolling out BI and advanced analytics tools that are either entirely in-memory or persist many terabytes of working data in fast dynamic random access memory (DRAM). In 2012, enterprise adoption of in-memory BI/analytics tools and platforms will boom, owing not just to the increased availability of these technologies but also to the need for data scientists to interactively explore complex data sets in real time. As the cost of DRAM continues to decline, all-in-memory analytics will become the predominant architecture for all users, uses, and data. Big data will increasingly occupy huge pools of virtualized memory that span many servers in the cloud.
  • Graph databases will come into vogue. One key gap in the Hadoop ecosystem is for graph databases, which support rich mining and visualization of relationships, influence, and behavioral propensities. The market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics, marketing campaign optimization, and customer experience fine-tuning. We will see VCs put big money behind graph database and analytics startups. Many big data platform and tool vendors will acquire the startups to supplement their expanding Hadoop, NoSQL, and enterprise data warehousing (EDW) portfolios. Social graph analysis, although not a brand-new field, will become one of the most prestigious specialties in the data science arena, focusing on high-powered drilldown into polystructured behavioral data sets.

This list is not exhaustive. We see plenty of other new big data developments in the coming year, such as the mainstreaming of cloud/SaaS EDW. Stay tuned as, in coming posts, we share our evolving research agenda for 2012.

Comments

Great insight James! You

Great insight James! You point out a key gap in the Hadoop ecosystem is for graph databases. It’s worth mentioning HPCC Systems already has use cases in this area. A whitepaper was recently published on how the HPCC Systems platform was used for social network analysis to help identify relationships and interactions into high-value clusters of interest in the health care industry. More at http://bit.ly/uul7Nw

Big Data Analytics & IT in 2012

Hello James - definitely interested in seeing your additional posts on this critical topic. I think it's especially interesting to consider architectures that are flexible enough to leverage all available data in analytic routines or to leverage analytics up front in the process to determine the most relevant data to factor into analytics. There is much more on this topic and the implications for IT in 2012 here - http://bit.ly/tMj7Ea

Thanks,

Mark Troester
SAS

Thanks, I like your blog too

Mark:

Great food for thought. Are you going to present at the SAS analyst summit in Steamboat Springs CO in late February? It would be great to chat with you there.

Jim

SAS Analyst Summit at Steamboat

Hello Jim - I'm not sure that I will be at Steamboat, but Mike Ames from product management will be there and I've been working closely with him. We're also scheduled to brief you on our Hadoop and Big Data Analytics plan in early January. Looking forward to the discussion! Thanks, Mark.

I look forward to SAS' Hadoop Big Data Analytics briefing

Mark:

I look forward to that briefing. SAS is conspicuously missing from the Hadoop market. I trust you'll have good stuff to discuss.

Jim

SAS Hadoop Strategy

Yes, true to our nature, we are pretty conservative about highlighting new capabilities, but we are blogging quite a bit about big data analytics and Hadoop on our blog (http://blogs.sas.com/content/datamanagement/). We also have an extensive plan for Hadoop that complements what we are already doing with big data with our High Performance Computing capabilities that include grid, In-DB and in-memory capabilities. Our capabilities go beyond the basic data integration, parsing approach that allows data to be moved, cleansed, reformatted, etc., between Hadoop and other data sources to include leveraging Hadoop as an analytics processing platform (much like we support Teradata, EMC Greenplum, etc.), support for Pig, Hive, MapReduce, and we have plans that will allow SAS processing to be distributed to the Hadoop nodes using the Hadoop distribution capability. Hope that piques your interest! Mark.

Definitely piques my interest!

Mark:

I've already received some briefings from SAS on all that. Looking forward to the fresh update in the new year.

Jim

nice "drift" to this blog

I notice that most blogs spend an inordinate amount of time on "pipes and plumbing" instead of spending time on the real "elephant in the room", which is what is going to fill those pipes.

The first item is all P and P.

The second is also P and P. You are quite right that response times dictate that in-memory analytics will be the standard.

The last category - graph databases - is a leap to the real issue. For that you have my compliments.

Here is the real issue. The "web of relations" among and between all the elements and fragments of all the documents and resources that you are addressing expands exponentially as you increase the number of fragments linearly.

Take 1 million documents with 100 fragments each: that is 100 million fragments, with possible combinations on the order of factorial 100 million. I believe that by the time you get to about factorial 100, you already exceed ten to the 78th power - which is an estimate of the number of protons in the known universe.
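The arithmetic here can be checked with a few lines of Python. This is only a rough sketch: the helper names are made up for illustration, the 10^78 threshold is simply the estimate quoted above, and the factorial is treated as a stand-in for the size of the "web of relations" rather than an exact count of anything. It works with log10(n!) so that nothing astronomically large has to be held in memory.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)


def smallest_n_exceeding(power_of_ten: int) -> int:
    """Return the smallest n for which n! is greater than 10**power_of_ten."""
    n = 1
    while log10_factorial(n) <= power_of_ten:
        n += 1
    return n


# Where does n! first pass the quoted proton estimate of 10^78?
print(smallest_n_exceeding(78))  # 58

# Approximate number of decimal digits in 100! and in (100 million)!
print(math.floor(log10_factorial(100)) + 1)
print(math.floor(log10_factorial(100_000_000)) + 1)
```

So the threshold is crossed well before factorial 100 (at factorial 58), and factorial 100 million is a number with several hundred million digits, which only strengthens the point about the "volume roadblock".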

Iterating along a graph (imposed by schema or ontology) will run into this "volume roadblock" soon enough.

Unless and until you have a form of "base" that can exhaustively examine the web of relations among all the fragments that you are addressing, you will have substituted one limited technology paradigm for another, albeit with higher limits.

Relationship graph analysis is almost inherently "Big Data"

Carl:

Good discussion. As you note, the combinatorial explosion in any relationship graph analysis can quickly produce "Big Data" loads: greater volume of links, greater velocity of new combinations among links, and an order-of-magnitude greater variety in the relationships and their attributes. All of which quickly outstrips limited storage, processing, memory, and bandwidth resources. Graph analysis is the true killer app for Big Data.

Jim