What's Your Big Data Score?

If you think the term "Big Data" is wishy-washy waste, then you are not alone. Many struggle to find a definition of Big Data that is anything more than awe-inspiring hugeness. But Big Data is real if you have an actionable definition that you can use to answer the question: "Does my organization have Big Data?" Here is a proposed definition that takes into account both the measures of the data and the activities performed with the data. Be sure to scroll down to calculate your Big Data Score.

Big Data Can Be Measured

Big Data exhibits extremity across one or more of these three alliterative measures:

  • Volume. Metric prefixes rule the day when it comes to defining Big Data volume. In order of ascending magnitude: kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, and yottabyte. A yottabyte is 1,000,000,000,000,000,000,000,000 bytes, or 10^24 bytes. Wow! The first hard disk drive from IBM stored 3.75MB in 1956. That's chump change compared to the 3TB Seagate hard drive I can buy at Amazon for $165.86. And that's just personal storage; IBM, Oracle, Teradata, and others have huge, fast storage capabilities for files and databases.
  • Velocity. Big Data can come fast. Imagine dealing with 5TB per second, as Akamai does on its content delivery and acceleration network. Or consider algorithmic trading, where complex event processing platforms such as Progress Apama have just 100 microseconds to detect buy/sell patterns in trades arriving at 5,000 orders per second (see the back-of-the-envelope sketch after this list). RFID, GPS, and many other data sources can produce data fast enough to require technologies such as SAP HANA and RainStor to capture it.
  • Variety. There are thirty flavors of Pop-Tarts. Flavors of data can be just as shocking, because combinations of relational data, unstructured data such as text, images, video, and every other variation can cause complexity in storing, processing, and querying that data. NoSQL databases such as MongoDB (a document store) and Apache Cassandra (a wide-column store) can store unstructured data with ease. Distributed processing engines like Hadoop can be used to process variety and volume (but not velocity). Distributed in-memory caching platforms like VMware vFabric GemFire can store a variety of objects and are fast because the data lives in memory.
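To make those velocity numbers concrete, here is a minimal back-of-the-envelope sketch in Python. The 5,000 orders per second and the 100-microsecond detection budget come from the bullet above; the variable names and the single-threaded assumption are mine.

```python
# Can a CEP engine with a 100-microsecond per-event budget keep up
# with 5,000 orders per second on a single, strictly sequential thread?

ORDERS_PER_SECOND = 5_000        # arrival rate cited in the post
DETECTION_BUDGET_US = 100        # per-event processing budget (microseconds)

# Time available per event if events are processed one at a time:
available_us = 1_000_000 / ORDERS_PER_SECOND   # 200 microseconds per event

print(f"Available per event: {available_us:.0f} us")
print(f"Budget per event:    {DETECTION_BUDGET_US} us")
print(f"Headroom:            {available_us - DETECTION_BUDGET_US:.0f} us")

# Volume arithmetic from the bullet above: a yottabyte really is 10^24 bytes.
assert 10**24 == 1_000_000_000_000_000_000_000_000
```

The headroom works out to only 100 microseconds per event, leaving little room for disk I/O, which is why such platforms process events in memory.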

Volume, velocity, and variety are fine measures of Big Data, but they are open-ended. There is no specific volume, velocity, or variety of data that constitutes big. If a yottabyte is Big Data, then doesn’t that mean a petabyte is not? So, how do you know if your organization has Big Data?

The Big Data Theory of Relativity

Big Data is relative. One organization's Big Data is another organization's peanuts. It all comes down to how well you can handle these three Big Data activities:

  • Store. Can you store all the data, whether it is persistent or transient?
  • Process. Can you cleanse, enrich, calculate, translate, run algorithms or analytics, or otherwise operate on the data?
  • Query. Can you search the data?

Calculate Your Big Data Score

For each combination of the Big Data measures (volume, velocity, variety) and activities (store, process, query) in the table below, enter a score:

  • 5 = Handled perfectly or not required
  • 3 = Handled ok but could be improved
  • 1 = Handled poorly and frequently results in negative business impact
  • 0 = Need exists but is not handled

Enter each score in the points column, then total them at the bottom to get your Big Data score.
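If you prefer to tally the score programmatically, here is a minimal Python sketch. The 3x3 grid of measures and activities and the 0/1/3/5 scale come straight from this post; the example ratings filled in below are purely hypothetical.

```python
# Big Data score: one 0/1/3/5 rating per (measure, activity) pair.
MEASURES = ["volume", "velocity", "variety"]
ACTIVITIES = ["store", "process", "query"]
VALID_SCORES = {0, 1, 3, 5}

# Hypothetical ratings for illustration -- replace with your own.
ratings = {
    ("volume", "store"): 5, ("volume", "process"): 3, ("volume", "query"): 3,
    ("velocity", "store"): 1, ("velocity", "process"): 0, ("velocity", "query"): 1,
    ("variety", "store"): 3, ("variety", "process"): 3, ("variety", "query"): 5,
}

assert all(s in VALID_SCORES for s in ratings.values()), "use the 0/1/3/5 scale"

total = sum(ratings.values())
print(f"Big Data score: {total} out of {len(MEASURES) * len(ACTIVITIES) * 5}")
```

A perfect score is 45 (nine cells at five points each), so lower totals point directly at the measure/activity combinations that need work.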

Once you have tallied your score, look in the table below to find out what it means.

I hope this helps. By all means, let me know how to improve it.

Mike Gualtieri, Principal Analyst, Forrester Research

Comments

Interactive tool

It would be great if someone could create an interactive tool to calculate the Big Data score based on different scoring scales.

Great work, a couple of comments

Hi Mike - great post. Moving to a more objective way to evaluate maturity is helpful.

I am wondering where notions like policy-based storage and retention, and cross-system job pipelines, would fall. Do cleanse and enrich encapsulate the notions of data lineage and MDM capabilities (not separate, but cross-system)? Is there a dimension for analytics and processing in flight rather than only when the data is persisted?

Based on our work, can I suggest a dimension for non-expert users' ability to leverage the data, either through visual tools or through automated data movement to more traditional environments where existing BI and reporting tools can be used? That is a key capability for successful deployments beyond simple tasks like log analytics.

Again, great post Mike.

Big Data strategies...

Hi Tom,
I think policy-based storage and retention are Big Data strategies: if firms have challenges with storage, processing, or query, then they need to seek out products, architectures, and strategies to overcome those challenges.

Big Data solutions that can be managed by non-expert users, or that are automated, would be great.

A Forrester Wave that evaluates Big Data solutions would surely have many of your suggestions as criteria.

Mike

3Vs of Big Data

Great to see the industry finally adopting the "3Vs" of big data over 11 years after Gartner first defined them. For future reference, and a copy of the original article I published in 2001, see: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-.... --Doug Laney, VP Research, Gartner, @doug_laney

3V's is descriptive, but not actionable

Hi Doug,

Volume, velocity, and variety are great ways to measure Big Data, but as I described in my post: "Volume, velocity, and variety are fine measures of Big Data, but they are open-ended. There is no specific volume, velocity, or variety of data that constitutes big."

I think this is why many people find it difficult to relate to Big Data in their own context. That is why I added the Store, Process, and Query dimension to determine if a firm actually has a Big Data problem.

For example, what specific volume constitutes "big"? The answer: it is big if you can't handle it.

The 3Vs are descriptive, which is why I included them in the post. But they are not actionable. I have added three activities for each of the three measures to help firms assess their Big Data situation.

Also, thanks for posting the link!

Mike

I wonder if you have any

I wonder if you have any thoughts on the fourth 'V' -- Variability -- of Big Data.

More Vs

Don't know about Forrester, but the Gartner Big Data model has 12 total dimensions.

32 dimensions

We have 32 dimensions, but it is a closely guarded secret as to what they are. What are your 12 dimensions?

Variability increases complexity

Thanks for your question about variability. Big Data is essentially about the complexity of using the data. The variability dimension can increase complexity because it can be hard to find and extract the right data elements. Variability can be especially complex when there is a mixture of structured and unstructured data. For example, imagine data that consists of a relational database management system (RDBMS), one or more videos, a text stream (such as Twitter), and hierarchical XML with no metadata.

Variability means the meaning of the data is either:
1) Changing rapidly, or
2) Of unknown structure

Operational Intelligence

Great post, Mike.

Big Data refers to the massive amounts of highly structured and loosely structured data that is both "at rest" and "in motion." The analysis of Big Data presents a tremendous opportunity to gain competitive advantage through better business and customer insight. However, most Big Data approaches are only able to analyze Big Data when it is at rest (i.e., persistent data). This means that only a fraction of the available data is analyzed, to the exclusion of the insights that could be derived from Big Data in motion (i.e., streaming data). Big Data in motion includes data from smart grid meters, RSS feeds, computer networks, and social media sites. Agile organizations require insight into all available data sources. Even more so, they need these insights in time to gain a competitive advantage.
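To illustrate the at-rest versus in-motion distinction, here is a minimal Python sketch of in-motion analysis: a rolling average computed as each event arrives, rather than a query over data persisted first. The smart-meter readings, window size, and function name are hypothetical.

```python
from collections import deque

def sliding_window_average(events, window_size=5):
    """Emit the average of the last `window_size` readings as each event arrives.

    This is the "in motion" style: each event is analyzed immediately,
    without first persisting the whole data set (the "at rest" style).
    """
    window = deque(maxlen=window_size)  # drops the oldest reading automatically
    for value in events:
        window.append(value)
        yield sum(window) / len(window)

# Hypothetical stream of smart-meter readings (kW):
readings = [1.2, 1.4, 5.9, 1.3, 1.5, 6.1, 1.2]
for avg in sliding_window_average(readings):
    print(f"rolling average: {avg:.2f} kW")
```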