Key Questions To Ask Yourself Before Embarking On A Big Data Journey

Do you think you are ready to tackle Big Data because you are pushing the limits of your data Volume, Velocity, Variety and Variability? Take a deep breath (and maybe a cold shower) before you plunge full speed ahead into the uncharted territory and murky waters of Big Data. Now that you are calm, cool and collected, ask yourself the following key questions:

  • What’s the business use case? What are some of the business pain points, challenges and opportunities you are trying to address with Big Data? Are your business users coming to you with such requests, or are you in the doomed-for-failure realm of technology looking for a problem to solve?
  • Are you sure it’s not just BI 101? Once you identify specific business requirements, ask whether Big Data is really the answer you are looking for. In the majority of my Big Data client inquiries, after a few probing questions I typically find out that it's really BI 101: data governance, data integration, data modeling and architecture, org structures, responsibilities, budgets, priorities, etc. Not Big Data.
  • Why can’t your current environment handle it? Next comes another sanity check. If you are still thinking you are dealing with Big Data challenges, are you sure you need to do something different, technology-wise? Are you really sure your existing ETL/DW/BI/Advanced Analytics environment can't address the pain points in question? Would just adding another node, another server, more memory (if these are all within your acceptable budget ranges) do the trick?
  • Are you looking for a different type of DBMS? Last, but not least: do the answers to some of your business challenges lie in different types of databases (not necessarily Big Data) because relational or multidimensional DBMS models don’t support your business requirements (for example, the relationships between your entities and attributes are not relational)? Are you really looking to supplement RDBMS and MOLAP DBMS with hierarchical, object, XML, RDF (triple stores), graph, inverted index or associative DBMS?

Still think you need Big Data? Ok, let’s keep going. Which of the following two categories of Big Data use cases apply to you? Or is it both in your case?

  • Category 1. Cost reduction, containment, avoidance. Are you trying to do what you already do in your existing ETL/DW/BI/Advanced Analytics environment, but just much cheaper (and maybe faster), using OSS technology like Hadoop? (The Hadoop OSS and commercial ecosystem is very complex, and we are currently working on a landscape. If you have a POV on what it should look like, drop me a note.)
  • Category 2. Solving new problems. Are you trying to do something completely new, that you could not do at all before? Remember, all traditional ETL/DW/BI require a data model. Data models come from requirements. Requirements come from understanding of data and business processes. But in the world of Big Data you don’t know what’s out there until you look at it. We call this data exploration and discovery. It’s a step BEFORE requirements in the new world of Big Data.

Congratulations! Now you are really in the Big Data world. Problem solved? Not so fast. Even if you are convinced that you need to solve new types of business problems with new technology, do you really know how to:

  • Manage it?
  • Secure it (compliance and risk officers and auditors hate Big Data!)?
  • Govern it?
  • Cleanse it?
  • Persist it?
  • Productionalize it?
  • Assign roles and responsibilities?

You may find that all of your best DW, BI, MDM practices for SDLC, PMO and Governance aren’t directly applicable to or just don’t work for Big Data. This is where the real challenge of Big Data currently lies. I personally have not seen a good example of best practices around managing and governing Big Data. If you have one, I’d love to see it!


Big data will become "data" over time.

Nice points, Boris, and I think underlying your questions is the idea that "big data" will cease to be interesting as a separate thing over time, as we adjust to dealing with new scales and add more tools to our tool bag. So what you are saying is: make sure you don't go to Sears and buy a whole new socket set before ensuring the tools you have won't get the job done.

All in the definition

I can certainly appreciate this. The only caution is a very common one, it seems: cultures define things differently. One org's big data is another's Twitter stream, and down the street it's 40 years' worth of global R&D.

Certainly for those who define big data as unstructured and semi-structured it makes perfect sense, and I think you are asking the really important questions at a very good time. A glance at a couple of big data LI groups with thousands of members tells a pretty scary story, especially if paired with SIs that may be finding fresh blood difficult (sharks do strike me as a good metaphor at the moment for many, even the well branded).

I think the answer is yes: most are, or should be, seeking to supplement current systems with a competitive advantage that returns completely different types of answers. Most of the historic financial BI seems to be fairly well served, but of course that represents the past, not the future, and even for those with the best P&L reading ability it generally informs too late, even in real time, for trends that started long ago.

For tapping into the wealth of knowledge left on the table in any enterprise, understanding relationships, and expediting discoveries and predictive work, it would be very wise to look at alternatives. That is something I know you do consistently on behalf of clients, which I suspect has paid off well for some, as well as for others who may not be aware of it, simply via healthier market farming resulting in greater innovation.

Thanks for the read- have a good weekend, MM

Big Data starts with a Big Question

Agreed. Big Data should start with a big question, in order to solve business problems such as cost and process optimization or to help business growth, mainly by capturing insight for business decision making, customer perception, talent prediction, etc.

Right now the biggest challenge for business is the distraction caused by Big Data: with no mature technology ecosystem or industry best practices and standards, companies invest more but do not achieve the expected results. Thanks.

Big Data is a pre-processor

Good points, Boris. I agree.
Big Data can also be seen as a pre-processing step (part of data integration): it helps in getting from 'unstructured data' to a 'structured data' format, and the output can then be prepared and integrated with the traditional EDW, or vice versa. Which unstructured data the organization wants to handle can vary.

It's like a Google search, which can help us locate relevant information faster, but interpreting and using the search results requires us to apply a structured thought process.


Big Data Info...

I love these best development techniques. I will forward this to all my dedicated friends who are Big Data researchers. Thanks a lot.

Patricia Hall -