Rowan Curran, Research Associate and TechnoPolitics producer, hosts this episode to ask me (your regular host) about The Pragmatic Definition Of Big Data. Listen (5 mins) to hear the genesis of this new definition of big data and why it is pragmatic and actionable for both business and IT professionals.
Podcast: The Pragmatic Definition Of Big Data Explained (5 mins)
Big data is driving disruptive change across the economy in business such as healthcare, retail, communications, and entertainment. The potential for firms to use big data to create permanent relationships with customers is huge, and the time to get onboard is now. Big data is driving disruptive change across the economy in business such as healthcare, retail, communications, and entertainment. The potential for firms to use big data to create permanent relationships with customers is huge, and the time to get onboard is now. I was thrilled to be featured in the first episode on a new series, Big Thinkers In Big Data, hosted by TWit network's Sarah
In the face of rising data volume and complexity and increased need for self-service, enterprises need an effective business intelligence (BI) reference architecture to utilize BI as a key corporate asset for competitive differentiation. BI stakeholders — such as project managers, developers, data architects, enterprise architects, database administrators, and data quality specialists — may find the myriad choices and constant influx of new business requirements overwhelming. Forrester's BI reference architecture provides a framework with architectural patterns and building blocks to guide these BI stakeholders in managing BI strategy and architecture.
Enterprise information management (EIM) is complex — from a technical, organizational, and operational standpoint. But to business users, all that complexity is behind the scenes. What they need is BI, an interface to enterprise data — whether it's structured, semistructured, or unstructured. Our June 2011 Global Technology Trends Online Survey showed that BI topped even mobility — the frontrunner in recent years — as the technology most likely to provide business value over the next three years.
A key role of IT operations is to keep a complex portfolio of applications running and performing. "Traditional monitoring dashboards generate lots of pretty charts and graphs but don't really tell IT operations professionals a whole lot," says Forrester Principal Analyst Glenn O'Donnell. Big data analytics will change that because sophisticated algorithms can "look for the little tremors that tell us something big is about to happen."
High Availability And Performance Are Top Goals For IT Ops
Asked what 5 nines (99.999%) of availability means, Glenn replies immediately, "5 nines of availability is 26 seconds of downtime per month." He adds "If you want to capture just one 26 second event, you have to be polling every 13 seconds." Glenn knows his stuff. Listen to find out from Glenn how big data has a big place in the future of IT operations.
There was lots of feedback on the last blog (“Risk Data, Risky Business?”) that clearly indicates the divide between definitions in trust and quality. It is a great jumping off point for the next hot topic, data governance for big data.
The comment I hear most from clients, particularly when discussing big data, is, “Data governance inhibits agility.” Why be hindered by committees and bureaucracy when you want freedom to experiment and discover?
Current thinking: Data governance is freedom from risk.The stakes are high when it comes to data-intensive projects, and having the right alignment between IT and the business is crucial. Data governance has been the gold standard to establish the right roles, responsibilities, processes, and procedures to deliver trusted secure data. Success has been achieved through legislative means by enacting policies and procedures that reduce risk to the business from bad data and bad data management project implementation. Data governance was meant to keep bad things from happening.
Today’s data governance approach is important and certainly has a place in the new world of big data. When data enters the inner sanctum of an organization, management needs to be rigorous.
Yet, the challenge is that legislative data governance by nature is focused on risk avoidance. Often this model is still IT led. This holds progress back as the business may be at the table, but it isn’t bought in. This is evidenced by committee and project management style data governance programs focused on ownership, scope, and timelines. All this management and process takes time and stifles experimentation and growth.
Every year the Center For Digital Strategies at Tuck chooses a technology topic to "provide MBA candidates and the Tuck and Darthmouth communities with insights into how changes in technology affect individuals, impact enterprises and reshape industries." This academic year the topic is "Big Data: The Information Explosion That Will Reshape Our World". I had the honor and privilege to kick off the series about big data at the Tuck School of Business at Dartmouth. I am thrilled that our future business leaders are considering how big data can help companies, communities, and government make smarter decisions and provide better customer experiences. The combination of big data and predictive analytics is already changing the world. Below is the edited video of my talk on big data predictive analytics at Tuck in Hanover, NH.
There's certainly a lot of hype out there about big data. As I previously wrote, some of it is indeed hype, but there are still many legitimate big data cases - I saw a great example during my last business trip. Hadoop certainly plays a key role in the big data revolution, so all business intelligence (BI) vendors are jumping on the bandwagon and saying that they integrate with Hadoop. But what does that really mean? First of all, Hadoop is not a single entity; it's a conglomeration of multiple projects, each addressing a certain niche within the Hadoop ecosystem, such as data access, data integration, DBMS, system management, reporting, analytics, data exploration, and much much more. To lift the veil of hype, I recommend that you ask your BI vendors the following questions
Which specific Hadoop projects do you integrate with (HDFS, Hive, HBase, Pig, Sqoop, and many others)?
Do you work with the community edition software or with commercial distributions from MapR, EMC/Greenplum, Hortonworks, or Cloudera? Have these vendors certified your Hadoop implementations?
Do you have tools, utilities to help the client data into Hadoop in the first place (see comment from Birst)?
Are you querying Hadoop data directly from your BI tools (reports, dashboards) or are you ingesting Hadoop data into your own DBMS? If the latter:
Are you selecting Hadoop result sets using Hive?
Are you ingesting Hadoop data using Sqoop?
Is your ETL generating and pushing down Map Reduce jobs to Hadoop? Are you generating Pig scripts?
So, this blog is dedicated to stepping outside the comfort zone once again and into the world of chaos. Not only may you not want to persist in your data quality transformations, but you may not want to cleanse the data.
Current thinking: Purge poor data from your environment. Put the word “risk” in the same sentence as data quality and watch the hackles go up on data quality professionals. It is like using salt in your coffee instead of sugar. However, the biggest challenge I see many data quality professionals face is getting lost in all the data due to the fact that they need to remove risk to the business caused by bad data. In the world of big data, clearly you are not going to be able to cleanse all that data. A best practice is to identify critical data elements that have the most impact on the business and focus efforts there. Problem solved.
Not so fast. Even scoping the data quality effort may not be the right way to go. The time and effort it takes as well as the accessibility of the data may not meet business needs to get information quickly. The business has decided to take the risk, focusing on direction rather than precision.
I recently had both the privilege and pleasure to do a deep dive into the cold and warm BI waters in Russia and Israel. Cold - because some of my experiences were sobering. Warm - because the reception could not have been more pleasant. My presentations were well attended (sponsored by www.in4media.ru in Russia and www.matrix.co.il in Israel), showing high levels of BI interest, adoption, experience, and expertise. Challenges remain the same, as Russian and Israeli businesses struggle with BI governance, ownership, SDLC and PMO methodologies, data, and app integration just like the rest of the world. I spent long evening hours with a large global company in Israel that grew rapidly by M&A and is struggling with multiple strategic challenges: centralize or localize BI, vendor selection, end user empowerment, etc. Sound familiar?
But it was not all business as usual. A few interesting regional peculiarities did come out. For example, the "BI as a key competitive differentiator" message fell on mostly deaf ears in Russia, as Russian companies don't really compete against each other. Territories, brands, markets, and spheres of influence are handed top down from the government or negotiated in high-level deals behind closed doors. That is not to say, however, that BI in Russia is only used for reporting - multiple businesses are pushing BI to the limits such as advanced customer segmentation for better upsell/cross-sell rates.
I was also pleasantly surprised and impressed a few times (and for those of you who know me well, you know that it's pretty hard to impress the old veteran):
We last spoke about how to reboot our thinking on master data to provide a more flexible and useful structure when working with big data. In the structured data world, having a model to work from provides comfort. However, there is an element of comfort and control that has to be given up with big data, and that is our definition and the underlying premise for data quality.
Current thinking: Persistence of cleansed data.For years data quality efforts have focused on finding and correcting bad data. We used the word “cleansing” to represent the removal of what we didn’t want, exterminating it like it was an infestation of bugs or rats. Knowing what your data is, what it should look like, and how to transform it into submission defined the data quality handbook. Whole practices were stood up to track data quality issues, establish workflows and teams to clean the data, and then reports were produced to show what was done. Accomplishment was the progress and maintenance of the number of duplicates, complete records, last update, conformance to standards, etc. Our reports may also be tied to our personal goals. Now comes big data — how do we cleanse and tame that beast?
Reboot: Disposability of data quality transformation. The answer to the above question is, maybe you don’t. The nature of big data doesn’t allow itself to traditional data quality practices. The volume may be too large for processing. The volatility and velocity of data change too frequently to manage. The variety of data, both in scale and visibility, is ambiguous.