Yellow Elephants and Pink Unicorns Don't Tell The Real Big Data Story

Big data and Hadoop (Yellow Elephants) are so synonymous that you can easily overlook the vast landscape of architecture that goes into delivering on big data value. Data scientists (Pink Unicorns) are likewise raised to god status as the only role that can harness the power of big data, making insights from big data seem as far away as a manned journey to Mars. However, this week, as I participated in the DGIQ conference in San Diego and colleagues and friends attended the Hadoop Summit in Belgium, it became apparent that organizations are waking up to the fact that there is more to big data than a "cool" playground for the privileged few.

The perspective that the insight supply chain is the driver and catalyst of action from big data is starting to take hold. Capital One, for example, illustrated that if insights from analytics and data in Hadoop are going to influence operational decisions and actions, you need the same degree of governance you established in traditional systems. In a conversation, Amit Satoor of SAP Global Marketing described a performance apparel company linking big data to operational and transactional systems at the edge of customer engagement, and noted that it had to be easy for application developers to implement.

Hadoop distribution, NoSQL, and analytic vendors need to step up the value proposition to be about more than where the data sits and how sophisticated you can get with the analytics. In the end, if you can't govern quality, security, and privacy at the scale of edge end-user and customer engagement scenarios, those efforts to migrate data to Hadoop and the investment in analytic tools cost more than dollars; they cost you your business.

Read more

3 Ways Data Preparation Tools Help You Get Ahead Of Big Data

The business has an insatiable appetite for data and insights. Even in the age of big data, the number one issue for business stakeholders and analysts is getting access to the data. Once access is achieved, the next step is "wrangling" the data into a usable data set for analysis. The term "wrangling" itself creates a nervous twitch, unless you enjoy the rodeo. But the goal of the business isn't to be an adrenaline junkie. The goal is to get insight that helps it navigate smartly through increasingly complex business landscapes and customer interactions. Those who get this have introduced a softer term, "blending," yet another term dreamed up by data vendor marketers to avoid the dreaded conversation about data integration and data governance.

The reality is that you can't market-message your way out of the fundamental problem that big data is creating data swamps, even in the best-intentioned efforts. (This is the reality of big data's first principle of schema-less data.) Data governance for big data is primarily relegated to cataloging data and its lineage, which serves the data management team but creates a new kind of nightmare for analysts and data scientists: working with a card catalog that rivals the Library of Congress. Dropping in a self-service business intelligence tool or advanced analytic solution doesn't solve the problem of familiarizing the analyst with the data. Analysts will still spend up to 80% of their time just trying to create the data set they draw insights from.
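To make that 80% concrete, here is a minimal sketch of the wrangling an analyst typically does before any insight work begins. It assumes Python with pandas; the file name and column names are hypothetical, purely for illustration:

import pandas as pd

# Hypothetical raw extract; the file and columns are illustrative only.
raw = pd.read_csv("customer_extract.csv")

# Standardize column names that vary across source systems.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]

# Trim stray whitespace and normalize casing in a key field.
raw["email"] = raw["email"].str.strip().str.lower()

# Parse inconsistent date formats; unparseable values become NaT for review.
raw["signup_date"] = pd.to_datetime(raw["signup_date"], errors="coerce")

# Drop exact duplicates produced by overlapping extracts.
clean = raw.drop_duplicates()

Every one of these steps encodes knowledge about the source data that no card catalog hands the analyst for free, which is exactly why the time sink persists.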

Read more

Beyond Big Data's Vs: Fast Data Is More Than Data Velocity

When you hear the term fast data, the first thought is probably the velocity of the data. Not unusual in the realm of big data, where velocity is one of the V's everyone talks about. However, fast data encompasses more than a data characteristic; it is about how quickly you can get and use insight.

Working with Noel Yuhanna on an upcoming report on how to develop your data management roadmap, we found speed was a recurring theme. Clients consistently call out speed as what holds them back. How they interpret what speed means is the crux of the issue.

Technology management thinks about how quickly data is provisioned. The solution is a faster engine: in-memory grids like SAP HANA become the tool of choice. This is the wrong way to think about it. Simply serving up data with faster integration and a high-performance platform is what we have always done: better box, better integration software, better data warehouse. Why use the same solution that in a year or two runs up against the same wall?

The other side of the equation is that sending data out faster ignores what business stakeholders and analytics teams want. Speed to the business encompasses self-service data acquisition, faster deployment of data services, and faster changes. The reason: they need to act on the data and insights.

The right strategy is to create a vision that orients toward business outcomes. Today's reality is that we live in a world where it is no longer about being first to market; we have to be first to value. First to value with our customers, and first to value with our business capabilities. The speed at which insights are gained, and ultimately how they are put to use, is your data management strategy.

Read more

The Theory of Data Trust Relativity

Since the dawn of big data, data quality and data governance professionals have been shouting from the rooftops about the impact of dirty data. Data scientists are yelling back just as loudly that "good enough" data is the new reality. Data trust has turned relative.

Consider these data points from the recent Forrester Business Technographics Survey on Data and Analytics and our Online Global Survey on Data Quality and Trust:

  • Nearly 9 out of 10 data professionals rate data quality as a very important or important aspect of information governance
  • 43% of business and technology management professionals are somewhat confident in their data, and 25% are concerned
Read more

When CRM Fails On Customer Information

Early this year a host of inquiries were coming in about data quality challenges in CRM systems. This led to a number of joint inquiries with CRM expert Kate Legget, VP and Principal Analyst in our application development and delivery team. It seems the expectation that CRM systems could provide a single trusted view of the customer was starting to hit a reality check. There is more to it than collecting customer data and activities: you need validation, cleansing, standardization, consolidation, enrichment, and hierarchies. CRM applications only get you so far, even with more and more functionality being added to reduce duplicate records and enforce classifications and groupings. So, what should companies do?
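As a rough illustration of why collection alone falls short, here is a minimal sketch of the standardization and consolidation steps a CRM rarely does for you. It uses Python with pandas; the records, normalization rules, and survivorship rule are all hypothetical:

import pandas as pd

# Hypothetical duplicate-prone CRM records.
customers = pd.DataFrame({
    "name": ["Acme Corp.", "ACME Corporation", "Globex Inc"],
    "phone": ["(555) 010-2000", "555-010-2000", "555-010-9999"],
})

def normalize_phone(p):
    # Standardize: keep digits only, so formatting differences don't block matching.
    return "".join(ch for ch in p if ch.isdigit())

def normalize_name(n):
    # Standardize: lowercase and strip common legal suffixes (illustrative list).
    n = n.lower().strip()
    for suffix in (" corp.", " corporation", " inc"):
        n = n.removesuffix(suffix)
    return n

# Consolidate: records sharing a match key are treated as one customer.
customers["match_key"] = (customers["name"].map(normalize_name)
                          + "|" + customers["phone"].map(normalize_phone))

# Naive survivorship rule: keep the first record per match key.
golden = customers.groupby("match_key", as_index=False).first()
print(golden)  # Acme's two variants collapse into a single golden record

Real master data management replaces these naive pieces with probabilistic matching, enrichment, and hierarchy management, precisely the functionality CRM applications are only beginning to bolt on.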

Read more

Data Before Technology: IBM Watson's Vision

I sat down with Steve Cowley, General Manager for IBM Watson, on Tuesday at IBM Insights to talk about Watson's successes, challenges since the January launch, and what is in store. While the potential has always intrigued me, the initial use cases and message gave me more than a bit of pause: the daunting task of developing and training the corpus, the narrowness of the use cases, and what it would actually cost. Jump ahead nine months and the IBM Watson world is in a very different place.

IBM is clearly in its market-building phase. It is as much about what IBM Watson is and how IBM overall is repositioning itself as it is about changing the business model for selling technology. However, it is easy to get negative very fast on this strategy, as seen in the tremors on Wall Street: IBM's stock has gone from a 52-week high of $199 to $164 at the close on Friday 10/31, much of that drop coming in the past month since the earnings release. Wall Street may not like company uncertainty during transitional periods, but enterprise architects care about what will make their organizations successful, what will make development and management of technology easier, and making sure costs don't skyrocket when new bright shiny objects come in. And that is where IBM is headed, with an eye toward changing the game.

IBM Watson delivers on information over technology.

Steve surprised me with this statement: "[With] traditional programmed systems, the system is at its best when it is deployed, because it is closest to the business need it was written for. Over time these systems get further and further away from the shifting business need, and so either they fall in effectiveness or require a great deal of maintenance." Steve pointed out that data is what is changing the game.*

Read more

Creating the Data Governance Killer App

One of the biggest stumbling blocks is getting business resources to govern data.  We've all heard it:

"I don't have time for this."

"Do you really need a full time person?"

"That really isn't my job."

"Isn't that an IT thing?"

"Can we just get a tool or hire a service company to fix the data?"

Let's face it: resources are the data governance killer, even as organizations try to take on enterprise-led data governance efforts.

What we need to do is rethink the data governance bottlenecks and start with the guiding principle that data can only be governed when you have the right culture throughout the organization. The point being, you need accountability from those who actually know something about the data, how it is used, and who feel the most pain. That's not IT, and that's not the data steward. It's the customer care representative, the sales executive, the claims processor, the assessor, the CFO, and the list goes on. Not really the people you would normally include in your data governance program. Heck, they are busy!

But the path to sustainable, effective data governance is data citizenship, where everyone is a data steward. So we have to strike the right balance between automation, manual governance, and scale. This is even more important as our data and system ecosystems explode in size, sophistication, and speed. In the world of MDM and data quality, vendors are looking specifically at how to get around these challenges. There are five (5) areas of innovation:

Read more

Disruption Coming For MDM - The Hub of Context

Spending time at the MDM/DG Summit in NYC this week demonstrated the wide spectrum of MDM implementations and stories out in the market. It certainly coincides with our upcoming MDM inquiry analysis, where we see:

  • Big data influencing MDM strategies and plans
  • A move from MDM silos to enterprise MDM hubs
  • MDM linked to business outcomes and initiatives
  • Cloud, cloud, cloud
Read more

Quality, Trusted, Fit for Purpose Data?

Data quality, often a lagging priority in data strategy, appears to be coming back into favor. As organizations expand beyond data exploration and discovery to putting real action in place organization-wide based on these insights, the risk of getting the answer wrong or having dirty data is higher.

But this may be anecdotal supposition, even in light of the wide-ranging conversations I've had with clients. What we do know quantitatively is:

1) Data quality is the most important topic for information governance, according to our recent Business Technographics research for data and analytics.

2) We see an uptick in data quality inquiries from last year.  

3) Vendors are introducing data preparation tools that infuse data quality and governance into analytic and BI processes.

Anecdotal and quantitative evidence leads me to think that there is a bigger shift happening in how we think about data quality, how we act upon it, and what doing so does for our businesses. When things are a-changing, it always makes my brain itch to get more data, more stories, and more evidence. And while I'm curious, I'm assuming you are too. It is great to see that something is influencing change, and we want to know what that is in order to determine if we too need to change. However, what is more important is what organizations are doing and which ones stand out in terms of success and improved ways of thinking and execution. In the end, do we need to write a new playbook* for data quality? Has the bar been reset, so that we need to re-benchmark?

Read more

Cognitive Computing Forum: 7 Things You Need To Know

Day one of the first Cognitive Computing Forum in San Jose, hosted by Dataversity, gave a great perspective on the state of cognitive computing: promising, but early. I am here this week with my research director Leslie Owens and analyst colleague Diego LoGudice. Gathering research for a series of reports for our cognitive engagement coverage, we were able to debrief tonight on what we heard and the questions these insights raise. Here are some key takeaways:

1) The big data mind shift to explore and accept failure is a heightened principle. Chris Welty, formerly at IBM and a key developer of Watson and its Jeopardy-winning solution, preached restraint. Analytic pursuit of perfect answers delivers no business value. Keep your eye on the prize and move the needle on what matters, even if your batting average is only .300 (30%). The objective is a holistic pursuit of optimization.

2) The algorithms aren't new; the platform capabilities and greater access to data allow us to realize cognitive computing for production uses. Every speaker, whether academic, vendor, or practitioner, agreed that the algorithms created decades ago are the same. Hardware and the volume of available data have made neural networks and other machine learning algorithms both practical and more effective.
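To underline how old the core math is, here is a minimal sketch of a two-layer neural network trained with 1980s-era backpropagation on the classic XOR problem. It assumes Python with NumPy; the layer sizes, learning rate, and iteration count are arbitrary choices for illustration:

import numpy as np

# Backpropagation dates to the 1980s; only the hardware and data scale are new.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

# One hidden layer of 4 units, randomly initialized.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: gradients of squared error through the sigmoids.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Plain gradient descent updates.
    W2 -= 0.5 * (h.T @ d_out); b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * (X.T @ d_h);   b1 -= 0.5 * d_h.sum(axis=0)

print(out.round())  # typically converges to XOR: [[0], [1], [1], [0]]

Nothing in this training loop would surprise a 1980s researcher; what has changed is that the same loop now runs over billions of records on commodity clusters.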

Read more