When it comes to data investment, data management is still asking the wrong questions and positioning the wrong value. The mantra "It's About the Business" is still a hard lesson to learn. It translates into what I see as the 7 Deadly Sins of Data Management. Here they are, in no particular order, each with an example:
Hubris: "Business value? Yeah, I know. Tell me something I don't know."
Blindness: "We do align to business needs. See, we are building a customer master for a 360 degree view of the customer."
Vanity: "How can I optimize cost and efficiency to manage and develop data solutions?"
Gluttony: "If I build this cool solution the business is gonna love it!"
Alien: "We need to develop an in-memory system to virtualize data and insight that materializes through business services with our application systems...[blah, blah, blah]"
Beggar: "If only we were able to implement a business glossary, all our consistency issues would be solved!"
Educator: "If only the business understood! I need to better educate them!"
On January 9, 2014, IBM launched its first business unit in 19 years to bring Watson, the machine that beat two Jeopardy champions in 2011, to the rest of us. IBM posits that Watson is the start of a third era of computing: computing began with manual tabulation, progressed to programmable systems, and has now become cognitive. Cognitive computing listens, learns, converses, and makes recommendations based on evidence.
IBM is placing big bets and big money, $1 billion, on transforming computer interaction from tabulation and programming to deep engagement. If they succeed, our interaction with technology will become truly personal, built on natural conversations that are suggestive and supportive and that produce an experience that, as Terry Jones of Kayak put it, "makes you feel good."
There are still hurdles for IBM and organizations, such as expense, complexity, information access, coping with ambiguity and context, supervising the learning, and the as-yet-unrecognized implications of its suggestions. To work, the ecosystem has to be open and communal. Investment is needed beyond the platform, in applications and devices, for Watson to deliver value. IBM's commitment and leadership are in place. The question is whether IBM and its partners can scale Watson beyond a complex custom solution into a truly transformative approach to business and our way of life.
Forrester believes that cognitive computing has the potential to address important problems that today’s advanced analytics solutions leave unmet. Though the road ahead is unmapped, IBM has now elevated its commitment to bringing cognitive computing to life through this new business unit, with the help of one third of its research organization, an ecosystem of partners, and pioneer companies willing to teach their private Watsons.
I had the opportunity to speak at and participate in a panel on data governance as it pertained to big data. My presentation was based on recently completed research, sponsored by IBM, into what data governance looks like at firms embarking on or executing big data initiatives. The overarching theme was that data governance is about "protect and serve": manage security and privacy while delivering trusted data.
Yet, when you look at data governance and what it means to the data practice, not the technology, protect and serve is also a credo. In business terms it represents:
Protect the business's reputation and mitigate the risk associated with inappropriate use or dirty data.
Serve the business's information needs so it can get information fast and stay agile as market conditions change.
Big data gurus have said that data quality isn’t important for big data. Good enough is good enough. However, business stakeholders still complain about poor data quality. In fact, when Forrester surveyed customer intelligence professionals, the ability to integrate data and manage data quality are the top two factors holding customer intelligence back.
So, do big data gurus have it wrong? Sort of . . .
I had the chance to attend and present at a marketing event put on by MITX last week in Boston that focused on data science for marketing and customer experience. I recommend all data and big data professionals do this. Here is why. How marketers and agencies talk about big data and data science is different from how IT talks about it. This isn’t just a language barrier; it’s a philosophy barrier. Let’s look at this more closely:
Data is totals. When IT talks about data, it’s talking about the physical elements stored in systems. When marketing talks about data, it’s referring to totals and the outputs of calculations from analysis.
Quality is completeness. At the MITX event, Panera Bread was asked how it understands customers who pay cash. This lack of data didn’t hinder analysis. Panera looked at customers in its loyalty program and promotions who paid cash, and used them to make assumptions about the cash segment and its behavior. Analytics was the data quality tool that completed the customer picture.
Data rules are algorithms. To marketers, rules applied to data are about segmentation and status that feed personalized customer interactions. They are not about transformation.
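The Panera example treats analytics as a data quality tool: the behavior of cash payers who can be identified through loyalty or promotion programs stands in for the anonymous cash segment. Here is a minimal sketch of that kind of segment-level inference. It assumes a hypothetical transaction schema (`customer_id`, `payment`, `item`) and hypothetical function names. It illustrates the idea and is not Panera's actual method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    customer_id: Optional[str]  # None when the buyer is anonymous (hypothetical schema)
    payment: str                # "cash" or "card"
    item: str


def cash_segment_profile(transactions):
    """Item mix of identified (loyalty/promotion) customers who paid cash."""
    known_cash = [t for t in transactions if t.payment == "cash" and t.customer_id is not None]
    counts = Counter(t.item for t in known_cash)
    total = sum(counts.values())
    # Share of each item among identified cash purchases, used as a proxy for all cash buyers
    return {item: n / total for item, n in counts.items()} if total else {}


def estimate_anonymous_cash_mix(transactions):
    """Project the proxy profile onto anonymous cash transactions to fill the gap."""
    profile = cash_segment_profile(transactions)
    anonymous = sum(1 for t in transactions if t.payment == "cash" and t.customer_id is None)
    return {item: share * anonymous for item, share in profile.items()}
```

Nothing here repairs individual records. The gap is closed at the segment level, which matches the marketer's view that "data is totals."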
Sometimes getting the data quality right is just hard, if not impossible. Even after implementing data quality tools, acquiring third-party data feeds, and implementing data steward remediation processes, the business is often still not satisfied with the quality of the data. Data is still missing or considered old or irrelevant. For example: Insurance companies want access to construction data to improve catastrophe modeling. Food chains need to incorporate drop-off bays and instructions for outlets in shopping malls and plazas to get food supplies to the prep tables. Global companies need to validate addresses for logistics in developing countries whose postal directories are incomplete or change quickly. Completing and improving this data now requires hands-on processes.
Crowdflower says it has the answer to the data challenges listed above. It combines a crowdsourcing model with a data stewardship platform to manage the last mile of data quality. The crowd is a vast network of people around the globe who are notified of data quality tasks through the data stewardship platform. If contributors can help with a data quality need within the requested time period, they accept the task and get to work. The crowd can use all the resources and channels available to them to complete tasks, such as web searches, visits, and phone inquiries. Quality control is performed to validate the crowdsourced data and improvements. As an organization submits more data quality tasks, machine learning is applied to analyze and optimize the crowdsourcing based on contributors' scores and results.
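The post does not describe how Crowdflower validates answers or scores contributors internally. One common approach to crowdsourced quality control is to have several contributors answer the same task, accept the answer with the highest trust-weighted vote, and then adjust each contributor's score based on whether they agreed. Here is a minimal sketch under that assumption. All names and thresholds are hypothetical and are not Crowdflower's API. A simple running-average score update stands in for the machine learning the post mentions.

```python
from collections import defaultdict


def resolve_task(answers, trust, min_confidence=0.6):
    """answers: contributor -> proposed value; trust: contributor -> score in [0, 1]."""
    weights = defaultdict(float)
    for contributor, value in answers.items():
        weights[value] += trust.get(contributor, 0.5)  # unknown contributors get a neutral weight
    if not weights:
        return None, 0.0
    best_value, best_weight = max(weights.items(), key=lambda kv: kv[1])
    confidence = best_weight / sum(weights.values())
    # Below the threshold, no value is accepted and the task can go to a data steward
    return (best_value if confidence >= min_confidence else None), confidence


def update_trust(answers, accepted, trust, rate=0.1):
    """Move each score toward 1 if the contributor matched the accepted value, toward 0 otherwise."""
    for contributor, value in answers.items():
        current = trust.get(contributor, 0.5)
        target = 1.0 if value == accepted else 0.0
        trust[contributor] = current + rate * (target - current)
    return trust
```

For example, if two trusted contributors return "12 Rua A" and a low-trust contributor returns "21 Rua A", the first value is accepted, and the dissenting contributor's weight on future tasks shrinks.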
I had a conversation recently with Brian Lent, founder, chairman, and CTO of Medio. If you don’t know Brian, he has worked with companies such as Google and Amazon to build and hone their algorithms and is currently taking predictive analytics to mobile engagement. The perspective he brings as a data scientist not only has ramifications for big data analytics, but drastically shifts the paradigm for how we architect our master data and ensure quality.
We discussed big data analytics in the context of behavior and engagement. Think shopping carts and search. At the core, analytics is about the “closed loop.” It is, as Brian says, a rinse and repeat cycle. You gain insight for relevant engagement with a customer, you engage, then you take the results of that engagement and put them back into the analysis.
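The "rinse and repeat" cycle can be made concrete with a small feedback loop. Current results pick the next engagement, and each engagement's outcome is recorded back into the statistics that drive the next pick. Here is a minimal sketch using an epsilon-greedy policy, a simple technique chosen here for illustration. The post does not say that Medio or Brian Lent uses it. The class and offer names are hypothetical.

```python
import random


class ClosedLoop:
    """Insight -> engage -> feed the result back -> repeat (epsilon-greedy sketch)."""

    def __init__(self, offers, epsilon=0.1, seed=None):
        self.shown = {o: 0 for o in offers}
        self.converted = {o: 0 for o in offers}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def rate(self, offer):
        # Observed conversion rate; untried offers score 1.0 so each gets tried at least once
        return self.converted[offer] / self.shown[offer] if self.shown[offer] else 1.0

    def choose(self):
        # Insight step: usually engage with the best-performing offer, occasionally explore
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shown))
        return max(self.shown, key=self.rate)

    def record(self, offer, converted):
        # Feedback step: the engagement result goes back into the analysis
        self.shown[offer] += 1
        self.converted[offer] += int(converted)
```

The data management implication is visible in `record`. The loop only works if engagement outcomes are captured and made available to the analysis quickly, rather than landing in a separate system weeks later.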
Sounds simple, but think about what that means for data management. Brian provided two principles:
It is easy to get caught up in the source-and-target paradigm when implementing master data management. The logical model looms large, identifying where master data resides for linkage and making the project, well, logical.
If the first step in your customer MDM endeavor is creating a master data definition by identifying relevant data elements, STOP!
The first step is to articulate the story that customer MDM will support. This is the customer MDM blueprint.
For example, if the driving business strategy is to create a winning customer experience, customer MDM puts the customer definition at the center of what the customer experience looks like. The customer experience is the story. You need to understand, and have data points for, elements such as preferences, sentiment, lifestyle, and friends/relationships. These elements may be available within your CRM system, in social networks, from partners, or from third-party data providers. The elements may be discrete or derived from analytics. If you only look for name, address, phone, and email, nothing in that definition helps you place the contact in the context of engagement.
Ultimately, isn’t that what the business is asking for when it wants the promised 360-degree view of the customer? Demands for complete, relevant, and timely data are grounded not in the databases, data dictionaries, and integration/transformation processes of your warehouses and applications, but in the story.
So, don’t start with the data. Start with the story you want to tell.
I’ve been presenting research on big data and data governance for the past several months, and I show a slide of a businesswoman doing a backbend to access data on her laptop. The point I make is that data management has to be hyper-flexible to meet a wider range of analytic and consumption demands than ever before. In other words, data management needs to cross-train to deliver cross-fit data.
The challenge is that traditional data management takes a one-size-fits-all approach. Data systems are purpose-built. If organizations want to reuse a finance warehouse for marketing and sales purposes, it often isn’t a match, so a new warehouse is built. If you want to get out of this cycle and go from data couch potato to data athlete, a cross-fit data training program should focus on:
Context first. Understanding how data will be used and how it will provide value drives platform design. Context indicates more than where data is sourced from and where it will be delivered. Context answers: operations or analytics, structured or unstructured, persistent or disposable? These answers guide decisions about performance, scale, sourcing, cost, and governance.
Data governance zones. Command and control data governance creates a culture of “no” that stifles innovation and can cause the business to go around IT for data needs. The solution is to create policies and processes that give permission as well as mitigate risk. Loosen quality and security standards in projects and scenarios that are in contained environments. Tighten rules and create gates when called for by regulation, where there are ethical conflicts, or when data quality or access exposes the business to significant financial risk.
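The zone idea is essentially a small policy: gates apply whenever regulation, ethics, or significant financial exposure is involved, and loosened standards apply to contained environments. Here is a minimal sketch, assuming hypothetical field names. The middle "managed" zone for everything else is also an assumption, because the post only describes the loosened and tightened ends.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    SANDBOX = "loosened quality and security standards"  # contained environments
    MANAGED = "standard controls"                       # assumed default, not named in the post
    GATED = "tightened rules with approval gates"


@dataclass
class DataUseCase:
    contained_environment: bool
    regulated: bool = False
    ethical_conflict: bool = False
    significant_financial_risk: bool = False


def assign_zone(case: DataUseCase) -> Zone:
    # Gates win: regulation, ethical conflicts, or financial exposure override containment
    if case.regulated or case.ethical_conflict or case.significant_financial_risk:
        return Zone.GATED
    # Contained experiments get permission rather than a default "no"
    if case.contained_environment:
        return Zone.SANDBOX
    return Zone.MANAGED
```

Checking the tightening conditions first means a contained project that touches regulated data still lands behind a gate. This reflects the balance the post describes between giving permission and mitigating risk.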
There is a shift underway with master data management (MDM) that can't be ignored. It is no longer good enough to master domains in a silo and think of MDM as an integration tool. First-generation implementations have brought success to companies seeking to manage duplication, establish a master definition, and consolidate data into a data warehouse. All good things. However, as organizations embrace federated environments and put big data architectures into wider use, these built-for-purpose MDM implementations are too narrowly focused and at times as rigid as the traditional data management platforms they support.
Yet it doesn't have to be that way. By nature, MDM is meant to provide flexibility and elasticity in managing both single and multiple master domains. First, MDM has to be redefined from a data integration tool to a data modeling tool. Then MDM can align better with business patterns and information needs, because it is designed around business context.
Enter The Golden Profile
When the business wants to put master data to use, what it wants is a view of a domain. The business doesn't think in terms of records; it thinks about using the data to improve customer relationships, grow the business, improve processes, or any of a host of other business tasks and objectives. A golden profile fits this need by providing a definition and framework that flex to deliver master data based on context. It can do so because it is driven by data relationships.
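One way to picture a golden profile is as a view assembled from a graph of mastered entities and their relationships, shaped by the context in which it is requested. A fixed record layout does not come into it. Here is a minimal sketch under that reading. The classes, domains, and relationship names are hypothetical and do not come from any particular MDM product.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    domain: str  # e.g. "customer", "product", "household"
    attributes: dict = field(default_factory=dict)


@dataclass
class Context:
    # Hypothetical: which attributes and relationships a given business use cares about
    attributes: set
    relations: set


@dataclass
class MasterGraph:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        self.relationships.append((source_id, relation, target_id))

    def golden_profile(self, entity_id: str, context: Context) -> dict:
        """Assemble one entity's view, shaped by the context rather than a fixed record."""
        root = self.entities[entity_id]
        profile = {k: v for k, v in root.attributes.items() if k in context.attributes}
        # Follow only the relationships this context needs and pull in the related data
        for source_id, relation, target_id in self.relationships:
            if source_id == entity_id and relation in context.relations:
                profile.setdefault(relation, []).append(self.entities[target_id].attributes)
        return profile
```

The same mastered customer can yield a marketing profile (segment plus purchase history) or a service profile (contact details plus household). Only the context changes, not the underlying master data.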
I met with a group of clients recently on the evolution of data management and big data. One retailer asked, “Are you seeing the business going to external sources to do Big Data?”
My first reaction was, “NO!” Yet, as I thought about it more and went back to my own roots as an analyst, the answer is most likely, “YES!”
Ignoring nomenclature, the reality is that the business is not only going to external sources for big data, but has been doing so for years. Think about it: organizations that have considered data a strategic tool have invested heavily in big data going back to when mainframes came into vogue. More recently, banking, retail, consumer packaged goods, and logistics have produced marquee case studies on what sophisticated data use can do.
Before Hadoop, before massively parallel processing, where did the business turn? Many had relationships with market research organizations, consultancies, and agencies to get the sophisticated analysis they needed.
Think, too, about the fact that at the beginning of social media, it was PR agencies that developed the first big data analysis and visualization of Twitter, LinkedIn, and Facebook influence. In a past life, I worked at ComScore Networks, an aggregator and market research firm analyzing and trending online behavior. When I joined, it had the largest and fastest-growing private cloud to collect web traffic globally. Now, that was big data.
Today, the data paints a split picture. Across various surveys of IT, social media and online analysis make up a small percentage of the business intelligence and analytics that IT supports. However, when we look at Forrester's marketing and strategy clients, the picture is the complete opposite.