Sometimes getting data quality right is just hard, if not impossible. Even after deploying data quality tools, acquiring third-party data feeds, and implementing data steward remediation processes, the business is often still not satisfied: data is still missing, outdated, or irrelevant. For example: insurance companies want access to construction data to improve catastrophe modeling. Food chains need to incorporate drop-off bays and instructions for outlets in shopping malls and plazas to get food supplies to the prep tables. Global companies need to validate address information in developing countries that have incomplete or fast-changing postal directories for logistics. Completing and improving that data has now entered the realm of hands-on processes.
CrowdFlower says it has the answer to the data challenges listed above. It combines crowdsourcing with a data stewardship platform to manage the last mile in data quality. The crowd is a vast network of people around the globe who are notified of data quality tasks through the platform. If a contributor can help with the data quality need within the requested time period, the contributor accepts the task and gets to work. The crowd can use all resources and channels available to complete tasks, such as web searches, visits, and phone inquiries. Quality control is performed to validate crowdsourced data and improvements. As an organization submits more data quality tasks, machine learning is applied to analyze and optimize the crowdsourcing based on the scores and results of contributors.
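To make that workflow concrete, here is a minimal Python sketch of the crowd model described above: several contributors answer the same task, a trust-weighted vote resolves disagreements, and each contributor's quality score is nudged up or down based on agreement with the consensus. The class names, the 0.2 update rate, and the scoring rule are illustrative assumptions, not CrowdFlower's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    name: str
    trust: float = 0.5  # running quality score in [0, 1]

def resolve_task(answers, contributors):
    """Pick the trust-weighted consensus answer for one task, then nudge
    each contributor's trust toward 1 (agreed) or 0 (disagreed).

    answers: {contributor_name: answer}; contributors: {name: Contributor}
    """
    # Weight each distinct answer by the trust of those who gave it.
    weights = {}
    for name, answer in answers.items():
        weights[answer] = weights.get(answer, 0.0) + contributors[name].trust
    consensus = max(weights, key=weights.get)
    # Update contributor scores from agreement with the consensus.
    for name, answer in answers.items():
        c = contributors[name]
        target = 1.0 if answer == consensus else 0.0
        c.trust += 0.2 * (target - c.trust)  # illustrative learning rate
    return consensus
```

For example, if two trusted contributors report "123 Main St" for an address-verification task and a third reports a variant, the weighted vote picks the former and the outlier's score drops slightly, so future tasks route away from weaker contributors.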
I had a conversation recently with Brian Lent, founder, chairman, and CTO of Medio. If you don’t know Brian, he has worked with companies such as Google and Amazon to build and hone their algorithms and is currently taking predictive analytics to mobile engagement. The perspective he brings as a data scientist not only has ramifications for big data analytics, but also drastically shifts the paradigm for how we architect our master data and ensure quality.
We discussed big data analytics in the context of behavior and engagement. Think shopping carts and search. At the core, analytics is about the “closed loop.” It is, as Brian says, a rinse and repeat cycle. You gain insight for relevant engagement with a customer, you engage, then you take the results of that engagement and put them back into the analysis.
Sounds simple, but think about what that means for data management. Brian provided two principles:
It is easy to get caught up in the source and target paradigm when implementing master data management. The logical model looms large to identify where master data resides for linkage and makes the project -- well -- logical.
If the first step in your customer MDM endeavor is creating a master data definition based on identifying relevant data elements, STOP!
The first step is to articulate the story that customer MDM will support. This is the customer MDM blueprint.
For example, if the driving business strategy is to create a winning customer experience, customer MDM puts the customer definition at the center of what the customer experience looks like. The customer experience is the story. You need to understand and have data points for elements such as preferences, sentiment, lifestyle, and friends/relationships. These elements may be available within your CRM system, in social networks, with partners, and from third-party data providers. The elements may be discrete or derived from analytics. If you only look for name, address, phone, and email, nothing about that definition helps determine how to place the contact into the context of engagement.
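As a sketch of the difference, here is what a story-driven customer definition might look like next to the usual contact fields. The field names and the readiness check are assumptions for illustration, not a standard MDM schema:

```python
from dataclasses import dataclass, field

@dataclass
class CustomerMaster:
    customer_id: str
    # Traditional identifying attributes
    name: str
    email: str
    # Story-driven attributes, discrete or derived from analytics
    preferences: dict = field(default_factory=dict)    # e.g. sourced from CRM
    sentiment: float = 0.0                             # e.g. derived from social data
    lifestyle_segments: list = field(default_factory=list)
    relationships: list = field(default_factory=list)  # friends/household links

def engagement_ready(c):
    """A record supports engagement only when the contextual attributes
    are populated, not merely the contact attributes."""
    return bool(c.preferences) and bool(c.lifestyle_segments)
```

The point of the sketch: a record that passes name/address/email validation can still fail the "story" test, because the attributes that drive engagement live outside the traditional contact block.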
Ultimately, isn’t that what the business is asking for when they want the promised 360-degree view of the customer? Demands for complete, relevant, and timely data are not grounded in the databases, data dictionaries, and integration/transformation processes of your warehouses and applications; they are grounded in the story.
So, don’t start with the data. Start with the story you want to tell.
I’ve been presenting research on big data and data governance for the past several months where I show a slide of a businesswoman doing a backbend to access data on her laptop. The point I make is that data management has to be hyper-flexible to meet a wider range of analytic and consumption demands than ever before. Translated: you need to cross-train your data management to have cross-fit data.
The challenge is that traditional data management takes a one-size-fits-all approach. Data systems are purpose-built. If an organization wants to reuse a finance warehouse for marketing and sales purposes, it often isn’t a match, and a new warehouse is built. If you want to get out of this cycle and go from data couch potato to data athlete, a cross-fit data training program should focus on:
Context first. Understanding how data is used and will provide value drives platform design. Context indicates more than where data is sourced from and where it will be delivered. Context answers: operations or analytics, structured or unstructured, persistent or disposable? These guide decisions around performance, scale, sourcing, cost, and governance.
Data governance zones. Command and control data governance creates a culture of “no” that stifles innovation and can cause the business to go around IT for data needs. The solution is to create policies and processes that give permission as well as mitigate risk. Loosen quality and security standards in projects and scenarios that are in contained environments. Tighten rules and create gates when called for by regulation, where there are ethical conflicts, or when data quality or access exposes the business to significant financial risk.
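One way to picture governance zones is as policy applied per zone rather than one global standard: loose gates where work is contained, hard gates where regulation or financial risk demands them. A minimal sketch, with made-up zone names and gates:

```python
# Illustrative zone definitions; the names and gate choices are assumptions,
# not a prescribed governance framework.
ZONES = {
    "sandbox":   {"quality_gate": False, "security_review": False},  # contained exploration
    "shared":    {"quality_gate": True,  "security_review": False},  # cross-team reuse
    "regulated": {"quality_gate": True,  "security_review": True},   # regulation / financial risk
}

def checks_required(zone):
    """Return the governance gates a dataset must pass in a given zone."""
    policy = ZONES[zone]
    return [gate for gate, required in policy.items() if required]
```

Encoding the zones explicitly is what turns governance from a culture of "no" into a culture of "yes, under these conditions": a sandbox request passes with no gates, while the same data promoted to a regulated zone picks up quality and security reviews automatically.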