There was lots of feedback on the last blog (“Risk Data, Risky Business?”) that clearly indicates the divide between definitions in trust and quality. It is a great jumping off point for the next hot topic, data governance for big data.
The comment I hear most from clients, particularly when discussing big data, is, “Data governance inhibits agility.” Why be hindered by committees and bureaucracy when you want freedom to experiment and discover?
Current thinking: Data governance is freedom from risk.The stakes are high when it comes to data-intensive projects, and having the right alignment between IT and the business is crucial. Data governance has been the gold standard to establish the right roles, responsibilities, processes, and procedures to deliver trusted secure data. Success has been achieved through legislative means by enacting policies and procedures that reduce risk to the business from bad data and bad data management project implementation. Data governance was meant to keep bad things from happening.
Today’s data governance approach is important and certainly has a place in the new world of big data. When data enters the inner sanctum of an organization, management needs to be rigorous.
Yet, the challenge is that legislative data governance by nature is focused on risk avoidance. Often this model is still IT led. This holds progress back as the business may be at the table, but it isn’t bought in. This is evidenced by committee and project management style data governance programs focused on ownership, scope, and timelines. All this management and process takes time and stifles experimentation and growth.
So, this blog is dedicated to stepping outside the comfort zone once again and into the world of chaos. Not only may you not want to persist in your data quality transformations, but you may not want to cleanse the data.
Current thinking: Purge poor data from your environment. Put the word “risk” in the same sentence as data quality and watch the hackles go up on data quality professionals. It is like using salt in your coffee instead of sugar. However, the biggest challenge I see many data quality professionals face is getting lost in all the data due to the fact that they need to remove risk to the business caused by bad data. In the world of big data, clearly you are not going to be able to cleanse all that data. A best practice is to identify critical data elements that have the most impact on the business and focus efforts there. Problem solved.
Not so fast. Even scoping the data quality effort may not be the right way to go. The time and effort it takes as well as the accessibility of the data may not meet business needs to get information quickly. The business has decided to take the risk, focusing on direction rather than precision.
We last spoke about how to reboot our thinking on master data to provide a more flexible and useful structure when working with big data. In the structured data world, having a model to work from provides comfort. However, there is an element of comfort and control that has to be given up with big data, and that is our definition and the underlying premise for data quality.
Current thinking: Persistence of cleansed data.For years data quality efforts have focused on finding and correcting bad data. We used the word “cleansing” to represent the removal of what we didn’t want, exterminating it like it was an infestation of bugs or rats. Knowing what your data is, what it should look like, and how to transform it into submission defined the data quality handbook. Whole practices were stood up to track data quality issues, establish workflows and teams to clean the data, and then reports were produced to show what was done. Accomplishment was the progress and maintenance of the number of duplicates, complete records, last update, conformance to standards, etc. Our reports may also be tied to our personal goals. Now comes big data — how do we cleanse and tame that beast?
Reboot: Disposability of data quality transformation. The answer to the above question is, maybe you don’t. The nature of big data doesn’t allow itself to traditional data quality practices. The volume may be too large for processing. The volatility and velocity of data change too frequently to manage. The variety of data, both in scale and visibility, is ambiguous.
What data do you trust? Increasingly, business stakeholders and data scientists trust the information hidden in the bowels of big data. Yet, how data is mined mostly circumvents existing data governance and data architecture due to speed of insight required and support data discovery over repeatable reporting.
The key to this challenge is a data quality reboot: rethink what matters, and rethink data governance.
Part 1 of our Data Quality Reboot Series is to rethink master data management (MDM) in a big data world.
Current thinking: Master data as a single data entity. A common theme I hear from clients is that master data is about the linked data elements for a single record. No duplication or variation exists to drive consistency and uniqueness. Master data in the current thinking represents a defined, named entity (customer, supplier, product, etc.). This is a very static view of master data and does not account for the various dimensions required for what is important within a particular use case. We typically see this approach tied tightly to an application (customer resource management, enterprise resource management) for a particular business unit (marketing, finance, product management, etc.). It may have been the entry point for MDM initiatives, and it allowed for smaller scope tangible wins. But, it is difficult to expand that master data to other processes, analysis, and distribution points. Master data as a static entity only takes you so far, regardless of whether big data is incorporated into the discussion or not.
Customer service leaders know that a good customer experience has a quantifiable impact on revenue, as measured by increased rates of repurchase, increased recommendations, and decreased willingness to defect from a brand. They also conceptually understand that clean data is important, but many can’t make the connection between how master data management and data quality investments directly improve customer service metrics. This means that IT initiates data projects more than two-thirds of the time, while data projects that directly affect customer service processes rarely get funded.
What needs to happen is that customer service leaders have to partner with data management pros — often working within IT — to reframe the conversation. Historically, IT organizations would attempt to drive technology investments with the ambiguous goal of “cleaning dirty customer data” within CRM, customer service, and other applications. Instead of this approach, this team must articulate the impact that poor-quality data has on critical business and customer-facing processes.
To do this, start by taking an inventory of the quality of data that is currently available:
Chart the customer service processes that are followed by customer service agents. 80% of customer calls can be attributed to 20% of the issues handled.
Understand what customer, product, order, and past customer interaction data are needed to support these processes.