Data Quality Reboot Series for Big Data: Part 1, Master Data

What data do you trust? Increasingly, business stakeholders and data scientists trust the information hidden in the bowels of big data. Yet the way that data is mined largely circumvents existing data governance and data architecture, because insight is needed quickly and the work favors data discovery over repeatable reporting.

The key to this challenge is a data quality reboot: rethink what matters, and rethink data governance.

Part 1 of our Data Quality Reboot Series rethinks master data management (MDM) in a big data world.

Current thinking: Master data as a single data entity. A common theme I hear from clients is that master data is about the linked data elements for a single record. No duplication or variation is allowed, which drives consistency and uniqueness. In this view, master data represents a defined, named entity (customer, supplier, product, etc.). This is a very static view of master data, and it does not account for the various dimensions that matter within a particular use case. We typically see this approach tied tightly to an application (customer relationship management, enterprise resource planning) for a particular business unit (marketing, finance, product management, etc.). It may have been the entry point for MDM initiatives, and it allowed for smaller-scope, tangible wins. But it is difficult to extend that master data to other processes, analyses, and distribution points. Master data as a static entity only takes you so far, whether or not big data is part of the discussion.

Reboot: Master data domains beyond entity. Let’s take a look at customer master data. In this context, big data is enticing because it provides an understanding of what drives behavior. The “who” becomes categorical rather than intrinsic. In most cases, this means shifting data quality priority away from master data and toward transactional data and metadata. The data that matters captures the “what,” “why,” and “when.” Master data can be expanded to classify these as behavior, time, and intent domains.

In practice, you move from a two-dimensional model to a multidimensional model of master data. Master data is all about the data model: its relationships and hierarchies, and how data elements are combined. Master data, metadata, and reference data converge under an MDM umbrella, allowing for virtually unlimited combinations determined by categories, definitions, and context. Because big data has moved beyond the constraints of structured, relational databases, MDM must now account for that structure itself and enforce business policies to deliver a trusted, holistic view.
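To make the multidimensional idea more concrete, here is a minimal Python sketch under stated assumptions: the domain names, category values, and the `MODEL`, `validate`, and `view` names are all hypothetical, and each domain is modeled as a one-level hierarchy of governed reference values. Events are checked against the model (a stand-in for enforcing business policy) and can then be counted by any combination of domains, at either the detail or the parent level.

```python
from collections import Counter

# Hypothetical master data model: each domain maps its governed categories to a
# parent category (a one-level hierarchy). All names here are illustrative.
MODEL: dict[str, dict[str, str]] = {
    "customer": {"C1": "retail", "C2": "wholesale"},
    "behavior": {"browse": "engagement", "purchase": "conversion"},
    "time": {"weekday": "calendar", "weekend": "calendar"},
    "intent": {"research": "pre_purchase", "replenish": "repeat"},
}


def validate(event: dict[str, str]) -> dict[str, str]:
    """Reject any event tagged with a domain or value the model does not govern."""
    for domain, value in event.items():
        if value not in MODEL.get(domain, {}):
            raise ValueError(f"{value!r} is not a governed value in domain {domain!r}")
    return event


def view(events: list[dict[str, str]], *domains: str, rollup: bool = False) -> Counter:
    """Count events by any combination of domains, optionally at the parent level.

    Assumes every event carries a value for each requested domain.
    """
    counts: Counter = Counter()
    for event in events:
        key = tuple(MODEL[d][event[d]] if rollup else event[d] for d in domains)
        counts[key] += 1
    return counts


# Usage: the same validated events support different views.
events = [
    validate({"customer": "C1", "behavior": "browse", "time": "weekend", "intent": "research"}),
    validate({"customer": "C1", "behavior": "purchase", "time": "weekday", "intent": "replenish"}),
    validate({"customer": "C2", "behavior": "purchase", "time": "weekday", "intent": "replenish"}),
]
print(view(events, "behavior", "time"))
print(view(events, "customer", rollup=True))
```

In this sketch the customer is just one domain among several, so the same governed events can answer a “who” question or a “what” and “when” question without building a separate master record for each use case.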

So, prioritize the master data model over uniqueness and the single data entity.

Keep following. More to come . . .

Reboot: Persistence vs. disposable.

Reboot: Data quality and acceptable risk.

Reboot: Business rules to the edges.

Reboot: Guardrails vs. controls.