An IT mindset has dominated the way organizations view and manage their data. Even as issues of quality and consistency rear their ugly heads, the solution has often been to turn to the tool and approach data governance in a project-oriented manner. Sustainability has been a challenge, often relegated to IT managing and updating data management tools (MDM, data quality, metadata management, information lifecycle management, and security). Forrester research has shown that fewer than 15% of organizations have business-led data governance that is linked to business initiatives, objectives, and outcomes. But this is changing. More and more organizations are looking toward data governance as a strategic enterprise competence as they adopt a data-driven culture.
This shift from project to strategic program requires more than basic workflow, collaboration, and data profiling capabilities to institutionalize data governance policies and rules. The conversation can't start with the data management technology (MDM, data quality, information lifecycle management, security, and metadata management) that will apply the policies and rules. It has to begin with what the organization is trying to achieve with its data; this is a strategy discussion and process. The implication: governing data requires a rethink of your operating model. New roles, responsibilities, and processes emerge.
The last Forrester Wave for MDM was released in 2008 and focused on the Customer Hub. Well, things have certainly changed since then. Organizations need enterprise scale to break down data silos. Data governance is quickly becoming part of an organization's operating model. And don't forget the elephant in the room: Big Data.
From 2008 to now there have been multiple analyst firm evaluations of MDM vendors. Vendors come, go, or are acquired. But the leaders are almost always the same. We also see inquiries and implementations tracking to the leaders. Our market overview report helped to identify the distinct segments of MDM vendors and found that MDM leaders were going big, leveraging a strategic perspective of data management, offering a suite of products, and pushing to support and create modern data management environments. What still needed to be addressed: how do you decide between these vendors?
The Forrester Wave for the Multi-Platform MDM market segment gets to the heart of this question by pushing top vendors to differentiate among themselves and evaluating them at the highest levels of MDM strategy. We learned things that surprised us, and we saw where the line was drawn between marketing messaging and positioning and real capabilities. We did this by running the Wave process the way our clients would evaluate vendors, rigorously questioning and fact-checking responses and demos.
For decades, firms have deployed applications and BI on independent databases and warehouses, supporting custom data models, scalability, and performance while speeding delivery. It’s become a nightmare to try to integrate the proliferation of data across these sources in order to deliver the unified view of business data required to support new business applications, analytics, and real-time insights. The explosion of new sources, driven by the triple-threat trends of mobile, social, and the cloud, amplified by partner data, market feeds, and machine-generated data, further aggravates the problem. Poorly integrated business data often leads to poor business decisions, reduces customer satisfaction and competitive advantage, and slows product innovation — ultimately limiting revenue.
Forrester’s latest research reveals how leading firms are coping with this explosion using data virtualization, leading us to release a major new version of our reference architecture, Information Fabric 3.0. Since Forrester invented the category of data virtualization eight years ago with the first version of information fabric, these solutions have continued to evolve. In this update, we reflect new business requirements and new technology options including big data, cloud, mobile, distributed in-memory caching, and dynamic services. Use information fabric 3.0 to inform and guide your data virtualization and integration strategy, especially where you require real-time data sharing, complex business transactions, more self-service access to data, integration of all types of data, and increased support for analytics and predictive analytics.
Information fabric 3.0 reflects significant innovation in data virtualization solutions, including:
Big data gurus have said that data quality isn’t important for big data. Good enough is good enough. However, business stakeholders still complain about poor data quality. In fact, when Forrester surveyed customer intelligence professionals, integrating data and managing data quality were the top two factors holding customer intelligence back.
So, do big data gurus have it wrong? Sort of . . .
I had the chance to attend and present at a marketing event put on by MITX last week in Boston that focused on data science for marketing and customer experience. I recommend that all data and big data professionals do this. Here is why. How marketers and agencies talk about big data and data science is different from how IT talks about it. This isn’t just a language barrier, it’s a philosophy barrier. Let’s look at this more closely:
Data is totals. When IT talks about data, it’s talking about the physical elements stored in systems. When marketing talks about data, it’s referring to the totals and calculation outputs from analysis.
Quality is completeness. At the MITX event, Panera Bread was asked how it understands customers who pay cash. This lack of data didn’t hinder analysis. Panera looked at customers in its loyalty program and promotions who paid cash to make assumptions about this segment and its behavior. Analytics was the data quality tool that completed the customer picture.
Data rules are algorithms. When rules are applied to data, they are more aligned to the segmentation and status that would be input into personalized customer interaction. To marketers, data rules are not about transformation.
Sometimes getting the data quality right is just hard, if not impossible. Even after implementing data quality tools, acquiring third-party data feeds, and implementing data steward remediation processes, often the business is still not satisfied with the quality of the data. Data is still missing and considered old or irrelevant. For example: Insurance companies want access to construction data to improve catastrophe modeling. Food chains need to incorporate drop-off bays and instructions for outlets in shopping malls and plazas to get food supplies to the prep tables. Global companies need to validate address information in developing countries that have incomplete or fast-changing postal directories for logistics. What it takes to complete the data and improve it has now entered the realm of hands-on processes.
Crowdflower says it has the answer to the data challenges listed above. It combines a crowdsourcing model with a data stewardship platform to manage the last mile in data quality. The crowd is a vast network of people around the globe who are notified of data quality tasks through a data stewardship platform. If a contributor can help with the data quality need within the time period requested, the contributor accepts the task and gets to work. The crowd can use all resources and channels available to them to complete tasks, such as web searches, visits, and phone inquiries. Quality control is performed to validate crowdsourced data and improvements. If an organization has more data quality tasks, machine learning is applied to analyze and optimize crowdsourcing based on the scores and results of contributors.
I had a conversation recently with Brian Lent, founder, chairman, and CTO of Medio. If you don’t know Brian, he has worked with companies such as Google and Amazon to build and hone their algorithms and is currently taking predictive analytics to mobile engagement. The perspective he brings as a data scientist not only has ramifications for big data analytics, but drastically shifts the paradigm for how we architect our master data and ensure quality.
We discussed big data analytics in the context of behavior and engagement. Think shopping carts and search. At the core, analytics is about the “closed loop.” It is, as Brian says, a rinse and repeat cycle. You gain insight for relevant engagement with a customer, you engage, then you take the results of that engagement and put them back into the analysis.
Sounds simple, but think about what that means for data management. Brian provided two principles:
I recently had a client ask about MDM measurement for their customer master. In many cases, the discussions I have about measurement are about how to show that MDM has "solved world hunger" for the organization. In fact, a lot of the research and content out there is focused on just that. That is great for creating a business case for investment. It is not so good at helping with the daily management of master data and data governance. This client's question is more practical, touching upon:
what about the data do you measure?
how do you calculate?
how frequently do you report and show trends?
how do you link the calculation to something the business understands?
I just came back from a Product Information Management (PIM) event this week and had a lot of discussions about how to evaluate vendors and their solutions. I also get a lot of inquiries on vendor selection, and while a lot of the questions center on the functionality itself, how to evaluate is also a key point of discussion. What piqued my interest on this subject is that IT and the business have very different objectives in selecting a solution for MDM, PIM, and data quality. In fact, it can often get contentious when IT and the business don't agree on the best solution.
General steps to purchase a solution seem pretty consistent: create a short list based on the Forrester Wave and research, conduct an RFI, narrow down to 2-3 vendors for an RFP, make a decision. But, the devil seems to be in the details.
Is a proof of concept required?
How do you make a decision when vendors' solutions appear the same? Are they really the same?
How do you put pricing into context? Is lowest really better?
What is required to know before engaging with vendors to identify fit and differentiation?
When does meeting business objectives win out over fit in IT skills and platform consistency?
Joining in on the spirit of all the 2013 predictions, it seems that we shouldn't leave data quality out of the mix. Data quality may not be as sexy as big data has been this past year. The technology is mature and reliable. The concept is easy to understand. It is also one of the few areas in data management that has a recognized and adopted framework to measure success. (Read Malcolm Chisholm's blog on data quality dimensions.) However, maturity shouldn't create complacency. Data quality still matters, a lot.
Yet judgment day is here and data quality is at a crossroads. Its maturity in both technology and practice is steeped in an old way of thinking about and managing data. Data quality technology is firmly seated in the world of data warehousing and ETL. While that world is still a significant portion of the enterprise data management landscape, the adoption and use of in-memory, Hadoop, data virtualization, streams, etc. in business-critical applications and processes means that more and more data is bypassing the traditional platform.
The options to manage data quality are expanding, but not necessarily in a way that ensures that data can be trusted or complies with data policies. Where data quality tools have provided value is in the ability to have a workbench to centrally monitor, create, and manage data quality processes and rules. They created sanity where ETL spaghetti created chaos and uncertainty. Today, this value proposition has diminished as data virtualization, Hadoop processes, and data appliances create and persist new data quality silos. On top of this, these data quality silos often do not have the monitoring and measurement needed to govern data. In the end, do we have data quality? Or are we back where we started?
I often see two extremes when I talk to clients who are trying to deal with data confidence challenges. One group typically sees it as a problem that IT has to address, while business users continue to use spreadsheets and other home-grown apps for BI. At the other extreme, there's a strong, take-no-prisoners, top-down mandate for using only enterprise BI apps. In this case, a CEO may impose a rule that says that you can't walk into my office, ask me to make a decision, ask for a budget, etc., based on anything other than data coming from an enterprise BI application. This may sound great, but it's often not very practical; the world is not that simple, and there are many shades of grey in between these two extremes. No large, global, heterogeneous, multi-business- and product-line enterprise can ever hope to clean up all of its data; it's always a continuous journey. The key is knowing what data sources feed your BI applications and how confident you are about the accuracy of data coming from each source.
For example, here's one approach that I often see work very well. In this approach, IT assigns a data confidence index (an extra column attached to each transactional record in your data warehouse, data mart, etc.) during ETL processes. It may look something like this:
If data is coming from a system of record, the index = 100%.
If data is coming from nonfinancial systems and it reconciles with your G/L, the index = 100%. If not, it's < 100%.
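To make the two rules above concrete, here is a minimal Python sketch of how an ETL step might attach the index as an extra column. It is an illustration only, not a reference implementation: the field names (from_system_of_record, reconciles_with_gl), the function names, and the sample rows are all hypothetical, and the 50.0 score for unreconciled data is a placeholder, since the rule only says the index is below 100%.

```python
from typing import Dict, List

# Hypothetical placeholder: the rule only says "< 100%" for data that
# does not reconcile with the G/L, so the actual value is an assumption.
UNRECONCILED_SCORE = 50.0


def confidence_index(from_system_of_record: bool,
                     reconciles_with_gl: bool,
                     unreconciled_score: float = UNRECONCILED_SCORE) -> float:
    """Return a data confidence percentage for one record."""
    # Rule 1: data from a system of record is fully trusted.
    if from_system_of_record:
        return 100.0
    # Rule 2: other data is fully trusted only if it reconciles with the G/L.
    if reconciles_with_gl:
        return 100.0
    return unreconciled_score


def tag_rows(rows: List[Dict]) -> List[Dict]:
    """Return copies of the rows with an extra 'confidence_index' column."""
    tagged = []
    for row in rows:
        out = dict(row)  # copy so the source rows are left untouched
        out["confidence_index"] = confidence_index(
            row["from_system_of_record"], row["reconciles_with_gl"])
        tagged.append(out)
    return tagged


# Hypothetical sample rows standing in for records arriving during ETL.
rows = [
    {"id": 1, "source": "erp", "from_system_of_record": True,
     "reconciles_with_gl": True},
    {"id": 2, "source": "crm", "from_system_of_record": False,
     "reconciles_with_gl": True},
    {"id": 3, "source": "spreadsheet", "from_system_of_record": False,
     "reconciles_with_gl": False},
]

tagged = tag_rows(rows)
for r in tagged:
    print(r["id"], r["source"], r["confidence_index"])
```

With the index stored on each record, a BI report could filter or weight rows by confidence, or show the lowest index among its inputs alongside the result.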