For decades, firms have deployed applications and BI on independent databases and warehouses, supporting custom data models, scalability, and performance while speeding delivery. It’s become a nightmare to try to integrate the proliferation of data across these sources in order to deliver the unified view of business data required to support new business applications, analytics, and real-time insights. The explosion of new sources, driven by the triple-threat trends of mobile, social, and the cloud, amplified by partner data, market feeds, and machine-generated data, further aggravates the problem. Poorly integrated business data often leads to poor business decisions, reduces customer satisfaction and competitive advantage, and slows product innovation — ultimately limiting revenue.
Forrester’s latest research reveals how leading firms are coping with this explosion using data virtualization, leading us to release a major new version of our reference architecture, Information Fabric 3.0. Since Forrester invented the category of data virtualization eight years ago with the first version of information fabric, these solutions have continued to evolve. In this update, we reflect new business requirements and new technology options including big data, cloud, mobile, distributed in-memory caching, and dynamic services. Use information fabric 3.0 to inform and guide your data virtualization and integration strategy, especially where you require real-time data sharing, complex business transactions, more self-service access to data, integration of all types of data, and increased support for analytics and predictive analytics.
Information fabric 3.0 reflects significant innovation in data virtualization solutions, including:
Big data gurus have said that data quality isn’t important for big data. Good enough is good enough. However, business stakeholders still complain about poor data quality. In fact, when Forrester surveyed customer intelligence professionals, the ability to integrate data and manage data quality are the top two factors holding customer intelligence back.
So, do big data gurus have it wrong? Sort of . . .
I had the chance to attend and present at a marketing event put on by MITX last week in Boston that focused on data science for marketing and customer experience. I recommend all data and big data professionals do this. Here is why. How marketers and agencies talk about big data and data science is different than how IT talks about it. This isn’t just a language barrier, it’s a philosophy barrier. Let’s look at this closer:
Data is totals. When IT talks about data, it’s talking of the physical elements stored in systems. When marketing talks about data, it’s referring to the totals and calculation outputs from analysis.
Quality is completeness. At the MITX event, Panera Bread was asked, how do they understand customers that pay cash? This lack of data didn’t hinder analysis. Panera looked at customers in their loyalty program and promotions that paid cash to make assumptions about this segment and their behavior. Analytics was the data quality tool that completed the customer picture.
Data rules are algorithms. When rules are applied to data, these are more aligned to segmentation and status that would be input into personalized customer interaction. Data rules are not about transformation to marketers.
Sometimes getting the data quality right is just hard, if not impossible. Even after implementing data quality tools, acquiring third-party data feeds, and implementing data steward remediation processes, often the business is still not satisfied with the quality of the data. Data is still missing and considered old or irrelevant. For example: Insurance companies want access to construction data to improve catastrophe modeling. Food chains need to incorporate drop-off bays and instructions for outlets in shopping malls and plazas to get food supplies to the prep tables. Global companies need to validate address information in developing countries that have incomplete or fast-changing postal directories for logistics. What it takes to complete the data and improve it has now entered the realm of hands-on processes.
Crowdflower says they have the answer to the data challenges listed above. It has a model of combining a crowdsourcing model and data stewardship platform to manage the last mile in data quality. The crowd is a vast network of people around the globe that are notified of data quality tasks through a data stewardship platform. If they can help with the data quality need within the time period requester, the contributor accepts the task and get to work. The crowd can use all resources and channels available to them to complete tasks such as web searches, visits, and phone inquiries. Quality control is performed to validate crowdsourced data and improvements. If an organization has more data quality tasks, machine learning is applied to analyze and optimize crowd sourcing based on the scores and results of contributors.
I had a conversation recently with Brian Lent, founder, chairman, and CTO of Medio. If you don’t know Brian, he has worked with companies such as Google and Amazon to build and hone their algorithms and is currently taking predictive analytics to mobile engagement. The perspective he brings as a data scientist not only has ramifications for big data analytics, but drastically shifts the paradigm for how we architect our master data and ensure quality.
We discussed big data analytics in the context of behavior and engagement. Think shopping carts and search. At the core, analytics is about the “closed loop.” It is, as Brian says, a rinse and repeat cycle. You gain insight for relevant engagement with a customer, you engage, then you take the results of that engagement and put them back into the analysis.
Sounds simple, but think about what that means for data management. Brian provided two principles:
I recently had a client ask about MDM measurement for their customer master. In many cases, the discussions I have about measurement is how to show that MDM has "solved world hunger" for the organization. In fact, a lot of the research and content out there focused on just that. Great to create a business case for investment. Not so good in helping with the daily management of master data and data governance. This client question is more practical, touching upon:
what about the data do you measure?
how do you calculate?
how frequently do you report and show trends?
how do you link the calculation to something the business understands?
I just came back from a Product Information Management (PIM) event this week had had a lot of discussions about how to evaluate vendors and their solutions. I also get a lot of inquiries on vendor selection and while a lot of the questions center around the functionality itself, how to evaluate is also a key point of discussion. What peaked my interest on this subject is that IT and the Business have very different objectives in selecting a solution for MDM, PIM, and data quality. In fact, it can often get contentious when IT and the Business don't agree on the best solution.
General steps to purchase a solution seem pretty consistent: create a short list based on the Forrester Wave and research, conduct an RFI, narrow down to 2-3 vendors for an RFP, make a decision. But, the devil seems to be in the details.
Is a proof of concept required?
How do you make a decision when vendors solutions appear the same? Are they really the same?
How do you put pricing into context? Is lowest really better?
What is required to know before engaging with vendors to identify fit and differentiation?
When does meeting business objectives win out over fit in IT skills and platform consistency?
Joining in on the spirit of all the 2013 predictions, it seems that we shouldn't leave data quality out of the mix. Data quality may not be as sexy as big data has been this past year. The technology is mature and reliable. The concept easy to understand. It is also one of the few areas in data management that has a recognized and adopted framework to measure success. (Read Malcolm Chisholm's blog on data quality dimensions) However, maturity shouldn't create complancency. Data quality still matters, a lot.
Yet, judgement day is here and data quality is at a cross roads. It's maturity in both technology and practice is steeped in an old way of thinking about and managing data. Data quality technology is firmly seated in the world of data warehousing and ETL. While still a significant portion of an enterprise data managment landscape, the adoption and use in business critical applications and processes of in-memory, Hadoop, data virtualization, streams, etc means that more and more data is bypassing the traditional platform.
The options to manage data quality are expanding, but not necessarily in a way that ensures that data can be trusted or complies with data policies. Where data quality tools have provided value is in the ability to have a workbench to centrally monitor, create and manage data quality processes and rules. They created sanity where ETL spaghetti created chaos and uncertainty. Today, this value proposition has diminished as data virtualization, Hadoop processes, and data appliances create and persist new data quality silos. To this, these data quality silos often do not have the monitoring and measurement to govern data. In the end, do we have data quality? Or, are we back where we started from?
I often see two ends of the extreme when I talk to clients who are trying to deal with data confidence challenges. One group typically sees it as a problem that IT has to address, while business users continue to use spreadsheets and other home-grown apps for BI. At the other end of the extreme, there's a strong, take-no-prisoners, top-down mandate for using only enterprise BI apps. In this case, a CEO may impose a rule that says that you can't walk into my office, ask me to make a decision, ask for a budget, etc., based on anything other than data coming from an enterprise BI application. This may sound great, but it's not often very practical; the world is not that simple, and there are many shades of grey in between these two extremes. No large, global, heterogeneous, multi-business- and product-line enterprise can ever hope to clean up all of its data - it's always a continuous journey. The key is knowing what data sources feed your BI applications and how confident you are about the accuracy of data coming from each source.
For example, here's one approach that I often see work very well. In this approach, IT assigns a data confidence index (an extra column attached to each transactional record in your data warehouse, data mart, etc.) during ETL processes. It may look something like this:
If data is coming from a system of record, the index = 100%.
If data is coming from nonfinancial systems and it reconciles with your G/L, the index = 100%. If not, it's < 100%.
The number one reason I hear from IT organizations for why they want to embark on MDM is for consolidation or integration of systems. Then, the first question I get, how do they get buy-in from the business to pay for it?
My first reaction is to cringe because the implication is that MDM is a data integration tool and the value is the matching capabilities. While matching is a significant capability, MDM is not about creating a golden record or a single source of truth.
My next reaction is that IT missed the point that the business wants data to support a system of engagement. The value of MDM is to be able to model and render a domain to fit a system of engagement. Until you understand and align to that, your MDM effort will not support the business and you won’t get the funding. If you somehow do get the funding, you won’t be able to appropriately select the MDM tool that is right for the business need, thus wasting time, money, and resources.
Here is why I am not a fan of the “single source of truth” mantra. A person is not one-dimensional; they can be a parent, a friend, or a colleague, and each has different motivations and requirements depending on the environment. A product is as much about the physical aspect as it is about the pricing, message, and sales channel it is sold through. Or, it is also faceted by the fact that it is put together from various products and parts from partners. In no way is a master entity unique or has a consistency depending on what is important about the entity in a given situation. What MDM provides are definitions and instructions on the right data to use in the right engagement. Context is a key value of MDM.
There was lots of feedback on the last blog (“Risk Data, Risky Business?”) that clearly indicates the divide between definitions in trust and quality. It is a great jumping off point for the next hot topic, data governance for big data.
The comment I hear most from clients, particularly when discussing big data, is, “Data governance inhibits agility.” Why be hindered by committees and bureaucracy when you want freedom to experiment and discover?
Current thinking: Data governance is freedom from risk.The stakes are high when it comes to data-intensive projects, and having the right alignment between IT and the business is crucial. Data governance has been the gold standard to establish the right roles, responsibilities, processes, and procedures to deliver trusted secure data. Success has been achieved through legislative means by enacting policies and procedures that reduce risk to the business from bad data and bad data management project implementation. Data governance was meant to keep bad things from happening.
Today’s data governance approach is important and certainly has a place in the new world of big data. When data enters the inner sanctum of an organization, management needs to be rigorous.
Yet, the challenge is that legislative data governance by nature is focused on risk avoidance. Often this model is still IT led. This holds progress back as the business may be at the table, but it isn’t bought in. This is evidenced by committee and project management style data governance programs focused on ownership, scope, and timelines. All this management and process takes time and stifles experimentation and growth.