Make data confidence index part of your BI architecture

I often see two ends of the extreme when I talk to clients who are trying to deal with data confidence challenges. One group typically sees it as a problem that IT has to address, while business users continue to use spreadsheets and other home-grown apps for BI. At the other end of the extreme, there's a strong, take-no-prisoners, top-down mandate for using only enterprise BI apps. In this case, a CEO may impose a rule that says that you can't walk into my office, ask me to make a decision, ask for a budget, etc., based on anything other than data coming from an enterprise BI application. This may sound great, but it's not often very practical; the world is not that simple, and there are many shades of grey in between these two extremes. No large, global, heterogeneous, multi-business- and product-line enterprise can ever hope to clean up all of its data - it's always a continuous journey. The key is knowing what data sources feed your BI applications and how confident you are about the accuracy of data coming from each source.

For example, here's one approach that I often see work very well. In this approach, IT assigns a data confidence index (an extra column attached to each transactional record in your data warehouse, data mart, etc.) during ETL processes. It may look something like this:

  • If data is coming from a system of record, the index = 100%.
  • If data is coming from nonfinancial systems and it reconciles with your G/L, the index = 100%. If not, it's < 100%.
  • If the data is coming from an operational application, and ETL does some form of reconciliation with MDM, systems of record, etc., but the ETL process is not responsible for the accuracy of the source transaction itself ($, #s, dates, etc.), the index = 90%.
  • If the data is coming from an operational application via a simple load with no cleansing or reconciliation, the index = 80%.
  • If the data is coming from a nonproduction source such as a spreadsheet, the index =  50%.

These are just some of the examples. You should come up with more data confidence index scenarios that work well in your particular situation (and I'd love to see your examples).

Then all BI applications (reports, dashboards) can display the index column next to each detailed transaction (some clients built this feature into standard report templates). Or, for aggregate reporting, dashboards, etc., the index can be calculated as a sum of the products of multiplying the number of transactions that correspond to each index value by the index and calculating an average. Then that value can be displayed next to each aggregate or in a report header or footer. And when you walk into the office of your CEO with such a report, he will know exactly what he is looking at and feel confident (or not) whether he can make a decision based on the information he sees in front of him, or ask for a confirmation.

What do you think? Do you have a good example of someone using such a data confidence index? What are some of the best practices, dos, and don'ts?

Comments

Well done

Boris- I certainly agree that this is the right way to think about the problem. For some reason, perhaps working since 2am on an industry proposal (:-), it made me think of a market appreciation quotient I recommended to a certain board in IT running into antitrust problems a few years back!

Suppose people can bicker about the confidence index percentages, and well you may find one or two, but I can't envision veterans taking issue with the concept. I suspect it's how most of the minds have been working in stronger BI shops all along, even if they didn't necessarily formalize-- a workbook approach might help--we used to do those for CEOs in SMEs in my consultancy years ago and they were quite popular (not on BI, but management systems). .02-MM (PS- realize Forrester offers playbooks)

drive adoption

Great points. What I also like about this is to drive adoption of trusted sources. Transparency allows choice. If you want to get to the single source of truth for say KPIs, what better way than to provide examples of reports working off non-certified sources vs those that are managed. It also allows consumers of data and insight to manage the expectation of how to use the data. Low confidence indicates insight for directional use. High confidence indicates high trust and less risk in the decision.

Take thresholds out of back office governance and allow transparency at the edge of consumption. Boris, great example of responsible use of data. Thanks for the blog.