I often see two extremes when I talk to clients who are trying to deal with data confidence challenges. One group sees it as a problem for IT to address, while business users continue to use spreadsheets and other homegrown apps for BI. At the other extreme, there's a strong, take-no-prisoners, top-down mandate to use only enterprise BI apps. In this case, a CEO may impose a rule along the lines of: you can't walk into my office and ask me to make a decision, approve a budget, etc., based on anything other than data coming from an enterprise BI application. This may sound great, but it's rarely practical; the world is not that simple, and there are many shades of grey between these two extremes. No large, global, heterogeneous, multi-business- and product-line enterprise can ever hope to clean up all of its data; it's always a continuous journey. The key is knowing what data sources feed your BI applications and how confident you are in the accuracy of the data coming from each source.

For example, here's one approach that I often see work very well: during the ETL process, IT assigns a data confidence index (an extra column attached to each transactional record in your data warehouse, data mart, etc.). It may look something like this, with a code sketch after the list:

  • If data is coming from a system of record, the index = 100%.
  • If data is coming from a nonfinancial system and it reconciles with your G/L, the index = 100%; if not, the index is less than 100%.
  • If the data is coming from an operational application, and ETL does some form of reconciliation with MDM, systems of record, etc., but the ETL process is not responsible for the accuracy of the source transaction itself (amounts, counts, dates, etc.), the index = 90%.
  • If the data is coming from an operational application via a simple load with no cleansing or reconciliation, the index = 80%.
  • If the data is coming from a nonproduction source such as a spreadsheet, the index = 50%.
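
Here's a minimal sketch, in Python, of what that index assignment might look like inside an ETL step. The percentages mirror the list above; the source-type names, the record fields (source_type, reconciles_with_gl), and the sub-100% value for unreconciled nonfinancial data are hypothetical placeholders, not a prescription:

```python
# A sketch of assigning a data confidence index during ETL.
# CONFIDENCE_BY_SOURCE mirrors the scenarios in the list above;
# the source-type names and record fields are hypothetical.

CONFIDENCE_BY_SOURCE = {
    "system_of_record": 100,
    "operational_reconciled": 90,   # ETL reconciles with MDM, systems of record, etc.
    "operational_simple_load": 80,  # straight load, no cleansing or reconciliation
    "spreadsheet": 50,              # nonproduction source
}

def confidence_index(record: dict) -> int:
    """Return the data confidence index (0-100) for one transactional record."""
    source = record["source_type"]
    if source == "nonfinancial":
        # 100% only if the record reconciles with the G/L; the 80% fallback
        # is an assumption -- the list above only says "less than 100%".
        return 100 if record.get("reconciles_with_gl") else 80
    return CONFIDENCE_BY_SOURCE.get(source, 50)

# During the load, attach the index as an extra column on the record:
record = {"source_type": "operational_simple_load", "amount": 1250.00}
record["confidence_index"] = confidence_index(record)
print(record["confidence_index"])  # -> 80
```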

These are just a few examples. You should come up with more data confidence index scenarios that work well in your particular situation (and I'd love to see your examples).

Then all BI applications (reports, dashboards) can display the index column next to each detailed transaction (some clients have built this feature into their standard report templates). For aggregate reporting, dashboards, etc., the index can be rolled up as a weighted average: multiply each index value by the number of transactions carrying that value, sum the products, and divide by the total number of transactions (see the sketch below). That value can then be displayed next to each aggregate, or in a report header or footer. And when you walk into your CEO's office with such a report, they will know exactly what they are looking at and can judge with confidence whether to make a decision based on the information in front of them or to ask for confirmation first.
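
Here's a minimal sketch of that rollup, under the same hypothetical record layout as above. Since every transaction carries its own index column, the weighted average reduces to a plain average over the detail rows behind the aggregate:

```python
# Weighted-average confidence index for an aggregate report.
# Each transaction already carries its confidence_index column,
# so averaging over the detail rows is equivalent to weighting
# each index value by its transaction count.

def aggregate_confidence(transactions: list[dict]) -> float:
    """Average confidence index across the transactions behind an aggregate."""
    if not transactions:
        return 0.0
    return sum(t["confidence_index"] for t in transactions) / len(transactions)

# Example: 700 transactions at 100%, 200 at 90%, 100 at 50%
txns = ([{"confidence_index": 100}] * 700
        + [{"confidence_index": 90}] * 200
        + [{"confidence_index": 50}] * 100)
print(f"Report confidence: {aggregate_confidence(txns):.1f}%")  # -> 93.0%
```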

What do you think? Do you have a good example of someone using such a data confidence index? What are some of the best practices, dos, and don'ts?