Metrics, Metrics, Metrics – What Do Benchmarks Tell Us Anyway?

One of the most common inquiries Forrester receives is a request for quality benchmarks. Testing groups are struggling to figure out how well they’re doing and whether the processes they’re fighting for are making a difference. QA value can be hard to define and to prove if development and project teams don’t regularly collect metrics to measure productivity or cost improvements. Because of this, QA managers often get the short end of the stick and are left to justify their existence. A familiar story to any QA manager (and I was one once myself) is the one where, during a planning meeting, an IT executive turns to him or her and says, “You guys sure are expensive and yet we still have bugs. Why should I invest any more?” That sends the manager scurrying to find the best way to measure the team’s effectiveness. Without a baseline or a ready understanding of current measures, one of the most visible ways is to compare your organization to industry benchmarks. Typical requests include:

  • Defect density
  • Average cost of defects
  • Average cost to repair defects
  • Defect removal efficiency
  • Defect detection by phase
  • Defect origin

There are a number of studies out there that provide “benchmark” data: the average number of defects in a project (without telling us the scope of the projects surveyed), the average defect density, and the average number of defects by phase. All of this data is interesting, and it provides good bullets on a slide when communicating the importance of testing and quality assurance practices, but what it doesn’t tell us is just as important, if not more so.

Industry benchmarks look across a large number of projects, but can they be compared to your projects? Many benchmarks, especially those published by Capers Jones, provide great information, but they are based on function points. Agreed, function points provide a very measurable way to determine the scope and size of your applications, but how many organizations have standardized on function points to benchmark their applications? Industry benchmarks should not be used for direct comparison, but as a guide: a framework for deciding how you should measure your own effectiveness.

Truly measuring effectiveness is not an overnight process, and you must measure what matters to your organization. Are fewer defects being released? Are your customers or stakeholders happy with the quality? Are you cutting down on your rework? Are you finding defects earlier in the cycle? Those are the measures that should build your baseline. Then you can start applying context by aligning costs to defect detection and repair. Benchmarking against the industry is helpful, but you have to be realistic that it’s not an apples-to-apples comparison. Benchmarking against yourself is the most effective way.
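For readers who want to try benchmarking against themselves, here is a minimal Python sketch of what a baseline comparison might look like. It is an illustration only: the Release record, its field names, and the sample numbers are all hypothetical, and the two measures shown (defect removal efficiency as the share of all known defects caught before release, and escaped defects per unit of size) are common conventions rather than anything prescribed here. Swap in whatever size unit and measures your organization actually tracks.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical per-release record; field names are illustrative only."""
    name: str
    found_before_release: int  # defects caught by reviews and testing
    found_after_release: int   # defects reported by customers after release
    size: float                # any consistent size unit (function points, KLOC, stories)


def defect_removal_efficiency(release: Release) -> float:
    """Share of all known defects that were removed before release."""
    total = release.found_before_release + release.found_after_release
    return release.found_before_release / total if total else 0.0


def escaped_defect_density(release: Release) -> float:
    """Defects that reached users, per unit of size."""
    return release.found_after_release / release.size


def compare_to_baseline(baseline: Release, current: Release) -> dict:
    """Change in each measure relative to your own baseline release."""
    return {
        "dre_change": defect_removal_efficiency(current)
        - defect_removal_efficiency(baseline),
        "escaped_density_change": escaped_defect_density(current)
        - escaped_defect_density(baseline),
    }


# Made-up numbers, purely to show the mechanics.
baseline = Release("baseline", found_before_release=80, found_after_release=20, size=50.0)
current = Release("current", found_before_release=90, found_after_release=10, size=50.0)

changes = compare_to_baseline(baseline, current)
for measure, delta in changes.items():
    print(f"{measure}: {delta:+.2f}")
```

A rising defect removal efficiency together with a falling escaped defect density would suggest that fewer defects are being released and that more are being found earlier, which are two of the questions above. Cost and customer satisfaction would need their own measures.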

Comments

re: Metrics, Metrics, Metrics – What Do Benchmarks Tell Us Anyw

Margo,

It seems to me that you mix up the intrinsic quality of a piece of software and the performance of a project. They are two different areas of research, I would say. The most difficult one, in my opinion, is to assess and benchmark the quality of a given piece of software, as quality is a very subjective matter.

I've tried to come up with my own model which works for our organization. It's far from being perfect but it does the job for us.

http://www.fredberinger.com/software-quality-metrics-and-model/

Cheers,

re: Metrics, Metrics, Metrics – What Do Benchmarks Tell Us Anyw

Margo, some metrics are important, like how much money I am making versus spending. In terms of software quality, one can clearly identify defects, measure them at the various phases, and look at how to improve the ratio of effort to bugs. In principle this is irrelevant, like so much other data being gathered and compared.

Looking at it from the ultimate goal of subjective quality, you don't want to be coding at all; then you won't have any SW bugs. Most errors, and the worst ones, are analysis/design errors. Our users want a solution that they can adapt to their needs without analysis and coding. I would rather have software that is buggy but does what I need than software that wastes my time perfectly!

The expectation that software ought to be error-free because neither production nor developer engineers want to be talking to users has really hurt the innovation cycle in businesses. On the Internet we do accept that things don't work and we try again. Look at all the great stuff that comes out of that much more dynamic model!

It is a bit like looking at the quality of spelling and grammar when judging literature, rather than at how much the story grips my emotions. Yes, I would rather have fewer faulty products, but a lot of people drive fairly buggy cars either for the fun of it (let's say a Ferrari or a classic) or because they can't afford better ones. And what is wrong with that?