Hopefully this isn't too subjective. I am curious to know what measures to use to assess the quality of a given software release. The measures would need to be applied to multiple different projects within the organisation, so they have to be reasonably generic.

Ideally I would like to use the measures to help move release deadlines when necessary, as well as to push back on developers when I feel not enough unit testing is being done.

So far I have thought of the following that could be potentially measured:

Defect rate per hour of test time.

Defects per hour of dev time.

% of defects that are coding errors.

% of functionality with no outstanding defects (assuming the testing cycle is complete).
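
For concreteness, here is a rough sketch of how those four measures might be computed from exported tracker and timesheet data. The function and all of its field names are hypothetical - the point is just that each measure needs a defect list plus the matching hours or functional-area list.

```python
# Hypothetical helper: all field names are made up for illustration.
def release_measures(defects, test_hours, dev_hours, functional_areas):
    """defects: list of dicts like {"type": "coding", "area": "checkout", "open": True};
    functional_areas: every area the release is supposed to cover."""
    coding = [d for d in defects if d["type"] == "coding"]
    areas_with_open_defects = {d["area"] for d in defects if d["open"]}
    clean_areas = [a for a in functional_areas if a not in areas_with_open_defects]
    return {
        "defects_per_test_hour": len(defects) / test_hours,
        "defects_per_dev_hour": len(defects) / dev_hours,
        "pct_coding_errors": 100.0 * len(coding) / len(defects) if defects else 0.0,
        "pct_areas_defect_free": 100.0 * len(clean_areas) / len(functional_areas),
    }
```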

5 Answers

The problem I can see with counting defects is that not all defects are equal. You might move a release deadline when you find a single bug - a showstopper resulting in complete data loss for the customer - but you'd be unlikely to do so for even a few dozen cosmetic bugs.

On the other hand, you can't just discount cosmetic bugs either. What if that cosmetic bug is a misspelling of your company's name, or a typo that renders a sentence unintentionally obscene, bang in the middle of your home page? Suddenly, not all cosmetic defects are equal either.

Equally, you might be "95% done" - but the remaining 5% is a critical business function, and it's been blocked by one bug after another for the last five months, and you're now having to explain to senior management that just because the percentages look great, that doesn't mean the project is in good shape and you really really need more time. You need to factor in not just how many bugs, but where they are, and where you haven't been able to test yet. If most of the app is inaccessible due to a handful of blocking bugs, or this particular project requires tedious and lengthy data setup, you might also have really low defects per tester hour.

If you're going to be basing release (or rather, not-release) decisions on these measures, you need to be very sure that the measures you use genuinely reflect what you want to measure - as we can see, mere defect counts don't do that well.

It's time to trot out my favourite paper: Kaner and Bond's "Software Engineering Metrics: What Do They Measure and How Do We Know?" Reading this and applying the advice within will help you to evaluate the measures you choose. Poor metrics are extremely dangerous - they will become the rope used to hang you. I've worked on a team that was suffering from years of reporting on a measure that didn't reflect the risk - we got pressured to make the numbers better, even though we knew that sometimes made the testing, and the product, worse. :( It's really tough to argue against an established measure though.

So, what other options are there? Well, Gojko Adzic has listed some useful suggestions. I really like the idea of heat maps - either to measure which files are associated with the most defects (it's just occurred to me that we could implement this at work by gathering the information we have in JIRA - we have Subversion checkins associated with each issue, so if we can separate out the issues that are bugs, we might have some interesting data. No idea how much work that will be though!), or a functional measure to show how much testing you've conducted in an area of the application compared with how much you feel is needed. The attribute-component-capability matrix is also interesting - being able to identify that there are blocking bugs in areas that are key to the business is a compelling argument.
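
To make the heat-map-by-file idea concrete, here is a minimal sketch of counting bug-fix touches per file. It assumes you can export a list of commits with their changed paths and already know which JIRA keys are bugs; the input format, issue keys, and paths are all invented, and pulling the real data out of Subversion and JIRA is the actual work.

```python
import re
from collections import Counter

# Hypothetical export: one record per commit, with the message and touched paths.
commits = [
    {"message": "PROJ-101 fix rounding error in invoice totals",
     "paths": ["src/billing/invoice.py", "tests/test_invoice.py"]},
    {"message": "PROJ-205 add CSV export", "paths": ["src/reports/export.py"]},
]
bug_keys = {"PROJ-101"}  # issue keys whose type in JIRA is "Bug"

heat = Counter()
for commit in commits:
    keys = set(re.findall(r"[A-Z]+-\d+", commit["message"]))
    if keys & bug_keys:               # the commit references at least one bug
        heat.update(commit["paths"])  # every file it touched gets a point

for path, count in heat.most_common(10):
    print(f"{count:3d}  {path}")
```

Files that keep accumulating points are your defect hotspots, which is usually where extra review and testing effort pays off.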

In terms of passing a build back to developers, here are some approaches I've seen:

Identify key end-to-end paths through the app and create a carefully selected set of acceptance tests that cover them - for an ecommerce site, one test might be that a customer can order and check out. ALL of these tests must pass before you start testing. (It can be a small set, if you pick carefully - a small set is much easier to get agreement on.) If you frequently have integration bugs that block large areas of functionality, this is worth using; see the sketch after this list.

You find X bugs of severity Y within the first few days of testing. This one didn't work: I never saw a build returned, and it led to endless hours wasted wrangling over just how severe a given bug was.

More than X% of planned test cases blocked. This is one of those criteria that seems objective but is actually very subjective - maybe one project has a test lead who writes lots of short, simple tests, while another has a test lead who tends to write long, complex ones.
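
For the first approach, here is a minimal sketch of what the gate itself might look like, assuming pytest; the `client` fixture and its methods are made up for an ecommerce example:

```python
import pytest

# A small, carefully chosen set of key-path checks. The "client" fixture and its
# methods are hypothetical stand-ins for whatever drives your application.
@pytest.mark.smoke
def test_customer_can_order_and_check_out(client):
    client.add_to_basket("SKU-123")
    order = client.checkout(card="4111111111111111")
    assert order.status == "CONFIRMED"

@pytest.mark.smoke
def test_customer_can_register_and_log_in(client):
    client.register("user@example.com", "s3cret")
    assert client.login("user@example.com", "s3cret").ok
```

Run only the marked set against each candidate build (for example `pytest -m smoke`); a single failure sends the build straight back instead of starting a test cycle you already know will be blocked.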

I hope some of the ideas above are useful to you. This is an area I find really interesting, but unfortunately I have a lot more questions than answers. (I think testers should approach software engineering metrics with just as much skepticism as they apply to the systems they test.)

Excellent response. I already have the Kaner paper. It is good. I doubt the metrics will be used to hang me, at least in the near future; however, there is a risk that they could be misinterpreted. I think education is key here to ensure everyone knows what the metrics mean.
– Nick May 18 '11 at 6:48

Absolutely agree that education is the key - most of the problems with metrics come when people misinterpret them as being the last word, rather than a useful but still fallible tool that helps you figure out where the problems might be lurking. And glad to hear you're not in an environment where your metrics will be used against you!
– testerab♦ May 18 '11 at 7:36

Also, talk to the team. As one of the other posts stated, not all bugs are equal in magnitude. Encourage your testers to speak up when the quality is going down the drain. They should know better than anyone or any metric.

I think SLoret is referring to measuring longer-term quality by the amount of ongoing maintenance, not by cutting support. I like that as a long-term measure. How do you measure production support? Is it by the number of defects?
– Nick May 18 '11 at 19:38

If you mean cutting down support through features and bug fixes - yes. If you mean cutting the support budget and going to India for support, or not picking up the phone - no. I couldn't tell if you were joking or serious.
– SLoret May 18 '11 at 19:40

:) I should clarify that I was indeed joking. Rather, the inverse is true - happy customers mean you can consider scaling back support (unless it's operating at capacity, of course).
– corsiKa♦ Jun 24 '11 at 20:09

"If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea." - Antoine de Saint-Exupery

Be wary of what you measure. By measuring something and reporting on it, you will incentivize behavior, for good or ill.

The measures you select will create a system that can easily be gamed. A very few bad actors will (possibly unconsciously) cook the books and cause morale problems for the people who are trying to play by the rules.

The best defect fixes are the ones that you never see. When a unit test catches a bug early, developers don't report it; they just fix it. By (implicitly) rewarding testers for finding bugs, you will increase your cycle time, perhaps unwittingly.

Here is what you should be measuring.

Team autonomy, technical challenge and feeling of purpose

Customer Satisfaction

The bug cycle time (from defect found to checked in and signed off)

The average feature cycle time

Agility

Your ability to ship good software is a function of the health of the organization.
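
If you do want a number out of the cycle-time items above, the arithmetic is simple once you can export the relevant timestamps from your tracker; this sketch uses invented field names and sample dates:

```python
from datetime import datetime
from statistics import mean

# Hypothetical export: when each defect was found and when its fix was signed off.
defects = [
    {"found": datetime(2011, 5, 2, 9, 0),  "signed_off": datetime(2011, 5, 4, 16, 0)},
    {"found": datetime(2011, 5, 3, 11, 0), "signed_off": datetime(2011, 5, 10, 13, 0)},
]

cycle_days = [(d["signed_off"] - d["found"]).total_seconds() / 86400 for d in defects]
print(f"average bug cycle time: {mean(cycle_days):.1f} days")
```

The same calculation works for feature cycle time if you swap in the dates a feature was started and shipped.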

This is not about rewarding testers for finding bugs. It is purely to assess the quality of individual software releases. We already have customer satisfaction surveys, and these measures would hopefully be used to complement that.
– Nick May 18 '11 at 6:53

Those are a pretty good base, but if you are focusing on unit testing then you should also monitor:

Code Coverage (from your unit testing)

The problem with code coverage, though, is that it is hard to pick a percentage that gives you true value from your unit tests. 100% coverage usually means that you are testing areas of code that have little or no risk of failure. Usually around 70% is a good number that finds most defects for the effort. However, you can work that number out yourself over time: if coverage is dropping and you're finding more defects, you can ask for more unit testing; if the figure is high and you are not finding any more defects, you might be tempted to let it drop a little. It's really a trial-and-error process in my experience.
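
If you do settle on a trial threshold, it is worth wiring it into the build rather than eyeballing a report. A minimal sketch, assuming a Python project measured with coverage.py and pytest; the package and test paths are placeholders, and 70 is only a starting point to tune:

```python
import coverage
import pytest

THRESHOLD = 70.0  # adjust over time as you learn what the number actually buys you

cov = coverage.Coverage(source=["myapp"])  # "myapp" is a placeholder package name
cov.start()
exit_code = pytest.main(["tests/"])        # run the unit test suite
cov.stop()
cov.save()

percent = cov.report()  # prints the report and returns total coverage as a float
if exit_code != 0 or percent < THRESHOLD:
    raise SystemExit(f"tests rc={exit_code}, coverage {percent:.1f}% (target {THRESHOLD}%)")
```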

Code coverage only tells us "The code didn't crash just because we ran it." Useful to know, but once you reach 100% code coverage, you can't assert there are no functional bugs left in the code. That said, I would rather not begin other testing until the unit test coverage is at least 60%.
– Dustin Andrews Jun 13 '11 at 20:58