Code coverage, anyone?

Code coverage is very scientific. It counts whether the program has visited a line of code or not. So it can’t be wrong, right?

Remember: the metric itself isn’t wrong; it’s what we do with it that can be.

0% code coverage is not good, even I agree.

How about 100%?

Not achievable, you say? Sometimes, but let’s say we can do it. Do we even want to be 100% covered? Because that means a lot of effort. Testing, like everything in the universe, goes by the 80-20 rule: 80% of the effort goes into covering the last 20% of the code. And that last 20% may not be the most important code to test, because not all code is created equal.

Ok, so 100% coverage may be achievable, but costly. How about 80%? How about not letting anyone check in code if it’s not 80% covered?
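Such a gate is easy enough to set up. As a sketch, assuming a Python project tested with pytest and measured with coverage.py (the `--fail-under` flag is a real coverage.py option; the 80 is exactly the arbitrary policy number in question):

```shell
# Run the test suite under coverage measurement,
# then fail the build if total line coverage is below 80%.
coverage run -m pytest
coverage report --fail-under=80
```

The second command exits non-zero when the threshold isn’t met, which is what makes it usable as a CI gate.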

Which 80%? Does it include auto-generated code? Does it include the important, risky, bug-ridden, unreviewed code?

What does 80% (or any number) actually mean?

Not a lot by itself. If you take other considerations into account, the number’s meaning becomes clearer, and that gives a better basis for decision making. That’s the problem with raw metrics – we focus on them and forget other things that could help us make better decisions.

Always ask “what does this number really mean?” and “do I need to consider something more?”. Then make decisions.

One last thing…

I’m giving a talk on Coverage Lies at the next DevCon TLV, June 20th, and in a Typemock webinar on June 27th. If you want to learn more about these tricks, you’re invited!

Not all lines of code are created equal: a simple property (getter/setter, possibly even compiler-generated code) counts for just as many coverage points as that fiddly algorithm. And while it’s trivial to test that property code to 100% coverage, writing those tests isn’t a good use of dev cycles.
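To make that concrete, here’s a minimal Python sketch (the class and function names are made up for illustration): a line-coverage tool counts the trivial property lines and the branchy fee logic the same way, even though only one of them is worth a dedicated test.

```python
class Account:
    def __init__(self):
        self._balance = 0

    # Trivial property code: these lines earn coverage points,
    # but testing them tells us almost nothing.
    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        self._balance = value


def apply_tiered_fee(amount):
    """Fiddly branching logic: the same kind of 'covered lines',
    far more risk hiding in the boundaries."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount < 100:
        return amount * 0.05
    if amount < 1000:
        return 5 + (amount - 100) * 0.02
    return 23 + (amount - 1000) * 0.01
```

A suite that only exercises `Account` and a suite that probes every tier boundary of `apply_tiered_fee` can report similar coverage numbers while differing wildly in value.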

My preferred approach is to combine raw coverage data with static analysis that weeds out the trivial bits of code, plus declarative admissions of uncoverage. That way, every piece of code is either covered, or carries a reviewable reason why it isn’t (e.g. it’s generated by a system tool, it’s database-calling code that’s mocked elsewhere, it’s subject to static analysis for correctness, whatever), rather than guessing a coverage percentage and hoping.
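One way to make “covered, or has a reviewable reason” concrete, assuming a Python project measured with coverage.py: the `# pragma: no cover` comment below is coverage.py’s real default exclusion marker, and the trailing reason text is allowed since the tool matches the marker anywhere on the line (the function names and reasons here are illustrative).

```python
import sqlite3


def load_orders(db_path):  # pragma: no cover - thin DB wrapper, mocked in service tests
    """Database-calling code: excluded from coverage with a stated,
    reviewable reason rather than silently dragging the number down."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute("SELECT id, total FROM orders").fetchall()


def total_due(orders):
    """Business logic: no excuse here -- this must be covered."""
    return sum(total for _, total in orders)
```

The exclusion and its reason live next to the code, so a reviewer can challenge the excuse in the same diff that introduces it.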