Flaws in coverage measurement

Coverage testing is a great way to find out what parts of your code are not
tested by your test suite. You turn on coverage.py,
then run your tests. At the end, coverage can show you which lines were never
executed, either by line number or visually in an annotated source file.

When your test coverage is less than 100%, coverage testing works well: it points you
to the lines in your code that are never run, showing the way to new tests to
write. The ultimate goal, of course, is to get your test coverage to 100%.

But then you have problems, because 100% test coverage doesn’t really mean
much. There are dozens of ways your code or your tests could still be broken, but
now you aren’t getting any directions. The measurement coverage.py provides is
more accurately called statement coverage, because it tells you which statements
were executed. Statement coverage testing has taken you to the end of
its road, and the bad news is, you aren’t at your destination, but you’ve run out
of road.

By way of illustration, here are a few examples of 100% statement coverage of buggy code.

Combinations of paths

With multiple branches in a function, there may be combinations that aren’t
tested, even though each individual line is covered by a test:

Although we have 100% coverage, we never found out that due to a typo,
the second condition on line 3 will divide by zero.
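A minimal sketch of this kind of bug, using a hypothetical function `two_branches` (the names and values are assumptions for illustration). Two tests between them execute every line, yet the one combination of paths that divides by zero is never run:

```python
def two_branches(a, b):
    # First branch: choose a divisor.
    if a:
        d = 0   # the "typo": this was meant to be nonzero
    else:
        d = 2

    # Second branch: use the divisor.
    if b:
        x = 2 / d
    else:
        x = d / 2
    return x


# Between them, these two calls execute every line: 100% statement coverage.
two_branches(False, True)    # d = 2, then 2 / 2
two_branches(True, False)    # d = 0, then 0 / 2

# This combination of paths was never exercised, and it fails.
try:
    two_branches(True, True)  # d = 0, then 2 / 0
except ZeroDivisionError as exc:
    print("untested path:", exc)
```

With two independent branches there are four path combinations, and statement coverage is satisfied by just two of them.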

Conditionals can also be hidden inside functions that aren’t being measured
in the first place.

    def fix_url(u):
        # If we're an https url, make it http.
        return u.replace('https://', 'xyzzyWRONG:')

    # 100% coverage:
    fix_url('http://foo.com') == 'http://foo.com'

The replace method here is essentially a big if statement on the condition that
the string contains the substring being replaced. Our test never takes
that path, but the if is hidden from us, so our coverage testing doesn’t
help us find the missed coverage.
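As a rough sketch, here is what a test that does take the hidden path might look like. It assumes, based on the comment in the function, that the intended behavior is to turn an `https://` URL into an `http://` one; the second test input is made up for illustration.

```python
def fix_url(u):
    # Same buggy function as above: the replacement text is wrong.
    return u.replace('https://', 'xyzzyWRONG:')


# The original test: every statement runs, but the substring is never found,
# so the "replace" path inside the method is never taken.
assert fix_url('http://foo.com') == 'http://foo.com'

# A test that takes the hidden path exposes the bug immediately.
result = fix_url('https://foo.com')
print(result)                       # xyzzyWRONG:foo.com
print(result == 'http://foo.com')   # False
```

Both tests report the same 100% statement coverage, so the coverage report gives no hint that the second one was missing.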

Incomplete tests

Just because your tests execute the code doesn’t mean they properly test
the results.

    def my_awesome_sort(l):
        # Magic mumbo-jumbo that will sort the list (NOT!)
        l.reverse()
        return l

Here our “sort” routine passes all the tests, and the coverage is 100%. But,
oops, we forgot to check that the list returned is really sorted.
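To make the gap concrete, here is a hedged sketch of two tests for this routine. The test data and the choice of Python's built-in `sorted` as a trusted reference are assumptions for illustration. The first test executes every line but checks nothing about order; the second actually checks the result.

```python
def my_awesome_sort(l):
    # Same buggy "sort" as above: it only reverses the list.
    l.reverse()
    return l


# Weak test: runs all the code (100% coverage) but only checks the length.
result = my_awesome_sort([3, 1, 2])
assert len(result) == 3

# Stronger test: compare against a reference implementation.
data = [3, 1, 2]
is_sorted = my_awesome_sort(list(data)) == sorted(data)
print(is_sorted)  # False: the bug is finally visible
```

The coverage number is identical for both tests; only the assertion differs.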

Real world

Of course, these examples are absurd. It’s easy to see where we went wrong
in each of them. Most likely, though, your tests have the same underlying problems,
but in ways that are much more difficult to find.

Improved tools could help some of these cases, but not all. Some C-based tools
provide branch analysis that could help with the path problems above.
But no tool can guarantee there aren’t path problems (what if a loop works incorrectly if
executed a prime number of times?), and
no tool will point out that your tests aren’t checking the important things
about results.

For more on the problems of coverage testing, the Wikipedia article on
Code Coverage has a
number of fine jumping-off points. Cem Kaner has a depressingly exhaustive
overview of the Measurement of the Extent of Testing.
After perusing it, you may wonder why you bother with puny statement coverage testing
at all!

Statement coverage testing is a good measure of what isn’t being tested in
your code. It’s a good start for understanding the completeness of your tests. Brian Marick’s
How to Misuse Code Coverage sums it
up best: “Coverage tools are only helpful if they’re used to enhance thought, not replace it.”

True. Moreover, there's no way to measure which variables are used at all. I was concerned that I was spending effort creating and updating a number of unneeded self. variables. Coverage.py told me that I am most certainly executing all the code that does so. :-) But it would be nice to eliminate that code if the variables are not used.

No disparagement implied. Great program and great examples of "things that can go wrong." But, in case someone knows, is there a Py tool that will flag unused variables -- and perhaps even unused portions of data structures?

I believe another example of a limitation of coverage testing is lambdas. The body of a lambda is not treated as a "line" for coverage purposes. Coverage is noted when the lambda is defined, but the body of the lambda may never be executed, so an error embedded in a lambda may not be found despite "100%" (in lines) test coverage.
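A small sketch of the lambda case described above, with a hypothetical `handlers` table invented for illustration. The line that defines the lambdas runs at import time, so a line-based report counts it as executed even if a lambda body is never called:

```python
# Defining the lambdas executes this statement, so statement coverage
# counts it as run. The bodies run only when the lambdas are called.
handlers = {
    "safe": lambda x: x + 1,
    "broken": lambda x: x / 0,   # bug hidden inside a lambda body
}

# The only test calls the "safe" handler; every line has now been executed.
assert handlers["safe"](1) == 2

# The broken body fails as soon as anything actually calls it.
try:
    handlers["broken"](1)
except ZeroDivisionError:
    print("bug was hiding on a line reported as covered")
```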
