When 100% coverage gives us a false sense of security

by Алекс Руис on April 12, 2010

I use code coverage tools on a regular basis only to ensure that the most complex areas of the code base are properly tested. I tend to bravely fight the temptation to get excited by this metric and waste valuable time adding meaningless tests (e.g. getters and setters) just to keep the numbers going up. I fell into temptation once, and I got in trouble.

While working on the latest version of the FEST’s JavaFX Maven plugin, I reached an impressive 100% code coverage that made me feel pretty confident that the plugin worked as expected. The truth is that I almost released this plugin without realizing it didn’t work at all.

100% code coverage may sound fantastic, but in this case it gave me a false sense of security. The underlying problem was not the number of tests or the coverage tool itself, but the assumptions made when writing those tests. The test suite consisted exclusively of unit tests testing classes in isolation, using mock or stub implementations of their collaborators. I worked hard to have it cover every single line of code I wrote, verifying that my assumptions of how the program should work were correct. In fact, the test results clearly reflected that the code was doing what I expected it to do.

The test suite didn’t include any functional tests, because I couldn’t figure out how to automate a test involving a command-line tool (Maven.) I originally thought it was an overkill to go into Maven internals and see how everything works, since the plugin I’m writing is pretty simple. I was actually being lazy.

Once I looked at the code coverage report, I felt pretty good about the health of the code base. I assumed everything will work, and I was ready to release. Just for curiosity, I launched a command prompt and typed “mvn fest-javafx:compile” expecting it to succeed. Sadly, it never worked.

After reading my code I finally realized the problem. I had two instance variables with the same name, just different case. One was JavaFXHome, a string that receives the value of the JavaFX home directory specified by a user (and injected by Maven.) The other one was javaFxHome, an instance of the class that figures out the path of the JavaFX home directory if it is not specified by the user. Maven was getting confused by the variable names and was trying to inject a value in the wrong variable. This failure proves that I made a wrong assumption of how Maven works, and my test suite didn’t cover it.

This functional test, even thought is manual, saved me from releasing faulty software.

Lesson learned

This was an interesting experience, and I got to learn something useful:

a high percentage of code coverage does not reflect the health of the code base

unit tests only verify that the programmer’s assumptions are correct

functional tests are a must to ensure that an application, as a whole, works as expected in front of the user1

I do agree that attaining a 100% coverage is no guarantee that your code interfaces well with the rest of the world. But it does tell you that your code adheres to a certain contract.

This contract is the reason I often write test for getters and setters. I make assumptions in my code about the arguments my objects receive i.e. being non-null. By testing these conditions I make my assumptions explicit.

good point! I’ve been thinking about it and even if the getters and setters are generated by an IDE, it’d be a good idea to have tests for them, as a safety net, just in case somebody changes their implementation and breaks the contract. I’ll update the blog entry shortly.

I think testing getters and setters is a good idea; however, testing them manually instead of reflectively (with a marker annotation or some such for anything you don’t want automatically tested) seems like a waste of time and effort.

Great points, Alex. 100% code coverage is nice in that you can be quite confident that a disturbance in underlying system behavior will be caught — but in the controlled, comfortable environs of a unit test. You need other tests too, ones that more accurately mimic the environment in which the app/library will be used. You just don’t want all your tests to be that way, otherwise you’ll tend not to want to run them all in response to changes.

I really like your lessons learned, more programmers should pay attention to them. The current fad is to believe that unit test equates to testing everything, but as you correctly point out, just because the pieces are working by themselves doesn’t mean that they will work together, or that they are even the pieces that the user really wanted. Sometimes unit tests can be useful, sometimes not.

100% code coverage does not guarantee that your code is defect free, but that doesn’t mean it is a wasted effort. Having the coverage is still better than not having it, unless you fall into the trap of having a false sense of security and fail to do proper integration testing.

Another thing I learned about code coverage is that code that is executed during a test isn’t necessarily code that is tested. If your classes are tightly coupled, then you might find that certain code is only executed and never actually targeted by a unit test. The code is still “covered” in the sense that it is executed during one or more unit tests, but it’s still untested code.

One aspect of applications (especially those JEE and Spring based) that is very hard to test: transaction propagation and persistence session management. Because unit tests should (from their definition) test only single unit (class), it is impossible to test proper transaction demarcation across many units (classes, EJBs, etc.)

Also unit tests won’t protect you from lazy initialized exceptions. Unit testing Wicket web pages is great, although doing so with mocked DAOs and services will never expose this type of bugs.

Functional testing is a must, web testing frameworks and Fitnesse are great tools for automating that.

Very sensible article. I’m not impressed that Andy Glover (or anyone else) made the point first, though of course I appreciate the link. A good blog is a journal of discovery. Its key virtue is honesty. Nice work.

As we increasingly depend for our own correctness on the correctness of big lumps of foreign code like Maven, like hundreds of other examples, Tests Take Too Long is a real issue – it’s not to strong to call it a blind spot – I hope you’ll find time to explore.