Chasing Code-Coverage Baubles

Code-coverage targets are misleading and rarely drive the right behavior

In his guest editorial earlier this month, Cédric Beust inveighed against setting code coverage targets, a position I strongly agree with. Beust's point elicited several interesting answers from readers. The general view of code coverage is probably best articulated by poster clayton7510, who wrote, "Code coverage is essentially a confidence metric with a limit that approaches 100% as a practical measure. For example, code coverage of 80% means that there is about 80% confidence that the tested source code does what the developer expects it to do. I would think that would be fine for most software. Of course as code coverage approaches 100% the question of cost/value ratio begins to approach 0. Therefore, the correct amount of code coverage depends largely on risk and cost/value acceptance."

This appears to be a balanced and sane approach, but in my view it retains the fundamental misperception that a specific coverage number has inherent meaning and is therefore worth striving for. Code coverage of 70% vs. 80% or even 90% conveys very little information to me about the quality of the code or the thoroughness of the testing.

What does speak to me is the coverage of difficult code. A comment from dleppik554 in response to clayton7510's position gets it exactly right: "Where I am most careful to write tests is where the code was hard to write, and thus easy to get wrong, or where the user relies on the computer's accuracy, such as reading a file or applying a formula (e.g., statistical significance or interest rate)."

When examining code bases and assessing testing coverage, I zoom in on the parts with the most complex code. (Parsers are a typical example.) In these sections, I am very definitely looking for coverage that is, in effect, far greater than 100%: I expect to see some execution paths tested literally dozens of times with different edge values, unusual cases, and so on. Telling me that the code is 80% or 90% covered is a useless piece of information and suggests to me that the coder doesn't appreciate what he needs to do.
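To make the idea of re-testing the same paths concrete, here is a minimal sketch in Python. The parser `parse_int`, the case tables, and the helper `run_cases` are all hypothetical and invented for illustration; they are not taken from any code base discussed here. A handful of inputs (for example "7", "-7", "", and "1.0") already execute every line of the function, so a coverage tool would report 100% at that point. The rest of the table is where the real testing happens.

```python
# Hypothetical example: a tiny hand-written parser for signed decimal integers.
def parse_int(text: str) -> int:
    """Parse an optionally signed decimal integer, rejecting anything else."""
    s = text.strip()
    if not s:
        raise ValueError("empty input")
    sign = 1
    if s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    # Accept ASCII digits only; non-ASCII digit characters are rejected.
    if not s or not all("0" <= ch <= "9" for ch in s):
        raise ValueError(f"not an integer: {text!r}")
    value = 0
    for ch in s:
        value = value * 10 + (ord(ch) - ord("0"))
    return sign * value


# The same few execution paths, exercised repeatedly with edge values.
VALID_CASES = [
    ("0", 0),
    ("7", 7),
    ("-7", -7),
    ("+7", 7),
    ("-0", 0),
    ("007", 7),
    ("  42  ", 42),
    ("2147483647", 2147483647),
    ("-2147483648", -2147483648),
    ("99999999999999999999", 99999999999999999999),
]

INVALID_CASES = [
    "", "   ", "+", "-", "--1", "+-1", "1-", "1 2", "1.0", "0x10",
    "\u0661\u0662",  # Arabic-Indic digits, which str.isdigit() would accept
]


def run_cases() -> int:
    """Run every case and return how many were checked."""
    checked = 0
    for text, expected in VALID_CASES:
        assert parse_int(text) == expected, text
        checked += 1
    for text in INVALID_CASES:
        try:
            parse_int(text)
        except ValueError:
            checked += 1
        else:
            raise AssertionError(f"accepted invalid input: {text!r}")
    return checked
```

Dropping most of these cases would leave the reported line coverage unchanged, which is exactly why the percentage says so little about complex code.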

In 2007, Agitar, a company that sells unit-test generation tools, created a now-shuttered website to which code and tests could be uploaded. It would scan methods, compute their cyclomatic complexity (CCR or McCabe), and compute what level of coverage the methods required. The software would then flag functions with insufficient test coverage. For methods with low CCR values, it required no coverage. For methods with CCR above 28, it averred that no amount of coverage could provide sufficient testing. I think this approach gets things roughly right. (The one great impediment, which I'll save for a future editorial, is that CCR is a fallible measure of complexity.)
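The general shape of such a check can be sketched as follows. This is not Agitar's actual formula, which is not given here. Only the upper limit of 28 comes from the description above; the lower limit of 5, the linear ramp between the two limits, and the names `required_coverage`, `Method`, and `flag_methods` are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# ASSUMPTIONS: only UPPER_LIMIT (28) comes from the article. LOWER_LIMIT and
# the linear ramp between the two limits are invented for this sketch.
LOWER_LIMIT = 5
UPPER_LIMIT = 28


def required_coverage(complexity: int) -> Optional[float]:
    """Return the required coverage as a fraction from 0.0 to 1.0.

    Returns None when the method is so complex that no amount of
    coverage is considered sufficient.
    """
    if complexity <= LOWER_LIMIT:
        return 0.0
    if complexity > UPPER_LIMIT:
        return None
    return (complexity - LOWER_LIMIT) / (UPPER_LIMIT - LOWER_LIMIT)


@dataclass
class Method:
    name: str
    complexity: int   # cyclomatic complexity of the method
    coverage: float   # measured coverage as a fraction from 0.0 to 1.0


def flag_methods(methods: List[Method]) -> List[Tuple[str, str]]:
    """Return (name, reason) pairs for methods with insufficient coverage."""
    flagged = []
    for m in methods:
        needed = required_coverage(m.complexity)
        if needed is None:
            flagged.append((m.name, "too complex to test adequately; refactor"))
        elif m.coverage < needed:
            flagged.append(
                (m.name, f"needs {needed:.0%} coverage, has {m.coverage:.0%}")
            )
    return flagged
```

The point of the design is that the coverage requirement is attached to each method's complexity, so a trivial getter with no tests passes while a dense method with 50% coverage does not.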

There is a tendency to believe that occasionally you just have to write highly complex code that will fail Agitar's guidelines because of internal complexity. In nearly all cases, however, refactoring can reduce the complexity and increase testability, which makes it a crucial skill. This brings me to my second point, which is that code bases should be examined for testability. It's no secret that the more twisted the code, the less testable it is. So, when I come across dense logic that has sub-100% coverage, two questions open before me: Did the developer simply skip testing for edge cases and other important values? Or was he not capable of writing tests due to the inherent complexity of the code? In other words, it's important not only to get the coverage levels right, but also to identify the reasons when they're too low. All of this, as I trust you'll agree, argues against setting specific coverage targets.

Alberto Savoia, who headed Agitar and is now a senior testing guru at Google, used to point out that while coverage goals per se did not make sense, that did not mean no measures were valid. As he often went to customer sites (generally customers with very large code bases), he would get a good feel for the maturity of the testing process and for the organization's commitment to developer testing by comparing the size of the code base to the size of the unit test suite, both measured in lines of code (LOC). The closer the latter was to the size of the former, the better the organization's commitment to testing was likely to be. In most cases, though, unit tests were far, far smaller than the corresponding code bases. I give this rule of thumb because it helps an organization do a rough self-assessment without using code coverage as a happy goal or key measure.
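A rough version of this self-assessment can be scripted. The sketch below assumes a Python code base in which test files are named `test_*.py` or `*_test.py`; that naming convention, and the helper names `count_loc`, `is_test_file`, and `loc_ratio`, are assumptions for illustration. LOC is counted crudely as non-blank lines.

```python
from pathlib import Path


def count_loc(text: str) -> int:
    """Count non-blank lines; comments still count in this rough measure."""
    return sum(1 for line in text.splitlines() if line.strip())


def is_test_file(path: Path) -> bool:
    # ASSUMPTION: tests are identified by file name alone.
    return path.name.startswith("test_") or path.stem.endswith("_test")


def loc_ratio(root: str) -> float:
    """Return test LOC divided by production LOC for all .py files under root."""
    code_loc = 0
    test_loc = 0
    for path in Path(root).rglob("*.py"):
        loc = count_loc(path.read_text(encoding="utf-8", errors="replace"))
        if is_test_file(path):
            test_loc += loc
        else:
            code_loc += loc
    # An empty code base yields 0.0 rather than a division error.
    return test_loc / code_loc if code_loc else 0.0
```

Calling `loc_ratio(".")` from a project root gives a single number: a value near 1.0 means the tests are about as large as the code they exercise, while a value near 0 matches the far more common situation described above.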

