Now, I know people could consider this question a duplicate or one asked many times, in which case I would appreciate a link to relevant questions with an answer to mine.

I have recently been in disagreement with some folks about code coverage. I have a group of people who want our team to drop looking at code coverage altogether, based on the argument that 100% coverage does not mean good-quality tests and thus good-quality code.

I have been able to push back by selling the argument that code coverage tells us what has definitely not been tested and helps us focus on those areas.

The argument from these folks is that the team would then react by quickly creating low-quality tests, wasting time while adding no significant quality.

While I understand their point of view, I am searching for a way to make a more robust case for code coverage by introducing tools/frameworks that address more coverage criteria (Functional, Statement, Decision, Branch, Condition, State, LCSAJ, Path, Jump Path, Entry/Exit, Loop, Parameter Value, etc.).

What I am looking for is a suggestion for a combination of such code coverage tools and the practices/processes to go with them, which can help me counter such arguments while feeling comfortable about my recommendation.

I would also welcome any accompanying comments/suggestions based on your experience/knowledge of how to counter such an argument because, while subjective, code coverage has helped my team be more conscious of code quality and the value of testing.

Edit: To reduce any confusion about my understanding of the weaknesses of typical code coverage, I want to point out that I am not referring to Statement Coverage (or lines of code executed) tools (there are plenty). In fact, here is a good article on everything that is wrong with it: http://www.bullseye.com/statementCoverage.html

I was looking for more than just statement or line coverage, going more into multiple coverage criteria and levels.

The idea is that if a tool can tell us our coverage based on multiple criteria, then that becomes a reasonable automated assessment of test quality. By no means am I trying to say that line coverage is a good assessment; in fact, that is the premise of my question.
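To make the weakness of statement coverage concrete, here is a minimal sketch (the function and figures are hypothetical, invented for illustration): a single test can reach 100% statement coverage while leaving branch coverage at only 50%, because the implicit "else" of the `if` is never exercised.

```python
# Hypothetical example: one test executes every statement, yet only
# half the branches, because the `if` is never taken in its false
# direction.

def apply_discount(price, is_member):
    """Apply a 10% discount for members."""
    if is_member:
        price = price * 0.9
    return price

# This one test executes every statement (full statement coverage) ...
assert apply_discount(100, True) == 90.0

# ... yet the non-member path is untested; only a second test takes
# branch coverage to 100%.
assert apply_discount(100, False) == 100
```

A branch-aware tool (for example, coverage.py run with its `--branch` option) would flag the missing path, while a plain statement-coverage report would show 100% after the first assert alone.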

Edit:
Ok, maybe I projected it a bit too dramatically, but you get the point. The problem is about setting processes/policies across all teams in a homogeneous/consistent fashion. And the general fear is: how do you ensure the quality of tests, and how do you allocate guaranteed time without any measure of it? Thus I like having a measurable metric that, when backed up with appropriate processes and the right tools, would allow us to improve code quality while knowing that time is not being forced into wasteful processes.

EDIT: So far what I have from the answers:

Code reviews should cover tests to ensure quality of tests

Test First strategy helps avoid tests that are written after the fact to simply increase coverage %

Thanks. I already know and acknowledge that. Also, people typically tend to associate 100% code coverage with 100% Statement (or line) coverage. Also, could not resist - Jimmy, they were looking for you all over the place.
– MickJ Jun 20 '13 at 17:47


"the team would then react by quickly creating low-quality tests, wasting time while adding no significant quality" - great team!
– Piotr Perak Jun 20 '13 at 20:26


10 Answers

In my experience, code coverage is as useful as you make it. If you write good tests that cover all of your cases, then passing those tests means that you have met your requirements. In fact, that is the exact idea Test Driven Development uses: you write the tests before the code, without knowing anything about the implementation (sometimes another team entirely writes the tests). These tests are set up to verify that the final product does everything your specification says it should, and THEN you write the bare minimum code to pass those tests.
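The test-first cycle described above can be sketched in a few lines (the function and spec here are hypothetical, chosen only to illustrate the order of work):

```python
# Hypothetical TDD sketch: the test is written first, purely from the
# specification ("years divisible by 4 are leap years, except century
# years not divisible by 400"), before any implementation exists.

def test_is_leap_year():
    assert is_leap_year(2024) is True    # divisible by 4
    assert is_leap_year(1900) is False   # century, not divisible by 400
    assert is_leap_year(2000) is True    # divisible by 400
    assert is_leap_year(2023) is False   # not divisible by 4

# Only then is the bare minimum written to make the test pass:
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

test_is_leap_year()
```

Because the assertions were derived from the spec rather than from the code, the coverage they produce reflects required behavior, not whatever the implementation happens to do.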

The problem here, obviously, is that if your tests aren't strong enough, you will miss edge cases or unforeseen problems and write code which doesn't truly meet your specifications. If you are truly set on using tests to verify your code, then writing good tests is an absolute necessity, or you're really wasting your time.

I wanted to edit the answer here as I realized that it didn't truly answer your question. I would look at that wiki article to see some stated benefits of TDD. It really comes down to how your organization works best, but TDD is definitely something in use in the industry.

+1 for the suggestion that writing tests first improves quality of tests so a combination of Test first with Code Coverage is definitely helpful.
– MickJ Jun 20 '13 at 15:35

Yeah, I've always found that if you write tests after you've developed the code, you'll only test for things you know your code will pass, or you'll write the tests in a way that complements the implementation, which really doesn't help anyone. If you write your tests independently of the code, you focus on what the code should do rather than what it does do.
– Ampt Jun 20 '13 at 15:36

Thanks @Ampt. Apart from reinforcing the process with TDD, are there any code coverage tools you recommend that address more coverage criteria in a more exhaustive fashion, thereby helping to validate the quality of the tests written, at least to some extent?
– MickJ Jun 20 '13 at 16:33

I may be understanding you incorrectly, but are you suggesting that different tools will report different coverage for your tests? It's always been my experience that coverage tools just watch the tests run and record which lines of code get executed. Switching tools should then have no impact on coverage, as the number of lines executed remains the same. I would be wary of a tool that gives more coverage for the same test. That said, I don't feel that hitting every line of code is so much an assessment of test quality as of thoroughness. Good tests come from good requirements.
– Ampt Jun 20 '13 at 16:50

Thanks. What you are referring to is Statement Coverage (or lines of code executed). I was looking for more than just statement or line coverage, going into multiple coverage criteria and levels. See: en.wikipedia.org/wiki/Code_coverage#Coverage_criteria and en.wikipedia.org/wiki/Linear_Code_Sequence_and_Jump. The idea is that if a tool can tell us our coverage based on multiple criteria, then that becomes a reasonable automated assessment of test quality. By no means am I trying to say that line coverage is a good assessment; in fact, that is the premise of my question.
– MickJ Jun 20 '13 at 17:04

Once you have that (or something similar) in place, you can start to look at your own data more closely:

are more bugs found in poorly covered projects?

are more bugs found in poorly covered classes/methods?

etc.

I'd expect that your data will support your position on code coverage; that has certainly been my experience. If it doesn't, however, then maybe your organization can succeed with lower code coverage standards than you'd like. Or maybe your tests aren't very good. Either way, the exercise will hopefully focus effort on producing software with fewer defects, regardless of how the code coverage disagreement is resolved.
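Eyeballing those questions can be as simple as joining a per-module coverage export with defect counts from your bug tracker. A minimal sketch (every module name and number below is invented for illustration):

```python
# Hypothetical data: per-module coverage % and bugs filed last quarter,
# as you might export from a coverage report and a bug tracker.
modules = {
    "billing":   (35, 14),
    "auth":      (55, 9),
    "reporting": (72, 4),
    "search":    (88, 2),
}

# Sort least-covered first; if bugs cluster at the top of this list,
# the data supports prioritising coverage work in those modules.
by_coverage = sorted(modules.items(), key=lambda kv: kv[1][0])
for name, (cov, bugs) in by_coverage:
    print(f"{name:10s} coverage={cov:3d}%  bugs={bugs}")
```

Even this crude ranking turns the coverage debate into a question about your own defect history rather than a matter of opinion.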

I really like this answer. Thanks for the tool suggestions and, more importantly, I like the idea of a data-backed approach to justifying code coverage. Although I tend to take the team's word on the value of it and would not question that anyway, it can possibly help me build a more solid case from our experience so far. Thanks!
– MickJ Jun 21 '13 at 13:39

Because they know the functionality of the code is the bottom line, testing, documentation, comments, reviews, etc. can be sacrificed without any immediate consequences; although I agree, it is a bad sign.
– JeffO Jun 20 '13 at 20:28


Ok, maybe I projected it a bit too dramatically, but you get the
point. The problem is about setting processes/policies in general
across all teams in a homogeneous/consistent fashion.

I think that's the problem. Developers don't care (often for excellent reasons) about consistent or global policies, and want the freedom to do what they think is right rather than comply with corporate policies.

Which is reasonable, unless you can prove that global processes and measures have value and a positive effect on the quality and speed of development.

+1 I have seen that way too many times to disagree. In my case, though, the conversation goes a bit like this. Dev: hey, look - I added code coverage metrics to our dashboard, ain't that great? Manager: sure, anything that you guys think improves quality is great. Manager's boss's boss: I think we need to have one process across teams. I think code coverage is pointless unless we can guarantee value for the cost spent.
– MickJ Jun 20 '13 at 21:06

In my experience, there are a few things to combine with code coverage to make the metric worthwhile:

Code Reviews

If you can punt bad tests back to the developer, it can help limit the number of bad tests that are providing this meaningless coverage.

Bug Tracking

If you have a bunch of code coverage on a module but still get many/severe bugs in that area, it might indicate a problem where that developer needs to improve their tests.

Pragmatism

Nobody is going to get to 100% with good tests on non-trivial code. If you, as the team lead, look at the code coverage but, instead of saying "we need to get to N%!", identify gaps and ask people to "improve coverage in module X", that achieves your goal without giving people an opportunity to game the system.

Blocks Covered/# of Tests

Most code coverage tools list blocks covered vs blocks not covered. Combining this with the number of actual tests yields a metric indicating how 'broad' the tests are, exposing either bad tests or coupled design. This is most useful as a delta from one sprint to another, but the idea is the same - combine code coverage with other metrics to gain more insight.
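As a sketch of that ratio and its sprint-over-sprint delta (all figures here are invented; any coverage tool that reports block counts can feed it):

```python
# Hypothetical "blocks covered per test" ratio, tracked sprint to
# sprint as the answer above suggests.

def blocks_per_test(blocks_covered, test_count):
    """Average number of code blocks exercised per test.

    A high value can mean broad, shallow tests (or tightly coupled
    design); the trend matters more than any single reading.
    """
    return blocks_covered / test_count

sprint_1 = blocks_per_test(blocks_covered=1200, test_count=150)  # 8.0
sprint_2 = blocks_per_test(blocks_covered=1400, test_count=200)  # 7.0

# A falling ratio alongside rising coverage suggests the tests are
# getting more focused rather than merely broader.
delta = sprint_2 - sprint_1  # -1.0
```

On its own the number says little; paired with the coverage trend it helps distinguish "coverage grew because we wrote focused tests" from "coverage grew because a few giant tests touch everything".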

+1 Good suggestions; I especially like Blocks Covered/# of Tests and code reviews. Although we already do code reviews, it will help to stress the importance of reviewing the tests themselves more closely.
– MickJ Jun 20 '13 at 17:51

There are many practices that have received a lot of attention recently because they can bring benefits to software development. However, some developers apply those practices blindly: they are convinced that applying a methodology is like executing an algorithm, and that after performing the correct steps one should get the desired result.

Some examples:

Write unit tests with 100% code coverage and you will get better code quality.

Apply TDD systematically and you will get better design.

Do pair programming and you will improve code quality and reduce development time.

I think the basic problem with the above statements is that humans are not computers and writing software is not like executing an algorithm.

Unit tests catch lots of errors, and code coverage indicates which parts of the code are tested, but testing trivial things is useless. For example, if clicking a button opens the corresponding dialog, the whole logic sending the button event to the component that opens the dialog can be checked by a simple manual test (click the button): does it pay off to unit test this logic?

While TDD is a good design tool, it does not work well if the developer has a poor understanding of the problem domain (see e.g. this famous post).

Pair programming is effective if two developers can work together, otherwise it is a disaster. Also, experienced developers may prefer to briefly discuss the most important issues and then code separately: spending many hours discussing lots of details that they both already know can be both boring and a big waste of time.

Going back to code coverage.

I have been able to push back by selling the argument that code
coverage tells us what has definitely not been tested and helps us
focus on those areas.

I think you have to judge from case to case if it is worthwhile to have 100% coverage for a certain module.

Does the module perform some very important and complicated computation? Then I would like to test every single line of code but also write meaningful unit tests (unit tests that make sense in that domain).

Does the module perform some important but simple task like opening a help window when clicking on a button? A manual test will probably be more effective.

The argument from these folks is that the team would then react by
quickly creating low-quality tests, wasting time while adding no
significant quality.

In my opinion they are right: you cannot enforce code quality by only requiring 100% code coverage. Adding more tools to compute coverage and produce statistics will not help either. Rather, you should discuss which parts of the code are more sensitive and should be tested extensively, and which ones are less error-prone (in the sense that an error can be discovered and fixed much more easily without unit tests).

If you push 100% code coverage onto the developers, some will start to write silly unit tests to fulfill their obligations instead of trying to write sensible tests.

how do you allocate guaranteed time without having any measure to it

Maybe it is an illusion that you can measure human intelligence and judgment. If you have competent colleagues and you trust their judgment, you can accept it when they tell you "for this module, increasing the code coverage will bring very little benefit, so let's not spend any time on it", or "for this module we need as much coverage as we can get; we need one extra week to implement sensible unit tests".

So (again, these are my 2 cents): do not try to find a process and set parameters like code coverage that must fit all teams, all projects, and all modules. Finding such a general process is an illusion, and I believe that when you have found one it will be suboptimal.

All true, and I am already in agreement. If you notice, I do not support 100% code coverage. It's about improving its value through the use of better techniques, tools, and processes. It also helps to fully understand the code coverage criteria (most assume this to be line/statement coverage). +1 for your excellent post.
– MickJ Jun 20 '13 at 21:47

"the team would react by quickly creating low-quality tests, wasting time while adding no significant quality"

This is a real risk, not just theoretical.

Code coverage alone is a dysfunctional metric. I learned that lesson the hard way. Once, I emphasized it without balancing metrics or practices in place. Hundreds of tests that catch and mask exceptions, with no assertions, are an ugly thing.

"suggestion for a combination of such code coverage tools and practices/processes to go with them"

As you note, lack of code coverage is readily identifiable as a software quality risk. I teach that code coverage is a necessary, but insufficient, condition for software quality. We have to take a balanced scorecard approach to managing software quality.

I have always found that code coverage is easily susceptible to the Hawthorne Effect. This caused me to ask "why do we have any software metrics at all?", and the answer is usually to provide some high-level understanding of the current state of the project, things like:

"how close are we to done?"

"how is the quality of this system?"

"how complicated are these modules?"

Alas, there will never be a single metric that can tell you how good or bad the project is, and any attempt to derive that meaning from a single number will necessarily oversimplify. While metrics are all about data, interpreting what they mean is a much more emotional/psychological task, and as such probably can't be applied generically across teams of different composition or problems of different domains.

In the case of coverage, I think it is often used as a proxy for code quality, albeit a crude one. The real problem is that it boils down an awfully complicated topic to a single integer between 0 and 100, which will of course be used to drive potentially unhelpful work in an endless quest to achieve 100% coverage. Folks like Bob Martin will say that 100% coverage is the only serious goal, and I can understand why, because anything else just seems arbitrary.

Of course, there are lots of ways to get coverage that don't actually help me understand the codebase - e.g. is it valuable to test toString()? What about getters and setters for immutable objects? A team only has so much effort to apply in a fixed time, and that time always seems to be less than the time required to do a perfect job, so in the absence of a perfect schedule we have to make do with approximations.

A metric I have found useful in making good approximations is Crap4J. It is now defunct, but you can easily port/implement it yourself. Crap4J attempts to relate code coverage to cyclomatic complexity by implying that more complicated code (ifs, whiles, fors, etc.) should have higher test coverage. To me this simple idea really rang true. I want to understand where there is risk in my codebase, and one really important risk is complexity. So using this tool I can quickly assess how risky my codebase is. If it is complicated, the coverage had better go way up. If it isn't, I don't need to waste time trying to get every line of code covered.
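Porting the idea is only a few lines. A sketch of the CRAP score as I recall Crap4J's authors published it (complexity squared times the cube of the uncovered fraction, plus complexity; treat the exact constants as an assumption and check against the original write-up):

```python
# Sketch of the per-method CRAP score: comp^2 * (1 - cov)^3 + comp,
# with coverage expressed as a fraction from 0.0 to 1.0.

def crap_score(complexity, coverage):
    """CRAP metric for one method.

    complexity -- cyclomatic complexity of the method
    coverage   -- fraction of its paths covered, 0.0..1.0
    """
    return complexity ** 2 * (1 - coverage) ** 3 + complexity

# A complicated, untested method scores terribly ...
print(crap_score(complexity=10, coverage=0.0))   # 110.0
# ... full coverage reduces the score to its complexity ...
print(crap_score(complexity=10, coverage=1.0))   # 10.0
# ... and a trivial getter barely registers even when untested.
print(crap_score(complexity=1, coverage=0.0))    # 2.0
```

The shape of the formula is what matters: coverage pressure scales with complexity, so nobody is pushed to chase coverage on trivial code.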

Of course this is but one metric, and YMMV. You have to spend time with it to understand whether it makes sense to you and whether it gives your team a reasonably grok-able feeling of where the project is at.

Thanks for the great suggestion of using cyclomatic complexity to pick code deserving coverage, and for the Crap4J link. I also found a great article about squeezing the awesomeness of Crap4J into Cobertura - schneide.wordpress.com/2010/09/27/…
– MickJ Jul 8 '13 at 20:59

I wouldn't say that going back and covering existing code is the best route forward. I would argue that it makes sense to write covering tests for any new code you write and/or any code you change.

When bugs are found, write a test that fails because of that bug, then fix the bug so that the test turns green. Note in the test's comments which bug it was written for.
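That workflow can be as small as this sketch (the function, the bug, and the ticket number are all hypothetical):

```python
# Hypothetical regression test: a bug report says an account balance
# goes negative when a refund exceeds it. The test is written first
# (it fails against the buggy code), then the fix makes it green, and
# the test stays in the suite forever.

def apply_refund(balance, refund):
    # Fix for bug #4711: the balance must never go below zero.
    return max(balance - refund, 0)

def test_refund_cannot_overdraw_balance():
    # Regression test for bug #4711 (refund larger than balance).
    assert apply_refund(balance=50, refund=80) == 0
    # The ordinary case still works.
    assert apply_refund(balance=100, refund=30) == 70

test_refund_cannot_overdraw_balance()
```

Coverage gained this way is the opposite of gaming the metric: every such test exists because a real defect escaped once.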

The goal is to have enough confidence in your tests that you can make changes without concern for unexpected side effects. Check out Working Effectively with Legacy Code for a good summary of approaches to taming untested code.