How do we use code coverage for Unity?

First and foremost, code coverage is a metric that should be used with caution. It would be a mistake to claim that it measures the quality of a test suite: you can easily write a test suite that doesn’t contain a single assertion, yet produces 100% coverage. What code coverage does give us is insight into the thoroughness of our test suites, and in conjunction with other metrics this is extremely useful and valuable information.

Our first goal has been to see how much of our C++ code is exercised by our current test suites. We evaluated a number of different C++ code coverage tools and eventually chose BullseyeCoverage, for the following reasons:

We love what it measures (see below).

Easy to integrate into our build system

Wide platform support

Licensing and pricing model that fits our needs

Good support quality

What do we measure

Statement coverage

The most widely known code coverage metric is probably statement coverage. We don’t use it, but a small example is still in order:

int* p = NULL;
if (condition) {
    p = &variable;
}
*p = 123;

If “condition” is true we have 100% statement coverage, meaning that every statement has been executed. But is 100% statement coverage really enough to test this code? How do we test it thoroughly? The obvious answer is that we need at least two test cases: one where the condition is true, and another where it is false.

Decision coverage

If the condition in the example above has been evaluated to both ‘true’ and ‘false’, we say that we have 100% decision coverage. Boolean expressions can, however, be more complex than a single variable, and decision coverage does not take this into account. This leads us to condition coverage.

Condition coverage

Consider the following boolean expression:

if (c1 && c2 && c3) {
    statement 1;
} else {
    statement 2;
}

This metric yields 100% condition coverage when each of c1, c2 and c3 has been evaluated to both true and false. Here c1, c2 and c3 are the conditions, while (c1 && c2 && c3) is the decision.

However, it is important to note that 100% condition coverage does not imply 100% decision coverage, and vice versa, as the following example shows:

if (c1 && c2) {
}

In this case the two test vectors c1=true, c2=false and c1=false, c2=true satisfy condition coverage but not decision coverage, because the decision (c1 && c2) never evaluates to true.

Condition/Decision Coverage

BullseyeCoverage measures a combination of condition and decision coverage (abbreviated C/D coverage in the following sections).

Function Coverage

We also measure function coverage, which simply tells us whether each function has been invoked.

How do we use it

Track coverage over time

With C/D and function coverage set up for C++, we track how they change over time, for each test suite and for all test suites combined. An example is shown below:

The X-axis shows revision dates, the Y-axis the coverage percentage.

This lets us notice ups and downs in coverage immediately. Coverage can go down when:

New code without tests has been added

Test suites have been changed (test cases moved to other suites, deleted, etc.)

To identify missing coverage

Once we have the coverage data, it is very easy to see which parts of the Unity codebase are covered by tests:

This report gives a nice coverage overview per area. It is available for each suite, as well as for the aggregated summary report, which we call the ‘overall’ suite.

To optimize test suites without losing coverage

We have various test suites:

Native

Integration

Graphics

Runtime

others

One of our goals moving forward is to build a minimal, fast subset of tests from each suite while preserving high coverage. For native tests this is trivial (they are practically unit tests already), but for the other suites we need to be smart about it, and here code coverage is an extremely valuable input: it tells us exactly which code paths a test exercises, and what we would lose by cutting it.

When converting slow integration tests to fast native tests

When converting selected high-level tests to native C++ unit tests, we can rely on coverage data to ensure that we exercise the same code before and after the conversion.

To improve test and code reviews

Developers and testers can use coverage data as one of the inputs when reviewing code and tests; it may yield some unexpected insights. Our code coverage solution can also be used during manual testing, to see exactly which parts of the codebase have been exercised.

Future steps

Moving ahead, an interesting challenge for us is to build a correspondence between each test and the code it exercises. This will give us some exciting possibilities:

Find tests which exercise the same code. Such tests are definite candidates for review and possibly for elimination

Incremental coverage – how much coverage newly added code has

Analyze changes and run only tests which are related to these changes.

Track coverage for managed code.

We’re sure that having a solid code coverage solution in place will give us some very interesting insights and possibilities. We will share our findings as we move along.

2 comments

From my understanding, boundary-interior coverage is a subset of path testing that limits loops, because a loop can produce an unbounded number of paths. It is definitely a very good metric. One of its disadvantages is that the number of possible paths grows exponentially with the number of conditions. On the other hand, if you keep your code clean – all functions are small, and there are no nested loops or conditions inside loops – C/D coverage is more than enough.
There are also many other advantages to keeping code clean :). So C/D coverage looks like a good compromise for our conditions.

Mutation testing is another interesting tool on the table that is worth thinking about. If you follow the TDD rules, then
“You are not allowed to write any production code unless it is to make a failing unit test pass”, so you always start with a mutant :). See more here: http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd . Also, as part of code reviews we ask the person who wrote the code to make a test fail by modifying the code. That could be called ‘mutation’ :)

There is always an ideal world and a real world, and part of the everyday job is to find a good compromise between them.

That’s a very interesting and insightful article! When you listed the different code coverage criteria, was there a reason why you excluded – for example – boundary-interior coverage? Or mutation testing? At my university those were presented as the “latest hot stuff” in testing, but maybe they have some severe drawbacks our professors didn’t tell us – that’s why I’m asking :)