11 Test Automation Metrics and their Pros & Cons

Test automation is vital for maintaining software quality in a fast-paced Agile development environment.

However, test automation is also a huge investment. Without a way to measure the automation and its efficacy, that investment may well go to waste. Test automation metrics and KPIs provide a valuable way to determine your return on any investment, understand which parts of your test automation are and aren’t working, and improve on them.

This article explains what to watch out for in metrics and gives you insight into the pros and cons of 11 different test automation metrics.

Challenges with Metrics

What do the metrics tell you? Test automation metrics can give a partial or misleading picture about automation and how effective it is.

Proper analysis. To get any benefit from a metric, you need to analyse it carefully to make judgements about software quality.

Irrelevant results. Metrics often include irrelevant results which skew the measurements. Examples include test failures from application changes, or data issues.

Measuring integration and acceptance tests. It’s easy to get stats about unit tests, but more complex tests require an effort to track effectively. In reality, many testing teams have good visibility of unit tests but poor visibility of other types of tests.

11 Test Automation Metrics: Pros and Cons

1. Total test duration

Total test duration measures how long it takes to run the automated tests.

Pros: Test duration is a significant metric because tests are commonly a bottleneck in the agile development cycle. With very frequent iterations to software, if tests don’t run fast, teams won’t run them at all.

Cons: Total test time doesn’t tell you anything about the quality of the tests performed, so it is not a measure of software quality.

2. Unit test coverage

Unit test coverage measures how much of the software code is covered by unit tests.

Pros: The unit test coverage metric gives a rough approximation of how well tested a software codebase is.

Cons: A unit test is just that: a test of a single unit. All the units in a car might work perfectly, but that doesn’t guarantee the car will start. In software too, integration and acceptance tests are crucial to ensure the software is functional, and unit test coverage does not take those tests into account. Furthermore, in most programming languages, unit test coverage is measured only for the code that is loaded into memory during the test run. In many cases, a meaningful portion of the code is never loaded and therefore is not inspected, so a reported 100% might not represent the real codebase.
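
To make the last point concrete, here is a minimal sketch in Python. The module names, line counts, and the `ModuleStats` type are hypothetical and chosen only for illustration; real coverage tools report these figures in their own formats. It compares coverage computed over loaded code only with coverage computed over the whole codebase.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ModuleStats:
    name: str
    executable_lines: int
    covered_lines: int
    loaded: bool  # True if the module was loaded during the test run


def coverage_percent(modules: Iterable[ModuleStats]) -> float:
    """Covered lines as a percentage of executable lines."""
    modules = list(modules)
    total = sum(m.executable_lines for m in modules)
    if total == 0:
        return 0.0
    covered = sum(m.covered_lines for m in modules)
    return 100.0 * covered / total


# Hypothetical numbers: "reports" is never loaded by any test.
modules = [
    ModuleStats("billing", executable_lines=200, covered_lines=200, loaded=True),
    ModuleStats("reports", executable_lines=300, covered_lines=0, loaded=False),
]

reported = coverage_percent(m for m in modules if m.loaded)  # loaded code only
actual = coverage_percent(modules)                           # whole codebase

print(f"reported: {reported:.1f}%, actual: {actual:.1f}%")
```

With these made-up figures the tool would report 100% while only 40% of the codebase is exercised, which is why it is worth checking what the denominator of a coverage figure includes.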

3. Path coverage

The path coverage metric is a measurement of the linearly independent paths covered by the tests.

Pros: Path coverage requires very thorough testing that improves the quality of the testing process. Every statement in the program executes at least once with full path coverage.

Cons: The number of possible paths grows exponentially with the number of branches. For example, adding one more independent if statement to a function that already has 11 changes the number of possible paths from 2048 (2^11) to 4096 (2^12).
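
The arithmetic behind that example can be checked with a short Python sketch. It assumes the simplest case, a function made of sequential, independent if statements with no loops, so each statement doubles the number of paths. The function names are illustrative only.

```python
from itertools import product


def path_count(independent_ifs: int) -> int:
    """Paths through a function made of sequential, independent ifs."""
    if independent_ifs < 0:
        raise ValueError("independent_ifs must be non-negative")
    return 2 ** independent_ifs


def enumerate_paths(independent_ifs: int) -> int:
    """Count the same paths by brute force: one True/False choice per if."""
    return sum(1 for _ in product((True, False), repeat=independent_ifs))


print(path_count(11), path_count(12))
```

Nested or dependent conditions produce fewer paths than this upper bound, and loops can produce far more, so treat the formula as a rough guide to how quickly full path coverage becomes impractical.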

4. Requirements coverage / test cases by requirement

Requirements coverage shows what features are tested, and how many tests are aligned with a user story or requirement.

Pros: This is a very important measure of the maturity of test automation, because it tracks how many of the features delivered to customers are covered by automation.

Cons: Requirements coverage is a vague metric which is difficult to quantify and difficult to measure on an ongoing basis. A test connected to a requirement might verify only a portion of the functionality, and actually provide very little value.

5. Percentage of tests passed or failed

This metric simply counts the number of tests that have recently passed or failed, as a percentage of total tests planned to run.

Pros: Counting the number of tests passed or failed gives an overview of testing progress. You can create a bar graph that shows passed test cases, failed tests, and tests that haven’t been run yet. You can compare figures across different releases and different days.

Cons: Counting test cases passed doesn’t say anything about the quality of those tests. For example, a test might pass because it checks a trivial condition, or because of an error in the test code, while the software itself is not functioning as desired. Also, this metric does not tell us what percentage of the software is actually covered by tests.
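
As a rough illustration, the percentages behind such a bar graph could be computed as follows. This is a sketch in Python that assumes each executed test reports either "passed" or "failed" and that the number of planned tests is known; the outcome labels and figures are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def result_percentages(outcomes: Iterable[str], planned_total: int) -> Dict[str, float]:
    """Passed, failed, and not-yet-run tests as percentages of planned tests."""
    if planned_total <= 0:
        raise ValueError("planned_total must be positive")
    counts = Counter(outcomes)
    passed = counts["passed"]
    failed = counts["failed"]
    not_run = planned_total - passed - failed
    if not_run < 0:
        raise ValueError("more results than planned tests")
    return {
        "passed": 100.0 * passed / planned_total,
        "failed": 100.0 * failed / planned_total,
        "not_run": 100.0 * not_run / planned_total,
    }


# Hypothetical run: 20 tests planned, 15 executed so far.
outcomes = ["passed"] * 12 + ["failed"] * 3
summary = result_percentages(outcomes, planned_total=20)
print(summary)
```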

6. Number of defects found in testing

A measure of the number of valid defects encountered during the test execution phase.

Pros: The number of defects found is a simple measure of “how bad” a software release is compared to previous releases. It is also useful for predictive modelling, in which you can estimate the residual defects expected under certain coverage levels.

Cons: This is a highly misleading metric, which can also be easily manipulated. A higher number of bugs might be a result of more comprehensive testing, but it could also mean the opposite. For example, a testing team rewarded by this metric might be driven to report many defects that have no major significance.

7. Percentage automated test coverage of total coverage

This metric reports on the percentage of test coverage achieved by automated testing, as compared to manual testing. You calculate it by dividing automated coverage by total coverage.

Pros: This metric can be used by management to assess the progress of a test automation initiative.

Cons: A larger percentage of automated tests can hide test quality issues. Are the new automated tests as effective in discovering defects as the old manual tests?
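
The calculation described above, automated coverage divided by total coverage, might look like this in Python. The unit of coverage is an assumption here: the sketch counts covered requirements, but covered test cases or features would work the same way. The figures are hypothetical.

```python
def automated_share(automated_coverage: float, total_coverage: float) -> float:
    """Automated coverage as a percentage of total (automated + manual) coverage."""
    if total_coverage <= 0:
        raise ValueError("total_coverage must be positive")
    if not 0 <= automated_coverage <= total_coverage:
        raise ValueError("automated_coverage must be between 0 and total_coverage")
    return 100.0 * automated_coverage / total_coverage


# Hypothetical: 60 requirements are covered by some test, 45 of them by automation.
share = automated_share(automated_coverage=45, total_coverage=60)
print(f"{share:.1f}% of covered requirements are covered by automated tests")
```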

8. Test execution

Test execution is a common metric displayed by test automation tools, which shows total tests executed as part of a build.

Pros: Test execution is a crucial statistic for understanding whether automated tests ran as expected, and for aggregating their results.

Cons: Because tests can have false positives and false negatives, the fact that tests ran, or that a certain percentage passed, does not guarantee a quality release.

9. Useful vs irrelevant results

This is a metric that compares useful results from automated tests against irrelevant results. The distinction between useful and irrelevant results is as follows:

Useful results: Either a test pass or a test failure. The test failure must be caused by a defect.

Irrelevant results: Test failures resulting from changes to the software or problems with the testing environment.

Pros: Irrelevant results highlight factors that reduce the efficiency of automation from an economic standpoint. You can compare irrelevant results with useful results with reference to a defined acceptable level. When the rate of irrelevant results is too high, you can investigate and get a better understanding of what has gone wrong, in order to improve automated testing.

Cons: This metric does not tell us anything about software quality. It is only helpful for understanding problems in the tests themselves.
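
A small Python sketch of this comparison follows. It assumes each result has already been triaged into a pass or a failure with a cause, and it treats any failure not caused by a defect as irrelevant. The cause labels, the sample data, and the 10% acceptable level are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

Result = Tuple[bool, Optional[str]]  # (passed, failure cause or None)


def is_useful(passed: bool, failure_cause: Optional[str]) -> bool:
    """A pass, or a failure caused by a defect, counts as a useful result."""
    return passed or failure_cause == "defect"


def irrelevant_rate(results: Iterable[Result]) -> float:
    """Irrelevant results as a percentage of all results."""
    results = list(results)
    if not results:
        return 0.0
    irrelevant = sum(1 for passed, cause in results if not is_useful(passed, cause))
    return 100.0 * irrelevant / len(results)


# Hypothetical triaged results from one run of 20 tests.
results = (
    [(True, None)] * 16
    + [(False, "defect")]
    + [(False, "application_change")] * 2
    + [(False, "environment")]
)

ACCEPTABLE_IRRELEVANT_RATE = 10.0  # assumed threshold, in percent

rate = irrelevant_rate(results)
needs_investigation = rate > ACCEPTABLE_IRRELEVANT_RATE
print(f"irrelevant: {rate:.1f}%, investigate: {needs_investigation}")
```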

10. Defects in production

Many agile teams use this metric as the “bottom line” of automated testing efficiency – how many serious problems were found in production after the software was released.

Pros: Defects in production can expose holes in the test automation suite, and you can add automated tests that will help catch similar defects in future.

Cons: Many serious problems do not manifest themselves as defects in production. Also, it is undesirable for defects to manifest in production at all. This metric is a “last resort”, but teams should aim to discover defects much earlier in their development cycle.

11. Percentage of broken builds

In an agile development process, automated tests can “break” the build if they fail. This metric measures how many builds were broken because automated tests failed, and by extension, the quality of code committed by engineers to the shared codebase.

Pros: A low percentage of broken builds is often taken as a signal of sound engineering practices and code quality. A decreasing percentage of broken builds indicates engineers are taking more responsibility for the accuracy and stability of their code.

Cons: Focusing on this metric can lead to “finger pointing” and a reluctance of developers to commit to the main branch. This causes defects to manifest themselves much later in the development cycle, with negative consequences.
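
To track the trend, the percentage can be computed per period, as in this Python sketch. It assumes your CI system can tell you, for each build, whether it was broken by failing automated tests; the two months of data are made up.

```python
from typing import Sequence


def broken_build_percent(broken_by_tests: Sequence[bool]) -> float:
    """Percentage of builds that were broken by failing automated tests."""
    if not broken_by_tests:
        return 0.0
    return 100.0 * sum(broken_by_tests) / len(broken_by_tests)


# Hypothetical build history: True means the build was broken by test failures.
last_month = [True] * 4 + [False] * 16
this_month = [True] * 2 + [False] * 23

previous = broken_build_percent(last_month)
current = broken_build_percent(this_month)
print(f"last month: {previous:.1f}%, this month: {current:.1f}%")
```

Reporting the figure per team or per period, rather than per engineer, is one way to keep the trend visible without encouraging the finger pointing described above.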

A Complete Picture of Automated Tests

These 11 metrics represent a small sample of many possible automated test metrics. As we have seen throughout the discussion, while metrics are essential for tracking and understanding test automation, each of them shows an incomplete and sometimes misleading picture.