In the first blog entry of this two-part series on common testing problems, I addressed the fact that testing is less effective, less efficient, and more expensive than it should be. This second post highlights results of an analysis that documents problems that commonly occur during testing. Specifically, this series identifies and describes 77 testing problems organized into 14 categories; lists the potential symptoms by which each can be recognized, its potential negative consequences, and its potential causes; and makes recommendations for preventing these problems or mitigating their effects.

Why Testing is a Problem

A widely cited study for the National Institute of Standards & Technology (NIST) reports that inadequate testing methods and tools annually cost the U.S. economy between $22.2 billion and $59.5 billion, with roughly half of these costs borne by software developers in the form of extra testing and half by software users in the form of failure avoidance and mitigation efforts. The same study notes that between 25 percent and 90 percent of software development budgets are often spent on testing.

Despite the huge investment in testing mentioned above, recent data from Capers Jones shows that the different types of testing are relatively ineffective. In particular, testing typically identifies only one-fourth to one-half of defects, while other verification methods, such as inspections, are typically more effective. Inadequate testing is one of the main reasons that software is typically delivered with approximately 2 to 7 defects per thousand lines of code (KLOC). While this may seem like a negligible number, the result is that major software-reliant systems are being delivered and placed into operation with hundreds or even thousands of residual defects. If software vulnerabilities (such as the CWE/SANS Top 25 Most Dangerous Software Errors) are counted as security defects, the rates are even more troubling.
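To make the arithmetic behind "hundreds or even thousands" concrete, here is a minimal sketch that scales the 2 to 7 defects per KLOC range to a system size. The function name and the two example system sizes are assumptions chosen for illustration; they do not come from the study.

```python
from typing import Tuple


def residual_defects(kloc: float,
                     low_rate: float = 2.0,
                     high_rate: float = 7.0) -> Tuple[float, float]:
    """Return a (low, high) estimate of delivered defects.

    kloc is the system size in thousands of lines of code. The default
    rates are the 2 to 7 defects per KLOC range quoted in the text.
    """
    return kloc * low_rate, kloc * high_rate


# Hypothetical system sizes, used only to illustrate the scale.
for size_kloc in (50, 500):
    low, high = residual_defects(size_kloc)
    print(f"{size_kloc} KLOC: roughly {low:.0f} to {high:.0f} residual defects")
```

Under these assumed sizes, a 50 KLOC system lands in the hundreds and a 500 KLOC system in the thousands.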

Overview of Different Types of Testing Problems

The first blog entry in this series covered the following general types of problems that are not restricted to a single kind of testing:

test planning and scheduling problems

stakeholder involvement and commitment problems

management-related testing problems

test organization and professionalism problems

test process problems

test tools and environments problems

test communication problems

requirements-related testing problems

The remainder of this second post focuses on the following six categories of problems, each restricted to one of the following types of testing:

unit testing

integration testing

specialty engineering testing

system testing

system of systems testing

regression testing

Unit testing problems primarily occur during the testing of individual software modules, typically by the same person who developed them in the first place. Design volatility can cause excessive iteration of the unit test cases, drivers, and stubs. Unit testing can suffer from a conflict of interest because developers naturally want to demonstrate that their software works correctly, while testers should seek to demonstrate that the software fails. Finally, unit testing can be poorly and incompletely performed because the developers think it is relatively unimportant.
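For readers less familiar with the terms, the following is a minimal sketch of a unit under test together with a stub, a few test cases, and a test driver. All names here (StubThermometer, is_overheating, run_all) are hypothetical and exist only for this illustration. The last test case is written from the tester's point of view: it tries to make the unit fail rather than confirm that it works.

```python
class StubThermometer:
    """Stub that stands in for a real sensor the unit depends on."""

    def __init__(self, reading_celsius):
        self.reading_celsius = reading_celsius

    def read_celsius(self):
        return self.reading_celsius


def is_overheating(thermometer, limit_celsius=90.0):
    """Unit under test: report whether the reading exceeds the limit."""
    return thermometer.read_celsius() > limit_celsius


def test_reading_below_limit():
    assert not is_overheating(StubThermometer(89.9))


def test_reading_exactly_at_limit():
    # Boundary case: the limit itself is not treated as overheating.
    assert not is_overheating(StubThermometer(90.0))


def test_reading_above_limit():
    assert is_overheating(StubThermometer(90.1))


def test_missing_reading_is_rejected():
    # Negative test: a missing reading must raise, not pass silently.
    try:
        is_overheating(StubThermometer(None))
    except TypeError:
        return
    raise AssertionError("a missing reading should not be accepted")


def run_all():
    """Minimal test driver: run each test, return names of failed tests."""
    tests = [
        test_reading_below_limit,
        test_reading_exactly_at_limit,
        test_reading_above_limit,
        test_missing_reading_is_rejected,
    ]
    failures = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failures.append(test.__name__)
    return failures


print("failed tests:", run_all())
```

If the design of the sensor interface changes, the stub, the test cases, and the driver all have to change with it, which is the iteration cost that design volatility imposes.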

Integration testing problems occur during the testing of a set of units integrated into a component, a set of components into a subsystem, a set of subsystems into a system, or a set of systems into a system of systems. Integration testing concentrates on verifying the interactions between the parts of the whole. One potential problem is the difficulty of localizing defects to the correct part once the parts have been integrated. A second potential problem is a lack of adequate built-in test software that could help locate the cause of any failed test. Finally, a third potential problem is that the correct parts (or the correct versions of them) may not be available to integrate.

Specialty engineering testing problems occur when too little specialized testing of the various quality characteristics and attributes takes place. More specifically, these problems involve inadequate capacity, concurrency, performance, reliability, robustness (e.g., error and fault tolerance), safety, security, and usability testing. While these are the most commonly occurring types of specialty engineering testing problems, other types may also exist depending on which quality characteristics and attributes are important (and thus on the types of quality requirements that have been specified).

System testing problems occur during system-level testing and often cannot be eliminated because of the very nature of system testing. At best, recommended solutions can only mitigate these problems. It is hard to test an integrated system’s robustness (support for error, fault, and failure tolerance) because of the challenges of triggering system-internal exceptions and tracing their handling. System-level testing can be hard because temporary test hooks have typically been removed so that one is testing the actual system to be delivered. As with integration testing, demonstrating that system tests provide adequate test coverage is hard because reaching specific code (e.g., fault tolerance paths) using only inputs to the black-box system is hard. Finally, there is often inadequate mission-thread-based testing of end-to-end capabilities because system testing is often performed using use-case-based testing, which is typically restricted to interactions with only a single, primary, system-external actor.
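The difficulty of triggering system-internal exceptions can be illustrated with a small fault-injection sketch. Everything in it (DiskFullError, EventLogger, failing_write) is a hypothetical example rather than part of any real system. The injected write function is a test hook: while the hook exists, a test can force the fault-tolerance path to run, but once such hooks are removed from the delivered system, the same path can only be reached by actually provoking the fault from outside.

```python
class DiskFullError(Exception):
    """Hypothetical internal fault that is hard to provoke from outside."""


class EventLogger:
    """Component with a fault-tolerance path that tests need to reach."""

    def __init__(self, write):
        # `write` is an injected dependency; in a test it acts as a hook.
        self._write = write
        self.dropped = 0

    def log(self, message):
        """Try to persist a message; degrade gracefully if storage fails."""
        try:
            self._write(message)
            return True
        except DiskFullError:
            # Fault-tolerance path: count the loss and keep running.
            self.dropped += 1
            return False


def failing_write(message):
    """Test hook that simulates a full disk on every write."""
    raise DiskFullError("simulated full disk")


# Normal path: the injected write succeeds.
stored = []
healthy = EventLogger(stored.append)
assert healthy.log("startup") is True

# Fault path: only reachable here because the fault is injected.
degraded = EventLogger(failing_write)
assert degraded.log("startup") is False
assert degraded.dropped == 1
```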

System-of-Systems (SoS) testing problems are often the result of SoS governance problems (i.e., everything typically occurs at the system level rather than the SoS level). For example, SoS planning may not adequately cover SoS testing. Often, no organization is made explicitly responsible for SoS testing. Funding is often focused at the system level, leaving little or no funding for SoS testing. Scheduling is typically performed only at the individual system level, and system-level schedule slippages make it hard to schedule SoS testing.

SoS requirements are also often lacking or of especially poor quality, making it hard to test the SoS against its requirements. The individual system-level projects rarely allocate sufficient resources to support SoS testing. Defects are typically tracked only at the system level, making it difficult to address SoS-level defects. Finally, there tends to be a lot of finger-pointing and shifting of blame when SoS testing problems arise and SoS testing uncovers SoS-level defects.

Note that a SoS almost always consists of independently governed systems that are developed, funded, and scheduled separately. SoS testing problems therefore do not refer to systems that are developed by a prime contractor or integrated by a system integrator, nor do they refer to subsystems developed by subcontractors or vendors.

Regression testing problems occur during the performance of regression testing, both during development and during maintenance. Often, there is insufficient automation of regression testing, which makes regression testing too labor-intensive to perform repeatedly, especially when using an iterative and incremental development cycle. This overhead is one of the reasons that regression testing may not be performed as often as it should be.

When regression testing is performed, its scope is often too localized because software developers assume that changes in one part of the system will not propagate to other parts and cause faults and failures there. Low-level regression testing is commonly easier to perform than higher-level regression testing, which results in an over-reliance on low-level regression tests. Finally, the test resources created during development may not be delivered and thus may not be available to support regression testing during maintenance.
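As a rough illustration of what automating a regression test can look like, the sketch below replays a recorded baseline of inputs and expected outputs against the current code and reports every mismatch. The function under test (price_with_tax), the baseline values, and the helper run_regression are all assumptions made up for this example. Once such a check exists, rerunning it after every change costs almost nothing, which is the property that manual regression testing lacks.

```python
def price_with_tax(net_cents, tax_percent):
    """Hypothetical function under test: gross price, rounded half up."""
    return net_cents + (net_cents * tax_percent + 50) // 100


# Baseline recorded from a previously accepted version (illustrative values).
BASELINE = {
    (1000, 20): 1200,
    (999, 7): 1069,
    (1, 50): 2,
    (0, 20): 0,
}


def run_regression(func, baseline):
    """Replay every baseline case and return (inputs, expected, actual) mismatches."""
    mismatches = []
    for inputs, expected in baseline.items():
        actual = func(*inputs)
        if actual != expected:
            mismatches.append((inputs, expected, actual))
    return mismatches


def price_with_tax_truncating(net_cents, tax_percent):
    """A changed version that drops the rounding step (a seeded regression)."""
    return net_cents + (net_cents * tax_percent) // 100


print("current version:", run_regression(price_with_tax, BASELINE))
print("changed version:", run_regression(price_with_tax_truncating, BASELINE))
```

The changed version still agrees with the baseline on the round-number cases, so a regression suite limited to those cases would miss the defect. This is a small-scale version of the overly localized scope described above.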

Addressing Test-type Specific Problems

For each testing problem described above, I have documented several types of information useful for understanding the problem and implementing a solution. This information will appear in an upcoming SEI technical report. As an example of what will appear in this report, the testing problem “Over-reliance on COTS Testing Tools” has been documented with the information described below.

Description. Too few of the regression tests are automated.

Potential symptoms. Many or even most of the tests are being performed manually.

Potential consequences.

Manual regression testing takes so much time and effort that it is not done.

If performed, regression testing is rushed, incomplete, and inadequate to uncover a sufficient number of defects.

Testers are making an excessive number of mistakes while manually performing the tests.

Defects introduced into previously tested subsystems/software while making changes may remain in the operational system.

Benefits of Using the Catalog of Common Testing Problems

This analysis of commonly occurring testing problems, together with the recommended solutions, can be used as training material for learning how to avoid, identify, understand, and mitigate testing problems. Like anti-patterns, these problem categories can be used to improve communication between testers and testing stakeholders. The list can also be used to categorize problem types for metrics collection. Finally, it can be used as a checklist when

producing test plans and related documentation

evaluating contractor proposals

evaluating test plans and related documentation (quality control)

evaluating the as-performed test process (quality assurance)

identifying test-related risks and their mitigation approaches

Future Work

The framework of testing problems outlined in this series is the result of more than three decades of experience in assessments, my involvement in numerous projects, and discussions with testing subject matter experts. Even after all this time, however, several unanswered questions remain that I intend to study in the future:

Probabilities. Which of these problems occur most often? What is the probability distribution of these problems? Which problems tend to cluster together? Do different problems tend to occur with different probabilities in different application domains (such as commercial versus governmental versus military, and web versus information technology versus embedded systems)?

Severities. Which problems have the largest negative consequences? What are the probability distributions of harm caused by each problem?

Risk. Based on the above probabilities and severities, which of these problems cause the greatest risks? Given these risks, how should one prioritize the identification and resolution of these problems?
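Once such data exist, the prioritization question could be approached with the conventional definition of risk as probability multiplied by severity. The sketch below is only that: a sketch under stated assumptions. The probability and severity values are placeholders invented for illustration, not survey results, and the names ProblemEstimate, risk, and prioritize are hypothetical.

```python
from typing import List, NamedTuple


class ProblemEstimate(NamedTuple):
    name: str
    probability: float  # assumed chance the problem occurs on a project (0 to 1)
    severity: int       # assumed harm if it occurs, on a 1 to 10 scale


def risk(problem: ProblemEstimate) -> float:
    """Risk as expected harm: probability multiplied by severity."""
    return problem.probability * problem.severity


def prioritize(problems: List[ProblemEstimate]) -> List[ProblemEstimate]:
    """Order problems from highest to lowest risk."""
    return sorted(problems, key=risk, reverse=True)


# Placeholder numbers only; real values would have to come from the study.
ESTIMATES = [
    ProblemEstimate("No organization responsible for SoS testing", 0.25, 10),
    ProblemEstimate("Insufficient regression test automation", 0.5, 8),
    ProblemEstimate("Unit testing conflict of interest", 0.75, 4),
]

for problem in prioritize(ESTIMATES):
    print(f"{risk(problem):4.1f}  {problem.name}")
```

With these placeholder values, the most severe problem is not the highest risk, because it is assumed to occur less often. That is why both distributions are needed before the problems can be ranked.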

I am interested in turning my work on this topic thus far into an industry survey and performing a formal study to answer these questions. I welcome your feedback on my work to date in the comments section below.