Test Smarter, Not Harder – Part 1

This series is a reprint of an article by Scott Sehlhorst, written for developer.* in March 2006. A recent article on dailytech about “new” methods for software testing points to some very interesting research by the NIST (the National Institute of Standards and Technology) Information Technology Lab. We’ve split the original article into three articles to be more consistent in length with other Tyner Blain articles.

This is part 1 of a three-part series.

Article Overview

Part 1 of this article explores the problem of getting good testing coverage of complex software.

Introduction

In this article we focus on a method to minimize the cost of quality. Regardless of the method you use for calculating the cost of poor quality, minimizing the cost of assuring good quality is relevant to you. We will discuss techniques for minimizing the cost of creating and maintaining an automated regression testing suite for a software application.

Automated testing changed the software development process as much as the assembly line changed the manufacturing industry. We are assuming in this article that we already have automated testing as a part of our development process.

We open with a discussion of the problem. We will then discuss approaches to automated testing that are progressively better at minimizing costs. We hope you enjoy the article and would love to hear back from you with feedback and contributions.

The problems

Here is the sound-bite version of the problems this article is designed to address:

We aren’t meeting our quality goals.

We have a development team that regularly releases new versions of our software.

We have a suite of automated tests we run on our software.

Our automated test suite may be large, but our product is complex. There’s no way to test every possible scenario. Bugs are still getting released to the field.

Not solving the problems

Two solutions we have to consider are to test nothing and to test everything. We would consider testing nothing if we can’t afford to test the software. When people don’t appreciate the complexities of testing or the limitations of automated testing, they are inclined to want to test everything. Testing everything is much easier said than done.

We can’t afford to test it

I was able to attend a training session with Kent Beck a few years ago. I was also honored to be able to enjoy a large steak and some cold beer with him that night after the training. When asked how he responds to people who complain about the cost of quality, Kent told us he has a very simple answer, “If testing costs more than not testing then don’t do it.” I agree.

There are few situations where the cost of quality exceeds the cost of poor quality. These are situations where the needed infrastructure, test-development time, and maintenance costs outweigh the expected cost of having a bug. The expected cost is the likelihood (as a percentage) of the bug manifesting in the field, multiplied by the cost of dealing with the bug.

The techniques described in this article are designed to reduce the cost of quality, to make it even less likely that not testing is the best answer.

Just test everything – it’s automated

We’ve all been on at least one project or team where our manager has said:

“I demand full testing coverage of the software. Our policy is zero tolerance. We won’t have bad quality on my watch.”

What we struggle with here is the lack of appreciation for what it means to have full coverage or any other guarantee of a particular defect rate.

There are no absolutes in a sufficiently complex system – but that’s ok. There are statistics, confidence levels, and risk-management plans. As engineers and software developers, our brains are wired to deal with the expected, likely, and probable futures. We have to help our less-technical brethren understand these concepts – or at least put them in perspective.

We may get asked “Why can’t we just test every combination of inputs to make sure we get the right outputs? We have an automated test suite – just fill it up and run it!” And we need to resist the urge to respond by saying “Monkeys with typewriters will have completed the works of Shakespeare before we finish a single run of our test suite!”

Complexity leads to futility

Consider a web page for customizing a laptop purchase. If you’ve never configured a laptop online before, take a look at Dell’s “customize it” page for an entry-level laptop.

The web page presents eleven questions to the user that have from two to seven responses each. Specifically, each decision the user has to make presents (2,2,2,2,2,3,2,2,3,4,7) choices. This is a simple configuration problem. The number of possible laptop configurations that could be requested by the user is the product of all of the choices. In this very simple page, there are 32,256 possibilities. The page for customizing Dell’s high end laptop at the time of this writing has a not-dissimilar set of controls with more choices in each control – (3,3,3,2,4,2,4,2,2,3,7,4,4). The user of this page can request any of 2,322,432 different laptop configurations! If Dell were to add one more control presenting five different choices, there would be over ten million possible combinations!
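The configuration counts above are just the product of the number of choices for each question. A quick Python check, using the choice lists from the text (the helper name `configuration_count` is ours, not from the original article):

```python
from math import prod

# Number of choices per question on each customization page, as listed in the text.
entry_level = (2, 2, 2, 2, 2, 3, 2, 2, 3, 4, 7)
high_end = (3, 3, 3, 2, 4, 2, 4, 2, 2, 3, 7, 4, 4)


def configuration_count(choices):
    """Every combination of answers is a distinct configuration, so multiply the choice counts."""
    return prod(choices)


print(configuration_count(entry_level))      # 32256
print(configuration_count(high_end))         # 2322432
print(configuration_count(high_end + (5,)))  # 11612160 (one more five-way control)
```

Because the count is a product, every new control multiplies the size of an exhaustive test suite rather than adding to it.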

We could automate a test suite that tries all 2,322,432 combinations for the high-end laptop, but even if every test took only one tenth of a second to run, a single run of the suite would take over 64 hours! And Dell changes its product offering in less time than that.

If we use a server farm to distribute the test suite across ten machines, we could run it in about six and a half hours. Ignoring the fact that we would be running this type of test for each customization page Dell has, six and a half hours is not unreasonable.
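The run-time estimates follow from multiplying the number of combinations by the per-test time. This sketch assumes, as the text does, a fixed 0.1 seconds per test. It also assumes the suite splits perfectly evenly across machines with no coordination overhead, which a real server farm will not quite achieve. `suite_hours` is a hypothetical helper.

```python
SECONDS_PER_TEST = 0.1      # optimistic per-test time from the text
COMBINATIONS = 2_322_432    # high-end laptop configurations


def suite_hours(combinations: int, seconds_per_test: float, machines: int = 1) -> float:
    """Wall-clock hours for one full run, assuming a perfectly even split with no overhead."""
    return combinations * seconds_per_test / machines / 3600


print(round(suite_hours(COMBINATIONS, SECONDS_PER_TEST), 1))               # 64.5
print(round(suite_hours(COMBINATIONS, SECONDS_PER_TEST, machines=10), 1))  # 6.5
```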

Validating the more than two million results is where the really big problem is waiting for us. We can’t rely on people to manually validate all of the outputs – it is just too expensive. We could write another program that inspects those outputs and evaluates them using a rules-based system (“If the user selects 1GB of RAM, then the configuration must include 1GB of RAM” and “The price for the final system must be adjusted by the price-impact of 1GB of RAM relative to the base system price for this model.”).
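To make the idea of a rules-based inspector concrete, here is a minimal Python sketch of the two example rules. Everything in it is hypothetical: the `Request`/`Result` fields, the model name `"entry"`, and the prices are invented for illustration and do not reflect Dell’s actual system. Note that the pricing tables are exactly the part that would need updating every time the product offering changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical data model: field names and prices are illustrative only.
@dataclass
class Request:
    model: str   # which laptop page the user configured
    ram_gb: int  # RAM the user selected


@dataclass
class Result:
    ram_gb: int   # RAM in the configuration the system produced
    price: float  # final price the system quoted


# Pricing data the rules depend on; this must be maintained as the offering changes.
BASE_PRICE = {"entry": 800.00}
RAM_PRICE_IMPACT = {("entry", 1): 0.00, ("entry", 2): 100.00}

Rule = Callable[[Request, Result], Optional[str]]


def ram_matches(req: Request, res: Result) -> Optional[str]:
    """If the user selects N GB of RAM, the configuration must include N GB of RAM."""
    if res.ram_gb != req.ram_gb:
        return f"requested {req.ram_gb}GB RAM, configuration has {res.ram_gb}GB"
    return None


def price_reflects_ram(req: Request, res: Result) -> Optional[str]:
    """Final price must equal the model's base price plus the price impact of the selected RAM."""
    expected = BASE_PRICE[req.model] + RAM_PRICE_IMPACT[(req.model, req.ram_gb)]
    if abs(res.price - expected) > 0.005:  # tolerate sub-cent floating-point noise
        return f"expected price {expected:.2f}, got {res.price:.2f}"
    return None


RULES: List[Rule] = [ram_matches, price_reflects_ram]


def validate(req: Request, res: Result, rules: List[Rule] = RULES) -> List[str]:
    """Apply every rule to one test output and collect violations (empty list means pass)."""
    violations = []
    for rule in rules:
        message = rule(req, res)
        if message is not None:
            violations.append(message)
    return violations


print(validate(Request("entry", 2), Result(ram_gb=2, price=900.00)))  # []
print(validate(Request("entry", 2), Result(ram_gb=1, price=800.00)))
# ['requested 2GB RAM, configuration has 1GB', 'expected price 900.00, got 800.00']
```

Even this toy version shows where the cost lands: the checking loop is trivial, but every rule and every row of pricing data is something a person has to write and keep current.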

There are some good rules-based validation tools out there, but they are either custom software, or so general as to require a large investment to make them applicable to a particular customer. With a rules-based inspection system, we have the cost of maintaining the rules.

The validation rules will have to be updated often, because Dell regularly changes the way it positions, configures, and prices its laptops.

Since we aren’t Dell, we don’t have the scale (tens of billions of dollars of revenue) to justify this level of investment. The bottom line for us is that we can’t afford to exhaustively test every combination.

Article Overview

Part 1 of this article explores the problem of getting good testing coverage of complex software.

Part 2 of this article discusses solution approaches (including those identified in the NIST research).

Part 3 of this article explores an approach to improving on the “best” solution identified in part 2.
