Short notes and essays about stuff that interests me (mostly technical stuff).

Saturday, April 25, 2015

On testing strategies, and end-to-end testing

The straightforwardly-named Google Testing Blog is an instance of the "group blog" category, where a collection of people, all of them Google employees (as far as I know), publish articles about the art and science of software testing.

I read the Google Testing Blog faithfully, partly because I'm rather obsessive about software testing.

Computer programming is a profession that appeals to and rewards the obsessive, and within that broader group of people, the still more obsessive sorts like me often obsess about testing.

Computer programmers obsess about testing the way that car lovers obsess about oil changes, the way that NBA athletes obsess about free throws, the way that sushi chefs obsess about knife selection. Testing is a tool in the programmer's toolkit, but when you see it wielded with experience and training, it is an amazingly powerful tool.

Testers can invest their time in writing many types of automated tests, including unit tests, integration tests, and end-to-end tests, but this strategy invests mostly in end-to-end tests that verify the product or service as a whole. Typically, these tests simulate real user scenarios.

I'm not sure where the author came up with this strawman, frankly. In all my decades of professional software development, it's been a long, long time since I've been around anyone who's suggested that we invest "mostly in end-to-end tests that verify the product or service as a whole." However, I'm sure there could be such people, and indeed we see that there are entire books on the subject (of course: there are entire books on any subject).

Anyway, back to the Google Testing Blog. The author then proceeds to relate a "composite sketch based on a collection of real experiences familiar to both myself and other testers."

The article uses these "real experiences" to demolish the idea of end-to-end testing with a scenario so bizarre, so fanciful, so implausible that I can hardly believe it:

Let's assume the team already has some fantastic test infrastructure in place. Every night:

The latest version of the service is built.

This version is then deployed to the team's testing environment.

All end-to-end tests then run against this testing environment.

An email report summarizing the test results is sent to the team.

The deadline is approaching fast as our team codes new features for their next release. To maintain a high bar for product quality, they also require that at least 90% of their end-to-end tests pass before features are considered complete. Currently, that deadline is one day away:

Days Left | Pass % | Notes
1 | 5% | Everything is broken! Signing in to the service is broken. Almost all tests sign in a user, so almost all tests failed.

Uhm, what?

This is wrong on so many levels that it's hard to know where to start.

Did any of these tests ever pass?

When was the last time they passed?

Did they all pass with 2 days left?

What did the team do on the day when the "email report summarizing the test results" first reported that "almost all tests failed"?

Whatever went wrong with this project, though, one thing is very clear to me:

The testing strategy is not the problem here.

Any team that allowed itself to think it was at a point where the "deadline is one day away" and yet "almost all tests failed" is so poorly managed, so inexperienced, so lacking in common sense that it isn't going to solve its problems by blaming the testers or their test strategy.

Tests are one barometer of project progress, but there are many other metrics that any successful software development project uses, combined with that innate sense that experienced software developers acquire that tells them just how close they are to something that is actually ready.

And if you choose to ignore the information that's available to you, that's your fault. If the testers had followed a different strategy, and produced a different set of tests, the team could just as well have ignored that data entirely, too.
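The nightly report in the scenario already carries enough information to answer the questions above, provided somebody actually looks at it. As a minimal sketch (the `NightlyRun` record, the function names, and every number except the final 5% are assumptions made up for illustration, not anything taken from the article), tracking the pass rate over time shows both the last night the suite met the 90% bar and the first night it fell below it:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NightlyRun:
    """One row of a hypothetical nightly email report."""
    days_left: int
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # Treat an empty run as 0% rather than dividing by zero.
        return self.passed / self.total if self.total else 0.0


def last_passing_run(runs: Sequence[NightlyRun], bar: float = 0.90) -> Optional[NightlyRun]:
    """Most recent run (runs are ordered oldest first) that met the bar."""
    for run in reversed(runs):
        if run.pass_rate >= bar:
            return run
    return None


def first_regression(runs: Sequence[NightlyRun], bar: float = 0.90) -> Optional[NightlyRun]:
    """First run that missed the bar right after a run that met it."""
    for previous, current in zip(runs, runs[1:]):
        if previous.pass_rate >= bar and current.pass_rate < bar:
            return current
    return None


# Invented history, oldest first; only the final 5% comes from the quoted scenario.
history = [
    NightlyRun(days_left=3, passed=95, total=100),
    NightlyRun(days_left=2, passed=93, total=100),
    NightlyRun(days_left=1, passed=5, total=100),
]

print(last_passing_run(history))
print(first_regression(history))
```

With this invented history, the suite last met the bar with two days left and first missed it with one day left. Had the real team kept even this much of a record, "did they all pass with 2 days left?" would have had an answer the morning the report arrived.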

Trying to put aside the invective of the article, and its caricatured depiction of a project wildly out of touch with reality, what is the article really trying to say?

I think the points they wish to make are:

The sooner you can receive feedback from your tests, the sooner you can act on it

Smaller, more focused tests are cheaper to write and faster to run

If your tests run fast, you can run them very often

If you run your tests very often, they will more clearly point to the instant when a problem was introduced into the code

But if you test only individual components or modules, problems can creep in where the modules and components must be assembled into larger software systems, so don't entirely omit complete system tests.
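The claim that frequent runs point more clearly to the moment a problem was introduced can be shown with a small simulation. This is only a sketch under stated assumptions: the change names, the `suspect_changes` helper, and the idea that the suite fails on every run that includes the bad change are all hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def suspect_changes(
    changes: Sequence[str],
    tested_after: Sequence[int],
    first_bad: int,
) -> List[str]:
    """Return the changes that could have introduced a failure.

    changes:      change ids in commit order.
    tested_after: ascending indices of the changes after which the suite ran.
    first_bad:    index of the change that really broke things. A real team
                  does not know this; it is used here only to simulate
                  whether a given run passes or fails.
    """
    last_green = -1
    for index in tested_after:
        if index >= first_bad:
            # First failing run: everything since the last green run is suspect.
            return list(changes[last_green + 1 : index + 1])
        last_green = index
    return []  # No run has included the bad change yet.


changes = [f"change-{n}" for n in range(1, 11)]
first_bad = 6  # index of "change-7"

nightly = suspect_changes(changes, [9], first_bad)
every_five = suspect_changes(changes, [4, 9], first_bad)
every_change = suspect_changes(changes, range(10), first_bad)

print(len(nightly), len(every_five), every_change)
# prints: 10 5 ['change-7']
```

One run at the end of the day leaves all ten changes under suspicion, while a run after every change narrows the search to the single change that caused the failure.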

Really, this is well-trodden ground. Every time I see an article like this, I bemoan the fact that Martin Fowler's work on Continuous Integration is now 15 years old, yet seems to be so little-known to so many people.

An important thing to decide is what makes a successful build. It may seem obvious, but it's remarkable how this can get muddy. Martin once reviewed a project. He asked if the project did a daily build and was answered in the affirmative. Fortunately Ron Jeffries was there to probe further. He asked the question "what do you do with build errors?" The response was "we send an e-mail to the relevant person". In fact the project hadn't succeeded in a build for months. That's not a daily build, that's a daily build attempt.

And Martin Fowler isn't the only one who's been talking about these basic principles for decades. For example, consider Joel Spolsky's Daily Builds Are Your Friend (again, nearly 15 years old):

If a daily build is broken, you run the risk of stopping the whole team. Stop everything and keep rebuilding until it's fixed. Some days, you may have multiple daily builds.

On large teams, one good way to insure that breakages are fixed right away is to do the daily build every afternoon at, say, lunchtime. Everyone does as many checkins as possible before lunch. When they come back, the build is done. If it worked, great! Everybody checks out the latest version of the source and goes on working. If the build failed, you fix it.

I think that the Google Testing Team are primarily trying to convey the notion that different types of tests are useful for different purposes, and you need to have a complete collection of tests, using lots of different testing approaches, to consider your testing strategy complete.

In fact, they even discuss the "testing pyramid" at the end of their article, though it's a shame that they don't point to the original source of the idea, Mike Cohn's The Forgotten Layer of the Test Automation Pyramid.

I'm pleased that the Google Testing Blog is publishing articles on testing. I just wish they'd dig a bit deeper into the history of the field, and take a more modern approach, rather than putting up strawmen that haven't been in favor in decades and then tearing them down as if they'd just had a bold new vision of how to build quality software.

3 comments:

Nicely done! You turned their strawman into a snowman and melted it, then proceeded to spank them soundly for reinventing the wheel. Instead of sitting here feeling grumpy about the Google article, I can instead go back to trying to get my real problem fixed, which is 1) getting the developers to write unit tests and 2) getting them to actually fix the bugs the testers are finding.