Tuesday, March 8, 2011

About a half year ago, one of my coworkers sparked a discussion on the test pyramid. Specifically, he referenced a link and asked our thoughts on it. Since I like to geek out, I had to respond. I actually spent some time creating what I thought was a reasonable analogy so I figured I'd share it. I'm also offering this now as I have another post I really want to discuss in regards to the boundaries in testing between a QA group and developers. Is there one? Should there be one? How do we deal with test overlap? It's something I've been mulling around and I need to think out loud.

It's an effective way to describe a proven testing strategy. The simple law here is that tests are all about feedback and how valuable that feedback is. The value is tied directly to their maintenance cost. If you look at it this way, you stop thinking in terms of layers. For example, brittle unit tests are high cost and thus less valuable than verifiable, repeatable, solid unit tests. This is the mindset all test implementers need to have in order to build the best test suite.

In terms of the test pyramid, we can certainly assume that there are responsibly written tests. What's illustrated there is how the test maintenance cost rises in parallel to the growing level of component interaction and complexity. The more complex the interactions in the test then the more likely it is to break. Broken tests need to be maintained. Tests of complex interactions take longer to repair.

At each level of the pyramid, we have a set budget on how much we can spend on maintenance. Unit tests are cheap therefore we buy lots. Integration tests are a little more expensive so we can't afford as much. Functional tests carry the heaviest price tag so we just buy a handful.

Tests at all layers are valuable and provide a distinctly different type of feedback. That's why it isn't a good idea to simply forego testing a layer at all. Having a codebase with only unit tests doesn't prove that the system actually runs.

There are a few more factors in what defines the value of a test but I'm glossing over that by calling them "responsible" tests. If you had two integration tests that exercised the same grouping of components then you should probably entertain getting rid of one. We want clean, clear feedback and having multiple test breaks for a single issue will obscure the root of the problem. Given that we're trying to be as frugal as possible we'd probably identify tests like that early on.

The one caveat/tangent I might mention is that the pyramid metaphor (and how I commonly see it described) doesn't account for things like capacity testing; the throughput of the application, the transactions/second they can service, their memory footprint and resource consumption and other issues that can potentially derail a release.

Application deployment is an implicit test to make sure that you can actually install and run the application. There would never be an NUnit test for that but their is value in knowing that the application can be reliably deployed. We should try doing that earlier to get that feedback when the information is more fresh in our minds.

...

And that was about it. I had a few more words but they dealt with the internals of our specific situation. Not worth repeating for obvious reasons.