Characterization Testing

Writing tests to describe and fix existing code

I think that the most confusing thing about testing is the word testing. We use it for many things, from exploratory testing and manual testing to unit testing and other forms of automation. The core issue is that we need to know that our code works, and we’ve lumped together a variety of practices and put them under this banner. We can look for some common strands between them, but they are hard to find.

Most types of testing have the quality of being about correctness. We test our code to see that it is doing what we want it to do. This assumes that we know what it is supposed to do - that we have some sort of specification in our heads or elsewhere that tells us what we’re aiming for.

What if we don’t?

Unfortunately, this is exactly where we are with a lot of software. We need to make changes to it but we don’t know enough about what it does.

What does this code do? It seems to be stripping HTML-ish tags out of text, but the logic is ad-hoc and likely broken.
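The code sample the paragraph refers to isn’t reproduced here. As a hypothetical stand-in, a function with the same flavor of ad-hoc, likely-broken tag stripping might look like this:

```python
def formatText(text):
    # Drop every character that falls between '<' and '>'.
    # Deliberately ad hoc: no handling of unmatched brackets,
    # quoted attributes, or entities.
    out, in_tag = [], False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)
```

Even at this size it raises questions: what happens to a ‘<’ that is never closed, or to a bare ‘>’? Those are exactly the questions characterization tests can answer.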

Even though this is a tiny example, it shows just how hard it can be to read badly written code. Nearly everyone has spent anywhere from minutes to hours staring at pieces of convoluted logic, trying to understand enough about them to be able to change them. It’s one of the universal experiences of development. The nice thing is that we can use testing to help us out. It’s testing with a different focus. Instead of trying to figure out whether code is correct or not, we can try to characterize its behavior to understand what it actually does.

This test is based on a hypothesis. We’re guessing that if we format text that doesn’t have any tags, we end up with the same text. Sure enough, when we run the test it passes. We made a good guess, but why are we guessing at all?
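The hypothesis test might look like the sketch below. The body of formatText here is only a hypothetical stand-in for the tag-stripping function under discussion:

```python
def formatText(text):
    # Hypothetical stand-in for the function under test:
    # drops characters between '<' and '>'.
    out, in_tag = [], False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

def test_text_without_tags_is_unchanged():
    # Hypothesis: text with no tags passes through untouched.
    assert formatText("The quick brown fox.") == "The quick brown fox."
```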

Let’s back up and do it over again.

I’m going to write a test and give it the name ‘x.’ I’m calling it ‘x’ because I don’t know what the formatText function is going to do. I won’t put in a real expected value either because, at this point, we don’t know what the behavior will be.
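A sketch of that deliberately ignorant test, again with a hypothetical stand-in for formatText. The expected value is a dummy; the point is to run the test and let the failure tell us the real one:

```python
def formatText(text):
    # Hypothetical stand-in: drops characters between '<' and '>'.
    out, in_tag = [], False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

def x():
    # Dummy expectation: we genuinely don't know the answer yet.
    assert formatText("<b>Hello</b>") == "???"
```

Run under a test runner such as pytest, the failure message shows the value the function actually returned; we then paste that value into the assertion and, once we understand it, rename the test to describe what we learned.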

It’s hard not to think that this is cheating in some way. It’s too easy. We’ve bypassed thinking about expectations entirely and we’re writing tests. What possible value can they have?

It turns out that they have a lot of value. When we write characterization tests we build up our knowledge of what the code actually does. This is particularly useful when we want to refactor or rewrite. We can run our tests and find out immediately whether we’ve changed behavior. More subtly, we start to see our tests in a different way.

Here’s a test for another aspect of that function’s behavior. It passes. Does it show a bug in the code?
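For illustration (again with a hypothetical stand-in for formatText), such a test might capture what happens to a bare ‘>’ that isn’t part of any tag:

```python
def formatText(text):
    # Hypothetical stand-in: drops characters between '<' and '>'.
    out, in_tag = [], False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

def test_bare_greater_than_disappears():
    # A '>' outside any tag is silently swallowed. Bug or feature?
    assert formatText("2 > 1") == "2  1"
```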

It’s a trick question. The context determines whether this is or isn’t correct behavior. When you write characterization tests, you often discover behavior you weren’t aware of. This leads to more questions. You can write more tests to answer these questions, or you can do more investigation, look at scenarios, or talk to users. However, if you haven’t determined that the behavior you've uncovered is a bug, it’s often a good idea to leave the test in place. This highlights the key difference between characterization testing and other forms of testing. The purpose of characterization testing is to document your system’s actual behavior, not check for the behavior you wish your system had.

Once, when I first started programming, I was asked to fix a bug. After I had, users complained. It turned out that they depended upon the behavior I'd removed. They didn’t think it was a bug; they thought it was a feature. Many people I talk to have had that same experience. I don’t think it’s a fluke. It points to something rather sublime.

When a system goes into production, in a way, it becomes its own specification. We need to know when we are changing existing behavior regardless of whether we think it’s right or not.

Once we adopt this perspective, we have a different frame for understanding automated tests in general. We can see them as descriptions of what we have rather than statements of correctness. We can revisit our tests periodically to tighten up their conditions as we decide what the behavior should be at each level of our systems.

Characterization testing is a simple process. The hardest part is breaking dependencies around a piece of code well enough to be able to exercise it in a test harness. Once you have, it's just a matter of being curious about what the code will do under particular conditions, supplying a dummy expectation value, and then running the test to find the actual value. I find that as I do this, I end up revisiting the names of my tests often as I build up understanding of the code I'm exercising.
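As a sketch of that last step, the throwaway name gives way to names that record what we learned (formatText is still a hypothetical stand-in for the function under test):

```python
def formatText(text):
    # Hypothetical stand-in: drops characters between '<' and '>'.
    out, in_tag = [], False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)

# Formerly a test named 'x'; renamed once the behavior was understood.
def test_drops_tag_like_spans():
    assert formatText("<b>Hello</b>") == "Hello"

# An unmatched '<' swallows the rest of the text.
# Documented here, not endorsed: the test records what the code does.
def test_unmatched_open_bracket_truncates():
    assert formatText("if a < b then stop") == "if a "
```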