How Simple Is Too Simple to Test?

I'm sure you spend a lot of time thinking about how to write better software, just like I do. And if you've dipped your toes into the waters of unit testing best practices, you've undoubtedly asked this question at some point: how simple is too simple to test?

Well, today I found an error in one of my projects that suggests that there might not be a lower bound. Check this out:
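(The original post showed the method as a screenshot. Below is a hedged PHP reconstruction based on the description that follows; the `ValidateID` stub, the exact checks, and the messages are assumptions, not the author's actual code.)

```php
<?php
// Hypothetical stand-in for the project's ValidateID helper (an assumption):
// returns true only for integers greater than zero.
class ValidateID
{
    public static function run($value): bool
    {
        return is_int($value) && $value > 0;
    }
}

// Hedged reconstruction of the buggy method described in the post.
function validateProductCategoryIDs(array $pcIDs): void
{
    foreach ($pcIDs as $pcID) {
        // BUG: ValidateID::run() returns bool, but the result is discarded,
        // so invalid values never cause an exception.
        ValidateID::run($pcID);
    }

    if (count($pcIDs) !== 2) {
        throw new InvalidArgumentException('Expected exactly two category IDs.');
    }

    if ($pcIDs[0] === $pcIDs[1]) {
        throw new InvalidArgumentException('Category IDs must differ.');
    }
}

// array('cat', 'dog') slips through without an exception.
validateProductCategoryIDs(array('cat', 'dog'));
echo "No exception thrown\n";
```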

This method does some checks on the contents of $pcIDs and throws an exception if something goes wrong. Simple, right? Of course. It's only 15 lines of code, including the signature and braces.

Would you believe this code contains an error? Let me give you a hint:

Do you see the problem? (Take your time, I'll wait.)

waiting...

waiting...

waiting...

waiting...

waiting...

Figure it out yet? Need a more obvious hint?

What happened

See what happened there? We want $pcIDs[0] and $pcIDs[1] to be the only values in the input array. They should be integers greater than 0 and not equal to each other. At first glance the method appears to work correctly. But on closer inspection you probably realized that ValidateID::run() returns a bool, yet this code doesn't check the return value. Instead it assumes that ValidateID::run() will throw an exception if the value is invalid.

So, validateProductCategoryIDs(array('cat', 'dog')) will not throw an exception, which is not the behavior we want.
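A minimal sketch of the fix: actually check the boolean that ValidateID::run() returns. (The `ValidateID` stub and the exact structure below are assumptions standing in for the real code.)

```php
<?php
// Hypothetical stand-in for the project's ValidateID helper (an assumption).
class ValidateID
{
    public static function run($value): bool
    {
        return is_int($value) && $value > 0;
    }
}

function validateProductCategoryIDs(array $pcIDs): void
{
    if (count($pcIDs) !== 2) {
        throw new InvalidArgumentException('Expected exactly two category IDs.');
    }

    foreach ($pcIDs as $pcID) {
        // The fix: inspect the bool instead of discarding it.
        if (!ValidateID::run($pcID)) {
            throw new InvalidArgumentException('Invalid product category ID.');
        }
    }

    if ($pcIDs[0] === $pcIDs[1]) {
        throw new InvalidArgumentException('Category IDs must differ.');
    }
}

try {
    validateProductCategoryIDs(array('cat', 'dog'));
} catch (InvalidArgumentException $e) {
    echo "Exception: " . $e->getMessage() . "\n"; // now throws, as intended
}
```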

Unit testing best practices

So, how simple is too simple to test? I'm starting to come around to the point of view that almost nothing is too simple to test.

There's a quote I like but I can't remember who wrote/said it:

Write tests for any behavior you care about.

Indeed. This fits in nicely with a lecture on Computer Bugs in Hospitals (great talk, by the way). The lecturers argue that doctors and nurses enter incorrect data into computer systems all the time, and some of those errors kill patients. But here's the part that's important for us: they don't know they made a mistake. A nurse entering drug doses into an infusion pump with a numeric keypad makes a mistake they don't catch about 4% of the time! Doesn't that seem like a high error rate?

So, if nurses are messing up simple number entry tasks without noticing, how often are programmers inserting silly errors into their code without noticing? I think the answer is probably a lot.

The code I'm showing you in this post was written with care. The author wrote a comprehensive manual testing plan for it but no unit tests. The code was reviewed by the author using our code review checklist. And then I also reviewed the code using our code review checklist and executed the manual testing plan. And, after all of that QA, we still missed this pretty obvious error.

So how did we catch this error today?

My colleague wrote unit/integration tests for a method that calls the validateProductCategoryIDs() method. He actually didn't notice the problem, but when I was doing a code review on his new tests, I noticed that array('cat', 'dog')-style inputs were not resulting in exceptions, and I thought that was suspicious.

To summarize, we needed automated testing combined with code reviews to catch this error. Even though this was a simple error, neither manual testing, unit testing, nor code reviews alone were enough to bring this error to our attention. We only discovered this error by combining several kinds of quality assurance activities.

By the way, this is exactly what Steve McConnell recommends in Code Complete--multiple kinds of QA because each kind catches different kinds of errors.

A couple of thoughts about clean coding practices

We'll never know exactly what happened in the mind of the programmer who originally wrote this code but I can tell you with confidence that this was not an intentional error.

If you do a little root cause analysis, you'll notice that we have two validation methods in the code I've shown you. One returns void or throws an exception, and the other returns bool. And that's confusing as hell.

Don't use a confusingly similar naming system for methods that return incompatible types

Having validateXXX methods that return incompatible types is probably a code smell. And I'd be willing to bet you that it contributed to us not catching the error during code review.

We could probably reduce the potential for the same kind of errors in the future by changing some of the names of the validateXXX methods to "verifyXXX." That would make it clear that validate methods return bool and verify methods throw exceptions if they encounter data they don't like. Our IDE has strong refactoring support so we can make that kind of change quickly and safely.
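As a sketch of that convention (the names and signatures below are hypothetical, not our actual code): "validate" methods report a result, "verify" methods enforce one.

```php
<?php
// Hypothetical illustration of the naming convention (assumptions):
// validateXXX returns a bool; the caller decides what to do with it.
function validateCategoryID($id): bool
{
    return is_int($id) && $id > 0;
}

// verifyXXX returns void and throws if the data is unacceptable.
function verifyCategoryID($id): void
{
    if (!validateCategoryID($id)) {
        throw new InvalidArgumentException(
            'Invalid category ID: ' . var_export($id, true)
        );
    }
}

var_dump(validateCategoryID('cat')); // bool(false) -- easy to ignore by mistake
try {
    verifyCategoryID('cat');         // impossible to ignore: it throws
} catch (InvalidArgumentException $e) {
    echo $e->getMessage() . "\n";
}
```

The point of the split is that a verify-style method makes the "forgot to check the return value" bug from this post impossible by construction.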

Catch this kind of potential error with a static analysis inspection

My preferred solution is to use a static analysis inspection that's integrated into our IDE to help us find all instances of this class of problems while we are coding. I'm not sure how easy it would be to turn on that kind of inspection in our project but I'm sure you can see the appeal.

Takeaways

Even the best programmers make mistakes. You might be able to catch some of them with careful code reviews and good clean code practices. But you need to combine multiple QA methods if you want really low defect rates. Your unit testing best practices should include testing any behavior you care about, even for code that might appear to be too simple to contain an error.

Comments

The changes start to interact and produce strange edge cases. Some of these edge cases are intentional, some are accidental. Other code is built that depends on the current, undocumented behavior. The actual requirements are lost to time. The code is now "not really testable."

The code has become 2000 lines of if-else chains, copy-and-pasted code, and side-effected booleans. It has half a billion possible states, and can't be understood by mere mortals, or refactored without risk to the business.

A new feature is needed. A bright, shiny new bit of code is created, because adding anything more to the old code is madness.

The code for the new feature is "too simple to test."

Tests support many development goals. Verifying that the code works correctly now is just one. It's arguably more important to have tests that verify that the code continues to work as it changes over time.

Yes. QuickCheck is an implementation of property-based testing. We do property-based testing mostly for reversible operations, like encode/decode functionality, when we want to make sure the data is preserved. But that's really a niche for us. The base of our testing pyramid is unit tests.

Another error occurs when you pass an array with count($pcIDs) <= 1, because the count check is NOT the first check, as it should be. Depending on the language, NULL can cause the same kind of error (null[0] or null[1] would not work).
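The ordering the comment suggests can be sketched like this (hypothetical function and messages): guard the element count before ever indexing into the array.

```php
<?php
// Hypothetical sketch: check the element count before touching $pcIDs[0]
// or $pcIDs[1], so short (or empty) input fails with an accurate message.
function guardCategoryIDPair(array $pcIDs): void
{
    if (count($pcIDs) !== 2) {
        throw new InvalidArgumentException(
            'Expected exactly two category IDs, got ' . count($pcIDs) . '.'
        );
    }
    if ($pcIDs[0] === $pcIDs[1]) {
        throw new InvalidArgumentException('Category IDs must differ.');
    }
}

try {
    guardCategoryIDPair(array(7)); // one element: fails the count check first
} catch (InvalidArgumentException $e) {
    echo $e->getMessage() . "\n"; // Expected exactly two category IDs, got 1.
}
```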

Yes. That's what I was trying to discuss towards the end of the post. It is confusing.

Passing an array instead of two params was the author's choice. It meets our coding standard and it makes a fair bit of sense in the larger context of where that data comes from. Two params would have been perfectly fine in this context too.

Great article!
I agree that nothing is too simple to test.
I've been writing unit tests for a while now, and I can say that you need to unit test every method you have. Even the simplest ones can create bugs sometimes...

I've seen an even more evil variation of the example you've shown us. Someone called a validation method inside a foreach loop and stored its result in a boolean variable, then returned that variable to the caller after the loop. Because the variable was overwritten on every iteration, the result depended only on the last partial validation, so it could produce either false positives or false negatives.
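That anti-pattern can be sketched like this (hypothetical names; a minimal illustration, not the commenter's actual code):

```php
<?php
// Anti-pattern sketch: the boolean is overwritten on every iteration,
// so only the LAST element decides the result.
function allIDsValid(array $ids): bool
{
    $valid = false;
    foreach ($ids as $id) {
        // BUG: earlier results are discarded. The fix is to combine results
        // (start at true and use &&) or to return false early.
        $valid = is_int($id) && $id > 0;
    }
    return $valid;
}

var_dump(allIDsValid(array('cat', 42))); // bool(true)  -- false positive
var_dump(allIDsValid(array(42, 'cat'))); // bool(false)
```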

Yes. I've watched Uncle Bob's stuff on unit testing. TDD is a little different from getting tests around existing code. But, yes, I agree. I'd never fail a PR because the unit tests were testing things that were simple.

I'm actually with Phil Koopman on the issue of method complexity/size. There's a sweet spot, and I think it's bigger than Uncle Bob recommends for my projects. That doesn't mean I like huge methods. But when I've chopped my code up into methods the way Uncle Bob recommends, I end up with really deep call chains and it becomes harder to understand what's happening.

Good question! I actually had to run some code to make sure it behaves as I expected.

It will get to the check for the number of elements in the collection unless the first element in the array is null. If that happens, the first "if" condition is true and you'll get an exception, which is what we want, but the error message will be deceptive. That's a real edge case, though.

I was hesitant to mention it because I don't know the details of the language. I was just curious whether you could compare the first and second elements without an exception if there were fewer than two elements. In .NET the comparison itself would throw an exception, so I'd check the number of elements first.

Good eye. It's so interesting to hear all these comments and suggested improvements for what amounts to a pretty trivial and boring piece of code. My thanks to everybody who commented.

I'm curious how many of you run your code reviews at this level of detail? Are the things you are mentioning typical things you'd pick up and mention in a regular code review at your job? Or is this a special level of feedback because of the circumstances?

I caught the naming issue too and fixed it in the production code before I released the blog post.