Back to Basics: Why Unit Testing is Hard

Lately, I’ve increasingly begun to question the value of unit testing. I’ve started to wonder whether all the work we put into testing at the unit level, and the extra scaffolding we build into our applications to support it, is worth the cost.

I’m not going to talk about that subject yet. Instead, I want to look at some of the costs of unit testing and ask the question “why is unit testing hard?”

After all, if unit testing weren’t hard, we wouldn’t have to question whether or not it was worth it. It makes sense, then, to look first at why it is hard and what makes it hard.

The ideal scenario

Unit testing itself is rather easy once you understand how to do it. Even test-driven or behavior-driven development is easy once mastered… at least for the ideal scenario.

What is the ideal scenario then?

It is a unit test where the class under test has no external dependencies.

When a class we are writing unit tests for doesn’t have any external dependencies, we don’t need mocks or stubs or anything else. We can just write code that tests our code.

Let’s look at an example of this. Suppose I have a class called Calculator with some very simple methods. Specifically, let’s talk about testing its Add method. Add takes two single-digit integers and returns their sum. If either integer passed in has more than one digit, it throws an exception.

It is a pretty stupid method, with little use, but it will serve the point well here.

We can TDD or BDD this baby with minimal effort.

Let’s start thinking of test cases:

When I add 0 and a single-digit number, it should return the single-digit number

When I add 0 and 0, it should return 0

When I add two single-digit numbers, it should return the sum of those numbers

When I add a two-digit number, it should throw an exception

Pretty easy to come up with test cases, and just as easy to implement them.

We can then implement the code that makes these tests pass pretty easily. I won’t show it here since it is so trivial.
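As a rough sketch only (not the post’s original code), the four test cases might look like this with Python’s built-in unittest module, where Add becomes `add` in Python naming. The tiny Calculator is hypothetical and exists only so the tests can run; it assumes “single digit” means -9 through 9 and that the exception is a ValueError.

```python
import unittest


class Calculator:
    """Hypothetical minimal Calculator, included only so the tests can run."""

    def add(self, a: int, b: int) -> int:
        # Assumption: "single digit" means -9..9; anything wider is rejected.
        for n in (a, b):
            if abs(n) > 9:
                raise ValueError(f"{n} is not a single-digit integer")
        return a + b


class CalculatorAddTests(unittest.TestCase):
    """One test per case listed above: no mocks, no setup, just inputs and outputs."""

    def test_adding_zero_and_a_single_digit_returns_that_digit(self):
        self.assertEqual(Calculator().add(0, 7), 7)

    def test_adding_zero_and_zero_returns_zero(self):
        self.assertEqual(Calculator().add(0, 0), 0)

    def test_adding_two_single_digits_returns_their_sum(self):
        self.assertEqual(Calculator().add(4, 5), 9)

    def test_a_two_digit_number_raises(self):
        with self.assertRaises(ValueError):
            Calculator().add(12, 3)
```

Saved to a file, it runs with `python -m unittest` followed by the file name. Each test is nothing more than an input and an expected output.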

This is the ideal scenario, or what I will call Level 1 Unit Testing.

Level 1 Unit Testing is where we have a single class with no external dependencies and no state. We are just testing an algorithm.

Taking it up a notch

The next level of unit testing is reached if we add state to the class under test.

Level 2 Unit Testing is where we have a single class with no external dependencies but it does have state. We are setting up an object and testing it as a whole.

Suppose we take our existing example and add a new method called GetHistory, which returns the calculations performed so far. It is still not difficult to implement the tests, but it is harder, because each test has to set up some state on our object before it can check anything.

Let’s look at one of the test cases we might implement for this functionality:
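The original test isn’t reproduced here, so this is a hedged Python sketch of what one GetHistory test could look like. It assumes the history is simply the list of results in the order they were produced; the Calculator is hypothetical. The `setUp` method plays the role of the Arrange (context) step, and the test method holds Act and Assert.

```python
import unittest


class Calculator:
    """Hypothetical stateful Calculator: it remembers every result from add()."""

    def __init__(self) -> None:
        self._history: list[int] = []

    def add(self, a: int, b: int) -> int:
        if abs(a) > 9 or abs(b) > 9:
            raise ValueError("operands must be single-digit integers")
        result = a + b
        self._history.append(result)  # the new internal state
        return result

    def get_history(self) -> list[int]:
        return list(self._history)  # return a copy so callers can't mutate our state


class WhenGettingHistoryAfterTwoAdditions(unittest.TestCase):
    def setUp(self):
        # Arrange: put the object into the state this test depends on.
        self.calculator = Calculator()
        self.calculator.add(1, 2)
        self.calculator.add(3, 4)

    def test_history_contains_both_results_in_order(self):
        # Act
        history = self.calculator.get_history()
        # Assert
        self.assertEqual(history, [3, 7])
```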

Again, not too hard, but state does make it a bit more difficult. Here we can see the value of a Behavior Driven Development (BDD) style of unit testing: it helps us clearly divide the test into the distinct parts it now has.

The real complexity we have added here is that we now have to deal with a setup step before we can execute our test. BDD deals with this by having a special step for defining the context in which the actual test takes place. The steps go by a few different names in different circles (Given/When/Then is common in BDD), but let’s stick with AAA (Arrange, Act, Assert) for this post, since it is easy to remember.

The major difference between Level 1 Unit Testing and Level 2 Unit Testing is that in Level 1 we were really testing only one method, while in Level 2 we are testing at the class level. Really, we could call Level 1 Unit Testing “method testing”, since the unit we are testing is the method; the class that method lives in doesn’t matter.

Enter dependencies

Let’s see what happens when we throw dependencies into the Calculator class.

Imagine that our Calculator class has to keep an audit trail of our calculations. We have a service, the StorageService, that the Add method can use to put calculations into a storage location such as a database, and our GetHistory method can query that storage location for the history.

As I was thinking about this, an important point occurred to me. Were this an integration test, our example test method above wouldn’t change at all, because an integration test would exercise the real storage and still only check what GetHistory returns.

But since we are talking about unit tests here, we need to isolate the testing to the class level.
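To make the integration-test point concrete, here is a hedged sketch in which a hypothetical StorageService is backed by an in-memory SQLite database (Python’s standard `sqlite3` module). The table name and methods are assumptions for illustration. The test wires real objects together and only checks what `get_history` returns, with no mocks involved.

```python
import sqlite3
import unittest


class StorageService:
    """Hypothetical concrete service backed by SQLite (in-memory for tests)."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute("CREATE TABLE IF NOT EXISTS history (result INTEGER)")

    def store(self, result: int) -> None:
        self._db.execute("INSERT INTO history (result) VALUES (?)", (result,))

    def retrieve_history(self) -> list[int]:
        rows = self._db.execute("SELECT result FROM history ORDER BY rowid")
        return [row[0] for row in rows]


class Calculator:
    def __init__(self, storage: StorageService) -> None:
        self._storage = storage

    def add(self, a: int, b: int) -> int:
        if abs(a) > 9 or abs(b) > 9:
            raise ValueError("operands must be single-digit integers")
        result = a + b
        self._storage.store(result)
        return result

    def get_history(self) -> list[int]:
        return self._storage.retrieve_history()


class WhenGettingHistoryAfterTwoAdditions(unittest.TestCase):
    def setUp(self):
        # Arrange: real objects and a real (in-memory) database; no mocks.
        self.db = sqlite3.connect(":memory:")
        self.calculator = Calculator(StorageService(self.db))
        self.calculator.add(1, 2)
        self.calculator.add(3, 4)

    def tearDown(self):
        self.db.close()

    def test_history_contains_both_results_in_order(self):
        # Act and Assert: only observable output is checked, not interactions.
        self.assertEqual(self.calculator.get_history(), [3, 7])
```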

So let’s think about what our test should do now. Here are some possible tests we might have.

When I add two numbers, the result is returned and the Store method is called on the StorageService with that result.

When I get the history, the RetrieveHistory method is called on the StorageService and its results are returned.
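A hedged Python sketch of those two tests, using the standard library’s `unittest.mock` in place of a real StorageService. The Calculator is hypothetical: it receives the service through its constructor and, as an assumed behavior, calls `is_service_online()` before `store()`. The tests deliberately do not check the `is_service_online()` call; whether they should is exactly the open question.

```python
import unittest
from unittest import mock


class Calculator:
    """Hypothetical Level 3 Calculator: no state of its own, one injected dependency."""

    def __init__(self, storage_service) -> None:
        self._storage = storage_service  # passed in so a test can substitute a mock

    def add(self, a: int, b: int) -> int:
        if abs(a) > 9 or abs(b) > 9:
            raise ValueError("operands must be single-digit integers")
        result = a + b
        # Assumed behavior: only store when the service reports it is online.
        if self._storage.is_service_online():
            self._storage.store(result)
        return result

    def get_history(self):
        return self._storage.retrieve_history()


class WhenAddingTwoNumbers(unittest.TestCase):
    def test_result_is_returned_and_stored(self):
        # Arrange: a mock stands in for the StorageService.
        storage = mock.Mock()
        storage.is_service_online.return_value = True
        calculator = Calculator(storage)
        # Act
        result = calculator.add(2, 3)
        # Assert: we now check an interaction, not just the return value.
        self.assertEqual(result, 5)
        storage.store.assert_called_once_with(5)


class WhenGettingHistory(unittest.TestCase):
    def test_history_comes_from_the_storage_service(self):
        storage = mock.Mock()
        storage.retrieve_history.return_value = [5, 9]
        calculator = Calculator(storage)

        history = calculator.get_history()

        storage.retrieve_history.assert_called_once_with()
        self.assertEqual(history, [5, 9])
```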

Level 3 Unit Testing is when we have a single class with at least one external dependency, but it does not depend on its own internal state.

Things really start to get complicated here, because we have to think not just about inputs, outputs, and sequences, but also about interactions.

It really starts to get blurry here what our unit tests should expect. In the example code above, do we need to check that IsServiceOnline is called on the StorageService, or do we only check that Store was called?

You’ll also notice here that we had to use a mock and pass our dependency into our class so that we could change its behavior. Along with that came the burden of creating an interface, so that we could have a mock implementation.

If you’re paying attention right now, you may be thinking to yourself that the example is bad. You may be thinking that the Calculator class now has two responsibilities.

It calculates things and returns the result

It stores calculation results

Right you are, but we can’t wish away this problem. Let’s suppose we refactor and move the StorageService dependency out of the Calculator class. We have several options. We could make a decorator and use it like this:
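One possible shape for that decorator, sketched in Python with hypothetical names (`StoringCalculator` is not from the post). Calculator goes back to being a pure Level 1 class, and the decorator wraps it with the storage responsibility. Note that the mock did not go away; it just moved to the decorator’s test.

```python
import unittest
from unittest import mock


class Calculator:
    """Back to a pure Level 1 Calculator: no dependencies, no state."""

    def add(self, a: int, b: int) -> int:
        if abs(a) > 9 or abs(b) > 9:
            raise ValueError("operands must be single-digit integers")
        return a + b


class StoringCalculator:
    """Hypothetical decorator: same add() interface as Calculator, plus storage."""

    def __init__(self, inner: Calculator, storage_service) -> None:
        self._inner = inner
        self._storage = storage_service

    def add(self, a: int, b: int) -> int:
        result = self._inner.add(a, b)  # delegate the arithmetic
        self._storage.store(result)     # layer on the storage responsibility
        return result

    def get_history(self):
        return self._storage.retrieve_history()


class WhenAddingThroughTheDecorator(unittest.TestCase):
    def test_result_is_returned_and_stored(self):
        storage = mock.Mock()  # the mock moved here; it did not disappear
        calculator = StoringCalculator(Calculator(), storage)

        self.assertEqual(calculator.add(2, 3), 5)
        storage.store.assert_called_once_with(5)
```

Callers would then build it as `StoringCalculator(Calculator(), storage_service)`.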

However we attempt to solve this problem, some class is still going to need a mock in its unit test.

There is a simple fact that we cannot get around. If we are going to use the StorageService to store calculations, either Calculator will depend on it, or something else will depend on both Calculator and StorageService. There is no alternative to those two options.

There is another simple fact we can’t get around, too. If we are going to depend on another class in our unit test, we either need an interface that we can use for the mock class, or we need a mocking framework that supports mocking concrete classes.
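Both options can be sketched in Python, with the caveat that Python’s duck typing means the interface is optional; the abstract base class is here only to mirror the interface option. Option 1 is a hand-written fake implementing that interface. Option 2 uses `unittest.mock.create_autospec` to build a double directly from the concrete class. All class names are hypothetical.

```python
import unittest
from abc import ABC, abstractmethod
from unittest import mock


class IStorageService(ABC):
    """Option 1: an interface that exists so tests can supply another implementation."""

    @abstractmethod
    def store(self, result: int) -> None: ...


class FakeStorageService(IStorageService):
    """Hand-written test double that records results in memory."""

    def __init__(self) -> None:
        self.stored: list[int] = []

    def store(self, result: int) -> None:
        self.stored.append(result)


class StorageService:
    """Concrete stand-in with no interface; a real one would write to a database."""

    def store(self, result: int) -> None:
        raise RuntimeError("would hit the database")


class Calculator:
    def __init__(self, storage) -> None:
        self._storage = storage

    def add(self, a: int, b: int) -> int:
        result = a + b  # single-digit validation omitted for brevity
        self._storage.store(result)
        return result


class StorageDoubleTests(unittest.TestCase):
    def test_with_hand_written_fake(self):
        fake = FakeStorageService()
        self.assertEqual(Calculator(fake).add(2, 3), 5)
        self.assertEqual(fake.stored, [5])

    def test_with_autospec_of_concrete_class(self):
        # Option 2: the mocking library builds the double from the concrete
        # class, copying its method signatures; no interface is required.
        storage = mock.create_autospec(StorageService, instance=True)
        self.assertEqual(Calculator(storage).add(2, 3), 5)
        storage.store.assert_called_once_with(5)
```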

So with Level 3 Unit Testing we are stuck with needing to mock at least one dependency and either creating a bogus interface, or using a mocking library that will let us mock concrete classes.

It gets worse

It only gets worse from here. At Level 3 we didn’t worry about state inside our Calculator class; we worried about an external dependency that pretty much handled state for us. In many cases, though, we will have to worry about both state and dependencies.

Level 4 Unit Testing is when we have a single class with at least one external dependency that also depends on its own internal state.

In our calculator example, we can simply add the requirement that we only want to get the history for a particular session of calculations. Now the Calculator must keep track of which session its calculations belong to, so that it can ask the StorageService for that session’s history.
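A hedged Python sketch of Level 4. It assumes the hypothetical service methods take a session id (`store(session_id, result)` and `retrieve_history(session_id)`) and that the Calculator generates its own id when none is given. To verify behavior, the test has to fish the session id back out of the mock’s recorded calls, which is exactly the kind of coupling that makes these tests fragile.

```python
import unittest
import uuid
from unittest import mock


class Calculator:
    """Hypothetical Level 4 Calculator: a dependency AND internal state."""

    def __init__(self, storage_service, session_id=None) -> None:
        self._storage = storage_service
        # State: identifies which calculations belong to this session.
        self._session_id = session_id or str(uuid.uuid4())

    def add(self, a: int, b: int) -> int:
        if abs(a) > 9 or abs(b) > 9:
            raise ValueError("operands must be single-digit integers")
        result = a + b
        self._storage.store(self._session_id, result)
        return result

    def get_history(self):
        return self._storage.retrieve_history(self._session_id)


class WhenGettingHistoryForASession(unittest.TestCase):
    def test_history_is_requested_for_the_session_that_stored_results(self):
        # Arrange: a mock dependency plus internal state built up through add().
        storage = mock.Mock()
        storage.retrieve_history.return_value = [3]
        calculator = Calculator(storage)
        calculator.add(1, 2)

        # Act
        history = calculator.get_history()

        # Assert: dig the session id out of the recorded store() call, then
        # check that the same id was used to ask for the history.
        (session_id, stored_result), _ = storage.store.call_args
        self.assertEqual(stored_result, 3)
        storage.retrieve_history.assert_called_once_with(session_id)
        self.assertEqual(history, [3])
```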

Consider for a moment how fragile and complex this unit testing code is. Consider how simple the functionality of our class is.

We have a major problem here: our unit testing code is more complex than the code it is testing! It’s OK if the unit testing code has more lines than the code it is testing; that is usually the case. But I consider it a big problem when the unit testing code is more complex, because it forces you to ask yourself a very real question:

Where is there more likely to be a bug?

I’m not saying anything yet

My point is not to make a point, at least not yet. My real goal here is to help us to change the way we think about unit testing.

We need to stop asking the general question of whether or not unit testing is worth the cost and instead ask the more specific question of what level of unit testing is worth the cost.

Level 3 and above has a very steep cost, because mocking is unavoidable and adds considerable complexity to even the most trivial implementations.

From that we can draw a bit of wisdom: if we are going to unit test, we should strive to put as much of our pure logic as possible into classes without dependencies and, if possible, without state.

The other thing to consider is that as the difficulty and complexity of the unit tests increase with each level, the goal and value of the tests start to get lost.

What I mean is that when we start testing that our class properly calls another class with certain parameters, we are crossing over into testing the implementation details of the class.
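A small Python sketch of that advice, with hypothetical names. The arithmetic and validation move into a pure function that can be tested at Level 1, and the class that owns the storage dependency becomes a thin shell with very little left to test. This resembles what is sometimes called “functional core, imperative shell”; it is offered as one way to apply the idea, not as the post’s recommendation.

```python
import unittest
from unittest import mock


def add_single_digits(a: int, b: int) -> int:
    """Pure logic: no dependencies, no state, so it is Level 1 to test."""
    if abs(a) > 9 or abs(b) > 9:
        raise ValueError("operands must be single-digit integers")
    return a + b


class AuditedCalculator:
    """Thin shell: only wires the pure function to the storage dependency."""

    def __init__(self, storage_service) -> None:
        self._storage = storage_service

    def add(self, a: int, b: int) -> int:
        result = add_single_digits(a, b)  # all the interesting logic lives here
        self._storage.store(result)       # the shell just records the result
        return result


class PureLogicTests(unittest.TestCase):
    # Most cases need no mocks at all.
    def test_returns_the_sum(self):
        self.assertEqual(add_single_digits(4, 5), 9)

    def test_rejects_two_digit_numbers(self):
        with self.assertRaises(ValueError):
            add_single_digits(10, 1)


class ShellTests(unittest.TestCase):
    # Only this test still needs a mock, and it has almost nothing to check.
    def test_result_is_stored(self):
        storage = mock.Mock()
        AuditedCalculator(storage).add(2, 3)
        storage.store.assert_called_once_with(5)
```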

If I say a class should be able to add 2 numbers and return the result, I am not saying how it has to do it. As long as the result is correct, how doesn’t matter.

When I add a mock and say a class needs to add 2 numbers and store the result using a StorageService by calling the method Store on it, I have now tied how into the test. Changing how breaks the test.
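A hedged Python illustration of that coupling. Suppose the hypothetical Calculator is refactored to record results through a `store_many` method instead of `store`. A state-based test against a simple in-memory double still passes, because it only checks what ends up in the history. The interaction test that pins `store` breaks; it is marked with `unittest.expectedFailure` here purely so the demonstration runs cleanly.

```python
import unittest
from unittest import mock


class InMemoryStorage:
    """Hypothetical in-memory double: tests see what was stored, not how."""

    def __init__(self) -> None:
        self._results: list[int] = []

    def store(self, result: int) -> None:
        self._results.append(result)

    def store_many(self, results: list[int]) -> None:
        self._results.extend(results)

    def retrieve_history(self) -> list[int]:
        return list(self._results)


class Calculator:
    def __init__(self, storage) -> None:
        self._storage = storage

    def add(self, a: int, b: int) -> int:
        result = a + b
        # Refactored "how": results now go through store_many instead of store.
        self._storage.store_many([result])
        return result


class OutcomeTest(unittest.TestCase):
    def test_result_ends_up_in_history(self):
        storage = InMemoryStorage()
        Calculator(storage).add(2, 3)
        self.assertEqual(storage.retrieve_history(), [5])  # survives the refactor


class InteractionTest(unittest.TestCase):
    @unittest.expectedFailure  # it pins the old "how", so the refactor breaks it
    def test_store_is_called(self):
        storage = mock.Mock()
        Calculator(storage).add(2, 3)
        storage.store.assert_called_once_with(5)
```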

That’s all we are going to look at for now. If you’ve read some of my other back to basics posts, you can see the progression up to this point. I’ve been arguing against using interfaces and dependency injection purely for the sake of unit testing, but I have yet to offer an alternative, and I still can’t. To be honest, I don’t have one yet. But I do believe that by breaking this problem down to its roots we can evaluate what we are doing and determine what our true problems are.

By the end of this series I hope to have a solution and a recommendation for tackling these kinds of problems.

As always, you can subscribe to this RSS feed to follow my posts on Making the Complex Simple. Feel free to check out ElegantCode.com where I post about the topic of writing elegant code about once a week. Also, you can follow me on twitter here.

I can’t wait to find out what conclusions you do reach. Interestingly, you are right that unit testing code with state and external dependencies is hard.

But you know what else? It’s also hard to maintain!

If nothing else, over time I’ve learnt that if something is untested (or untestable) and hard to maintain, then it will be bug-ridden, and everything you do with it will be painful and take longer than it should. For me, the benefit of unit testing (and TDD) is eliminating this later pain. I’ve found in a large number of situations that taking the time to unit test from the outset has resulted in a design that is less tightly coupled and far easier to maintain, purely because the first version I’d coded would have been a nightmare to test.

I’ve been following this series of yours from the start, and I think you generally write good posts, this one being no exception. I’m not saying I agree with what you say here, nor do I disagree, but I think it is good that somebody spoke frankly and actually said it.

Many of the hard-core TDD/BDD “evangelists” might be a bit provoked by what you say, but we really should think a bit more about these things instead of just accepting them because they are “in”, so it is good that you bring it up!

Jason,
Having done Unit Testing religiously for the last two years, I’m beginning to have the same doubts as you. Influenced by Ayende, I’ve started to focus more on Integration tests, but valuing unit tests for what you call Level 1 and Level 2 (very helpful distinctions by the way).

One insight I’ve had here is that TDD is not called UTDD – we can do Test Driven Development just as well using Integration tests instead of Unit tests; we just have to make sure that the tests are fast enough.

One thing we’ve done to reduce test complexity is use a test framework that lets us break down the test into composable pieces that abstract out the test implementation details. This makes it easier to scan the tests for gaps and to spot inappropriate responsibilities (e.g. why is Calculator.Add calling Store()?). It also means we only have to change some composable pieces instead of many tests when an interaction changes (e.g. a call to bool IsServiceOnline() becomes ServiceStatus GetStatus()). Here’s a sample implementation using your scenario: https://gist.github.com/739473

I do like how you have made the test very readable there.
The complexity, though, is still amazingly large in comparison to the functionality being tested.
If you look at the code you pasted, tell me: do you think you are more likely to have made an error in the test code or in the SUT code?
I’m not sure how much good we have done in the world by writing those unit tests. Just something to consider.
Thanks for taking the time to put together that code example.

“When I add a mock and say a class needs to add 2 numbers and store the result using a StorageService by calling the method Store on it, I have now tied how into the test.”

I would write that as two tests. You even used the word “and” in “add 2 numbers and store the result”, which is often a sign of having multiple responsibilities. In this case it’s the test which has multiple responsibilities – it tests two different features.

My opinion: The value of a test is ultimately proportional to the value of the requirements it is supporting as viewed by the customer.

If it’s considered mission critical for the calculator to support a history, then it may be worth the extra complexity of testing.

Whereas, if the history is merely a nice-to-have but not essential, then there probably needs to be some justification for delving into unit test levels 3 and 4 when an integration test covering concrete instances of both the Calculator and StorageService would suffice.

The question then becomes how do we determine what is of value to the customer and how do we correlate that to a justifiable level of effort?

For those of us who interact with our customers, this is a familiar line we are all forced to walk. For those who just write code, this becomes much more difficult.

Just found your blog and putting it in my reader. Interesting thoughts. Looking forward to what you have to say in the future regarding this.

Yes, unit testing can be painful, but I don’t know that it increases the pain of testing. I think it may simply make us more aware of the pain that we were ignoring before. And in that awareness, I think it helps us pay more attention to what we’re designing.

Having done unit testing fairly consistently for the last 8+ years and now being in an environment where it’s a pretty new paradigm, I can say I really miss having a comprehensive set of tests when I make changes.

I have been following your back-to-basics series and will be waiting to see the solution. I think there are two things going on here: there are unit tests (levels 1 and 2) and functional/feature tests (levels 3 and 4). Functional/feature tests are always hard to write and maintain, but with appropriate naming and structuring you can make them easier to maintain. Without feature/functional tests, bugs creep into your code in the long term and making changes to the code becomes a nightmare. It is essential to have these tests to ensure the stability of the code. There is nothing wrong with mocking, and I don’t see any reason why it should create complexity. Code written to test code is, after all, also code, and it can be awesome or awful.

Test code is as important as production code. It can have bugs. It needs to be maintained. And we should strive to keep it as stable as possible.
Functional/feature tests document a large part of what your code is doing. By looking at these tests, one should be able to identify the interactions and state changes going on in a particular scenario. I think the focus should be on improving the quality of the test code, rather than questioning the value of these tests.


Great post. Honestly, I was pissed off when I read the title, but your analysis is sound. However, I still wholeheartedly believe that it is definitely worth it. Your code will be better designed. Your code will be easier to maintain. You will develop faster.