Have you ever been in a situation, where a simple change of code broke a few hundred tests? Have you ever had the idea that tests slow you down, inhibit your creativity, make you afraid to change the code? If you had, it means you've entered the Dungeon-of-very-bad-tests, the world of things that should not be.

I've been there. I've built one myself. And it collapsed killing me in the process. I've learned my lesson. So here is the story of a dead man. Learn from my faults or be doomed to repeat them.

The story

Test Driven Development, like all good games in the world, is simple to learn, hard to master. I've started in 2005, when a brilliant guy named Piotr Szarwas, gave me the book "Test Driven Development: By Example" [1], and one task: create a framework.

Those were the old times, when the technology we were using had no frameworks at all, and we wanted a cool one, like Spring, with Inversion-of-Control, Object-Relational Mapping, Model-View-Controller and all the good things we knew about. And so we created a framework. Then we built a Content Management System on top of it. Then we created a bunch of dedicated applications for different clients, Internet shops and what-not, on top of those two. We were doing well. We had 3000+ tests for the framework, 3000+ tests for the CMS, and another few thousand for every dedicated application. We were looking at our work, and we were happy, safe, and secure. Those were good times.

And then, as our code base grew, we came to the point, where a simple anemic model we had, was not good enough anymore. I had not read the other important book of that time: "Domain Driven Design" [2], you see. I didn't know yet, that you could only get so far with an anemic model.

But we were safe. We had tons of tests. We could change anything.

Or so I thought.

I spent a week trying to introduce some changes in the architecture. Simple things really: moving methods around, switching collaborators, such things. Only to be overwhelmed by the number of tests I had to fix. That was TDD, I started my change with writing a test, and when I was finally done with the code under the test, I'd find another few hundred tests completely broken by my change. And when I got them fixed, introducing some more changes in the process, I'd find another few thousand broken. That was a butterfly effect, a chain reaction caused by a very small change.

It took me a week to figure out, that I'm not even half done in here. The refactoring had no visible end. And at no point my code base was stable, deployment-ready. I had my branch in the repository, one I've renamed "Lasciate ogni speranza, voi ch'entrate". [3]

We had tons and tons of tests. Of very bad tests. Tests that would pour concrete over our code, so that we could do nothing.

The only real options were: either to leave it be, or delete all tests, and write everything from scratch again. I didn't want to work with the code if we were to go for the first option, and the management would not find financial rationale for the second. So I quit.

That was the Dungeon I built, only to find myself defeated by its monsters.

I went back to the book, and found everything I did wrong in there. Outlined. Marked out. How could I skip that? How could I not notice? Turns out, sometimes, you need to be of age and experience, to truly understand the things you learn.

Even the best of tools, when used poorly, can turn against you. And the easier the tool, the easier it seems to use it, the easier it is to fall into the trap of I-know-how-it-works thinking. And then BAM! You're gone.

The truth

Test Driven Development and tests, are two completely different things. Tests are only a byproduct of TDD, nothing more. What is the point of TDD? What does TDD brings? Why do we do TDD?

Because of those three reasons.

1. To find the best design, by putting ourselves into the user's shoes.

By starting with "how do I want to use it" way of thinking, we discover the most useful and friendly design. Always good, quite often that's the best design out there. Otherwise, what we get may be theoretically correct, but terribly hard to use.

Do you remember EJB 1 or 2? CORBA? Any technology designed by a committee, is going to be "correct", but overengineered and painful to use. And you don't want that.

The best design is discovered by the one who has to use it.

2. To manage our fear.

It takes balls, to make a ground change in a large code-base without tests, and say "it's done" without introducing bugs in the process, doesn't it? Well, the truth is, if you say "it's done", most of the time you are either ignorant, reckless, or just plain stupid. It's like with concurrency: everybody knows it, nobody can do it well.

Smart people are scared of such changes. Unless they have good tests, with high code coverage.

TDD allows to manage our fears, by giving us proof, that things work as they should. TDD gives us safety

3. To have fast feedback.

How long can you code, without running the app? How long can you code without knowing whether your code works as you think it should?

Feedback in tests is important. Less so for front-end programming, where you can just run the shit up, and see for yourselves. More for coding in the backend. Even more, if your technology stack requires compilation, deployment, and starting up.

Time is money, and I'd rather earn it, than wait for the deployment and click through my changes each time I make them.

And that's it. There are no more reasons for TDD whatsoever. We want Good Design, Safety, and fast Feedback. Good tests are those, which give us that [4] Ok, Iím lying a bit in here. Authors like Kent Beck and Tomasz Kaczanowski mention more reasons. For example: using TDD in communication (Behavior Driven Development and user stories are a great example). In my practice, though, Iíve found only those three to be of a great importance. Others were easily replaced with different (sometimes better) tools and methods than TDD. Your mileage may vary, so find out whatís best for you..

Bad tests?

All the other tests are bad.

The bad practice

So how does a typical, bad test, look like? The one I see over and over, in close to every project, created by somebody who has yet to learn how NOT to build an ugly dungeon, how not to pour concrete over your code base. The one I'd write myself in 2005.

This will be a Spock sample, written in groovy, testing a Grails controller. But don't worry if you don't know those technologies. I bet you'll understand what's going on in there without problems. Yes, it's that simple. I'll explain all the not-so-obvious parts.

So we have a controller. It's an outlet controller. And we have a test. What's wrong with this test?

The name of the test is "should show outlet". What should a test with such a name check? Whether we show the outlet, right? And what does it check? Whether we are redirected. Brilliant? Useless.

It's simple, but I see it all around. People forget, that we need to:

Trick 1: Verify the right thing

I bet that test was written after the code. Not in test-first fashion.

But verifying the right thing is not enough. Let's have another example. Same controller, different expectation. The name is: "should create outlet insert command with valid params with new account"

Quite complex, isn't it? If you need an explanation, the name is wrong. But you don't know the domain, so let me put some light on it: when we give the controller good parameters, we want it to create a new OutletInsertCommand, and the account of that one, should be new.

The name doesn't say what 'new' is, but we should be able to see it in the code.

If you are new to Spock: n*mock.whatever(), means that the method "whatever" of the mock object, should be called exactly n times. No more no less. The underscore "_" means "everything" or "anything". And the >> sign, instructs the test framework to return the right side argument when the method is called.

So what's wrong with this test? Pretty much everything. Let's go from the start of "then" part, mercifully skipping the over-verbose set-up in the "given".

1 * securityServiceMock.getCurrentlyLoggedUser() >> user

The first line verifies whether some security service was asked for a logged user, and returns the user. And it was asked EXACTLY one time. No more, no less.

Wait, what? How come we have a security service in here? The name of the test doesn't say anything about security or users, why do we check it?

Well, it's the first mistake. This part is not, what we want to verify. This is probably required by the controller, but it only means it should be in the "given". And it should not verify that it's called "exactly once". It's a stub for God's sake. The user is either logged in or not. There is no sense in making him "logged in, but you can ask only once".

Then, there is the second line.

1 * commandNotificationServiceMock.notifyAccepters(_)

It verifies that some notification service is called exactly once. And it may be ok, the business logic may require that, but then... why is it not stated clearly in the name of the test? Ah, I know, the name would be too long. Well, that's also a suggestion. You need to make another test, something like "should notify about newly created outlet insert command".

And then, it's the third line.

0 * _._

My favorite one. If the code is Han Solo, this line is Jabba the Hut. It wants Hans Solo frozen in solid concrete. Or dead. Or both.

This line, if you haven't deducted yet, is "You shall not make any other interactions with any mock, or stubs, or anything, Amen!".

That's the most stupid thing I've seen in a while. Why would a sane programmer ever put it here? That's beyond my imagination.

No it isn't. Been there, done that. The reason why wrote such a thing is to make sure, that I covered all the interactions. That I didn't forget about anything. Tests are good, what's wrong in having more good?

I forgot about sanity. That line is stupid, and it will have it's vengeance. It will bite you in the ass, some day. And while it may be small, because there are hundreds of lines like this, someday you gonna get bitten pretty well. You may as well not survive.

And then, another line.

Outlet.count() == 0

This verifies whether we don't have any outlets in the database. Do you know why? You don't. I do. I do, because I know the business logic of this domain. You don't because this tests sucks at informing you, what it should.

We expect the object we've created in the database, and then we verify whether it's account is "new". And we know, that the "new" means a specific account number and type. Though it screams for being extracted into another method.

Then we have some flash message not set. And a redirection. And I ask God, why the hell are we testing this? Not because the name of the test says so, that's for sure. The truth is, that looking at the test, I can recreate the method under test, line by line.

Isn't it brilliant? This test represents every single line of a not so simple method. But try to change the method, try to change a single line, and you have big chance to blow this thing up. And when those kinds of tests are in the hundreds, you have concrete all over you code. You'll be able to refactor nothing.

So here's another lesson. It's not enough to verify the right thing. You need to

Trick 2: Verify only the right thing

Never ever verify the algorithm of a method step by step. Verify the outcomes of the algorithm. You should be free to change the method, as long as the outcome, the real thing you expect, is not changed.

Imagine a sorting problem. Would you verify it's internal algorithm? What for? It's got to work and it's got to work well. Remember, you want good design and security. Apart from this, it should be free to change. Your tests should not stay in the way.

How should the test above look like? First, letís notice, that there are three potential business rules worth testing in here.

First, when we create outlet, we should have an OutletInsertCommand in our database, but we shouldnít have an Outlet yet (the change has to be accepted by a supervisor). Those two lines check that:

Outlet.count() == 0

OutletInsertCommand.count() == 1

Second, the supervisor (called acceptor in here) should be notified. And this is verified as well:

1 * commandNotificationServiceMock.notifyAcceptors(_)

Third, we can see, that the command in question, should have a "new account", and this state is represented like that:

Everything else is either irrelevant to the business goal, or just complete rubbish. We should be able to change everything else, as long as those three rules are not broken.

So what should we do? Is it enough to delete everything else from the test and just leave the lines mentioned above? Not really.

If we started with a test first, we would have those three requirements represented as three different tests, because we start with a name of the test only. And thatís a much better solution, because every time we break a business expectation, we will know exactly which rule we broke. In other words, itís better for a test, to have just a single business reason to fail.

Do you understand what's going on? If you haven't seen it before, you may very well not. The "where" part, is a beautiful Spock solution for parameterized tests. The headers of those columns are the names of variables, used BEFORE, in the first line. It's sort of a declaration after the usage. The test is going to be fired many times, once for each line in the "where" part. And it's all possible thanks to Groovy's Abstract Syntaxt Tree Transofrmation. We are talking about interpreting and changing the code during the compilation. Cool stuff.

This static closure, is telling Grails, what kind of validation we expect on the object and database level. In Java, these would most probably be annotations.

And you do not test annotations. You also do not test static fields. Or closures without any sensible code, without any behavior. And you don't test whether the framework below (Grails/GORM in here) works the way it works.

Oh, you may test that for the first time you are using it. Just because you want to know how and if it works. You want to be safe, after all. But then, you should probably delete that test, and for sure, not repeat it for every single domain class out there.

This test doesn't even verify that, by the way. It's a unit test, working on a mock of a database. It's not testing the real GORM (Groovy Object-Relational Mapping, an adapter on top of Hibernate). It's testing the mock of the real GORM.

So if TDD gives us safety, design and feedback, what does this test provide? Absolutely nothing. So why do we put it here? Because our brain says: tests are good. More tests are better.

Well, I've got news for you. Every single test which does not provide us safety and good design is bad. Period. Those which provide only feedback, may be thrown away the moment you stop refactoring your code under the test. Assuming you ever stop. I tend to use the Boy-Scout Rule ("Always leave the code you visit cleaner than you found it"), so I never stop refactoring. I just do it "by the way".

So here's my lesson number four:

Trick 4: Provide safety and good design, or be gone

That was the example of things gone wrong. What should we do about it?

The answer: delete it.

But I yet have to see a programmer who removes his tests. Even so shitty as this one. We feel very personal about our code, I guess. I definitely do. So in case you are hesitating, let me remind you what Kent Beck wrote in his book about TDD:

The first criterion for your tests is confidence. Never delete a test if it reduces your confidence in the behavior of the system.

The second criterion is communication. If you have two tests that exercise the same path through the code, but they speak to different scenarios for a readers, leave them alone. [1]

Now you know, it's safe to delete it.

Units of measurement

Imagine this: you have a booking system for a small conference room in a small company. By some strange reason, it has to deal with off-line booking. People post their booking requests to some front-end, and once a week you get a text file with working hours of the company, and all the bookings (for what day, for how long, by whom, submitted at what point it time) in random order. Your system should produce a calendar for the room, according to some business rules (first come, first served, only in office business hours, that sort of things).

As part of the analysis, we have a clearly defined input data, and expected outcomes, with examples. Beautiful case for TDD, really. Something that sadly never happens in the real life.

IllegalArgumentException is fair enough. We don't need to handle it in a more fancy way. We are done for now. Let's finally write the class under the test: BookingCalendarGenerator.

And so we do. And it comes out, that the whole thing is a little big for a single method. So we use the power of Extract Method pattern. We group code fragments into different methods. We group methods and data those operate on, into classes. We use the power of Object Oriented programming, we use Single Responsibility Principle, we use composition (or decomposition, to be precise) and we end up with a package like this:

We have one public class, and several package-scope classes. Those package scope classes clearly belong to the public one. Here's a class diagram for clarity:

Those aren't stupid data-objects. Those are full-fledged classes. With behavior, responsibility, encapsulation. And here's a thing that may come to our Test Driven minds: we have no tests for those classes. We have only for the public class. That's bad, right? Having no tests must be bad. Very bad. Right?

Wrong.

We do have tests. We fire up our code coverage tool and we see: 100% methods and classes. 95% lines. Not bad (I'll get to that 5% of uncertainty in the next post).

But we have only a single unit test class. Is that good?

Well, let me put some emphasis, to point the answer out:

Trick 5: It's a UNIT test. It's called a UNIT test for a reason!

The unit does not have to be a single class. The unit does not have to be a single package. The unit is up to you to decide. It's a general name, because your sanity, your common sense, should tell you where to stop.

So we have six classes as a unit, what's the big deal? How about if somebody wants to use one of those classes, apart from the rest. He would have no tests for it, right?

Wrong. Those classes are package-scope, apart from the one that's actually called in the test. This package-scope thing tells you: "Back off. Don't touch me, I belong to this package. Don't try to use me separately, I was design to be here!".

So yeah, if a programmer takes one of those out, or makes it public, he would probably know, that all the guarantees are voided. Write your own tests, man.

How about if somebody wants to add some behavior to one of those classes, I've been asked. How would he know he's not breaking something?

Well, he would start with a test, right? It's TDD, right? If you have a change of requirements, you code this change as a test, and then, and only then, you start messing with the code. So you are safe and secure.

I see people writing test-per-class blindly, without giving any thought to it, and it makes me cry. I do a lot of pair-programming lately, and you know what I've found? Java programmers in general do not use package-scope. Java programmers in general do not know, that protected means: for me, all my descendants, and EVERYONE in the same package. That's right, protected is more than package-scope, not less a single bit. So if Java programmers do not know what a package-scope really is, and that's, contrary to Groovy, is the default, how could they understand what a Unit is?

How high can I get?

Now here's an interesting thought: if we can have a single test for a package, we could have a single test for a package tree. You know, something like this:

We all know that packages in Java are not really tree-like, that the only thing those have with the directory structure is by a very old convention, and we know that the directory structure is there only to solve the collision-of-names problem, but nevertheless, we tend to use packages, like if the name.after.the.dot had some meaning. Like if we could hide one package inside another. Or build layers of lasagne with them.

So is it O.K. to have a single test class for a tree of packages?

Yes it is.

But if so, where is the end to that? Can we go all the way up in the package tree, to the entry point of our application? Those... those would be integration tests, or functional tests, perhaps. Could we do that? Would that be good?

The answer is: it would. In a perfect world, it would be just fine. In our shitty, hanging-on-the-edge-of-a-knife, world, it would be insane. Why? Because functional, end-to-end test are slow. So slow. So horribly slow, that it makes you want to throw them away and go some place where you would not have to be always waiting for something. A place of total creativity, constant feedback, and lightning fast safety.

And you're back to unit testing.

There are even some more reasons. One being, that it's hard to test all flows of the application, testing it end-to-end. You should probably do that for all the major flows, but what about errors, bad connections, all those tricky logic parts that may throw up at one point or another. No, sometimes it would be just too hard, to set up the environment for integration test like that, so you end up testing it with unit tests anyway.

The second reason is, that though functional tests do not pour concrete over your code, do not inhibit your creativity by repeating you algorithm in the test case, they also give no safety for refactoring. When you had a package with a single public class, it was quite obvious what someone can safely do, and what he cannot. When you have something enclosed in a library, or a plugin, it's still obvious. But if you have thousands of public classes, and you are implementing a new feature, you are probably going to use some of them, and you would like to know that they are fine.

So, no, in our world, it doesn't make sense to go with functional tests only. Sorry. But it also doesn't make sense to create a test per class.

It's called the UNIT test, for a reason.

Use that.

Conclusion

Test Driven Development is easy to learn, hard to master. Like with all powerful tools, one can do a lot of harm, when using it improperly. And with a tool so simple to use, that happens more often than not. Itís when you hear people cursing and blaming TDD for being slow and unhelpful.

TDD is just a method. Itís a way, not the destination. Whenever you seem to have a problem with TDD, or tests in general, it really helps, to recall what the goal is. Whenever someone gets angry at tests in a project, whenever people try to rediscover the wheel by using integration or functional test only, itís usually because they used TDD in a very wrong way.

My goal is to create cool software in an easy and pleasant way. I love my job. There are times, when I hate it as well. Iíve found that proper TDD helps me reduce the number of times, when I hate it. But improper TDD has exactly the opposite effect. What works in one context, may not work in another. What works for one person, may not have the same effect on somebody else. Itís crucial to be pragmatic and reflect on the methods and tools, instead of using them blindly. Drugs, guns and replicants [4] can be pretty dangerous, when used improperly. Letís share the best practices for TDD, before they make it illegal.