Harder Than It Sounds?

I was reading some of the TDD wiki at c2.com where the TDD process was described. I quote the first few points here to save you some time (though you might want to go read the original in context).

Think about what you want to do.

Think about how to test it.

Write a small test. Think about the desired API.

Write just enough code to fail the test.

It all sounds pretty easy. Indeed, the wiki says that:

Test code is easy to write. It's usually a couple calls to the server object, then a list of assertions. Writing the easy code first makes writing the hard code easy.

Funny, my experience and intuition are a little different. It can be hard to get started, or to get started well. Let me enumerate some questions:

What is a good first test?

What makes a test easy or hard to implement?

What tests will lead to code that is not a waste of time?

How do I write tests so that I know the resulting code will integrate with the rest of the system?

What tests will make sure that there are no unwanted side effects (like indicators set in a dialog that aren't reset when they need to be)?

How can I make sure that my tests aren't just testing a proposed design structure, rather than verifying that the program meets requirements?

How do I ensure that the way my test uses the system is really like the way that the system will use the code?

The one thing that I know for sure is that the code and the tests have a cross-pollinating effect. My coworkers have shown me how decisions made in testing will influence the system you're building, in good ways or in bad ways, and the way you write your system will determine how you write your tests from that point onward.

This is because there are predictable forces at work in TDD.

The first force is time pressure. You are always trying to maintain some velocity of development. This tends to drive you to do what is quickest and easiest, rather than to do hard things. It pushes you forward into implementation and pushes you to "not spend too much time refactoring". It is good and it is bad. It can work against clean design, but it can work in favor of simple design, as long as the design remains simple.

The second force is locality: you want to write your code and tests to be close together, avoiding "bumper shots". When you write a test, you want to write just one function to implement it, so that there is a very short call stack between the test and the code. You tend to not want to build a whole train of interfaces. You will want to make public accessors for just about anything (which may be another way that Smalltalk and Python are smarter than C++ and Java, btw). You want to be "close to the code" in your tests, and "close to the test" in your code.

The third force is inertia. Tests depend on code, and that dependency makes the code responsible to them. Responsibility creates stability.

These forces together tend to 'munge' the two together. For instance, choosing to always test through a presenter in an MVP GUI app will tend to push you to write implementation directly into the presenter, and of course once there is functionality in the presenter, it's easier to write the next test to use the presenter. Now moving code out of the presenter will cause tests to break and will require refactoring time (though you have a body of tests to prove that you're doing it well), and productivity concerns will urge you to leave things as they are. That decision makes it easier to do the same with the next test.
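The presenter trap can be sketched in a few lines of Python. All of the names here (Account, AccountPresenter) are invented for illustration; the point is only that a model-level test keeps the rule in the model, while a presenter test is reduced to checking wiring.

```python
class Account:
    """Domain model: where the business rule ought to live."""
    def __init__(self, balance=0):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class AccountPresenter:
    """Thin presenter: delegates to the model instead of owning the rule."""
    def __init__(self, account):
        self.account = account
        self.error = None

    def on_withdraw_clicked(self, amount):
        try:
            self.account.withdraw(amount)
        except ValueError as e:
            self.error = str(e)


# Model-level test: short call stack, rule stays in the model.
acct = Account(balance=100)
acct.withdraw(30)
assert acct.balance == 70

# Presenter test: only checks delegation and error display, not the rule.
p = AccountPresenter(Account(balance=10))
p.on_withdraw_clicked(50)
assert p.error == "insufficient funds"
```

Had the first test gone through the presenter instead, the overdraft check would most naturally have landed in `on_withdraw_clicked`, and every later test would have had a reason to go through the presenter too.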

When I look at any kata or teaching example of TDD, I see all of this at work. The code is shaped (driven) by the choice of tests and the ordering of tests. If you pick a different first test, or a different way to determine that it worked, the code may come out rather different. If you start with a different early implementation, then your tests may come out rather differently. Luckily the little examples don't run long enough for all that much difference to come out.

Maybe I'm just looking at shades of gray. Whatever the solution, it will need to be refactored and managed. Whatever the result, it will be provably sufficient. But I have to think that there's more to writing tests than writing the tests. It's a good system overall, but it looks like a layer of heuristics is welcome. Maybe that's what we'll get from BDD.

Tim asks "How do I ensure that the way my test uses the system is really like the way that the system will use the code?".

One way we can do that is by adhering to the top-down approach that we advocate. Start with a high-level integration test, e.g. in FitNesse. That should give you an idea of how a client wants to talk to the system at its boundary. As you're test-driving the outermost layer, you'll discover how it needs to use the next outermost layer. As you're test-driving that second layer, you'll discover the API for the third. And so on, and so on...
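A minimal sketch of that discovery process, with invented names (OrderService, a repository with a `find_orders` method): test-driving the outer layer against a mock is what surfaces the interface the next layer down must provide. The mock's expected calls become that inner layer's specification.

```python
from unittest.mock import Mock


class OrderService:
    """Outer layer, test-driven first."""
    def __init__(self, repository):
        self.repository = repository

    def total_for(self, customer_id):
        # This call is the API we "discover" for the next layer down:
        # it must offer find_orders(customer_id).
        orders = self.repository.find_orders(customer_id)
        return sum(o["amount"] for o in orders)


# The outer-layer test: the mock stands in for the yet-unwritten
# inner layer.
repo = Mock()
repo.find_orders.return_value = [{"amount": 10}, {"amount": 25}]

service = OrderService(repo)
assert service.total_for("c-42") == 35
repo.find_orders.assert_called_once_with("c-42")
```

When you then test-drive the real repository, `find_orders` is no longer a guess: the outer layer's tests already pinned down its name, argument, and return shape.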

We've gotten pretty lazy about this on our team, in part because our customer group is not taking charge of acceptance tests. That leaves it up to us (developers) to make sure ATs happen. And even though we get buy-in from the customer on them, I see that we often do that late in the process. How often have we heard "the story is almost done, I just have to get the AT passing"? That's sacrilege. We're building ATs around what we've coded rather than building code around the requirements. When this happens, even though the code may be test-driven at a granular level, the interface to that code is not.

It seems to me that if you start from the outside and work down exactly as you say, the completion of the AT is still the last step (assuming refactoring is done as-needed). Isn't it the AT that defines "done"? If you had the AT passing and still had UTs to do, that would seem to be a travesty, no? It would mean that you're not really driving from the AT.

Worse, it would be wrong if the last step were to get the AT *written*, but it seems perfectly acceptable that the last thing to be *finished* is that the AT passes. Maybe I'm missing something.

There is a shade of gray when the developers who are going to implement the feature are writing the acceptance tests. The developer puts on a customer hat and tries to recreate the specification in executable form from past or future conversations. The problem with this is that the specification becomes molded in the image of the developer, who can't write the specification without thinking about how it will be implemented in the system. Knowledge of the system will unavoidably taint the developer's decisions. In the ideal situation, the customer writes the AT, and when it passes, that is the acceptance of the feature. When the developer has to write the ATs, a different world emerges, because the ATs are no longer customer specifications. They are the developer's interpretation of a feature, written by and for the developer, also called integration tests. They are no longer the magical client of the system, the template for the system's behavior. They can be helpful, but not the final word on what the system does.

"If you had the AT passing and still had UTs to do, that would seem to be a travesty, no?" Well of course. However, this situation wouldn't come up in the top down aproach, because the UT's would be doing something the customer didn't specify. Bring up the need for what the UT tests to the customer, maybe they will specify it, maybe it isn't important to them.

Micah Martin once said something to the effect of: if you had just the ATs of a system, you could recreate the system without any other knowledge. I think this is the case because they are made with the customer's business model in mind, and the business model is what drives the behavior of the system. The developers will always have their heads buried in the sand of technology.

Reading the posts here, I come out of it with the understanding that a top-down approach would have us:

1) write one or more ATs, which of course won't pass at this time

2) test-drive the implementation of the client interface that the ATs want, resulting in a lot of UTs

...but then, David mentions "As you're test-driving the outermost layer, you'll discover how it needs to use the next outermost layer. As you're test-driving that second layer, you'll discover the API for the third. And so on, and so on...". This seems to indicate that these inner layers are all supported by their own separate UTs. Normally I would've thought that once I'd written the tests for the outermost layer, any additional layers would be the result of refactoring supported by the existing UTs -- is this not true?

Come to think of it, there's of course one obvious reason to test-drive inner layers separately: documentation! Other than that, though, I would be very interested in hearing more about the process at work in these cases - I've not test-driven any remotely complex project so far, and thus haven't had to think much about this yet.

Well, of course the AT working comes last-ish if you go top-down. It's the last layer of the stack to pop. It's sometimes a problem to do refactoring last, because once you've waited until the end, the next story is beckoning and it's tempting to run the numbers up (that Hawthorne effect) rather than spend the time.

Another problem is when you have an error in the fixture. It may be that the system is working, but the fixture is broken.

The whole thing with layers upon layers bothers me, frankly. It is hard to keep your context when you're working N layers deep. Deep code spelunking is necessary far more than it ought to be. There is a "forest and trees" thing to deal with.

I would like my tests to be really, really shallow. They have not been to date, because a lot of them test through the presenter so that the tests read more like a simulation of the UI (for customer readability). I think that there's something wrong with that, though. It ties the ATs to the UI in an implicit and superficial way. I'm vacillating between that being a good thing and a bad thing.

Tim: I would like my tests to be really, really shallow. They have not been to date, because a lot of them test through the presenter so that the tests read more like a simulation of the UI (for customer readability). I think that there's something wrong with that, though. It ties the ATs to the UI in an implicit and superficial way. I'm vacillating between that being a good thing and a bad thing.

Have you tried writing the business rule tests below the presenter, and writing presentation tests for the presenter?

I have only JUST recognized this syndrome and how the forces lay out. We are, as a team, moving more toward testing the model in general. We do have the same pressure, though, to build all AT fixtures in a way that the client understands. That pressure pushes us to build all the ATs to the taste of those who read them.

I am new to TDD. I started working on a new project where I write ATs using JWebUnit. Nobody forces me to do that. I enabled caching for Hibernate, and my tests run very quickly. I definitely see the benefits of automated testing when I refactor. And I only have to maintain a very minimal set of test classes, which correspond to user requirements. I have worked on other projects where we had to write a lot of JUnit classes, which get broken when someone changes code. In a rush to production they get fixed much later, or never. And I saw the same pattern of neglecting unit tests everywhere I worked. It made me wonder why. I think a developer should only write test classes for the system that he is responsible for and that other people will be using. In a bigger team that might be a single class, and the AT will coincide with the UT. If I am a one-developer team, I should only write ATs at the UI level.

David: One way we can do that is by adhering to the top-down approach that we advocate. Start with a high-level integration test, e.g. in FitNesse. That should give you an idea of how a client wants to talk to the system at its boundary. As you're test-driving the outermost layer, you'll discover how it needs to use the next outermost layer. As you're test-driving that second layer, you'll discover the API for the third. And so on, and so on...

Actually, that's the opposite of the problem I had. Working from the top was fine, but the case I was concerned about turned out to create a kind of structure that didn't work at the API level, and I was pretty deep by the time I found out. Of course, this was because I didn't know the low-level code of the project, and was not so familiar with the platform API either. When you work top-down, you run the risk of not being able to match the rubber to the road. When you work bottom-up, you run a different risk. I guess that's why it's harder than it seems.

I've never TDD-ed something new, so I can only relate to TDD-ing something that has to fit an existing structure which is alien to me. I cannot easily separate the problems with the approach from the problems with the code base and the unfamiliar platform. I don't have any other experiences to judge by.

As such, it seems to me that writing tests is hard, because there is a world of stuff I don't know that my tests must exist in and navigate through. I often read "writing tests is easy", but I've not experienced that to be the case. It's easy to write something naively either from the top or the bottom, but there is an art to writing the correct tests at the correct level.

I read about how it's supposed to be, but I don't experience it. That's a cognitive dissonance. I think I'll have to go and build something brand new from scratch with TDD in order to experience the magic that is supposed to happen. It implies that TDD is like all other work -- easy when you have blue skies and green fields, harder when you do not. And that TDD code may be subject to many of the same kinds of design rot as code written in other methods.

Tue, 14 Mar 2006 23:37:35, David Chelimsky, re: Does outer layer represent all the tests for next inner layer?

To some extent, yes. But there are points where parts of the inner layer become useful to multiple parts of the outer layer, and they become complex and warrant their own tests. To some, that point is immediately upon their introduction. To some, that point is never. Most of us live somewhere in the middle.

But somehow this new test for the inner layer has to be derived from user requirements; otherwise it will introduce a constraint on the inner layer that is not related to user requirements at all. Or in the future the user might ask for a new feature that contradicts that speculative test without contradicting any of the real tests for the upper layer.

Wed, 15 Mar 2006 17:57:01, David Chelimsky, re: Does outer layer represent all the tests for next inner layer?

There's a distinction some of us like to make between "customer tests" and "developer tests". As their names imply, customer tests, which are typically pointing only at the outermost layer, belong to the customer. They directly reflect business need.

Developer tests are a different animal. While they are in support of customer tests (and therefore customer requirements), they exist to help you drive design, provide quick failure isolation, etc. If you only test the outermost layer, failure isolation becomes difficult. If you test every single method directly, duplication may appear (A uses B, so tests on both A and B exercise behavior in B), which can make refactoring difficult. Somewhere in the middle of these two extremes lies the balance I was referring to.

Also, in some cases, you can mock B and specify that A uses B correctly, rather than specifying the effects of that correct use (which you specify in your tests of B). This is not always the best answer, nor is it never the best answer. Again - a balance is to be sought.
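The two styles can be put side by side in a short Python sketch. The names (Transfer for A, Auditor for B) are invented; the interaction test pins down that A uses B correctly, while B's actual effect is specified once, in B's own test.

```python
from unittest.mock import Mock


class Auditor:
    """B: records audit events."""
    def __init__(self):
        self.entries = []

    def record(self, event):
        self.entries.append(event)


class Transfer:
    """A: uses B as a collaborator."""
    def __init__(self, auditor):
        self.auditor = auditor

    def execute(self, amount):
        self.auditor.record(("transfer", amount))


# Interaction test: A calls B correctly; B's behavior is out of scope here,
# so the effect of record() is not re-tested through A.
mock_auditor = Mock()
Transfer(mock_auditor).execute(100)
mock_auditor.record.assert_called_once_with(("transfer", 100))

# State test: B's effect, specified once, in B's own test.
auditor = Auditor()
auditor.record(("transfer", 100))
assert auditor.entries == [("transfer", 100)]
```

Testing A against the real Auditor instead would also work, but then A's tests and B's tests would both exercise `record`, and a change to B could break both suites at once.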

The important point is that these tests I'm talking about are developer tests. They exist to provide the developer with tools to do developer specific things, like writing new code, or changing existing code.

If a developer writes classes for other developers or teams, he has to create all the tests that directly hit his classes. Tests have to be at the boundary of the system that the developer is responsible for. I think isolating a problem is also very easy when testing only at the boundary: if the code worked before, and I changed only one line of code, and the acceptance tests failed afterward, I know the problem is in that line of code.

Okay, there need to be tests at the outer boundary, but testing through the public API isn't necessarily the best way to test-drive anything. You need to be very close to your code. This is an idea that is expressed a lot in my project: that we need to test the small parts well, and then the rest is just "wiring tests" -- if I know that the alarm arms and disarms well, I only need to make sure that tripping a sensor calls the function that enables the alarm. I don't need to try to make long reach-around tests that work entirely through the interface to trigger all the possible code paths. It's a chain of trust.
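The alarm example can be made concrete with a small Python sketch (the Alarm and Sensor classes are invented here): the alarm's own tests prove arming and triggering work, so the sensor test only needs to prove the wiring.

```python
class Alarm:
    def __init__(self):
        self.armed = False
        self.sounding = False

    def arm(self):
        self.armed = True

    def trigger(self):
        if self.armed:
            self.sounding = True


class Sensor:
    def __init__(self, alarm):
        self.alarm = alarm

    def trip(self):
        self.alarm.trigger()


# Deep test: the alarm itself, tested well and directly.
alarm = Alarm()
alarm.arm()
alarm.trigger()
assert alarm.sounding

# Wiring test: tripping the sensor calls through to the alarm. We trust
# the alarm's own tests rather than re-driving every alarm code path
# from out here through the full interface.
alarm2 = Alarm()
alarm2.arm()
Sensor(alarm2).trip()
assert alarm2.sounding
```

Each layer is tested well in isolation, and the tests between layers shrink to a check that the call actually happens: a chain of trust rather than one long reach-around.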

So I agree that the boundary needs to be tested, but let's not undervalue the deeper tests.