When discussing unit testing and the methodologies it entails (mostly TDD, i.e. Test-Driven Development), I noticed a curious imbalance in the number and strength of arguments pro and contra. The latter are few and far between, to the point of ridiculous scarcity: googling “arguments against TDD” is just as likely to yield stories from both sides of the fence. That’s pretty telling. Is it so that TDD in general and unit tests in particular are just the best thing ever, because there is an industry-wide consensus about them?…

I wouldn’t be so sure. All this unequivocal acknowledgement looks suspiciously similar to many other trends and fashions that were (or still are) sweeping through the IT domain, receding only when the alternative approach gains enough traction.

O RLY?

Take OOP, for example. Back in the 90s and around 2000, you would hear all kinds of praise for the object-oriented methodology: how natural it is, how it helps to model problems in an intuitive way, how flexible and useful its abstractions are. A critics’ camp existed, of course, but it was small, scattered and not taken very seriously. Objects and classes reigned supreme.

Compare this to the present day, when OOP is taking blows from almost every direction. On one hand, it is rejected on performance grounds, as the unknown cost of a virtual method call is seen as a liability. On the other hand, its abstraction patterns are considered baroque, overblown and outdated, unfit for modern computing challenges – most notably concurrency and asynchrony.

Could it be that approaches emphasizing the utmost importance of unit tests are following the same route? Given the pretty much universal praise they are receiving, it’s not unimaginable. In this context, providing some reasonable counterarguments seems like a good thing: if we let some air out of this balloon, we may prevent it from popping later on.

Incidentally, this is a service for TDD/unit testing that I’m glad to provide ;-) So in the rest of this post, I’m going to discuss some of their potential drawbacks, hopefully helping to even out the playing field. Ultimately, this should lead to better software engineering practices, and better software.

It’s not an attire

There are several good reasons to write unit tests, including:

Anticipation of a long-term development process, with many significant functionality changes that are unknown beforehand but might require fundamental codebase overhaul(s) in the future.

Very frequent, incremental releases that gradually add new features and change functionality, while the distance between development and production remains as short as possible.

Usage of dynamic, interpreted, weakly or non-statically typed languages with no compile-time checks ensuring at least basic correctness of the code.

But on the other side of the spectrum, we also have several bad ones. Using unit tests as a replacement for actual QA is a well-known felony, but it never hurts to repeat the warning. Indeed, even 100% code coverage and thousands of passing tests will not guarantee correctness, especially if they overlook some real-world usage scenarios.
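To illustrate the coverage caveat, here is a contrived Python sketch (the function and its bug are invented for the example): a test suite can execute every line of a function, pass with flying colors, and still miss the one scenario that matters.

```python
def days_in_february(year: int) -> int:
    """Return the number of days in February for the given year."""
    # Bug: ignores the century rule, so 1900 is wrongly treated
    # as a leap year (it is divisible by 4 but not by 400).
    return 29 if year % 4 == 0 else 28

def test_days_in_february():
    # These two cases execute every line of the function -- 100% coverage...
    assert days_in_february(2020) == 29
    assert days_in_february(2019) == 28

test_days_in_february()
# ...yet days_in_february(1900) still returns 29 instead of 28:
# full coverage, all tests green, and an undetected real-world bug.
```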

Another bad reason would be using TDD for the sake of TDD itself, i.e. for how cool and snazzy it supposedly is. That’s similar to “being agile” in a Dilbert manner, without giving much thought to what it’s meant to accomplish and what the rationale behind the technique is. Incidentally, “agile” and TDD are known to often come together: a match that may actually not make much sense.

It’s also code

What is being tested by unit tests is, of course, the application’s code. What is used for testing is, incidentally, also code. Thinking about it for a little while, we can come to all sorts of conclusions, among which the eternal dilemma of “who watches the watchers?” is actually not the most interesting one.

To me, it seems more relevant to ask how good unit tests are as code. Since they are the verifiers we use to check the correctness of our program, we should definitely care about their quality. Come to think of it, for best results the unit tests should actually be better crafted than the code they are checking, in quite a similar manner to how a precise measuring device is used to calibrate a cheaper one.

Yet in how many cases is this true? Isn’t it so that the often tedious task of writing unit tests is delegated to junior developers and people generally new to the project? Or maybe it’s done by seasoned programmers, but mostly when they are not having a particularly productive day and prefer not to touch more challenging tasks? Is the test code appropriately divided into parts? Does it avoid repetition, carry an adequate number of comments, and stay clean and readable? Is it reviewed with at least the same scrutiny as the production code?…
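As a sketch of what treating test code as first-class code might look like (the `format_price` function is hypothetical, invented for the example): a table-driven test avoids the copy-pasted methods that tend to pile up when tests are written as an afterthought.

```python
import unittest

def format_price(amount: float) -> str:
    """Hypothetical function under test."""
    return f"${amount:,.2f}"

class TestFormatPrice(unittest.TestCase):
    # One table of cases instead of a copy-pasted method per case:
    # adding a scenario is a one-line change, and subTest makes
    # failures report exactly which input broke.
    CASES = [
        (0, "$0.00"),
        (5, "$5.00"),
        (1234.5, "$1,234.50"),
    ]

    def test_format_price(self):
        for amount, expected in self.CASES:
            with self.subTest(amount=amount):
                self.assertEqual(format_price(amount), expected)

if __name__ == "__main__":
    unittest.main()
```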

Just food for thought, really ;)

It’s not a silver bullet

According to enthusiasts of unit testing, running a test suite and seeing it light up green with 90%+ coverage is a reassuring, empowering experience. It gives us confidence and motivation to do whatever we want to do next, be it adding more features or doing some refactoring, both without the fear of breaking something in an undetectable way.

Put this way, it appears that unit tests are the only thing we need to ensure high quality of the code we write. As a result, we might be lulled into a false sense of security, overlooking ways we could be doing better. Because, as it turns out, unit tests are among the least effective ways of finding bugs, with formal reviews and high-volume beta tests scoring significantly higher in this regard. Not to mention that combining more than one method will – through simple probability effects – produce even better results.
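The probability effect is easy to sketch with made-up numbers (the per-technique detection rates below are illustrative assumptions, not measured data):

```python
# Assumed, illustrative defect-detection rates: each value is the
# fraction of defects a technique catches when used on its own.
rates = {"unit tests": 0.30, "formal review": 0.55, "beta test": 0.35}

# If the techniques miss defects independently, a defect escapes only
# when it slips past all of them, so the miss rates multiply.
miss = 1.0
for catch_rate in rates.values():
    miss *= 1.0 - catch_rate

combined = 1.0 - miss
print(f"combined detection rate: {combined:.1%}")  # ~79.5%, above any single technique
```

The exact figures don’t matter; the point is that stacking even mediocre techniques beats relying on any single one.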

It’s not free

Considering the use of unit tests, we must remember that they always come at an added cost. Writing them in the first place, then maintaining, amending, enhancing, correcting, and keeping them in sync with the requirements of the code being tested – all of these activities take time and effort. Any added reliability comes at a cost which can be divided into several parts:

a day-to-day cost of maintaining the test suite, including the opportunity cost of writing actual application code instead

an upfront cost of setting up the testing environment for a project, e.g. finding and installing the necessary FooUnit (or similar) libraries, researching testing practices for the framework/platform/etc. we’re operating on, writing runner scripts if needed, and so on

a cost of adopting the methodology by a programmer, which includes both the small upfront cost of learning the basics, and a steadily decreasing cognitive cost of gradually becoming better at it over time

Those costs may of course be balanced out by the benefits of stopping some bugs from slipping into later stages of development, but they won’t be voided. To quote the oft-repeated cliché: There is no free lunch.

It’s not preventing bugs (and has no science behind it)

In TDD circles, this early squashing of bugs is usually referred to as “preventing” them. This is typically followed up by saying that it’s easier and cheaper to prevent bugs than to fix them. Of course, this is just smug wordplay, unless you can somehow reach into my mind and alter its state before I type the offending line of code.

The actual claim behind this “prevention” statement is that defects introduced in early phases of development cost more to fix the later they are detected. It is attributed to research made by Robert E. Grady and supposedly comes from his 1999 publication An Economic Release Decision Model: Insights into Software Project Management. If we use it as a premise, it’s easy to see how unit tests help lower the overall costs of creating a piece of software. After all, they are all about fishing out bugs as soon as possible, preferably before they even leave the developer’s PC.

Sounds plausible… except that it’s not what Grady discovered at all, and his whole “study” is highly dubious. In reality, the perpetuated correlation between detection time and costs to fix a bug has no evidential grounds whatsoever – it’s just tribal wisdom.

…It’s not all

I could go on with further points, mentioning how the usefulness of unit tests is less apparent for compiled, statically typed languages, and how QuickCheck-like tools are often a much better alternative. I could also add something about tests needlessly solidifying the design, interface or even implementation decisions of the code they verify, discouraging change due to the additional effort it always entails.
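For readers unfamiliar with the QuickCheck style: instead of hand-picking cases, you state a property and check it against many random inputs. Here is a minimal hand-rolled sketch in Python (real tools like QuickCheck or Hypothesis add input shrinking and much smarter generators; the run-length encoder is a toy example):

```python
import random

def run_length_encode(s: str) -> list:
    """Toy code under test: classic run-length encoding."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def check_property(prop, generate, runs=500):
    """QuickCheck-style loop: assert `prop` on many random inputs."""
    for _ in range(runs):
        value = generate()
        assert prop(value), f"property failed for {value!r}"

def random_string():
    return "".join(random.choice("ab") for _ in range(random.randint(0, 30)))

# Property: decoding the encoding yields the original string,
# for any input -- not just the examples we happened to think of.
check_property(
    lambda s: "".join(ch * n for ch, n in run_length_encode(s)) == s,
    random_string,
)
print("round-trip property holds on 500 random inputs")
```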

But these are arguments that are quoted relatively often, so I preferred to concentrate on some of the others. Whether they invalidate the benefits of unit tests and/or TDD is of course up to you to decide. I don’t expect the answer to be unambiguous, because software engineering is often about using the right tools for the job at hand, not the same ones every time.

I see TDD not only as a way to spot bugs. For me it’s more important that TDD leads to loosely coupled code, since it’s a pain to instantiate a lot of dependencies in a test. You also get arguably better APIs when you first use them in a test. That’s why writing tests first is crucial.
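A quick sketch of the “pain to instantiate dependencies” point (all names here are invented for the example): making the dependency an explicit constructor argument is what lets a test substitute a fake, and that is exactly the property that makes the code loosely coupled.

```python
class Mailer:
    """Dependency interface; a real implementation would talk to SMTP."""
    def send(self, to: str, body: str) -> None:
        raise NotImplementedError

class Greeter:
    # The mailer is injected rather than constructed inside, so the
    # class never hard-wires itself to a concrete SMTP client.
    def __init__(self, mailer: Mailer) -> None:
        self.mailer = mailer

    def greet(self, user: str) -> None:
        self.mailer.send(user, f"Hello, {user}!")

class FakeMailer(Mailer):
    """Test double that simply records what was sent."""
    def __init__(self) -> None:
        self.sent = []
    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))

# The test drives Greeter through a fake, without any mail server.
fake = FakeMailer()
Greeter(fake).greet("alice")
assert fake.sent == [("alice", "Hello, alice!")]
```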

There is one more disadvantage – writing tests is often hard, and the time spent figuring out how to test code might be better spent doing something else.

@up: For writing a library with a public API, tests-first indeed seems to be the best way. All clients will interface with your library via the API, so it’s great that the tests do the same – by writing them, you are essentially putting yourself in the shoes of your users.

It’s not the case for actual applications, of course. Still, I agree that TDD may lead to higher quality code from the dependencies’ perspective. However, I always feel that the iterative nature of the process (write test -> fix/write code -> refactor -> repeat) indirectly promotes ‘hacky’ solutions just to make the tests pass. Sure, you know you’ll refactor it later, but you might be missing the whole architectural picture if you’re only concentrating on the “here and now” of the tests.