Vorsprung Durch Testing

At times it might seem as if the T in TDD stands for Trendy, but
there is more to Test-Driven Development than just a statement of
fashion. There is also more to it than just testing.

It is possible to identify a subset of three motivating
practices in TDD that characterise a fairly conventional and
uncontentious form of unit testing [Henney1]: programmer
testing responsibility, automated tests and example-based test cases. These form a
unit-testing base that can be employed in the context of both
static and agile development macro processes, and were motivated
and demonstrated previously on the humble but surprisingly rich
example of a sorting function in C [Henney2]. Thus, programmers are responsible for unit
testing their work, with system-level testing a separate and
complementary role and activity; tests should be executed
automatically - execution of code by code - rather than manually;
tests are black-box tests expressed as specific examples of typical
or edge cases of using the unit under test.
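
By way of illustration, a minimal sketch of such an
example-based, automated test might look like the following,
assuming a simple C sorting function (the sort_ints name and the
assert-based harness are illustrative, not taken from the
earlier article):

    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative unit under test: sort ints into ascending order. */
    static int compare_ints(const void * lhs, const void * rhs)
    {
        int left = *(const int *) lhs, right = *(const int *) rhs;
        return (left > right) - (left < right);
    }

    void sort_ints(int values[], size_t length)
    {
        qsort(values, length, sizeof values[0], compare_ints);
    }

    /* Example-based test case: one specific, typical input. */
    static void test_sorts_typical_values(void)
    {
        int values[] = { 3, 1, 2 };
        const int expected[] = { 1, 2, 3 };
        sort_ints(values, 3);
        assert(memcmp(values, expected, sizeof values) == 0);
    }

    /* Example-based test case: an edge case, the empty range. */
    static void test_leaves_empty_range_untouched(void)
    {
        int values[] = { 42 };
        sort_ints(values, 0);
        assert(values[0] == 42);
    }

    /* Automated execution: code exercising code, no manual steps. */
    int main(void)
    {
        test_sorts_typical_values();
        test_leaves_empty_range_untouched();
        return 0;
    }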

The next step is to recognise that effective testing can be more
than just bug hunting. In TDD unit testing helps to support and
drive design, and vice-versa. Three more practices can be
identified that build on the core unit-testing foundation to
provide us with a micro-process component that also supports
design: active test writing, sufficient design and refactoring.
These design-focused practices expand the role of the basic
unit-testing practices: examples drive the scope of design
[Marick], programmer responsibility extends
to the suitability and quality of code over time - not just at a
single point in time - and automation underpins the practical
execution of this approach.

Active Test Writing

Black-box testing by example is not just limited to exploring
the correctness of an implementation against an interface contract:
it is also useful for framing and presenting it, and for
formulating and exploring the contract itself. In other words,
design.

Passive testing is essentially the process whereby the feedback
of tests is limited to defect detection. Tests are typically
written some time after the code they test, where they play what is
essentially a destructive role: they cannot confirm total
correctness, only the presence of incorrectness. Although such an
approach to testing has obvious value, it can encourage an approach
to both design and testing that is overly formal and sequential.
The opportunity to learn about what is being designed, and how to
design it better as a whole, is missed [Henney3]. Defects lead to localised fixes, but the
test-writing process does not influence the key decisions in a
design, which in effect is considered frozen. The feedback loop is
too long, so there is less motivation to change things because of
the feeling of "what's done is done". The code has effectively gone
into conventional maintenance mode early, even though initial
development may be ongoing.

Active test writing adopts a more balanced perspective, using
the act of test writing as a creative exercise to balance the more
destructive intentions of test execution. Tests represent a first
point of use of an interface, and the ease or difficulty of writing
test cases gives instant feedback on the qualities of the interface
and the implementation behind it.

High coupling manifests itself in tests that are difficult or -
in simple unit-testing terms - impossible to write. Consider, for
example, an object that depends on data that could be passed in,
but has instead ended up coupled to a configuration file or
registry, a database connection or some global variable (whether
expressed obviously as such a variable or disguised as a
singleton object).
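
As a hedged sketch of how such coupling shows up in a test (the
names and the tax-rate example here are hypothetical, not from
the article): the version coupled to a global cannot be tested
without first arranging shared state, whereas the parameterised
version is testable in a single self-contained call.

    #include <assert.h>

    /* Hypothetical: a calculation coupled to a global variable.
       Every test must first arrange this shared state, and tests
       can then interfere with one another through it. */
    int global_tax_percent;

    long price_with_hidden_dependency(long net_pence)
    {
        return net_pence + net_pence * global_tax_percent / 100;
    }

    /* The decoupled form takes the data it needs as a parameter. */
    long price(long net_pence, int tax_percent)
    {
        return net_pence + net_pence * tax_percent / 100;
    }

    int main(void)
    {
        /* The coupled version drags hidden setup into the test... */
        global_tax_percent = 20;
        assert(price_with_hidden_dependency(100) == 120);

        /* ...whereas the decoupled version needs no setup at all. */
        assert(price(100, 20) == 120);
        return 0;
    }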

Low cohesion manifests itself in supernumerary test cases that
test quite unrelated features, suggesting that inside a given unit
there are smaller units struggling to get out. For example, the
standard C realloc function expresses
three quite distinct behaviours: malloc,
free and, err,
realloc [Henney4].
The standard java.util package contains
miscellaneous unrelated facilities - collections, event-handling
models, date and time handling, internationalisation features...
and further miscellaneous miscellanea. It also stands as a caution
to anyone who might consider util,
utils, utilities,
utility, etc, to be a clear and cohesive
name for a header, a package, a library, etc.
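
To make the realloc point concrete, here is a small sketch of its
three behaviours (error handling elided for brevity; note that
the zero-size behaviour has become implementation-defined in
later C standards, which only reinforces the cohesion point):

    #include <stdlib.h>

    int main(void)
    {
        /* Behaviour one: given a null pointer, realloc acts as
           malloc. */
        int * data = realloc(NULL, 10 * sizeof(int));

        /* Behaviour three - the one the name actually promises:
           resize the block, preserving contents up to the smaller
           of the old and new sizes. */
        data = realloc(data, 20 * sizeof(int));

        /* Behaviour two: given a zero size, realloc traditionally
           acts as free. */
        data = realloc(data, 0);

        return 0;
    }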

In terms of organising the active part of active test writing, there are many
options. The bottom line is that the writing of test code is
carried out in close proximity - in both space and time - to the
writing of production code. The writing of test cases and the
corresponding implementation can be interleaved, with one
following or preceding the other closely, or stepped a little
further apart. Being able to write a test case first is a useful
and helpful discipline, but only dogma would suggest that its
exclusive use is an absolute requirement and a necessary
prerequisite of TDD. However, although writing test cases much
later than the target code can work, both the quality of the
feedback and the motivation to write them are weaker.
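
For instance, a test-first step might look like the following
sketch (the leap_year function is illustrative, not from the
article): the test case is written against an interface that does
not yet exist, and the implementation then follows just far
enough to make it pass.

    #include <assert.h>

    /* The test's needs come first: this declaration exists only
       because the test case below demands it. */
    int leap_year(int year);

    int main(void)
    {
        assert(leap_year(2004));   /* the common four-year rule */
        assert(!leap_year(1900));  /* the century exception */
        assert(leap_year(2000));   /* the 400-year exception to that */
        assert(!leap_year(2003));  /* and a plain non-leap year */
        return 0;
    }

    /* Written after the test, and only just sufficient to pass it. */
    int leap_year(int year)
    {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }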

Sufficient Design

This continuous and reflective view of design at the code face
may raise another question in some minds about the whole nature of
developing iteratively and incrementally: why not just "do the
right thing first time"? Perhaps surprisingly, I have heard this
question posed as a serious criticism, but the question itself
raises more questions about the meaning of the question and the
questioner's assumptions than it does about agile development
techniques at any level. It assumes that the "right thing" is in
some way knowable "first time" and constant thereafter. However,
the "right thing" is dependent on time and is anything but
constant, so both "right thing" and "first time" lose their simple
interpretations. The learning nature of software development pretty
much guarantees that what is to be built and how it can be
built are moving targets. While they may not
necessarily be wild and erratic, their variability stands to
undermine any approach that is based on constancy and precognition.
The difference between a process with no variables and one with
some is the difference between defined and empirical processes.
Treating an empirical process as a defined process is a problem
waiting to happen [Schwaber].

Yet there can still be a lingering sense that sorting everything
out up front is both reasonable and do-able, leading one way or
another to a big up-front
design (BUFD) phase (see sidebar, "Big", as in "a Lot of", not just "a Bit
of"). This inevitably leads to overdesign. Design based
on assumptions that turn out to be incorrect needs to be reworked,
often quite late. Design that tries to tackle uncertainty by being
less specific becomes lost in technical detail focused on
generality rather than on the actual problems that need to be
addressed. At the opposite end of the spectrum is no up-front design (NUFD), which represents
a failure to exercise, in a timely manner, even the most basic
knowledge about what is to be developed. An approach based on a
view that accepts change but seeks stability is likely to be a more
reasoned one, albeit a little rougher in its detail up front, where
roughness implies sketched rather than shoddy. An approach based on
what I have referred to in recent years as rough up-front design (RUFD) can steer
this middle path. Establish a stable baseline architecture that
expresses a common vision and a sketch of what is to be worked on,
without wasting time on details that are better expressed and
handled in code or that are best left until more concrete knowledge
is available. Note that stable is not the same as static, so the
architecture is open to change rather than being frozen. This
approach can also be dubbed sufficient
up-front design (SUFD).

"Big", as in "a Lot of", not just "a Bit
of"

It is worth clarifying what BUFD (or BDUF, as it is also known)
entails, because this appears to be an occasional source of
confusion. For example, misunderstanding its meaning can lead to
proclamations such as the following [Spolsky]:

I can't tell you how strongly I believe in Big Design Up Front,
which the proponents of Extreme Programming consider anathema. I
have consistently saved time and made better products by using BDUF
and I'm proud to use it, no matter what the XP fanatics claim.
They're just wrong on this point and I can't be any clearer than
that.

And, to demonstrate the point, Joel Spolsky makes available
for download a so-called functional spec of a commercial product,
codenamed Aardvark. However, the deeds do not support the words.
The document may have been written up front, but hunt all you like
for big design because you won't find it. Strong belief and pride
appear to have clouded correct use of accepted terminology.

The accepted archetype of BUFD arises from the strict waterfall
approach of defining development as a precisely phased pipeline of
activities, so that requirements analysis strictly precedes design
activity, which strictly precedes coding, which strictly precedes
testing. In a bid to reduce risk from unknowns later in the
lifecycle, a BUFD approach doesn't just do a bit of design up
front, it does a lot. Hence the use of the term big rather than a bit of or some. The BUFD path is paved with good
intentions - even if somewhat suspect - but the idea is that the
design goes into a lot of detail, specifying internal structure to
the nth degree - from packages and classes right down to private
methods and private data. In essence, a blueprint that supports a
plan-driven model of development.

However, at the beginning of the Aardvark spec is the following
note:

This specification is simply a starting point for the design of
Aardvark 1.0, not a final blueprint. As we start to build the
product, we'll discover a lot of things that won't work exactly as
planned. We'll invent new features, we'll change things, we'll
refine the wording, etc. We'll try to keep the spec up to date as
things change. By no means should you consider this spec to be some
kind of holy, cast-in-stone law.

So, of all the things this spec might be, a big, up-front design
document is not one of them. It makes this quite clear to the
reader by describing itself as "a starting point for the design"
not "the design". Reading further into the spec uncovers frequent
use of words such as "maybe", "probably" and "possibly" to describe
certain technical decisions. And then there is the length of the
document itself: twenty pages. When you strip away the extraneous
details, such as the front cover, preamble and the neo-Hungarian
coding conventions, you are left with a shorter document that
outlines some of the core requirements, proposes a user interaction
model and sketches a few features of the architecture. The document
is also not heavy on text and is fairly generous with its use of
spacing. Whichever way you look at it, this is not big design.
Which all comes as a welcome relief, but does rather undermine the
claim of its author.

Advocates of genuine BUFD would regard the Aardvark spec as
incomplete and insubstantial, lacking detailed specifications of
code structure or the look and feel of the application. They would
tar it with the same brush that the article uses to daub XP. I
believe that the contrast the article is trying to make is to
compare no up-front design with some up-front design, not with big
up-front design. Joel Spolsky is actually advocating a design
approach based on sufficiency, exploration and incrementalism. So
although he may not be on the same page as XP advocates, he is many
pages short of being a fully paid-up BUFD practitioner.

Sufficient design in TDD manifests itself in test-bounded design
increments, where tests describe the scope of what is being worked
on at any point in time. This moderates creeping featurism, cuts
extraneous code and encourages incremental and measured progress.
Active testing supports the goal of sufficient design by keeping
the role of functions, classes and packages clearly defined. Tests
bound the functional behaviour of these units, keeping them
'honest' with respect to their current role in the enclosing
system.

Driving the design from the baseline architecture through tests
leads to more cleanly separated units with a close dependency
horizon. A dependency occurs where one unit, such as a class or
header, depends on another unit for its definition, for example
through inheritance or inclusion; the dependency horizon for a
given unit is where its dependencies end, i.e. where its
immediate dependencies, their immediate dependencies in turn, and
so on, have no further dependencies. Of course, there needs to be
some coupling at some level: by definition, no coupling results
in no system.
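
As an illustration (hypothetical, not from the article), a unit
such as the following date type has a close dependency horizon:
beyond a couple of standard headers there is nothing further to
follow, so a test of it pulls in nothing else.

    #include <assert.h>
    #include <stdbool.h>

    /* A self-contained unit: its only dependencies are standard
       headers, so its dependency horizon ends one step away. */
    typedef struct date { int year, month, day; } date;

    bool date_is_valid(date candidate)
    {
        static const int days_in_month[] =
            { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
        if (candidate.month < 1 || candidate.month > 12)
            return false;
        bool leap = candidate.year % 4 == 0 &&
            (candidate.year % 100 != 0 || candidate.year % 400 == 0);
        int days = days_in_month[candidate.month - 1]
            + (candidate.month == 2 && leap ? 1 : 0);
        return candidate.day >= 1 && candidate.day <= days;
    }

    int main(void)
    {
        date leap_day = { 2004, 2, 29 };
        date bad_day = { 2005, 2, 29 };
        assert(date_is_valid(leap_day));
        assert(!date_is_valid(bad_day));
        return 0;
    }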

Refactoring

A dirty kitchen is a disgrace to all concerned. Good cookery
cannot exist without absolute cleanliness. It takes no longer to
keep a kitchen clean and orderly than untidy and dirty, for the
time that is spent in keeping it in good order is saved when
culinary operations are going on and everything is clean and in its
place. Personal cleanliness is most necessary, particularly with
regard to the hands.

This is the very motivation and essence of refactoring.
Refactoring preserves the functional behaviour of a piece of code
while changing - and, one hopes, improving - its developmental
qualities. Refactoring is a stable and local change, typically
motivated by a required change in functionality. Operational
behaviour, such as performance or memory usage, may change, but
improvement of operational qualities rather than developmental
qualities is the focus of the similar but distinct activity of
optimisation.

Changes to functionality may follow the line of the existing
code easily, requiring no more than a consistent extension or
in-place modification of the code. At other times a change in
functionality may also suggest a change in implementation of an
interface. An existing implementation may be OK in other respects,
but may support the functionality change poorly, requiring undue
effort to implement it. For example, the need to perform general
date arithmetic on an existing date representation that favours
presentation over calculation, such as YYYY-MM-DD, suggests that a change in representation
may be appropriate before extending the functionality [Henney5]. Alternatively, the quality of an existing
piece of code may generally be poor, caught in a tangle of
spaghetti flow or spaghetti inheritance. For example, a self-aware
class hierarchy, where the root of the class hierarchy depends on
other classes in the hierarchy, can be a troublesome knot in the
dependency graph of a program, rather than an exemplary pattern to
be followed elsewhere.
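
As a sketch of that date example (the types and names here are
hypothetical, not from the article): the text-based
representation makes presentation free but arithmetic laboured,
while a serial day count inverts that trade-off. The change of
representation can be made behind the same interface, with tests
pinning the behaviour down while it happens.

    #include <stdio.h>

    /* Before: the representation favours presentation - the date
       is held as "YYYY-MM-DD" text, so display is free but any
       date arithmetic must first parse the text. */
    struct date_as_text { char iso[11]; };

    /* After: the representation favours calculation - a serial
       day count makes general date arithmetic trivial, with
       formatting derived on demand instead. */
    struct date_as_count { long days_since_epoch; };

    long days_between(struct date_as_count from, struct date_as_count to)
    {
        return to.days_since_epoch - from.days_since_epoch;
    }

    int main(void)
    {
        struct date_as_count start = { 0 }, end = { 90 };
        printf("%ld days apart\n", days_between(start, end));
        return 0;
    }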

Refactoring acknowledges that we can lay down code in confidence
but still learn better ways of achieving the same end. Indeed, it
is more than this: the learning is not simply passive; it is put
into practice and draws from practice. Of course, there is a risk
that making such a change is not necessarily an improvement: any
modification runs the risk of introducing a bug. Therefore,
practise with a safety net: refactoring should be undertaken with a
clear head, with another pair of eyes, with tools, with tests, or
with any suitable combination of these. In the context of a
test-driven approach, test cases offer a regression test suite that
acts as a baseline for both refactoring and optimisation.

Given that the inevitability of change is one of the few
constants in software development, this active acknowledgement and
positive support of change through tests is reassuring. Refactoring
is the other side of the design coin from what we might consider to
be prefactoring. Refactoring
adjusts the design vision and detail after the fact to balance the
formulation beforehand.

Test Match Report

Test-Driven Development is a bar-raising, learning process.
Removing the tests leaves the safety net at ground level and
knowledge localised, isolated and transitory. A TDD approach offers
more than just a pile of tests: it offers specification as well as
confirmation. Both of these reasons are sufficient to justify
writing tests that sometimes apparently test the trivial. And
specifying even the trivial to be sure that it always works means
that regression testing comes for free as part of the deal.

Another consequence of TDD is the resolution of an imbalance in
the traditional view of testing. Testing is often characterised as
a destructive activity, and one that is predominantly quantitative
in its feedback. TDD makes testing a constructive activity, with
qualitative feedback on design, not just defect reports.

TDD is not a total process: you need other complementary drivers
to move development forward. For example, an incremental macro
process where each increment is scoped with respect to functional
or technical objectives provides a good backdrop to the code-facing
emphasis of TDD. Likewise, practices such as reviewing, joint
design meetings and continuous integration support and are
supported by TDD. It is also important to distinguish TDD from XP:
although historically it emerged from XP, TDD is neither a synonym
nor a metonym for XP. Implementing XP necessitates employing TDD,
but the converse is not true. TDD fits with many different
macro-process models. There are many more programmers practising
TDD in other processes than are using it in a strict XP
environment.

References

[Henney1] Kevlin Henney, "Driven to Tests", Application Development Advisor, May 2005, available from http://www.curbralan.com.