Literature abounds on testing practices and tools. However, there is not much out there, outside academia, about the concepts underpinning testing and software behavior. Yet a solid conceptual foundation is essential in any discipline to reason about it and move it forward. In this blog, I attempt to bridge the gap using my academic background and evolving personal views.

Monday, November 2, 2015

A software test
looks for the effects of a system or component on its
environment when executed under specific conditions. This can be
as simple as checking the outputs for a given input. Or it can
be more complex, such as looking for changes in the component's
surrounding environment (databases, other processes, UI, etc.)
as the component runs. Simply put, a test checks what the
component does.
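
For example, the simplest form of this is a test that feeds the
component an input and asserts on the output. A minimal sketch in
Java with JUnit (Calculator is a hypothetical class used only for
illustration):

    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class CalculatorTest {

        @Test
        public void addsTwoNumbers() {
            // Execute the component under specific conditions (inputs)...
            Calculator calculator = new Calculator();
            int result = calculator.add(2, 3);

            // ...and check its effect -- here, simply the output.
            assertEquals(5, result);
        }
    }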

Over time,
practitioners realized that small components -- units
-- can be tested in isolation to find bugs early and obtain
quick feedback in a controlled fashion. The benefits
are many, including greater automation, more comprehensive
coverage, better design, and easier refactoring. But there are also
common problems that cause a lot of pain: over-
and under-specification.

Test Scope

The challenge of
controlling the unit's entire environment soon became a
dominant issue. Instead of only looking (or, should I say, digging)
for the effects of the unit on external resources when testing,
we were now looking primarily at the effects
of the unit on other pieces of code.

In the early days
of unit testing, it was often good enough to verify the visible
state of the unit based on return values, side effects on parameters,
and queries to its public interface. If applicable, the unit's
effects on global resources (those visible to the unit's
clients, such as the file system) would be checked too.
The following diagram, on the left, illustrates this concept.
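
In code, that early style might look something like this sketch
(ShoppingCart and Item are hypothetical classes): every assertion
relies only on return values and queries to the public interface.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class ShoppingCartTest {

        @Test
        public void addingAnItemUpdatesVisibleState() {
            ShoppingCart cart = new ShoppingCart();

            // Check the return value...
            assertTrue(cart.add(new Item("book", 12.50)));

            // ...then query the public interface for the visible state.
            assertEquals(1, cart.size());
            assertEquals(12.50, cart.total(), 0.001);
        }
    }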

The visible scope
is an incomplete picture, however. A unit is not a program.
Alone, the unit makes little sense. It is the interaction
with other units -- both callers and callees -- that makes
the unit valuable. Those interactions should also be
verified. Although the test assumes the caller's role,
interactions with collaborators (callees) and even
other resources (e.g., RESTful services) are often hidden
from the test's perspective, as the figure illustrates on
the right.

Fortunately, as the
need to check the environment past this visibility barrier
became evident for proper testing, tricks started to emerge
that eventually led to the creation of modern mocking and
spying frameworks such as Mockito
and PowerMock in Java.
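
For instance, Mockito lets a test replace a collaborator behind the
visibility barrier and observe the interaction directly. A hedged
sketch (Warehouse, OrderService, and Order are hypothetical types):

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;

    import org.junit.Test;

    public class OrderServiceTest {

        @Test
        public void placingAnOrderReservesStock() {
            // Stand in for the hidden collaborator...
            Warehouse warehouse = mock(Warehouse.class);
            OrderService service = new OrderService(warehouse);

            service.place(new Order("book", 1));

            // ...so the test can observe the interaction directly.
            verify(warehouse).reserve("book", 1);
        }
    }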

What vs. How

Testing a unit (or
component or system) should cover both the
calls to the unit and the underlying interactions with its
collaborators. The entirety of the effects of the unit in this
environment defines, in short, how the unit behaves.
This idea of how the unit works can also be extended to
its implementation. But is this what needs to be tested?
Are all calls and interactions relevant to testing?

Let's assume that
all effects of the unit are verified in a test scenario and the
test passes. So far, so good. The unit conforms to the
expectations of the test. Later, though, we decide to improve
the unit's implementation, adapt it to changes in other units, or add a
new capability (e.g., sorting some results) without affecting
what the unit already promises to do in that scenario. But now,
some part of the unit's state and interactions makes the test
fail. Why? Because the test is over-specified. It goes too far.
It checks too much.
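
To make this concrete, here is a sketch of an over-specified test
(Database, ReportService, and Report are hypothetical types). The
last two lines pin down internal interactions that a harmless
change would break:

    import java.util.Arrays;

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.*;

    import org.junit.Test;

    public class ReportServiceTest {

        @Test
        public void buildsSalesReport() {
            Database db = mock(Database.class);
            when(db.load("sales")).thenReturn(Arrays.asList("row1", "row2"));

            ReportService service = new ReportService(db);
            Report report = service.build("sales");

            // This assertion checks what the unit promises to do.
            assertEquals(2, report.rowCount());

            // These assertions check how it does it: adding a cache, or
            // one extra query to sort results, breaks them even though
            // the report is still correct.
            verify(db, times(1)).load("sales");
            verifyNoMoreInteractions(db);
        }
    }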

An over-specified
test checks how the unit works rather than just what
it is supposed to do. This poses a maintainability problem that
makes software evolution harder rather than easier -- the
opposite of what good testing is supposed to enable! We could
react by stripping down the test to a minimum or even deleting
it, but then the test could be under-specified. It could fail to
detect deviations from what the unit should really do. Pick your poison.

Specifications

Where do we find
the balance? What is the right boundary? To answer this, we need
a very important concept in software engineering:
specifications.

A specification is
a description of what a system or a part of it does under given
conditions. It can be formal or informal, documented or
implicit, but in the end it boils down to the contract with the
user or environment -- what the system requires and what it
guarantees. How those guarantees are achieved is a
different matter. That is implementation-specific.
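
As an informal sketch, such a contract can live right next to the
code, for example as documentation on an interface (ReportBuilder
and Report are hypothetical):

    public interface ReportBuilder {

        /**
         * Builds a report from the named data set.
         *
         * Requires: the data set exists and is readable by the caller.
         * Guarantees: the returned report contains one row per record.
         *
         * How the rows are fetched -- database, cache, number of
         * queries -- is left unspecified: that is the how, not the what.
         */
        Report build(String dataSet);
    }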

Back to our
over-specified test, we realize after a while that perhaps it
did not matter after all whether and how often the unit reads
from the database or a cache (assuming that performance is not
part of the specification). It only matters that the unit
fulfills its goal -- its contract -- correctly. Therefore, if
the unit replaces some database queries with cache accesses, our
test should not care. It should not fail just because the unit's
internals changed.
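
Rewritten against the contract alone, the over-specified test from
before might look like this sketch (same hypothetical types); the
interaction checks are simply gone:

    import java.util.Arrays;

    import static org.junit.Assert.assertEquals;
    import static org.mockito.Mockito.*;

    import org.junit.Test;

    public class ReportServiceTest {

        @Test
        public void buildsSalesReport() {
            Database db = mock(Database.class);
            when(db.load("sales")).thenReturn(Arrays.asList("row1", "row2"));

            ReportService service = new ReportService(db);

            // Only the contract is asserted. Whether load() runs once or
            // twice, or a cache answers the second call, is the unit's
            // business, not the test's.
            assertEquals(2, service.build("sales").rowCount());
            assertEquals(2, service.build("sales").rowCount());
        }
    }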

So far, so good,
right? Well, almost. What is the unit's goal after all? What
is its specification? If we know it, we all agree on it, and
it's stable, then we are good. We make sure we check only the
behavior covered in the specification.

Often, however, we
don't know exactly what the unit's behavior should be. The
specification is still in early stages. We might disagree with
coworkers. The unit might be old and written by developers who
are no longer around. Requirements can change. All of this puts
stress on the boundary between the what (specification)
and the how (implementation) of a unit.

In Practice

So, specifications
are often unclear. In those cases, our goal is to continue
iterating on the unit and its specification. The primary role of
tests here is to aid that process. If we understand that the
specification needs to be defined eventually --
the sooner and the clearer, the better -- and achieve that goal,
we can finally focus on completing its implementation and its
regression test suite.

All in all, we want to
settle on that specification so we don't have to choose between
over-specified tests that play it safe but make change hard
and under-specified tests that can miss important bugs.