Automa Blog

The Test Hourglass

20 Aug 2013

An interesting question in test automation is what proportion of your tests should be unit tests, integration tests and GUI (/system) tests. A common answer to this question (especially in Agile circles) is given by Mike Cohn's test pyramid:

The test pyramid argues that automated tests that run against the user interface of an application are slow, brittle and difficult to maintain, and that you should therefore have many more low-level unit tests than high-level GUI tests. The test pyramid also argues for a medium amount of "service tests", which are similar to UI tests, but use programmatic interfaces rather than the GUI to drive the application.

The test pyramid is right in many ways. It is true that tests which operate the GUI are slower and more brittle than pure unit tests. It also makes a lot of sense to use a solid amount of service tests for verifying that the components that were tested in isolation by the unit tests then work together as expected.

One aspect the test pyramid does not describe is how the ratios of the different types of automated tests change as a project progresses. Tools such as Automa are increasingly used to capture GUI tests before the implementation. This style of development has many advantages, including better fleshed-out requirements due to the precise nature of executable tests, and better test coverage. When you follow such an approach, you initially have more GUI tests than service or unit tests. At least for a short time, you thus do not adhere to the test pyramid.

The theory behind the test pyramid also does not mention several of the advantages of GUI tests. First and foremost, GUI (/system) tests are the only type of test that can really check that an application meets the customer's requirements. Saying that the unit and service tests pass will not satisfy a customer when there is a bug that can be reproduced through the GUI. Second, system tests don't have to change as often as service or unit tests: whenever the code changes, it is likely that a unit test will have to be updated to reflect the new implementation. The system tests, on the other hand, only have to be updated when the code change actually leads to a change in the user-visible behaviour of the application. A refactoring, for instance, typically leaves the user-visible behaviour untouched and hence requires no changes to the system tests.

So what does our test portfolio look like? First, it must be pointed out that our product, Automa, comes as a Python library or console application and hence does not have a GUI of its own. On the other hand, Automa is a tool for GUI automation, so testing its functionality entails operating a GUI, even if it is not Automa's own. A typical system test for Automa uses its GUI automation commands to start the text editor Notepad, write some text, save the file and verify that the file has the correct contents. The commands are sent through the command line, but result in actions in the GUI. In this way, automated system tests for Automa are not GUI tests in the strict sense, but share many of the characteristics of a typical GUI test.
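To make the Notepad example concrete, here is a minimal sketch of the verification step of such a system test. The GUI-driving part (starting Notepad, typing, saving) is only simulated by writing the file directly, so the sketch stays self-contained and runnable; the function name `verify_saved_file` is illustrative, not part of Automa's API.

```python
import os
import tempfile

def verify_saved_file(path, expected_text):
    """Verification step of the system test: check that the file
    saved through the GUI has the expected contents."""
    with open(path, encoding="utf-8") as f:
        return f.read() == expected_text

# In the real system test, Automa's GUI automation commands would
# start Notepad, type the text and save the file. Here we simulate
# that side effect so the sketch can run anywhere.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Hello from the system test!")

print(verify_saved_file(path, "Hello from the system test!"))  # True
```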

As explained in a previous post, our development proceeds in an Acceptance Test-Driven style: When starting work on a new feature, we first define a set of examples that describe the required behaviour. These examples are then turned into automated acceptance tests. Only when this is finished do we begin with the implementation.
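The examples-first workflow can be sketched in a few lines. The feature here (a command that normalises key names), the function name and the example values are all hypothetical, invented purely for illustration; what matters is the order of events: the concrete examples are agreed on and automated first, and the implementation is written afterwards to make them pass.

```python
# Agreed-on examples for an imagined feature, written down before any
# implementation exists (the ATDD order of events). All names and
# values here are illustrative, not part of Automa's actual API.
EXAMPLES = [
    ("ENTER", "enter"),
    ("Ctrl", "ctrl"),
    ("  F5 ", "f5"),
]

def normalise_key(name):
    # The implementation, written only after the examples above were
    # turned into automated acceptance checks.
    return name.strip().lower()

# The acceptance checks: every agreed-on example must hold.
for raw, expected in EXAMPLES:
    assert normalise_key(raw) == expected, (raw, expected)
print("all examples pass")
```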

During the implementation, we write unit, integration and service tests to support our development. We do not adhere to a dogma of having such-and-such a percentage of test coverage, but rather use common sense to determine when it makes sense to add a test. This keeps our test portfolio lean while still covering the most important functionality.

Since our product does not have a GUI, we cannot readily apply the concepts of the test pyramid to our process. For us, it makes more sense to speak of system, integration and unit tests as comprising our test portfolio. System tests are those that operate Automa's executable binary through its command line interface. This is essentially done by piping the input and output of Automa.exe using Windows' command redirection operators < and >. The next level of our test hierarchy consists of integration tests, which use Automa as a Python library. They test that Automa's Python API works, but also, on a more technical level, that Automa's internal components cooperate as expected. Finally, we have a significant number of unit tests that check the correctness of Automa's individual functions and classes.
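Driving a console application through its standard input and output, as our system tests do, can be sketched with Python's subprocess module. In this sketch the Python interpreter itself stands in for Automa.exe, and the command names are illustrative only; the mechanism is the same as `Automa.exe < commands.txt > output.txt`.

```python
import subprocess
import sys

# Commands to feed into the application under test, one per line
# (illustrative names, not Automa's actual command syntax).
commands = "start notepad\nwrite Hello\npress ctrl+s\n"

# A stand-in "console app" that acknowledges each command it reads
# from stdin. In the real test suite, this would be Automa.exe.
stand_in = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('OK:', line.strip())",
]

# Equivalent of redirecting stdin with < and capturing stdout with >:
result = subprocess.run(stand_in, input=commands,
                        capture_output=True, text=True, check=True)
print(result.stdout, end="")
```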

Using the classification from the previous paragraph, Automa's test portfolio currently consists of 38% system tests, 16% integration tests and 45% unit tests. In a picture, this would look roughly like an hourglass:

Each of the compartments of the hourglass has an area that corresponds to the relative percentage of the respective type of test in our portfolio. We call this the Test Hourglass.

The test hourglass clearly has a very different shape from the test pyramid. Does this mean we're doing something wrong? Not necessarily. First, as already mentioned, the test pyramid does not readily apply to our case, since our system tests do not check a GUI in the conventional sense. Second, we are very satisfied with our test portfolio: it covers all the cases that are important to us, yet requires little maintenance effort when existing functionality changes. It is also not too slow, because we frequently optimize both our product and the test suite for performance. All in all, our test portfolio is one of our greatest assets.

The hourglass shape of our test portfolio might be the result of several factors. First, the acceptance test-driven development style naturally leads to a large number of system tests. Second, while our system tests are effectively automating a GUI when exercising Automa's API, the interface to our application is not graphical. The advantages of a service test over a GUI test in the test pyramid come from avoiding brittle and slow GUI operations. If the interface to the system under test is not graphical, as in our case or, for instance, for a web service, then not much is gained by choosing a service test over a system test. We suspect that, for systems with a non-graphical interface, the test hourglass could be a good analogue of what the test pyramid is for graphical applications.

The test pyramid provides a useful guideline for applications with a graphical interface. However, as with all guidelines, it needs to be evaluated in the context in which it is to be applied. Novel GUI automation tools might make it feasible to have a higher ratio of GUI tests than in the past. For applications without a graphical interface, the test hourglass might present a more applicable guideline than the test pyramid. In all cases, it is important to choose the approach that is right for your particular situation, rather than blindly following an established dogma.

This article is hosted on the Automa Blog. If you would like to know more about Automa, we welcome you to visit our home page.