The test pyramid

The test pyramid is a concept that was developed by Mike Cohn. It states that you should have an appropriate amount of each type of test. In the pyramid he distinguishes different types of tests:

Exploratory tests: Performed manually by a tester

System tests: Executed by a program or script that automates the UI (also known as Acceptance Tests or UI tests)

Integration tests: Executed against a layer just beneath the UI (sometimes referred to as subcutaneous tests)

Component tests: Executed against a single component of the application

Unit tests: Test a single “unit” in the software (sometimes a class, sometimes a function)

In this post I’ll be talking about automated testing, so the exploratory tests are not part of this discussion. Because of their similar nature, I will also be grouping the integration tests and component tests in one category. The theory for the testing pyramid says that you should have a good coverage (almost complete) for your unit tests, a decent coverage for the integration tests and a small coverage for system tests.

This is widely accepted as the way to implement automated testing. While the general idea is sound, I think it shouldn’t be applied blindly. When you ask yourself the question, “how should we organize our testing efforts?”, the answer, as so often in software, is “it depends”. For any given software product, there are various factors in play that could skew the pyramid.

Before we look at what influences our pyramid, we have to look at the characteristics of these tests. As you go towards the top of the pyramid, your tests will instill more confidence, as they are end-to-end tests. Conversely, the further down you go, the less confidence the tests will bring you, since they only test a small part in isolation. On the other hand, tests at the top of the pyramid are slower, more brittle, and harder to write and maintain. A unit test tends to be more deterministic, faster and easier to write.

In an ideal world, we would like our tests to be end-to-end, fast, deterministic and easy to write and maintain. Unfortunately, that’s not possible (yet). However, instead of looking at the pyramid and focusing our test efforts on following this theory, we should take this ideal goal and strive towards a test suite that matches that description.

Tooling

When this theory was presented, we didn’t have all the tools we currently have (or they were not as easily accessible as they are now). A few of the characteristics have been skewed as a result of this. To give a few examples:

Cloud infrastructure (VMs, containers) provided us with cheap “hardware” on demand. This could solve part of the problem of slow UI tests. Instead of trying to write fast UI tests, we can just throw hardware at it. That doesn’t mean you should write slow tests, but sometimes it’s more cost-effective to add hardware than manpower.

Services like BrowserStack and SauceLabs became available, allowing you to spin up tests on a variety of platforms, without a big investment.

Testing frameworks have improved. BDD has become quite popular and as a result a lot of the frameworks have been adding features.

Application frameworks have been adapted to be more loosely coupled and more flexible in the way we run them. As an example, any ASP.NET application can now be self-hosted, through the use of OWIN. If you combine that with an in-memory database, it can speed up integration tests.

Knowing that the tools have improved, we can already see that our pyramid should perhaps be a bit taller, with a little less focus on the bottom part and some more focus on the top (confidence!). This is still on a global level though. Depending on the application, you can still adapt the pyramid to its needs.

Application

The nature of the application plays a big role in where you should put your emphasis in testing. Each layer of the pyramid is influenced by the application type.

Unit testing

Steve Sanderson wrote a great blog post about the costs and benefits of writing unit tests. I advise you to read it, as he makes great points in there. The summary of his post is that depending on what code you have, the costs and benefits are different.

The reasoning is as follows:

Algorithmic code with few dependencies is easy to test because you have a fixed set of outcomes for a fixed set of inputs. It’s also very beneficial, because such code tends to be non-obvious and mistakes are easily made.

Trivial code is easy to test, but there’s little to no benefit in doing so since the chances of catching a mistake are slim.

Coordinators (code that “glues” together a bunch of dependencies) are difficult to test, because you’d need a lot of mocks, stubs and fakes, and when you change the implementation you usually have to change the test. There’s also very little value in testing them, because this code usually doesn’t really do anything itself, it just delegates.

Then there’s non-trivial code that has a lot of dependencies. This code is difficult to test, because there are a lot of dependencies, but it would be good to add some tests as the code can be non-obvious.

Taking this into account, when your application has a lot of algorithmic code, you should probably opt for a thicker layer of unit tests. In practice though, most of the applications I see have a majority of coordinators and trivial code (and sadly also overcomplicated code). So, unless you have a specific application that is very algorithmic in nature, I’d tend to write fewer unit tests. The objective should not be getting test coverage; the objective is getting confidence when making changes to the software.
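To make the difference in cost and benefit concrete, here is a minimal sketch in Python. The function names (shipping_cost, place_order) and the pricing rule are made up for illustration and do not come from Steve Sanderson’s post. The first test checks a small algorithm with plain inputs and outputs; the second needs mocks and mostly restates the implementation.

```python
from unittest.mock import Mock


def shipping_cost(weight_kg, distance_km):
    """Algorithmic code: a fixed output for a fixed input, no dependencies."""
    base = 5.0
    return round(base + 0.5 * weight_kg + 0.01 * distance_km, 2)


def place_order(order, repository, mailer):
    """Coordinator: has no logic of its own, it only delegates."""
    repository.save(order)
    mailer.send_confirmation(order)


def test_shipping_cost():
    # Cheap to write, and it would catch a real mistake in the formula.
    assert shipping_cost(2, 100) == 7.0
    assert shipping_cost(0, 0) == 5.0


def test_place_order_delegates():
    # Needs a mock per dependency, and the assertions mirror the
    # implementation line by line, so any restructuring breaks the test.
    repository, mailer = Mock(), Mock()
    place_order("order-1", repository, mailer)
    repository.save.assert_called_once_with("order-1")
    mailer.send_confirmation.assert_called_once_with("order-1")
```

Both tests pass, but only the first one tells you something you could not read directly from the code under test.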

Integration testing

An integration test takes a few layers (or components or parts, whatever you want to call them) together and tests them as a whole. Depending on how your application is structured, this can be easy or difficult. If you have an application that is very tightly coupled, it will probably be difficult to separate it from its UI. On the other hand, if you have an application that is easily configurable and flexible, it might be easier to isolate a part of it.

An example of this could be a Web API back-end with a JavaScript front-end, hosted on OWIN with a database underneath. In order to test just the server part, you could self-host the Web API in your tests, use an in-memory SQLite database and run all your tests in-memory for a deterministic suite of tests that is fast to write and run. In that case, put some more weight on the integration layer. If you have a legacy application that doesn’t lend itself to isolation, write fewer integration tests (or refactor it first).
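The same idea can be sketched outside of .NET. The example below is a Python analogy, not the Web API/OWIN setup itself: a tiny WSGI app stands in for the API, it is called in-process without opening a socket, and it talks to an in-memory SQLite database. The make_app and call helpers and the items table are hypothetical names used only for this illustration.

```python
import json
import sqlite3
from wsgiref.util import setup_testing_defaults


def make_app(db):
    """Build a tiny WSGI app (a made-up inventory API) around a DB connection."""
    def app(environ, start_response):
        if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/items":
            rows = db.execute(
                "SELECT name, quantity FROM items ORDER BY name"
            ).fetchall()
            body = json.dumps([{"name": n, "quantity": q} for n, q in rows])
            start_response("200 OK", [("Content-Type", "application/json")])
            return [body.encode("utf-8")]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return app


def call(app, path, method="GET"):
    """Invoke the app in-process, without a web server or network socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


def test_lists_items_from_database():
    # A fresh in-memory database per test keeps the suite deterministic.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (name TEXT, quantity INTEGER)")
    db.executemany("INSERT INTO items VALUES (?, ?)", [("pear", 3), ("apple", 5)])

    status, body = call(make_app(db), "/items")

    assert status == "200 OK"
    assert json.loads(body) == [
        {"name": "apple", "quantity": 5},
        {"name": "pear", "quantity": 3},
    ]
```

The test exercises routing, the query and serialization together, yet it needs no browser, no network and no database server.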

UI testing

The UI of your application is what end users will see, so it’s very important to test. The three things holding us back from writing more UI tests are speed, brittleness and ease of writing. Let’s look at them one-by-one.

If you have a rather small application, even though your tests are relatively slow, the suite as a whole will not take very long and you can probably afford to cover most of it with UI tests. For a very large application, this will be almost impossible, unless you start parallelizing your tests (but that works against the third factor, ease of writing).

Brittleness of tests comes from an application not being deterministic under different circumstances. For example, a web app might behave differently when there’s a slow connection, a desktop app could behave differently when certain programs are installed or not.

Depending on how complex your UI is, writing UI tests can be more or less difficult. A UI where you have to go through ten different steps to execute a use case is harder to test than one where you have to do two steps. Other than that, the type of application often dictates which tools you can use. There are plenty of test tools for web applications, but very few can script interactions for a mobile application.

The amount of UI testing you want to do depends largely on the size of the application, how deterministic it is, how easy the use cases are and what tools are available.

The test-refactor cycle

The factors written above only work when you apply them in combination. You could easily find yourself in a situation where you have non-algorithmic code, that doesn’t lend itself to isolation, is big and has difficult use cases. You can’t just slim down all three layers of the pyramid. If you get into a situation like this, the application simply does not lend itself easily to automated testing. When that is the case, you could rely primarily on non-automated testing. This is a bad idea (why it is a bad idea deserves a whole different blog post) and should always be a temporary solution.

It’s better to refactor your code to make it more testable. That’s a catch-22: you can’t refactor because you need automated tests, and you can’t write automated tests because you need to refactor first. The best approach here is to get into a test-refactor cycle. The idea is that you make the smallest possible refactoring that allows you to write an automated test. When that test passes, you can refactor more (because now you have a test). If you keep doing this and focus on the attributes mentioned in the previous section, you’ll soon notice that you can get to a higher level of confidence.

Often it’s impossible to start these initial cycles on a unit test or integration level without doing a lot of refactoring upfront. The easiest way to do it is through UI testing. When you have no tests, the first thing you need is confidence (the most important part of testing). UI testing will give you the most confidence and requires little to no refactoring upfront. Once you have that in place, you can start moving down the pyramid and isolate parts of the application for integration testing. When you have built up some confidence through these tests, it’s time to refactor the lower-level code into pure algorithms, coordinators and trivial code. The algorithms can then be unit tested.
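One turn of this cycle could look like the sketch below. It is a deliberately small, hypothetical Python example in which captured console output stands in for the UI: first an outside-in test pins down what the legacy code does today, then the calculation is extracted, and only then does it get its own unit test. All function names are invented for the illustration.

```python
import io
from contextlib import redirect_stdout


# Step 0: legacy code that mixes calculation and output.
def print_invoice_legacy(prices, vat_rate):
    total = 0
    for price in prices:
        total += price
    total = total + total * vat_rate
    print(f"Total due: {total:.2f}")


# Step 1: an outside-in test that needs no refactoring at all.
def run_and_capture(fn, *args):
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        fn(*args)
    return buffer.getvalue()


def test_legacy_output():
    assert run_and_capture(print_invoice_legacy, [10, 20], 0.2) == "Total due: 36.00\n"


# Step 2: with that safety net, extract the algorithm from the output code.
def invoice_total(prices, vat_rate):
    return sum(prices) * (1 + vat_rate)


def print_invoice(prices, vat_rate):
    print(f"Total due: {invoice_total(prices, vat_rate):.2f}")


def test_refactored_output_is_unchanged():
    assert run_and_capture(print_invoice, [10, 20], 0.2) == "Total due: 36.00\n"


# Step 3: the extracted algorithm can now be unit tested directly.
def test_invoice_total():
    assert abs(invoice_total([10, 20], 0.2) - 36.0) < 1e-9
    assert invoice_total([], 0.2) == 0
```

The outside-in test stays in place while you refactor; it is what gives you the confidence to move logic around.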

Summary

The testing pyramid is a good starting point to structure your test efforts. We should skew the pyramid depending on the application, though, and take various factors into account. To summarize, here are a few guidelines:

Algorithmic code should be tested with unit tests

Do not unit test code that is trivial or just delegates tasks

Make your application configurable so you can isolate parts of it and test it with integration tests

For legacy applications, start with UI tests, refactor and then add integration tests. Another round of refactoring should expose the algorithms, which you can then unit test.

To illustrate my point, here are a few examples of applications and what their test distribution should be (the numbers are totally made up, it’s just to give an idea on where to put the emphasis):

A modern web app written in JavaScript, backed by a REST API, using Web API hosted on OWIN. The application uses Entity Framework as an ORM and serves to update an inventory of shopping items.

Unit tests: 10%. There’s very little algorithmic code and you would be stubbing out the EF layer, which gives a false sense of security.

Integration tests: 80%. You can self-host the API and attach an in-memory database. This will give you fast, reliable and fairly easy-to-write tests.

UI tests: 10%. Since our integration tests cover a fair part of our stack, the UI tests should just cover the main use cases.

A legacy web app, built with ASP.NET Web Forms that is used for listing and browsing properties (real estate).

Unit tests: 5%. Web forms are notoriously hard to unit test because of their inherent dependencies. Extract the algorithms and put those under test. Leave the rest for higher layers.

Integration testing: 5%. If there’s no possibility to bypass the UI, it will be very hard to write integration tests. Write them only for the parts where the UI can be bypassed.

UI testing: 90%. The application does not lend itself to other types of tests, because it wasn’t built with testability in mind. Your first job is to gain confidence in your refactors and changes to the code base. A UI test does not need refactoring and gives you a high level of confidence. Once you have that confidence, you can start refactoring the code and slowly add more weight to the integration and unit tests.

A REST API for a bank, that allows you to send requests for calculating mortgages, loans and investments

Unit tests: 80%. The code is highly algorithmic (and the algorithms are stable). You can write tests that don’t need a lot of maintenance, because a change to an implementation detail won’t require a change to the test.

Integration tests: 20%. Since you have a REST API, you can easily write some tests that exercise the API and check the result to see whether all units are composed correctly.

UI tests: 0%. There is no UI and the integration tests are end-to-end tests.
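As a rough illustration of what such a stable, algorithmic unit test might look like, here is a sketch in Python. It assumes the bank uses the standard fixed-payment (annuity) formula; the function names and the choice of formula are assumptions for the example, not something taken from a real banking API.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment for an annuity loan (standard amortization formula)."""
    if months <= 0:
        raise ValueError("months must be positive")
    r = annual_rate / 12  # periodic (monthly) interest rate
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def remaining_balance(principal, annual_rate, months, payment):
    """Replay the loan month by month and return what is still owed at the end."""
    balance = principal
    for _ in range(months):
        balance = balance * (1 + annual_rate / 12) - payment
    return balance


def test_zero_interest_splits_principal_evenly():
    assert monthly_payment(1200, 0.0, 12) == 100


def test_payment_pays_off_the_loan():
    # The test states the business rule (the loan ends at zero), not how
    # the payment is computed, so the implementation can change freely.
    payment = monthly_payment(100_000, 0.06, 360)
    assert abs(remaining_balance(100_000, 0.06, 360, payment)) < 1e-6
```

Neither test mentions an implementation detail, which is why tests like these tend to need little maintenance.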

These examples are quite arbitrary, but if you look at the application you’re working on, you’ll surely see some similarities and I hope by applying these techniques you can gain more confidence in your code.