Consider a complex, rich web app. Under the hood, it is probably a maze of servers, each performing a different task and most talking to each other. Any user action navigates this server maze on its round trip from the user to the datastores and back. Many of Google’s web apps are like this, including Gmail and Google+. So how do we write end-to-end tests for them?

The “End-To-End” Test

An end-to-end test in the Google testing world is a test that exercises the entire server stack from a user request to response. Here is a simplified view of the System Under Test (SUT) that an end-to-end test would assert. Note that the frontend server in the SUT connects to a third backend which this particular user request does not need.

One of the challenges to writing a fast and reliable end-to-end test for such a system is avoiding network access. Tests involving network access are slower than their counterparts that only access local resources, and accessing external servers might lead to flakiness due to lack of determinism or unavailability of the external servers.

Hermetic Servers

One of the tricks we use at Google to design end-to-end tests is Hermetic Servers.

What is a Hermetic Server? The short definition would be a “server in a box”. If you can start up the entire server on a single machine that has no network connection AND the server works as expected, you have a hermetic server! This is a special case of the more general “hermetic” concept which applies to an isolated system not necessarily on a single machine.

Why is it useful to have a hermetic server? Because if your entire SUT is composed of hermetic servers, it could all be started on a single machine for testing; no network connection necessary! The single machine could be a physical or virtual machine.

Designing Hermetic Servers

The process for building a hermetic server starts early in the design phase of any new server. Some things we watch out for:

All connections to other servers are injected into the server at runtime using a suitable form of dependency injection such as commandline flags or Guice.

All required static files are bundled in the server binary.

If the server talks to a datastore, make sure the datastore can be faked with data files or in-memory implementations.
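To make the first and third points concrete, here is a small Python sketch of runtime dependency injection via command-line flags. The post mentions commandline flags or Guice (a Java framework); this is an illustrative stand-in, and all class and flag names are hypothetical.

```python
import argparse

class InMemoryDatastore:
    """Fake datastore backed by a dict -- safe for hermetic tests."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class RemoteDatastore:
    """Production datastore client that would open a network connection."""
    def __init__(self, address):
        self.address = address
    def get(self, key):
        raise NotImplementedError("network access -- never used in hermetic tests")

def make_datastore(argv):
    """Choose the datastore implementation at runtime via flags,
    so tests can swap in the in-memory fake without code changes."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--datastore", choices=["remote", "in_memory"],
                        default="remote")
    parser.add_argument("--datastore_address", default="datastore.prod:4321")
    args = parser.parse_args(argv)
    if args.datastore == "in_memory":
        return InMemoryDatastore()
    return RemoteDatastore(args.datastore_address)

# A hermetic test run passes flags instead of editing server code:
store = make_datastore(["--datastore", "in_memory"])
store.put("user:1", "alice")
print(store.get("user:1"))  # -> alice
```

The key design choice is that the server never hardcodes its connections; the test harness decides, per run, whether a dependency is real or fake.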

Meeting the above requirements ensures we have a highly configurable server that has potential to become a hermetic server. But it is not yet ready to be used in tests. We do a few more things to complete the package:

Make sure those connection points which our test won’t exercise have appropriate fakes or mocks to verify this non-interaction.

Provide modules to easily populate datastores with test data.

Provide logging modules that can help trace the request/response path as it passes through the SUT.
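The three helpers above might look roughly like the following Python sketch. All names are hypothetical; the point is a recording fake for verifying non-interaction, a data-population helper, and a tracing wrapper for the request/response path.

```python
import logging

class FakeBackend:
    """Stands in for a connection point the test will not exercise.
    It records every call so the test can verify non-interaction."""
    def __init__(self):
        self.calls = []
    def handle(self, request):
        self.calls.append(request)
        return {"status": "fake-response"}
    def assert_not_interacted(self):
        assert not self.calls, "unexpected backend calls: %r" % self.calls

def populate_test_data(datastore, rows):
    """Loads canned rows into a dict-backed fake datastore before the test."""
    for key, value in rows:
        datastore[key] = value

def traced(server_name, handler):
    """Wraps a request handler so the request/response path through
    the SUT can be reconstructed from the logs when a test fails."""
    def wrapper(request):
        logging.info("%s <- %r", server_name, request)
        response = handler(request)
        logging.info("%s -> %r", server_name, response)
        return response
    return wrapper
```

A test would call `fake.assert_not_interacted()` in its teardown to prove that an unneeded backend saw no traffic.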

Using Hermetic Servers in tests

Let’s take the SUT shown earlier and assume all the servers in it are hermetic servers. Here is how an end-to-end test for the same user request would look:

The end-to-end test does the following steps:

starts the entire SUT as shown in the diagram on a single machine

makes requests to the server via the test client

validates responses from the server
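The three steps above can be sketched as a single in-process test. In a real SUT the servers would be separate hermetic binaries launched on the same machine; here they are plain objects, and every name is illustrative.

```python
class FakeDatastore:
    def __init__(self, data):
        self._data = dict(data)
    def get(self, key):
        return self._data.get(key)

class BackendServer:
    def __init__(self, datastore):
        self._datastore = datastore
    def lookup(self, key):
        return self._datastore.get(key)

class MockBackend:
    """The third backend, which this user request must never reach."""
    def __init__(self):
        self.calls = 0
    def lookup(self, key):
        self.calls += 1
        return None

class FrontendServer:
    def __init__(self, backend, other_backend):
        self._backend = backend
        self._other_backend = other_backend
    def handle(self, request):
        return {"user": self._backend.lookup(request["user_id"])}

def test_user_request():
    # 1. Start the entire SUT on a single machine (here: one process).
    datastore = FakeDatastore({"u1": "alice"})
    frontend = FrontendServer(BackendServer(datastore), mock := MockBackend())
    # 2. Make a request to the server via the test client.
    response = frontend.handle({"user_id": "u1"})
    # 3. Validate the response from the server.
    assert response == {"user": "alice"}
    # The unneeded backend saw no traffic, so a mock suffices there.
    assert mock.calls == 0

test_user_request()
```

Because everything runs locally, the test needs no network and can seed exactly the data it asserts on.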

One thing to note here: the mock server connection for the third backend is not needed in this test. If we wish to test a request that does need this backend, we would have to provide a hermetic server at that connection point as well.

This end-to-end test is more reliable because it uses no network connection. It is faster because everything it needs is available in memory or on the local hard disk. We run such tests on our continuous builds, so they run at each changelist affecting any of the servers in the SUT. If the test fails, the logging module helps track where the failure occurred in the SUT.

We use hermetic servers in a lot of end-to-end tests. Some common cases include:

Startup tests for servers using Guice to verify that there are no Guice errors on startup.

API tests for backend servers.

Micro-benchmark performance tests.

UI and API tests for frontend servers.

Conclusion

Hermetic servers do have some limitations. They will increase your test’s runtime since you have to start the entire SUT each time you run the end-to-end test. If your test runs with limited resources such as memory and CPU, hermetic servers might push your test over those limits as the server interactions grow in complexity. The dataset size you can use in the in-memory datastores will be much smaller than production datastores.

Hermetic servers are a great testing tool. Like all other tools, they need to be used thoughtfully where appropriate.

10 comments

Very interesting post. I will be looking into implementing something similar in our system. I have one question: if the servers are receiving requests, I assume they are running inside a service container. Starting up the likes of Tomcat, WebSphere, JBoss, etc. is very slow. How do you manage to include these tests in a continuous build environment? They must be slow.

Yes, the tests are large tests and certainly some of our slower tests. We run them in a "continuous build" in the sense of running them automatically at a regular frequency such as every 15 mins or half hour. Running them at every changelist is definitely expensive. But running at every few changelists means a binary search between the last passed run and the failed run will help us track down the problem changelist fairly fast.

You are right that hermetic servers can be used in performance tests! We use them for micro-benchmarking tests and have been able to catch performance regressions in servers very early with such tests.

In addition to the points we mentioned in the article for packaging the hermetic server, performance tests do need additional hooks in the package. One of them is the ability to inject simulated latencies to the request/response times between servers. We have found that useful for modeling real servers better.
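A latency-injection hook of the kind described here could be sketched as a wrapper around a server connection. This is an assumption about how such a hook might look, not the actual implementation; all names are made up.

```python
import time

class LatencyInjectingStub:
    """Wraps a server connection and delays each call by a configured
    amount, so benchmarks model realistic inter-server latency."""
    def __init__(self, delegate, delay_seconds):
        self._delegate = delegate
        self._delay = delay_seconds
    def call(self, request):
        time.sleep(self._delay)  # simulated network latency
        return self._delegate.call(request)

class EchoServer:
    """Trivial stand-in for a hermetic backend."""
    def call(self, request):
        return request

# Inject 5 ms of simulated latency in front of the backend.
stub = LatencyInjectingStub(EchoServer(), delay_seconds=0.005)
start = time.monotonic()
result = stub.call("ping")
elapsed = time.monotonic() - start
print(result, elapsed >= 0.005)
```

Since the delay is injected at the connection point, the same hermetic benchmark can be re-run with different latency profiles without touching server code.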

I was talking about a simpler system wherein the code that gets checked in is automatically benchmarked and compared against its previous runs as well. Or, at a component/system level, perhaps the log times for the various tasks get compared with one another.

I don't believe any hooks into the code are required at all, although logging may require some debugging capability within the system.

Just by comparing the times as the product is being built out we can see how the feature/component/system performs as the code base becomes larger.