One thing that slows developers down, other than the QWERTY keyboard, is running tests. Test-driven development is widely accepted as the best way to write code. Yes, it takes time to write tests, but it is an investment which leads to more resilient code as development progresses. Once they get into the flow, it becomes a joy for developers to run the tests and feel confident in their changes. They have instant feedback on whether they have broken anything [1].

Unfortunately, as the project ages and grows, the time it takes to run the tests increases from a few seconds to several minutes, or even hours. The tight feedback loop that made developers nimble has gone.

You look at ways to make the tests run faster. You look for any tests that you can remove. You look for any tests that you can speed up. You look at running tests in parallel.

Maybe you make some progress and the tests run twice as fast. Great! Unfortunately, two months later you are back in the same position, only this time you already have all your optimizations in place. Things are just not scaling.

[1] Of course, we all know that passing tests does not actually prove that nothing is broken, but even the smallest set of tests will tell us that not everything is broken.

Increasing Productivity

In this blog post, I am going to look at how to increase developer productivity by running your tests in a distributed manner.

Similar to the wave of “big data” technologies, such as Hadoop and NoSQL, where we realized that scaling horizontally across machines increases the volume and speed at which we can process data, I am going to apply the same logic to running our tests.

Why PaaS?

With the following example, I am going to focus on Ruby Rspec tests. I have run distributed Ruby with Hadoop before, and it would be possible to make that work, but big data solutions are designed primarily for processing data. PaaS is designed for long-running, dedicated, and often complex applications.

We want fine-grained control over our test environment. We want to set up a test environment similar to our development environment, or ideally close to production. PaaS gives us that and allows us to quickly replicate that environment horizontally across a large cluster of machines.

At its core, PaaS is designed for running the code that developers create and scaling it horizontally. Tests are simply more of the same code that needs to be run in a specific environment.

Running tests often requires creating throw-away databases, which is another core feature of most PaaS solutions, like Stackato.

Where We’re At

As an example, let’s look at some Ruby Rspec tests that I am recently familiar with: those from the Cloud Controller, a component of the open-source Cloud Foundry project that Stackato is based on.

For me, it takes about 2 hours to run these Cloud Controller Ruby Rspec tests in the basic way, with no optimizations. This uses SQLite3 and runs all the tests sequentially.

The configured Travis-CI test run uses either MySQL or PostgreSQL, instead of SQLite3, and runs tests in parallel across 3 threads. This optimized test run takes around 10 minutes on Travis-CI.

7695 examples, 0 failures, 2 pendings
Took 589.385694927 seconds

You can see that we get a nice speed improvement from using parallel_rspec to run tests in parallel across 3 threads. Obviously, a real database helps too. Can we do better?

The limitation here is that all the tests are limited to a single machine. We are always going to be limited in how far we can scale vertically. Multiple threads may be able to make use of multiple cores, and each thread also uses its own dedicated database, but we are still constrained to the local resources of the machine they run on.

Scaling Horizontally

If we can run 7695 tests in 10 minutes on one machine, then how many machines would we need to bring that down to 10 seconds? About 60, if we naively assume zero overhead for distributing the tests.
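The back-of-the-envelope arithmetic, assuming perfectly even distribution and zero overhead, looks like this:

```ruby
# Naive scaling estimate: ignores scheduling, clone, and database setup overhead.
examples        = 7695
current_seconds = 600   # ~10 minutes on one machine
target_seconds  = 10

machines = (current_seconds.to_f / target_seconds).ceil
puts "machines needed: #{machines}"                  # => machines needed: 60
puts "examples per machine: #{examples / machines}"  # => examples per machine: 128
```

Roughly 128 examples per machine, each of which must finish inside the 10-second budget.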

That is 60 machines and potentially 60 dedicated databases, depending on the test runner setup.

PaaS Provisioning and Orchestration

Using the Stackato or Cloud Foundry REST API (or command-line tools), we can programmatically provision as many PostgreSQL or MySQL databases as we need. We can create an application that manages running the tests in a distributed way and scale it across a large cluster.
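As a sketch of what that provisioning step could look like, here is a small helper that generates one create-service call per test database. The exact client syntax and the "testdb" naming are assumptions for illustration; the same could be done directly against the REST API.

```ruby
# Sketch: build one "create-service" command per test-runner instance.
# The command syntax shown is an assumption about the stackato client.
def provision_commands(count, type: "postgresql", prefix: "testdb")
  (0...count).map { |i| "stackato create-service #{type} #{prefix}-#{i}" }
end

commands = provision_commands(3)
# commands.each { |cmd| system(cmd) }   # run for real against a cluster
puts commands
```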

Stackato Frankenstein

Since Stackato can run anywhere, we could potentially take a bunch of random machines and build a Stackato cluster with them. We could even run Stackato on every developer's laptop, forming them into a single Stackato cluster, and distributing our tests across them. Think of it as SETI@Home for developers running tests. Would you let 10 other developers run a 10th of their tests on your machine, if you could run your tests 10 times faster?

Scale Up and Scale Down

Due to the speed with which an application can be scaled up, our distributed testing application could be scaled up on demand and scaled down when not in use. It could momentarily consume a large percentage of unused resources on the Stackato cluster, returning them when the test run completes. The speed at which it could process the tests would depend on how many resources were available at the time.

Designing the Application

Stackato and Cloud Foundry run "applications". Therefore I will refer to our test runner as an "application" and Cloud Foundry's Cloud Controller as the "project", to avoid confusion.

What does our test runner application look like?

Centralized Code Checkout

We will need to check out the latest Cloud Controller code from GitHub. If we use the filesystem service that Stackato provides, we can share the same instance of this checked-out code across all application instances.
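Since every instance mounts the same filesystem service, only one of them should perform the actual clone while the others wait for it. Here is a minimal sketch using an exclusive file lock; the clone command is passed in as a block to keep the logic testable, and the mount path and repository URL in the usage comment are illustrative assumptions:

```ruby
require "fileutils"

# Ensure the shared checkout happens exactly once, no matter how many
# instances race to create it. The first instance to take the lock does
# the clone; later instances see the marker file and skip it.
def ensure_checkout(shared_dir)
  FileUtils.mkdir_p(shared_dir)
  File.open(File.join(shared_dir, ".clone.lock"), "w") do |lock|
    lock.flock(File::LOCK_EX)
    done_marker = File.join(shared_dir, ".clone.done")
    unless File.exist?(done_marker)
      yield shared_dir               # perform the actual clone
      FileUtils.touch(done_marker)
    end
  end
end

# Hypothetical usage on a real cluster:
# ensure_checkout("/app/fs/gitcheckout") do |dir|
#   system("git", "clone", "https://github.com/cloudfoundry/cloud_controller_ng.git",
#          File.join(dir, "repo")) or raise "clone failed"
# end
```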

Test Run Requests

Our test runner application will need to receive requests from the developer to run the tests at a specific git branch or commit.

For this test run job submission we can use an HTTP REST API. This will allow us to easily POST and DELETE test run jobs. We can also GET the results or current status of a submitted test run. Ruby on Rails may be a good choice for quickly implementing this REST API.
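The POSTed job itself could be a small JSON document naming the branch or commit to test. A sketch of parsing and validating such a request (the field names are our own choice for illustration, not a fixed API):

```ruby
require "json"

# Parse a job-submission body and reject requests that name neither
# a branch nor a commit.
def parse_job_request(body)
  job = JSON.parse(body)
  unless job["branch"] || job["commit"]
    raise ArgumentError, "a branch or commit is required"
  end
  { branch: job["branch"], commit: job["commit"], status: "queued" }
end

job = parse_job_request('{"branch": "master"}')
puts job[:status]   # => queued
```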

Since all instances of the test runner application will be identical, any instance can receive POSTed jobs. These instances are load-balanced by Stackato's router, which itself might be load-balanced behind a hardware load-balancer.

It might also be nice if we can pass a patch to apply to the checked-out code, allowing us to test uncommitted changes, but let’s leave that addition to version 2.

Orchestration

Inter-instance communication will be needed to distribute the Rspec tests across the cluster. We can use Stackato's RabbitMQ service to provide a way to distribute individual Rspec tests. All the tests can be sequentially added to a RabbitMQ work queue to which all the workers (all instances of our test runner application) are subscribed. Each test runner picks the next Rspec test off the queue and runs it. Results can either be posted back over RabbitMQ or entered directly into a central database (see below).

Notification of failed tests is best done over RabbitMQ. This helps with implementing things like fail_fast, which would immediately terminate a test run on the first failure.
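The whole queue-and-workers pattern, including fail_fast, can be simulated in-process with Ruby's thread-safe Queue. On the real cluster, RabbitMQ plays the role of the queue and each application instance is one worker; the run_spec block below stands in for actually invoking Rspec on a single spec file:

```ruby
# In-process simulation of the distributed run: a shared work queue,
# several workers, and a fail_fast flag that halts everyone after the
# first failure is seen.
def distributed_run(spec_files, workers: 3, fail_fast: false, &run_spec)
  queue = Queue.new
  spec_files.each { |f| queue << f }
  results = Queue.new
  failed  = false

  threads = Array.new(workers) do
    Thread.new do
      loop do
        break if fail_fast && failed
        file = (queue.pop(true) rescue nil)   # non-blocking pop; nil when drained
        break unless file
        passed = run_spec.call(file)
        failed = true unless passed
        results << [file, passed]
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

# Every spec "passes" in this toy run:
report = distributed_run(%w[a_spec.rb b_spec.rb c_spec.rb]) { |_file| true }
puts report.size   # => 3
```

With fail_fast enabled, a failing spec stops the remaining workers instead of letting them drain the queue.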

Central Database

We will need a central database to store and orchestrate the test run jobs. Only one job can be run at a time, and we need to know when a job has completed successfully or failed. Stackato gives us many choices for this database: PostgreSQL, MySQL, Redis, MongoDB, or even Memcached. A relational database, such as PostgreSQL or MySQL, is probably going to be our best bet. This will allow for table locking and transactions, which will help with the orchestration. There may also be some off-the-shelf Ruby gems or Rails plugins that do a lot of this job management for us.

Test Databases

Test databases will need to be provisioned, one for each instance of our test runner application, so that there are no collisions when running Rspec test cases across test runner instances. The best way to do this is to provision them ahead of time, giving them incremental names that can then be coupled with each test runner application instance via its instance_index (the zero-based index assigned to each running instance of an application).
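Each instance can discover its own index from the VCAP_APPLICATION environment variable that Cloud Foundry and Stackato set for every running instance, and derive its database name from it. A sketch, where the "testdb" prefix is an assumed naming convention:

```ruby
require "json"

# Map this instance to its pre-provisioned test database by index.
# VCAP_APPLICATION is the JSON metadata Cloud Foundry/Stackato injects
# into every application instance's environment.
def test_database_name(env = ENV, prefix: "testdb")
  index = JSON.parse(env.fetch("VCAP_APPLICATION"))["instance_index"]
  "#{prefix}-#{index}"
end

# Example with a stubbed environment:
fake_env = { "VCAP_APPLICATION" => '{"instance_index": 2}' }
puts test_database_name(fake_env)   # => testdb-2
```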

As mentioned above, we need a filesystem for checking out the code, shared between all instances of the test runner application. We will name this "gitcheckout".

We will simply name our database for storing and orchestrating jobs "jobs". We will call our RabbitMQ message queue "mq". Finally, our provisioned test databases will keep their incremental names.

This all goes into our stackato.yml file, which is pushed to Stackato alongside the application code when we deploy it. At that time the database instances, the RabbitMQ instance, and the filesystem instance will all be provisioned.
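A stackato.yml along these lines could declare everything at once. Treat the exact service-type identifiers and layout as a sketch, since they vary between Stackato versions:

```yaml
name: test-runner
instances: 60
services:
  gitcheckout: filesystem
  jobs: postgresql
  mq: rabbitmq
  testdb-0: postgresql
  testdb-1: postgresql
  # ...one test database per test runner instance
```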

Conclusion

This is merely a rough blueprint of what could be achieved with PaaS to help developers run highly distributed Rspec tests, the purpose being to dramatically speed up the time it takes to run tests and tighten the feedback loop on large projects.

Being able to horizontally distribute the tests would dramatically speed up the time taken to run them and let developers know much sooner when things are going downhill. This will result in an earlier course correction when things go wrong. Developers are also more engaged in writing code and running tests when even a bloated test suite is less laborious to run.

In part two of this post, we will build this on Stackato and see how well it works. Stay tuned, and please post any feedback on this design in the comments.


Phil is the Director of Engineering for Stackato at ActiveState. Stackato is an enterprise PaaS solution based on Cloud Foundry and Docker. Phil works closely with the Stackato development team and is a compulsive code-reviewer. You will see Phil regularly on ActiveState's Blog writing about Cloud Foundry, Docker, OpenStack, CoreOS, etcd, Consul, DevOps and many other cloud related technologies. Prior to coming to ActiveState, Phil worked in London for the BBC, helping build the iPlayer, and Cloudera in San Francisco, supporting Hadoop and HBase. He also spent time in Japan, where he worked for Livedoor.com. Phil has worked for several startups in Vancouver which included building several large-scale data processing applications such as real-time search engines, log indexing and a global IP reputation network.