As I have mentioned before, I have been involved in the re-platform of the Inventory system here at Shopzilla for the last 14 months. During this time we got to experiment with a couple of approaches with great success. One was the concept of blackbox testing and whitebox testing using Behaviour Driven Development (BDD). Before I explain what this is, let’s take a look at the new Inventory platform.

Our new Inventory platform switched from batch processing of feeds to a streaming model. This meant deploying a number of services, each with its own responsibilities in the pipeline. At a high level it looks like this:

[Figure: A simplified view of the new inventory platform]

As can be seen from the diagram, a feed is ingested by the Feed Validation Service. This validates the feed and transforms it to an internal format. The Feed Processing Service picks it up, performs delta calculations and creates individual “Offer Events” that are streamed to the downstream services where they are persisted and made available to clients.
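To make the delta calculation step more concrete, here is a minimal sketch. It assumes a feed snapshot can be represented as a dictionary keyed by offer id, and that an event is either "created", "updated" or "deleted". The names `OfferEvent` and `delta_events`, and the event shape itself, are hypothetical and are not taken from the real platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OfferEvent:
    """Hypothetical event shape; the platform's real schema is not shown in this post."""
    kind: str                 # "created", "updated" or "deleted"
    offer_id: str
    offer: Optional[dict]     # None for deletions


def delta_events(previous, current):
    """Compare two feed snapshots (dicts keyed by offer id) and emit one event per change."""
    events = []
    for offer_id, offer in current.items():
        if offer_id not in previous:
            events.append(OfferEvent("created", offer_id, offer))
        elif previous[offer_id] != offer:
            events.append(OfferEvent("updated", offer_id, offer))
        # Unchanged offers produce no event at all.
    # Offers that vanished from the feed become deletions (sorted for stable output).
    for offer_id in sorted(previous.keys() - current.keys()):
        events.append(OfferEvent("deleted", offer_id, None))
    return events


# Example snapshots: sku-1 changes price, sku-2 disappears, sku-3 is new.
previous_feed = {"sku-1": {"price": 10.0}, "sku-2": {"price": 5.0}}
current_feed = {"sku-1": {"price": 12.0}, "sku-3": {"price": 7.5}}
events = delta_events(previous_feed, current_feed)
```

Only the changed offers flow downstream, which is the point of a delta: an unchanged feed would produce no events.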

Each of these services has a defined contract at its API and performs a subset of the overall platform functions. Together, the services achieve the end-to-end goal of ingesting a feed and making it available to clients.

We needed a way to test not only that each service worked in isolation but also that all the services worked in conjunction with each other. We also wanted to experiment with Behaviour Driven Development (BDD). BDD has been around for several years, yet only recently have the frameworks become mature enough to use in the development of a new platform.

In order to achieve confidence we came up with the concepts of Blackbox testing and Whitebox testing.

Whitebox testing is individual service or component granular testing. Blackbox testing tests end to end system functionality.

In more detail:

Whitebox tests

Cover all the scenarios that the service’s API contract should handle

The person writing the test cares about the internals of the service, the what, not the how

Fast set up, execution and tear down

A large number of tests

Each test has a lower data set up and tear down overhead, and contains targeted data validation for the scenario under test

Blackbox tests

Are not concerned with the API contracts of individual services, but rather with the path data takes through the system

The person writing the tests only cares about data set up and making sure that the first service in the pipeline is invoked, data is persisted in the correct places and downstream services are notified appropriately

Slower tests, typically run on a scheduled basis

We implemented both the Whitebox and Blackbox tests using BDD. So let’s take a look at example Gherkin files.

Whitebox test Gherkin file
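The original Whitebox feature file is not reproduced in this text. As an illustrative stand-in, the sketch below embeds a hypothetical Gherkin feature in a Python string and binds its steps to functions with a tiny hand-rolled runner. The scenario wording, the set of supported countries, and the `step` and `run_feature` helpers are all assumptions made for the example; this is not the Cucumber JVM or Lettuce API.

```python
import re

# Hypothetical Whitebox feature: many small scenarios against one service's contract.
FEATURE = """\
Feature: Feed validation
  Scenario: Accept a well-formed UK feed
    Given a feed for country "UK" with 2 offers
    When the feed is validated
    Then the feed is accepted

  Scenario: Reject a feed for an unsupported country
    Given a feed for country "ZZ" with 2 offers
    When the feed is validated
    Then the feed is rejected
"""

STEPS = []  # (compiled pattern, function) pairs, filled in by @step


def step(pattern):
    """Register a step definition, loosely in the spirit of BDD step decorators."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r'a feed for country "(\w+)" with (\d+) offers')
def given_feed(ctx, country, count):
    ctx["feed"] = {"country": country, "offers": int(count)}


@step(r"the feed is validated")
def when_validated(ctx):
    # Stand-in for calling the service under test; the country list is made up.
    feed = ctx["feed"]
    ctx["accepted"] = feed["country"] in {"UK", "US", "DE"} and feed["offers"] > 0


@step(r"the feed is (accepted|rejected)")
def then_outcome(ctx, outcome):
    assert ctx["accepted"] == (outcome == "accepted")


def run_feature(text):
    """Run each scenario with a fresh context; return the names of scenarios that passed."""
    passed = []
    for block in re.split(r"\n\s*Scenario: ", text)[1:]:
        name, *lines = [line.strip() for line in block.splitlines() if line.strip()]
        ctx = {}
        for line in lines:
            body = re.sub(r"^(Given|When|Then|And|But) ", "", line)
            for pattern, func in STEPS:
                match = pattern.fullmatch(body)
                if match:
                    func(ctx, *match.groups())
                    break
            else:
                raise LookupError(f"no step definition for: {line}")
        passed.append(name)
    return passed
```

Calling `run_feature(FEATURE)` runs both scenarios and returns their names. The same three step definitions serve every scenario in the feature, which is what keeps adding another Whitebox scenario cheap.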

Note that there is little data setup and little data verification. A Whitebox feature file can consist of many scenarios of the same style. So with Whitebox tests we have a small amount of data setup, a small amount of data verification, and many scenarios per feature.

Blackbox test Gherkin file

Here is the Blackbox equivalent of processing a feed. Note that we don’t care what country it is; that’s left for the Whitebox tests:
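The original Blackbox feature file is not reproduced in this text either. The sketch below is a hypothetical stand-in: a single end-to-end scenario that submits a feed at the entry point and then checks only where the data ended up. `FakePlatform` is an in-memory placeholder for the deployed services, and the scenario wording, step functions and `run_scenario` helper are assumptions for illustration, not the Lettuce API.

```python
import re

# Hypothetical Blackbox feature: one scenario, no country detail, outcome checks only.
FEATURE = """\
Feature: Process a feed end to end
  Scenario: Offers from a new feed reach the client-facing store
    Given a merchant feed containing 3 offers
    When the feed is submitted for ingestion
    Then 3 offers are persisted
    And 3 offer events are published downstream
"""


class FakePlatform:
    """In-memory stand-in for the deployed services; purely illustrative."""

    def __init__(self):
        self.store = {}       # what clients would read
        self.published = []   # events sent to downstream services

    def ingest(self, feed):
        # Validation, delta calculation and streaming are collapsed into one call;
        # a Blackbox test only sees the entry point and the final outcome.
        for offer in feed:
            self.store[offer["id"]] = offer
            self.published.append(("created", offer["id"]))


def given_feed(ctx, count):
    ctx["feed"] = [{"id": f"offer-{n}"} for n in range(int(count))]


def when_submitted(ctx):
    ctx["platform"].ingest(ctx["feed"])


def then_persisted(ctx, count):
    assert len(ctx["platform"].store) == int(count)


def then_published(ctx, count):
    assert len(ctx["platform"].published) == int(count)


STEPS = {
    r"a merchant feed containing (\d+) offers": given_feed,
    r"the feed is submitted for ingestion": when_submitted,
    r"(\d+) offers are persisted": then_persisted,
    r"(\d+) offer events are published downstream": then_published,
}


def run_scenario(text, platform):
    """Execute each Given/When/Then/And line against STEPS; return how many steps ran."""
    ctx, executed = {"platform": platform}, 0
    for line in map(str.strip, text.splitlines()):
        found = re.match(r"(?:Given|When|Then|And) (.+)", line)
        if not found:
            continue  # Feature/Scenario headers and blank lines
        for pattern, func in STEPS.items():
            match = re.fullmatch(pattern, found.group(1))
            if match:
                func(ctx, *match.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step definition for: {line}")
    return executed
```

In a real Blackbox run the `ingest` call would be replaced by invoking the first service in the pipeline, and the two `Then` steps would query the real stores and downstream consumers. The steps never mention any single service's API contract, only the path the data takes.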

We run our Whitebox tests as part of our Maven build. Cucumber JVM, which we selected as our BDD framework, makes the integration extremely easy via its Cucumber JUnit runner. Our builds typically take between one and four minutes for a full Maven clean install. The four-minute build belongs to a single service whose Hadoop MapReduce tests use MiniMRCluster, which is slow in and of itself. We’ve made some inroads into efficient BDD testing with Hadoop, but that will be the subject of another blog post.

Our Blackbox tests are run at midnight and take about 40 minutes.

Developing with Blackbox tests

As you can imagine, Whitebox tests are very fast to develop with, given that they only test one service and are executed as part of the build. Blackbox tests, on the other hand, by their nature require a dedicated environment with all services deployed. We can’t have many of these environments because they are expensive in resources and quickly become unmanageable. Instead, team members have to tell each other when they are using the integration environment, then wait about 2 or 3 minutes while the environment is pointed at their dev machine. This process is slow because it:

Sets up all servers to point back to the developer’s machine so that the BDD framework can communicate with the environment when executing the tests

Deploys the latest version of the services to the integration environment, including any branches of services the developer is working on

Because these tests are long-running, the turnaround time is not great. However, we are very selective about the Blackbox tests we create and, now that the platform is maturing, we add new Blackbox tests far less often. Many changes now only require updates to a service’s Whitebox tests.

BDD frameworks

Finally, a quick note on BDD frameworks. We originally assessed two options for executing BDD tests: Python with Lettuce, and Cucumber JVM. Our initial assessment suggested that Python and Lettuce would be faster, so we went with that. While using it to develop real functionality, we began to suspect that Python/Lettuce was in fact much slower than Cucumber JVM. In addition, Cucumber JVM’s codebase was receiving regular commits and new features were arriving very quickly. We converted one suite of tests to Cucumber JVM and found a roughly 25% reduction in execution time, so we gradually migrated the Whitebox tests for all services to Cucumber JVM and haven’t looked back. Our Blackbox tests are still in Python and Lettuce, which we’ve gotten used to working with. The Blackbox tests have a slow initial setup and execution time anyway, so the Python/Lettuce decision has far less of an impact there. If we were to rewrite the Blackbox tests, we’d probably keep the same approach but use Cucumber JVM.

Future direction

Blackbox and Whitebox tests have worked very well for us. In particular, writing a failing BDD Whitebox test before writing any code makes us think about what we’re about to implement. We will continue working this way until a better method comes along.

As for the speed of the Blackbox tests, we’re always working on streamlining them. I believe as of writing we’ve just got them down to 20 minutes execution time from 40 minutes.

So how about you? Have you used BDD or the concept of Blackbox and Whitebox tests for a multi-service system? How did it work out?

About Mik Quinlan

Java Technical Architect and Agile Mentor with more than 18 years’ experience. I specialise in helping organisations adopt Agile methods effectively, developing strategies for effective cultural change, implementing them hands-on with the team, and turning cultural change into returns with visible ROIs.
You may contact me via LinkedIn at https://www.linkedin.com/in/mikquinlan/ or on Twitter @MikQuinlan.