Testing mobile applications is not always an easy feat. In addition to defining what to test and determining how to write those tests, actually running the tests can be problematic. In particular, UI test suites running on real mobile devices or emulators can take a very long time to complete.

This post describes how we at SoundCloud initially ran our Android UI tests with internal tooling, and how we migrated to Firebase Test Lab for running all of our UI tests.

In the CI builds for our Android listener application, the time it took to run the UI tests was always a bottleneck for the overall runtime of a build. So rather than rely on a single device or emulator, we developed tooling and services that let us run our UI tests on an internal device farm of 30 Nexus 5 devices in parallel. This let us run our tests in about 15 minutes, as opposed to the three hours it would take if we ran them all on a single device.

Unfortunately, after years of working successfully, this system started showing signs of its age, causing multiple problems in our day-to-day development. Initially, it had a great feature set with rich UI test results and reports, video recordings of failed tests, reports of the most frequently failing tests over time, and more. But eventually, it started to slow us down. We frequently saw long build queues and failures due to infrastructure issues, and soon it became clear that we had to either fully commit to maintaining and updating the system, or else start thinking about alternatives.

We began investigating different platforms and cloud providers offering real devices and emulators in different configurations, and in the process, we identified two problems we would face.

1) Our test suite was tailored to running on physical Nexus 5 devices, and due to shared state, many tests had to be run in isolation to work correctly. Additionally, our custom test services made sure every test ran in isolation without sharing any state with other tests. However, cloud providers for running Android tests offer a variety of devices in different sizes and usually run all tests one after another, often with shared state between tests. This difference, along with the age of our test suite, resulted in multiple issues for us.

2) The size and runtime of our test suite meant that running our UI tests using a cloud provider to spin up a single device or emulator would degrade our test feedback times to multiple hours.

Making Our Test Suite More Robust

It was clear that in order to run our tests on other device/emulator farm setups, we would have to make our testing more robust. For example, some tests expected to run on the exact screen dimensions of a Nexus 5 and started failing if the screen size differed slightly. Others couldn’t handle the state other tests left behind.

Around the same time that we were looking at different solutions and experimenting with ways to improve the stability of our test suite, Google released the first versions of the Android Test Orchestrator, which wraps the test instrumentation so that each test runs in its own Instrumentation instance. As a result, tests share far less state, and each one runs largely in its own sandbox. Additionally, if one test happens to crash, the other tests can still run and report their correct states; a crash no longer interrupts the whole test suite.

Using the Android Test Orchestrator is as easy as adding a few lines to your project’s build.gradle file, and doing so can help stabilize a test suite quite a bit, although how much it helps depends greatly on the test suite itself.
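As a rough sketch, those few lines typically look like the following. This assumes the AndroidX Test artifacts; the version numbers are illustrative, not the ones we used at the time, and the clearPackageData argument is optional.

```groovy
android {
    defaultConfig {
        // The orchestrator drives tests through AndroidJUnitRunner
        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"

        // Optional: clear the app's data after each test so that
        // on-device state does not leak from one test into the next
        testInstrumentationRunnerArguments clearPackageData: "true"
    }

    testOptions {
        // Run each test in its own Instrumentation invocation
        execution "ANDROIDX_TEST_ORCHESTRATOR"
    }
}

dependencies {
    // Illustrative versions; use whatever is current for your project
    androidTestImplementation "androidx.test:runner:1.5.2"
    androidTestUtil "androidx.test:orchestrator:1.4.2"
}
```

Projects still on the older Android Testing Support Library use the execution value ANDROID_TEST_ORCHESTRATOR and the com.android.support.test artifacts instead.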

Once this setup is in place, every run of the tests in the androidTest configuration goes through the Android Test Orchestrator.

Firebase Test Lab

Firebase is Google’s suite of services for building mobile applications. One part of it is Firebase Test Lab, which enables developers to easily create and launch a test run on a real device or Android emulator hosted in a Google data center and run a suite of tests on it. After the tests have finished running, the Firebase web UI displays the results of each test, along with information such as a video recording of the test run, the full Logcat, and screenshots taken with the help of the Firebase Test Lab Screenshotter Library.

Tests can also be run on Firebase Test Lab through Android Studio. Simply sign into Android Studio with the Google account linked to your application’s Firebase project. When you run a test, select the "Cloud Testing" tab, choose the Firebase project, and configure the devices/emulators you want to run your tests on.

For our use case of testing an Android application, the choice to go with Firebase was an easy one: it will always allow us to test on the latest Android versions and use the newest test tooling, such as the Android Test Orchestrator.

However, the main use case of Firebase Test Lab is uploading an application APK and a test APK. Firebase Test Lab then runs all the tests contained in the test APK, or a subset of them depending on the configuration, on one device/emulator or a matrix of different ones. Every device or emulator in that matrix runs the same set of tests, though, so for a test suite of our size, this would result in very long test durations.

Massively Sharding Tests

Firebase Test Lab currently does not offer splitting a test suite into multiple parts and executing smaller parts of the suite on multiple emulators in parallel to bring down testing times. However, the open source community (namely Walmart Labs) has come up with a solution for this.

Flank is a tool that allows for massively sharding an Android UI test suite across many devices/emulators on Firebase Test Lab. In other words, instead of running all tests on a single emulator, it schedules a smaller number of tests on each of several devices/emulators. These shards then execute in parallel, massively reducing the time it takes to run all tests.

Flank is configured with a small file defining the device or emulator that should be used, the Android OS version, and how long each shard should run for. Flank keeps a record of the duration of each test so that it can group them together to ensure a fairly constant runtime. Additionally, as Firebase bills Test Lab usage by the minute, Flank tries to intelligently group tests together so that they use the minimal overall billable time on the devices, thereby saving costs.
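To make the grouping idea concrete, here is a minimal sketch of duration-based sharding using a simple first-fit-decreasing heuristic. It is not Flank's actual implementation; the class name ShardPlanner, the test names, and the recorded durations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardPlanner {

    /**
     * Groups tests into shards so that the summed recorded duration of each
     * shard stays at or below targetSeconds where possible. A single test
     * that is longer than the target ends up alone in its own shard.
     */
    static List<List<String>> plan(Map<String, Integer> durations, int targetSeconds) {
        // Place the longest tests first; short tests then fill the gaps.
        List<Map.Entry<String, Integer>> tests = new ArrayList<>(durations.entrySet());
        tests.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        List<List<String>> shards = new ArrayList<>();
        List<Integer> totals = new ArrayList<>(); // running duration per shard

        for (Map.Entry<String, Integer> test : tests) {
            // Find the first existing shard with enough room left.
            int target = -1;
            for (int i = 0; i < shards.size(); i++) {
                if (totals.get(i) + test.getValue() <= targetSeconds) {
                    target = i;
                    break;
                }
            }
            // No shard has room: open a new one.
            if (target < 0) {
                shards.add(new ArrayList<>());
                totals.add(0);
                target = shards.size() - 1;
            }
            shards.get(target).add(test.getKey());
            totals.set(target, totals.get(target) + test.getValue());
        }
        return shards;
    }

    public static void main(String[] args) {
        // Hypothetical recorded durations in seconds.
        Map<String, Integer> durations = new LinkedHashMap<>();
        durations.put("LoginTest#signsIn", 90);
        durations.put("PlayerTest#playsTrack", 70);
        durations.put("SearchTest#findsArtist", 50);
        durations.put("StreamTest#scrollsFeed", 30);

        System.out.println(plan(durations, 120));
    }
}
```

With a 120-second target, the four sample tests end up in two shards of 120 seconds each: the 90-second and 30-second tests share one shard, and the 70-second and 50-second tests share the other. A real scheduler would also account for per-shard startup overhead and billing granularity, which this sketch ignores.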

By using Flank, we were able to reduce our overall test suite runtimes compared to our previous internal systems. And if our test suite grows larger, Flank will help us schedule our tests on more emulators, allowing us to keep our current test durations.

Our previous in-house testing solution gave us many features we had gotten used to and loved, and we lost them when we started using Flank. For example, after a test run, we would receive a rich HTML test report with links to the Logcat and video recordings of failing tests. Additionally, filtering tests by annotations was an easy way for us to configure different suites of tests to run, depending on the current task. However, as part of our work to migrate our UI test suite to Firebase Test Lab, we built these features into Flank and contributed them back into the main repository so that other users of Flank can enjoy them as well.

Overall, running our tests in Firebase Test Lab with Flank has helped us modernize our UI testing stack, stabilize our test suite, and even speed up test runtimes in the process, which leads to happier and more productive developers.