
ScaLearning 7 – Distributing Concurrent Tests

Like many developers who make the journey from Java to Scala, I often find myself amazed at how much easier it is to do some things, or how much easier it is to express myself in Scala.

“ScaLearning” will be a series of short blog-posts just documenting little tidbits I find interesting, confusing, amusing, or otherwise worthy of talking about.

Motivation

Recently, to gain confidence in our web application, our team decided it would be pragmatic to run a series of tests against a deployed version of the application, complete with a production-like database. We quickly put together a simple suite of non-destructive tests that we could run in any environment.

Unfortunately, one of our simplest tests quickly began causing us trouble. The test emulated a search spider, crawling every link it found on the entire site in an exhaustive graph traversal complete with cycle detection. It ran for over 24 hours without completing.

While we’ve made other optimizations to improve performance of the test (such as excluding sufficiently similar pages), the topic of today’s post is the concurrency we introduced in order to help take the edge off test time.

Distributing Concurrent Tests

Our goal was relatively simple. We wished to run a very simple test across many thousands of URLs:
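As a rough illustration of the shape of such a test, here is a minimal sequential sketch. The in-memory `site` map, the `SequentialCrawl` object and the `crawl` driver are assumptions made for the example; only the names `testUrl` and `markForTesting` come from our code, and the real test fetched pages over HTTP rather than looking them up in a map.

```scala
import scala.collection.mutable

object SequentialCrawl {
  // Hypothetical in-memory "site": each URL maps to the links found on that page.
  val site: Map[String, List[String]] = Map(
    "/index.html"  -> List("/about.html", "/search.html"),
    "/about.html"  -> List("/index.html"), // cycle back to the start
    "/search.html" -> List("/about.html", "/missing.html")
  )

  private var pending: List[String] = Nil // stack of URLs still to test
  private val visited = mutable.Set.empty[String]

  // Push a link onto the stack unless it has been seen before (cycle detection).
  def markForTesting(url: String): Unit =
    if (visited.add(url)) pending = url :: pending

  // A URL passes if the page exists; every link found on it is marked for testing.
  def testUrl(url: String): Boolean = site.get(url) match {
    case Some(links) => links.foreach(markForTesting); true
    case None        => false
  }

  // Driver: pop URLs off the stack until it is empty, recording one result per URL.
  def crawl(start: String): Map[String, Boolean] = {
    pending = Nil
    visited.clear()
    markForTesting(start)
    var results = Map.empty[String, Boolean]
    while (pending.nonEmpty) {
      val url = pending.head
      pending = pending.tail
      results += (url -> testUrl(url))
    }
    results
  }
}
```

Calling `SequentialCrawl.crawl("/index.html")` visits each reachable URL exactly once and reports `/missing.html` as the only failure.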

Of course, this was called by a method which pushed and popped from a stack, and “markForTesting” pushed a new link onto that stack. This code worked great sequentially, but we wanted it to operate concurrently in order to minimize testing time. For this, we employed Akka’s actors:

Assuming the methods called within testUrl are thread-safe (which we also ensured using actors), this runs a single test on a second thread and lets us carry on with our business. However, since there’s only a single actor here, we have only one thread with which to process URLs, so we’re still effectively opening each link sequentially.
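The single-worker limitation is easy to see without Akka. The sketch below swaps the actor for a single-threaded executor from the standard library, so it is an analogy rather than our actual code; `SingleWorker` and this version of `testUrl` are hypothetical stand-ins.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SingleWorker {
  // Stand-in for the real page test: passes for non-empty URLs and
  // reports the name of the thread that did the work.
  def testUrl(url: String): (Boolean, String) =
    (url.nonEmpty, Thread.currentThread().getName)

  // Run every test on one dedicated worker thread and return the
  // distinct thread names that executed them.
  def run(urls: List[String]): Set[String] = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each Future is created immediately, so the caller is free to carry on...
      val futures = urls.map(url => Future(testUrl(url)))
      // ...but with one worker the tests still execute one after another.
      Await.result(Future.sequence(futures), 10.seconds).map(_._2).toSet
    } finally pool.shutdown()
  }
}
```

However many URLs are submitted, `run` reports a single worker thread, which is why one actor on its own did little for our test time.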

So far everything I’ve presented as code came very naturally. In fact, minor modifications for the purpose of blogging notwithstanding, we used the code I’ve presented so far to test several links across our site very successfully, in a fashion very similar to the following:

The theory was that “result” would wait for all of the answers to come back before returning. Unfortunately, that’s not quite the sequence of events the actor sees. After some digging, we worked out the order in which the actor actually received its messages:

(() => testUrl("/index.html")) arrives first and is quickly sent on to a ConcurrentTest runner

“result” comes next, as it takes a second or two for the runner to open the test

(() => testUrl(_)) is received for several other URLs as links are scraped off the first page

“result” doesn’t actually wait for all the answers to come back, as it has no way of knowing how many answers are actually required. For that matter, we aren’t sure of that number either, as the test is meant to be dynamic. Instead, “result” simply compiles the answers it has so far, and then shuts down all of the actors. This means we get a “yes” or “no” about “/index.html”, but all of the other URLs are still sitting in the mailbox of Master when it’s shut down. Uh-oh!
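The message order above can be replayed with a small single-threaded simulation. This is not Akka code: `MailboxOrder`, `Test`, `Result` and `replay` are hypothetical names, and the sketch only models the fact that “result” reports what has been answered so far and then stops the master.

```scala
object MailboxOrder {
  sealed trait Message
  final case class Test(url: String) extends Message
  case object Result extends Message

  // Replay the mailbox in arrival order. Everything before Result gets tested;
  // everything after it is still sitting in the mailbox at shutdown.
  def replay(mailbox: List[Message]): (Set[String], List[Message]) = {
    val (beforeResult, fromResult) = mailbox.span(_ != Result)
    val tested = beforeResult.collect { case Test(url) => url }.toSet
    (tested, fromResult.drop(1)) // drop the Result message itself
  }
}
```

Replaying `Test("/index.html")`, `Result`, `Test("/about.html")` yields a tested set containing only `/index.html`, with the later test left unprocessed.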

So how do we know when we’re done? Mailbox sizes. We added a new match case to Master which would work out whether it believed the tests to be done yet:

This code differs from “result” in that it actually attempts to detect whether the tests are done by:

Waiting for all currently outstanding test-methods to complete

Counting any pending messages in the router and worker mailboxes. (This should always be zero, as we’ve waited for all answers to return, but it’s still safer to be sure.)

Counting any pending messages on the master

Returning “true” if the total number of pending messages is 0, and “false” otherwise, as more tests still have to run

This algorithm works for us because Futures.awaitAll waits for every outstanding test to run to completion. Any URLs found on the tested pages are checked against previously visited URLs and, if they are new, added to Master’s queue. Since Master is still busy processing “done”, those tests stay on the queue and “mailboxSize” returns a positive number. If, however, no new links were encountered, there are no tests waiting on Master’s queue, and our “done” operation detects 0 pending tests.
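The check described above can be sketched with the standard library alone. This is an approximation under stated assumptions, not our Akka code: `CrawlMaster` is a hypothetical stand-in for the Master actor, a thread pool plays the router and workers, a concurrent queue plays Master’s mailbox, and `site` is an in-memory map of each URL to its links.

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

class CrawlMaster(site: Map[String, List[String]], workerCount: Int) {
  private val pool = new ThreadPoolExecutor(
    workerCount, workerCount, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable]())
  private implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

  private val mailbox = new ConcurrentLinkedQueue[String]()   // URLs waiting on the master
  private val visited = ConcurrentHashMap.newKeySet[String]() // cycle detection
  private var outstanding = List.empty[Future[(String, Boolean)]]
  private var results = Map.empty[String, Boolean]

  // Called from worker threads, so both collections are concurrent ones.
  def markForTesting(url: String): Unit =
    if (visited.add(url)) mailbox.add(url)

  private def testUrl(url: String): Boolean = site.get(url) match {
    case Some(links) => links.foreach(markForTesting); true
    case None        => false
  }

  // Hand every URL currently queued on the master to the worker pool.
  def dispatch(): Unit = {
    var url = mailbox.poll()
    while (url != null) {
      val target = url
      outstanding = Future(target -> testUrl(target)) :: outstanding
      url = mailbox.poll()
    }
  }

  def done(): Boolean = {
    // 1. Wait for every test that is currently outstanding.
    results ++= Await.result(Future.sequence(outstanding), 1.minute)
    outstanding = Nil
    // 2. Work still queued on the workers (expected to be zero after step 1).
    val onWorkers = pool.getQueue.size
    // 3. Work still queued on the master: links found by the tests just run.
    val onMaster = mailbox.size
    // 4. Finished only when nothing is pending anywhere.
    onWorkers + onMaster == 0
  }

  // Alternate between dispatching and checking until the check reports done.
  def run(start: String): Map[String, Boolean] = {
    markForTesting(start)
    try { dispatch(); while (!done()) dispatch(); results }
    finally pool.shutdown()
  }
}
```

With a small site map, `new CrawlMaster(site, 4).run("/index.html")` returns one result per reachable URL, because each `done()` that finds newly queued links reports false and triggers another round of dispatching.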

Now we put our own thread to sleep, waking once every cycle to ask the master whether it has completed its job, until the master reports that all of its workers have completed their work and no new work is pending for it to distribute.
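A minimal sketch of that polling loop follows. `Poll`, `untilDone`, `intervalMillis` and `isDone` are hypothetical names; in our case the check was a “done” message sent to the master, which is represented here by a plain function.

```scala
object Poll {
  // Ask `isDone` once per cycle, sleeping between checks.
  // Returns the number of checks that were needed.
  def untilDone(intervalMillis: Long)(isDone: () => Boolean): Int = {
    var checks = 1
    while (!isDone()) {
      Thread.sleep(intervalMillis)
      checks += 1
    }
    checks
  }
}
```

The interval trades responsiveness against chatter: a short sleep notices completion sooner but interrupts the master more often.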

Feedback on other potential approaches is very welcome; I find the entire topic of concurrency and job distribution very interesting.