Living in a state of accord.

Once an application goes live, it is absolutely essential that any future changes are able to be work with the existing data in production, typically by migrating it as changes are required. That existing data and the migrations applied to it are often the riskiest and least tested functions in the system. Mistakes in a migration will at best cause a multi-hour outage while backups are restored and more likely will subtly corrupt data, producing incorrect results that may go unnoticed for long periods making it impossible to roll back. At LMAX we reduce that risk by running compatibility tests.

Sanitised Data

The first requirement for testing data migrations is testing data that as production-like as possible. It’s almost never a good idea to bring actual production data into development and it’s definitely never going to happen for a finance company like LMAX. Instead, we have a sanitiser that runs in production and generates a very heavily sanitised version of production data that still has the same “shape”. For example, it has the same number of accounts, the same distribution of open positions across instruments, null values in the same places there’s null values in production and so on. However it replaces anything that’s even vaguely personally identifiable or sensitive information. If in doubt, sanitise it out.

Despite being heavily sanitised, we still treat it as data that should be secured but it can be brought into our development, testing and staging environments.

We have multiple production environments so we have sanitised data that matches the shape of each of those environments.

Can Migrations Run?

Once we have sanitised data the most basic check is to confirm the migration process will actually complete successfully. This ensures we can release successfully but doesn’t give us any real confidence that the migration is successful. For such a primitive check it’s surprisingly effective as it picks up the common errors of changing columns to NOT NULL when the production data actually does have null values, or adding a unique constraint to tables when the content isn’t actually unique.

Did Migrations Work?

The obvious next step is to write tests to confirm that migrations actually worked. Our compatibility test jobs are setup so that we can easily write a JUnit test that will be run after migrations complete so we can verify the state.

The most direct form of test is an example based test. We select some examples from the production dataset and write assertions to check that specific bit of the data migrated in the way we expect. The down side is that we’re dealing with live production data which is regularly updated so it’s possible that our examples will change after we’ve written the test and then, correctly, migrate to something different to what we expect. Still, these are often useful to put through for a single run as a sanity test when developing the migration, then delete them.

Slightly more generic, we can write a test that assert constraints that must be true immediately after the migration completes. For example, when we made permissions more fine-grained we needed to assign a new type of payment role to every account that used real money, but not to any demo accounts (which use pretend money). We can write a test to verify that migration worked correctly quite easily, however once the migration goes out admin users may add or remove the role to different accounts and the constraint would no longer hold. For cases like that we simply delete the test once the migration has gone live at which point it’s done its job anyway. We also mark the test with a @ValidUntil annotation that makes it clear that the test has a limited life time in case we forget to delete it.

Finally, we can often identify constraints that should always be valid and write permanent tests for them. These are extremely powerful, testing not just that our current migration works correctly but that no future migration breaks that expectation.

Did Something Unexpected Happen?

The compatibility tests that should always hold true have an additional benefit – they give us early feedback that the production data has diverged from expectations for some reason, typically because a bug slipped through. We can then investigate, fix the bug to prevent any more data issues and work out how to clean up the problem.

Obviously, finding issues only after production data has gone wrong is not something we ever want to do but it’s still a useful safety net if something slips through all our pre-release testing and the database schema constraints we use. Typically when we do find issues they are minor inconsistencies that don’t cause any issues now, but are like little time bombs just waiting for a future release to assume it can’t possibly happen. So even getting feedback that late in the process often allows us to avoid any user-noticeable effects.

Making Them Easy

We have a base class we extend our compatibility tests from which makes it easy to get a connection to the database and has a few handy utilities for asserting things. By far the most useful however is the assertNoRowsMatch method. It does exactly what it says – takes an SQL query and asserts that no rows match. If any do, it prints them out so you get really useful debug information to start investigating the problem. For example:

Just like production code, you should assume things are going to go wrong in your tests and when it does you want good logging to help track down what happened and why. So just like production code, you should use a logging framework within your DSL, use meaningful log levels and think about what info you’d need in the logs if something went wrong (and what you don’t). There’s also a few things we’ve found very useful specifically in our test logging.

Log Alias to Real Name Mappings

Since the DSL uses aliases, if we want to poke around in the exchange manually to understand why a test failed, we need to know the real names to use. So whenever we create a real name for an alias we log some information about it. For example when creating an instrument:

All the key information we need is in that log statement – the alias, real name plus unique identifiers (externalId and internalId).

Name Your Threads

We use a custom JUnit test runner (via the @RunWith annotation) so we can run tests within a test suite in parallel. With tests running in parallel all their output gets mixed up and becomes hard to read. Recently we started setting the test thread names to the name of the test case.

There are a few cases where we spawn additional threads within a test (typically to pull data from long poll or other push event channels). In those cases we generally pass the thread name down with an additional prefix (e.g. LongPoll-…StopLossOffsetAndAStopLossPrice) so we can still associate that output with the right test.

Time Traveller Names

The way we allow time travelling tests to run in parallel is reasonably complex – only one thread ever actually executes the time travel and there’s a bunch of cross-thread coordination – so our thread names aren’t as useful in that little area of code. As such we give each test we’re currently running a time traveller name so we get log output from the Tardis like:

With Selenium 1, JavaScript alert and confirmation dialogs were intercepted by the Selenium JavaScript library so they never appeared on-screen and were accessed using selenium.isAlertPresent(), selenium.isConfirmationPresent(), selenium.chooseOkOnNextConfirmation() and similar methods.

With Selenium 2 aka WebDriver, the dialogs do appear on screen and you access them with webDriver.switchTo().alert() which returns an Alert instance for further interactions.

However, when you use WebDriverBackedSelenium to mix Selenium 1 and WebDriver APIs – for example, during migrations from one to the other – the alerts don’t appear on screen and webDriver.switchTo().alert() always throws a NoAlertPresentException. This makes migration problematic because selenium.chooseOkOnNextConfirmation() doesn’t have any effect if the next confirmation is triggered by an action performed directly through Web Driver.

The solution is to avoid WebDriverBackedSelenium entirely and instead create a WebDriverCommandProcessor, configure it to not emulate dialogs and pass it to DefaultSelenium.

The key method call here is setEnableAlertOverride(false) which disables the alert emulation. Note that the Selenium 1 methods for interacting with dialogs will no longer work – you have to switch all alert handling over to WebDriver.

Also note the subtle but important use of a closure to provide a Supplier<WebDriver> instead of just passing the webDriver. The overloaded constructor instance that takes a WebDriver immediately calls start() on the processor and setEnableAlertOverrides must be called before start().

It’s unfortunate, but understandable, that Selenium 1 and WebDriver style alert interactions can’t be mixed as that would make migration a lot easier. At least with this approach you can convert your alert interactions over in one batch then convert all the other interactions incrementally.

Today LMAX Exchange has released ElementSpecification, a very small library we built to make working with selectors in selenium/WebDriver tests easier. It has three main aims:

Make it easier to understand selectors by using a very English-like syntax

Avoid common pitfalls when writing selectors that lead to either brittle or intermittent tests

Strongly discourage writing overly complicated selectors.

Essentially, we use ElementSpecification anywhere that we would have written CSS or XPath selectors by hand. ElementSpecification will automatically select the most appropriate format to use – choosing between a simple ID selector, CSS or XPath.

Making selectors easier to understand doesn’t mean making locators shorter – CSS is already a very terse language. We actually want to use more characters to express our intent so that future developers can read the specification without having to decode CSS. For example, the CSS:

Much longer, but if you were to read the CSS selector to yourself, it would come out a lot like the ElementSpecification syntax. That allows you to stay focussed on what the test is doing instead of pausing to decode the CSS. It’s also reduces the likelihood of misreading a vital character and misunderstanding the selector.

With ElementSpecification essentially acting as an adapter layer between the test author and the actual CSS, it’s also able to avoid some common intermittency pitfalls. In fact, the reason ElementSpecification was first built was because really smart people kept attempting to locate an element with a classname using:

//*[contains(@class, 'valid')]

which looks ok, but incorrectly also matches an element with the class ‘invalid’. Requiring the class attribute to exactly match ‘valid’ is too brittle because it will fail if an additional class is added to the element. Instead, ElementSpecification would generate:

contains(concat(' ', @class, ' '), ' valid ')

which is decidedly unpleasant to have to write by hand.

The biggest benefit we’ve seen from ElementSpecification though is that fact that it has a deliberately limited set of abilities. You can only descend down the DOM tree, never back up and never across to siblings. That makes selectors far easier to understand and avoids a lot of unintended coupling between the tests and incidental properties of the DOM. Sometimes it means augmenting the DOM to make it more semantic – things like adding a “data-id” attribute to rows as in the example above. It’s surprisingly rare how often we need to do that and surprising how useful those extra semantics wind up being for a whole variety of reasons anyway.

In programming courses one of the first thing you’re taught is to avoid “magic literals” – numbers or strings that are hardcoded in the middle of an algorithm. The recommended solution is to extract them into a constant. Sometimes this is great advice, for example:

if(amount >1000){
checkAdditionalAuthorization();}

would be much more readable if we extracted a ADDITIONAL_AUTHORIZATION_THRESHOLD variable – primarily so the magic 1000 gets a name.

That’s not a hard and fast rule though. For example:

value.replace(PATTERN_TO_REPLACE, token)

is dramatically less readable and maintainable than:

value.replace("%VALUE%", token)

Extracting a constant in this case just reduced the locality of the code, requiring someone reading the code to unnecessarily jump around the code to understand it.

My rule of thumb is that you should extract a constant only when:

it’s reasonably easy to think of a good name from the constant – one that adds meaning to the code OR

the value is required in multiple places and is likely to change

Arbitrary tokens like %VALUE% above are generally unlikely to change – it’s an implementation choice – so I’d lean towards preserving the locality and not extracting a constant even when they’re used in multiple places. The 1000 threshold for additional authorisation on the other hand is clearly a business rule and therefore likely to change so I’d go to great lengths to avoid duplicating it (and would consider making it a configuration option).

Obviously these are just rules of thumb so there will be plenty of cases where, because of the specific context, they should be broken.