Multi-doc transactions for MongoDB

I’m glad to announce experimental support for multi-document transactions in the mgo driver that integrates MongoDB with the Go language. The support is done via a driver extension, so it works with any MongoDB release supported by the driver (>= 1.8).

Features

Here is a quick highlight list to get your brain ticking before the details:

Supports sharding

Operations may span multiple collections

Handles changes, inserts and removes

Supports pre-conditions

Self-healing

No additional locks or leases

Works with existing data

Let’s see what these actually mean and how the goodness is done.

The problem being addressed

The typical example is a bank transaction: imagine you have two documents representing accounts for different people, and you want to transfer 100 bucks from Aram to Ben. Despite the apparent simplicity in that description, there are a number of edge cases that turn it into a non-trivial change.

Imagine an agent processing the change following these steps:

Is Ben’s account valid?

Take 100 bucks out of Aram’s account if its balance is above 100

Insert 100 bucks into Ben’s account

Note that this description already assumes the availability of some single-document atomic operations as supported by MongoDB. Even then, how many race conditions and crash-related problems can you count? Here are some spoilers that hint at the problem complexity:

What if Ben cancels his account after (1)?

What if the agent crashes after (2)?

How it works

Thanks to the availability of single-document atomic operations, it is be possible to craft a sequence of changes that manipulate documents in a way that supports multi-document transactional behavior. This works as long as the clients agree to use the same conventions.

This isn’t exactly news, though, and there’s even documentation describing how one can explore these ideas. The challenge is in crafting a generic mechanism that not only does the basics but goes beyond by supporting inserts and removes, being workload agnostic, behaving correctly on crashes (!), and yet remaining pleasant to use. That’s the territory being explored.

The implemented semantics offers an isolation level that allows non-repeatable reads to occur (a partially committed transaction is visible), but the changes are guaranteed to only be visible in the order specified in the transaction, and once any change is done the transaction is guaranteed to be applied completely without intervening changes in the affected documents (no dirty reads). Among other things, this means one can use any existing mechanism at read time.

When writing documents that are affected by the transaction mechanism, one must necessarily use the API of the new mgo/txn package, which ended up surprisingly thin and straightforward. In other words for emphasis: if you modify fields that are affected by the transaction mechanism both with and without mgo/txn, it will misbehave arbitrarily. Fields that are read or written by mgo/txn must only be changed using mgo/txn.

Using the example described above, the bank account transfer might be done as:

The assert and update values are usual MongoDB querying and updating documents. The tcollection is a MongoDB collection that is used to atomically insert the transaction details into the database. As long as that document makes it into the database, the transaction is guaranteed to be eventually entirely applied or entirely aborted. The exact moment when this happens is defined by whether there are other transactions in progress and whether a communication problem occurs and when it occurs, as described below.

Concurrency and crash-proofness

Perhaps the most interesting piece of the puzzle when coming up with a nice transaction mechanism is defining what happens when an agent misbehaves, even more in a world where there are multiple distributed transaction runners. If there are locks, someone must unlock when a runner crashes, and must know the difference between running slowly and crashing. If there are leases, the lease boundary becomes an issue. In both cases, the speed of the overall system would become bounded by the speed of the slowest runner.

Instead of falling onto those issues, the implemented mechanism observes the transactions being attempted on the affected documents, orders them in a globally agreed way, and pushes all of their operations concurrently.

To illustrate the behavior, imagine again the described scenario of bank transferences:

In this diagram there are two transactions being attempted, T1 and T2. The first is a transference from Aram to Ben, and the second is a transference from Ben to Carl. If a runner starts executing T2 while T1 is still being applied by a different runner, the first runner will pick T1 up and complete it before starting to work on T2 which is its real goal. This works even if the original runner of T1 died while it was in progress. In reality, there’s little difference between the original runner of T1 and another runner that observes T1 on its way.

There’s a chance that T1’s runner died too soon, though, and it hasn’t had a chance to even start the transaction by tagging Ben’s account document as participating in it. In that case, T2 will be pushed forward by its own runner independently, since there’s nothing on its way. T1 isn’t lost, though, and it may be resumed at any point by calling the runner’s Resume or ResumeAll methods.

The whole logic is implemented without introducing any new globally shared point of coordination. It works if documents are in different collections, different shards, and it works even if the transaction collection itself is sharded across multiple backends for scalability purposes.

The testing approach

While a lot of thinking was put onto the way the mechanism works, this is of course non-trivial and bug-inviting logic. In an attempt to nail down bugs early on, a testing environment was put in place to simulate multiple runners in a conflicting workload. To make matters more realistic, this simulation happens in a harsh scenario with faults and artificial slowdowns being randomly injected into the system. At the end, the result is evaluated to see if the changes performed respected the invariants established.

While hundreds of thousands of transactions have been successfully run in this fashion, the package should still be considered experimental at this point, and its API is still prone to change.

There’s one race

There’s one known race that’s worth mentioning, and it was consciously left there for the moment as a tradeoff. The race shows itself when inserting a new document, at the point in time when the decision has been made that the insert was genuinely good. At this exact moment, if that runner is frozen for long enough that would allow for a different runner to insert the document and remove it again, and then the original runner is unfrozen without any errors or timeouts, it will naturally go on and insert the new document.

There are multiple solutions for this problem, but they present their own disadvantages. One solution would be to manipulate the document instead of removing it, but that would leave the collection with ghost content that has to be cared for, and that’s an unwanted side effect. A second solution would be to use the internal applyOps machinery that MongoDB uses in its sharding implementation, but that would mean that collections affected by transactions couldn’t be sharded, which is another unwanted side effect (please vote for SERVER-1439 so we can use it).

Have fun!

I hope the package serves you well, and if you would like to talk further about it, please join the mgo-users mailing list and drop a message.