Race-condition-free deployment with the "symlink replacement" trick

On Unix, mv within a single filesystem is an atomic operation: it is implemented with the rename(2) system call. This enables a well-known "symlink replacement trick" for race-condition-free website deployment, among other things. Let's create a script that encapsulates the process for general-purpose use.

Motivation

When deploying an update to a website, if we do something like

git pull within our deployment directory, or

rsync to our deployment directory, or even

run a script that quickly replaces the deployment directory with a new one,

... then there can be a window, if only milliseconds long, during which the files we are trying to serve are nonexistent or in a half-updated state.

There is a filesystem-based mechanism that enables one to deploy updates with zero risk of this race condition occurring.[^1] It relies on the fact that mv is an atomic operation and Unix supports symlinks. The basic idea is that we specify our document root as a symlink to a directory containing the current version.

In the following examples, our document root is www. First, we copy all the files for our website into www.A:

$ rsync -CrP 'remote:~/website/' 'www.A/'

Then, we create the symlink, pointing the document root to www.A.

$ ln -s www.A www

Here's the directory structure in its entirety so far:

$ tree -AF --noreport
.
├── www -> www.A/
└── www.A/
    └── index.html

When it's time to deploy an update, we prepare the next version in a different directory. We copy the updated files into www.B:

$ rsync -CrP 'remote:~/website/' 'www.B/'

Then we switch the document root: create a new symlink under a temporary name, and rename it over the old one with mv. Because the rename is atomic, www points at a complete version of the site at every instant:

$ ln -s www.B www.tmp
$ mv -T www.tmp www

Whereas something like just asking 'ln' to overwrite the existing symlink is not atomic, as the filesystem events (shown here as reported by a watcher like inotifywait) reveal:

$ ln -sfn www.B www
./ DELETE www
./ CREATE www

There is an unlikely but possible moment between that DELETE and the subsequent CREATE where a webserver might attempt to serve a file and find that its directory is missing!

[^1]: There are other means of avoiding this problem. Reconfiguring your webserver to serve from the new directory and then sending it a signal to begin using the updated configuration would also work. In this document we only consider the case where we are deploying filesystem updates and can't restart our webserver, as when deploying a static website on a cheap commodity web host.

Rollback and staging for free

One of the interesting things about this technique is that if you find shortly after deploy that your updates are broken, you can switch the symlink back to the previous version. With no extra effort you've gained the ability to do deployment rollbacks.
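Assuming www.A still holds the previous version, a rollback is just the same atomic switch pointed in the other direction:

$ ln -s www.A www.tmp
$ mv -T www.tmp www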

Another is that if you instruct some webserver to use the "next" directory www.B as its document root, you gain a staging or preview site where you can inspect your changes before they "go live."
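For example, give the webserver a second virtual host whose document root is a separate symlink, sketched here as www.stage (a name this document adopts later), and keep it pointed at the next version:

$ ln -sfn www.B www.stage

The non-atomic ln -sfn is fine here, since nothing depends on the staging site being continuously available.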

Preparing the stage

If we can assume that the new versions will be mostly the same as the previous versions, we can take advantage of a bandwidth-saving feature of rsync. Cloning the "live" site into the "stage" site before using rsync to transfer updated files will result in a bandwidth reduction, as rsync skips unmodified parts of files.
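A sketch of that clone-then-update step, continuing the earlier example and assuming www.B does not exist yet; --delete makes rsync also remove files that have disappeared upstream, so nothing stale survives from the clone:

$ cp -a www.A www.B
$ rsync -CrP --delete 'remote:~/website/' 'www.B/'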

If we do this just before transferring files, we encounter a mildly complex multiple-connection process:

Create staging area with clone on host

Push all files up from development boxes

Perform symlink switch

However, if we are willing to let the 'stage' directory persist on disk, we can reuse it, preparing it ahead of time after each deploy (a sketch follows the list):

Perform symlink switch

Prepare next staging area

Development boxes copy files into staging area at their leisure
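A sketch of that post-deploy preparation, continuing the two-directory example (and giving up the rollback copy, since the old live directory is recycled immediately):

$ ln -s www.B www.tmp
$ mv -T www.tmp www
$ rsync -a --delete www.B/ www.A/
$ ln -sfn www.A www.stage

Only the symlink switch itself needs to be coordinated; everything else can happen whenever convenient.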

An abstraction

Let's encapsulate this technique into a set of scripts so we don't have to remember, say, what the -T option to mv is and why it's needed.
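For example, the switch might be captured in a small helper script, sketched here under the hypothetical name swapln:

#!/bin/sh
# swapln DIR LINK: atomically repoint symlink LINK at DIR.
set -e
dir=$1
link=$2
tmp=$link.tmp.$$
ln -s "$dir" "$tmp"
# Without -T, mv would follow LINK into the directory it points at and
# move the temporary symlink *inside* it; -T replaces LINK itself.
mv -T "$tmp" "$link"

This is exactly why -T matters: www is a symlink to a directory, so a plain mv www.tmp www would move the new symlink into that directory instead of replacing www.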

Finally, we can also automate the initial setup of a set of directories and symlinks that this process expects to work with. Note that this initial setup step can't take advantage of the 'symlink replacement' trick.

The directory www.d contains three arbitrarily named directories to hold the previous, current, and next versions as needed. Next to it, the top-level directory contains three symlinks: www, www.prev, and www.stage.
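A sketch of that initialization, using the names just described (which numbered directory each symlink initially points at is arbitrary):

#!/bin/sh
# One-time setup of the www.d layout described above.
set -e
mkdir -p www.d/1 www.d/2 www.d/3
ln -s www.d/1 www
ln -s www.d/2 www.prev
ln -s www.d/3 www.stage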

With trivial changes to the management scripts these names could be modified to taste, or the three version directories need not live in their own containing directory. For example, changing '.d/1' to '.A', '.d/2' to '.B', and '.d/3' to '.C' in the initialization script is all that is needed to produce a filesystem layout like this instead (which directory each symlink starts out pointing at being arbitrary):

$ tree -AF --noreport
.
├── www -> www.A/
├── www.A/
├── www.B/
├── www.C/
├── www.prev -> www.C/
└── www.stage -> www.B/

A simplified abstraction

This version eschews the rollback feature. Since I keep my files in version control, as should you, I can let this deployment mechanism stay ignorant of how a "rollback" differs from just another deploy. Dropping the staging directory as well would actually make the script more complex, so we'll keep it.
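A sketch of this simplified deploy step, assuming just two version directories and the www and www.stage symlinks: whatever is staged goes live, and the old live directory becomes the next stage.

#!/bin/sh
# deploy: promote the staged version to live, then recycle the old
# live directory as the next staging area.
set -e
old=$(readlink www)        # e.g. www.A
new=$(readlink www.stage)  # e.g. www.B
ln -s "$new" www.tmp
mv -T www.tmp www          # the atomic switch
ln -sfn "$old" www.stage   # non-atomic is fine for the staging link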

A generalized abstraction

We have seen that this technique employs two directories, but with three directories we gain one level of rollback.

With four directories, could we have two levels of rollback? Can we write a script that works for N directories and achieves N-2 levels of rollback?

This is an interesting question from an abstraction and reduction perspective that I may explore at some point, but I don't really have a practical use for more than one rollback directory. (In fact, I prefer none for my own work.)

A unified tool

Rather than have multiple scripts on our $PATH, let's bake them together and add some sanity-checking, argument parsing, and error handling for a more robust tool.
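A sketch of such a tool, using the www.d layout from above; the command name and subcommands here (init, deploy, rollback) are illustrative rather than canonical:

#!/bin/sh
# wwwctl init|deploy|rollback -- manage a symlink-replacement deployment.
set -e

usage() {
    echo "usage: $0 init|deploy|rollback" >&2
    exit 2
}

# Atomically repoint symlink $2 at directory $1.
swap() {
    ln -s "$1" "$2.tmp.$$"
    mv -T "$2.tmp.$$" "$2"
}

case $1 in
init)
    mkdir -p www.d/1 www.d/2 www.d/3
    ln -s www.d/1 www
    ln -s www.d/2 www.prev
    ln -s www.d/3 www.stage
    ;;
deploy)
    [ -h www ] && [ -h www.prev ] && [ -h www.stage ] || {
        echo "$0: run '$0 init' first" >&2; exit 1; }
    live=$(readlink www)
    prev=$(readlink www.prev)
    swap "$(readlink www.stage)" www   # staged version goes live, atomically
    ln -sfn "$live" www.prev           # keep the old live version for rollback
    ln -sfn "$prev" www.stage          # recycle the oldest directory for staging
    ;;
rollback)
    swap "$(readlink www.prev)" www
    ;;
*)
    usage
    ;;
esac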

The great advantage of this technique is that it is easily understood from the filesystem alone: someone looking at a directory full of "web-20121009" and similar directories, with a symlink "web" -> "web-20121023", should be able to figure out what is going on.

It's simple, it works, and it can be reverse-engineered relatively easily. The only caveat is the reliance on GNU tools (POSIX specifies neither ln's -n option nor mv's -T). And the atomicity only covers the few milliseconds it takes to replace the symlink; if your website's users are sensitive to documents changing out from underneath them mid-session, you'll need a different solution, such as switching VMs on the fly.
