Creating Your First Website Failover Setup

By Erik Reagan

Once upon a time—well, thrice really—the Focus Lab website went down, along with some clients' sites. It wasn’t just a simple, brief outage. The shortest outage was just past 10 hours; the longest was around 42 hours. That’s a long time for a site to be down. We were able to restore a few of them by deploying the version-controlled files to another environment and updating the DNS records to point to the new environment. But in each of those cases we were technically deploying an older version of the database, which isn’t ideal.

This experience helped nudge me toward a more automated approach to setting up a failover server. This isn’t just about having a backup every X number of hours. This is about automating a response to downtime that lasts longer than is acceptable. So let's talk about the three main ingredients of a simple failover system.

To be clear, we don’t host any of our clients’ sites. We recommend hosting providers we trust and respect, and our clients often go along with our recommendation. So when a site goes down we often aren’t technically responsible or liable for much of anything. That said, if I can provide some form of service that provides a solution for when a server goes down for an extended period of time, I want to do just that. It's never a matter of if a site goes down. It's always a matter of when.

I looked into a few options for automating this process. As with any technical endeavor, there are many paths to the same result. As such, I’d like to provide you, Dear Reader, with the key ingredients to a failover solution. How you automate the ingredients will be up to you. Toward the end here I’ll give some suggestions that should get you started in the right direction.

Assumptions

Before diving in, here are some assumptions I am making in this article:

Your website or app is set up to be portable, so file paths that differ across environments won’t break anything.

You are somewhat comfortable with the command line. (I’d consider myself a novice and I get around this stuff just fine.)

You will try figuring this out and solving problems on your own—before posting comments here asking why something doesn’t work.

You don’t mind clicking a few affiliate links along the way.

Okay, let’s dive in.

The Ingredients

At the risk of being overly obvious, I have to list the first key ingredient as the Production and Failover Servers. After all, without the Production Server we have nothing to back up, and without the Failover Server we have nowhere to back up to. Your Production Environment is already decided, as you have a live site you want to set up a Failover for. When it comes to the Failover Server, we currently use Digital Ocean. Digital Ocean is an awesome tool to have for the quick provisioning of test or temporary environments. I’ve only used them in this context so far.

Second is ssh access to both environments. Most of this setup (for us) is done over ssh connections via the command line. If you aren’t comfortable with the command line, you might want to brush up on that before moving forward. This article won’t teach you anything about the command line.

The third ingredient is the commands or scripts we’ll be running. At their core they are very simple. The possible variations are many though. So understanding the commands will be helpful to you. When in doubt, learn more by Googling the command or reading the man pages.

The Commands

With most websites that we build and support, there are two main things to back up: the files and the databases.

Our first step will be to create a simple export of the necessary databases. In most of our work we only have between 1 and 3 databases to back up. We use the following command to do that:

mysqldump

-u[username]

The -u flag is where you put your MySQL username that has access to the databases you need. You would be replacing [username] with—well—the username.

-p[password]

This flag is exactly what you would expect, assuming you read what the -u flag is above.

--add-drop-table and --add-drop-database

These flags do exactly what they say. They add queries to drop all tables and databases they encounter. We do this because we want a carbon copy of the database, rather than lingering old data that may have been removed in Production.

As I write this I realize that using --add-drop-table is probably redundant considering we use --add-drop-database. I’ll let you decide though.

--databases dbname [db2name ...]

The final piece of the mysqldump command is to specify the databases we need. If you need one database you just put the single name. Any additional database names should be separated by a space.

| gzip > [/path/to/project/fileName].gz

The mysqldump command cranks out a text result of SQL commands, most commonly saved as a .sql file. We pipe that result through gzip and write the compressed file to the root of our project files. Name this file whatever you like. We’ll use it later. I recommend you save it one level above your web root directory so it is not accessible through the browser or to unwanted users.

That’s the extent of our work on the Production server. Run this command and you should see your new compressed database export sitting next to the rest of your site’s files.
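Put together, the pieces above form a single command. The following is a minimal sketch of that assembly rather than the exact script used at Focus Lab. The function name dump_databases and every value in the example call are placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch: the mysqldump pieces above assembled into one reusable function.
# The function name and all example values are placeholders.

dump_databases() {
  db_user="$1"   # MySQL user with access to the databases
  db_pass="$2"   # that user's password
  out_file="$3"  # where the compressed export is written
  shift 3        # everything left is a database name

  mysqldump -u"$db_user" -p"$db_pass" \
    --add-drop-table --add-drop-database \
    --databases "$@" | gzip > "$out_file"
}

# Example call, left commented out so loading this file changes nothing:
# dump_databases myuser 'mypassword' /path/to/project/backup.sql.gz dbname db2name
```

One caveat: MySQL's own documentation considers a password given on the command line insecure. If that matters in your environment, MySQL option files are worth reading about as an alternative.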

Our next step is to get the files backed up to the Failover Server. I mentioned earlier that we use Digital Ocean for this at Focus Lab. I won’t go into setting up a box at Digital Ocean, so get to googling if you need help there. My buddies at ClearFire wrote a little something about this if you want to start there.

Once you have your Failover Server ready and accessible, log in over ssh and change directories over to your project’s root. Our next command is going to knock on the digital door of our Production Server and copy everything over.

rsync

This program is for file syncing and transferring. It’s a beautiful thing. This is the heavy lifting of the process once it’s automated. Google around about rsync if you aren’t familiar with it.

-rltpvh

This is the collection of flags I chose for this syncing script. It’s almost identical to using -a (“archive”) but there are a few differences. In the interest of this being a learning experience, I’ll let you dig around to see what these flags actually do.

--ignore-existing

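This flag makes our process a bit more efficient by ignoring files that already exist on the Failover Server. It’s a simple concept that saves bandwidth and transfer time. Be aware that it also skips files that have changed on Production since the last sync, so a database export that reuses the same file name will not be refreshed on the Failover Server. Consider giving each export a new name or leaving this flag off if that affects you.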

[user]@[host]:[remotePath] [localPath]

This is pretty self-explanatory. You need to replace [user] with the username you’re using to access the Production Server and [host] with that server’s hostname or IP address. The [remotePath] is the absolute path to the directory you’re copying, while [localPath] is the destination path on the Failover Server.

Once you run this command you’ll be prompted for the password of the [user] for the [host]. Enter the password and your syncing begins. Watch and enjoy.
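Assembled, the sync might look like the sketch below. The function name pull_from_production and the user, host, and paths in the example call are placeholders. One detail worth knowing: a trailing slash on the remote path tells rsync to copy the directory's contents rather than the directory itself.

```shell
#!/bin/sh
# Sketch: the rsync pieces above as one function, run on the Failover Server.
# Function name and example values are placeholders.

pull_from_production() {
  remote="$1"      # [user]@[host]:[remotePath]
  local_path="$2"  # [localPath] on the Failover Server

  # -r recurse, -l keep symlinks, -t keep modification times,
  # -p keep permissions, -v verbose, -h human-readable sizes
  rsync -rltpvh --ignore-existing "$remote" "$local_path"
}

# Example call (commented out; replace the placeholders first):
# pull_from_production 'deploy@example.com:/var/www/project/' /var/www/project/
```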

When the transfer completes you’ll have a carbon copy of your Production Server files on your Failover Server. This includes the database backup we created previously. Our final step is to import that data. We need to unzip the compressed file and dump the data into MySQL.

zcat [textfile] | mysql -u[dbuser] -p[password]

Again, let’s look at both pieces separately.

zcat [textfile]

This unzips the compressed file and produces the SQL commands we need to run, so we pipe the output into MySQL.

| mysql -u[dbuser] -p[password]

Similar to our export, we’re just defining connection details to MySQL. These are probably different credentials from your Production Server.
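As one piece, the import could be wrapped up like this. It is only a sketch: the function name import_dump and the values in the example call are placeholders.

```shell
#!/bin/sh
# Sketch: decompress the export and feed it to MySQL on the Failover Server.
# Function name and example values are placeholders.

import_dump() {
  dump_file="$1"  # the .gz file that rsync copied over
  db_user="$2"    # MySQL user on the Failover Server
  db_pass="$3"    # that user's password

  # No database name is needed here: an export made with --databases
  # includes its own CREATE DATABASE and USE statements.
  zcat "$dump_file" | mysql -u"$db_user" -p"$db_pass"
}

# Example call (commented out; replace the placeholders first):
# import_dump /path/to/project/backup.sql.gz failover_user 'failover_password'
```

Because the export was created with --add-drop-database, running the import replaces whatever copy of those databases already exists on the Failover Server.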

Step by Step

Considering the main ingredients and the commands above, this is what the step-by-step process would look like:

MySQL dump/backup on the Production Server

Run rsync from the Failover Server to sync files from the Production Server

Unzip and import the Production database(s)

Automation

So far, everything I’ve shared is assuming you’re manually running these commands. That’s not horrible, but certainly not ideal. If you have a site go down for an extended period of time, you want an automated recovery process when possible.

Automating this process goes into more depth than I intended for this article. It varies by person, by team, and by environment. That said, I’d like to give some suggestions to help you move toward automating this process.

Uptime Monitoring

The first step in automating this process is to know when a site goes down. The simplest way to monitor this is to set up a service such as Pingdom to notify you when a site stops responding the way it should. Most of these service providers can also send a request to a URL endpoint you specify, which could trigger any part of the process you design.

Alternatively, you can use a DNS-based service that checks that your site returns specific content or status codes. After enough downtime, the service would automatically change the DNS to point to your Failover Server instead of your Production Server.
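To make the idea of "enough downtime" concrete, here is a rough do-it-yourself check. It is not how Pingdom or any DNS service works internally, only an illustration of probing a status code several times before acting. The function names, URL, retry count, and pause are all hypothetical.

```shell
#!/bin/sh
# Sketch: a do-it-yourself uptime check, standing in for a hosted monitor.
# The URL, threshold, and function names are hypothetical.

site_is_up() {
  # Succeeds only when the URL answers with HTTP 200 within 10 seconds.
  status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")"
  [ "$status" = "200" ]
}

down_long_enough() {
  # Probe the URL $2 times, $3 seconds apart. Succeed (meaning "trigger
  # the failover") only if every single probe failed.
  url="$1"; tries="$2"; pause="$3"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if site_is_up "$url"; then
      return 1
    fi
    i=$((i + 1))
    [ "$i" -lt "$tries" ] && sleep "$pause"
  done
  return 0
}

# Example: five failed probes, one minute apart, before acting.
# down_long_enough "https://example.com/" 5 60 && echo "start failover"
```

Requiring several failed probes in a row keeps a single dropped request from kicking off a failover. A check like this should run from somewhere other than the Production Server, since it goes down with the site otherwise.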

Production Server Cron Job

The database backup should probably be automated to run at an interval that adequately copies data for your particular project. Most of the sites we operate are sufficiently backed up nightly.

If you don’t know how to set up a cron job, ask the mostly-trusty Google. I’m sure you’ll find some articles and/or StackOverflow threads that will get you started. If you can’t figure it out, try reaching out to your hosting provider’s support team for some assistance.
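As a starting point, a nightly schedule could be set up along these lines. This is a sketch under assumptions: BACKUP_SCRIPT is a hypothetical path to a script that holds your mysqldump command, and 02:00 is an arbitrary choice of time.

```shell
#!/bin/sh
# Sketch: schedule a nightly database export with cron.
# BACKUP_SCRIPT is a hypothetical path to a script holding the mysqldump command.

BACKUP_SCRIPT="/path/to/backup-db.sh"

# Crontab fields: minute, hour, day of month, month, day of week, command.
# "0 2 * * *" means 02:00 every day.
CRON_LINE="0 2 * * * $BACKUP_SCRIPT"

install_nightly_backup() {
  # Re-install the current user's crontab with our line appended.
  { crontab -l 2>/dev/null; echo "$CRON_LINE"; } | crontab -
}

# install_nightly_backup   # commented out: run once, by hand, on Production
```

Running install_nightly_backup a second time would add the same line again, so check the result with crontab -l afterward.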

SSH Keys

You may recall that we had to manually type the password of the Production Server ssh user when using rsync. You can automate this step by setting up SSH keys so the two environments can comfortably talk to one another without saying the secret password each time.
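One possible way to do that is sketched below, run from the Failover Server. The key file name, the function names, and the user and host in the example calls are hypothetical. The key is created without a passphrase so an unattended job can use it, which also means anyone who can read that file can log in to Production as that user.

```shell
#!/bin/sh
# Sketch: a dedicated, passphrase-less key so rsync can run unattended.
# KEY_FILE, the function names, and the example values are hypothetical.

KEY_FILE="$HOME/.ssh/failover_sync"

create_sync_key() {
  # -N '' sets an empty passphrase, which is what lets cron use the key.
  ssh-keygen -t ed25519 -N '' -f "$KEY_FILE"
}

authorize_sync_key() {
  # Appends the public key to authorized_keys on Production.
  # This prompts for the password one last time.
  ssh-copy-id -i "$KEY_FILE.pub" "$1"
}

pull_with_key() {
  # The same rsync call as earlier, told to connect with the dedicated key.
  rsync -rltpvh --ignore-existing -e "ssh -i $KEY_FILE" "$1" "$2"
}

# Example (commented out; replace the placeholders first):
# create_sync_key
# authorize_sync_key deploy@example.com
# pull_with_key 'deploy@example.com:/var/www/project/' /var/www/project/
```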

Continuous Integration Servers

An alternative to running multiple servers would be to run your own Continuous Integration (CI) Server that runs checks and automates the creation of a Failover Server as needed. I’m not well versed in the CI options here, but we have used Jenkins CI at Focus Lab. Going the CI route would give you the ability to put your checks and subsequent actions in a single place.

For example, rather than relying on a nightly rsync process for all files, you could:

Use rsync to transfer the database backup somewhere nightly

Use something like Pingdom to monitor uptime

Automatically provision a new server at Digital Ocean when a site goes down for too long

Deploy a git repository to the new server

Dump the latest database into it

Manually switch the DNS once that’s all complete

This process is a different article in itself though. One I'm not well equipped to write.

Nothing is Failproof

Automation can be dangerous if you don’t implement the proper “checks” along the way. What if your database backup script never returns the result you expected? What if rsync is failing every time it’s run? If you automate this process you would be wise to go a bit further in the scripts and confirm the results are what you expected.

The last thing you want is a beautiful automated setup that you think works great, only to find out during a site outage that it was never running properly.
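As an example of such a check, the sketch below inspects a database export before anything relies on it. The function name verify_dump and the one-day freshness window are hypothetical choices, picked to match a nightly schedule.

```shell
#!/bin/sh
# Sketch: sanity-check a database export before trusting it.
# Function name and the one-day freshness window are hypothetical choices.

verify_dump() {
  dump_file="$1"

  # The file must exist and must not be empty.
  [ -s "$dump_file" ] || { echo "missing or empty: $dump_file" >&2; return 1; }

  # gzip -t tests the archive's integrity without extracting it.
  gzip -t "$dump_file" 2>/dev/null || { echo "corrupt: $dump_file" >&2; return 1; }

  # A nightly export should have been modified within the last day.
  # find prints the name only if the file is newer than 1440 minutes.
  [ -n "$(find "$dump_file" -mmin -1440)" ] || { echo "stale: $dump_file" >&2; return 1; }

  echo "ok: $dump_file"
}

# Example (commented out): only import when the export looks sane.
# verify_dump /path/to/project/backup.sql.gz && echo "safe to import"
```

The same habit applies to rsync, which exits with a non-zero status when a transfer fails. A script can test that status and send a notification instead of carrying on silently.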

Final Thoughts

I hope this helps you get started in setting up your own failover solution. There are so many possibilities out there. I know where we landed. But how about you? What's your plan?