The RedMonk IT Report: S3/ZRM for Backup

As reported previously, after considering several options, I’ve settled on Amazon’s S3 service as the offsite backup platform for RedMonk’s production server, hicks. The new setup replaces our current duct tape and baling wire solution, which basically rsyncs directories down to my local server apone every few days. Instead, we’re now performing nightly backups that should allow us to recreate our production web server implementation in the event of a machine failure, corruption, etc. Here’s what we’re doing, and trust me – if I can do this, anyone can.

What to Backup

There are two primary kinds of assets that require backup: our five primary MySQL databases, and our Apache web root. As our production web server, hicks supports multiple instances of WordPress, an instance of MediaWiki, and a soon-to-be-decommissioned instance of Movable Type.

Backing up the web root is a fairly trivial task for something like rsync, but MySQL is currently being backed up via a manually intensive mysqldump/rsync routine, with little in the way of scheduling, incremental backups and so on. And manual routines, as we all know, are a definite no-no when it comes to backup. So something had to change.
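For context, the routine being replaced amounted to something like the sketch below. The database names and backup path are illustrative rather than our actual setup, and MySQL credentials are assumed to live in ~/.my.cnf rather than on the command line:

```shell
#!/bin/sh
# Nightly dump sketch: one compressed mysqldump per database,
# pruning anything older than a week. Names and paths are illustrative.
BACKUP_DIR=${BACKUP_DIR:-backups/mysql}
STAMP=$(date +%F)
mkdir -p "$BACKUP_DIR/$STAMP"
for DB in wordpress mediawiki movabletype; do
    mysqldump "$DB" | gzip > "$BACKUP_DIR/$STAMP/$DB.sql.gz"
done
# keep a week of dumps, then prune the older directories
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
```

The pain was never the script itself – it was everything around it: scheduling it, verifying it ran, and cleaning up after it.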

Where to Backup

Although our current backup strategy has essentially consisted of backups to my local machines, I’ve never felt terribly comfortable with that approach. There are multiple potential issues: for one, my hardware is not reliable. The disks are commodity drives in a non-RAID configuration, meaning that they are neither reliable nor easily recoverable in the event of a failure. For another, my infrastructure is not exactly up to datacenter standards: one serious power spike could destroy weeks’ or months’ worth of backups. Just as I’m only too happy to make hosting someone else’s problem (the excellent John Companies, in this case), so too would I prefer to outsource my backup infrastructure. I’m not in the business of hosting, and would prefer not to be in the business of backup.

Fortunately, as previously documented, Amazon offers just such a service with its S3 offerings. Alternatives considered were Joyent’s Strongspace and rsync.net, but while both of those solutions will continue to play a role in my client and/or personal backup strategies, S3 promises to be the most economical for this particular use case.

In short, the plan is to regularly sync the backups to my S3 account.

How to Backup – The Overview

As mentioned previously, the web root is simple – it’s just a file system backup. I could compress the backups, I suppose, and might do so in the future, but for now I’m content with rsyncing (or the equivalent) the file system assets to a backup location.

For MySQL, however, a more sophisticated solution was called for: enter the Zmanda Recovery Manager. For those of you not familiar with Zmanda, they’re a company founded around the Amanda open source archiving application. Given my more limited needs, I’m employing Zmanda’s other offering, a MySQL-focused backup tool known as the Zmanda Recovery Manager (ZRM). Essentially I’m using the tool for four things: a.) dumping our databases according to a set schedule (nightly), b.) maintaining backup instances for a set period of time (1 week), then destroying them, c.) compressing the backups (some of our databases are of moderate size), and d.) restoring the databases from backup upon request. We’re still having problems with that last step (more on that in a bit), but otherwise everything was very straightforward.
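For reference, the schedule, retention and compression behavior all live in a per-backup-set configuration file (under /etc/mysql-zrm/ on my install). The fragment below is a rough sketch from memory – the database names are illustrative, and you should verify the exact parameter names against the ZRM documentation for your version:

```
# /etc/mysql-zrm/dailyrun/mysql-zrm.conf (sketch only)
backup-level=0                              # full logical backup each run
databases=wordpress mediawiki movabletype   # illustrative names
retention-policy=1W                         # keep backups for one week
compress=1                                  # gzip the dumps
user=backupuser
password=changeme
```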

How to Backup – The Details

Fortunately for me, the available documentation for both backing up to S3 and ZRM is substantial and simple enough for relative novices to follow.

To begin, I decided to tackle ZRM first. Following these instructions, I was able to configure and run a backup job inside of five minutes. One very important note before you begin: the instructions linked to will have you drop your database as part of the restore procedure. I HIGHLY recommend that you do your own MySQL dump (mysqldump -u username -p databasename > dumpfilename.sql) before you begin, just in case the ZRM restore fails and you’re forced to import your own copy (mysql -h hostname -u username -p databasename < dumpfilename.sql) as an alternative.

The only change I had to make was installing the recently released Debian packages rather than the Fedora/RHEL RPMs specified in the instructions, which is straightforward for those of you on Ubuntu.
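Installing the .deb amounts to something like the following – the filename here is a placeholder for whatever release you actually download from Zmanda:

```shell
sudo dpkg -i mysql-zrm_<version>_all.deb
# if dpkg complains about missing dependencies, let apt resolve them
sudo apt-get -f install
```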

Follow that, and you should be all set. The only other catch is that while the instructions call for you to log in as root, Ubuntu would have you sudo each command instead. Not a huge issue, but something to remember. Currently, I have ZRM scheduled to take a snapshot of our DBs on a nightly basis. The restore test failed, which I’ll go into in just a moment, but I expect to have that resolved shortly.

Next, I needed to establish an automated backup of both the web root and our backed-up MySQL databases to our predetermined offsite provider, Amazon’s S3. To do so, I followed these simple instructions. The author, John Eberly, walks you through the installation of a Ruby-based rsync clone, s3sync, the creation of a simple bash script that invokes s3sync, and the scheduling of that job. While the notes are excellent and quite complete, a couple of issues/clarifications:

In the script line that begins “export SSL_CERT_DIR”, be sure the path ends in ‘certs’ rather than ‘certsruby’ – the trailing ‘ruby’ actually belongs at the start of the next line. I wasn’t paying attention and modified only the front of the path.

If you get a strange error that says something like “cannot find s3sync.rb” or some such, that’s because the ‘ruby’ from the line above belongs just before the ‘s3sync.rb’ in the script provided. I thought this was a path issue at first, but it’s a simple typo.

Eberly mentions that you need to modify the script so that it’s only readable by you (because it contains your S3 key information), but doesn’t provide instructions on how to do that. I simply did a chmod 700 scriptname.sh which I believe will restrict the file appropriately, but my knowledge of Unix-style permissions is abysmal, so feel free to correct me if I’m wrong.
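For what it’s worth, chmod 700 does what we want here: the owner keeps read, write and execute, while group and other lose everything. A quick way to check (filename illustrative):

```shell
#!/bin/sh
# create a stand-in for the upload script, lock it down, and inspect it
touch upload.sh
chmod 700 upload.sh
ls -l upload.sh    # first column reads -rwx------
```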

When you go to schedule the job in question using crontab -e, if you get an error that says, “cannot write crontab” or some such, the problem may be nano’s justification (presuming you use nano as your default editor, as I do – I know, I should use vi, but I hate it). The fix is to cancel that, do another crontab -e, and once in nano type CTL-SHIFT-J; this will turn off justification, and you should be able to write the crontab entry. For example, mine looks like this:

30 1 * * * /path/to/my/uploadscript.sh

That executes the backup I’ve specified in earlier steps at 1:30 AM every morning, and worked last night.

Apart from those little nits, the provided instructions are terrific and easy to follow – even for me.
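Putting the fixes above together, the upload script ends up looking roughly like this – bucket name, paths and keys are all placeholders, and the s3sync options come from Eberly’s post rather than anything canonical:

```shell
#!/bin/bash
# placeholder keys and paths -- substitute your own
export AWS_ACCESS_KEY_ID=yourkeyid
export AWS_SECRET_ACCESS_KEY=yoursecretkey
export SSL_CERT_DIR=/home/user/s3sync/certs   # ends in 'certs', no stray 'ruby'
cd /home/user/s3sync
# note that 'ruby' belongs on the same line as s3sync.rb
ruby s3sync.rb -r --ssl /var/www/ mybucket:webroot/
```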

Remaining To Do’s

While ZRM is dutifully backing up our content, it is, as mentioned, not restoring the databases properly. As a test, I dropped just my WordPress database and attempted a restore according to the instructions provided; unfortunately, the restore failed (this is why, if you happened by my site late last night, you might have seen some weird database errors). Fortunately I’d done a backup myself beforehand, and simply reimported the dump I’d made before dropping the DB, but a backup procedure without a working restoration sequence is not terribly useful. I’ve posted a query to the Zmanda forums here, but I’m increasingly of the belief that the failure occurred because ZRM was attempting a restore of all of the databases backed up, while I’d dropped only mine. As soon as I can verify that locally, I expect this problem to be resolved. Worst case, I can always purchase support from Zmanda for $200, a very reasonable price.

Summary

So what have we done here? Two things. First, we’ve automated the MySQL backup, retention and deletion process, and second, we’ve automated the backup of our databases and file system content to an offsite provider who’s responsible for handling mirroring, redundancy and so on. All using a combination of mostly free/open source software and community-supplied documentation. No additional hardware was purchased, and the ongoing service costs are well within our budget (more on that when my bills start coming in). The total effort required was maybe two hours, even for someone of my limited skillset, and then only because some of the gotchas mentioned above took longer than they should have to resolve.

The conclusion here is one that has been made countless times before: trends such as open source, community documentation, and Hardware-as-a-Service are lowering solution costs dramatically. As the person responsible for the operations side of a small business, I couldn’t be any happier about that.

11 comments

Take a look at rdiff-backup when you’re evaluating general backup tools. It uses the rsync protocol (and can go over SSH) so that only changes are transferred, but it also includes some really interesting concepts around incremental backup. You end up with a remote mirror just like you’d get with rsync, but it also stores reverse increments so that you can restore from a previous date.

I’ve been using this for almost three years now for all kinds of stuff and it’s been great. I have some tooling around it in the form of a Ruby DSL for configuration and managing multiple remote backups and whatnot. Let me know if you think you could use it and maybe I’ll slap an MIT license on it and make it available.

[…] recently setup an automated backup system for my (and my wife’s) blog.1 Based on the recommendation of Mr O’Grady (and my belief that RESTful architectures are a good way to solve most problems) I decided to use […]