
Avoiding Server Disaster

Worried that your server will go down? You should be. Here are some
disaster-planning tips for server owners.

If you own a car or a house, you almost certainly have
insurance. Insurance seems like a huge waste of money. You pay it
every year and make sure that you get the best possible price for the
best possible coverage, and then you hope you never need to use the
insurance. Insurance seems like a really bad deal—until you have a
disaster and realize that had it not been for the insurance, you
might have been in financial ruin.

Unfortunately, disasters and mishaps are a fact of life in the
computer industry. And so, just as you pay insurance and hope never to
have to use it, you also need to take time to ensure the safety and
reliability of your systems—not because you want disasters to happen,
or even expect them to occur, but rather because you have to.

If your website is an online brochure for your company, and it goes
down for a few hours or even days, it'll be embarrassing and
annoying, but not financially painful. But if your website is your
business, when your site goes down, you're losing money. If
that's the case, it's crucial to ensure that your server and
software are not only unlikely to go down, but also easily recoverable if
and when that happens.

Why am I writing about this subject? Well, let's just say that this
particular problem hit close to home for me, just before I started to
write this article. After years of helping clients around the world to
ensure the reliability of their systems, I made the mistake of not
being as thorough with my own. ("The shoemaker's children go
barefoot", as the saying goes.) This means that just after launching
my new online product for Python developers, a seemingly trivial
upgrade turned into a disaster. The precautions I put in place,
it turns out, weren't quite enough—and as I write this, I'm still
putting my web server together. I'll survive, as will my server and
business, but this has been a painful and important lesson—one that
I'll do almost anything to avoid repeating in the future.

So in this article, I describe a number of techniques I've used to keep
servers safe and sound through the years, and to reduce the chances of a
complete meltdown. You can think of these techniques as insurance for
your server, so that even if something does go wrong, you'll be
able to recover fairly quickly.

I should note that most of the advice here assumes no redundancy in
your architecture—that is, a single web server and (at most) a
single database server. If you can afford to have a bunch of servers
of each type, these sorts of problems tend to be much less frequent.
However, that doesn't mean they go away entirely. Besides, although
people like to talk about heavy-duty web applications that require massive
iron in order to run, the fact is that many businesses run on small,
one- and two-computer setups. Moreover, those businesses don't need
more than that; the cost of additional servers can't be justified by
the ROI (return on investment) they would provide. However, the ROI
from a good backup and recovery plan is huge, and thus worth the
investment.

The Parts of a Web Application

Before I can talk about disaster preparation and recovery, it's
important to consider the different parts of a web application and
what those various parts mean for your planning.

For many years, my website was trivially small and simple. Although
it contained some simple programs, those generally were used for
sending email or for dynamically displaying different assets to
visitors.
The entire site consisted of some static HTML, images, JavaScript and
CSS. No database or other excitement was necessary.

At the other end of the spectrum, many people have full-blown web
applications, sitting on multiple servers, with one or more databases
and caches, as well as HTTP servers with extensively edited
configuration files.

But even when considering those two extremes, you can see that a web
application consists of only a few parts:

The application software itself.

Static assets for that application.

Configuration file(s) for the HTTP server(s).

Database configuration files.

Database schema and contents.

Assuming that you're using a high-level language, such as Python, Ruby
or JavaScript, everything in this list either is a file or can be
turned into one. (All databases make it possible to "dump" their
contents onto disk, into a format that then can be loaded back into
the database server.)

Consider a site containing only application software, static assets
and configuration files. (In other words, no database is involved.)
In many cases, such a site can be backed up reliably in Git. Indeed,
I prefer to keep my sites in Git, backed up on a commercial hosting
service, such as GitHub or Bitbucket, and then deployed using a system
like Capistrano.

In other words, you develop the site on your own development machine.
Whenever you are happy with a change that you've made, you commit the
change to Git (on your local machine) and then do a git push to your
central repository. In order to deploy your application, you then use
Capistrano to do a cap deploy, which reads the data from the central
repository, puts it into the appropriate place on the server's
filesystem, and you're good to go.
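
In concrete terms, a typical update might look like this (assuming
Capistrano is already configured for your server; the commit message
and branch name are just examples):

    git add .
    git commit -m "Update home page copy"
    git push origin master
    cap deploy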

This system keeps you safe in a few different ways. The code itself is
located in at least three locations: your development machine, the
server and the repository. And those central repositories tend to be
fairly reliable, if only because it's in the hosting company's
financial interest to keep them that way.

I should add that in such a case, you also should include the HTTP
server's configuration files in your Git repository. Those files
aren't likely to change very often, but I can tell you from experience
that when you're recovering from a crisis, the last thing you want
to think about is how your Apache configuration files should look.
Copying those files into your Git repository will work just fine.
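
For example, something along these lines would snapshot an Apache
configuration into the repository (the paths and filenames here are
placeholders; adjust them for your distribution and site):

    # Copy the live Apache configuration into the repository and commit it
    cp /etc/apache2/sites-available/example.conf config/
    git add config/example.conf
    git commit -m "Snapshot of Apache configuration"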

Backing Up Databases

You could argue that the difference between a "website" and a "web
application" is a database. Databases long have powered the back ends
of many web applications and for good reason—they allow you to
store and retrieve data reliably and flexibly. The power that modern
open-source databases provide was unthinkable just a decade or two
ago, and there's no reason to think that they'll be any less reliable
in the future.

And yet, just because your database is pretty reliable doesn't mean
that it won't have problems. This means you're going to want to
keep a snapshot ("dump") of the database's contents around, in case
the database server corrupts information and you need to roll back to
a previous version.

My favorite solution for such a problem is to dump the database on a
regular basis, preferably hourly. Here's a shell script I've
used, in one form or another, for creating such regular database
dumps:
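
    #!/bin/sh

    # The backup directory, database name and credentials below are
    # placeholders; substitute values appropriate for your system.

    BACKUP_ROOT=/home/backups/mysql

    YEAR=$(date +%Y)
    MONTH=$(date +%m)
    DAY=$(date +%d)
    HOUR=$(date +%H)

    # One directory per day of the month
    DIRECTORY=$BACKUP_ROOT/$YEAR/$MONTH/$DAY

    # Create the daily directory, including any missing parents
    mkdir -p $DIRECTORY

    # Dump the database, compress the output and place it in the
    # daily directory; the hour in the filename keeps hourly dumps
    # from overwriting one another
    mysqldump --user=dbuser --password=dbpass mydb | \
        gzip > $DIRECTORY/mydb-$HOUR.sql.gz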

The above shell script starts off by defining a bunch of variables,
from the directory in which I want to store the backups, to the parts
of the date (stored in $YEAR, $MONTH and $DAY). This is so I
can have a separate directory for each day of the month. I could, of
course, go further and have separate directories for each hour, but
I've found that I rarely need more than one backup from a day.

Once I have defined those variables, I then use the mkdir command to
create a new directory. The -p option tells mkdir that, if necessary,
it should create all of the directories it needs, such that the entire
path will exist.

Finally, I then run the database's "dump" command. In this
particular case, I'm using MySQL, so I'm using the mysqldump command.
The output from this command is a stream of SQL that can be used to
re-create the database. I thus take the output from mysqldump and
pipe it into gzip, which compresses it. The resulting dumpfile is
then placed, in compressed form, inside the daily backup directory.
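
To run the script every hour, a crontab entry along these lines would
suffice (the path to the script is, again, a placeholder):

    # Run the database backup script at the top of every hour
    0 * * * * /usr/local/bin/backup-mysql.sh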

Depending on the size of your database and the amount of disk space
you have on hand, you'll have to decide just how often you want to run
dumps and how often you want to clean out old ones. I know from
experience that dumping every hour can cause some load problems. On
one virtual machine I've used, the overall administration team was
unhappy that I was dumping and compressing every hour, which they saw
as an unnecessary use of system resources.

If you're worried your system will run out of disk space,
you might well want to run a space-checking program that'll alert you
when the filesystem is low on free space. In addition, you can run a
cron job that uses find to erase all dumpfiles from before a certain
cutoff date. I'm always a bit nervous about programs that
automatically erase backups, so I generally prefer not to do this.
Rather, I run a program that warns me if the disk usage is going above
85% (which is usually low enough to ensure that I can fix the problem
in time, even if I'm on a long flight). Then I can go in and remove
the problematic files by hand.
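
If you do want to automate things, a single find invocation in a cron
job can handle the cleanup, and a few lines of shell can provide the
disk-space warning. (The paths and the 30-day cutoff are arbitrary
examples; cron will email any output from the warning script to the
server's administrator.)

    # crontab entry: delete compressed dumps more than 30 days old,
    # once a day at 3am
    0 3 * * * find /home/backups/mysql -name '*.sql.gz' -mtime +30 -delete

    #!/bin/sh
    # Warn when the filesystem holding the backups is more than 85% full
    USAGE=$(df -P /home/backups | awk 'NR==2 {sub(/%/, "", $5); print $5}')
    if [ "$USAGE" -gt 85 ]; then
        echo "Warning: backup disk is ${USAGE}% full"
    fi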

When you back up your database, you should be sure to back up the
configuration for that database as well. The database schema and
data, which are part of the dumpfile, are certainly important.
However, if you find yourself having to re-create your server from
scratch, you'll want to know precisely how you configured the database
server, with a particular emphasis on the filesystem configuration and
memory allocations. I tend to use PostgreSQL for most of my work, and
although postgresql.conf is simple to understand and configure, I still
like to keep it around with my dumpfiles.

Another crucial thing to do is to check your database
dumps occasionally to be sure that they are working the way you want. It turns
out that the backups I thought I was making weren't actually
happening, in no small part because I had modified the shell script
and hadn't double-checked that it was creating useful backups.
Occasionally pulling out one of your dumpfiles and restoring it to a
separate (and offline!) database to check its integrity is a good
practice, both to ensure that the dump is working and that you
remember how to restore it in the case of an emergency.
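
With the MySQL dumps created above, a periodic test restore might look
like this (the database name, credentials and filename are
placeholders; run this against a scratch server, not production):

    # Create a scratch database and load a dumpfile into it
    mysqladmin --user=dbuser --password=dbpass create mydb_restore_test
    gunzip -c mydb-12.sql.gz | \
        mysql --user=dbuser --password=dbpass mydb_restore_test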

Storing Backups

But wait. It might be great to have these backups, but what if the
server goes down entirely? In the case of the code, I described
keeping it on more than one machine, which safeguards its integrity.
Your database dumps, by contrast, live on the server itself, so if
the server fails, your database dumps will be inaccessible.

This means you'll want to have your database dumps stored
elsewhere, preferably automatically. How can you do that?

There are a few relatively easy and inexpensive solutions to this
problem. If you have two servers—ideally in separate physical
locations—you can use rsync to copy the files from one to the
other. Don't rsync the database's actual files, since those might get
corrupted in transfer and aren't designed to be copied when the
server is running. By contrast, the dumpfiles that you have created
are more than able to go elsewhere. Setting up a remote server, with
a user specifically for handling these backup transfers, shouldn't be
too hard and will go a long way toward ensuring the safety of your
data.

I should note that using rsync in this way basically requires that you
set up passwordless SSH, so that you can transfer without having to be
physically present to enter the password.
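
Assuming such a user and key are in place, a nightly cron job like
the following would do the trick (the user, hostname and paths are
placeholders):

    # Copy all dumpfiles to a second server, preserving timestamps
    0 4 * * * rsync -az /home/backups/mysql/ backup@backup.example.com:mysql-dumps/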

Another possible solution is Amazon's Simple Storage Service (S3),
which offers astonishing amounts of disk space at very low prices. I
know that many companies use S3 as a simple (albeit slow) backup
system. You can set up a cron job to run a program that copies the
contents of a particular database dumpfile directory into a particular
S3 bucket. The assumption here is that you're rarely, if ever, going
to need these backups, meaning that S3's slow searching and access
won't be an issue except in a real emergency.
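
If you have the AWS command-line tools installed and credentials
configured, a cron job like the following would keep a bucket in sync
with your dump directory (the bucket name is a placeholder):

    # crontab entry: sync the local dump directory to an S3 bucket
    # every hour, on the half hour
    30 * * * * aws s3 sync /home/backups/mysql s3://example-db-backups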

Similarly, you might consider using Dropbox. Dropbox is best known
for its desktop client, but it has a "headless", text-based client
that can be used on Linux servers without a GUI connected. One nice
advantage of Dropbox is that you can share a folder with any number of
people, which means you can have Dropbox automatically distribute your
database backups everywhere, including to a number of people on your
team. The backups arrive in their Dropbox folders, and you can be
sure the data is safely stored off your server.

Finally, if you're running a WordPress site, you might want to
consider VaultPress, a for-pay backup system. I must admit that in
the weeks before I took my server down with a database backup error, I
kept seeing ads in WordPress for VaultPress. "Who would buy
that?", I
asked myself, thinking that I'm smart enough to do backups myself. Of
course, after disaster occurred and my database was ruined, I realized
that $30/year to back up all of my data is cheap, and I should have
done it before.

Conclusion

When it comes to your servers, think less like an optimistic
programmer and more like an insurance agent. Perhaps disaster won't
strike, but if it does, will you be able to recover? It's crucial to
make sure that even if your server is completely unavailable, you'll
be able to bring up your program and any associated database.

My preferred solution involves combining a Git repository for code and
configuration files, distributed across several machines and services.
For the databases, however, it's not enough to dump your database;
you'll need to get that dump onto a separate machine, and preferably
test the backup file on a regular basis. That way, even if things go
wrong, you'll be able to get back up in no time.

Reuven Lerner teaches Python, data science and Git to companies around
the world. His free, weekly "better developers" email list reaches
thousands of developers each week; subscribe here. Reuven lives with
his wife and children in Modi'in, Israel.