Creating Filesystem Backups with 'rsync'

At my ISP, we still use a tape backup system for long-term backups,
but we also have two identical disk drives in each server. A RAID-1
mirror would be the obvious way to get the data onto both drives to
protect against failures. But which is more common in your experience:
a hard drive failure or an accidentally deleted important file?

Instead of using RAID-1, I use a Perl script called "synchro" to
synchronize the drive pairs each night. In this article, I will present
the reasons I decided to do it this way, and share my script with you.

RAID technology

RAID can increase performance, but only under the right
conditions. For best results, more than two drives and SCSI
controllers are usually the way to go. In my case, we have EIDE
controllers. EIDE requires the CPU to do a lot of the work in data
transfer, so the CPU becomes a bottleneck. In my tests of Linux
software RAID-1 with EIDE drives, the performance hit was more than we
could live with, so software RAID-1 is not really an option for us.

RAID-0 (striping) can increase available space, but does not provide
increased reliability. With RAID-0 (and RAID-4 and -5, for that matter),
data is striped across multiple drives to combine several physical
partitions into one larger logical one. I use Linux software RAID-0 on
two 40-GB drives to create a large filesystem to hold our NNTP news
cache. In this application, reliability is not an issue because it's
only a cache, so even losing the entire drive pair would only make
reading news slower. Performance for the cache is not an issue either,
because the total number of people accessing the news server
simultaneously is never high. RAID-4 and -5 offer redundancy, but
require even more CPU time to implement in software.
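To make "striped" a little more concrete, here is a minimal sketch of how a striped array might map a logical block number onto its member drives. It assumes a simple round-robin layout with a fixed chunk size measured in blocks; the function name `stripe_location` and its parameters are hypothetical, and real implementations such as Linux software RAID have their own chunk-size settings and layout details.

```python
def stripe_location(logical_block: int, num_drives: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Hypothetical round-robin layout: consecutive chunks of `chunk_blocks`
    blocks go to drive 0, 1, ..., num_drives - 1, then wrap around.
    """
    if num_drives < 1 or chunk_blocks < 1:
        raise ValueError("num_drives and chunk_blocks must be positive")
    chunk, within = divmod(logical_block, chunk_blocks)  # which chunk, and where inside it
    drive = chunk % num_drives                           # chunks rotate across the drives
    stripe = chunk // num_drives                         # full rounds that precede this chunk
    return drive, stripe * chunk_blocks + within


if __name__ == "__main__":
    # Two drives, 4-block chunks: blocks 0-3 on drive 0, 4-7 on drive 1, 8-11 on drive 0 again.
    for block in (0, 3, 4, 7, 8, 11):
        print(block, stripe_location(block, num_drives=2, chunk_blocks=4))
```

Because neighboring chunks land on different drives, losing either drive takes pieces of nearly every large file with it, which is why striping by itself adds no reliability.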

RAID can increase reliability. With the redundant configurations
provided by RAID levels greater than 0, data is spread across multiple
drives so that a single drive failure does not result in data loss.
I've used hardware RAID controllers in the past. I'd love to use
something like a Vortex SCSI-RAID controller, but our ISP operates on a
small budget. I have found that being able to run down to a local
discount store for replacement parts is far more practical than keeping
emergency spares for exotic things like hardware RAID controllers on
hand.

The complexity of a RAID setup (hardware or software) also makes more
demands on the ISP staff; complicated systems can be very
nerve-wracking when the phones are ringing because there is no server
running to pick up the modem lines!

Project goals and requirements

My server has survived two drive failures. Both times, Linux started
emitting warning messages days before the drives failed, so we were
ready with tape backups and replacement parts. Drives can fail
suddenly, but they often give you plenty of warning, which reduces our
need for a RAID-1 mirror. Just keep an eye on those log messages!
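Keeping an eye on the logs can be partly automated. The sketch below filters log lines for messages that look like IDE drive errors. The regular expression is an assumption modeled on old Linux IDE driver messages of the form `hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }`; the exact wording varies between kernel versions and drivers, so adjust it to what your own logs show. The function name `drive_warnings` is hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Assumed pattern, modeled on old Linux IDE driver messages such as
#   hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
# It matches an IDE device name (hda, hdb, ...) followed on the same line
# by "error" or a "status=0x.." code. Tune it to your own kernel's messages.
DRIVE_ERROR = re.compile(r"\bhd[a-z]:.*\b(error|status=0x[0-9a-f]+)", re.IGNORECASE)


def drive_warnings(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like IDE drive error reports, newlines stripped."""
    return [line.rstrip("\n") for line in lines if DRIVE_ERROR.search(line)]


if __name__ == "__main__":
    # Usage: python drive_warnings.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for warning in drive_warnings(log):
                print(warning)
```

Run from cron against your syslog file, with any output mailed to an administrator, a filter like this turns "keep an eye on the logs" into something that happens even on busy days.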

By far, our most common problem has not been hardware failures. It has
been human error. Files are deleted or incorrectly modified by both our
own staff and our clients, and need to be restored quickly. In this
case, a RAID system will not help. A "delete" command will instantly and
efficiently remove the file from both drives in a mirror. You are
still left spinning the backup tapes, which can take hours.

I try to use revision control (RCS or CVS) for all system files. This
allows backing out changes as long as everyone is consistent about
checking in changes. Still, things sometimes slip past us, and revision
control usually does not help with clients' files.

So my goals are to keep a backup filesystem online at all times, from
which accidentally modified or deleted files can be restored, and to
have a complete drive available to deal with the less common hardware
failures.

A slow mirror

The solution that I came up with for my server was to replicate data
to the second drive once a day. It's like a RAID-1 mirror that takes a
day to copy the files.
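A nightly copy like this is the kind of job rsync handles well, since after the first run it skips files that have not changed. The Python sketch below is not the "synchro" script (which is written in Perl); it only shows the shape of a single mirror pass. The paths, the function names `mirror_command` and `mirror`, and the idea of mounting the second drive at `/backup` are all hypothetical. The flags are standard rsync options: `-a` (archive mode), `-H` (preserve hard links), `-x` (stay on one filesystem), `--delete` (remove files from the copy that no longer exist on the source) and `--dry-run` (report what would change without changing anything).

```python
import subprocess
import sys
from typing import List


def mirror_command(source: str, dest: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command line that makes `dest` an exact copy of `source`.

    -a        archive mode: recurse, preserve permissions, owners, times, symlinks
    -H        preserve hard links
    -x        stay on one filesystem (don't descend into other mounts)
    --delete  remove files from dest that no longer exist in source
    A trailing slash on the source copies its contents, not the directory itself.
    """
    cmd = ["rsync", "-aHx", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without touching dest
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def mirror(source: str, dest: str, dry_run: bool = False) -> int:
    """Run one mirror pass and return rsync's exit status (0 means success)."""
    return subprocess.run(mirror_command(source, dest, dry_run)).returncode


if __name__ == "__main__":
    # Hypothetical layout: second drive mounted at /backup, run nightly from cron,
    # e.g. "python mirror.py /home /backup/home".
    if len(sys.argv) == 3:
        sys.exit(mirror(sys.argv[1], sys.argv[2]))
    print("usage: mirror.py SOURCE DEST", file=sys.stderr)
```

Previewing a pass with `dry_run=True` before scheduling it is cheap insurance: with `--delete`, accidentally swapping source and destination would make the good copy match the empty or stale one.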

This approach is not perfect. With RAID-1, the files on the recovery
drive would always be up to date, but this system is only as good as a
daily tape backup. It also does not help when deleted or changed files
go unnoticed for more than a day: after the next nightly run, they
disappear from the secondary drive as well. Just be aware that this
little script is a supplement to a good tape backup scheme, not a
replacement for it.
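The one-day window is easy to see in a toy model. This sketch is purely illustrative: each drive is a Python dictionary mapping file names to contents (a hypothetical stand-in for a real filesystem), and the nightly sync leaves the backup in the same state a mirror pass with rsync's `--delete` would, an exact copy of the primary.

```python
from typing import Dict

Drive = Dict[str, str]  # toy model of a drive: file name -> contents


def nightly_sync(primary: Drive, backup: Drive) -> None:
    """Leave `backup` identical to `primary`, dropping files the primary no longer has."""
    backup.clear()
    backup.update(primary)


primary: Drive = {"index.html": "v1", "passwd": "v1"}
backup: Drive = {}
nightly_sync(primary, backup)                # night 1: backup matches primary

del primary["index.html"]                    # day 2: a file is deleted by mistake
recoverable_day2 = "index.html" in backup    # True: last night's copy is still there

nightly_sync(primary, backup)                # night 2: nobody noticed, so the sync runs
recoverable_day3 = "index.html" in backup    # False: the deletion has reached the backup
```

Recovering from mistakes that go unnoticed longer than that requires older copies, which is exactly what the tape backups are for.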