Make off-site backups or you will lose your data

Backup Pains

Who needs attackers when you have system administrators? Learn why copying your data doesn't mean you've backed it up.

As I write this column, I cannot help but reflect on the irony of just having wiped out a month's worth of data. In the spirit of this article, I was fiddling around with backups on my web server, and I managed to accidentally delete most of /var/ and all of the /home/ directory. This wouldn't have been so bad if I hadn't kept the daily backups in /home/backups/. Oops.

Backing Up Doesn't Always Mean You Have Backups

If your data isn't available, or the systems to process and serve it aren't available, you have a problem. In the case of my web server, the missing /var/ and /home/ render it pretty much useless: It serves 404s and little else. To make sure data is available, you need to back it up. Seems simple, right? In reality, most of us (myself included) get it wrong, and although we go through the motions of making a backup, what we're really doing is just copying the data somewhere else that is equally vulnerable to loss.

In my case, I made the classic mistake of storing my backups on the same system as the data being backed up, and to make things worse, I kept them in a commonly accessed directory. Not that this would have mattered. Because the server only has one hard drive, I am only a single disk failure away from complete data loss no matter how often I back up my data locally on the server. Even if I were to install a second hard drive in the machine, it's still all too easy for a single event (bad drive controller, attacker wiping the system, fire, flood, power supply going bonkers, theft, etc.) to wipe out more than one hard drive.
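A quick sanity check can catch the most obvious form of this mistake before it bites. The following is a minimal Python sketch, not a complete safeguard: The function name same_device is my own invention, and the check only compares filesystem device IDs. Two partitions on one physical disk will pass it, so a False result does not prove the backup is safe.

```python
import os


def same_device(live_path, backup_path):
    """Return True if both paths sit on the same filesystem device.

    Illustrative helper only: a True result means the "backup" shares
    a filesystem with the live data. A False result is not proof of
    safety, because separate partitions can share one physical disk.
    """
    return os.stat(live_path).st_dev == os.stat(backup_path).st_dev


if __name__ == "__main__":
    # The paths from my own mishap; adjust for your system.
    live, backup = "/home", "/home/backups"
    if os.path.exists(live) and os.path.exists(backup):
        if same_device(live, backup):
            print("Warning: backups share a filesystem with the live data")
```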

What Are Real Backups?

Three main elements go into making real backups. Number one: You have to ensure that the data was actually backed up. I have seen far too many systems that write the data to a CD, DVD, or tape improperly, which results in unrecoverable data. Ideally, you need to test every backup you make by restoring from it, but if this isn't practical, you at least need to make occasional spot checks to ensure the data can be recovered.
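One way to spot-check a backup is to compare checksums of the live files against the copies. The sketch below is an assumption-laden example rather than a full restore test: The names sha256_of and verify_backup are hypothetical, it assumes the backup is a plain directory tree you can read (for tape or DVD you would restore to a scratch directory first), and it will flag any file that has legitimately changed since the backup ran.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(live_root, backup_root):
    """Return sorted relative paths that are missing or differ in the backup."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(live_root):
        for name in filenames:
            live_file = os.path.join(dirpath, name)
            rel = os.path.relpath(live_file, live_root)
            backup_file = os.path.join(backup_root, rel)
            if not os.path.isfile(backup_file):
                problems.append(rel)  # never made it into the backup
            elif sha256_of(live_file) != sha256_of(backup_file):
                problems.append(rel)  # contents differ
    return sorted(problems)
```

An empty list from verify_backup means every live file has a byte-identical copy in the backup tree; anything else deserves a closer look.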

Number two: You need to have off-site backups that are as close to read-only as you can get. This doesn't necessarily mean they have to be in a different physical location (although this is always a good idea), but they have to at least be separate enough that a single failure or event such as formatting a disk array or losing a server won't wipe out both the live data and the backups. A perfect example of this in action (besides, of course, my recent faux pas) is the website AVSIM Online, which lost 13 years of data to a single attack [1]. According to reports, AVSIM Online had two servers that copied their data off each other in an effort to back each other up. As I have said before, many of us are only copying data when we do backups and not making actual backups. In this case, an attacker broke into both servers, since they were basically identical, and deleted all the live data and the copies held on both servers. AVSIM Online lost their website, email, file library, forums, and more in one fell swoop and most likely will never get the data back. In my case, I was lucky. I only deleted a month's worth of logfiles and collected data, so all I have to do is wait a month for new data to be collected – good thing this wasn't someone's financial records.

Number three: You need to make sure you aren't deleting files out of the backup or zeroing the file contents unless you are absolutely certain you will never need that file again. For this reason, RAID isn't a backup solution. Even if you have multiple hard drives in a RAID configuration so that the loss of a single drive or even multiple drives will not cause data loss, you can still lose data through deletion (rm, mkfs, etc.) or through zeroing or altering of files (cat foo > bar), because RAID faithfully replicates the damage to every drive.
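To illustrate the third point, here is a minimal Python sketch of a backup that only ever adds data. The function name snapshot and the timestamped directory layout are my own assumptions, not a feature of any particular backup tool. Each run makes a full copy, which is wasteful on disk space; the point is simply that an rm or a zeroed file on the live system cannot reach back and damage an earlier snapshot.

```python
import os
import shutil
import time


def snapshot(live_root, backup_root):
    """Copy live_root into a new timestamped directory under backup_root.

    Older snapshots are never modified or removed. shutil.copytree
    raises FileExistsError if the target directory already exists, so
    an existing snapshot cannot be silently overwritten.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(backup_root, stamp)
    shutil.copytree(live_root, target)
    return target
```

If a file is later emptied or deleted on the live system, the copy inside the earlier snapshot directory still holds the old contents.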

Principles of Security

The three principles of security are: Availability, Integrity, and Confidentiality (also referred to as the AIC triad). In a nutshell, you have to keep the stuff you need to work working, you need to ensure that your data hasn't been changed by an attacker, and you need to keep your private stuff confidential.

Get Off the System

Fortunately, almost every mature backup program supports getting data from a client and storing it somewhere else, often on a dedicated server, disk array, tape, DVD, and so on. Some excellent options exist for Linux: Amanda [2], which ships with almost every distribution, as well as BackupPC [3] and Bacula [4], both covered in Linux Pro Magazine [5][6]. Although I won't cover the details here, suffice it to say they are very powerful, have lots of knobs to adjust, and will definitely back up your data if you set them up right. The quick and dirty option is rsync (yum install rsync, apt-get install rsync, etc.) [7].
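For the quick and dirty route, a small wrapper can keep the rsync invocation consistent from run to run. The following Python sketch makes several assumptions: The function names, the host backuphost.example.com, and the paths are all hypothetical, and the remote account must already be reachable over SSH. The command deliberately leaves out rsync's --delete option so that files removed on the live system stay in the copy. Note that a file that is altered or zeroed on the live system will still overwrite its copy on the next run, so this alone does not satisfy the third rule above.

```python
import subprocess


def rsync_command(source, destination):
    """Build an rsync command that copies source to destination.

    --archive recurses and preserves permissions, ownership,
    timestamps, and symlinks. --delete is intentionally omitted so
    deletions on the source are not mirrored to the destination.
    """
    return ["rsync", "--archive", source, destination]


def run_backup(source, destination):
    """Run rsync and raise CalledProcessError if it exits nonzero."""
    subprocess.run(rsync_command(source, destination), check=True)


if __name__ == "__main__":
    # Hypothetical source and destination; print instead of running.
    print(" ".join(rsync_command(
        "/var/www/", "backup@backuphost.example.com:/srv/backups/www/")))
```

Raising on a nonzero exit status matters here: A backup job that fails quietly is just another way of not having a backup.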