Mar 5, 2011

I recently made a very dumb mistake and wiped out /home/greg/ on my personal desktop. It wasn't a hardware failure; it was user error. I had some manual backups, so it wasn't catastrophic, but I had never really set up a good system and there were some annoying losses. After restoring my sanity, I decided it was time to set up better backups. I run a NAS with ~3 TB of usable disk space, and backups were part of the reason for running it. The only thing missing was an actual backup system.

Obviously lots of people have solved this problem before, but my situation was unusual enough that many options were off the table. I had a few requirements:

My home directory (the primary copy) is encrypted with Ubuntu's eCryptfs setup, and the documents I most want to back up are financial in nature, so I wanted my backups always encrypted on disk.

I wanted snapshotted backups so that I was resilient not only against hardware failures, but also against rm -rf stupidity.

However, I did not want to simply store a series of diffs, as that would make recovery more complex, and I wanted recovery to be simple.

Still, I wanted efficiency and speed so that backups wouldn't choke my internal network at various points in the day.

Most importantly though, I wanted to understand exactly how my backup system works and what it's doing. Rather than trust some other code that I didn't understand and couldn't tweak, I wanted to roll my own. Most folks do this with shell scripts, cron, and rsync. I wanted to do something similar, but since my shell-fu is abysmal, I decided on Python.

In case this is useful to anyone else, I've shared my code. The script has two modes, selected by a command-line argument: backup and snapshot.

Backup:

Optionally tries to mount a path which should be set up in /etc/fstab. In my case, this is NFS.

Mounts an encrypted filesystem at /mnt/.../current/

Rsyncs a series of files and paths to /mnt/.../current/

Optionally unmounts the encrypted filesystem.
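
The four backup steps can be sketched roughly as follows. This is an illustration, not the actual script: the function names build_backup_commands and run_backup are made up for this example, and it assumes the stock mount, encfs (with its --stdinpass option), rsync, and fusermount commands are installed. How the encrypted directory relates to the current/ directory is not spelled out here, so the caller passes both locations explicitly.

```python
import subprocess


def build_backup_commands(encrypted_dir, decrypted_mount, rsync_flags, paths,
                          pre_mount=None, unmount_after=True):
    """Return the argv lists for one backup run, in execution order."""
    commands = []
    if pre_mount:
        # Relies on an /etc/fstab entry for this mount point (e.g. NFS).
        commands.append(["mount", pre_mount])
    # encfs <encrypted root> <plaintext mount point>; --stdinpass makes
    # encfs read the password from stdin instead of prompting.
    commands.append(["encfs", "--stdinpass", encrypted_dir, decrypted_mount])
    # rsync writes plaintext into the mount; encfs encrypts it on the way down.
    commands.append(["rsync", *rsync_flags.split(), *paths, decrypted_mount])
    if unmount_after:
        commands.append(["fusermount", "-u", decrypted_mount])
    return commands


def run_backup(commands, encfs_password):
    """Run each command in order; the first failure raises and stops the run."""
    for argv in commands:
        stdin = encfs_password + "\n" if argv[0] == "encfs" else None
        subprocess.run(argv, input=stdin, text=True, check=True)
```

Separating "decide what to run" from "run it" makes the plan easy to print or test without touching any real mounts. Note that in this sketch a failed rsync leaves the encrypted filesystem mounted, since nothing after the failing command runs.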

Snapshot:

Makes up to N periodic snapshots of the encrypted files at one of several frequencies. For example, it might be configured to keep 24 hourly snapshots, 7 daily snapshots, 4 weekly snapshots, and 3 monthly snapshots. Any number of snapshots can be kept at any frequency.

The snapshots are taken of the encrypted files, not of the decrypted filesystem. As a result, you can run the snapshots directly on the remote backup system; I run them on my NAS. Running them locally works just fine as well.
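
How the snapshots are stored on disk isn't described above, beyond the fact that they are not diffs. One common way to get full, directly browsable snapshots cheaply is hard-link rotation, and the sketch below shows that technique as an assumption, not as what the script actually does. The function name take_snapshot, the <kind>.<n> directory naming, and the "current" source directory name are all made up for this example.

```python
import os
import shutil


def take_snapshot(base_dir, kind, keep, source="current"):
    """Rotate <kind>.0 .. <kind>.(keep-1), then snapshot `source` as <kind>.0."""
    if keep < 1:
        return

    def path(i):
        return os.path.join(base_dir, f"{kind}.{i}")

    # Drop the oldest snapshot if the limit has been reached.
    oldest = path(keep - 1)
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    # Shift the remaining snapshots up by one, oldest first.
    for i in range(keep - 2, -1, -1):
        if os.path.isdir(path(i)):
            os.rename(path(i), path(i + 1))
    # Hard-link every file instead of copying its data, so unchanged files
    # take no extra space while each snapshot still looks like a full copy.
    shutil.copytree(os.path.join(base_dir, source), path(0),
                    symlinks=True, copy_function=os.link)
```

Calling take_snapshot(base, "hourly", 12) once an hour would keep at most twelve hourly directories. The hard links only preserve history as long as files in the source directory are replaced rather than modified in place, which is how rsync behaves by default (it writes a temporary file and renames it).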

Both modes are managed using a .backuprc file in the user's home directory. For example, mine looks something like this:

# Optional, log all events
LOG_FILE /home/greg/logs/backup.log
# Optional, we try to mount this path first. Failures halt execution.
PRE_MOUNT /mnt/backup/
# Required, password and mount point for encrypted/decrypted file
# systems. The password can be in plaintext since this file is stored
# on an encrypted filesystem anyway. We aren't going for paranoid.
ENCFS_PASSWORD AddYourOwnPasswordHere
ENCRYPTED_MOUNTPOINT /mnt/backup/desktop/
# This is where we will write files unencrypted. Must be empty, must
# not be mounted already.
DECRYPTED_MOUNTPOINT /mnt/encryptedbackup/
# Required, rsync flags.
RSYNC_FLAGS -CRa --delete
# Number of snapshots. Format: [type=count,...] e.g. hourly=12,daily=7
SNAPSHOTS hourly=12,daily=7,weekly=2,monthly=1
# List of file paths to rsync. Any line that doesn't contain a space is
# a file path. Paths can be filenames or directories. This is simply
# the argument passed to rsync. As a result, you can use rsync features
# like adding a "./" directory to tell rsync which components of the
# path to sync over.
/home/greg/./.heartbeat
/home/greg/./src/
/home/greg/./financial/
/home/greg/./picasa/
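
The file format is simple enough that a parser fits in a few lines. The sketch below follows the rules stated in the comments above (comment lines start with #, a line with no space is a path, anything else is a KEY VALUE pair), but it is only an illustration: the names parse_backuprc and parse_snapshots are made up, and it treats any whitespace, not just a space, as the separator.

```python
def parse_backuprc(text):
    """Split .backuprc content into (settings dict, list of rsync paths)."""
    settings = {}
    paths = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 1:
            # No whitespace on the line: it is a path to hand to rsync.
            paths.append(line)
        else:
            key, value = parts
            settings[key] = value.strip()
    return settings, paths


def parse_snapshots(value):
    """Turn 'hourly=12,daily=7' into {'hourly': 12, 'daily': 7}."""
    counts = {}
    for item in value.split(","):
        kind, _, count = item.partition("=")
        counts[kind.strip()] = int(count)
    return counts
```

With the example file above, parse_backuprc would return the four /home/greg/./... entries as paths, and parse_snapshots(settings["SNAPSHOTS"]) would give the retention count for each snapshot type.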

The Python source, an example .backuprc, and an example crontab can all be found on GitHub.