Friday, May 23, 2008

Incremental backups to Amazon S3

Based on this great blog post by Tim McCormack, I managed to write some scripts that back up files to Amazon S3. The files are encrypted with GnuPG and rsync-ed to S3 using a Python-based tool called duplicity.

Here's what I did in order to get all this going on a CentOS 5.1 server running Python 2.5.

1) Signed up for Amazon S3 and got the AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY.

2) Downloaded and installed the following packages: boto, GnuPGInterface, librsync, duplicity. All of them except librsync are Python-based, so they can be installed via 'python setup.py install'. For librsync you need to use './configure; make; make install'.

3) Generated a GPG key pair using "gpg --gen-key". Made a note of the hex fingerprint of the key (you can list the fingerprints of your keys via "gpg --fingerprint").

4) Wrote a simple boto-based Python script to create and list S3 buckets (the equivalent of directories in S3 parlance). Note that boto uses SSL, so your Python installation needs to have SSL enabled.

5) Wrote a bash script (heavily influenced by Tim McCormack's post) that runs duplicity and backs up the root partition of my Linux server (minus some directories) to S3. The nice thing about duplicity is that it uses rsync, so it only transfers the diffs over the wire. Here's how my script looks like:

NOTE: duplicity will interactively prompt you for your GPG key's passphrase, unless you have a variable called PASSPHRASE that contains the passphrase. Since I wanted to run this script as a cron job, I chose the less secure way of specifying the passphrase in clear inside the script. YMMV.