Ganneff's Little Blog

Thoughts of a small and very unimportant Debian Developer

A nice feature from PostgreSQL are "write ahead log" (WAL) files. Quoting from the PostgreSQL website they describe every change made to the database's data files. [...] if the system crashes, the database can be restored to consistency by "replaying" the log entries made since the last checkpoint.

This basically enables online backup of your database cluster. And as you end up having the complete log of all actions within your database(s) you are also able to go back in time to any point you like.

I won't write a complete howto about it, as the PostgreSQL docs are pretty good, just some notes how I did my setup, as a kind of reminder for me when I have to do it again.

There must be a place to store WAL files which ends up not being on the same machine, otherwise it doesn't make much sense. For me that means scp to a different host, but an NFS mount or some storage array or drbd device might also be an option. Much of the following is for the scp method, but easily adaptable to other storage methods.

I need a target user on the host I copy files to, and as I did the setup for the pentabarf database running on host skinner, I helpfully named it pentabarf-backup. The copy process can't be interactive so the postgres user on the database host needs to have a passwordless SSH key to login to the pentabarf-backup user on the backup host. To limit my bad feeling about "passwordless key to login" I restrict the key by adding

The above tells PostGreSQL that i want it to archive the files using ssh to my backup host and transfer files with rsync (via ssh), but only if they do not exist yet. It also tells it to archive the files at least once every 12 hours. The effect is that WAL files get copied over when they are "full" or after 12 hours passed, whatever happens first. Don't set it too low, as the file will always have the full, fixed size and you waste space transferring mostly-empty files. In case you are concerned with the possibly slow speed of a network link, the PostgreSQL site has the following to say about this: The speed of the archiving command is not important, so long as it can keep up with the average rate at which your server generates WAL data. Normal operation continues even if the archiving process falls a little behind. I haven't had a problem with the speed yet. A 16MB file every (at least) 12 hours isn't that bad.

Now that we have done all the setup and are able to transfer WAL files, we need a base backup to start with. Issue a SELECT pg_start_backup('BASE_BACKUP'); as a database superuser and then backup the database cluster by backing up the whole clusters data/ directory (and every other location if you use tablespaces). When you are done issue SELECT pg_stop_backup(); and you are done with your base backup. You might want to exclude pg_xlog/ to save space, you don't need it. As I'm lazy I use a script (written by Peter 'weasel' Palfrader), you can get it here, but whatever you take is fine. If you use the script just make sure to adapt the variables in the script to values that fit your system.

From now on you should do such base backups in regular intervals, I run them once a week, as recovery does need the last base backup plus all WAL files from then on. Doing regular base backups limits the amount of WAL files you need to keep around / you need to replay.

Reading the above you might find that there is a window of up to 12 hours from which I do not have backups. True, and to make this window way smaller, I am using another script (also written by weasel), get it here, together with an init script. This synchronizes the pg_xlog/ directory to my backup host every 5 minutes, so in case of a server crash I can replay everything up to the last 5 minutes.

Copy possibly unarchived WAL files from /var/lib/postgresql/8.2_old/main/pg_xlog into the newly created pg_xlog.

Copy all the WAL segment files that are needed over from the backup host. That should be all the files on the backup host, directory pentabarf-WAL-skinner/. You should be able to limit it by taking all those files >= the latest *.backup file.

Create a recovery command file recovery.conf in the cluster data directory. You may also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery has worked. The content of recovery.conf is

restore_command = 'cp /tmp/pg-WAL/%f "%p"'

Make sure that the file location you specify is the location you put the WAL segment files into in the prior step.

Start the server, it will go into recover mode and proceed to read through all the archived WAL files it needs.

Look at the database and decide if you like what you see.

Of course the procedure can be slightly adapted, so you end up with a database state from 2 weeks ago. Or 3 days. Or whatever you may need (and have WAL files for). There are many possibilities, and there is a lot of text on the PostgreSQL page that I don't want to duplicate here, so please read it. Also, I'm not using this as the sole and only way to backup my database(s), they are all still dumped into SQL files every day. This is just additional.