Friday, February 12, 2010

Bacula provides a rather flexible method for specifying the files and directories to include in and/or exclude from backup jobs - the Fileset resource.

It is, in fact, so flexible that you can use an external script or program to generate the list of files to back up, on the fly. That program is expected to write a list of paths to back up to standard output, which is piped to Bacula, either on the server side (director):

File="|/path/to/script <args>"

or on the client side (file daemon):

File="\\|/path/to/script <args>"

Running the fileset generation program on the client side has the advantage that it runs with the file daemon's privileges, which are normally those of the super-user.

Here's an example, adapted from the Bacula manual, for backing up all local Ext3 disk partitions
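The same idea can also be sketched in Python instead of shell. This is only a minimal sketch and not the manual's listing: it assumes a Linux client where /proc/mounts lists the mounted filesystems, and the function name ext3_mount_points is made up for illustration. Mount points that contain spaces are escaped in /proc/mounts (as \040) and are not decoded here.

```python
def ext3_mount_points(mounts_text):
    """Return the mount points of ext3 filesystems found in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        # Fields: device, mount point, filesystem type, options, dump, pass
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ext3":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    # /proc/mounts is Linux-specific.
    with open("/proc/mounts") as mounts:
        for point in ext3_mount_points(mounts.read()):
            print(point)
```

A script like this would be referenced from a File directive using the client side form shown earlier, so that it reports the client's partitions rather than the director's.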

Note that, if there are other include/exclude criteria in the Fileset, the file daemon still has to determine which files and directories it has to back up, under each parent directory that is specified by the external program.

A similar method can be used to work around Bacula's file selection logic completely. One reason to do this would be to select files according to criteria that cannot be expressed using the normal Fileset resource definition syntax (e.g. file selection by date).
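To make the "selection by date" case concrete, here is a rough Python sketch of a script that prints only recently modified files. The function name modified_since and the one-week cutoff are assumptions made for illustration, not something Bacula prescribes.

```python
import os
import sys
import time


def modified_since(root, cutoff):
    """Yield non-directory entries under root modified at or after cutoff (epoch seconds)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() so that symbolic links are judged by their own timestamp.
                if os.lstat(path).st_mtime >= cutoff:
                    yield path
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue


if __name__ == "__main__":
    week_ago = time.time() - 7 * 24 * 3600
    # Each command-line argument is a directory tree to scan.
    for root in sys.argv[1:]:
        for path in modified_since(root, week_ago):
            print(path)
```

The output is in directory-walk order; it is a plain list of paths and makes no attempt to order them in any particular way.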

I became interested in this after I learned that if I specify /grandparent/parent/child/file as a backup target, Bacula does not back up the permissions and ownership info of any of the parent directories. This happens because none of the parent directories is a backup target itself, or a sub-directory of a directory which is a backup target.

This isn't a bug, but rather just the way Bacula is designed. It actually makes sense when you think about it. But the end result is that if you cherry-pick directories to back up (like I do), you may end up with some non-obvious permissions/ownership problems upon a full restore, due to the fact that some parent directories were not specified as backup targets.

It turns out that getting the behavior I want with the usual Fileset definition constructs is so tricky and cumbersome that using an external script for selecting files is actually the easy solution to my problem.

There are, however, a few gotchas that I had to address before I could deploy this scheme.

Say we have a program (more specifically: a Python script called fileset.py) that, when run, writes to standard output a list of the files and directories we wish to back up. The Fileset resource we would use in this case looks like this (for a Linux client):

If you ponder this for a bit you'll note that this definition excludes any file/directory that does not appear in the list dumped by our script - which is exactly what we want. The tricky part here is that the list has to be reverse sorted, such that any sub-directory path appears in it before its parent directory; otherwise Bacula will filter it out.
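The ordering requirement can be sketched in a few lines of Python. This is a minimal sketch and not the actual fileset.py: the function name with_parents and the sample targets are made up, paths are assumed to be absolute POSIX paths, and the root directory itself is deliberately left out of the list. It adds every parent directory of each target (so that their permissions and ownership are backed up too) and then reverse sorts the result. A plain reverse sort is enough because a parent path is always a prefix of its children, and therefore sorts before them in ascending order.

```python
import posixpath


def with_parents(targets):
    """Return the targets plus all their parent directories, reverse sorted."""
    paths = set()
    for target in targets:
        path = posixpath.normpath(target)
        # Walk upwards, stopping at the root or at an ancestor seen before.
        while path not in ("/", "") and path not in paths:
            paths.add(path)
            path = posixpath.dirname(path)
    # Reverse order puts every sub-directory before its parent.
    return sorted(paths, reverse=True)


if __name__ == "__main__":
    # Hypothetical backup targets, for illustration only.
    for path in with_parents(["/grandparent/parent/child/file", "/etc/fstab"]):
        print(path)
```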

Another issue, which I was not aware of initially, is that the locale information isn't propagated to the sub-process running the script. The tricky bit here is that locale information is propagated after manually restarting the file daemon process - the restarted process seems to inherit the environment settings of the shell that was used to restart it. A simple solution is to explicitly specify the value of the LANG environment variable, as I've done above.

The next issue I had to tackle was that when the file list is generated by a script, it's apparently generated before the client executes any of the ClientRunBeforeJob scripts that are configured in the backup job definition. This means that, if you create new files as part of the operation of the pre-backup scripts, these files will not be included in the current backup job. This is different from the normal state of affairs, where such files do get included.

I had to split the backup job for each of the client machines that I back up into two jobs: one job that runs the ClientRunBeforeJob scripts, but uses an empty fileset (i.e. one that doesn't have any File directive), and a second job that runs afterwards and uses the dynamic fileset.

The last problem, but not the least, was getting this scheme to work both on Window$ and Linux, with filenames that happen to contain illegal characters, using the same Python script. This was an interesting exercise in its own right, but I'll leave that to a future post.

OK, so it's complicated, and the benefits are dubious, but you've read this far. You're too kind. Thanks.

Friday, February 5, 2010

I've managed to avoid this for quite a while, but there was no escape this time.

In my last blog post I mentioned that I'd hit a problem with the new Debian/testing Kernel (2.6.32).

After a quick search I found Kernel bug #14791, which seemed to be my exact problem. The good news was that there was a patch available, the bad news was that it was "[dropped] from the list of recent regressions due to the lack of testers".

So I stepped forward and decided to test the patch, hoping, first, that it does indeed fix my problem, and second, that it would be incorporated in the next Kernel release, if and when I verified that it worked.

But this meant that I needed to compile an upstream Kernel and install it. Twice (once to verify that the most current Kernel has this problem, and once more to verify that the patch fixes it).

Brrr.

Well, doing this, on a Debian box, turns out to be actually rather easy, as long as the upstream Kernel isn't too far removed from a Kernel that's already installed on your box. It's even easier than compiling the official Debian Kernel source package - trust me, I tried.

The official guide is the Debian Linux Kernel Handbook, where chapter 4 describes common Kernel related tasks such as Building a custom kernel from the "pristine" kernel source (section 4.5).

So here's how I did it, using Kernel source code from the mainline Git repository:

you'll be presented with a series of configuration questions (mostly, I just selected the default options by hitting ENTER repeatedly) - this is likely to be a short process, as long as you're compiling a Kernel that's similar enough to the one from which the base configuration was taken