This module implements a way of taking directory structure snapshots using the rsync/hardlink method from Hack #74 in Linux Server Hacks. It's fairly basic for the moment, and limited to Unix platforms. Future versions will become more universal through the use of link and File::Find.

#!/usr/local/bin/perl
#
# Snapshot.pm
#
# Module containing methods relating to the backup and restore of stuf+f
# using the LSH style snapshots
=pod
=head1 Snapshot.pm
=head1 NAME
Snapshot.pm
Module to take snapshots using Linux Server Hacks rsync / cp -al metho+d
=head1 SYNOPSIS
use Snapshot qw/do_snapshot/;
my $return=do_snapshot({ source => '/home/robartes/',
snapshot_dir => '/snapshots/home',
}) or die $Snapshot::error;
=head1 DESCRIPTION
=head2 Introduction
Snapshot.pm provides a method to manage snapshots of directory trees. +It uses
the method given in Hack #74 in the excellent O'Reilly "Linux Server H+acks"
book (don't be put off by the Linux part - most hacks are applicable t+o any
Unix, and so is this module). More information about this method can b+e
found in that book and in this document under L</SNAPSHOT METHOD>.
'Snapshots' are defined as a number of copies of a directory tree,
reflecting the state of said tree I<at the moment the copy was taken>.+ This
means that a file in a snapshot of five days old will be the version o+f that
file that existed five days ago, even if the file was modified or even
deleted since then.
=head2 Series and MRS
Snapshot.pm keeps a configurable number of snapshots in so called I<se+ries>.
A series is defined as a number of snapshots using a common naming sch+eme
and following each other chronologically. One could for example imagin+e a
series to consist of daily snapshots. Multiple series can be defined a+nd
handled by Snapshot.pm.
The most recent snapshot (MRS) is treated specially. It is the same fo+r all
series and is named seperately from the series. It is also not include+d in
the count of series snapshots to keep.
+------+ +------+ +------+
+--| S1_1 | -- | S1_2 | -- | S1_3 |
/ +------+ +------+ +------+
+-----+ /
| MRS | --+
+-----+ \
\ +------+ +------+ +------+ +------+
+--| S2_1 | -- | S2_2 | -- | S2_3 | -- | S2_4 |
+------+ +------+ +------+ +------+
The figure represents a snapshot scheme consisting of two series - S1 +and S2
(the names are configurable). S1 is defined to keep 3 snapshots, S2 ke+eps 4.
When the do_snapshot() routine is called with these parameters, it wil+l
first rotate the series snapshots:
S1_3 ==> Deleted
S1_2 ==> S1_3
S1_1 ==> S1_2
MRS ==> S1_1
... and analogously for S2. After this, the MRS will be updated to ref+lect
the current state of the directory tree from which the snapshot is tak+en
(using rsync).
The rotation of MRS to first series member is done by hardlinking file+s.
This, combined with the fact that rsync creates new versions of modifi+ed
files instead of modifying them in place, is where all the magic of th+is
snapshot method lies.
After the rotations and the snapshot are done, a timetag file is writt+en in
the MRS. This timetag file has a timestamp of just before the rsync st+ep was
started, and contains that time formatted as by C<ctime(3)> (or, more
precisely, as the output from C<localtime> in scalar context). This is+ done
to ensure that one has a reference point as to the age of the snapshot+: the
versions of the files in that snapshot are guaranteed to be those of t+he
time recorded in the timetag (or later, to be precise). The timetags a+re not
changed by the series rotation process.
=head2 do_snapshot()
The core routine of Snapshot.pm, and the only one exported is
C<do_snapshot()>. This has to be imported explicitely:
use Snapshot qw/do_snapshot/;
C<do_snapshot()> is called with a hash or reference to a hash as argum+ent.
This hash lists configuration keys and values. Available keys and valu+es
are listed below. Mandatory options are marked with *.
=over 2
=item source *
The directory tree to take a snapshot from. Trailing slashes
are optional.
=item snapshot_dir *
The directory where the snapshots are located. MRS and
series snapshots are located in this directory.
=item series
A reference to an array containing hashrefs for series
configuration. Should contain one hashref per series, with mandatory k+eys
'stub' and 'keep'. stub is the name of the series (number will be appe+nded),
keep is how many snapshots should be kept in the series, excluding MRS+.
Default: C<[ { 'stub' => 'daily_', 'keep' => 6 } ]>
=item verbose
When set to a true value, C<Snapshot.pm> becomes talkative. Currently +this
only means that you get the output of the rsync step on STDOUT.
Default: false
=item MRS
The name of the Most Recent Snapshot.
Default: current
=item exclude
A reference to an array listing file patterns to exclude or include. T+his is
passed on to the rsync stage, and goes straight through to L<File::Rsy+nc>.
Default: empty
=item rsync_args
A hash reference of extra arguments to L<File::Rsync>. Any valid optio+n to
File::Rsync will work, and will override options set through other mea+ns,
notably C<verbose> and C<delete>. For the sake of the integrity of you+r
snapshots, you are advised to know what you are doing when you fiddle +with
these.
Default: empty
=item fatal_errors
When set to true, any error encountered will result in an exception be+ing
thrown. You can catch this with C<eval> and examine C<$@> in the usual+ way.
Default: false
=item timetag
The filename of the timetag file.
Default: .snaptime
=back
=head1 RETURN VALUE
C<Snapshot.pm> returns the output from the rsync step in a scalar if a+ll
goes well.
In case of an error, false is returned and $Snapshot::error will conta+in a
more or less informative message loosely related to the cause of the
error.
Unless asked to do so, C<Snapshot.pm> should never throw exceptions.
=head1 PORTABILITY
This version is restricted to platforms that have cp -al. That probabl+y
means Unix. Later versions will cater for additional platforms, namely+ those
that support the link function and the L</File::Find> module.
=head1 DEPENDENCIES
Snapshot.pm uses these non standard modules:
File::Rsync
=head1 SNAPSHOT METHOD
C<Snapshot.pm> uses a method from Hack #74 in Linux Server Hacks. The +method
is based on hard links and the fact that rsync creates new versions of+ files
it modifies, as opposed to modifying them in place.
The magic is in the copy step from MRS to the first series member. Thi+s step
is done using C<cp -al>, which creates hardlinks (C<-l>) of the files
instead of straight copies. If a file F is later changed in the MRS, t+he
hardlink still points to the I<original> version of it: rsync does an +unlink
on it, which decreases the reference count of the inode. The inode doe+s not
wink out of existence, as there still is a reference to it created by +hard
linking to it during the copy step.
Rotating the series elements is done with a simple C<move> (there is n+o need
to take more hard links).
The beauty of this method is that the copy step from MRS to S1 is a tr+ue
snapshot: only directories are actually created in S1, the files are s+imply
hard links. When files in the MRS are subsequently changed, only the c+hanged
files will take up extra space on the disk. So for each snapshot, you +only
need extra space for the files that have changed compared with the MRS+.
=head1 SEE ALSO
L<File::Rsync> -- Documentation for the File::Rsync module
L<http://www.oreilly.com/catalog/linuxsvrhack/> -- Linux Server Hacks +book
=head1 AUTHOR
Bart Vetters, L<robartes@nirya.be>
=head1 Copyrights
Copyright (c) 2003 Bart Vetters. All rights reserved.
This program is free software; you can redistribute it and/or modify i+t under the same terms as Perl itself.
=cut
use strict;
use warnings;
package Snapshot;
require Exporter;
our @ISA = qw/Exporter/;
use File::Rsync;
use File::Copy;
use File::Path;
our @EXPORT_OK=qw/do_snapshot/;
our $VERSION="0.02";
my $globals;
our $error;
my %rsync_out;
sub do_snapshot {
my $config = parse_args(@_) or return 0;
rotate_series($config->{'snapshot_dir'},
$config->{'MRS'},
$config->{'series'},
$config->{'num_series'},
) or return 0;
my $now=localtime();
my $retval=do_sync($config->{'source'},
$config->{'snapshot_dir'},
$config->{'MRS'},
$config->{'exclude'},
$config->{'rsync_args'},
) or return 0;
chdir $config->{'snapshot_dir'}.$config->{'MRS'} or return _croak("T+his is weird: MRS is not accessible after snapshot: $!\n");
unlink $config->{'timetag'} if (-f $config->{'timetag'});
if (open TIMETAG, ">".$config->{'timetag'} ) {
print TIMETAG $now."\n" or _warn("Could not write snapshot time to+ timetag file. Snapshot has been taken successfully though.\n");
close TIMETAG;
} else {
_warn("Could not write snapshot time to timetag file. Snapshot has+ been taken successfully though.\n");
}
return $retval;
}
sub parse_args {
my %allowed_args = ( 'source' => undef,
'snapshot_dir' => undef,
'exclude' => undef,
'MRS' => 'current',
'series' => [ { 'stub' => 'daily_',
'keep' => 6,
} ],
'rsync_args' => undef,
'fatal_errors' => undef,
'verbose' => undef,
'timetag' => '.snaptime'
);
my @mandatory_args = qw /source snapshot_dir/;
my @global_flags=qw/fatal_errors verbose/;
my @args=@_;
my %config;
eval {
if ( ref($args[0]) ) {
my $hashref=$args[0];
%config = %$hashref;
} else {
%config=@args;
}
};
if ($@) {return _croak("Please give a hashref or hash as arguments.\+n") };
foreach my $key (keys(%config)) {
return _croak("Unknown argument: $key.\n") unless exists $allowed_+args{$key};
$allowed_args{$key}=$config{$key};
}
foreach (@mandatory_args) {
return _croak("Missing arg: $_\n") unless defined ( $allowed_args{+$_} );
}
foreach (@global_flags) {
$globals->{$_}=$allowed_args{$_};
}
$allowed_args{'source'} =~ s|([^/])$|$1/|;
$allowed_args{'snapshot_dir'} =~ s|([^/])$|$1/|;
return \%allowed_args;
}
sub rotate_series {
my ($directory, $current, $series, $num_series) = @_;
chdir $directory or return _croak("Problem accessing snapshot direct+ory: $!\n");
if (defined ($num_series)) {
return _croak("Invalid number of series: $num_series\n") if ( $num+_series < 1 );
return _croak("Not enough series data.\n") if ( scalar @$series < +$num_series );
} else {
$num_series = @$series;
}
for (0 .. ($num_series - 1)) {
my $stub=$series->[$_]{'stub'};
my $keep=$series->[$_]{'keep'};
rotate_files($stub,$keep,$current) or return 0;
}
return 1;
}
sub rotate_files {
my ($stub, $max, $current) = @_;
return _croak("Invalid number of snapshots to retain: $max\n") if ($+max < 1);
_warn("Only 1 version to keep. No rotating of files will be done.\n"+) if $max == 1;
my $maxdir="$stub$max";
if ( -d $maxdir ) {
rmtree($maxdir,0,1) or return _croak("Could not delete $stub$max\n+");
};
if ($max > 2 ) {
my $x=($max - 1);
while ($x) {
if ( -d $stub.$x ) {
move( $stub.$x, $stub.( $x + 1 ) ) or return _croak("A move we+nt wrong: $!\nI'm aborting to avoid possible data corruption.\nThe mo+ve that failed is $stub$x to $stub".( $x + 1 )."\n");
}
$x--;
}
}
my $target=$stub."1";
if ( -d $current) {
return _croak("Problem with copy step: ".($? >> 8) ."\n") if syste+m("cp -al $current/ $target");
} else {
_warn("No MRS found. Will be created in rsync step.\n");
}
return 1;
}
sub do_sync {
my ($source, $snapshot_dir, $current, $exclude, $rsync_args) = @_;
my $dest=${snapshot_dir}.$current;
my %extra_args;
if (defined($rsync_args)) {
eval { %extra_args = %$rsync_args };
return _croak("Problem with extra rsync args: $@\n") if ($@);
}
unless (defined($extra_args{'exclude'})) {
$extra_args{'exclude'}=$exclude if defined($exclude);
}
my $rsync = File::Rsync -> new(archive => 1,
update => 1,
compress => 1,
verbose => 1,
delete => 1,
outfun => \&rsync_output,
errfun => \&rsync_output,
%extra_args,
);
$rsync->exec ( { 'src' => $source, 'dest' => $dest } ) or return _cr+oak("Rsync failed: ".$rsync_out{'err'}."\n");
return $rsync_out{'out'};
}
sub rsync_output {
my ($message,$type)=@_;
$rsync_out{$type}.="$message\n";
print $message if ( defined($globals->{'verbose'}) && $globals->{'ve+rbose'});
}
sub _warn {
my $message=shift;
print STDERR $message;
}
sub _croak {
$error=shift;
die $error if $globals->{'fatal_errors'};
return 0;
}
# I am a good little module
1;