NAME

File::Rsync::Mirror::Recent - mirroring via rsync made efficient

SYNOPSIS

The documentation in here is normally not needed because the code is meant to be run from several standalone programs. For a quick overview, see the file README.mirrorcpan and the bin/ directory of the distribution. For the architectural ideas see the section THE ARCHITECTURE OF A COLLECTION OF RECENTFILES below.

File::Rsync::Mirror::Recent establishes a view on a collection of File::Rsync::Mirror::Recentfile objects and provides abstractions spanning multiple time intervals associated with those.

EXPORT

No exports.

CONSTRUCTORS

my $obj = CLASS->new(%hash)

Constructor. On every argument pair the key is a method name and the value is an argument to that method name.
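The "key is a method name" convention can be pictured with a minimal sketch. The class My::Sketch below is a hypothetical stand-in, not the real File::Rsync::Mirror::Recent implementation; only the accessor names verbose and localroot are borrowed from this manpage, and the path is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real File::Rsync::Mirror::Recent.
package My::Sketch;

sub new {
    my ($class, %hash) = @_;
    my $self = bless {}, $class;
    # Each key names a method; the value is passed to it as the argument.
    for my $method (sort keys %hash) {
        $self->$method($hash{$method});
    }
    return $self;
}

# Two combined getter/setter accessors, named after accessors in this manpage.
sub verbose {
    my ($self, $set) = @_;
    $self->{verbose} = $set if defined $set;
    return $self->{verbose};
}

sub localroot {
    my ($self, $set) = @_;
    $self->{localroot} = $set if defined $set;
    return $self->{localroot};
}

package main;

my $obj = My::Sketch->new(
    localroot => "/tmp/mirror",   # hypothetical path
    verbose   => 1,
);
print $obj->localroot, "\n";
```

With this pattern a misspelled key fails early, because Perl dies when the named method does not exist.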

my $obj = CLASS->thaw($statusfile)

Constructor from a statusfile left over from a previous rmirror run. See also runstatusfile.

ACCESSORS

ignore_link_stat_errors

as in F:R:M:Recentfile

local

Option to specify the local principal file for operations with a local collection of recentfiles.

localroot

as in F:R:M:Recentfile

max_files_per_connection

as in F:R:M:Recentfile

remote

The remote principal recentfile in rsync notation. E.g.

pause.perl.org::authors/RECENT.recent

remoteroot

as in F:R:M:Recentfile

remote_recentfile

Rsync address of the remote RECENT.recent symlink or whichever name the principal remote recentfile has.

rsync_options

Things like compress, links, times or checksums. Passed in to the File::Rsync object used to run the mirror. Can be a hashref or an arrayref. Depending on the version of File::Rsync it is passed on as a hashref or as a flat list.

All parameters that can be passed to File::Rsync::Mirror::Recentfile::recent_events() can also be specified here.

One additional option is supported. If $Options{callback} is specified, it must be a subref. This sub is called whenever one chunk of events is found. The first argument to the callback is a reference to the currently accumulated array of events.

Note: all data are kept in memory.

overview ( %options )

Returns a small table that summarizes the state of all recentfiles collected in this Recent object.

$file = $obj->runstatusfile ($set)

Getter/setter for the _runstatusfile attribute. Defaults to a temporary file created by File::Temp. A status file is required for rmirror to work. Since it may be interesting for debugging purposes, you may want to specify a permanent file for this.

$verbose = $obj->verbose ( $set )

Getter/setter method to set verbosity for this F:R:M:Recent object and all associated Recentfile objects.

my $vl = $obj->verboselog ( $set )

Getter/setter method for the path to the logfile to write verbose progress information to.

Note: This is a primitive stopgap solution to get simple verbose logging working. The program still sends error messages to STDERR. Switching to Log4perl or similar is probably the way to go. TBD.

THE ARCHITECTURE OF A COLLECTION OF RECENTFILES

The idea is that we want a short file that records really recent changes, so that a fresh mirror can be kept fresh as long as connectivity is given. Then we want longer files that record the history before that. When the mirror falls behind the update period reflected in the shortest file, it can complement the list of recent file events with the next one. If this is not long enough we want another one, again a bit longer, and finally one that completes the history back to the oldest file. Together the index files contain the complete list of current files. The longer ago the period covered by an index file, the less often that index file is updated. For practical reasons adjacent files will often overlap a bit, but this is neither necessary nor enforced. The only thing enforced is that there must never be a gap between two adjacent index files that would have to contain a file reference. That's the basic idea. The following example represents a tree that has a few updates every day:

Each of these files represents a contract to hold a record for every filesystem event within the period indicated in the filename.

The first file is the principal file, insofar as it is the one that is written first after a filesystem change. Usually a symlink links to it with a filename that has the same filenameroot and the suffix .recent. On systems that do not support symlinks a plain copy is maintained instead.

The last file, the Z file, contains the complementary files that are in none of the other files. It may contain delete events but often delete events are discarded at the transition to the Z file.
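How a set of interval names maps to a collection of files can be sketched as follows. This is only an illustration: the naming pattern "filenameroot-interval.suffix" and the ".yaml" suffix are assumptions made for this sketch, and the intervals are simply the ones mentioned elsewhere in this document.

```perl
use strict;
use warnings;

# Assumptions: the "$root-$interval$suffix" pattern and the ".yaml" suffix
# are illustrative only; the intervals are those named in this manpage.
my $filenameroot = "RECENT";
my $suffix       = ".yaml";
my @aggregator   = qw(1h 1d 1Q 1Y Z);

# Hypothetical helper: build one file name per interval.
sub recentfile_names {
    my ($root, $sfx, @intervals) = @_;
    return map { "$root-$_$sfx" } @intervals;
}

my @names     = recentfile_names($filenameroot, $suffix, @aggregator);
my $principal = $names[0];                # written first after a change
my $symlink   = "$filenameroot.recent";   # usually points at the principal

print "$symlink -> $principal\n";
print "$_\n" for @names;
```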

SITE SEEING TOUR

This section illustrates the operation of a server-client couple in a fictitious installation that has to deal with a long time of inactivity. I think such an edge case installation demonstrates the economic behaviour of our model of overlapping time slices best.

The sleeping beauty (http://en.wikipedia.org/wiki/Sleeping_Beauty) is a classic fairytale of a princess sleeping for a hundred years. The story inspired the test case 02-aurora.t.

Consider an upstream server where people stop feeding new files for one hundred years. That upstream server has no driving energy to make major changes to its RECENT files. Cronjobs will continue to shift things towards the Z file but will soon stop doing so, since every RECENT file has to keep its promise to record files covering a certain period. Soon all RECENT files will cover exactly their native period.

Downstream servers will stubbornly ask the rsync server whether there is a newer RECENT.recent. As soon as the smallest RECENT file has reached the state of maximum possible merge with the second smallest RECENT file, the answer of the rsync server will always be: nothing new. Downstream servers that were up to date on the previous request will be satisfied and do nothing. They will never request a download. The answer that there is no change is sufficient to determine that there is no change in the whole tree.

Let's presume the smallest RECENT file in this castle is a 1h file and downstream decides to ask every 30 minutes. Now the hundred years are over and upstream starts producing files again, one file every minute. After one minute it will move old files over to the, say, 1d file. In the next sixty minutes it will not be allowed to move any other file over to the 1d file. At some point in time downstream will ask the obligatory question "anything new?" and it will get the current 1h file. It will recognize in the meta part of the current file which timestamps have been moved to the 1d file, and it will recognize that it has all those. It will have no need to download the 1d file; it will download the missing files and be done. No second RECENT file needs to be downloaded.

Downstream only decides to download another RECENT file when not doing so would result in a gap between two recent files, such that consistency checks would become impossible, or when it has to serve potentially interested third parties, like down-down-stream servers.

Downloads of RECENT files are subject to rsync optimizations in that rsync does some level of blockwise checksumming that is considered efficient to avoid copying blocks of data that have not changed. Our format is that of an ordered array, so that large blocks stay constant when elements are prepended to the array. This means we usually do not have to rsync full RECENT files. Only if they are really small, the rsync algorithm will not come into play but that's OK for small files.

Upstream servers are extremely lazy in writing the larger files. See File::Rsync::Mirror::Recentfile::aggregate() for the specs. Long before the one hundred years are over, the upstream server will stop changing files. Slowly everything that existed before upstream fell asleep trickles into the Z file. Say the second-largest RECENT file is a 1Y file and the third-largest RECENT file is a 1Q file; then it will take at least one quarter of a year before the 1Y file is merged into the Z file. From that point in time everything will have been merged into the Z file and the server's job of calling aggregate regularly will become a no-op. Consequently downstream will never again download anything. Just the obligatory question: anything new?

THE INDIVIDUAL RECENTFILE

A recentfile consists of a hash that has two keys: meta and recent. The meta part has metadata and the recent part has a list of fileobjects.

THE META PART

Here we find things that are pretty much self-explanatory: all lowercase attributes are accessors and as such explained in the manpages. The uppercase attribute Producers contains version information about involved software components.

Even though the lowercase attributes are documented in the F:R:M:Recentfile manpage, let's focus on the important stuff to make sure nothing goes by unnoticed: meta contains the aggregator levels in use in this installation, in other words the names of the RECENT files, e.g.:

aggregator:
- 3s
- 8s
- 21s
- 55s
- Z

It contains a dirtymark telling us the timestamp of the last protocol violation of the upstream server:

dirtymark: '1325093856.49272'

Plus a few things convenient in a situation where we need to do some debugging.

And it contains information about which timestamp is the maximum timestamp in the neighboring file. This is probably the most important data in meta:

merged:
  epoch: 1307159461.94575

This keeps track of the highest epoch we would find if we looked into the next RECENT file.

Another entry is the minmax, e.g.:

minmax:
  max: 1307161441.97444
  min: 1307140103.70322

The merged/epoch and minmax examples above illustrate one case of an overlap (130715... is between 130716... and 130714...). The general syncing rule for the client is: if the interval covered by a recentfile (minmax) and the interval covered by the next higher recentfile (merged/epoch) do not overlap anymore, then it is time to refresh the next recentfile.
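The overlap rule can be written down as a small check. This is a minimal sketch of the rule as worded in this section, not the code the module actually runs; the helper name needs_next_recentfile is hypothetical, and the numbers are copied from the meta examples in this section.

```perl
use strict;
use warnings;

# Values copied from the meta examples in this section.
my %meta = (
    merged => { epoch => 1307159461.94575 },
    minmax => { max => 1307161441.97444, min => 1307140103.70322 },
);

# Hypothetical helper: true when the oldest event in this file is newer than
# the highest epoch recorded for the next larger recentfile, i.e. when the
# two intervals no longer overlap.
sub needs_next_recentfile {
    my ($meta) = @_;
    return $meta->{minmax}{min} > $meta->{merged}{epoch} ? 1 : 0;
}

print needs_next_recentfile(\%meta)
    ? "gap: refresh the next recentfile\n"
    : "overlap: nothing to do\n";
```

For the example values the merged epoch lies inside the min/max interval, so the check reports an overlap.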

THE RECENT PART

This is the interesting part. Every entry refers to some filesystem change (with path, epoch, type).

The epoch value is the point in time when some change was registered, but it can be set to arbitrary values. Do not be tempted to believe that the entry has a direct relation to something like modification time or change time on the filesystem level. Epoch values do not reflect release dates. (If you want exact release dates: Barbie is providing a database of them. See http://use.perl.org/~barbie/journal/37907).

All these entries can be divided into two types (denoted by the type attribute): news and deletes. Changes and creations are news. Deletes are deletes.

Besides an epoch and a type attribute we find a third one: path. This path is relative to the directory we find the recentfile in.
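As a rough picture, one recentfile can be held in a Perl data structure like the one below and split by type. The paths and most epochs are made up for illustration, and the literal type strings "new" and "delete" are an assumption of this sketch.

```perl
use strict;
use warnings;

# Sketch of one recentfile. Paths are hypothetical; the type strings
# "new" and "delete" are assumed for this illustration.
my $recentfile = {
    meta   => { aggregator => [qw(1h 1d Z)] },
    recent => [
        { epoch => 1307161441.97444, path => "id/A/AU/AUTHOR/Foo-0.02.tar.gz", type => "new"    },
        { epoch => 1307161000.10000, path => "id/A/AU/AUTHOR/Foo-0.01.tar.gz", type => "delete" },
        { epoch => 1307140103.70322, path => "id/A/AU/AUTHOR/Foo-0.01.tar.gz", type => "new"    },
    ],
};

# Split the event list into the two types described in this section.
my @news    = grep { $_->{type} eq "new" }    @{ $recentfile->{recent} };
my @deletes = grep { $_->{type} eq "delete" } @{ $recentfile->{recent} };

printf "%d news, %d deletes\n", scalar @news, scalar @deletes;
```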

The entries in the recentfile are ordered by decreasing epoch attribute. The epochs are unique floating point numbers. When the server has NTP running correctly, the timestamps usually reflect a real epoch. If time is running backwards, we trump the system epoch with strictly monotonically increasing floating point timestamps and guarantee they are unique.
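One way to picture the "trump the system epoch" rule is the sketch below. The helper next_epoch and its step size of 0.00001 are assumptions made for illustration; the module's real mechanism may differ.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: return a timestamp strictly greater than $last, even
# when the clock value $now lies in the past. The 0.00001 step is an
# arbitrary choice for this sketch.
sub next_epoch {
    my ($last, $now) = @_;
    $now = Time::HiRes::time() unless defined $now;
    return $now > $last ? $now : $last + 0.00001;
}

my @epochs = (1307161441.97444);
push @epochs, next_epoch($epochs[-1], 1307161442.5);   # clock moved forward
push @epochs, next_epoch($epochs[-1], 1307161000.0);   # clock ran backwards

# Entries are stored newest first, i.e. by decreasing epoch.
my @ordered = sort { $b <=> $a } @epochs;
printf "%.5f\n", $_ for @ordered;
```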

CORRUPTION AND RECOVERY

If the origin host breaks the promise to deliver consistent and complete recentfiles then it must update its dirtymark and all slaves must discard what they consider the truth.
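On the mirror side, the dirtymark rule amounts to comparing the value seen last time with the value in the freshly fetched principal recentfile. The sketch below assumes a plain string comparison and a hypothetical helper name; the second dirtymark value is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true when upstream has changed its dirtymark since
# we last looked, meaning the local view must be discarded and rebuilt.
sub must_discard_state {
    my ($known, $remote) = @_;
    $known  = "" unless defined $known;
    $remote = "" unless defined $remote;
    return $known ne $remote ? 1 : 0;
}

my $known = '1325093856.49272';   # value from the meta example
for my $remote ('1325093856.49272', '1325099999.00000') {
    printf "%s: %s\n", $remote,
        must_discard_state($known, $remote) ? "discard" : "keep";
}
```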

In the worst case that something goes wrong despite the dirtymark mechanism the way back to sanity can be achieved through traditional rsyncing between the hosts. But please be wary doing that: mixing traditional rsync and the F:R:M:R technique can lead to gratuitous extra errors. If you're the last host in a chain, there's nobody you can disturb, but if you have downstream clients, it is possible that rsync copies a RECENT file before the contained files are actually available.

BACKGROUND

This is about speeding up rsync operation on large trees. It uses a small metadata cocktail and pull technology.

rersyncrecent solves this problem with a couple of (usually 2-10) lightweight index files which cover different overlapping time intervals. The master writes these files and the clients/slaves can construct the full tree from the information contained in them. The most recent index file usually covers the last seconds or minutes or hours of the tree and depending on the needs, slaves can rsync every few seconds or minutes and then bring their trees in full sync.

The rersyncrecent model was developed for CPAN but as it is both convenient and economical it is also a general purpose solution. I'm looking forward to seeing a CPAN backbone that is only a few seconds behind PAUSE.

COMPETITORS

Clusters, ftp mirrors and otherwise replicated datasets like CPAN share one problem: how to transfer only a minimum amount of data to determine the diff between two hosts.

Normally it takes a long time to determine the diff itself before it can be transferred. Known solutions at the time of this writing are csync2 and rsync 3 batch mode.

For many years the best solution was csync2 which solves the problem by maintaining a sqlite database on both ends and talking a highly sophisticated protocol to quickly determine which files to send and which to delete at any given point in time. Csync2 is often inconvenient because it is push technology and the act of syncing demands quite an intimate relationship between the sender and the receiver. This is hard to achieve in an environment of loosely coupled sites where the number of sites is large or connections are unreliable or network topology is changing.

Rsync 3 batch mode works around these problems by providing rsync-able batch files which allow receiving nodes to replay the history of the other nodes. This reduces the need to have an incestuous relation but it has the disadvantage that these batch files replicate the contents of the involved files. This seems inappropriate when the nodes already have a means of communicating over rsync.

HONORABLE MENTION

instantmirror at https://fedorahosted.org/InstantMirror/ is an ambitious project that tries to combine various technologies (squid, bittorrent) to overcome the current slowness with the main focus on fedora. It was founded in 2009-03 and at the time of this writing it is still a bit early to comment on.

LIMITATIONS

If the tree of the master server is changing faster than the available bandwidth can mirror it, then additional protocols may need to be deployed. Certainly p2p/bittorrent can help in such situations because downloading sites help each other and bittorrent chunks large files into pieces.

INOTIFY

Currently the origin server has two options. The traditional one is to strictly keep track of injected and removed files through all involved processes and call update on every file system event. The other option is to let data come in and use the assistance of inotify. PAUSE is running the former, the cpan master site is running the latter. Both work equally well for CPAN because CPAN has not yet had any problem with upload storms. On installations that have to deal with more uploaded data than inotify+rrr can handle it's better to use the traditional method such that the relevant processes can build up some backpressure to throttle writing processes until we're ready to accept the next data chunk.

FUTURE DIRECTIONS

Convince other users outside of CPAN, such as http://fedoraproject.org/wiki/Infrastructure/Mirroring