Bidirectional rsync flaws

Conclusion:

rsync does a fantastic job of syncing content from one system to another and keeping it in sync. It is perfect for backups; use rsnapshot to get even more out of it.
But rsync is not good at bi-directional syncing when content is deleted, unless you delete the file in every content directory on every host, which is not an option for the everyday user.
Keep in mind that the hosts need the same timezone and synchronized clocks when you use the --update option.

If you need to keep things in sync, deletions included, look at a tool such as Unison.

Here is how I get to this conclusion:

Rsync is a very good program to sync data from one host (Alice) to another host (Bob).
The advantage of rsync over plain copying lies in its algorithm, which transmits only the changes after the initial upload.

For the example I use two local directories (it works identically over the network) with the following initial layout and content:

--delete
This tells rsync to delete extraneous files from the receiving
side (ones that aren't on the sending side), but only for the
directories that are being synchronized. You must have asked
rsync to send the whole directory (e.g. "dir" or "dir/") without
using a wildcard for the directory's contents (e.g. "dir/*")
since the wildcard is expanded by the shell and rsync thus gets
a request to transfer individual files, not the files' parent
directory. Files that are excluded from the transfer are also
excluded from being deleted unless you use the --delete-excluded
option or mark the rules as only matching on the sending side
(see the include/exclude modifiers in the FILTER RULES section).
Prior to rsync 2.6.7, this option would have no effect unless
--recursive was enabled. Beginning with 2.6.7, deletions will
also occur when --dirs (-d) is enabled, but only for directories
whose contents are being copied.
This option can be dangerous if used incorrectly! It is a very
good idea to first try a run using the --dry-run option (-n) to
see what files are going to be deleted.
If the sending side detects any I/O errors, then the deletion of
any files at the destination will be automatically disabled.
This is to prevent temporary filesystem failures (such as NFS
errors) on the sending side from causing a massive deletion of
files on the destination. You can override this with the
--ignore-errors option.
The --delete option may be combined with one of the
--delete-WHEN options without conflict, as well as
--delete-excluded. However, if none of the --delete-WHEN
options are specified, rsync will choose the --delete-during
algorithm when talking to rsync 3.0.0 or newer, and the
--delete-before algorithm when talking to an older rsync. See
also --delete-delay and --delete-after.

It does exactly what we want: it makes a 1:1 copy of Alice on Bob, so the new file on Bob was deleted and the content of file 1 was overwritten.

rsync has the --update option. The manual reads:

-u, --update
This forces rsync to skip any files which exist on the destination
and have a modified time that is newer than the source
file. (If an existing destination file has a modification time
equal to the source file's, it will be updated if the sizes are
different.)
Note that this does not affect the copying of symlinks or other
special files. Also, a difference of file format between the
sender and receiver is always considered to be important enough
for an update, no matter what date is on the objects. In other
words, if the source has a directory where the destination has a
file, the transfer would occur regardless of the timestamps.
This option is a transfer rule, not an exclude, so it doesn't
affect the data that goes into the file-lists, and thus it
doesn't affect deletions. It just limits the files that the
receiver requests to be transferred.

This does not look good for bi-directional sync.

What’s that anyway?
Bi-directional syncing is the idea that a user can add, change or delete anything in the content without caring where the change is made.
The “syncing” mechanism takes care of propagating the “latest” changes to all other copies of the content.

In principle rsync does that, with one caveat.
But first things first: let me show how far we can get with rsync.
We start with the following content:

Hmm, not what we want. I prevented deletion altogether, which is perfect for backup and restore.

But how can we actually delete stuff and prevent it from coming back?
Remember the --delete option. Let’s try it: we make the same changes as in the previous test and run rsync with --delete on both sides:

Hmm, the deletion on Alice worked, but not the one on Bob, because Alice first copied file 3 back to Bob.

We could run the commands the other way round, but then file 2 would survive. Using cron to start the sync on both hosts at the same time runs into a race condition: one of the hosts will be faster, and the result is still not what we want.

There is another flaw in “--update” that I need to mention. Both systems need to be in the same timezone and have synchronized clocks, or the test for which change is “newer” will fail badly. (This was almost a show stopper at a big migration once.)

2 Comments

Excellent point and demonstrations. I’ve run headlong into the shortcomings of rsync when files are moved (not deleted) between folders on the two drives. Depending on your choice of --update and --delete, you end up with multiple copies of the moved files in both the initial and the moved locations, or you lose data.