The more I use rsync, the more I realise it's the Swiss Army knife of file transfer. There are so many options. I recently found out about --remove-source-files, which deletes each file from the source once it has been copied, turning rsync into more of a move than a copy program. :)
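For example (the paths and host name here are just placeholders), something like this copies everything across and deletes each successfully transferred file from the source as it goes:

rsync -av --remove-source-files /staging/ archive:/staging/

Note that it only removes the files; the now-empty source directories are left behind.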

15 Answers

Try to use rsync version 3 if you have to sync many files! Version 3 builds its file list incrementally, so it is much faster and uses far less memory than version 2.

Depending on your platform this can make quite a difference. On OS X, version 2.6.3 would take more than an hour, or crash, trying to build an index of 5 million files, while the version 3.0.2 I compiled started copying right away.

One thing to note is that some options (--delete-before, for instance) force the old "build the whole list first" behaviour, because they need the full list to work correctly. So if you don't see the incremental behaviour, check whether one of your other options rules it out. This can also be useful if you are using rsync interactively on a large tree and want to force the initial scan so that the output of --progress is accurate (the "objects to compare" count will never rise, because no new objects are found after the initial scan).
– David Spillett Aug 3 '12 at 14:33

--time-limit=T

When this option is used, rsync will stop after T minutes and exit. I think this option is useful when rsyncing a large amount of data during the night (non-busy hours) and stopping it when it is time for people to start using the network during the day (busy hours).

--stop-at=y-m-dTh:m

This option allows you to specify at what time to stop rsync.
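For example, to let an overnight run stop itself at 6 in the morning (the date and paths are made up; depending on your build the option comes from the time-limit patch or, in rsync 3.2.0 and later, is available alongside --stop-after):

rsync -a --stop-at=2012-08-04T06:00 /big/data/ backup:/big/data/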

Batch Mode

Batch mode can be used to apply the same set of updates to many identical systems.
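A minimal sketch of how that looks (directory and host names are placeholders): run rsync once with --write-batch to update the first system and record the changes, then replay the recorded batch on each identical system with --read-batch:

# update the first system and record the changes in the file "updates"
rsync -a --write-batch=updates /source/dir/ host1:/dest/dir/

# copy "updates" to each identical system, then apply it there
rsync -a --read-batch=updates /dest/dir/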

If you are wondering how far along a slow-running rsync has gotten, and didn't use -v to list files as they are transferred, you can find out which files it has open:

ls -l /proc/$(pidof rsync)/fd/*

on a system which has /proc

E.g., rsync hung for me just now, even though the remote system seemed to have plenty of space left. This trick helped me find an unexpectedly huge file I'd forgotten about, which wouldn't fit on the other end.

It also told me something else interesting: the other end had apparently given up, since there was also a broken socket link:

--archive is a standard choice (though not the default) for backup-like jobs; it makes sure most metadata from the source files (permissions, ownership, etc.) is copied across.

However, if you don't want to use that, you'll often still want to include --times, which copies across the modification times of files. This makes the next rsync run (assuming you are doing it repeatedly) much faster, as rsync compares the modification times and skips files that haven't changed. Surprisingly (to me at least), this option is not the default.
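For instance, if all you want is recursion plus timestamps rather than everything --archive implies (paths are placeholders):

rsync --recursive --times /src/ /dst/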

If you need to update a website with some huge files over a slowish link, you can transfer the small files this way:

rsync -a --max-size=100K /var/www/ there:/var/www/

then do this for the big files:

rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/

rsync has lots of options that are handy for websites. Unfortunately, it has no built-in way of detecting simultaneous updates, so you have to add logic to your cron scripts to avoid overlapping writes of huge files.
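One simple approach (just a sketch; the lock file path is arbitrary) is to wrap the cron job in flock(1), so a new run bails out if the previous one is still going:

flock -n /var/lock/rsync-www.lock rsync -a --min-size=100K --bwlimit=100 /var/www/ there:/var/www/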

I've used it to change the SSH cipher to something faster (--rsh="ssh -c arcfour"), and also to set up a chain of ssh hops (I recommend using this with ssh-agent) to sync files between hosts that cannot talk to each other directly:

rsync -av --rsh="ssh -TA userA@hostA ssh -TA -l userB" /tmp/foobar/ hostB:/tmp/foobar/
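On reasonably recent OpenSSH (7.3 and later) the same hop can usually be written with -J (ProxyJump) instead, which I find easier to get right; the hosts and users are the same placeholders as above:

rsync -av --rsh="ssh -J userA@hostA" /tmp/foobar/ userB@hostB:/tmp/foobar/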

Using --link-dest to create space-efficient, snapshot-based backups: you appear to have multiple complete copies of the backed-up data (one for each backup run), but files that don't change between runs are hard-linked to the previous snapshot instead of being copied again, which saves a lot of space.
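A typical invocation looks something like this (dates and paths are made up): each run hard-links unchanged files against the previous snapshot and only stores new or changed files:

rsync -a --link-dest=/backups/2012-08-02/ /data/ /backups/2012-08-03/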

The one major disadvantage of this technique is that if a file is corrupted by a disk error, it is just as corrupt in every snapshot that links to it; I keep offline backups too, which protect against this to a decent extent. The other thing to watch is that your filesystem has enough inodes, or you'll run out of them before you actually run out of disk space (though I've never had a problem with the ext2/3 defaults).

Also, never forget the very useful --dry-run for a little healthy paranoia, especially when you are using the --delete* options.
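For example, before letting a --delete run loose (paths are placeholders), -n/--dry-run combined with -v shows what would be transferred or deleted without touching anything:

rsync -avn --delete /src/ /dst/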