rsync oneliner: a study of a complex commandline

It seems silly to make a blog post about this, but I keep on
forgetting the answer to "what if I really want to just transfer
EVERYTHING with rsync?". Since the rsync(1) manpage is 28,000
words, I basically never go there to find the answer and instead
grep around this wiki and find other instances, which are never quite
as good as what I've come up with with the help of my (new) colleague
weasel.

The common answer is "just use -av":

rsync -av A/ B/

... but that has a few limitations:

it shows every file transferred, which can overwhelm the terminal
for large transfers

it won't transfer hardlinks, ACLs and other extended attributes

it might break if /etc/passwd is not synchronized across hosts

The answer, of course, is instead the very intuitive:

rsync -PaSHAX --numeric-ids --info=progress2 A/ B/
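
Since I keep forgetting what each letter stands for, here is the same command spelled out with long options. This is only a sketch: the variable name "opts" is made up for the illustration, the one-line descriptions are my paraphrase of rsync(1), and the script prints the command instead of running it.

```shell
#!/bin/sh
# Long-option spelling of: rsync -PaSHAX --numeric-ids --info=progress2 A/ B/
# "opts" is a hypothetical variable name used only for this illustration.
opts=""
opts="$opts --partial"         # -P, first half: keep partially transferred files
opts="$opts --progress"        # -P, second half: show progress during the transfer
opts="$opts --archive"         # -a: recurse and preserve symlinks, permissions, times, owners, devices
opts="$opts --sparse"          # -S: handle sparse files efficiently
opts="$opts --hard-links"      # -H: preserve hard links
opts="$opts --acls"            # -A: preserve ACLs
opts="$opts --xattrs"          # -X: preserve extended attributes
opts="$opts --numeric-ids"     # do not map UIDs/GIDs by user or group name
opts="$opts --info=progress2"  # progress for the whole transfer, not per file

# Print the resulting command instead of running it.
echo "rsync$opts A/ B/"
```

Once the printed command looks right, it can be pasted into a terminal with the real source and destination.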

If you don't trust the filesystem times and file sizes, also throw in
-c to do a (MD5!?) checksum of the files instead, but that's
much slower. (A better hashing algorithm could be SHA-2 or
Meow, obviously.)

The --numeric-ids parameter is really relevant only when you archive
files across servers that might not share the same UID space. This is
especially important when restoring from backups because you might be
creating /etc/passwd along the way (!).
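
To make the UID problem concrete, here is a small sketch using two made-up passwd files standing in for two hosts. The user names, UIDs, file names and helper functions are all invented for the example, and no rsync is involved: it only shows how the same number can mean different people on different machines.

```shell
#!/bin/sh
# Two made-up passwd files standing in for two hosts (hypothetical data).
uid_demo=$(mktemp -d)
printf 'alice:x:1000:1000::/home/alice:/bin/sh\n' > "$uid_demo/passwd.hostA"
printf 'bob:x:1000:1000::/home/bob:/bin/sh\nalice:x:1001:1001::/home/alice:/bin/sh\n' \
    > "$uid_demo/passwd.hostB"

# name_for_uid FILE UID: print the login name that FILE assigns to UID.
name_for_uid() {
    awk -F: -v uid="$2" '$3 == uid { print $1 }' "$1"
}

# uid_for_name FILE NAME: print the UID that FILE assigns to NAME.
uid_for_name() {
    awk -F: -v name="$2" '$1 == name { print $3 }' "$1"
}

owner_on_a=$(name_for_uid "$uid_demo/passwd.hostA" 1000)
owner_on_b=$(name_for_uid "$uid_demo/passwd.hostB" 1000)
alice_uid_on_b=$(uid_for_name "$uid_demo/passwd.hostB" alice)

echo "UID 1000 is $owner_on_a on host A and $owner_on_b on host B"
echo "alice has UID $alice_uid_on_b on host B"

# Clean up the scratch directory.
rm -rf "$uid_demo"
```

In this scenario, a file owned by alice (UID 1000) on host A would, by name mapping, end up owned by UID 1001 on host B. With --numeric-ids it stays at UID 1000, which host B happens to display as bob, but which is the right number once the original /etc/passwd is restored.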

The last bit, --info=progress2, is not directly documented in the
manpage, at least not in the --info section. Strangely, there's some
information in the -P flag where it says:

outputs statistics based on the whole transfer, rather than
individual files.

I found this was extremely useful during large transfers because, by
default, -P (or, more specifically, --progress) shows progress for
each individual file. That's fine if you transfer large files, but
for large transfers (with a large number of files), that's much
less useful and possibly incredibly noisy. According to --info=help,
--info=progress2 instead does this:

PROGRESS Mention 1) per-file progress or 2) total transfer progress

... which I admit is not much clearer.
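
The difference is easier to see than to read about. The following sketch builds a scratch directory with a handful of tiny files and copies it twice, once with each flavor of progress. The directory and file names are made up, it assumes an rsync recent enough to understand --info (3.1.0 or later, as far as I know), and it skips everything if rsync is not installed.

```shell
#!/bin/sh
# Scratch demo comparing per-file and whole-transfer progress output.
# All paths below are hypothetical and live under a temporary directory.
if command -v rsync >/dev/null 2>&1; then
    prog_demo=$(mktemp -d)
    mkdir "$prog_demo/A" "$prog_demo/B1" "$prog_demo/B2"
    for i in 1 2 3 4 5; do
        echo "file $i" > "$prog_demo/A/file$i"
    done

    echo "== --progress: progress reported for each file =="
    rsync -a --progress "$prog_demo/A/" "$prog_demo/B1/"

    echo "== --info=progress2: progress reported for the whole transfer =="
    rsync -a --info=progress2 "$prog_demo/A/" "$prog_demo/B2/"
else
    echo "rsync not found, skipping demo"
fi
```

With five files the difference is cosmetic; with a few hundred thousand, the first form scrolls forever while the second keeps updating a single line.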

Note that this is similar to how at least one backup system checks
itself in its test suite: against, interestingly, rsync. Indeed, bup uses
rsync to check that the files it restores are identical to the
original. They use the also super-intuitive -niaHAX (maybe with
-c), which I find slightly less intuitive than my ordering, which
sounds like "pacha" (formerly "fax") in French.
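
The same trick works outside of a test suite. Below is a rough sketch of a "did my copy work?" check built on -niaHAX. The function name trees_match and the scratch paths are made up, and this is not taken from bup's code: it only relies on -n (dry run) and -i (itemize changes) printing nothing when there is nothing to transfer. Note that, without --delete, extra files on the destination side go unnoticed.

```shell
#!/bin/sh
# trees_match SRC DST: succeed when a dry run finds nothing to transfer.
# Hypothetical helper for illustration, not bup's actual test code.
trees_match() {
    # -n: dry run, -i: itemize changes; empty output means no differences.
    # Return 2 if rsync itself fails (for example, no ACL support).
    changes=$(rsync -niaHAX "$1/" "$2/") || return 2
    [ -z "$changes" ]
}

if command -v rsync >/dev/null 2>&1; then
    verify_demo=$(mktemp -d)
    mkdir "$verify_demo/A" "$verify_demo/B"
    echo data > "$verify_demo/A/f"

    # Make a faithful copy, then check it.
    rsync -aHAX "$verify_demo/A/" "$verify_demo/B/"
    if trees_match "$verify_demo/A" "$verify_demo/B"; then
        match_before=yes
    else
        match_before=no
    fi

    # Change the source (the size differs, so the quick check notices).
    echo changed > "$verify_demo/A/f"
    if trees_match "$verify_demo/A" "$verify_demo/B"; then
        match_after=yes
    else
        match_after=no
    fi

    echo "after copy: match=$match_before, after edit: match=$match_after"
fi
```

Adding -c to the rsync call inside the function would compare file contents instead of trusting sizes and timestamps, at the cost of reading everything on both sides.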

So there you go. -PaSHAX is now your new best friend. And don't
forget the obvious --numeric-ids (and not uids, they talk
about groups too) and --info=progress2 (grrr) and maybe
--checksum if you're nostalgic about the good old MD5 days.

Notice the trailing slashes at the end of A/ and B/. Those,
stupidly, matter to rsync. This is one of the most confusing things
about rsync and I have gotten around that problem by always
specifying a trailing slash to both arguments, which gives a
consistent experience all the time. But, if you want to know all the
nasty details, try to figure out this bit:

A trailing slash on the source changes this behavior to avoid
creating an additional directory level at the destination. You can
think of a trailing / on a source as meaning "copy the contents of
this directory" as opposed to "copy the directory by name", but in
both cases the attributes of the containing directory are
transferred to the containing directory on the destination. In other
words, each of the following commands copies the files in the same
way, including their setting of the attributes of /dest/foo:

rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo

They omitted, obviously, that this is also identical:

rsync -av /src/foo/ /dest/foo/

At this point, I would understand if you want to throw the "fine
manual" out the window and yell like crazy.
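
If, like me, you only believe it when you see it, here is a sketch that tries the slash variants in a scratch directory. The directory names are made up, and the third command is my own addition for contrast (a trailing slash on the source without "foo" in the destination); the whole thing is skipped if rsync is not installed.

```shell
#!/bin/sh
# Scratch demo of how a trailing slash on the source changes the result.
# All paths are hypothetical and live under a temporary directory.
if command -v rsync >/dev/null 2>&1; then
    slash_demo=$(mktemp -d)
    mkdir -p "$slash_demo/src/foo" "$slash_demo/dest1" \
             "$slash_demo/dest2" "$slash_demo/dest3"
    echo hello > "$slash_demo/src/foo/file"

    # No trailing slash: "foo" itself is copied, giving dest1/foo/file.
    rsync -a "$slash_demo/src/foo" "$slash_demo/dest1"

    # Trailing slash, destination names foo: contents land in dest2/foo/file.
    rsync -a "$slash_demo/src/foo/" "$slash_demo/dest2/foo"

    # Trailing slash, destination does not name foo: contents land in dest3/file.
    rsync -a "$slash_demo/src/foo/" "$slash_demo/dest3"

    # Show where each copy of "file" ended up.
    find "$slash_demo" -name file | sort
fi
```

The first two commands produce the same layout, as the manual says, while the third silently drops the "foo" level, which is exactly the surprise that always using trailing slashes on both sides avoids.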

update: added -S
On pabs's recommendation, I also added -S, changing the acronym from "fax" (-PHaAX) to "pacha(x)" (-PaSHAX) which still sounds good and is a better mapping to the transliteration...
Comment by anarcat, mid-morning on Monday, July 8th, 2019