The Art of “rsync”

As a migratory systems engineer, I have lived, or stayed for extended periods, in cities all over my country, the United States of America. As a result, I belong to many mailing lists and technical groups across CONUS (CONtinental United States). One of them is the DCLUG, the Washington, DC Linux Users Group. A recent thread there covered a very mundane topic: “rsync” and its behavior when making incremental copies. A member of the group, Michael Henry, replied with a very in-depth answer, and I felt it should be recorded for posterity’s sake; even I, a Unix/Linux user for over 20 years, learned some rsync nuance from this walk-through. His reply is reproduced here.

Alan (Original Post) wrote:

Assembled Wisdom!

I need rsync for something very primitive: to copy incremental additions and
subtractions from directories on my home hard drive to a thumb drive.
I have just installed rsync and have a couple of tutorials installed. But
they seem to be much more complex than I need. Of course I could
experiment, but I’m lazy and hope that someone can tell me how he/she
does it. For I’m sure that y’all use rsync, right?

TIA for anticipated help/guidance!

Alan

Peter wrote:
> I’m not sure if you posed the question well enough, or my
> interpretation sucks. Is the goal that you want to mirror
> things on your thumb drive? In that case the
>
> rsync -av --delete SRC DEST
>
> is what I use. The man page will explain what the flags do.

Alan wrote:
> Yes. This seems short and sweet. I want to be sure
> that nothing in the ‘DEST’ — the thumb drive — is ever
> deleted. Except of course: suppose I
> fix/alter/shorten/improve some script that I have
> written, do I want my back-up to contain only the new
> version, or should I keep the old version as a
> ‘historical record’? Of course this is a personal
> decision that I must work out for myself<g>

Michael Henry wrote:

Peter’s invocation includes the “--delete” flag, which deletes
any files in the destination that aren’t present in the
source. Since you’d like never to delete anything, you wouldn’t
want to use “--delete”.

You might like to use the “--dry-run” flag in addition to the
verbose (“-v”) flag to see what “rsync” intends to do. I
find that seeing a dry run can clarify things that aren’t always
crystal clear in the manual.

Also, “rsync” attaches special significance to a trailing slash
on a source directory (“SRC”). With a trailing slash, “rsync”
copies only the contents of the directory to the destination;
without the slash, the directory itself is copied as well,
adding a possibly unwanted extra directory layer in the
destination. Consider some test cases, which can be pasted
directly from this email into a Bash prompt:

sent 417 bytes received 112 bytes 1,058.00 bytes/sec
total size is 0 speedup is 0.00

Now examine the files in the tree:

find . -type f | sort

In the output below, there is a new “src” directory below
“dest” due to the lack of a trailing slash on “src”. Note
also that the file “./src/top-level-file.txt” was not copied
to the corresponding location “./dest/top-level-file.txt”:

sent 364 bytes received 89 bytes 906.00 bytes/sec
total size is 0 speedup is 0.00

“New” files like “top-level-file.txt” are copied, but not
files with the same size and timestamp (such as “file1.txt”).
The invocation with the trailing slash on “src/” means to copy
everything below “src” into “dest”, which is most often what
I’m trying to do.

With no changes to the file trees, a second invocation of
“rsync -a” does nothing:

rsync -av src/ dest/

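The output lists no files, so nothing was transferred (the “total size” line reports the combined size of the source files, which are all empty at this point):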

sending incremental file list

sent 229 bytes received 14 bytes 486.00 bytes/sec
total size is 0 speedup is 0.00

If we now modify a single file and try again, only that file is
transferred:

echo 'changes' >> src/file1.txt
rsync -av src/ dest/

Note that “file1.txt” was transferred along with “./”
(because its modification time changed):

sending incremental file list
./
file1.txt

sent 319 bytes received 40 bytes 718.00 bytes/sec
total size is 8 speedup is 0.02

Finally, add in the “--delete” flag along with “--dry-run”
to see the proposed effects without actually changing anything:

rsync -av --delete --dry-run src/ dest/

Note that “rsync” would delete the extra “src” tree that was
created due to the lack of a trailing slash in the earlier
steps, as well as the “dest-only/” directory and the file
“common/dest-only1.txt”, which don’t exist in “src”:

I find that “rsync” behaves most intuitively for me when I use
the trailing slash on source directories. To simplify things, I
generally use a slash on both sources and destinations, which
makes the rule easier to remember and apply in practice. This
works because the presence or absence of a trailing slash on a
destination directory doesn’t matter to “rsync”.

Also, if you decide you do want to keep historical records of
old versions of your files, I highly recommend “rsnapshot” (http://rsnapshot.org/).

The “rsnapshot” script provides a way of taking
space-efficient “snapshots” of your file tree. Files that
haven’t changed since the previous snapshot do not take up much
extra space because of the way “rsnapshot” uses hard links.
The original article by Mike Rubel, linked from the above web site,
was the inspiration for “rsnapshot”, and as I recall, it was an
enlightening look at the underlying mechanism. The site appears
to be down now, but the Wayback Machine has it: https://web-beta.archive.org/web/20170104080412/http://www.mikerubel.org/computers/rsync_snapshots/

If you want to hand-roll something custom, you may find Mike’s
article a useful starting point.

Another approach is to use a version control system like Git,
Mercurial, Subversion, etc., to track your changes. This works
well, especially for text files. You can track historical
changes to files in a way that is easy to examine later. Using
Git as an example, you could create a Git repository on your
main machine, then “rsync” the entire directory to the thumb
drive as a backup. Things that are deleted will still be in the
Git history, so it’s safe to use “rsync --delete” to back up
the tree. Depending on your needs, this might be another
approach to pursue, though the learning curve is steeper than
with “rsnapshot” if you haven’t used source control in the
past.