Of course, this product is for someone—and to those would-be users, this really will matter. Fully appreciating the new rsync.net (spoiler alert: it's pretty impressive!) means first having a grasp on basic data transfer technologies. And while ZFS replication techniques are burgeoning today, you must actually begin by examining the technology that ZFS is slowly supplanting.

A love affair with rsync

Revisiting a first love of any kind makes for a romantic trip down memory lane, and that's what revisiting rsync—as in "rsync.net"—feels like for me. It's hard to write an article that's inevitably going to end up trashing the tool, because I've been wildly in love with it for more than 15 years. Andrew Tridgell (of Samba fame) first announced rsync publicly in June of 1996. He used it for three chapters of his PhD thesis three years later, about the time that I discovered and began enthusiastically using it. For what it's worth, the earliest record of my professional involvement with major open source tools—at least that I've discovered—is my activity on the rsync mailing list in the early 2000s.

Rsync is a tool for synchronizing folders and/or files from one location to another. Adhering to true Unix design philosophy, it's a simple tool to use. There is no GUI, no wizard, and you can use it for the most basic of tasks without being hindered by its interface. But somewhat rare for any tool, in my experience, rsync is also very elegant. It makes a task which is humanly intuitive seem simple despite being objectively complex. In common use, rsync looks like this:

root@test:~# rsync -ha --progress /source/folder /target/

Invoking this command will make sure that once it's over with, there will be a /target/folder, and it will contain all of the same files that the original /source/folder contains. Simple, right? Since we invoked the argument -a (for archive), the sync will be recursive, and the timestamps, ownership, permissions, and other basic attributes of the files and folders involved will be the same on the target as they are on the source. Since we invoked -h, we'll get human-readable units (like G, M, and K rather than raw bytes, as appropriate). And --progress means we'll get a nice per-file progress display showing how fast the transfer is going.
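To try the command without touching real data, here is a minimal sketch that runs it against throwaway directories. The scratch paths and the file name a.txt are made up for illustration, and it assumes a stock rsync 3.x is installed.

```shell
#!/bin/sh
# Local reproduction of the article's command using throwaway directories.
# Skip quietly if rsync is not installed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                       # hypothetical scratch area
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/source/folder" "$work/target"
echo "hello" > "$work/source/folder/a.txt"
# Give the file an old, whole-second timestamp so preservation is easy to spot.
touch -t 200101010000 "$work/source/folder/a.txt"

# No trailing slash on the source, so "folder" itself is created inside target/.
rsync -ha --progress "$work/source/folder" "$work/target/"

# The copy should show the 2001 timestamp, not the time of the copy.
ls -l "$work/target/folder"
```

Writing the source as /source/folder/ (with a trailing slash) would instead copy the contents of the folder directly into /target/, which is a common source of surprises.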

So far, this isn't much more than a kinda-nice version of copy. But where it gets interesting is when /target/folder already exists. In that case, rsync will compare each of those files in /source/folder with its counterpart in /target/folder, and it will only update the latter if the source has changed. This keeps everything in the target updated with the least amount of thrashing necessary. This is much cleaner than doing a brute-force copy of everything, changed or not!
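The "only update what changed" behavior is easy to watch with rsync's -i (itemize changes) flag. The sketch below is an illustration under assumed names (scratch directories, a.txt, b.txt), not anything from the original tests: it syncs twice, then edits one file and syncs a third time.

```shell
#!/bin/sh
# Show that a repeat sync only touches files that actually changed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                       # hypothetical scratch area
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/source/folder" "$work/target"
echo "one" > "$work/source/folder/a.txt"
echo "two" > "$work/source/folder/b.txt"

# First pass: target/folder does not exist yet, so everything is copied.
rsync -a "$work/source/folder" "$work/target/"

# Second pass: -i lists each change rsync makes; unchanged files are not listed.
second_pass=$(rsync -ai "$work/source/folder" "$work/target/")

# Change one file (its size changes too, so the default size+mtime check sees it).
echo "changed content" > "$work/source/folder/a.txt"
third_pass=$(rsync -ai "$work/source/folder" "$work/target/")

printf 'second pass: [%s]\n' "$second_pass"
printf 'third pass:  [%s]\n' "$third_pass"
```

By default rsync decides "changed or not" from file size and modification time alone, which is why the second pass has nothing to do.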

When rsyncing remotely, rsync still looks over the list of files in the source and target locations, and the tool only messes with files that have changed. It gets even better still—rsync also tokenizes the changed files on each end and then exchanges the tokens to figure out which blocks in the files have changed. Rsync then only moves those individual blocks across the network. (Holy saved bandwidth, Batman!)
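You can see the block-level savings in rsync's own --stats output, which reports "Literal data" (bytes actually sent) separately from "Matched data" (bytes rebuilt from what the target already had). This is a sketch with assumed file names, run locally; because rsync skips the delta algorithm for local copies by default, it passes --no-whole-file to force the behavior you would get over a network.

```shell
#!/bin/sh
# Demonstrate delta transfer: only the changed part of a file is sent.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                       # hypothetical scratch area
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/dst"

# 1 MiB of random data, so blocks do not match each other by accident.
dd if=/dev/urandom of="$work/src/big.bin" bs=1024 count=1024 2>/dev/null
rsync -a "$work/src/" "$work/dst/"

# Append a little data: almost all of the file is still identical on both sides.
echo "a few new bytes" >> "$work/src/big.bin"

# Local copies default to --whole-file, so request the delta algorithm explicitly.
stats=$(rsync -a --no-whole-file --stats "$work/src/" "$work/dst/")
echo "$stats" | grep -E 'Literal data|Matched data'
```

The literal figure should be tiny compared with the matched figure, since only the appended tail has to cross the wire.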

You can go further and further down this rabbit hole of "what can rsync do." Inline compression to save even more bandwidth? Check. A daemon on the server end to expose only certain directories or files, require authentication, only allow certain IPs access, or allow read-only access to one group but write access to another? You got it. Running "rsync" without any arguments gets you a "cheat sheet" of valid command line arguments several pages long.
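As a rough illustration of the daemon features just listed, the sketch below writes a sample rsyncd.conf to a temporary file and prints how it might be used. The module name, path, network range, user names, and host name are all hypothetical; the parameter names themselves come from the rsyncd.conf manual page.

```shell
#!/bin/sh
# Write a sample rsync daemon config; nothing is started or contacted.
conf=$(mktemp)                          # hypothetical location for the config
trap 'rm -f "$conf"' EXIT

cat > "$conf" <<'EOF'
# Hypothetical module exposing a single directory
[backups]
    path = /srv/backups
    # Read-only unless a user is granted more below
    read only = yes
    # Only this address range may connect
    hosts allow = 192.0.2.0/24
    # alice may write, bob gets the module default (read-only)
    auth users = alice:rw bob
    secrets file = /etc/rsyncd.secrets
EOF

cat "$conf"
echo "serve with: rsync --daemon --config=$conf"
# -z turns on inline compression for the transfer.
echo "pull with:  rsync -az bob@server.example::backups/ /local/copy/"
```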

To Windows-only admins whose eyes are glazing over by now: rsync is "kinda like robocopy" in the same way that you might look at a light saber and think it's "kinda like a sword."

If rsync's so great, why is ZFS replication even a thing?

This really is the million dollar question. I hate to admit it, but I'd been using ZFS myself for something like four years before I realized the answer. In order to demonstrate how effective each technology is, let's go to the numbers. I'm using rsync.net's new ZFS replication service on the target end and a Linode VM on the source end. I'm also going to be using my own open source orchestration tool syncoid to greatly simplify the otherwise-tedious process of ZFS replication.
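For a sense of what the "otherwise-tedious" manual process looks like, here is a dry-run sketch that only prints the commands for one incremental replication step. The dataset, snapshot, and host names are hypothetical, and this is not how syncoid is implemented; it is just the general snapshot, send, and receive shape that such a tool automates.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for one incremental ZFS replication.
# Nothing is executed against ZFS, so this is safe to run anywhere.
SRC_DATASET="pool/data"                 # hypothetical source dataset
DST_HOST="user@backup.example"          # hypothetical SSH target
DST_DATASET="backup/data"               # hypothetical target dataset

print_incremental_replication() {
    prev="$1"   # snapshot both sides already have
    cur="$2"    # new snapshot to create and send
    echo "zfs snapshot ${SRC_DATASET}@${cur}"
    echo "zfs send -i ${SRC_DATASET}@${prev} ${SRC_DATASET}@${cur} | ssh ${DST_HOST} zfs receive ${DST_DATASET}"
}

plan=$(print_incremental_replication snap1 snap2)
echo "$plan"
```

Someone doing this by hand also has to track which snapshot both ends last had in common. With syncoid, the equivalent is a single invocation along the lines of `syncoid pool/data user@backup.example:backup/data`.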

First test: what if we copy 1GB of raw data from Linode to rsync.net? Let's start with the old tried-and-true rsync:

Time-wise, there's really not much to look at. With either rsync or ZFS replication, we transfer 1GB of data in two minutes, 36 seconds and change. It is a little interesting to note that rsync ate up 26 seconds of CPU time while ZFS replication used less than three seconds, but still, this race is kind of a snoozefest.

So let's make things more interesting. Now that we have our 1GB of data actually there, what happens if we change it just enough to force a re-synchronization? In order to do so, we'll touch the file, which doesn't do anything but change its timestamp to the current time.

Now things start to get real. Rsync needed 13 seconds to get the job done, while ZFS needed less than two. This problem scales, too. For a touched 8GB file, rsync will take 111.9 seconds to re-synchronize, while ZFS still needs only 1.7.
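A small local experiment suggests where rsync's time goes in the touch case. This is not a rerun of the benchmark above, just a sketch with an assumed 1 MiB file: once the timestamp differs, rsync no longer trusts the file, so it checksums the whole thing on both ends even though it ends up sending no file data.

```shell
#!/bin/sh
# After a bare timestamp change, rsync re-examines the whole file
# but finds nothing to send.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                       # hypothetical scratch area
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src" "$work/dst"

dd if=/dev/urandom of="$work/src/data.bin" bs=1024 count=1024 2>/dev/null
rsync -a "$work/src/" "$work/dst/"

# Change only the timestamp. A fixed date is used here (rather than "now",
# as a plain touch would) so the difference is unambiguous.
touch -t 200101010000 "$work/src/data.bin"

# Force the delta algorithm, as would happen over a network.
stats=$(rsync -a --no-whole-file --stats "$work/src/" "$work/dst/")
echo "$stats" | grep -E 'Literal data|Matched data'
```

Zero literal bytes with the full file size matched means every block had to be read and compared, which is work that grows with file size.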

Touching is not even the worst-case scenario. What if, instead, we move a file from one place to another—or even just rename the folder it's in? For this test, we have synchronized folders containing 8GB of data in /test/linodetest/1. Once we've got that done, we rename /test/linodetest/1 to /test/linodetest/2 and resynchronize. Rsync is up first:

Yep—it took the same old 1.7 seconds for ZFS to re-sync, no matter whether we touched a 1GB file, touched an 8GB file, or even moved an 8GB file from one place to another. In the last test, that's almost three full orders of magnitude faster than rsync: 1.7 seconds versus 1,479.3 seconds. Poor rsync never stood a chance.
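The rename case is easy to reproduce in miniature. In this sketch (tiny files and scratch paths are assumptions, not the 8GB test data), a dry run after renaming the folder shows rsync treating everything under the new name as brand new, because it matches files by path.

```shell
#!/bin/sh
# After a directory rename, rsync sees the files as entirely new.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping"; exit 0; }

work=$(mktemp -d)                       # hypothetical scratch area
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/src/1" "$work/dst"

echo "payload" > "$work/src/1/file.txt"
rsync -a "$work/src/" "$work/dst/"

# Rename the folder on the source side only.
mv "$work/src/1" "$work/src/2"

# -n is a dry run; -i itemizes. A row of plus signs marks a newly created item.
plan=$(rsync -ain "$work/src/" "$work/dst/")
echo "$plan"
```

The target already holds identical data under 1/, but rsync has no way to know that, so 2/file.txt is scheduled for a full transfer. Adding --delete would also remove the stale 1/ directory from the target.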