as well as in /etc/exports (using the syntax of that file), on the first netbook.

The above works fine, but the files and directories are huge. The files average about half a gigabyte apiece, and the directories are all between 15 and 50 gigabytes.

I'm using rsync to transfer them, and the command (on 192.168.1.2) is

$ rsync -avxS /mnt/network1 ~/somedir

I'm not sure if there's a way to tweak my NFS settings to handle huge files better, but I'd like to see if running an rsync daemon over plain old TCP works better than rsync over NFS.

So, to reiterate, how do I set up a similar network with TCP?

UPDATE:

So, after a good few hours of attempting to pull myself out of the morass of my own ignorance (or, as I like to think of it, to pull myself up by my own bootstraps), I came up with some useful facts.

But first of all, what led me on this rabbit trail instead of simply accepting the current best answer was this: nc is an unbelievably cool program that resolutely fails to work for me. I've tried the netcat-openbsd and netcat-traditional packages with no luck whatsoever.

If both directories are local you're better off just using plain old /bin/cp and not using NFS at all
–
Karlson Sep 17 '12 at 12:46

Running rsync against a file accessed over NFS means that the entire contents of the file need to be copied over the network at least once. You don't need a daemon to invoke a client/server rsync; just run it over ssh. (It's theoretically possible to invoke the remote end over telnet/rsh, but rather silly to run such a service in practice; ssh doesn't add a lot of overhead.)
–
symcbean Sep 17 '12 at 12:55

Actually, ssh adds a pretty large amount of overhead; crypto is not cheap. At normal Internet speeds it doesn't matter, but over a LAN (or a direct cross-connect, in this case) you may notice. Over gigabit, except on the very fastest machines (or ones with AES-NI instructions, if SSH uses them), I'm pretty sure it'll be noticeable.
–
derobert Sep 17 '12 at 16:49

2 Answers

The quick way

The quickest way to transfer files over a LAN is likely not rsync, unless there are few changes. rsync spends a fair bit of time doing checksums, calculating differences, etc. If you know that you're going to be transferring most of the data anyway, just do something like this:
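For example, with the destination listening first (the IP address, port, and paths here are placeholders):

```shell
# On the destination machine: listen on TCP port 1234 and unpack
# the incoming tar stream into the current directory
user@dest:/target$ nc -q 1 -l -p 1234 | tar xv

# On the source machine: tar up the directory and send it
user@source:/source$ tar cvf - . | nc -q 1 192.168.1.2 1234
```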

That uses netcat (nc) to send tar over a raw TCP connection on port 1234. There is no encryption, authenticity checking, etc., so it's very fast. If your cross-connect is running at 100 Mbps, you'll peg the network; if it's gigabit, you'll peg the disk (unless you have a storage array or fast disks). The v flags to tar make it print file names as it goes (verbose mode). With large files, that's practically no overhead. If you were doing tons of small files, you'd turn that off. Also, you can insert something like pv into the pipeline to get a progress indicator:

user@dest:/target$ nc -q 1 -l -p 1234 | pv -pterb -s 100G | tar xv

You can of course insert other things too, like gzip -1 (and add the z flag on the receiving end; the z flag on the sending end would use a compression level higher than 1, unless you set the GZIP environment variable, of course). Though gzip will probably actually be slower, unless your data really compresses.
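A sketch of that compressed variant (addresses and paths are again placeholders):

```shell
# Source: compress the stream with gzip's fastest level before sending
user@source:/source$ tar cvf - . | gzip -1 | nc -q 1 192.168.1.2 1234

# Destination: the z flag tells tar to gunzip the stream
user@dest:/target$ nc -q 1 -l -p 1234 | tar xvz
```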

If you really need rsync

If you're really only transferring a small portion of the data that has changed, rsync may be faster. You may also want to look at the -W/--whole-file option, as with a really fast network (like a cross-connect) that can be faster.

The easiest way to run rsync is over ssh. You'll want to experiment with ssh ciphers to see which is fastest; it'll be either AES or Blowfish, depending on whether your chip has Intel's AES-NI instructions (and your OpenSSL uses them). rsync-over-ssh looks like this:

Blowfish would be blowfish-cbc, and you could also try aes128-cbc. OpenSSH does not allow running without a cipher. You can of course use whichever rsync options you like in place of -avP. And of course you can go the other direction, and run the rsync from the destination machine (pull) instead of the source machine (push).

Making rsync faster

If you run an rsync daemon, you can get rid of the crypto overhead. First, you'd create a daemon configuration file (/etc/rsyncd.conf), for example on the source machine (read the rsyncd.conf manpage for details):
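A minimal sketch of such a config (the module name big-files is an assumption for illustration):

```
# /etc/rsyncd.conf on the source machine
[big-files]
    path = /mnt/network1
    read only = yes
```

Then start the daemon with rsync --daemon on the source, and pull from the destination with rsync -avP source-ip::big-files /target (the double colon selects the daemon protocol, skipping ssh and its crypto).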

How? Or TL;DR
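Something along these lines, combining tar, mbuffer and ssh (the file name, host, and buffer parameters are illustrative):

```shell
# Stream a tarball through a 512 MB in-memory buffer, then over ssh
user@source$ tar cf - bigfile.m4v | mbuffer -s 1K -m 512M | ssh otherhost "cd /target && tar xf -"
```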

Using this I've achieved sustained local network transfers over 950 Mb/s on 1Gb links. Replace the paths in each tar command to be appropriate for what you're transferring.

Why? mbuffer!

The biggest bottleneck in transferring large files over a network is, by far, disk I/O. The answer to that is mbuffer or buffer. They are largely similar, but mbuffer has some advantages. The default buffer size is 2 MB for mbuffer and 1 MB for buffer. Larger buffers are more likely never to be empty. Choosing a block size which is the lowest common multiple of the native block sizes on the source and destination filesystems will give the best performance.

Buffering is the thing that makes all the difference! Use it if you have it! If you don't have it, get it! Using (m)?buffer plus anything is better than anything by itself. It is almost literally a panacea for slow network file transfers.

If you're transferring multiple files, use tar to "lump" them together into a single data stream. If it's a single file you can use cat or I/O redirection. The overhead of tar vs. cat is statistically insignificant, so I always use tar (or zfs send where I can) unless it's already a tarball. Neither of these is guaranteed to give you metadata (and in particular cat will not). If you want metadata, I'll leave that as an exercise for you.
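For the single-file case, a buffered pipeline might look like this (file name, host, and buffer settings are placeholders; note that cat carries no metadata at all):

```shell
# Single file: no tar needed, just buffer the raw byte stream
user@source$ cat bigfile.iso | mbuffer -s 1K -m 512M | ssh otherhost "cat > /target/bigfile.iso"
```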

Finally, using ssh for a transport mechanism is both secure and carries very little overhead. Again, the overhead of ssh vs. nc is statistically insignificant.

openssl speed on an i7-3770 gives ~126–146 MB/sec for Blowfish-CBC and ~138–157 MB/sec for AES-CBC (this chip has AES-NI instructions), then ~200–300 MB/sec for SHA-256. So it can just barely push 1 gigabit. With OpenSSH 6.1+, you could use AES-GCM, which it can do at blinding rates (370–1320 MB/sec, depending on message size). So I think it's only true that OpenSSH has little overhead if you are running 6.1+ on a chip with AES-NI and using AES-GCM.
–
derobert Nov 12 '13 at 15:27


Ugh, I changed that to 6.1+ instead of 6.2+ at the last minute, having quickly re-checked. Of course, that was a mistake; the release notes list changes since 6.1, so OpenSSH 6.2+ is the correct version. And it won't let me edit the comment any more now; comments older than 5 minutes must remain incorrect. Of course, if you're on less than OpenSSH 6.4, see openssh.com/txt/gcmrekey.adv: without a patch, there was an exploitable flaw in OpenSSH's AES-GCM implementation.
–
derobert Nov 12 '13 at 15:34

The overhead for ssh (or rsync over ssh) is very, VERY important. I have a NAS that uses an Intel Atom CPU. The SSH encryption ABSOLUTELY TANKS the transfer speed. I consistently get < 400 Mbit/sec for RSA; manually overriding it to RC4 gets me ~600 Mbit/sec, and if I use rsync as a daemon, it runs at the link's native speed (> 900 Mbit/sec, on a gigabit connection).
–
Fake Name Oct 24 '14 at 8:20

While it's true that for many situations the transport is not critical, it is absolutely important to consider it, particularly if you're not running on extremely high-end hardware. In my case, the Atom (it's a D525, dual core, 1.8 GHz) makes for a completely fine NAS, with plenty of speed for SMB, but encryption absolutely kills it.
–
Fake Name Oct 24 '14 at 8:22

I get a fatal error due to the parametrization of mbuffer: 'mbuffer: fatal: total memory must be larger than block size \n Terminated'. To correct it, I suspect it should read something like 'mbuffer -s 1K -m 512M', with the final 'M' standing for MByte (source: man mbuffer).
–
Peter Lustig Apr 19 at 6:52