I'm backing up a Linux box over SMB to a NAS. I mount the NAS locally and then rsync a lot of data (100GB or so). I believe it's taking an awfully long time to do it: more than 12 hours. I would expect it to be much faster once everything is copied, since almost nothing changes from day to day.

Is there a way to speed this up?

I was thinking that maybe rsync thinks it's working with local hard disks and uses checksums instead of size/time comparisons? But I didn't find a way to force size and time comparisons. Is there anything else I could check?

Check the NAS's capabilities with a port scanner such as nmap. I've run into several NAS units that ran a native rsync service even though there was no mention of it in the documentation or in the config.
–
Kyle__ Aug 16 '11 at 17:36
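If nmap isn't handy, the same check can be sketched in a few lines of Python. This is a minimal probe, not a full scanner: it assumes the NAS would run its daemon on rsync's default TCP port 873 and that the daemon opens the conversation with its usual `@RSYNCD:` greeting line. The function name is hypothetical.

```python
import socket


def probe_rsync_daemon(host: str, port: int = 873, timeout: float = 3.0):
    """Return the rsync daemon's greeting line (e.g. '@RSYNCD: 31.0'), or None if none answers."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = b""
            # The daemon speaks first: read until the end of its greeting line.
            while b"\n" not in data and len(data) < 1024:
                chunk = sock.recv(256)
                if not chunk:
                    break
                data += chunk
    except OSError:  # connection refused, host unreachable, or timed out
        return None
    first_line = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    return first_line if first_line.startswith("@RSYNCD:") else None
```

Calling it as `probe_rsync_daemon("nas.local")`, with your NAS's host name or IP address in place of that placeholder, returns the greeting if a daemon is listening and `None` otherwise.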

7 Answers

I think you misunderstand the rsync algorithm and how the tool should be applied.

Rsync's performance advantage comes from doing delta transfers, that is, moving only the changed bits of a file. To determine the changed bits, the file has to be read by both the source and destination hosts, and block checksums are compared to find out which bits changed. This is the "magic" part of rsync: the rsync algorithm itself.
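To make the idea concrete, here is a toy, single-process sketch of a delta transfer. It is an illustration under simplifying assumptions, not rsync's implementation: the receiver hashes fixed-size blocks of its old copy, and the sender slides a window over the new file using an Adler-32-style rolling checksum modelled on the one rsync uses, confirming candidate matches with a strong hash. MD5 stands in for the strong checksum, the 4-byte block size is only for readability, and there is no network, compression, or handling of a short final block.

```python
import hashlib

BLOCK = 4      # toy block size; real rsync picks a much larger one per file
MOD = 1 << 16


def weak_sum(block: bytes) -> tuple:
    """The two 16-bit halves (a, b) of the rolling checksum."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def signature(old: bytes) -> dict:
    """Receiver side: weak key -> [(block index, strong hash)] for each full block of its copy."""
    sigs = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        a, b = weak_sum(block)
        sigs.setdefault(a | (b << 16), []).append((start // BLOCK, hashlib.md5(block).digest()))
    return sigs


def delta(new: bytes, sigs: dict) -> list:
    """Sender side: emit ("copy", block index) for matched blocks, ("data", bytes) otherwise."""
    ops, literal, i, ab = [], bytearray(), 0, None
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        if ab is None:
            ab = weak_sum(window)
        a, b = ab
        match, strong = None, None
        for idx, digest in sigs.get(a | (b << 16), []):
            strong = strong or hashlib.md5(window).digest()  # only hash on a weak hit
            if digest == strong:
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += BLOCK
            ab = None  # start a fresh window after the matched block
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                # Roll the checksum: drop new[i], take in new[i + BLOCK], no rescan.
                a = (a - new[i] + new[i + BLOCK]) % MOD
                b = (b - BLOCK * new[i] + a) % MOD
                ab = (a, b)
            i += 1
    literal += new[i:]
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)
```

For `old = b"the quick brown fox jumps"` and `new = b"the quick red fox jumps!"`, `delta(new, signature(old))` sends four block references plus only the literal bytes `b"k red "` and `b"s!"`, and `patch` rebuilds `new` from them. Note that both `signature` and `patch` have to read the old file, which is exactly what becomes expensive when the "old file" lives on the far side of an SMB mount.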

When you mount the destination volume with SMB and use rsync to copy files between what Linux "sees" as a local source and a local destination (both mounted on that machine), most modern rsync versions switch to 'whole file' copy mode and turn off the delta-copy algorithm. This is a "win" because, with the delta-copy algorithm on, rsync would have to read the entire destination file (over the wire from the NAS) to determine which bits of the file have changed.

The "right way" to use rsync is to run the rsync server on one machine and the rsync client on the other. Each machine reads files from its own local storage (which should be very fast), the two agree on which bits of the files have changed, and only those bits are transferred. The way you're using rsync amounts to a trumped-up 'cp'. You could accomplish the same thing with 'cp' and it would probably be faster.

If your NAS device supports running an rsync server (or client) then you're in business. If you're just going to mount it on the source machine via SMB then you might as well just use 'cp' to copy the files.

I can't run an rsync server on the NAS; otherwise I would be doing so. When not using an rsync server, rsync can use either a checksum or the size and modification time to find out whether a file has changed. According to the man page, it uses size and modification time by default, but in my experience it isn't doing that, and I don't see a way to force it. I only see a way to force checksumming. From the man page entry for --checksum: "Without this option, rsync uses a 'quick check' that (by default) checks if each file's size and time of last modification match between the sender and receiver."
–
J. Pablo Fernández Sep 24 '09 at 8:46

What behaviour are you seeing that tells you it's checksumming the files? The "quick check" behaviour is the default, so there's no way to "force" it. If you can't run rsync on the NAS, just use 'cp'. It'll be as fast or faster.
–
Evan Anderson Sep 24 '09 at 8:52

As I understand how rsync works, it should compare the local date and time with the remote date and time and, if they match, not copy the file. That means it shouldn't copy 99% of the files, but the fact that it takes more than 12 hours for 60GB or so tells me that it is either copying everything (which seems to be what you're implying by saying that cp will be faster) or actually checksumming, which means it isn't copying everything but it is downloading everything.
–
J. Pablo Fernández Sep 24 '09 at 8:59
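The "quick check" discussed in these comments can be sketched in Python. This is a simplified model, not rsync's code: it assumes a destination file is skipped when it exists and its size and whole-second modification time match the source (comparing whole seconds is rsync's default behaviour), and the helper names `needs_transfer` and `plan` are hypothetical. It reads no file contents, but note that over an SMB mount every `stat()` on the destination side is still a request to the NAS.

```python
from pathlib import Path


def needs_transfer(src: Path, dst: Path) -> bool:
    """Quick check: copy unless the destination exists with the same size and mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Only metadata is compared; neither file's contents are read.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def plan(src_root: Path, dst_root: Path) -> list:
    """Relative paths of the files a quick-check pass would copy."""
    todo = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(src_root)
            if needs_transfer(path, dst_root / rel):
                todo.append(rel)
    return todo
```

Pointing `plan()` at the local source tree and the mounted NAS directory lists what a size-and-time comparison would resend; if nearly every file shows up even though nothing changed, the sizes or timestamps on the NAS side are not matching.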

My brother has just installed a Buffalo NAS on his office network. He's now looking at off-site backups, so that should the office burn down, at least he still has all his business documents elsewhere (many hundreds of miles away).

My first hurdle was to get the VPS he has (a small Linux virtual private server, nothing too beefy) to dial in as a VPN user to his broadband router (he's using a DrayTek for this), so that it could be part of his VPN and access the NAS directly and securely. Got that sorted and working brilliantly.

The next problem was transferring the files from the NAS to the VPS. I started off with a Samba mount and ran into exactly the same (or even worse) issue that you've described. A dry-run rsync took over 1 hour 30 minutes just to work out which files it was going to transfer because, as Evan says, with this method the other end isn't running rsync, so it has to make many file system calls/reads on the Samba mount (across a PPTP tunnelled connection with a round-trip time of about 40ms). Completely unworkable.

Little did I know that the Buffalo actually runs an rsync daemon. Using that instead, the entire dry run takes only 1 minute 30 seconds for 87k files totalling 50GB. Obviously, transferring 50GB of files from a NAS on a broadband link with only 100k/sec of outbound bandwidth is another matter entirely (this will take several days), but once the initial rsync is complete, incremental backups should be greased lightning (his data is not going to change much on a daily basis).
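A back-of-the-envelope check of the numbers in the two paragraphs above. The file count and round-trip time come from this answer; the rest are assumptions: at least one SMB round trip per file during the dry run, the total taken as 50 decimal gigabytes, and "100k/sec" read as 100 kB/s.

```python
FILES = 87_000
RTT_MS = 40
ROUND_TRIPS_PER_FILE = 1           # assumption: at least one SMB request per file

latency_floor_s = FILES * RTT_MS * ROUND_TRIPS_PER_FILE / 1000
print(f"metadata latency floor: {latency_floor_s / 60:.0f} min")  # → metadata latency floor: 58 min

TOTAL_BYTES = 50 * 10**9           # assumption: 50 GB, decimal units
UPLINK_BYTES_PER_S = 100 * 10**3   # assumption: "100k/sec" means 100 kB/s
transfer_s = TOTAL_BYTES / UPLINK_BYTES_PER_S
print(f"initial upload: {transfer_s / 86400:.1f} days")  # → initial upload: 5.8 days
```

With one round trip per file, latency alone costs about an hour, so the observed 1 h 30 min dry run is consistent with a bit more than one round trip per file. An rsync daemon avoids this by walking the file list on the NAS itself and sending it in bulk.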

My suggestion is to use a decent NAS that supports rsync, for the reasons Evan gave above. It will solve all your problems.

Yes, you can speed it up. You need to make either the source or destination look like a remote machine, say by addressing it as "localhost:".

You stated that you mount the SMB share locally. This makes the source or destination look like a local path to rsync. The rsync man page states that when both source and destination are local paths, rsync copies whole files; this is in the paragraph for the "--whole-file" option. Therefore, the delta algorithm isn't used. The "localhost:" workaround restores the delta algorithm and will speed up transfers.
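For illustration, a small Python wrapper that builds the command this answer describes, with the destination addressed as `localhost:` so rsync treats it as remote. The paths are hypothetical (a local data directory and the directory where the SMB share is mounted), and it assumes an SSH server is running on the local machine that rsync's default remote shell can log in to without a password prompt. `-a`, `--stats` and `--dry-run` are standard rsync options.

```python
import shlex


def rsync_via_localhost(src: str, dst: str, dry_run: bool = True) -> list:
    """Build an rsync argv whose destination goes through the "localhost:" remote syntax."""
    cmd = ["rsync", "-a", "--stats"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be transferred without copying
    cmd += [src, f"localhost:{dst}"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: local data and the mount point of the NAS share.
    print(shlex.join(rsync_via_localhost("/srv/data/", "/mnt/nas/backup/")))
```

Once the dry run's `--stats` output looks sensible, building the command with `dry_run=False` and passing the list to `subprocess.run(cmd, check=True)` would perform the actual copy.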

Smells like you have a cheap NAS. It could also be your network bandwidth.

"Standard" consumer NAS units are really weak when it comes to heavy I/O, which is what you are doing here. It could also be a cheap switch between your PC and your NAS that can't handle all the packets properly.

There are two potential sources of the problem: either you're using incorrect command-line options or your NAS has issues with timestamps (or both :-). Please check the thread "rsync to NAS copies everything every time" for more info.
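One way timestamp issues can make rsync recopy everything is sketched below. It assumes, purely for illustration, a NAS that stores modification times rounded down to 2-second steps (as FAT filesystems do); the function names are hypothetical. The tolerance parameter mirrors the idea behind rsync's real `--modify-window` option.

```python
def mtimes_match(src_mtime: int, dst_mtime: int, modify_window: int = 0) -> bool:
    """Treat two whole-second mtimes as equal if they differ by at most modify_window."""
    return abs(src_mtime - dst_mtime) <= modify_window


def nas_stored_mtime(mtime: int) -> int:
    """Hypothetical NAS that keeps only even seconds (FAT-style 2-second resolution)."""
    return mtime - mtime % 2


def files_resent(mtimes: list, modify_window: int = 0) -> int:
    """How many unchanged files a quick check would still flag after a trip through the NAS."""
    return sum(not mtimes_match(t, nas_stored_mtime(t), modify_window) for t in mtimes)


if __name__ == "__main__":
    sample = list(range(1_700_000_000, 1_700_000_010))  # ten files with consecutive mtimes
    print(files_resent(sample))                   # → 5 (every odd-second file looks "changed")
    print(files_resent(sample, modify_window=1))  # → 0 (tolerated, like --modify-window=1)
```

If the sizes match but the NAS's timestamps are consistently off in this way, a tolerance such as `--modify-window=1` is the usual remedy; a larger constant offset (for example, a time zone difference) would instead show up as every file being resent.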