I need to back up many sites over slow, high-latency satellite links; we are talking pings of 650 ms to each site. I can get a dump of the initial data by sending a USB disk out to each site and restoring it at the central office. From then on I intend to use rsync or DFS-R for byte-level incremental copies across the link. All the machines are Windows Server 2003 R2 SP2. I have read that rsync can hang on large files and is still a bit flaky on Windows. Is that true?

Alternatively, should I use DFS-R, which also sends only the changed parts of files (using Remote Differential Compression)?

I tried DFS-R in the past and was not impressed by the lack of logging; it was very hard to find out what was actually going on. That's why I'm interested in rsync. Does anyone have real-world experience of both methods?
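For readers wondering how rsync manages byte-level incremental copies at all: it slides a window over the file and uses a cheap "rolling" checksum that can be updated in constant time as the window moves one byte, so blocks the receiver already has are found even if they have shifted position. The following is a minimal sketch of that weak checksum as described in the rsync technical report. It is an illustration, not rsync's code: real rsync confirms candidate matches with a strong hash and wraps everything in a network protocol, and here a plain byte comparison stands in for the strong hash. The function names are hypothetical.

```python
# Weak rolling checksum in the style of the rsync technical report.
# Sketch only: real rsync also uses a strong hash and a network protocol.

M = 1 << 16  # both halves of the checksum are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute (a, b) for a whole block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Byte i (0-based) is weighted by n - i, so earlier bytes count more.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def digest(a: int, b: int) -> int:
    """Combine the two 16-bit halves into one 32-bit value."""
    return a | (b << 16)


def find_block(data: bytes, block: bytes) -> list[int]:
    """Return every offset in data where block occurs, using the rolling
    checksum to filter candidates and a direct comparison to confirm them."""
    n = len(block)
    if n == 0 or len(data) < n:
        return []
    target = digest(*weak_checksum(block))
    a, b = weak_checksum(data[:n])
    hits = []
    for k in range(len(data) - n + 1):
        # The byte comparison stands in for rsync's strong-hash check.
        if digest(a, b) == target and data[k:k + n] == block:
            hits.append(k)
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return hits
```

For example, `find_block(b"xxhello worldyy", b"hello")` finds the block at offset 2 even though nothing lines up on a block boundary; that shift tolerance is what keeps the transferred deltas small after an insertion near the start of a file.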

Unix and Windows permissions work in very different ways, and mapping between them is not a trivial process. Unless you set the environment variable CYGWIN=nontsec, Cygwin rsync will attempt to copy the ACL attached to each file, but it will almost certainly mangle it in the process and the results won't be useful.

One obvious problem is that an Access Control Entry (ACE, an entry in an Access Control List) identifies the user or group it applies to by a security identifier (SID). While there are some well-known SIDs that mean the same thing on all Windows installations, e.g. SYSTEM and Everyone, most SIDs only have meaning on the server or domain in which they were created. So if an ACL is rsynced to another server, it will probably contain SIDs that don't refer to any user account on that server, and the ACL will be useless.
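To make the SID problem concrete, here is a hypothetical sketch that sorts the SIDs in a copied ACL into those that will still resolve on the destination server and those that won't. The well-known SIDs listed are real (Everyone is S-1-1-0, SYSTEM is S-1-5-18, and BUILTIN\Administrators and BUILTIN\Users are S-1-5-32-544 and S-1-5-32-545), and account SIDs issued by a particular machine or domain have the form S-1-5-21-x-y-z-RID. The example SIDs in the usage line, the function names, and the `target_accounts` set (standing in for "accounts the destination server knows about", which includes domain accounts if both servers share a domain) are made up for illustration.

```python
# Hypothetical sketch: which SIDs in a copied ACL will still mean something
# on the destination server?

# A few universal well-known SIDs (same meaning on every Windows install).
WELL_KNOWN = {
    "S-1-1-0": "Everyone",
    "S-1-5-18": "SYSTEM",
    "S-1-5-32-544": "BUILTIN\\Administrators",
    "S-1-5-32-545": "BUILTIN\\Users",
}


def classify_sid(sid: str) -> str:
    """Rough classification of a SID string."""
    if sid in WELL_KNOWN:
        return "well-known"
    # S-1-5-21-x-y-z-RID: an account issued by one particular machine or domain.
    if sid.startswith("S-1-5-21-"):
        return "machine/domain-specific"
    return "other"


def unresolvable(acl_sids: list[str], target_accounts: set[str]) -> list[str]:
    """SIDs from the source ACL that the target server cannot map to an account."""
    return [s for s in acl_sids
            if classify_sid(s) != "well-known" and s not in target_accounts]
```

With a made-up ACL of `["S-1-1-0", "S-1-5-21-1111-2222-3333-1001"]` copied to a standalone server whose only local account is `S-1-5-21-4444-5555-6666-1001`, the Everyone entry survives but the other ACE points at an account that doesn't exist there.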

This applies to any program that copies files, not just rsync. A cynic might suggest that Cygwin rsync is especially good at mangling ACLs, but that seems unfair to me, since handling Windows security is so hard for Cygwin to do.

Bearing in mind that, with inherited permissions, you normally only have to set permissions on a few high-level folders, I strongly recommend you set CYGWIN=nontsec and don't attempt to replicate security.
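If the sync is driven from a script rather than a batch file, one way to apply this advice is to build the environment for the rsync process explicitly. This is a minimal sketch under stated assumptions: the function names are hypothetical, it targets the Cygwin 1.5-era builds discussed here (as far as I know, later Cygwin releases replaced the ntsec/nontsec setting with the `noacl` mount option), and it uses `-rt` rather than `-a` because `-a` implies `-p`, `-o` and `-g`, i.e. trying to carry permissions and ownership across.

```python
import os
from typing import List, Mapping, Optional


def nontsec_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with 'nontsec' in CYGWIN and any 'ntsec' removed."""
    env = dict(os.environ if base is None else base)
    # Keep any other CYGWIN options, drop conflicting security settings.
    opts = [o for o in env.get("CYGWIN", "").split() if o not in ("ntsec", "nontsec")]
    opts.append("nontsec")
    env["CYGWIN"] = " ".join(opts)
    return env


def rsync_argv(src: str, dest: str) -> List[str]:
    """-r recurses and -t preserves modification times. -a is avoided on
    purpose because it implies -p/-o/-g (permissions and ownership)."""
    return ["rsync", "-rt", src, dest]
```

Usage would be along the lines of `subprocess.run(rsync_argv("/cygdrive/d/data/", "backup@central::data"), env=nontsec_env())`, where the source path, host and rsync module name are hypothetical placeholders.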

Thanks, I've used your page before to set up rsync. I'll try the batch files as well to automate the process. You might want to update your page, because the latest rsync (3.04?) needed an extra Cygwin DLL (cygiconv-2.dll).
–
PowerApp101, Jun 17 '09 at 12:10

One more thing: when I played with rsync, I found it didn't copy any NTFS permissions. Worse, I had no access to the copied folder until I changed its permissions. Is this expected behaviour? Is there any way to get rsync to copy permissions on Windows?
–
PowerApp101, Jun 17 '09 at 12:11

Ah yes, I do need to update my article to include the new DLL. I've edited my answer to include more about security, as it's too long to fit in a comment.
–
John Rennie, Jun 17 '09 at 13:40

I tried rsync on a large directory of 40,000 files (16 GB) over a slow link. It failed with "read error: connection reset by peer (104)". Could it be that rsync is not robust over very slow links with a large number of files?
–
PowerApp101, Jun 19 '09 at 7:37

Was this over the LAN? "read error: connection reset by peer (104)" normally means the network connection between the two computers failed. I see it a lot when syncing over ADSL, because ADSL lines drop out fairly frequently. The other possible reason is that rsync at the remote end crashed, but I've never had that happen. If you restart the sync, it will pick up from where it left off: rsync won't resend files that already made it across, and with --partial a partially transferred file is kept so the next run can build on it rather than starting that file from scratch.
–
John Rennie, Jun 19 '09 at 8:09
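Since restarting the sync resumes the copy, an unattended job over an unreliable link can simply rerun rsync until it succeeds. Here is a minimal sketch of that idea. The `--partial` and `--timeout=SECONDS` options are real rsync options, but the 300-second timeout, the retry count, the delay and the function names are hypothetical choices; `runner` and `sleep` are parameters only so the loop can be exercised without a real link.

```python
import subprocess
import time
from typing import Callable, List, Optional


def rsync_resume_argv(src: str, dest: str, io_timeout_s: int = 300) -> List[str]:
    # --partial keeps partly transferred files so the next run can reuse them;
    # --timeout makes rsync give up if no data moves for io_timeout_s seconds.
    return ["rsync", "-rt", "--partial", f"--timeout={io_timeout_s}", src, dest]


def run_until_success(argv: List[str], max_attempts: int = 10, delay_s: float = 60.0,
                      runner: Optional[Callable[[List[str]], int]] = None,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Rerun the command until it exits 0 or attempts run out.
    Returns the last exit code (0 on success, -1 if it never ran)."""
    if runner is None:
        def runner(a: List[str]) -> int:
            return subprocess.run(a).returncode
    code = -1
    for attempt in range(1, max_attempts + 1):
        code = runner(argv)
        if code == 0:
            break
        if attempt < max_attempts:
            sleep(delay_s)  # give the link a chance to recover before retrying
    return code
```

Retrying blindly on any non-zero exit code is deliberately simple; a real script might log each failure so there is a record of how often the link drops.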

Given that you had 650 ms latency to start with, that's probably the root cause of the network failure message: if the link got busy, or some other factor increased the latency, something may have timed out. I'm not massively familiar with rsync, but if there's a config option to increase time-out periods and retry counts, I would try that and see what happens.

What sort of bandwidth links are these?
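Bandwidth matters here alongside latency, because a single TCP connection can have at most one receive window of unacknowledged data in flight per round trip, so its throughput is capped at window / RTT regardless of the link's raw speed. The sketch below does that arithmetic. The assumptions are labelled: a 65,535-byte window (the maximum without RFC 1323 window scaling), the 650 ms round trip from the question, and the 16 GB directory from the earlier comment sent in full. After seeding, rsync only sends changes, so the full-copy figure is the worst case and mainly shows why seeding by USB disk is worthwhile.

```python
def max_tcp_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound in bytes/s for one TCP connection: at most one receive
    window can be unacknowledged per round trip."""
    return window_bytes / rtt_s


def transfer_hours(total_bytes: float, bytes_per_s: float) -> float:
    """Time in hours to move total_bytes at a steady bytes_per_s."""
    return total_bytes / bytes_per_s / 3600


# Assumptions: 65,535-byte window (no window scaling), 650 ms round trip,
# and the whole 16 GB directory sent in full.
rate = max_tcp_throughput(65535, 0.65)
print(f"{rate / 1000:.1f} kB/s ({rate * 8 / 1e6:.2f} Mbit/s)")
print(f"{transfer_hours(16e9, rate):.1f} hours for 16 GB")
```

Under those assumptions the ceiling is roughly 100 kB/s (about 0.8 Mbit/s) and a full 16 GB copy would take about 44 hours, so on a satellite link faster than that, a single unscaled TCP connection can't fill the pipe.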

DFS-R would have been my first choice between 2003 R2 boxes. Logging-wise, you might not get a lot from DFS-R itself, but you can use file access and change logging at either end to get a better idea of what's being modified and when. You should also use the bandwidth throttling feature, if you aren't already, so that you don't saturate your links; saturation can cause disconnects and time-outs even when the links themselves are working.