

What about a "nice rsync"? The system load spiking is exactly what you want: finish the process AFAP (as fast as possible). That is fine as long as it doesn't interfere with the website's operation.
–
Eugen Rieck, Dec 27 '11 at 13:39

Thanks, I'm already doing a "nice rsync", which does help.
–
David Laing, Dec 27 '11 at 14:23
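A minimal Python sketch of the "nice rsync" idea. It only builds the command line and runs it with `subprocess`; the paths are placeholders. The optional `ionice -c 3` prefix (Linux util-linux, "idle" I/O class) and `--bwlimit` are extras not mentioned in the comments. They are included because rsync is often limited by disk and network I/O rather than CPU, so `nice` alone may not keep the site responsive.

```python
import subprocess
from typing import List, Optional


def nice_rsync_cmd(src: str, dest: str, niceness: int = 19,
                   idle_io: bool = False,
                   bwlimit: Optional[int] = None) -> List[str]:
    """Build an rsync command that yields CPU (and optionally disk) to the website."""
    if not 0 <= niceness <= 19:
        # Positive niceness lowers priority; negative values need root.
        raise ValueError("niceness must be between 0 and 19")
    cmd: List[str] = []
    if idle_io:
        # Linux-only: run rsync in the idle I/O scheduling class.
        cmd += ["ionice", "-c", "3"]
    cmd += ["nice", "-n", str(niceness), "rsync", "-a"]
    if bwlimit is not None:
        # Cap the transfer rate (roughly KiB per second).
        cmd.append(f"--bwlimit={bwlimit}")
    return cmd + [src, dest]


def nice_rsync(src: str, dest: str, **options) -> None:
    """Run the command; raises CalledProcessError if rsync fails."""
    subprocess.run(nice_rsync_cmd(src, dest, **options), check=True)
```

For example, `nice_rsync("/var/www/", "backup:/srv/www/", idle_io=True)` would copy a hypothetical web root at the lowest CPU priority and idle I/O priority.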

2 Answers

Actually, I would suggest a balanced mix of both. Commit your main backup to git every night (at least), and once or twice a week use rsync to sync it to another machine kept well away from the production box.
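The schedule suggested here can be sketched as a small Python function, under the assumption (hypothetical, not from the answer) that "once or twice a week" means Wednesday and Sunday. A second helper computes how stale the off-site copy can get between rsync runs.

```python
import datetime
from typing import Iterable, List

# Hypothetical choice for "once or twice a week": Wednesday and Sunday.
# datetime.date.weekday() numbers days Monday == 0 ... Sunday == 6.
RSYNC_WEEKDAYS = {2, 6}


def backup_tasks(day: datetime.date) -> List[str]:
    """Tasks for a given night: always a git commit, rsync on the chosen days."""
    tasks = ["git-commit"]
    if day.weekday() in RSYNC_WEEKDAYS:
        tasks.append("rsync-offsite")
    return tasks


def worst_case_offsite_gap_days(weekdays: Iterable[int]) -> int:
    """Longest stretch, in days, between two consecutive off-site syncs."""
    days = sorted(set(weekdays))
    if not days:
        raise ValueError("at least one rsync day is required")
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    gaps.append(days[0] + 7 - days[-1])  # wrap around into the next week
    return max(gaps)
```

With Wednesday and Sunday syncs, the gaps are four and three days, so the off-site copy can be up to four days behind production.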

Git gives you immediate recovery, and it also makes the data easier to analyze, because your backup is versioned and has a changelog. After any major change to the data, you can commit and push to git manually and record the reason in the commit message. If git goes bad, rsync will come to the rescue, but keep in mind that you can still lose data, depending on how often rsync runs.
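A minimal sketch of the manual "commit and push with a reason" step. The remote name `origin` and branch `master` are assumptions; the `run` parameter exists so the commands can be inspected without touching a real repository.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], object]


def _run(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def commit_with_reason(repo: str, reason: str, remote: str = "origin",
                       branch: str = "master", run: Runner = _run) -> List[List[str]]:
    """Stage everything, commit with a human-written reason, and push."""
    if not reason.strip():
        raise ValueError("a commit message explaining the change is required")
    git = ["git", "-C", repo]
    cmds = [
        git + ["add", "-A"],
        git + ["commit", "-m", reason],  # this message is the "changelog" entry
        git + ["push", remote, branch],
    ]
    for cmd in cmds:
        run(cmd)  # with _run, a failing step (e.g. nothing to commit) raises
    return cmds
```

Reading the reasons back later is then just `git log`.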

Rule of thumb: when it comes to backups and disaster recovery, nothing can guarantee 100% recovery.

Rsync is a wonderful sync tool, but you get a lot more versatility by running Git on the server(s) and pushing or pulling updates.

I have to track and back up user-generated content on our server. The production server has a copy of the git repo, and every night a cron job automatically adds and commits all of the new files. Those commits are pushed to our gitolite server, which then uses hooks to sync the rest of the servers.
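The nightly job described here could look roughly like this Python sketch. The remote name `gitolite`, the branch `master`, and the repository path are hypothetical. It skips the commit when nothing changed, so idle nights don't produce empty commits or failed cron runs. It could be scheduled from cron with an entry such as `0 2 * * *` (02:00 daily) that calls a small script invoking `nightly_commit`.

```python
import datetime
import subprocess
from typing import Callable, Optional, Sequence


def run_git(cmd: Sequence[str]) -> str:
    """Run a git command and return its stdout; raise on failure."""
    return subprocess.run(list(cmd), check=True,
                          capture_output=True, text=True).stdout


def nightly_commit(repo: str, remote: str = "gitolite", branch: str = "master",
                   run: Callable[[Sequence[str]], str] = run_git,
                   now: Optional[datetime.datetime] = None) -> bool:
    """Add and commit all new/changed files, then push. Returns False if idle."""
    git = ["git", "-C", repo]
    run(git + ["add", "-A"])
    # After "add -A", --porcelain prints nothing if there is nothing to commit.
    if not run(git + ["status", "--porcelain"]).strip():
        return False
    stamp = (now or datetime.datetime.now()).strftime("%Y-%m-%d %H:%M")
    run(git + ["commit", "-m", f"Nightly snapshot {stamp}"])
    run(git + ["push", remote, branch])
    return True
```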

Since each server has its own copy of the repo on board, you get not only a snapshot but also detailed history, which could easily save you if anything happened to your server.
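As one example of what that history buys you, this Python sketch retrieves a file as it was before a given date using standard git plumbing: `git rev-list -1 --before=DATE HEAD` to pick a commit and `git show REV:PATH` to print the blob. The repository path and file name are placeholders.

```python
import subprocess
from typing import Callable, Sequence


def run_git_bytes(cmd: Sequence[str]) -> bytes:
    """Run a git command and return raw stdout (user files may be binary)."""
    return subprocess.run(list(cmd), check=True, capture_output=True).stdout


def file_as_of(repo: str, path: str, before: str,
               run: Callable[[Sequence[str]], bytes] = run_git_bytes) -> bytes:
    """Return the contents of `path` from the last commit made before `before`."""
    git = ["git", "-C", repo]
    # First commit listed on HEAD that is older than the date (normally the newest).
    rev = run(git + ["rev-list", "-1", f"--before={before}", "HEAD"]).decode().strip()
    if not rev:
        raise LookupError(f"no commit before {before!r}")
    # REV:PATH is resolved relative to the repository root.
    return run(git + ["show", f"{rev}:{path}"])
```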

I think you have a good understanding of what both offer. I'd just change your line of thinking from servers checking out/exporting the codebase to servers having their own repos. Another thought: you could rsync your media files (you said 2GB for some of these sites, which makes me think there are a lot of media files?) and not track them in Git.
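A sketch of the "rsync the media, keep it out of Git" split, assuming a hypothetical set of media suffixes. Both the `.gitignore` patterns and the rsync filter rules come from the same set, so Git and rsync agree on which files each one handles. The rsync rules include every directory, include the media patterns, then exclude everything else.

```python
from typing import Iterable, List

# Hypothetical set of "media" suffixes; adjust to what the sites actually store.
MEDIA_SUFFIXES = {".gif", ".jpg", ".mp4", ".pdf", ".png"}


def gitignore_lines(suffixes: Iterable[str] = MEDIA_SUFFIXES) -> List[str]:
    """Patterns that keep media files out of the Git repository."""
    return [f"*{s}" for s in sorted(suffixes)]


def media_rsync_cmd(src: str, dest: str,
                    suffixes: Iterable[str] = MEDIA_SUFFIXES) -> List[str]:
    """rsync only the media files, so they are backed up outside Git."""
    # Descend into every directory, but drop directories left empty.
    cmd = ["rsync", "-a", "--prune-empty-dirs", "--include=*/"]
    cmd += [f"--include=*{s}" for s in sorted(suffixes)]
    # Anything not matched by an include rule above is skipped.
    cmd += ["--exclude=*", src, dest]
    return cmd
```

Both rule sets match case-sensitively on a case-sensitive filesystem, so a suffix such as `.JPG` would need its own entry.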

I've added some performance data, which shows that rsync is nearly always faster than git. However, I like your points about the extra power of having git repos on the live server. I'm wondering whether a hybrid approach would be best, with changes pushed into the git repo and the git repos then rsynced to the backup server...
–
David Laing, Dec 28 '11 at 0:58
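The hybrid flow in this comment can be sketched as two ordered commands: push from the live server, then rsync the repositories directory from the Git server to the backup box. All paths and names are hypothetical, and the two steps would normally run on different machines. It is safest to run the rsync when no push is in progress, since a copy taken mid-push may capture refs and objects out of step.

```python
from typing import List


def hybrid_backup_steps(work_tree: str, remote: str, branch: str,
                        repos_root: str, backup_dest: str) -> List[List[str]]:
    """Ordered commands for the hybrid flow: push into Git, then rsync the repos."""
    push = ["git", "-C", work_tree, "push", remote, branch]
    # Trailing slash: copy the *contents* of repos_root into backup_dest.
    src = repos_root.rstrip("/") + "/"
    # No --delete: files removed at the source (e.g. after git gc) are kept on
    # the backup rather than propagated, at the cost of some extra disk space.
    sync = ["nice", "-n", "19", "rsync", "-a", src, backup_dest]
    return [push, sync]
```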