Monday, January 3, 2011

This article originally appeared quite some time ago, but for some unknown reason it was lost from the indexes. I've just come back to update it with some new error observations.

We now return you to your regularly scheduled read...

rsync is an amazing and powerful tool for moving files around. I know of people who use it for file transfers, for keeping DNS server records up to date, and, together with sshd, for remotely restarting services when rsync reports a file change (how they do that, I don't know; I'm just told they do it).

This article describes how you can use rsync to synchronize file trees. In this case, I'm keeping one web server as a backup of another. The backup box must contain the same files as the production box, so that I can put the backup into production should a failure occur.

Overview

rsync can be used in six different ways, as documented in man rsync:

1. for copying local files. This is invoked when neither source nor destination path contains a : separator.

2. for copying from the local machine to a remote machine using a remote shell program as the transport (such as rsh or ssh). This is invoked when the destination path contains a single : separator.

3. for copying from a remote machine to the local machine using a remote shell program. This is invoked when the source contains a : separator.

4. for copying from a remote rsync server to the local machine. This is invoked when the source path contains a :: separator or an rsync:// URL.

5. for copying from the local machine to a remote rsync server. This is invoked when the destination path contains a :: separator.

6. for listing files on a remote machine. This is done the same way as rsync transfers except that you leave off the local destination.

I'll only be looking at copying from a remote rsync server to the local machine (4) and at copying from a remote machine to the local machine using a remote shell program (3).
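The separator rules listed above can be sketched as a small shell function. This is only an illustration of the rules as the man page describes them, not rsync's actual parsing code; the function name classify_path is made up for this example, and it assumes that a colon appearing after a slash belongs to a local file name.

```shell
#!/bin/sh
# classify_path PATH: report how a single rsync path argument would be
# interpreted, following the separator rules from the man page.
# (Illustrative helper, not part of rsync.)
classify_path() {
    case "$1" in
        rsync://*) echo daemon; return ;;   # rsync:// URL
    esac
    host=${1%%:*}                           # text before the first colon
    case "$host" in
        "$1") echo local ;;                 # no colon at all
        */*)  echo local ;;                 # slash before the colon: a file name
        *)
            case "$1" in
                "$host"::*) echo daemon ;;        # double colon: rsync server
                *)          echo remote-shell ;;  # single colon: rsh or ssh
            esac
            ;;
    esac
}

classify_path /home/dan/test   # local
classify_path ducky:www        # remote-shell
classify_path ducky::www       # daemon
```

A transfer is then local when both paths are local, uses the remote shell when one side is remote-shell, and talks to an rsync server when one side is daemon.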

Installing

This was an easy port to install (aren't they all, for the most part?). Remember, I have the entire ports tree, so I did this:

# cd /usr/ports/net/rsync

# make install

If you don't have the ports tree installed, you have a bit more work to do. As far as I know, you need rsync installed on both client and server, although you do not need to be running rsyncd unless you are connecting to an rsync server (methods 4 and 5).

Setting up the server

In this example, we're going to be using a remote rsync server (4). On the production web server, I created the /usr/local/etc/rsyncd.conf file. The contents are based on man rsyncd.conf.

uid = rsync

gid = rsync

use chroot = no

max connections = 4

syslog facility = local5

pid file = /var/run/rsyncd.pid

[www]

path = /usr/local/websites/

comment = all of the websites

You'll note that I'm running rsync as rsync:rsync. I added the new user with vipw and the new group to /etc/group. Something like this:

rsync:*:4002:4002::0:0:rsync daemon:/nonexistent:/sbin/nologin

and

rsync:*:4002:

Then I started the rsync daemon and verified it was running by doing this:
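One way to start the daemon and confirm that it answers is sketched below. The --daemon and --config options and the host:: form for listing modules are standard rsync; the function names and the RSYNC override variable are illustrative, and the config path is the one used earlier in this article.

```shell
#!/bin/sh
# RSYNC can be overridden to point at a different binary (assumption for
# this sketch; it defaults to whatever "rsync" is on the PATH).
RSYNC=${RSYNC:-rsync}

# start_daemon CONFIG: start rsync in daemon mode with an explicit config file.
start_daemon() {
    "$RSYNC" --daemon --config="$1"
}

# list_modules HOST: ask the rsync server on HOST which modules it offers.
list_modules() {
    "$RSYNC" "$1::"
}

# On the server (as root):  start_daemon /usr/local/etc/rsyncd.conf
# From the client:          list_modules ducky
```

With the configuration shown above, the listing should include the www module together with its comment.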

I also wanted deleted server files to be deleted on the client. So I did this:

rsync -avz --delete ducky::www /home/dan/test

Of course, you can combine all of these arguments to suit your needs.
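Because --delete removes files on the receiving side, it can be worth previewing a run before doing it for real. A minimal sketch, assuming the same module and destination as above: the -n (--dry-run) option is standard rsync, while the function name preview_delete and the RSYNC override variable are illustrative.

```shell
#!/bin/sh
# RSYNC can be overridden to point at a different binary (assumption).
RSYNC=${RSYNC:-rsync}

# preview_delete SRC DEST: show what a --delete run would transfer and
# remove, without changing anything on disk (-n is --dry-run).
preview_delete() {
    "$RSYNC" -avzn --delete "$1" "$2"
}

# Example: preview_delete ducky::www /home/dan/test
```

If the listing looks right, the same command without the n performs the actual synchronization.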

I found the --stats option interesting:

Number of files: 2707

Number of files transferred: 0

Total file size: 16022403 bytes

Total transferred file size: 0 bytes

Literal data: 0 bytes

Matched data: 0 bytes

File list size: 44388

Total bytes written: 132

Total bytes read: 44465

Security

My transfers occur on a trusted network and I'm not worried about the contents of the transfer being observed. However, you can use ssh as the transfer medium by using the following command:

rsync -e ssh -avz ducky:www test

Note that this differs from the previous example in that you have only one : (colon), not two. See man rsync for details. In this example, we will be grabbing the contents of ~/www from host ducky using our existing user login. The contents of the remote directory will be synchronized with the local directory test.

Now if you try an rsync, you'll see this:

$ rsync -e ssh -avz --delete ducky:www /home/dan/test

Password:

@ERROR: auth failed on module www

Here I supplied the wrong password and I didn't specify the user ID. I suspected rsync had used my login name, and a check of the man page confirmed this. For my next attempt, I added the user name before the host, ducky.

$ rsync -e ssh -avz --delete susan@ducky:www /home/dan/test

Password:

receiving file list ... done

wrote 132 bytes read 44465 bytes 1982.09 bytes/sec

total size is 16022403 speedup is 359.27

In this case, nothing was transferred as I'd already done several successful rsyncs.
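The speedup figure in the summary can be checked by hand: it is the total size of the files divided by the number of bytes actually written and read during the run. A small sketch of that arithmetic, using awk (the function name speedup is illustrative):

```shell
#!/bin/sh
# speedup TOTAL_SIZE BYTES_WRITTEN BYTES_READ
# Reproduce the speedup figure: total file size divided by the traffic
# that actually crossed the wire.
speedup() {
    awk -v total="$1" -v sent="$2" -v received="$3" \
        'BEGIN { printf "%.2f\n", total / (sent + received) }'
}

speedup 16022403 132 44465   # prints 359.27
```

Here 132 + 44465 = 44597 bytes were exchanged to confirm that about 16 MB of files were already in sync, which is where the 359.27 comes from.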

The next section deals with how to use a password in batch mode.

Do it on a regular basis

There's no sense in having an rsync set up if you aren't going to use it on a regular basis. In order to use rsync from a cron job, you should supply the password in a file that is not world-readable. I put my password in /home/dan/test/rsync.password. Remember to chmod 640 that password file!

I put the command into a script file (rsync.sh), which looks like this:
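A minimal sketch of what such a script might contain, under stated assumptions: the user susan, the module www, and the destination are taken from the examples above, while the function name sync_www, the RSYNC override variable, the exclude rule, and the cron schedule are illustrative. The --password-file, --exclude, and --delete options are standard rsync.

```shell
#!/bin/sh
# rsync.sh: pull the www module from the rsync server on ducky.
# RSYNC can be overridden to point at a different binary (assumption).
RSYNC=${RSYNC:-rsync}
PASSFILE=/home/dan/test/rsync.password

sync_www() {
    # The password file lives inside the destination tree, so it is
    # excluded; excluded files are not removed by --delete.
    "$RSYNC" -az --delete --exclude=/rsync.password \
        --password-file="$PASSFILE" \
        susan@ducky::www /home/dan/test
}

# Call sync_www at the end of the script, then schedule it from cron,
# for example (hypothetical path and time):
#   0 3 * * * /home/dan/rsync.sh
```

rsync refuses to use a password file that other users can access, which is one more reason to check the file's permissions.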

If you want to use ssh as your transport medium, I suggest using the authorized_keys feature.
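With key-based login there is no password prompt, so the ssh form also works from cron. A sketch, assuming a dedicated key pair whose public half has been added to ~/.ssh/authorized_keys on ducky; the key path, the function name sync_over_ssh, and the RSYNC override variable are illustrative.

```shell
#!/bin/sh
# RSYNC can be overridden to point at a different binary (assumption).
RSYNC=${RSYNC:-rsync}
# Hypothetical location of a private key reserved for this job.
KEY=${KEY:-$HOME/.ssh/rsync_backup}

# One-time setup, done by hand:
#   ssh-keygen -f "$KEY"     # create the key pair
#   then append "$KEY.pub" to ~/.ssh/authorized_keys on ducky

# sync_over_ssh: pull ~/www from ducky over ssh, using the dedicated key.
sync_over_ssh() {
    "$RSYNC" -az --delete -e "ssh -i $KEY" susan@ducky:www /home/dan/test
}

# Example: sync_over_ssh
```

Note the single colon in susan@ducky:www: this goes through the remote shell, not the rsync server.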

My comments

I think rsync is one of the most powerful tools I've seen for transferring files around a network and the Internet. It is just so powerful! Although I actually use cvsup to publish the Diary, I am still impressed with rsync.

Some recent errors I encountered

I was recently adding some new files to my rsync tree. I found these errors:

receiving file list ... opendir(log): Permission denied

opendir(fptest): Permission denied

opendir(example.com): Permission denied

opendir(example.org): Permission denied

readlink dan: Permission denied

opendir(default): Permission denied

It took me a while to understand the problem. It's a read-permission issue: rsyncd didn't have permission to read the files and directories in question. You can either make rsyncd run as a different user, or change the permissions on the files.
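To find the offending entries before rsyncd trips over them, something like the following can help. It is a sketch that assumes the rsync user is neither the owner nor in the group of the files, so only the "other" permission bits matter; the function name unreadable_by_others is illustrative.

```shell
#!/bin/sh
# unreadable_by_others DIR: list directories that others cannot read and
# enter, and regular files that others cannot read. Symbolic links are
# skipped, since their own mode bits are not what matters.
unreadable_by_others() {
    find "$1" \( -type d ! -perm -005 -print \) -o \
              \( ! -type d ! -type l ! -perm -004 -print \)
}

# Example: unreadable_by_others /usr/local/websites
```

Anything it prints is a candidate for chmod, or a reason to run the daemon under a different uid.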