I needed a regular way to back this data up regardless of my location. At home I have a Linux server that I use for this purpose. Of course, that server is behind a firewall, so getting to it while at home vs while remote can be interesting. The diagram below shows the general situation:

The tricky part is that my laptop may appear on the internal network or out on the internet, and I want the backup to magically work regardless of location.

Rsync combined with SSH seemed an ideal solution to the problem. I had to configure things so that they would work in the following scenarios:

At home, where the backup server is available via a local network connection. This is the most efficient case: I'm on a local connection, and the VPN status doesn't matter because the connection goes to a local IP address that is excluded from the VPN traffic.

On the road with the corporate VPN running. In this case I have to go through one of the corporate SOCKS servers.

On the road without the VPN. In this case I just go directly through my firewall (so similar to the local connection but instead of a local IP address, I use the network visible IP address).

In order to handle these scenarios, I've developed a layered model that takes several steps:

Step 1: Create an SSH tunnel from my laptop to my internal firewall:

This would go through our corporate SOCKS proxy if I'm at the office or on the VPN.

Step 2: Create an SSH Tunnel through the tunnel created in Step 1 to my backup server

Step 3: Run Rsync across the SSH tunnel created in Step 2.

When I'm home, I use the same layered model (probably because I'm too lazy to add the code to skip one of the layers) which looks like:

The first 3 options allow the location to be manually forced (useful in some situations where it looks like I have connectivity that I don't).

The next option checks whether I can ping the local address (${GatewayIP}) on the firewall, which would mean I am home -- or that there just happens to be another system with the same IP address on my local network (possible, but I haven't run into that problem often).

The next option attempts to ping the external address on the gateway (which would mean that I'm away from home, but without the corporate VPN).

And, of course, if none of them work, the assumption is that I'm behind the corporate firewall and have to use the corporate SOCKS proxy.
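The selection logic described above can be sketched as shell code. The ${GatewayIP} variable comes from the article; the option names, the external address, and the other variable names are my own illustrative assumptions.

```shell
#!/bin/sh
# Location detection: the first three command-line options force a
# location; otherwise probe the network to figure out where we are.
GatewayIP=192.168.1.1       # internal address of the home firewall (assumed value)
ExternalIP=203.0.113.10     # externally visible address of the gateway (assumed value)

case "$1" in
    -home) LOCATION=home ;;     # force "at home"
    -road) LOCATION=road ;;     # force "on the road, no VPN"
    -vpn)  LOCATION=vpn ;;      # force "behind the corporate firewall"
    *)
        if ping -c 1 -W 2 "${GatewayIP}" >/dev/null 2>&1; then
            LOCATION=home       # the local firewall address answered
        elif ping -c 1 -W 2 "${ExternalIP}" >/dev/null 2>&1; then
            LOCATION=road       # external address reachable, so no corporate VPN
        else
            LOCATION=vpn        # assume corporate network: go via the SOCKS proxy
        fi
        ;;
esac
echo "Detected location: ${LOCATION}"
```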

-L 2222:${DESTIP}:22 - set up a port forward of local port 2222 to port 22 on ${DESTIP} (as seen from the remote system). This means that any connection to port 2222 on the local system will be forwarded to port 22 on ${DESTIP}. ${DESTIP} is the IP address of the backup server.

-l username - use username as the login on the gateway system.

${GATEWAY} - the system that this SSH connects to.

Since the SSH is started in the background, I have a small loop that waits until a listener starts up on port 2222 before proceeding with the next step.
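Putting the documented options together, the outer tunnel plus its wait loop might look like the sketch below. The -2/-n/-N flags mirror those documented for the inner tunnel, the retry limit is my addition, and the SOCKS ProxyCommand hint is an assumption about how the corporate-proxy case would be wired in.

```shell
# Outer tunnel: laptop -> home gateway, forwarding local port 2222
# to port 22 on the backup server behind the firewall.
GATEWAY=gateway.example.com     # placeholder for the real gateway name
DESTIP=192.168.1.20             # placeholder for the backup server's internal IP
# On the corporate network this ssh would additionally need a SOCKS
# ProxyCommand, e.g. -o ProxyCommand='nc -X 5 -x socksproxy:1080 %h %p'
# (an assumption; the article doesn't show the exact mechanism).
ssh -2 -n -N -L 2222:${DESTIP}:22 -l username ${GATEWAY} &
OUTER_PID=$!                    # saved for teardown later

# Wait (with a retry limit) until the forwarded listener is up on
# port 2222 before the next layer tries to connect through it.
tries=0
while ! netstat -an | grep LISTEN | grep -q '[.:]2222 '; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && { echo "outer tunnel failed to come up" >&2; exit 1; }
    sleep 1
done
```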

Establish the 2nd SSH tunnel (through the firewall to the backup server):

This ssh command uses the tunnel established in the previous step to connect to the backup server (${DESTHOST}) with the following options:

-2 - use SSH protocol version 2.

-n - don't read anything from stdin (needed when running SSH in the background).

-C - compress data on this connection. I didn't compress data on the outer SSH since you don't want to compress twice, and this layer sees the original data while the outer layer sees encrypted (and likely less compressible) data.

-N - don't execute a remote command. Normally ssh would startup a login shell on the remote system. In this case, we're just creating a tunnel so no need for a remote command.

-p 2222 - connect to port 2222 (the port we have the listener configured for from the outer ssh layer).

-L 1873:127.0.0.1:873 - set up a port forward of local port 1873 to port 873 (the rsyncd port) on localhost (the remote system). This means that any connection to port 1873 on the local system will be forwarded to port 873 on the remote system.

-l ${DESTUSER} - use ${DESTUSER} as the login on the backup system.

${DESTHOST} - this would normally be localhost since the listener from the outer layer ssh is listening on the local system. However, ssh really gets upset when you have remote entities that look like the same entity but have different private keys, so the host here will be an alias for localhost defined in the /etc/hosts file. So my local /etc/hosts file has a line that looks like:

127.0.0.1 mysystem localhost backupsys

And ${DESTHOST} has the value "backupsys".

Since the SSH is started in the background, I have a small loop that waits until a listener starts up on port 1873 before proceeding with the next step.
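Assembled from the flags documented above, the inner tunnel might look like this sketch. The ${DESTUSER} value and the retry limit are placeholders; "backupsys" is the /etc/hosts alias for 127.0.0.1 described earlier.

```shell
# Inner tunnel: connect through local port 2222 (the outer tunnel) to
# the backup server, forwarding local port 1873 to rsyncd on port 873.
DESTUSER=backupuser             # placeholder for the real login name
DESTHOST=backupsys              # /etc/hosts alias for 127.0.0.1
ssh -2 -n -C -N -p 2222 -L 1873:127.0.0.1:873 -l ${DESTUSER} ${DESTHOST} &
INNER_PID=$!                    # saved for teardown later

# Wait (with a retry limit) for the inner listener on port 1873.
tries=0
while ! netstat -an | grep LISTEN | grep -q '[.:]1873 '; do
    tries=$((tries + 1))
    [ "$tries" -ge 30 ] && { echo "inner tunnel failed to come up" >&2; exit 1; }
    sleep 1
done
```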

--times - keep the file access/modification times on the files on the remote system the same as those on the local system.

--delete-after - delete files that are not present on the local system after the backup is done (as opposed to first).

--archive - use archive mode (shortcut for specifying a series of other flags)

--exclude */Cache/ - exclude any Cache directories from the backup.

. - backup starting in the current directory

user@backupsys::data/home - backup to the system backupsys using the user name "user" and into the data/home directory on the backup server. The rsync password for "user" is specified in the environment variable "RSYNC_PASSWORD" so that it isn't easily visible on the process list.
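Combining the flags above, the rsync invocation might look like the following sketch. The --port=1873 flag is my guess at how rsync is pointed at the tunnel's local listener (the article documents the listener but not the exact flag), and the password value is obviously a placeholder.

```shell
# Run the backup over the inner tunnel: "backupsys" resolves to
# 127.0.0.1, and port 1873 is forwarded to rsyncd on the backup server.
# The rsync daemon password comes from the environment so it isn't
# visible on the process list.
export RSYNC_PASSWORD='secret'          # placeholder, not a real password
rsync --times --delete-after --archive \
      --exclude '*/Cache/' \
      --port=1873 \
      . user@backupsys::data/home
```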

Finally, now that the backup is complete, tear down the SSH tunnels that were started in the background:

The PIDs for the ssh sessions were saved during the creation of the tunnels.
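The teardown can be sketched as below. Here the two saved PIDs are demonstrated with placeholder background processes, since the real ones would be the ssh sessions started earlier.

```shell
# Teardown: kill the background ssh sessions whose PIDs were saved
# when the tunnels were created. Placeholder sleeps stand in for the
# real ssh processes here.
sleep 300 & INNER_PID=$!        # would be the inner-tunnel ssh PID
sleep 300 & OUTER_PID=$!        # would be the outer-tunnel ssh PID

# Kill the inner tunnel first, then the outer one it runs through.
kill ${INNER_PID} ${OUTER_PID}
wait ${INNER_PID} 2>/dev/null
wait ${OUTER_PID} 2>/dev/null
echo "tunnels closed"
```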

This system has worked for me for several years and does quite a good job of allowing me to backup my data where ever I am (and since I travel an awful lot, being able to do so on the road is extremely useful).
