Introduction

This is a series of suggestions on how to keep files on your own file system in sync, easily and quickly, with files on /nobackup on one of the HPC clusters. There are three main elements:

setting up your file systems in a smart way

syncing with rsync

making it work smoothly without needing to enter a password every time.

The aim is to make syncing easy and smooth, so that you can delete files or let them expire without worrying about copying them back first. This works if you are using a Mac or a Linux workstation.

If you have any suggestions or tips on how this might be accomplished on a Windows PC, please let us know and we’ll update this documentation accordingly.

Notation: For clarity, commands to be issued on the HPC machine are prefixed with the prompt [jsmith@login1 ~]$, matching what you would see when logged in to the ARC systems. Commands to be entered on your local system are prefixed with [jsmith@your_system ~]$, and commands that can be entered on either machine with [jsmith@hostname ~]$.

Setting up The File Systems

We’ll assume you have your data in one or a few folders in your local $HOME directory.

The idea is to have your local file system (your workstation) and your remote file system (on the HPC service) look the same. We will use symbolic links to do so.

Let’s assume your ID is jsmith on both systems and your data is in your local $HOME/data. Now make a symbolic link in your HPC home directory that points to the corresponding directory on /nobackup:

[jsmith@login1 ~]$ ln -s /nobackup/jsmith/data data

Now you have a link in your HPC home directory called data. If you cd into it, it will show the contents of /nobackup/jsmith/data (nothing at the moment), but if you type pwd it will tell you that you are in $HOME/data.
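To see this behaviour without touching the real file systems, the same idea can be tried in a scratch directory. This is only a minimal sketch: fake_nobackup and fake_home are hypothetical stand-ins for /nobackup/jsmith and the HPC home directory.

```shell
# Scratch area; fake_nobackup and fake_home are illustrative stand-ins
# for /nobackup/jsmith and the HPC home directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/fake_nobackup/data" "$scratch/fake_home"

# Same pattern as the ln command above: a link named "data" in the
# home directory pointing at the real data directory.
ln -s "$scratch/fake_nobackup/data" "$scratch/fake_home/data"

cd "$scratch/fake_home/data"
logical_path=$(pwd -L)   # path as seen through the link
physical_path=$(pwd -P)  # path with the link resolved
echo "logical:  $logical_path"
echo "physical: $physical_path"
cd "$scratch"
```

The logical path ends in fake_home/data while the physical one ends in fake_nobackup/data, which is why the two file systems can be made to look the same from the home directory.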

Syncing

Now we want to sync some data. Because both filesystems look the same it is much easier to do.

Let’s say you want to sync data in your local system’s $HOME/data/project1 to the HPC cluster. Here is the command to do it.
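A minimal sketch of such a command, built from the options explained later in this section. The host name arc2 is an assumption borrowed from the key-upload step further down; substitute the login node you actually use. The command is only printed here, so that it can be checked before being run.

```shell
# Assumed values: adjust the user name and host to your own account.
remote_user="jsmith"
remote_host="arc2"   # assumption: host name as used in the scp step

# No trailing slash on the source, so project1 itself is created remotely.
source_dir="$HOME/data/project1"
destination="${remote_user}@${remote_host}:~/data"

# Print the command for inspection; remove "echo" to actually run it.
echo rsync --times --perms --recursive --progress --stats \
    "$source_dir" "$destination"
```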

BE CAREFUL: What we are asking is to copy the directory project1 itself. Don’t add a trailing / to the source path, or rsync will copy the contents of the directory straight into ~/data on the cluster (ARC2 in this case) without creating the folder project1.
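The difference is easy to check locally before involving the cluster. The following is a sketch using throwaway directories (every name under the temporary directory is hypothetical); the rsync calls are skipped if rsync is not installed.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/project1" "$demo/without_slash" "$demo/with_slash"
echo "sample" > "$demo/src/project1/results.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory project1 is recreated at the destination.
    rsync --recursive --times "$demo/src/project1" "$demo/without_slash"
    # Trailing slash: only the contents of project1 are copied.
    rsync --recursive --times "$demo/src/project1/" "$demo/with_slash"
    ls "$demo/without_slash"   # lists the project1 directory
    ls "$demo/with_slash"      # lists results.txt directly
fi
```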

rsync works over ssh, so if you can connect using ssh, you should not have problems. It is secure and has several advantages. In particular, before starting, it checks which files have changed or are missing and transfers only the data required to re-sync the two folders. This also means that if the connection fails, you don’t have to start from scratch.

After you run your analysis you can copy data back in the same way, either “pushed” from the HPC service or “pulled” from your computer.
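As a sketch of the two directions, again assuming the cluster host name arc2 and a hypothetical workstation name your_system that accepts ssh connections from the cluster. Both commands are only printed, not executed.

```shell
# "Pull": run on your workstation, fetching results from the cluster.
pull_source="jsmith@arc2:~/data/project1"   # assumed cluster host name
pull_destination="$HOME/data"
echo rsync --times --perms --recursive "$pull_source" "$pull_destination"

# "Push": run on the cluster, sending results to your workstation.
# your_system is a placeholder; the workstation must accept ssh connections.
push_source="$HOME/data/project1"
push_destination="jsmith@your_system:~/data"
echo rsync --times --perms --recursive "$push_source" "$push_destination"
```

Because the directory layout is the same on both sides, the two commands differ only in which end carries the user@host: prefix.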

One advantage of having the same filesystem structure on your local and remote computer is that you only need to “find” the folder/file you want to sync on your system.

--times preserves modification times. By default, rsync compares file size and modification time to decide whether a file has changed. If they differ, it computes checksums on both copies to work out which parts to send, and this might be slower than the transfer itself! To always compare files by checksum use -c; to compare file size only use --size-only. Be careful with this last option: a file that changed without changing size will be skipped.

--perms preserves file permissions. Destination files are given the same permissions as the source files.

--recursive recurse into directories.

--progress and --stats give you something to look at if bored (and want to monitor the connection) and a final report to be impressed by.
Automatic Authentication

Once you are familiar with rsync, you will notice that having to enter the password every time becomes the annoying bit. Fortunately there is a solution for this too! It is a bit lengthy, but worth it.

To set up the automatic authentication (sometimes known as passwordless login), follow these steps:

Generate a private and public ssh key pair on your local system.

Upload the public key to the remote system, e.g. ARC2.

Generate an ssh key pair on the remote system, ARC2.

Send the public key from ARC2 to your local system.

Generate ssh Keypair on Your Local System

To generate the key pair, from your computer:

[jsmith@your_system ~]$ ssh-keygen

Accept the default key location when prompted, typically ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub for the private and public key respectively, and provide ssh-keygen with a secure passphrase. Once ssh-keygen completes, you’ll have a public key as well as a passphrase-encrypted private key. The passphrase should not be the same as the password you use to log in.
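To see what ssh-keygen produces without risking a real key, it can be pointed at a throwaway directory. This is a sketch under stated assumptions: the key type and length are illustrative choices, and the passphrase is given on the command line only because this is a demo key. For your real key, leave out -N and -f and answer the prompts instead. Some recent OpenSSH releases may default to a different key type, in which case passing -t rsa keeps the file names in line with those used here.

```shell
# Throwaway directory so that no real key is overwritten.
key_dir=$(mktemp -d)

if command -v ssh-keygen >/dev/null 2>&1; then
    # -t rsa  : key type, so the files are named id_rsa and id_rsa.pub
    # -b 4096 : key length in bits
    # -f      : where to write the private key (the public key gets .pub)
    # -N      : passphrase; given inline ONLY because this is a demo key
    ssh-keygen -q -t rsa -b 4096 -N "demo passphrase" -f "$key_dir/id_rsa"
    ls "$key_dir"
fi
```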

Upload Public Key to The HPC Cluster

Now we need to upload the public key to the HPC cluster. Because it is useful to set up automated login both to and from the HPC machine, we will append the public key to the “authorized_keys” file and transfer that file, rather than copying the public key id_rsa.pub directly.

Don’t copy the private key; that should be kept securely on your local system.

On your local machine issue the following commands:

[jsmith@your_system ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[jsmith@your_system ~]$ scp ~/.ssh/authorized_keys jsmith@arc2:~/

You will still be prompted for your password at this point.

IMPORTANT: the private key should be, well, private. It is encrypted, but you should still be the only one with read permission on that file. Never share it. It is possible to leave the passphrase blank, but this will substantially weaken the security of your keys: anyone who gets hold of your private key will be able to log in as you.
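One way to make sure only you can read the private key is to tighten the permissions explicitly. A minimal sketch, assuming the default key location mentioned earlier; it does nothing if that file is not present.

```shell
ssh_dir="$HOME/.ssh"

if [ -f "$ssh_dir/id_rsa" ]; then
    chmod 700 "$ssh_dir"          # only the owner may enter the directory
    chmod 600 "$ssh_dir/id_rsa"   # only the owner may read or write the key
    ls -l "$ssh_dir/id_rsa"       # permissions column should read -rw-------
fi
```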

Now log on to the HPC machine. It will still require a password at this point.
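Once logged in, the uploaded file has to end up in ~/.ssh with restrictive permissions before ssh will use it. One common way to do this is sketched below, under the assumption that the file is sitting in the home directory where the scp step left it. The commands are meant to be run on the HPC machine and do nothing if the uploaded file is not there.

```shell
# Run on the HPC machine after logging in. Assumes the uploaded file is
# in the home directory, as left there by the scp step.
uploaded="$HOME/authorized_keys"

if [ -f "$uploaded" ]; then
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    # Append rather than overwrite, in case keys are already installed.
    cat "$uploaded" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
    rm "$uploaded"
fi
```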

Using keychain for Automatic Authentication

If everything is in place, you should now be able to log in using the keys. However, because the private key is encrypted, you will still be prompted for the passphrase. So “what’s the point?” you might ask. Well, on Mac OS X you will have the option to keep the passphrase in the keychain. If you do, you will not be prompted for the passphrase anymore.

On a Linux box you can achieve the same using the keychain application.

Download it using:

wget http://www.funtoo.org/archive/keychain/keychain-2.7.1.tar.bz2

OR

wget http://www.net-security.org/dl/software/keychain-2.7.1.tar.bz2

extract it:

[jsmith@hostname ~]$ tar -xjf keychain-2.7.1.tar.bz2

and now run it, providing your private key:

[jsmith@hostname ~]$ keychain-2.7.1/keychain id_rsa

You will be prompted for the passphrase. And finally:

[jsmith@hostname ~]$ source $HOME/.keychain/$HOSTNAME-sh

Now you should be able to ssh in (or rsync) as many times as you want without entering the passphrase!

If you exit your session, you’ll need to repeat the last two steps. You can put them in your .bash_profile though, so that they are executed automatically each time you log in. Or make a bash script with those two commands.
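A possible form for those lines is sketched below. It assumes that keychain was extracted in your home directory and that the key is ~/.ssh/id_rsa; both paths are assumptions to adjust. The checks make the snippet a no-op on machines where keychain or the key is missing.

```shell
# Possible lines for ~/.bash_profile. Assumes keychain was extracted in
# $HOME and that the private key is ~/.ssh/id_rsa.
keychain_bin="$HOME/keychain-2.7.1/keychain"
keychain_env="$HOME/.keychain/${HOSTNAME:-$(uname -n)}-sh"

if [ -x "$keychain_bin" ] && [ -f "$HOME/.ssh/id_rsa" ]; then
    # Prompts for the passphrase only if the key is not already loaded.
    "$keychain_bin" id_rsa
    # Pick up the agent settings that keychain wrote out.
    if [ -f "$keychain_env" ]; then
        . "$keychain_env"
    fi
fi
```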