Rclone

Rclone allows one to sync files and directories to and from cloud storage via the command line. In combination with box.byu.edu, where BYU students and faculty get unlimited free storage, it can make storing and backing up archival data much easier. Rclone+Box will help users who routinely run up against storage space constraints and who wish to back up data that can only fit in compute. Those who wish to collaborate without making others get FSL accounts can upload to Box with Rclone, then share their data with collaborators (even if those collaborators don't have Box accounts).

This tutorial will show how to configure Rclone with Box, a few of the most useful commands, and a couple of worked examples. It is by no means comprehensive, so those wanting to learn more should consult the documentation, which is excellent.

Note that while storage on Box is unlimited, that expansive storage comes at a cost: Box is slow, so it takes a while to move big chunks of data. Additionally, files stored there cannot exceed 32 GB in size.

Configuration

Keep in mind that Rclone need only be configured once--as soon as you've finished the steps below, you should never need to do so again.

Port Forwarding

When setting up Box, Rclone needs access to a web browser; since the supercomputer doesn't have a browser, you'll need to connect one of its ports to your computer. To do so, add a little bit to your ssh command:

ssh -L localhost:53682:localhost:53682 username@ssh.fsl.byu.edu

This tells ssh to make a tunnel, allowing your local machine (and its browser) to access the port Rclone will use for configuration. This tunnel is no longer required after Rclone is configured, so you needn't add '-L localhost:53682:localhost:53682' on subsequent logins.

rclone config

To access Rclone itself, load the rclone module:

module load rclone

Once that's done, run rclone config. This will give you a few options:

No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n

Enter n to make a new remote. Give it a name (e.g. box), then choose which storage service you'd like to configure (you can type box for box.byu.edu, drive for Google Drive, etc.).

It'll ask for Box App Client Id and Box App Client Secret; most users should simply hit enter to leave these blank. You'll then be asked if you want to "Edit advanced config" (most users should enter n):

Edit advanced config? (y/n)
y) Yes
n) No
y/n> n

Next, you will be asked whether to use auto config:

Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
y) Yes
n) No
y/n> y

Even though you are on a remote machine, you'll still say yes--your computer has access to the remote port that Rclone is about to use, so you are effectively using a local machine. When you enter y, you'll see the following:

If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth
Log in and authorize rclone for access
Waiting for code...

Ctrl-click the link or copy-paste it into your browser. If you're not logged in to Box, it will ask for your credentials; use yournetid@byu.edu for the email address. You'll then see a screen with a big blue Grant access to Box button--click it, and you should be greeted with a success message. Go back to the terminal and type y at the prompt to confirm the new remote.

Usage

This tutorial will only cover the basics due to the clarity and breadth of Rclone's exceptional documentation, which should be your first resource when learning its usage. Typing rclone --help will result in a deluge of information, but the "Available Commands" section of the help message gives a good synopsis of each command. For help on a specific command, you can also use rclone <command> --help (e.g. rclone copy --help).

Listing files

Rclone gives a few methods for listing files; none of them are quite like Unix's ls, but rclone lsf --max-depth 1 remote:path/to/dir comes close. A few more examples:
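As a rough sketch, the main listing commands can be compared by running each one against the same path. The helper name compare_listings is made up for illustration, and box:fsl assumes a remote named box that contains a folder called fsl; the subcommands themselves (lsd, lsf, ls, lsl) are standard rclone commands.

```shell
# compare_listings is a hypothetical helper, not part of rclone.
# Usage: compare_listings box:fsl
compare_listings() {
    echo "== lsd: directories directly under $1 =="
    rclone lsd "$1"
    echo "== lsf: files and directories, one level deep =="
    rclone lsf --max-depth 1 "$1"
    echo "== ls: every file under $1, with sizes =="
    rclone ls "$1"
    echo "== lsl: like ls, plus modification times =="
    rclone lsl "$1"
}
```

Because ls and lsl walk the whole tree, they can take a while on a large Box folder; lsd and lsf are the quicker choices for a first look.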

Creating Directories

rclone mkdir behaves like Unix's mkdir; to create a new directory on a remote, you would use something like:

rclone mkdir box:fsl/myNewDirectory

Examples

Move Archival Data to Box

Say you have a directory with data that needs to be kept, but you don't expect to do any work on it with the supercomputer, and you're running out of space. You can either move it directly, or compress it and move it. Moving it directly is easier and you'll be able to look at the data directly at box.byu.edu, but compressing then moving could be much faster.

Generally, if you have a few big files (which must be under 32 GB, of course) you won't be slowed down too much by copying directly, but if you have many small files it will take a long time. Under ideal conditions, you can copy 4 files per second (across all processes--Box limits transfers by user). If you have a million files, that means it will take at least a few days to transfer them, no matter how small they each are.
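The estimate above can be checked with a little arithmetic. The sketch below assumes the 4 files per second ceiling holds for the whole transfer and ignores file size entirely; estimate_days is a hypothetical helper name.

```shell
# estimate_days is a hypothetical helper: days needed to copy a number of
# files at a given files-per-second rate, ignoring file size entirely.
estimate_days() {
    awk -v files="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", files / rate / 86400 }'
}

estimate_days 1000000 4   # one million files at 4 files per second
```

For a million files this comes out to a little under three days of continuous transfer, which is why compressing many small files into one archive first can pay off.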

To move without compressing, simply use:

rclone move ~/compute/dataset box:fsl/dataset

There are two main ways to compress then move data. This one is slower and more reliable:
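One way to do this, assuming the idea is to write the archive to local disk first and upload it afterward, is sketched below. The helper name archive_then_move and the choice to place the tarball next to the source directory are illustrative assumptions; tar and rclone move are used with their ordinary options.

```shell
# archive_then_move is a hypothetical helper, not part of rclone.
# Usage: archive_then_move ~/compute/dataset box:fsl
archive_then_move() {
    src="$1"                                  # directory to archive
    dest="$2"                                 # remote directory to upload into
    parent="$(dirname "$src")"
    name="$(basename "$src")"
    tarball="$parent/$name.tar.gz"            # archive is written beside the source
    # Build the archive on disk first; upload only if tar succeeded.
    tar -czf "$tarball" -C "$parent" "$name" &&
        rclone move "$tarball" "$dest"
}
```

This needs enough free disk space to hold the tarball, and the tarball itself must stay under Box's 32 GB file limit. If the upload is interrupted, the archive is still on disk, so only the rclone move step has to be repeated.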

This one is faster and doesn't use significant disk space, but the work will be lost if the command is interrupted:

tar -czf - ~/compute/dataset | rclone rcat box:fsl/dataset.tar.gz

Backup compute with Box

Perhaps you have a large set of data in ~/compute/dataset, which is too big to fit in your home directory, that you would like to back up weekly. Say you set up the following directory structure to store the backups:

The current backup will live at box:fsl/backup/dataset/primary, while older snapshots, organized by date, will go in box:fsl/backup/dataset/old/. To get started, copy dataset over to the current backup directory:

rclone copy ~/compute/dataset box:fsl/backup/dataset/primary

Keep in mind that Box is slow, so this may take some time. If you want to exit your ssh session while the copy is going, you may want to use screen or tmux.

Once the copy is done, you'll need to back up every week (or however frequently you would like to). This could go something like:
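One possible shape for the weekly step is sketched below, under the assumption that rclone sync with its --backup-dir flag suits this layout: sync makes primary match the local directory, and any files it would overwrite or delete are moved into a dated folder under old instead of being discarded. The function name backup_dataset and the date format are illustrative choices; the paths follow the rclone copy command above.

```shell
# backup_dataset is a hypothetical helper, not part of rclone.
backup_dataset() {
    src="$HOME/compute/dataset"
    remote="box:fsl/backup/dataset"      # same layout as the initial rclone copy
    stamp="$(date +%Y-%m-%d)"            # dated snapshot folder, e.g. 2024-01-31
    # Make primary mirror the local data; files that would be overwritten or
    # deleted are moved to old/<date> rather than discarded.
    rclone sync "$src" "$remote/primary" --backup-dir "$remote/old/$stamp"
}

# Call the function only when a backup should actually run:
# backup_dataset
```

With this approach each dated folder under old holds only the files that changed or disappeared since the previous run, not a full copy of the dataset.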

If you want to do this regularly, you can put it in a script and run it at your convenience; on the new operating system, you can use cron to run it automatically at regular intervals. To make the script (we'll call it do_rclone_backup.sh) execute weekly, use crontab -e to edit your crontab and enter something along the lines of 0 X * * Y bash /path/to/do_rclone_backup.sh (replacing X with an hour, 0-23, and Y with a day of the week, 0-6, where 0 is Sunday). Your backup script will now run once a week with no intervention from you. A cron tutorial will go into more depth in case you want to back up more or less frequently or would like to learn more about cron generally.
