High performance computing

NOTE: We are currently trialing this service with users so that we can
make it as accommodating and secure as possible. This means that details
of the service, including this documentation, are subject to change.
We will do our best to keep everyone updated and notified of changes as they come.

Introduction

At the OCF we offer a High Performance Computing (HPC) service for individuals
and groups that need to run computationally demanding software. We currently
have one main HPC server; however, we plan to expand the cluster to make
use of the resources at our disposal.

Gaining Access

To request access to the HPC cluster, please send an email to
help@ocf.berkeley.edu. Make sure to include your OCF username or group
account name and a detailed technical description of the projects you plan
to run on our HPC infrastructure. The description should cover the nature of
the software being run and the amount of computational resources you expect
to need.

Connecting

Once your request has been approved, you will be able to
connect to our Slurm master node via SSH by running the following command:

ssh my_ocf_username@hpcctl.ocf.berkeley.edu
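
If you connect often, an SSH host alias can shorten the command. The following is a minimal sketch, not an official OCF setup: the alias name hpcctl is a hypothetical choice, and my_ocf_username is the same placeholder used above.

```shell
# Hypothetical alias "hpcctl"; replace my_ocf_username with your own username.
config_file="$HOME/.ssh/config"
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"

# Append the entry only if it is not already present.
if ! grep -qs '^Host hpcctl$' "$config_file"; then
    cat >> "$config_file" <<'EOF'
Host hpcctl
    HostName hpcctl.ocf.berkeley.edu
    User my_ocf_username
EOF
fi
chmod 600 "$config_file"
```

With this entry in place, running ssh hpcctl should be equivalent to the full command above.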

If you have trouble connecting, please contact us at
help@ocf.berkeley.edu, or come to staff hours when the lab is open and chat
with us in person. We also have a #hpc_users channel on Slack and IRC
where you can ask questions and talk to us about anything HPC.

The Cluster

As of Fall 2018, the OCF HPC cluster is composed of one server, with the
following specifications:

We have plans to expand the cluster with additional nodes of comparable
specifications as funding becomes available. The current hardware was
generously funded by a series of grants from the Student Tech Fund.

Slurm

We currently use Slurm as our workload manager for the cluster.
Slurm is a free and open-source job scheduler that distributes jobs evenly
across an HPC cluster, where each computer in the cluster is referred to
as a node. The only way to access our HPC nodes is through Slurm.
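
To get a feel for Slurm before submitting real work, a few standard client commands are sketched below. The partition name ocf-hpc is taken from later in this document; the guard around the commands is only there so the snippet does nothing harmful on a machine without Slurm.

```shell
# Standard Slurm client commands; they only work where Slurm is installed,
# such as the Slurm master node.
if command -v sinfo >/dev/null 2>&1; then
    sinfo                                # list partitions and node states
    squeue -u "$(id -un)"                # show your pending and running jobs
    srun --partition=ocf-hpc hostname    # run a trivial command on a compute node
else
    echo "Slurm client tools not found; log in to the Slurm master node first."
fi
```

If the srun line prints a hostname, your account can reach the compute node through the scheduler.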

Dependencies

For managing application dependencies, you currently have two options:

Virtual Environments

First, if you are using Python packages, you can use a virtual environment.
To create one, navigate to your home directory and run the following
commands:

virtualenv -p python3 venv
. venv/bin/activate

This will allow you to pip install any Python packages that the OCF does not
already have for your program.
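
As an illustration, the sketch below installs a package into the environment and records the installed versions. It assumes the environment was created at ~/venv as shown above; numpy and the requirements.txt location are only examples.

```shell
venv_dir="$HOME/venv"   # location created by the virtualenv command above
if [ -f "$venv_dir/bin/activate" ]; then
    . "$venv_dir/bin/activate"
    pip install numpy                                   # numpy is only an example package
    python -c 'import numpy; print(numpy.__version__)'  # confirm the import works
    pip freeze > "$HOME/requirements.txt"               # record exact versions for later reuse
    deactivate
else
    echo "No virtual environment found at $venv_dir; create it first."
fi
```

Keeping a requirements.txt makes it easier to recreate the same environment later with pip install -r requirements.txt.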

Singularity

For those who need access to non-Python dependencies or have already integrated
their program into Docker, the second option is to use Singularity containers.
Singularity is a containerization platform developed at Lawrence
Berkeley National Laboratory that is designed specifically for HPC environments.
To read more about the benefits of Singularity you can look
here. We suggest a particular workflow, which will help
simplify deploying your program on our infrastructure.

Installing

We recommend that you do your development on our HPC infrastructure, but you
can also develop on your own machine if you would like. If you are running a
Debian- or Ubuntu-based Linux distribution, you can install Singularity from
the official apt repositories:

sudo apt install singularity-container

If you do not have an apt-based Linux distribution, installation instructions
can be found here. If you are running macOS, you can look here; for Windows,
look here.

Building Your Container

singularity build --sandbox ./my_container docker://ubuntu

This will create a Singularity container named my_container. If you are
working on our infrastructure you will not be able to install non-pip
packages on your container, because you do not have root privileges.

If you would like to create your own container with new packages, you must
create the container on your own machine, using the above command with
sudo prepended, and then transfer it over to our infrastructure.

The docker://ubuntu argument tells Singularity to bootstrap the container from
the official Ubuntu Docker image on Docker Hub. There is also
a Singularity Hub, from which you can directly pull
Singularity images in a similar fashion. We also have some pre-built containers
that you may use to avoid having to build your own. They are currently located
at /home/containers on the Slurm master node.

Using Your Container

singularity shell my_container

The above command will allow you to shell into your container. By default your
home directory in the container is linked to your real home directory outside
of the container environment, which helps you avoid having to transfer files
in and out of the container.

singularity exec --nv my_container ./my_executable.sh

This command will open your container and run the my_executable.sh script in
the container environment. The --nv option allows the container to interface with
the GPU. This command is useful when using srun so you can run your program
in a single command.

Working on HPC Infrastructure

If you were using a sandboxed container for testing, we suggest you convert it
to a Singularity image file. This is because images are more portable and
easier to interact with than sandboxed containers. You can make this
conversion using the following command:

sudo singularity build my_image.simg ./my_sandboxed_container

If you were working on the image on your own computer, you can transfer it over
to your home directory on our infrastructure using the following command:

scp my_image.simg my_ocf_username@hpcctl.ocf.berkeley.edu:~/

To actually submit a Slurm job that uses your Singularity container and runs
your script my_executable.sh, run the following command:
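
A plausible form of that command, pieced together from the ocf-hpc partition and --gres=gpu option described in this section and the singularity exec usage shown earlier, is sketched here. The exact flag combination is an assumption, so check with OCF staff if it does not behave as expected.

```shell
# Sketch only: these flags are inferred from the surrounding description.
if command -v srun >/dev/null 2>&1; then
    srun --partition=ocf-hpc --gres=gpu \
        singularity exec --nv my_image.simg ./my_executable.sh
else
    echo "srun not found; run this on the Slurm master node."
fi
```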

This will submit a Slurm job to run your executable on the ocf-hpc Slurm
partition. The --gres=gpu option requests a GPU for the job. It is also what
allows multiple users to run jobs on a single node, so it is important to
include. Without it, you will not be able to interface with the GPUs.
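
For longer-running work, the same job can be described in a batch script and submitted with sbatch, so that it does not depend on your SSH session staying open. This is a minimal sketch and not an OCF-provided template: the file name my_job.sh, the job name, and the output file pattern are all hypothetical.

```shell
# Write a hypothetical batch script; the job name and output pattern are illustrative.
cat > my_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=ocf-hpc
#SBATCH --gres=gpu
#SBATCH --output=my_job_%j.out

singularity exec --nv my_image.simg ./my_executable.sh
EOF

# Submit only where Slurm is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch my_job.sh
else
    echo "sbatch not found; copy my_job.sh to the Slurm master node and submit it there."
fi
```

In the output pattern, %j is replaced by the Slurm job ID, so each run writes to its own log file.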