What is it?

Slurm is the workload manager (job scheduler) that controls access to
the compute nodes of the "premise" cluster.

Why you need it

You must use slurm commands on "premise.sr.unh.edu" to submit your jobs for
scheduling on the "premise" cluster's compute nodes. The options you provide
define the required hardware and restrict the nodes on which your job
will run.

Commands

All of the slurm commands support a --help option that provides a lot
of good usage information.

sinfo

Shows the status of the compute nodes.

srun

Interactively run the given command on a remote node.

squeue

Shows your jobs that are running or waiting to run.

sacct

Shows your jobs that have completed or failed.

sbatch

Submit a job into the job queue.
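
For example, a typical submit-and-monitor cycle looks like this (the
script name and job id are illustrative):

sbatch myscript.csh        # submit the job script to the queue
squeue -u $USER            # watch the job while it waits or runs
sacct -j <jobid>           # review the job after it completes or fails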

Common sbatch Options

The expectation is that your slurm job uses no more resources than it
has requested. Unless you specify otherwise, the default is to run 1
task on 1 node with 1 cpu (also called core or thread), reserving 2MB
of physical RAM.

--help

Display the full list of options. Available for most slurm commands.

-n #

Number of times this task will be executed. Each task is assumed to be
a single thread unless specified otherwise. Tasks will run
simultaneously on one or more nodes, depending on the availability of
the required resources on each node.

-N #

Minimum number of nodes to run this task on.

--ntasks-per-node=n

Number of tasks to invoke on each node. Another way of
requesting your code be run on multiple nodes.

-c #

Number of CPUs (also called threads or cores) required by each task.

--mem=MB

Maximum number of megabytes of RAM your code needs to run. The default is 2MB of RAM.

--mem-per-cpu=MB

An alternative to --mem that requests memory per cpu (thread) rather than per node.

--gres=list

List of Generic RESources being requested. (See the section
below.) This can be a single resource like "ramdisk", or include
a quantity like "gpu:2". Normally only one item is listed, but a
true comma separated list of resources is accepted.

-D path

Change directory to the provided path before executing the task.

--mail-user=ops@sr.unh.edu

Send job status email to the given address.

--mail-type=type

The state changes you wish to be notified about. Values
include BEGIN, END, FAIL, and ALL.

--time=#

Execution time limit in minutes. Your job will be killed if
not complete after running for this many minutes.

-p name

Partition to submit jobs to; defaults to "shared". Users who
have purchased nodes can use their exclusive partition to
queue their jobs with a higher priority on the nodes they paid
for.

Your jobs will get scheduling priority over jobs in the
"shared" queue, but they will only run on your hardware,
even if "shared" nodes are idle. Using this option to gain
priority is not always the best choice.
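
Putting several of these options together, a minimal job script might
look like the following sketch (the program being run is hypothetical):

#!/bin/csh
#SBATCH -n 1                        # one task
#SBATCH -c 4                        # 4 cpus (threads) for that task
#SBATCH --mem=4096                  # 4096MB of RAM for the job
#SBATCH --time=120                  # kill the job after 120 minutes
#SBATCH --mail-type=END,FAIL        # email when the job ends or fails
#SBATCH --mail-user=ops@sr.unh.edu

./my_program                        # hypothetical program to run

Submit it with "sbatch myscript.csh".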

srun -n 3 hostname

node105.rcchpc
node105.rcchpc
node105.rcchpc

Schedules the given command to be run as 3 independent tasks. More than
one task may run simultaneously on a single node, until the resources
on that node are depleted. The output above shows each single threaded
task ran on the same node. Running with "-n 30" would exceed the 24
cores on a single node and require more than one node to run the job.

srun --exclude=node117,node118 hostname

node133.rcchpc

"--exclude" tells slurm not to consider the listed nodes when scheduling
your job. Use the "--exclude" option on "srun" or "sbatch" to reduce the
pool of machines on which your job can be scheduled. This can be very
useful if your code doesn't run well on certain hardware, or if you wish
to ensure that a long running job does not tie up a higher end node than
it needs.

At this time the Premise cluster has two nodes (node117 & node118)
with AMD (not Intel) based CPUs. Code optimized for Intel may not run
as efficiently (or at all) on these two nodes.

Node ranges may also be given as node[117-118], but the brackets may
need to be escaped so they are not interpreted by the calling shell.

For example: --exclude=node\[117-118\]

Consider using the --exclude option to keep long running jobs off
these enhanced nodes (an example follows the list):

GPU

node[101-104,119-124]

High RAM

node[109-112],node125

AMD CPU

node[117-118]
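
For example, a long running, CPU-only job could be kept off all of the
enhanced nodes above with a single combined exclude list:

sbatch --exclude=node\[101-104,109-112,117-125\] myscript.csh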

Generic RESource (gres)

Slurm allows for the definition of resources that can be requested in
a generic way. The premise cluster has two "gres" resources defined.
Those resources are:

gpu

Four nodes are equipped with NVidia K80 GPUs. Each K80 appears
as two individual GPU chips, each of which is equivalent to a
slightly underclocked NVidia K40 GPU.

Regardless of which physical GPU you are allocated, it will be
referenced as if it were the only GPU on the system, starting
at 0. If you request a single GPU and are allocated the second
physical GPU, you must still access it as GPU 0 in your code.
If you request both "K40" GPUs then your code will have access
to both 0 and 1.

Please note that the GPU node purchaser has priority on these
nodes and keeps them busy most of the time. Wait times for this
resource are expected to be significant.

ramdisk

Four nodes have been configured to allow jobs to make use of
available RAM as a disk. The RAM on each node is limited, so
only the high memory nodes offer this resource. We have chosen
not to automatically delete the contents when a job finishes, so
it is important that your jobs clean up after themselves upon
normal exit (and that you manually clean up after them when they
exit abnormally).
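
As a sketch, a job script using the ramdisk might clean up after itself
like this (the ramdisk mount point shown is hypothetical; check with RCC
for the actual path, and see below for the --gres syntax):

#!/bin/csh
#SBATCH --gres=ramdisk
set SCRATCH=/ramdisk/$SLURM_JOB_ID    # hypothetical ramdisk path
mkdir -p $SCRATCH
./my_program $SCRATCH                 # hypothetical program using scratch
rm -rf $SCRATCH                       # remove contents on normal exit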

To request a gres resource, use the --gres argument to srun or
sbatch with a value that specifies the generic resource you are
requesting. The resource quantity defaults to 1, but can be given
explicitly, as in "--gres=gpu:2".

Example gres gpu request

sbatch --gres=gpu:1 myscript.csh

The myscript.csh script should run the desired code as if there were
only one GPU on the system. This is likely the default behavior for
most GPU utilizing codes, but you may need to reference the GPU as
index 0.
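
A minimal sketch of such a script, assuming the code takes a GPU index
as an argument (the program and its flag are hypothetical):

#!/bin/csh
#SBATCH --gres=gpu:1
./my_gpu_program --device=0    # always index 0, whichever GPU is allocated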

Non-exclusive node usage

Jobs are given exclusive use of an entire node by default. RCC
has configured slurm to allow users to share a node by opting in with
the "--share" option. Unfortunately, the "--share" option is not listed
by "sbatch --help".

When memory is unspecified, it defaults to the total amount of RAM on
the node. For slurm to know how much memory remains available, you
must specify the memory your job needs in MB (--mem=32).

CPU cores are handled like memory: you are given everything available
on the node by default. For slurm to know the remaining cpu resources,
you must specify your job's cpu core needs (-c 6).
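
For example, a job that opts in to sharing a node might be submitted
like this (memory and core counts are illustrative):

sbatch --share --mem=4096 -c 6 myscript.csh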

Non-exclusive active nodes with sufficient resources are allocated
before idle nodes. This means more efficient use of the HPC cluster,
since two or more jobs will share a single node and leave more nodes
available.

Important notes:

Exceeding the specified memory cancels your running job.

Your job is restricted to the cpu cores specified; it will be slowed
down but not cancelled.