Queue System Concepts

This page explains some of the core concepts of the Slurm queue system.

For an overview of Slurm concepts, see Slurm's beginner's guide:
Quick Start.

Partition

The nodes of a cluster are divided into sets called partitions.
Partitions can overlap.
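Since partitions are simply sets of nodes, overlap can be pictured with ordinary set operations. The sketch below is only an illustration: the node names and the partition contents are hypothetical and do not describe any real cluster.

```python
# Hypothetical partitions, each a set of (made-up) node names.
partitions = {
    "normal": {"c1-1", "c1-2", "c1-3", "c1-4"},
    "bigmem": {"c1-4", "b1-1", "b1-2"},
}

def nodes_in_both(a: str, b: str) -> set[str]:
    """Return the nodes that belong to both partitions a and b."""
    return partitions[a] & partitions[b]

# A non-empty intersection means the two partitions overlap.
print(sorted(nodes_in_both("normal", "bigmem")))  # ['c1-4']
```

In this example node c1-4 belongs to both partitions, so jobs from either partition could end up running on it.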

Several job types on our clusters are implemented as partitions,
meaning that one selects the job type with --partition, for instance
bigmem, accel or optimist.
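If jobs are submitted from a script, selecting a partition amounts to adding one option to the sbatch command line (the same as an #SBATCH --partition=... line in the job script). The sketch below only builds the command; the script name job.sh and the helper sbatch_command are hypothetical, and actually submitting would require something like subprocess.run(cmd) on a login node.

```python
import shlex
from typing import Optional

def sbatch_command(script: str, partition: Optional[str] = None) -> list[str]:
    """Build an sbatch command line; the partition selects the job type."""
    cmd = ["sbatch"]
    if partition is not None:
        cmd.append(f"--partition={partition}")
    cmd.append(script)
    return cmd

cmd = sbatch_command("job.sh", partition="bigmem")
print(shlex.join(cmd))  # sbatch --partition=bigmem job.sh
```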

QoS - Quality of Service

A QoS is a way to assign properties and limitations to jobs. It can
be used to give jobs different priorities, and to add or change
limits on the jobs, for instance the size or length of jobs, or the
number of jobs running at one time.

Several job types on our clusters are implemented as a QoS, meaning
that one selects the job type with --qos, for instance preproc,
devel or short. The jobs will then (by default) run in the standard
(normal) partition, but have different properties.
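To make the idea concrete, a QoS can be thought of as a named bundle of a priority and some limits that every job using it is checked against. This is a toy model, not how Slurm stores QoS settings (those are configured by the administrators); the field names and all numbers below are hypothetical and are not the real limits of any QoS on our clusters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class QoS:
    """Toy model of a QoS: a priority plus optional job limits."""
    name: str
    priority: int
    max_walltime_min: Optional[int] = None  # longest allowed job, in minutes
    max_running_jobs: Optional[int] = None  # jobs allowed to run at once

    def allows(self, walltime_min: int, running_jobs: int) -> bool:
        """True if one more job of this length stays within the limits."""
        if self.max_walltime_min is not None and walltime_min > self.max_walltime_min:
            return False
        if self.max_running_jobs is not None and running_jobs + 1 > self.max_running_jobs:
            return False
        return True

# Hypothetical values, for illustration only.
preproc = QoS("preproc", priority=10, max_walltime_min=60, max_running_jobs=1)
print(preproc.allows(walltime_min=30, running_jobs=0))   # True
print(preproc.allows(walltime_min=120, running_jobs=0))  # False
```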

Account

An account is an entity that can be assigned a quota for resource
usage. All jobs run in an account, and the job's usage is subtracted
from the account's quota.

Accounts can also have restrictions, like how many jobs can run in
them at the same time, or which reservations their jobs can use.
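The bookkeeping described above can be sketched as an account holding a quota, a running-job limit and the usage charged so far. This is a simplified model under stated assumptions: usage is counted in CPU hours here only for illustration (real accounting may use other units and rules), and the account name nn1234k and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:
    """Toy model of an account with a usage quota and a running-job limit."""
    name: str
    quota_cpu_hours: float
    max_running_jobs: int
    used_cpu_hours: float = 0.0
    running_jobs: int = 0

    def remaining(self) -> float:
        """Quota left after subtracting the usage charged so far."""
        return self.quota_cpu_hours - self.used_cpu_hours

    def can_start(self) -> bool:
        """In this model, a job may start if quota remains and the job limit is not reached."""
        return self.remaining() > 0 and self.running_jobs < self.max_running_jobs

    def charge(self, cpus: int, hours: float) -> None:
        """Subtract a job's usage (CPUs times hours) from the quota."""
        self.used_cpu_hours += cpus * hours

acct = Account("nn1234k", quota_cpu_hours=1000.0, max_running_jobs=2)
acct.charge(cpus=16, hours=2.5)
print(acct.remaining())  # 960.0
```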

On our clusters, each project has its own account with the same name
as the project, "nnXXXXk". Some projects also have an account "nnXXXXo" for running
optimist jobs. We use accounts mainly for accounting resource
usage.
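Because the two account names differ only in the last letter, a script can derive one from the other. The sketch assumes that XXXX is four digits, which is an assumption about the naming scheme; also keep in mind that only some projects have an optimist account, so the derived name may not exist. The helper name optimist_account is hypothetical.

```python
import re

# Assumption: accounts are "nn" + four digits + "k" (normal) or "o" (optimist).
ACCOUNT_RE = re.compile(r"^nn(\d{4})([ko])$")

def optimist_account(account: str) -> str:
    """Map a project's normal account (nnXXXXk) to its optimist account (nnXXXXo)."""
    m = ACCOUNT_RE.match(account)
    if m is None or m.group(2) != "k":
        raise ValueError(f"not a normal project account: {account!r}")
    return f"nn{m.group(1)}o"

print(optimist_account("nn1234k"))  # nn1234o
```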

Jobs

Jobs are submitted to the job queue, and start running on assigned
compute nodes when there are enough resources available.

Job step

A job is divided into one or more job steps. Each time a job runs
srun or mpirun, a new job step is created. Job steps are normally
executed sequentially, one after the other. In addition to these, the
batch job script itself, which runs on the first of the allocated
nodes, is considered a job step (batch).

When the steps of a job are listed, for instance with
sacct -j <jobid>, the first line is the job allocation. Then comes the
job script step (batch), an artificial step that we can ignore here
(extern), and finally a job step corresponding to an mpirun or srun
(step 0). Further steps would be numbered 1, 2, etc.
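The step names follow a simple pattern in the JobID column: the bare job ID for the allocation, and the job ID followed by a dot and batch, extern or a step number for the steps. The sketch below classifies such IDs; the job ID 12345 is made up, and other ID forms (for example those of array or heterogeneous jobs) are deliberately lumped together, so treat it as an illustration rather than a complete parser.

```python
def classify_job_id(job_id: str) -> str:
    """Classify a JobID such as '12345', '12345.batch', '12345.extern' or '12345.0'."""
    if "." not in job_id:
        return "allocation"
    _, step = job_id.split(".", 1)
    if step == "batch":
        return "batch script"
    if step == "extern":
        return "extern"
    if step.isdigit():
        return f"step {int(step)}"
    return "other"

for jid in ["12345", "12345.batch", "12345.extern", "12345.0", "12345.1"]:
    print(jid, "->", classify_job_id(jid))
```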

Tasks

Each job step starts one or more tasks, which correspond to
processes. So, for instance, the processes (MPI ranks) in an MPI job
step are tasks. This is why one specifies --ntasks etc. in job
scripts to select the number of processes to run in an MPI job.
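When srun starts the tasks of a step, it sets environment variables in each task, among them SLURM_PROCID (the rank of the task) and SLURM_NTASKS (the number of tasks). A minimal sketch of a program that reports which task it is, run for example as srun --ntasks=4 python3 hello.py (hello.py being a hypothetical file name). Programs started with mpirun instead of srun should normally get their rank from the MPI library, since these variables are not guaranteed to be set in that case.

```python
import os

def task_identity() -> tuple[int, int]:
    """Return (rank, ntasks) for this process, as set by srun for each task.

    Outside a job step the variables are unset, so fall back to a single task.
    """
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    ntasks = int(os.environ.get("SLURM_NTASKS", "1"))
    return rank, ntasks

rank, ntasks = task_identity()
print(f"task {rank} of {ntasks}")
```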

Each task in a job step is started at the same time, and they run in
parallel on the nodes of the job. srun and mpirun will take care
of starting the right number of processes on the right nodes.
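How the tasks end up on the nodes depends on the distribution method. A common default is block distribution, where consecutive ranks fill one node before moving on to the next; the actual placement can be changed with options such as --distribution, so the sketch below is an assumption-laden illustration, not a description of what srun will do in every case. It also assumes every node runs the same number of tasks, as with --ntasks-per-node.

```python
def node_of_rank(rank: int, tasks_per_node: int) -> int:
    """0-based node index a rank lands on under block distribution.

    Assumes the same number of tasks on every node and that consecutive
    ranks fill one node before the next.
    """
    if rank < 0 or tasks_per_node <= 0:
        raise ValueError("rank must be >= 0 and tasks_per_node > 0")
    return rank // tasks_per_node

# 8 tasks with 4 per node: ranks 0-3 on node 0, ranks 4-7 on node 1.
print([node_of_rank(r, 4) for r in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```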

(Unfortunately, Slurm also uses the term array tasks for the
individual instances of an array job.)
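The two meanings of "task" show up as different environment variables. In an array job, submitted for example with sbatch --array=0-9 job.sh (job.sh being a hypothetical script), each array task gets its own SLURM_ARRAY_TASK_ID, while the tasks (processes) of a job step started by srun are told apart by SLURM_PROCID. A minimal sketch reading both; the function names are hypothetical.

```python
import os
from typing import Optional

def array_task_id() -> Optional[int]:
    """Index of this array task, or None if the job is not part of a job array."""
    value = os.environ.get("SLURM_ARRAY_TASK_ID")
    return int(value) if value is not None else None

def step_task_rank() -> int:
    """Rank of this task (process) within its job step, as set by srun."""
    return int(os.environ.get("SLURM_PROCID", "0"))

print("array task:", array_task_id(), "step task:", step_task_rank())
```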