The General Setup

Partitions - logical groupings of nodes that can be thought of as job queues. Examples are the normal, bigmem, and optimist partitions.

QoS (Quality of Service). QoSes apply to jobs in the normal and bigmem
partitions. Special QoSes include devel and preproc.

Each project has an account "nn9999k" with a CPU hour quota. Optimist jobs run under a separate
account "nn9999x", which uses the QoS optimist and has its own CPU hour quota.

For an overview of the Slurm concepts, Slurm has an excellent beginner's guide: Quick Start.

Job Types

Slurm accepts a number of options that determine how a job is run. For example, the --partition option tells Slurm which partition to run the job in.
Mostly, the job types correspond to the partitions, but other options give Slurm an idea of which resources the application needs.
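As a sketch, such options are usually given as #SBATCH directives at the top of the job script. The account, partition, QoS, and resource values below are placeholder examples, not Fram defaults; adjust them to your project:

```shell
#!/bin/bash
# Sketch of a job script header. Account, partition, QoS, and resource
# values are placeholders; pick the ones that fit your project and job.
#SBATCH --account=nn9999k        # project account with CPU hour quota
#SBATCH --partition=normal       # which partition (queue) to run in
#SBATCH --qos=devel              # optional: a special QoS, e.g. for test runs
#SBATCH --ntasks=4               # resource request: number of tasks
#SBATCH --time=00:30:00          # resource request: wall-clock time limit

# The commands below run on the allocated nodes when the job starts.
# Outside a job, $SLURM_JOB_PARTITION is unset, so we fall back to a marker.
partition="${SLURM_JOB_PARTITION:-not-in-a-job}"
echo "Running in partition: $partition"
```

When the script is submitted with sbatch, the #SBATCH lines are read as options; when it is run directly, they are ordinary comments.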

Project Environment

Jobs have access to temporary storage, backed-up storage, environment variables, and
software modules (see Installed Software) for use during execution.

Work Directory

All jobs can use a work area (pointed to by the $SCRATCH and $USERWORK environment variables) that is created for the job and deleted
afterwards. The work area is mounted on a fast file system that can handle large amounts of input and output.

The directory where you ran sbatch is stored in the environment variable $SLURM_SUBMIT_DIR, which
makes it easy to copy files to and from the directory the job was submitted from.
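A minimal, runnable sketch of staging files through the work area. Inside a real job, Slurm and the prolog set $SCRATCH and $SLURM_SUBMIT_DIR; the fallbacks and the file names (input.dat, result.dat) are stand-ins so the sketch can be tried outside a job:

```shell
# Fallbacks so the sketch runs outside a job; in a real job these
# variables are already set and the two lines below change nothing.
SCRATCH="${SCRATCH:-$(mktemp -d)}"
SLURM_SUBMIT_DIR="${SLURM_SUBMIT_DIR:-$PWD}"

echo "example input" > "$SLURM_SUBMIT_DIR/input.dat"   # stand-in input file

cp "$SLURM_SUBMIT_DIR/input.dat" "$SCRATCH/"   # stage input to fast storage
cd "$SCRATCH"
# ... run the application on input.dat here ...
cp input.dat "$SLURM_SUBMIT_DIR/result.dat"    # copy results back before
                                               # the work area is deleted
```

Copying results back before the job ends matters because the work area is deleted when the job finishes.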

Prolog and Epilog

Slurm has several Prolog and Epilog programs that perform setup and cleanup
tasks before and after each job or job step. On Fram, the Prolog and Epilog programs
set up the environment variables and clean up temporary files.

The idea behind the priority setup is that once a job has been in the queue long enough to get a
reservation, no other job should delay it. Before it gets a reservation, however,
the queue system's backfiller is free to start any other job that can start
immediately, even if that delays the waiting job.

1. Currently, only the priority of 10 jobs per user within each project increases with time. As jobs start, the priorities of further jobs start to increase. This is done to avoid problems if a user submits a large number of jobs over a short time. Note that the limit is per user and project, so if a user has jobs in several projects, 10 of the user's jobs from each project will increase in priority at the same time. This limit might change in the future.

Job Placement

The compute nodes on Fram are divided into four groups, or islands. The
network bandwidth within an island is higher than the throughput between
islands. Some jobs need high network throughput between their nodes, and will
usually run faster if they run within a single island. Therefore, the queue
system is configured to run each job within one island, if possible. See
Job Placement on Fram for details and for how this can
be overridden.
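For illustration, Slurm's generic mechanism for this kind of constraint is the --switches option; whether Fram's island placement uses exactly this option is an assumption here, so check Job Placement on Fram before relying on it:

```shell
# Hedged sketch (assumed mechanism): request that all nodes of the job
# sit within at most 1 network switch/island; the optional @max-time
# bounds how long the scheduler may wait for such a placement before
# giving up and placing the job across islands.
#SBATCH --switches=1@12:00:00
```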