Job Scripts

The HPC machines run applications as jobs in the queuing system which schedules the tasks and process to run on compute nodes. Machines in the Metacenter do currently use the Slurm Workload Manager. There is an example script found at Sample MPI Batch Script.

For more information about the queue system and how to set the proper priorities, visit the Queue System page.

Creating Job Scripts

A job script is a script file that contains commands to be executed by Slurm or
environment variables. The start of the file should specify the shell environment.

#!/bin/bash

It can contain a mixture of shell commands, variables, and sbatch settings.

Submitting Job Scripts

To submit a job, use the sbatch command followed by the batch script.

sbatch <scriptfile>

The sbatch command returns a jobid, an id number that identifies the submitted job. The job will be waiting in the job queue until there are free compute resources it can use. A job in that state is said to be pending (PD). When it has started, it is called running (R). Any output (stdout or stderr) of the job script will be written to a file called slurm-jobid.out in the directory where you ran sbatch, unless otherwise specified.

All commands in the job script are performed on the compute-node(s) allocated by the queue system. The script also specifies a number of requirements (memory usage, number of CPUs, run-time, etc.), used by the queue system to find one or more suitable machines for your job.

To see the status of the submitted job, use the command pending and for related commands, read the Managing Jobs

sbatch variables

sbatch variables begin with #SBATCH and commands are entered verbatim as
running in a shell environment. sbatch variables set up information about the job
such as account information and priority levels.

All batch scripts must contain the following sbatch variables:

#Project name as seen in the queue
#SBATCH --account=<project_name>
#Application running time. The wall clock time helps the scheduler assess priorities and tasks.
#SBATCH --time=<wall_clock_time>
#Memory usage per core
#SBATCH --mem-per-cpu=<size_megabytes>

--account - specifies the project the job will run in.

--time - specifies the maximal wall clock time of the job. Do not specify the limit too low, because the job will be killed if it has not finished within this time. On the other hand, shorter jobs will be started sooner, so do not specify longer than you need. The default maximum allowed --time specification is 1 week; see details here.

--mem-per-cpu - specifies how much RAM each task (default is 1 task) of the job needs.

Here are a couple of examples for how to specify nodes and tasks for some
special cases: