Basics to know

A Slurm Batch is a file that defines the elements needed to create a Job. It describes a resource allocation request for running one or more processes, and mainly contains the following information (not exhaustive):
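
As an illustration only, a minimal Batch file might look like the sketch below. The job name, output file name, time limit and CPU count are hypothetical values chosen for the example; the "#SBATCH" options themselves are standard sbatch options, and "64CPUNodes" is one of the OSIRIM partitions.

```shell
#!/bin/bash
# Name shown by squeue / sacct (hypothetical).
#SBATCH --job-name=demo
# File receiving the Job's standard output; %j expands to the Job ID.
#SBATCH --output=demo-%j.out
# Partition to run in.
#SBATCH --partition=64CPUNodes
# One Task, with 4 CPUs allocated to it (example values).
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# Wall-clock limit of 10 minutes (example value).
#SBATCH --time=00:10:00

# Outside Slurm the #SBATCH lines are plain comments and the SLURM_* variables
# are unset, hence the fallback values below.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"
summary="job=${job_id} cpus_per_task=${cpus} host=$(uname -n)"
echo "$summary"
```

Saved as, say, demo.sh (a hypothetical name), such a file would typically be submitted with "sbatch demo.sh".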

A Job Step represents a step or section of the processing performed by the Job. It executes one or more Tasks via the "srun" command. Dividing a Job into Steps offers great flexibility in organizing the processing and in managing and analyzing the allocated resources:

- Steps can be executed sequentially or in parallel,

- a Step can initiate one or more Tasks executed in parallel,

- Steps are supported by the sstat and sacct commands, which allow both tracking the progress of the Job Step by Step during execution and collecting detailed resource usage statistics for each Step (during and after execution).
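
The points above can be sketched as follows. This is a hedged example, not a reference implementation: "run_step" is a hypothetical helper introduced here so that the script also runs outside a Slurm allocation, and the "--exact" option of srun is assumed to be available (it exists in recent Slurm releases; older ones used "--exclusive" for the same purpose).

```shell
#!/bin/bash
#SBATCH --job-name=steps-demo
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

# Hypothetical helper: launch one single-Task Job Step with srun when inside
# a Slurm allocation, or run the command directly otherwise (local dry run).
run_step() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        # --exact (assumed available) restricts the Step to the resources it
        # requests, so that two Steps can run side by side.
        srun --ntasks=1 --exact "$@"
    else
        "$@"
    fi
}

# Demo log file collecting one line per Step.
log="${TMPDIR:-/tmp}/steps-demo.$$.log"
: > "$log"

# Sequential Steps: the second starts only after the first has finished.
run_step sh -c 'echo "step A done"' >> "$log"
run_step sh -c 'echo "step B done"' >> "$log"

# Parallel Steps: start both in the background, then wait for both.
run_step sh -c 'sleep 1; echo "step C done"' >> "$log" &
run_step sh -c 'sleep 1; echo "step D done"' >> "$log" &
wait

cat "$log"

# Monitoring, to be run from a login shell (replace <jobid> with the Job ID):
#   while the Job runs:  sstat --jobs=<jobid>.0 --format=JobID,AveCPU,MaxRSS
#   after it completes:  sacct --jobs=<jobid> --format=JobID,JobName,Elapsed,MaxRSS,State
```

In the sacct output each Step appears on its own line, identified as "<jobid>.<step number>".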

A Task is a process to which the resources defined in the Batch are allocated through the "--cpus-per-task" option. A Task can use these resources like any other process (by creating threads or sub-processes, which may themselves be multi-threaded).

The Task is the resource allocation unit of the Job. CPUs not used by a Task are "lost": no other Task or Step can use them. If a Task creates more processes or threads than it has allocated CPUs, they share the allocated CPUs.
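
One common way to avoid both wasted and oversubscribed CPUs is to size the thread count from the allocation. The sketch below assumes an OpenMP-style program that reads the standard "OMP_NUM_THREADS" variable; the value 4 is an example, and a plain "sh -c" command stands in for the real program.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script is run outside Slurm.
cpus="${SLURM_CPUS_PER_TASK:-1}"

# One thread per allocated CPU: no allocated CPU stays idle, and threads
# do not have to share a CPU.
export OMP_NUM_THREADS="$cpus"

# Stand-in for the real program: report the thread count and visible CPUs.
report='echo "threads: $OMP_NUM_THREADS, usable CPUs: $(nproc 2>/dev/null || echo unknown)"'

if [ -n "${SLURM_JOB_ID:-}" ]; then
    # The CPU count is passed to srun explicitly for clarity.
    srun --ntasks=1 --cpus-per-task="$cpus" sh -c "$report"
else
    sh -c "$report"
fi
```

Setting the thread count higher than "--cpus-per-task" would still work, but the extra threads would compete for the same CPUs.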

A Partition is a logical grouping of compute nodes. The OSIRIM Cluster is split into two separate partitions, "64CPUNodes" and "8CPUNodes", each made up of compute nodes of a different size. This separation makes it possible to specialize and optimize each partition for a particular type of job.
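
As a hedged sketch, a partition can be inspected and selected as shown below. The partition names come from the paragraph above; "job.sh" is a hypothetical Batch file, and the choice of "8CPUNodes" here is arbitrary.

```shell
#!/bin/bash
# Partition chosen for this example; "64CPUNodes" is the other option.
partition="8CPUNodes"

# Where the Slurm client tools are installed, show the partition's node count
# and CPUs per node (%P = partition, %D = number of nodes, %c = CPUs per node).
if command -v sinfo >/dev/null 2>&1; then
    sinfo --partition="$partition" --format="%P %D %c" || true
fi

# Build the submission command. An option given on the sbatch command line
# takes precedence over the same option in an "#SBATCH" line of the file.
submit_cmd="sbatch --partition=$partition job.sh"
echo "To submit: $submit_cmd"
```

The same choice can instead be written inside the Batch file as "#SBATCH --partition=8CPUNodes".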