Resource Requests

Aside from the time, RAM, and CPU requirements listed on the SlurmBasics page, we have a couple other requestable resources:

Valid gres options are:
gpu[[:type]:count]
fabric[[:type]:count]

Generally, if you don't know whether you need a particular resource, you should use the default. A current list of valid gres options can be generated with the command

srun --gres=help

Fabric

We currently offer three "fabrics" as requestable resources in Slurm. The "count" specified is the line-rate (in Gigabits-per-second) of the connection on the node.

Infiniband

First of all, let me state that just because it sounds "cool" doesn't mean you need it or even want it. InfiniBand does absolutely no good if you are running on a single machine. InfiniBand is a high-speed host-to-host communication fabric. It is most often used in conjunction with MPI jobs (discussed below). Several times we have had jobs which could run just fine, except that the submitter requested InfiniBand, and all the nodes with InfiniBand were currently busy. In fact, some of our fastest nodes do not have InfiniBand, so by requesting it when you don't need it, you are actually slowing down your job. To request InfiniBand, add --gres=fabric:ib:1 to your sbatch command-line.

ROCE

ROCE, like InfiniBand, is a high-speed host-to-host communication layer, again used most often with MPI. Most of our nodes are ROCE-enabled, but requesting this resource guarantees that the nodes allocated to your job will be able to communicate with ROCE. To request ROCE, add --gres=fabric:roce:1 to your sbatch command-line.

Ethernet

Ethernet is another communication fabric. All of our nodes are connected by Ethernet; this resource simply lets you specify the interconnect speed. Speeds are selected in units of Gbps, with all nodes supporting 1Gbps or above. The currently available speeds for Ethernet are: 1, 10, 40, and 100. To select nodes with 40Gbps and above, you could specify --gres=fabric:eth:40 on your sbatch command-line.

CUDA

CUDA is the resource required for GPU computing. We have a very small number of nodes which have GPUs installed. To request a GPU on one of these nodes, add --gres=gpu:1 to your sbatch command-line.

Parallel Jobs

There are two ways jobs can run in parallel: intranode and internode. Note: Beocat will not automatically make a job run in parallel. Have I said that enough? It's a common misconception.

Intranode jobs

Intranode jobs are easier to code and can take advantage of many common libraries, such as OpenMP or Java's threads. Many times, your program will need to know how many cores you want it to use, and many programs will use all available cores if not explicitly told otherwise. This can be a problem when you are sharing resources, as Beocat does. To request multiple cores, use the sbatch directive '--cpus-per-task=n', where n is the number of cores you wish to use. If your command can take an environment variable, you can use $SLURM_CPUS_ON_NODE to tell it how many cores you've been allocated.
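
For example, a minimal submit script for a multithreaded (intranode) job might look like the following sketch. The program name and its thread-count option are placeholders; check your program's documentation for how it accepts a core count.

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1:00:00

# Tell the (hypothetical) program to use exactly the cores we were allocated
./my_threaded_program --threads=$SLURM_CPUS_ON_NODE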

Internode (MPI) jobs

"Talking" between nodes is trickier than talking between cores on the same node. The specification for doing so is called the "Message Passing Interface", or MPI. We have OpenMPI installed on Beocat for this purpose. Most programs written to take advantage of large multi-node systems will use MPI. You can tell if you have an MPI-enabled program because its directions will tell you to run 'mpirun program'. Requesting MPI resources is only mildly more difficult than requesting single-node jobs. Instead of using '--cpus-per-task=n', you would use '--nodes=n --ntasks-per-node=m' or '--ntasks=o' for your sbatch request, where n is the number of nodes you want, m is the number of cores per node you need, and o is the total number of cores you need.

Some quick examples:

--nodes=4 --ntasks-per-node=4 will give you 4 chunks of 4 cores apiece, for a total of 16 cores.

--ntasks=40 will give you 40 cores, on any number of nodes.

--ntasks=100 will give you 100 cores, on any number of nodes.
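
As a sketch of what an MPI submission might look like (the program name is a placeholder), the first example above could be used in a submit script like this:

#!/bin/bash
#SBATCH --nodes=4 --ntasks-per-node=4
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1:00:00

# mpirun should pick up the 16 allocated tasks from Slurm, so no '-np' is needed
mpirun ./my_mpi_program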

Requesting memory for multi-core jobs

Memory requests are easiest when they are specified per core. For instance, if you specified '--ntasks=20 --mem-per-cpu=20G', your job would have access to 400GB of memory total.

Other Handy Slurm Features

Email status changes

One of the most commonly used options when submitting jobs, unrelated to resource requests, is to have Slurm email you when a job changes its status. This may require two directives to sbatch: --mail-user and --mail-type.

--mail-type

--mail-type is used to tell Slurm to notify you about certain conditions. Options are comma separated and include the following

Option         Explanation
NONE           Disables event-based mail
BEGIN          Sends a notification when the job begins
END            Sends a notification when the job ends
FAIL           Sends a notification when the job fails
REQUEUE        Sends a notification if the job is put back into the queue from a running state
STAGE_OUT      Burst buffer stage out and teardown completed
ALL            Equivalent to BEGIN,END,FAIL,REQUEUE,STAGE_OUT
TIME_LIMIT     Notifies if the job ran out of time
TIME_LIMIT_90  Notifies when the job has used 90% of its allocated time
TIME_LIMIT_80  Notifies when the job has used 80% of its allocated time
TIME_LIMIT_50  Notifies when the job has used 50% of its allocated time
ARRAY_TASKS    Modifies the BEGIN, END, and FAIL options to apply to each array task (instead of notifying for the entire job)

--mail-user

--mail-user is optional. It is only needed if you intend to send these job status updates to a different e-mail address than what you provided in the Account Request Page. It is specified with the following arguments to sbatch: --mail-user=someone@somecompany.com
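
For example, to be emailed at an alternate address when a job starts, ends, or fails, you could put the following in your submit script:

#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=someone@somecompany.com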

Job Naming

If you have several jobs in the queue, running the same script with different parameters, it's handy to have a different name for each job as it shows up in the queue. This is accomplished with the '-J JobName' sbatch directive.

Separating Output Streams

Normally, Slurm will create one output file, containing both STDERR and STDOUT. If you want both of these to be separated into two files, you can use the sbatch directives '--output' and '--error'.

Option    Default       Example
--output  slurm-%j.out  slurm-206.out
--error   slurm-%j.out  slurm-206.out

%j above indicates that it should be replaced with the job id.
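
For example, to send output and errors to separate files named after the job (as in the sample submit script below), you could use:

#SBATCH --output=MyJobTitle.o%j
#SBATCH --error=MyJobTitle.e%j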

Running from the Current Directory

Many programs incorrectly assume that they are being run from the directory containing their input files. By default, Slurm runs your job from the "current working directory" you were in when you submitted it, so this usually works as expected. If your job needs to run from a different directory, you can set one with the '--chdir=<directory>' sbatch directive.

Running in a specific class of machine

If you want to run on a specific class of machines, e.g., the Dwarves, you can add the flag "--constraint=dwarves" to select any of those machines.

Processor Constraints

Because Beocat is a heterogeneous cluster (it contains machines purchased over many years), not all of our processors support every new and fancy feature. You might have some applications that require newer processor features, so we provide a mechanism to request those.

--constraint tells the scheduler to apply constraints to the types of nodes that the job can run on. For instance, we know of several applications that must be run on chips that have the "AVX" processor extensions. To do that, you would specify --constraint=avx on your sbatch or srun command line.

Slurm Environment Variables

Within an actual job, sometimes you need to know specific things about the running environment to set up your scripts correctly. Slurm makes a number of environment variables available to your job; the sbatch man page documents the full list. Of course, the values of these variables will differ based on many different factors.

Sometimes it is nice to know what hosts you have access to during a job; check $SLURM_JOB_NODELIST for that. There are lots of other useful environment variables, and I will leave it to you to identify the ones you want.

Some of the most commonly-used variables we see used are $SLURM_CPUS_ON_NODE, $HOSTNAME, and $SLURM_JOB_ID.
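
As a quick sketch, a job script could report a few of these variables like this:

#!/bin/bash
# Print some of the Slurm-provided information about this job
echo "Job $SLURM_JOB_ID is running on $HOSTNAME"
echo "Cores allocated on this node: $SLURM_CPUS_ON_NODE"
echo "Nodes in this allocation: $SLURM_JOB_NODELIST"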

Running from a sbatch Submit Script

No doubt after you've run a few jobs you get tired of typing something like 'sbatch --mem-per-cpu=2G --time=10:00 --cpus-per-task=8 -J MyJobTitle MyScript.sh'. How are you supposed to remember all of these every time? The answer is to create a 'submit script', which records all of these for you. Below is a sample submit script, which you can modify and use for your own purposes.

#!/bin/bash
## A Sample sbatch script created by Kyle Hutson
##
## Note: Usually a '#' at the beginning of the line is ignored. However, in
## the case of sbatch, lines beginning with #SBATCH are commands for sbatch
## itself, so I have taken the convention here of starting *every* line with a
## '#'. Just delete the first one if you want to use that line, and then modify
## it to your own purposes. The only exception here is the first line, which
## *must* be #!/bin/bash (or another valid shell).

## Specify the amount of RAM needed _per_core_. Default is 1G
##SBATCH --mem-per-cpu=1G

## Specify the maximum runtime. Default is 1 hour (1:00:00)
##SBATCH --time=1:00:00

## Require the use of InfiniBand. If you don't know what this is, you probably
## don't need it.
##SBATCH --gres=fabric:ib:1

## GPU directive. If you don't know what this is, you probably don't need it
##SBATCH --gres=gpu:1

## Number of cores/nodes:
## A quick note here. Jobs requesting 16 or fewer cores tend to get scheduled
## fairly quickly. If you need a job that requires more than that, you might
## benefit from emailing us at beocat@cs.ksu.edu to see how we can assist in
## getting your job scheduled in a reasonable amount of time. Default is
##SBATCH --cpus-per-task=1
##SBATCH --cpus-per-task=12
##SBATCH --nodes=2 --ntasks-per-node=1
##SBATCH --ntasks=20

## Constraints for this job. Maybe you need to run on the elves
##SBATCH --constraint=elves
## or perhaps you just need avx processor extensions
##SBATCH --constraint=avx

## Output file name. Default is slurm-%j.out where %j is the job id.
##SBATCH --output=MyJobTitle.o%j

## Split the errors into a separate file. Default is the same as output
##SBATCH --error=MyJobTitle.e%j

## Name my job, to make it easier to find in the queue
##SBATCH -J MyJobTitle

## And finally, we run the job we came here to do.
## $HOME/ProgramDir/ProgramName ProgramArguments
## OR, for the case of MPI-capable jobs
## mpirun $HOME/path/MpiJobName

## Send email when certain criteria are met.
## Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL (equivalent to
## BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage
## out and teardown completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent
## of time limit), TIME_LIMIT_80 (reached 80 percent of time limit),
## TIME_LIMIT_50 (reached 50 percent of time limit) and ARRAY_TASKS (send
## emails for each array task). Multiple type values may be specified in a
## comma separated list. Unless the ARRAY_TASKS option is specified, mail
## notifications on job BEGIN, END and FAIL apply to a job array as a whole
## rather than generating individual email messages for each task in the job
## array.
##SBATCH --mail-type=ALL

## Email address to send the email to based on the above line.
## Default is to send the mail to the e-mail address entered on the account
## request form.
##SBATCH --mail-user=myemail@ksu.edu

File Access

Beocat has a variety of options for storing and accessing your files.
Every user has a home directory for general use which is limited in size, has decent file access performance,
and will soon be backed up nightly. Larger files should be stored in the /bulk subdirectories which have the same decent performance
but are not backed up. The /scratch file system will soon be implemented on a Lustre file system that will provide very fast
temporary file access. When fast IO is critical to application performance, the local disk on each node or a
RAM disk is the best option.

Home directory

Every user has a /homes/username directory that they drop into when they log into Beocat.
The home directory is for general use and provides decent performance for most file IO.
Disk space in each home directory is limited to 1 TB, so larger files should be kept in the /bulk
directory, and there is a limit of 100,000 files in each subdirectory in your account.
This file system is fully redundant, so 3 specific hard disks would need to fail before any data was lost.
All files will soon be backed up nightly to a separate file server in Nichols Hall, so if you do accidentally
delete something it can be recovered.

Bulk directory

Each user also has a /bulk/username directory where large files should be stored.
File access is the same speed as for the home directories, and the same limit of 100,000 files
per subdirectory applies. There is no limit to the disk space you can use in your bulk directory,
but the files there will not be backed up. They are still redundantly stored so you don't need to
worry about losing data to hardware failures; just don't delete something by accident. Unused files will be automatically removed after two years.
If you need to back up large files in the bulk directory, talk to Dan Andresen (dan@ksu.edu) about
purchasing some hard disks for archival storage.

Scratch file system

The /scratch file system will soon be using the Lustre software which is much faster than the
speed of the file access on /homes or /bulk. In order to use scratch, you first need to make a
directory for yourself. Scratch offers greater speed and no limit on the size or number of files in each
subdirectory. It is meant as temporary space for prepositioning files and accessing them during runs.
Once runs are completed, any files that need to be kept should be moved to your home or bulk directories,
since files on the scratch file system get purged after 30 days. Lustre is faster than the home and bulk
file systems in part because it stripes files across multiple disks without storing them redundantly, so
if a hard disk fails, data will be lost. When we get scratch set up to use Lustre we will post the
difference in file access rates.

mkdir /scratch/$USER
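
A typical run might then follow a pattern like this sketch (the file and program names are placeholders):

# Stage input files into your scratch directory before the run
cp big_input.dat /scratch/$USER/
# Run against the scratch copy
./my_program /scratch/$USER/big_input.dat /scratch/$USER/results.out
# Move anything worth keeping back to home or bulk; scratch is purged after 30 days
mv /scratch/$USER/results.out $HOME/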

Local disk

If you are running on a single node, it may also be faster to access your files from the local disk
on that node. This can be done conveniently using the environment variable $TMPDIR, which points to
a subdirectory of /tmp set up for each job. You may need to copy files to the local disk at the start
of your script, or point your application's output directory at the local disk. You will then need to
copy any files you want to keep off the local disk before the job finishes, since Slurm removes all
files in your job's directory on /tmp when the job completes or aborts. When we get the scratch file
system working with Lustre, it may end up being faster than accessing local disk, so we will post the
access rates for each. Most nodes have around 600 GB of file space available on the local disk.

# Copy input files to the tmp directory if needed
cp $input_files $TMPDIR
# Make an 'out' directory to pass to the app if needed
mkdir $TMPDIR/out
# Example of running an app and passing the tmp directory in/out
app -input_directory $TMPDIR -output_directory $TMPDIR/out
# Copy the 'out' directory back to the current working directory after the run
cp -rp $TMPDIR/out .

RAM disk

If you need ultrafast access to files, you can use a RAM disk which is a file system set up in the
memory of the compute node you are running on. The RAM disk is limited to the requested memory on that node, so you should account for this usage when you request
memory for your job. Below is an example of how to use the RAM disk.

# Copy input files over if necessary
cp $any_input_files /dev/shm/
# Run the application, possibly giving it the path to the RAM disk to use for output files
app -output_directory /dev/shm/
# Copy files from the RAM disk to the current working directory, then clean up your files
cp /dev/shm/* .
rm -rf /dev/shm/*

When you leave KSU

If you are done with your account and leaving KSU, please clean up your directory, move any files
to your supervisor's account that need to be kept after you leave, and notify us so that we can disable your
account. The easiest way to move your files to your supervisor's account is for them to set up
a subdirectory for you with the appropriate write permissions. The example below shows moving
just a user's 'data' subdirectory to their supervisor. The 'nohup' command is used so that the move will
continue even if the window you are doing the move from gets disconnected.
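
The following is a sketch, assuming your supervisor has created a directory such as /homes/supervisor/your_username and given you write permission to it:

# Move just the 'data' subdirectory into the supervisor's account; nohup lets the
# move continue even if your session disconnects
nohup mv ~/data /homes/supervisor/your_username/ &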

Array Jobs

One of Slurm's useful options is the ability to run "Array Jobs".

It can be used with the following option to sbatch.

--array=n[-m[:s]]
Submits a so-called Array Job, i.e. an array of identical tasks differentiated only by an index number and treated by Slurm
almost like a series of jobs. The option argument to --array specifies the number of array job tasks and the index numbers which will be
associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SLURM_ARRAY_TASK_ID. The option
arguments n and m will be available through the environment variables SLURM_ARRAY_TASK_MIN and SLURM_ARRAY_TASK_MAX.
The task id range specified in the option argument may be a single number, a simple range of the form n-m, or a range with a step size.
For example, the task id range 2-10:2 would result in the task ids 2, 4, 6, 8, and 10, for a total of 5 identical tasks, each
with the environment variable SLURM_ARRAY_TASK_ID containing one of the 5 index numbers.
Array jobs are commonly used to execute the same type of operation on varying input data sets correlated with the task index number. The
number of tasks in an array job is unlimited.
STDOUT and STDERR of array job tasks follow a slightly different naming convention (which can be controlled in the same way as mentioned above):
slurm-%A_%a.out

%A is the SLURM_ARRAY_JOB_ID, and %a is the SLURM_ARRAY_TASK_ID

Examples

Change the Size of the Run

Array Jobs have a variety of uses; one of the easiest to comprehend is the following:

I have an application, app1, which I need to run the exact same way on the same data set, with only the size of the run changing.
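
A naive approach is to submit one job per line of the data set. The sketch below assumes a hypothetical wrapper script, run_app1.sh, that runs app1 on the single line it is given:

#!/bin/bash
# Submit 5000 separate jobs, one per line of dataset.txt
for n in `seq 1 5000`; do
    sbatch run_app1.sh "`sed -n ${n}p dataset.txt`"
done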

Not only is this needlessly complex, it is also slow, as sbatch has to verify each job as it is submitted. This can be done easily with array jobs, as long as you know the number of lines in the dataset. This number can be obtained with 'wc -l dataset.txt'; in this case, let's call it 5000.
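
A single array-job submit script can do the same thing. This is a sketch; adjust app1's arguments to match your program:

#!/bin/bash
#SBATCH --array=1-5000
# Each array task pulls out its own line of dataset.txt and hands it to app1
app1 `sed -n ${SLURM_ARRAY_TASK_ID}p dataset.txt`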

This uses a subshell via backticks (`), and has the sed command print out only line number $SLURM_ARRAY_TASK_ID of the file dataset.txt.

Not only is this a smaller script, it is also faster to submit because it is one job instead of 5000, so sbatch doesn't have to verify as many.

To give you an idea of the time saved: submitting 1 job takes 1-2 seconds, so submitting 5,000 jobs individually takes 5,000-10,000 seconds, or roughly 1.5-3 hours.

Running jobs interactively

Some jobs just don't behave like we think they should, or need to be run with somebody sitting at the keyboard and typing in response to the output the computers are generating. Beocat has a facility for this, called 'srun'. srun uses the exact same command-line arguments as sbatch, but you need to add the following arguments at the end: --pty bash. If no node is available with your resource requirements, srun will wait, telling you that your job is queued and waiting for resources.
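
For example, to get an interactive shell with 4 cores and 4GB of memory for an hour, you might run something like:

srun --cpus-per-task=4 --mem-per-cpu=1G --time=1:00:00 --pty bash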

Note that, like sbatch, your interactive job will time out after your allotted time has passed.

Altering Job Requests

We generally do not support modifying job parameters once the job has been submitted. It can be done, but there are numerous catches, and all of the variations can be a bit problematic; it is normally easier to simply delete the job and resubmit it with the right parameters. If your job doesn't start a reasonable amount of time after modifying such parameters, delete the job and resubmit it.

As it is unsupported, this is an exercise left to the reader. A starting point is 'man scontrol'.

Killable jobs

There are a growing number of machines within Beocat that are owned by a particular person or group. Normally jobs from users that aren't in the group designated by the owner of these machines cannot use them. This is because we have guaranteed that the nodes will be accessible and available to the owner at any given time. We will allow others to use these nodes if they designate their job as "killable." If your job is designated as killable, your job will be able to use these nodes, but can (and will) be killed off at any point in time to make way for the designated owner's jobs. Jobs that are marked killable will be re-queued and may restart on another node.

The way you would designate your job as killable is to add -p killable.q to the sbatch or srun arguments. This could be either on the command-line or in your script file.

Note: This is a submit-time only request; it cannot be added by a normal user after the job has been submitted. If you would like jobs modified to be killable after the jobs have been submitted (and it is too much work to scancel the jobs and re-submit), send an e-mail to the administrators detailing the job ids and what you would like done.

Scheduling Priority

The scheduler uses a complex formula to determine the order that jobs get scheduled in. Jobs in general get run in the order that they are submitted to the queue with the following exceptions. Jobs for users in a group that owns nodes will immediately get scheduled on those nodes even if that means bumping existing jobs off. Users in groups that have contributed funds to Beocat may have higher scheduling priority. You can check the base scheduling priority of each group using qconf -sst. If you do not have a group your jobs are scheduled using BEODEFAULT. The higher the priority, the faster your job will be moved to the front of the queue. A fair scheduling algorithm adjusts this scheduling priority down as users in that group submit more jobs.

Since all users not in a group having higher priority get put into BEODEFAULT, the priority is always very low and each job gets scheduled in the order it was submitted. Groups with a higher priority may jump ahead of the BEODEFAULT jobs, but if these groups are submitting lots of jobs their priority will become low as well. Groups with the highest priority that are submitting the fewest jobs may see those jobs moved to the front of the queue quickly.

When processing cores become available, the scheduler looks at the head of the queue to find jobs that will fit within the resources available. Shorter jobs of 12 hours or less get marked as killable and will be run on nodes owned by other groups. These jobs will jump past longer jobs when resources become available on owned nodes. Many jobs in the queue may require more memory than is available on some nodes, so smaller memory jobs will be scheduled ahead of larger memory jobs on hosts with more limited memory. kstat -q will show you the order in the queue and allow you to see jobs marked as "killable" and those that require large memory.

Job Accounting

Some people may find it useful to know what their job did during its run. The sacct tool will read Slurm's accounting database and give you summarized or detailed views on jobs that have run within Beocat.

sacct

This data can usually be used to diagnose two very common job failures.

Job debugging

It is simplest if you know the job number of the job you are trying to get information on.

# if you know the jobid, put it here:
sacct -j 1122334455 -l
# if you don't know the job id, you can look at your jobs started since some day:
sacct -S 2017-01-01

My job didn't do anything when it ran!

(abridged sacct -l output; only the most relevant columns are shown)

JobID      JobName          Partition  MaxRSS  AllocCPUS  Elapsed   State   ExitCode  ReqMem  AllocTRES
218        slurm_simple.sh  batch.q            12         00:00:00  FAILED  2:0       1Gn     cpu=12,mem=1G,node=1
218.batch  batch                       1576K   12         00:00:00  FAILED  2:0       1Gn     cpu=12,mem=1G,node=1
218.0      qqqqstat                    1420K   12         00:00:00  FAILED  2:0       1Gn     cpu=12,mem=1G,node=1

If you look at the columns showing Elapsed and State, you can see that they show 00:00:00 and FAILED respectively. This means that the job started and then promptly ended. This points to something being wrong with your submission script. Perhaps there is a typo somewhere in it.

My job ran but didn't finish!

(abridged sacct -l output; only the most relevant columns are shown)

JobID      JobName          Partition  MaxRSS  AllocCPUS  Elapsed   State      ExitCode  ReqMem  AllocTRES
220        slurm_simple.sh  batch.q            1          00:01:27  TIMEOUT    0:0       1Gn     cpu=1,mem=1G,node=1
220.batch  batch                       7060K   1          00:01:28  CANCELLED  0:15      1Gn     cpu=1,mem=1G,node=1
220.0      sleep                       1000K   1          00:01:27  CANCELLED  0:15      1Gn     cpu=1,mem=1G,node=1

If you look at the column showing State, we can see some pointers to the issue. The job ran out of time (TIMEOUT) and then was killed (CANCELLED).

(abridged sacct -l output; only the most relevant columns are shown)

JobID      JobName          Partition  MaxRSS  AllocCPUS  Elapsed   State           ExitCode  ReqMem  AllocTRES
221        slurm_simple.sh  batch.q            1          00:00:00  CANCELLED by 0  0:0       1Mn     cpu=1,mem=1M,node=1
221.batch  batch                       1144K   1          00:00:01  CANCELLED       0:15      1Mn     cpu=1,mem=1M,node=1

If you look at the column showing State, we see the job was "CANCELLED by 0". Looking at the AllocTRES column to see our allocated resources, we see that 1MB of memory was granted. Combine that with the MaxRSS column and we see that the job tried to use more memory than was granted, so the job was "CANCELLED".