Introduction

When you first log in, you will be directed to one of a small number of login nodes. These allow regular command-line access to the system, which is necessary for setting up runs, compiling code and some analysis work. Login nodes are shared among all who are logged in and can therefore very quickly become overloaded.

The compute power behind the system is accessible through the scheduler, a batch submission system. When a job executes through the batch system, processors on the back-end are made available exclusively for the purposes of running the job.

The installed batch queue system is Son of Grid Engine, with locally developed patches.

To interact with the batch system the user must request resources that are sufficient for their needs. At a minimum these are:

how long the job needs to run for

on how many processors (assumed to be 1 unless otherwise specified)

With this information, the scheduler is able to dispatch the jobs at some point in the future when the resources become available. A fair-share policy is in operation to guide the scheduler towards allocating resources fairly between different faculties.

This fair-share policy takes into account both an individual user’s past usage and the usage of a faculty as a whole. Essentially, this means that a user with recent (within the last 7 days) heavy usage will have their jobs reduced in priority to allow other users’ jobs to run.

Faculty shares are allocated on the basis of funding; faculties do not have equal shares of system capacity.

Resource Reservation and Backfill

By default, all jobs are eligible for resource reservation, in that the scheduler will book future start times for the highest priority jobs. The qsched -a command can be used to generate a list of the anticipated start times of these jobs. At the moment, only the top 128 jobs are considered for resource reservation. The system will backfill jobs if they will start and finish before the highest priority jobs are scheduled to start. Therefore, indicating a realistic runtime for a job (rather than the queue maximum) will make short jobs eligible to be backfilled, potentially shortening their wait time.

There is also a facility to book an amount of HPC resource for some time in the future, through advance reservation. Jobs eligible to run in that reservation can then be submitted to run within it. Advance reservation is not enabled for users by default, however these reservations can be enabled upon request provided there is a valid case for their use and the fairness policies allow it.

Queue Configuration

Currently the facility is configured with a single general access queue, allowing submission to all available compute resources. Thus, there is no need to specify a queue name in job submissions.

Time Limits

Jobs requesting a time up to the maximum runtime of the queue are eligible to be run. At the moment the maximum runtime is 48 hours.

Should a job run beyond the length of time requested, it will be killed by the queuing system. To change the time requested by a batch job, change the time specified with the -l h_rt flag, e.g.:

$ qsub -l h_rt=6:00:00 script.sh

will request six hours of runtime.

Memory Usage

In order that programs do not compete for the available memory on a machine, memory is treated as a consumable resource. This helps ensure that if one job is consuming 100GB of memory on a node with a total of 128GB of memory, the maximum combined size of all other jobs allowed to execute on that node is 28GB.

By default, a 1GB per process (or 1GB per slot) limit is defined for all batch jobs. To override this behaviour, use the -l h_vmem switch to qsub. E.g. to run a 1-process code using 6GB of memory for 6 hours:

$ qsub -l h_vmem=6G -l h_rt=6:00:00 script.sh

As memory is specified per slot:

$ qsub -l h_vmem=2G -l h_rt=6:00:00 -pe smp 4 script.sh

will request a total of 8GB of memory, shared between 4 processes.

Jobs will be run on nodes provided that the total memory requested per node does not exceed the physical memory of that node. Please note that if a job requests more memory than is physically available, the job will never run, though it will still show up in the queue. If an executing program exceeds the memory it requested, it will be automatically terminated by the queuing system.

Job Submission

The general form of a job submission with the qsub command is as follows:

$ qsub [options] script_file_name [--script-args]

where script_file_name is a file containing commands to be executed by the batch request.

For commonly used options and more details about qsub, please look at our Qsub page.
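Commonly used options can also be embedded in the script itself as Grid Engine directives on lines beginning with #$, so they need not be repeated on every submission. A minimal sketch (the resource values and the echo body are illustrative only):

```shell
#!/bin/bash
# Lines beginning with "#$" are read by qsub as submission options,
# so this script can be submitted with a bare "qsub script.sh".
#$ -cwd                # run the job from the directory it was submitted in
#$ -l h_rt=6:00:00     # request six hours of runtime
#$ -l h_vmem=2G        # request 2GB of memory per slot

# The job's real work goes here; a placeholder command for illustration:
echo "Job started on $(hostname)"
```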

Submitting Shared-Memory Parallel Jobs

Shared memory parallel jobs are jobs that run multiple threads or processes on a single multi-core machine. For instance, OpenMP programs are shared memory parallel jobs.

There is a shared memory parallel environment (pe) called smp that is set up to enable the submission of this type of job. The option needed to submit this type of job is:

-pe smp <number of slots>

For example:

$ qsub -l h_rt=6:00:00 -pe smp 4 script.sh

will request 4 processes on a shared memory node, running for 6 hours.
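Inside a running job, Grid Engine sets the NSLOTS environment variable to the number of slots granted, which is a convenient way to size the OpenMP thread count. A sketch of an OpenMP job script (the resource values are illustrative; the echo stands in for launching a real OpenMP program):

```shell
#!/bin/bash
#$ -cwd
#$ -l h_rt=6:00:00
#$ -pe smp 4           # request 4 slots on a single node

# Grid Engine exports NSLOTS to match the "-pe smp" request;
# fall back to 1 so the script also behaves outside the scheduler.
export OMP_NUM_THREADS=${NSLOTS:-1}
echo "Using $OMP_NUM_THREADS OpenMP threads"
```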

Distributed Parallel Jobs with the Node Syntax

This type of parallel job runs multiple processes over multiple processors, either on the same machine or more commonly over multiple machines.

A significant change made to the batch system on ARC2 is that, in addition to the standard Grid Engine submission syntax, an alternative “nodes” syntax has also been implemented. This is designed to give jobs dedicated access to entire nodes. This should provide more predictable job performance, for instance through placement and dedicated use of Infiniband cards, as well as providing a more flexible specification of processes or threads for mixed-mode programming.