Job Sizing

CX2 is intended for large multinode parallel jobs, in particular those that are parallelised with MPI and perform significant communication between processes.

As on CX1, all jobs should be submitted with a #PBS resource request of the form:

#PBS -lselect=N:ncpus=X:mem=Ygb

#PBS -lwalltime=HH:MM:SS

where N is the number of nodes, X is the number of cores per node, and Y is the amount of memory per node in GB.

Additionally, the optional parameters mpiprocs=Z and ompthreads=W may be added. These specify the number of MPI ranks per node and the number of OpenMP threads per MPI rank, respectively, and are only required for hybrid MPI/threaded jobs. The product Z x W should be less than or equal to X (ncpus).
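As an illustration of the Z x W rule, the sketch below builds a select directive and refuses a geometry whose ranks times threads exceed ncpus. The function name pbs_select is hypothetical, and the sketch assumes that mpiprocs and ompthreads are appended to the select chunk in the usual PBS form; treat it as a convenience check, not as part of the CX2 tooling.

```shell
# Hypothetical helper (not a CX2 command): print a select directive,
# checking that mpiprocs x ompthreads does not exceed ncpus.
# Usage: pbs_select N X Y [Z W]
pbs_select() {
    n=$1
    x=$2
    y=$3
    z=${4:-}
    w=${5:-}
    if [ -n "$z" ] && [ -n "$w" ]; then
        if [ $((z * w)) -gt "$x" ]; then
            echo "error: mpiprocs x ompthreads ($((z * w))) exceeds ncpus ($x)" >&2
            return 1
        fi
        # Assumption: the optional parameters go in the same select chunk.
        echo "#PBS -lselect=${n}:ncpus=${x}:mem=${y}gb:mpiprocs=${z}:ompthreads=${w}"
    else
        echo "#PBS -lselect=${n}:ncpus=${x}:mem=${y}gb"
    fi
}

# 12 ranks per node with 2 threads each fills 24 cores exactly.
pbs_select 4 24 120 12 2
# prints: #PBS -lselect=4:ncpus=24:mem=120gb:mpiprocs=12:ompthreads=2
```

A pure MPI job can omit the last two arguments, in which case only N, X and Y appear in the directive.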

The following job geometries are supported:

Job class    Nodes (N)   ncpus/node   Max mem/node   Max walltime   Max concurrent running jobs/user
short        1-18        24 or 48     120GB          2hr            2
general      2-18        16 or 32     62GB           72hr           3
large        18-72       24 or 48     120GB          48hr           2
capability   72-270      28 or 56     120GB          24hr           1
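The limits in the table above can be checked before submission with a small script. The sketch below is only that: check_request is a hypothetical name, the limits are copied from the table, walltime is taken in whole hours, and PBS's actual routing rules may differ. In particular it follows the table's ncpus column strictly, so it does not cover the ncpus=28 with cpumodel=28 case described later for the large class.

```shell
# Hypothetical pre-submission check against the job class table.
# Usage: check_request CLASS NODES NCPUS MEM_GB WALLTIME_HOURS
check_request() {
    class=$1
    nodes=$2
    ncpus=$3
    mem=$4
    hours=$5
    # Limits transcribed from the table; update them if the table changes.
    case $class in
        short)      min=1;  max=18;  cpus="24 48"; maxmem=120; maxhr=2  ;;
        general)    min=2;  max=18;  cpus="16 32"; maxmem=62;  maxhr=72 ;;
        large)      min=18; max=72;  cpus="24 48"; maxmem=120; maxhr=48 ;;
        capability) min=72; max=270; cpus="28 56"; maxmem=120; maxhr=24 ;;
        *) echo "unknown job class: $class" >&2; return 2 ;;
    esac
    if [ "$nodes" -lt "$min" ] || [ "$nodes" -gt "$max" ]; then
        echo "nodes must be in $min-$max for $class" >&2
        return 1
    fi
    case " $cpus " in
        *" $ncpus "*) ;;
        *) echo "ncpus must be one of: $cpus" >&2; return 1 ;;
    esac
    if [ "$mem" -gt "$maxmem" ]; then
        echo "mem must not exceed ${maxmem}GB per node" >&2
        return 1
    fi
    if [ "$hours" -gt "$maxhr" ]; then
        echo "walltime must not exceed ${maxhr}hr" >&2
        return 1
    fi
    echo "ok: fits $class"
}

check_request general 4 16 62 24   # prints: ok: fits general
```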

As on CX1, you should not submit to a specific queue directly. PBS will automatically route the job based on the resource request.

The short class is for testing jobs.

The general class is for small parallel jobs. If you are coming to CX2 from CX1, this is where you should initially target your jobs.

The large class is for big parallel jobs. You should only run jobs of this size if you understand the parallel efficiency of your code.

The capability class is for the very largest parallel jobs. Only run here if you are sure you need it, generally as a precursor to using Archer.

Choosing the number of CPUs/node for large jobs

The large queue is backed by a mix of 24-core and 28-core nodes. We strongly recommend that you always specify ncpus=24. This gives PBS the most flexibility in scheduling the job, as it can be allocated to nodes of either core count. Although the additional 4 cores on a 28-core node will be left idle, job turnaround time is likely to improve significantly because of the reduced queue time.

If you wish to always run on nodes of the same core count, use the cpumodel resource to indicate the requirement, e.g. ncpus=24:cpumodel=24 or ncpus=28:cpumodel=28. Note that being more prescriptive about placement is likely to increase queue time.

Each node has 16, 24 or 28 physical cores, depending on the CPU model. Each physical core presents as two logical cores. Using all logical cores may improve the performance of some applications, particularly hybrid MPI/threaded ones. To test this, request ncpus=32, 48 or 56 (twice the physical core count of the node type) and tailor mpiprocs and ompthreads as described above.
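To make the logical-core arithmetic concrete, the sketch below derives one possible set of select options from a physical core count and a chosen number of OpenMP threads per rank. The name logical_geometry is hypothetical, and the choice to fill every logical core with ranks times threads is an assumption for illustration; whether it helps a given application still needs to be tested.

```shell
# Hypothetical helper: print select-chunk options that use every logical core.
# Usage: logical_geometry PHYSICAL_CORES OMP_THREADS
logical_geometry() {
    physical=$1
    threads=$2
    logical=$((physical * 2))   # each physical core presents as two logical cores
    if [ $((logical % threads)) -ne 0 ]; then
        echo "error: $threads threads do not divide $logical logical cores evenly" >&2
        return 1
    fi
    echo "ncpus=${logical}:mpiprocs=$((logical / threads)):ompthreads=${threads}"
}

logical_geometry 24 2   # prints: ncpus=48:mpiprocs=24:ompthreads=2
```

The printed options would be placed after the node count in the select request, e.g. following -lselect=N: on the #PBS line.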

Checking the availability of resources

Run the command availability for an instantaneous view of the resources available for immediate execution in each job class. Note that a job sized to fit those resources may still remain queued if your limit on concurrently running jobs per user has been reached.

Estimating job start times

The command qstat -w -T can be used to show projected start times for jobs. These are estimates, not guarantees. Use them for guidance but do not rely on them. They may in some cases be quite inaccurate.