SGE Array Jobs

Rationale

Consider using array jobs whenever you might otherwise submit a
large number of jobs — likely auto-generated — to do the same
processing on a lot of individual input datasets or values. Examples
include image processing or rendering with one job for each frame in a
sequence, and parameter sweeps, with each job running the same
calculation with a different parameter set from a pre-defined
collection.

Array jobs effectively constitute a parallel loop over the input data
which can be manipulated as a unit by qsub, qdel, qhold, qalter, etc.
They have a single entry in qstat output when waiting (but an entry
for each running member of the array). This is easier on you than
generating and manipulating a large number of individual jobs, and
easier on the system, which doesn’t have to manage a very large number
of jobs.

Each instance of the job in the effective loop is called a task,
which has an index exported to the task’s environment for use in the
job script. The tasks may run in parallel. Each task initially has
the same SGE parameters, though qalter may be used to change some of
them for waiting tasks.

See qsub(1) for
details of array job options -t, -tc, -hold_jid, and
-hold_jid_ad in addition to the guidance below.

Basic Use

The job script for an array job is usually essentially a template in
which substitutions can be made on the basis of the task index
environment variable $SGE_TASK_ID.

Consider processing the contents of a collection of 1000 directories
with regular names case1, case2, … case1000. Submitting the job as

qsub -t 1-1000 ... array.sh

asks for array.sh to be run as 1000 separate tasks; the script might
contain something like:
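
A minimal sketch of such a script, assuming each directory contains a
file input, and using program as a stand-in for the real command:

```shell
#!/bin/sh
# Each task works in the directory matching its index,
# which SGE supplies in $SGE_TASK_ID.
cd case$SGE_TASK_ID || exit 1
program <input >output
```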

This is equivalent to 1000 jobs, each executing commands from an
element of the sequence

cd case1; program <input
...
cd case1000; program <input

so processing the data in case1, … case1000 and putting the output
in a file output in each directory. See qsub(1) for details of
substitutions in the values specified with -e and -o; note the lack
of an SGE_ prefix on $TASK_ID in those contexts.
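
For instance, each task’s own SGE output could be directed into its
case directory via the $TASK_ID pseudo-variable (quoted so the shell
doesn’t expand it; the file names here are only illustrative):

```shell
qsub -t 1-1000 -o 'case$TASK_ID/sge-output' -e 'case$TASK_ID/sge-error' array.sh
```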

Tasks are started in order of the array index, but you can’t assume
more than that about when they execute; a rescheduled task could
effectively start after one with a higher index.

Note that the sge_conf(5) parameters max_aj_tasks and
max_aj_instances control, for each job, the maximum total number of
tasks and the maximum number of concurrently running tasks.

Refinements and Clichés

Indexing Arithmetic

The task index is always 1-based, but you can do arithmetic with it
if your data are essentially 0-based. In a POSIX-conformant shell
like bash(1), the expression $(($SGE_TASK_ID-1)) converts the index
to a 0-based one.
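
For instance, a sketch mapping the index onto hypothetical 0-based
file names (the default value is only so it runs outside SGE):

```shell
# SGE sets SGE_TASK_ID in the task's environment;
# the default here only lets the sketch run standalone.
: "${SGE_TASK_ID:=1}"
i=$(($SGE_TASK_ID - 1))                # 0-based index
file=$(printf 'frame%04d.dat' "$i")    # hypothetical file-name scheme
```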

The array stride need not be 1. This is useful, for instance, to
unroll the loop when the individual tasks would be short compared
with the overheads of running them:
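
A sketch, assuming submission with a range of 1–1000 and a stride of
10; process_case and the demo variable values are stand-ins:

```shell
#!/bin/sh
# Sketch of a task script submitted with, say,  qsub -t 1-1000:10
# so that each task handles ten consecutive indices.  SGE sets the
# SGE_TASK_* variables; the demo values below only let the sketch
# run standalone, and process_case stands for the real work.
: "${SGE_TASK_ID:=991}" "${SGE_TASK_STEPSIZE:=10}" "${SGE_TASK_LAST:=1000}"
process_case () { echo "processing case$1"; }

last=$(($SGE_TASK_ID + $SGE_TASK_STEPSIZE - 1))
if [ "$last" -gt "$SGE_TASK_LAST" ]; then
  last=$SGE_TASK_LAST            # don't run past the end of the range
fi
i=$SGE_TASK_ID
while [ "$i" -le "$last" ]; do
  process_case "$i"
  i=$(($i + 1))
done
```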

Variables SGE_TASK_FIRST, SGE_TASK_LAST, and SGE_TASK_STEPSIZE
provide the task range and stride to the script.

Non-shell script Jobs

It isn’t necessary to use a shell script — the index is accessible in
the environment of a binary job (though that would probably need to
be an SGE-aware program), and a job script can be in a non-shell
language such as Python, accessing the environment in the
appropriate way. Here is a trivial example of constructing a file
name in Python in a two-task array job, assuming the cluster
configuration sets shell_start_mode to unix_behavior:
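
A minimal sketch; the input file-name pattern is a made-up
illustration, and a demo default lets it run outside SGE:

```python
#!/usr/bin/env python
# Sketch of a two-task array job script, submitted with e.g.
#   qsub -t 1-2 task.py
# and relying on shell_start_mode unix_behavior so the #! line is used.
import os

# SGE sets SGE_TASK_ID in the task's environment; the default here
# only lets the sketch run outside SGE.
task = int(os.environ.get("SGE_TASK_ID", "1"))
filename = "input%d" % task      # hypothetical per-task file name
print(filename)
```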

The -C argument of qsub might help if # isn’t a comment in
the script language.

Selecting from Lists in Files

You needn’t be restricted to simple template-like jobs. You can put
complex arguments or commands in a file and pull out a different line
for each task, e.g. with the commands to be executed listed one per
line in a file. awk is convenient for this, e.g. selecting a
complete command line:
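
A sketch, with a hypothetical file commands and a demo setup so it
runs outside SGE:

```shell
# The file 'commands' (name illustrative) holds one complete command
# line per line; each task runs the line matching its index.  The
# setup and default below only let the sketch run standalone.
printf '%s\n' 'echo first' 'echo second' 'echo third' >commands
: "${SGE_TASK_ID:=2}"
cmd=$(awk -v n="$SGE_TASK_ID" 'NR == n' commands)
eval "$cmd"                      # run the selected command line
```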

Task Dependencies

Tasks of one array job can be made to depend on the corresponding
tasks of another with -hold_jid_ad. With

qsub -t 1-50 step1
qsub -t 1-50 -hold_jid_ad step1 step2
qsub -t 1-50:2 -hold_jid_ad step2 step3

array task n of step2 will only run after task n of step1 has
finished, and task n of step3 will only run after tasks n and
n+1 of step2 have finished. Similarly, with

qsub -t 1-50:2 step1
qsub -t 1-50 -hold_jid_ad step1 step2

tasks n and n+1 of step2 depend on task n of step1, where n
is odd. NB ‘finished’ doesn’t mean ‘finished successfully’, so
subsequent tasks may need to check the results of those they depend
on.

In contrast, with

qsub -t 1-50 step1
qsub -t 1-50 -hold_jid step1 step2

tasks of step2 will only run after all those of step1 have finished.

Task Concurrency

The qsub -tc option can be used to restrict the number of tasks of
the job that run concurrently, usually to be socially conscious in
not dominating the cluster. You might use -tc 1 to ensure only one
task runs at a time, e.g. for a series of tests that update some
state in the directory; that is harmless in itself, but would cause
problems if more than one task did it at once. This might also be
useful if each task depended on results from the previous index,
though care would be needed in case a task failed.
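
For example, to let at most ten tasks of a 100-task job run at once
(the script name is illustrative):

```shell
qsub -t 1-100 -tc 10 array.sh
```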