A suggestion for pbsdsh improvement:
pbsdsh allows processes to be launched in one of three ways:
(a) on specified hosts in the job
(b) once for every allocated processor on every allocated node in the job
(c) once on each unique node in the job
I'd like to suggest an improvement to case (c). Some job programs
manage the number of processors to use on a given node themselves (e.g.
the Hadoop task tracker). However, if you allocate only processors, not
whole nodes, this can end up with too many processes running on a given
node, because the program has to make assumptions about the number of
processors allocated per sister. For example, my job asked for 12
processors. Nodes have 4 procs each, but one node already had a
single-processor job running. How should the spawned process know this?
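
To make the mismatch concrete, here is a rough sketch of the arithmetic. It assumes a host list in the style of $PBS_NODEFILE, where a host appears once per processor allocated on it. The host names and the 4/4/3/1 split are made up for illustration, and allocated_per_host is a hypothetical helper, not part of pbsdsh.

```python
from collections import Counter


def allocated_per_host(nodefile_lines):
    """Count how many processors are allocated on each host.

    Assumes one line per allocated processor, as in a $PBS_NODEFILE-style list.
    """
    return Counter(line.strip() for line in nodefile_lines if line.strip())


# Hypothetical layout for a 12-processor job on 4-processor nodes:
# node3 already runs a single-processor job, so only 3 of its processors
# are free, and the 12th processor lands on node4.
nodefile = ["node1"] * 4 + ["node2"] * 4 + ["node3"] * 3 + ["node4"] * 1

allocated = allocated_per_host(nodefile)

# What a per-node launcher would assume without extra information.
procs_per_node = 4
assumed_total = procs_per_node * len(allocated)
actual_total = sum(allocated.values())

print("assumed:", assumed_total, "allocated:", actual_total)
```

A launcher that assumes 4 processors on each of the 4 unique nodes would start 16 workers for a job that was only given 12.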
Instead, I'd like to propose that pbsdsh -u set an environment variable
in the spawned processes giving the number of processors allocated to
the job on that node. This should be fairly easy, as tm_spawn accepts an
argument to alter the target environment of the spawned process.
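
As a rough sketch of the idea (not a patch to pbsdsh, which is written in C), the per-node environment entries could be assembled like this before each spawn. The variable name PBS_NODE_NPROCS and the helper spawn_env_per_host are hypothetical, and the host list stands in for whatever pbsdsh already knows about the job's nodes. The sketch only builds the NAME=value strings and does not call tm_spawn itself.

```python
from collections import Counter

# Hypothetical variable name; not an existing TORQUE environment variable.
ENV_NAME = "PBS_NODE_NPROCS"


def spawn_env_per_host(hosts, base_env=()):
    """Build one list of NAME=value strings per unique host.

    `hosts` is assumed to contain a host once per processor allocated on it.
    `base_env` holds any extra NAME=value entries to pass along unchanged.
    """
    counts = Counter(h.strip() for h in hosts if h.strip())
    return {
        host: list(base_env) + [f"{ENV_NAME}={n}"]
        for host, n in counts.items()
    }


# Made-up allocation: 4, 4, 3 and 1 processors on four nodes.
hosts = ["node1"] * 4 + ["node2"] * 4 + ["node3"] * 3 + ["node4"]
envs = spawn_env_per_host(hosts, base_env=["FOO=bar"])

for host, env in envs.items():
    # In pbsdsh this is where the environment would be handed to the spawn call.
    print(host, env)
```

The spawned program (e.g. a task tracker wrapper) could then read that one variable instead of guessing from the node's hardware.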
This is pulled from:
http://www.clusterresources.com/pipermail/torquedev/2009-June/001583.html