Hi All,
We recently updated torque from 3.0.2 to 3.0.3-snap.201108261653 and have found that, at least in some cases, a job submitted to a routing queue with a hold (qsub -a, to run after a given time) still will not run once it is released and moved to an execution queue, because moab 6.0.2 sees a procct GRES on it. qstat -f shows a procct resource only while the job is held and in the routing queue.
Does anyone else with a recent torque version see this problem? You can test with:
echo sleep 300 | qsub -a `date -d 'now + 5 minutes' +'%Y%m%d%H%M'`
This should hold for 5 minutes, then run and sleep for 5 minutes.
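To see whether the procct resource is attached to a job at a given moment, something like the following rough sketch may help. The helper name has_procct is made up for illustration, and the field name Resource_List.procct is an assumption based on the usual qstat -f layout, so adjust the pattern if your output differs.

```shell
# Hypothetical helper: reads `qstat -f` output on stdin and succeeds
# if a procct resource line is present (field name is an assumption).
has_procct() {
    grep -q 'Resource_List\.procct'
}

# Assumed usage (needs a running torque server, so left commented out):
#   jobid=$(echo sleep 300 | qsub -a "$(date -d 'now + 5 minutes' +'%Y%m%d%H%M')")
#   qstat -f "$jobid" | has_procct && echo "procct present on $jobid"
```

Running the check once while the job is held and again after it has been released should show whether the resource disappears from the torque side.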
Gareth
For reference, I've worked around the issue by defining in moab a GLOBAL gres called procct with a large count. The same technique would probably work with maui.
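A minimal sketch of that workaround in moab.cfg, assuming the usual NODECFG[GLOBAL] syntax for floating generic resources. The count of 10000 is an arbitrary placeholder, not the value actually used; it only needs to be larger than any procct a job is likely to carry.

```
# moab.cfg: define a floating (GLOBAL) generic resource named procct
# so that jobs carrying a procct GRES can always be satisfied.
# 10000 is an illustrative count, not a recommended value.
NODECFG[GLOBAL] GRES=procct:10000
```

Moab would need to re-read its configuration after the change for this to take effect.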