On Mon, Feb 14, 2005 at 05:45:06PM +0100, Bas van der Vlies alleged:
> At our site we run one job per node and have a MAXPS setting of 600
> hours and a maximum walltime of 120 hours. Our nodes have 2 processors.
> When a user submits a job, e.g.:
> 1) qsub -I -lnodes=60:ppn=1 -lwalltime=10:00:00 ( will run )
> 2) qsub -I -lnodes=60:ppn=2 -lwalltime=10:00:00 ( will not run: MAXPS
> violation )
>
> Now, when job 1 runs, it allocates the whole node, and Maui sees that it
> occupies two tasks per node (e.g. 2 nodes with two CPUs each = 4 tasks). So
> the used time is calculated as 60 * 2 * 10 = 1200 hours, which is far more
> than allowed!
>
> The next example will run only one job instead of two:
> qsub -I -lnodes=30:ppn=1 -lwalltime=10:00:00 ( will run )
> qsub -I -lnodes=30:ppn=1 -lwalltime=10:00:00 ( will not run: MAXPS
> violation )
>
> I have a patch that checks whether NODEACCESSPOLICY SINGLEJOB is set. If
> so, it ignores the CPUs per node.
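As a rough illustration of the accounting in the quoted message, here is a minimal Python sketch. It assumes that, under NODEACCESSPOLICY SINGLEJOB, unpatched Maui charges every CPU on an allocated node (which reproduces the 60 * 2 * 10 = 1200 figure), and it models the patch as charging only the requested tasks (nodes * ppn). `charged_hours` and `jobs_started` are hypothetical helpers, not Maui source code.

```python
# Hypothetical model of the MAXPS accounting described in the quoted message.
# Assumptions (not Maui source code): under NODEACCESSPOLICY SINGLEJOB the
# unpatched scheduler charges every CPU on an allocated node, while the patch
# is modeled as charging only the tasks the job requested (nodes * ppn).
from typing import Sequence, Tuple

CPUS_PER_NODE = 2   # "Our nodes have 2 processors"
MAXPS_HOURS = 600   # per-user MAXPS limit from the message, in processor-hours


def charged_hours(nodes: int, ppn: int, walltime_h: float, patched: bool) -> float:
    """Processor-hours a single job counts against MAXPS."""
    if patched:
        procs = nodes * ppn  # assumed patch behaviour: ignore CPUs per node
    else:
        procs = nodes * max(ppn, CPUS_PER_NODE)  # whole node counted as dedicated
    return procs * walltime_h


def jobs_started(jobs: Sequence[Tuple[int, int, float]], patched: bool) -> int:
    """Count jobs, in submit order, that fit under the user's MAXPS limit."""
    used = 0.0
    started = 0
    for nodes, ppn, walltime_h in jobs:
        cost = charged_hours(nodes, ppn, walltime_h, patched)
        if used + cost <= MAXPS_HOURS:  # otherwise: MAXPS violation, job waits
            used += cost
            started += 1
    return started


if __name__ == "__main__":
    two_jobs = [(30, 1, 10.0), (30, 1, 10.0)]
    print(charged_hours(60, 1, 10.0, patched=False))  # 1200.0
    print(jobs_started(two_jobs, patched=False))       # 1
    print(jobs_started(two_jobs, patched=True))        # 2
```

Under these assumptions the two 30-node jobs cost 600 processor-hours each unpatched, so only one fits; with the patch they cost 300 each and both start.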
I understand what you are doing (and the patch looks fine to me), and I could
even see myself using it, but I'm not sure this is the right thing to do. If
nothing else, could this behaviour be a configuration option?

What we really need is a policy on "node seconds"; that is what you are
actually trying to control. It would be simple in the SINGLEJOB world, and
might only be valid there. I can also imagine charging fractional node
seconds to jobs on a shared node, but that would be complicated.
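A "node seconds" charge could be sketched as below. This is a hypothetical policy, not an existing Maui setting, and the shared-node rule (charging the fraction of the node's CPUs the job requested) is only one assumed way to assign fractional node seconds; `node_seconds` is an illustrative name.

```python
# Hypothetical "node seconds" charge, sketching the policy proposed above.
# Not an existing Maui setting; the shared-node rule is one assumed way to
# assign fractional node seconds.


def node_seconds(nodes: int, ppn: int, walltime_s: float,
                 cpus_per_node: int, single_job: bool) -> float:
    """Node-seconds a job would be charged."""
    if single_job:
        share = 1.0  # the job has each node to itself: charge whole nodes
    else:
        # Shared node: charge the fraction of the node's CPUs the job requested.
        share = min(ppn, cpus_per_node) / cpus_per_node
    return nodes * share * walltime_s


if __name__ == "__main__":
    ten_hours = 10 * 3600
    print(node_seconds(60, 1, ten_hours, cpus_per_node=2, single_job=True))   # 2160000.0
    print(node_seconds(30, 1, ten_hours, cpus_per_node=2, single_job=False))  # 540000.0
```

In the SINGLEJOB case ppn drops out entirely, so the 60:ppn=1 and 60:ppn=2 jobs from the quoted message would be charged the same amount, matching what the nodes actually give up.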

I've always worked around this with routing queues in PBS. First route to a
queue with small node-count and walltime limits; if that fails, route to a
queue with medium limits, and so on. If the job doesn't fit through any of
the queues, then you reject the job.
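The routing-queue workaround amounts to a first-fit search over an ordered list of queues. The Python sketch below models only that decision logic, in the spirit of a PBS routing queue trying its destinations in order; it is not PBS configuration. The queue names and limits are hypothetical, chosen so that no queue admits a job larger than 600 node-hours.

```python
# Hypothetical sketch of first-fit routing through a chain of queues, in the
# spirit of a PBS routing queue. Queue names and limits are illustrative only.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Queue:
    name: str
    max_nodes: int
    max_walltime_h: float


# Assumed limits, chosen so no queue admits more than 600 node-hours per job.
QUEUES = [
    Queue("small", max_nodes=8, max_walltime_h=72),
    Queue("medium", max_nodes=30, max_walltime_h=20),
    Queue("large", max_nodes=60, max_walltime_h=10),
]


def route(nodes: int, walltime_h: float, queues: Sequence[Queue]) -> Optional[str]:
    """Return the first queue whose limits admit the job, or None to reject it."""
    for q in queues:
        if nodes <= q.max_nodes and walltime_h <= q.max_walltime_h:
            return q.name
    return None  # fits through none of the queues: reject


if __name__ == "__main__":
    print(route(4, 48, QUEUES))   # small
    print(route(30, 10, QUEUES))  # medium
    print(route(60, 10, QUEUES))  # large
    print(route(60, 20, QUEUES))  # None
```

Because each queue bounds both node count and walltime, the product (node-hours) is capped per job regardless of ppn, which approximates a node-seconds limit without scheduler changes.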
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California