> Lennart,
> Thank you for the input. I attempted the following with no luck.
>
> 1) as the queues were set up in maui by the vendor, I set the
> resources_max.nodect for the server by the command
> set server resources_max.nodect = 140. restarted PBS on the master.
> repeat the test and get the same output from checkjob & diagnose
> (diagnose output included below).
>
> 2) dug into bugzilla report on bug 99 as you suggested. i'm not quite
> sure that this is the exact problem that i'm experiencing as the
> diagnose reports just that its been put on batch hold rather than
> violating a maxproc limit.
>
> I'm including the maui.cfg as well if this can provide some insight to
> anyone.
> Ok, I just made a big newbie mistake, pardon my repost to correct it.
>
> I finally got qmgr to list me the settings for the queues. The setting
> that Lennart suggested was not set. So I added it and restarted the
> server. It still reports a policy violation of 128 > 70.
>
> This is the current setting for the queue low:
> Queue low
> queue_type = Execution
> Priority = 10
> total_jobs = 2
> state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:0 Exiting:0
> max_running = 10
> resources_max.ncpus = 70
> resources_max.nodect = 140
> resources_max.walltime = 96:00:00
> mtime = Wed Jul 18 11:33:12 2007
> resources_assigned.ncpus = 0
> resources_assigned.nodect = 0
> enabled = True
> started = True
>
> This is the information from PBS about one of the jobs waiting because
> of the policy violation:
> Resource_List.ncpus = 1
> Resource_List.nodect = 32
> Resource_List.nodes = 32:ppn=4
>
>What is the difference between .ncpus and .nodect? And which one does
>the maui scheduler look at?
Your Torque configuration can be summarized by an excerpt from the
server_priv/nodes file, for example,
n1 np=4
n2 np=4
.
.
n35 np=4
together with the output from the qmgr command "print server" (please note
that the "list server" command does not give the full configuration).
From your "list server" output, it looks like you have set
"resources_max.ncpus = 70". I propose that you remove this setting.
I do not use this resources_max.ncpus restriction myself, but it would not
surprise me if it gets Maui to limit your job size to 70 processors/cores.
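To make the suspected arithmetic explicit: your job asks for
nodes=32:ppn=4, that is 32 x 4 = 128 processors, which would match the
"128 > 70" in the reported violation if Maui compares the request against
resources_max.ncpus = 70. Below is a minimal Python sketch of that check.
The function names are hypothetical and this is not how Maui is
implemented; it only illustrates my guess.

```python
def requested_procs(nodes_spec: str) -> int:
    """Total processors asked for by a nodes spec such as '32:ppn=4'."""
    total = 0
    for chunk in nodes_spec.split("+"):
        parts = chunk.split(":")
        # A leading number is a node count; a hostname counts as one node.
        count = int(parts[0]) if parts[0].isdigit() else 1
        ppn = 1  # processors per node defaults to 1 when ppn is absent
        for prop in parts[1:]:
            if prop.startswith("ppn="):
                ppn = int(prop[len("ppn="):])
        total += count * ppn
    return total


def violates_limit(nodes_spec: str, max_procs: int) -> bool:
    """True if the request exceeds the (assumed) processor limit."""
    return requested_procs(nodes_spec) > max_procs


# Values taken from the job and queue listings quoted above.
print(requested_procs("32:ppn=4"), violates_limit("32:ppn=4", 70))
# prints: 128 True
```

If this guess is right, removing the limit in qmgr (for example with
"unset queue low resources_max.ncpus") should let the job through.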
I see nothing related to your problems in your Maui configuration.
BTW, I do not think that you need to restart Torque when you make
configuration changes in qmgr. When making changes in the nodes file,
you do need to restart Torque (in a stop-change-start sequence).
If removing the resources_max.ncpus setting does not help, please post
your Torque configuration ("print server") and your nodes file
(server_priv/nodes).
Best wishes,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
National Supercomputer Centre in Linkoping, Sweden
http://www.nsc.liu.se