Hello all,
I've spent the past few days trying to get a very simple standing
reservation to work. The versions of Maui and Torque are listed below,
as is the relevant information about the configuration.
I should note that I have another, slightly older cluster with the
EXACT same configuration, apart from the host names and fair share
quotas, where this reservation seems to work just fine.
I did some digging in the log files on the two machines, and it appears
that when Maui checks the nodes during its scheduling iteration, it
processes the queues in a different order on the old cluster than on
the new one. That statement will make more sense as you read on.
The problem is that when I submit a job to my batch queue, which has no
standing reservations in Maui and no acl_hosts in PBS, it ends up
running on the hosts specified as dedicated in the debug standing
reservation. On the old cluster, the reservation successfully keeps
such jobs off those hosts during the configured time frame.
Does anyone have any suggestions as to what I am doing wrong? I'm sure
it's something small that I am missing and that the docs on the site
don't mention. I also wonder whether the difference in the order in
which the queues appear in the log file has anything to do with it.
It's the only real difference I found in the logs between the two
machines.
Version Info:
New cluster:
Maui version: 3.2.6p21
Moab Scheduling Library, version 3.2.6p20
Torque: 2.3.6
Old Cluster:
Maui version: 3.2.6p14
Moab Scheduling Library, version 3.2.6p14
Torque: 2.0.0p8
Relevant config (same between both machines):
maui.cfg:
SRCFG[debug] ACCESS=DEDICATED
SRCFG[debug] CLASSLIST=debug
SRCFG[debug] STARTTIME=8:00:00 ENDTIME=18:00:00
SRCFG[debug] HOSTLIST=node00[1-4]
SRCFG[debug] DEPTH=10
SRCFG[debug] DAYS=MON,TUE,WED,THU,FRI
SRCFG[debug] TIMELIMIT=30:00
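To make the intended time frame explicit, here is a minimal Python sketch of how I read the DAYS, STARTTIME and ENDTIME lines above. The function name `in_debug_window` is hypothetical, and treating the window as half-open (start included, end excluded) is my assumption, not something taken from the Maui docs.

```python
from datetime import datetime, time

# Values copied from the SRCFG[debug] lines above.
RES_DAYS = {"MON", "TUE", "WED", "THU", "FRI"}
RES_START = time(8, 0, 0)
RES_END = time(18, 0, 0)

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def in_debug_window(when: datetime) -> bool:
    """Return True if `when` falls inside the configured window.

    Assumption: the window is half-open, [STARTTIME, ENDTIME).
    """
    day = DAY_NAMES[when.weekday()]
    return day in RES_DAYS and RES_START <= when.time() < RES_END
```

Checking the start time of a misplaced batch job against this window would at least confirm that the job really started while the reservation should have been active.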
qmgr listing for debug queue:
queue_type = Execution
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
acl_host_enable = False
acl_hosts = node004,node003,node002,node001
resources_max.walltime = 01:00:00
resources_default.walltime = 00:15:00
enabled = True
started = True
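As a sanity check that the reservation's HOSTLIST and the queue's acl_hosts describe the same nodes, here is a small Python sketch. It assumes the HOSTLIST value can be treated as an ordinary regular expression that must match the whole host name; the helper `hosts_matching` is hypothetical and only illustrates the comparison, it is not how Maui itself evaluates the expression.

```python
import re
from typing import List

HOSTLIST = "node00[1-4]"                         # from SRCFG[debug] HOSTLIST
ACL_HOSTS = "node004,node003,node002,node001"    # from the qmgr listing


def hosts_matching(expr: str, hosts: List[str]) -> List[str]:
    """Return the hosts that fully match expr, treated as a regex (assumption)."""
    pattern = re.compile(expr)
    return sorted(h for h in hosts if pattern.fullmatch(h))


acl = ACL_HOSTS.split(",")
matched = hosts_matching(HOSTLIST, acl)
unmatched = sorted(set(acl) - set(matched))

print("matched:", matched)
print("unmatched:", unmatched)
```

If `unmatched` were non-empty, the two settings would disagree about which nodes belong to the debug queue. With the values above they agree, so that does not look like the cause.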
Possibly relevant log entries:
new cluster:
MPBSNodeUpdate(node001,node001,Idle,head)
MPBSLoadQueueInfo(head,node001,SC)
INFO: queue 'debug' started state set to True
INFO: class to node mapping enabled for queue 'debug'
INFO: queue 'batch' started state set to True
INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
old cluster:
MPBSNodeUpdate(node001,node001,Idle,head)
MPBSLoadQueueInfo(head,node001,SC)
INFO: queue 'batch' started state set to True
INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
INFO: queue 'debug' started state set to True
INFO: class to node mapping enabled for queue 'debug'
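In case anyone wants to compare their own logs the same way, this is roughly how the queue order can be pulled out of the excerpts above. It is only a sketch: the regular expression is based on the "started state set to" lines shown here, and `queue_load_order` is a hypothetical helper, not part of Maui.

```python
import re
from typing import List

# Matches lines such as: INFO: queue 'debug' started state set to True
QUEUE_LINE = re.compile(r"INFO:\s+queue '([^']+)' started state set to")


def queue_load_order(log_text: str) -> List[str]:
    """Return queue names in the order they first appear in the log text."""
    order: List[str] = []
    for line in log_text.splitlines():
        match = QUEUE_LINE.search(line)
        if match and match.group(1) not in order:
            order.append(match.group(1))
    return order


NEW_LOG = """\
MPBSLoadQueueInfo(head,node001,SC)
INFO: queue 'debug' started state set to True
INFO: class to node mapping enabled for queue 'debug'
INFO: queue 'batch' started state set to True
"""

OLD_LOG = """\
MPBSLoadQueueInfo(head,node001,SC)
INFO: queue 'batch' started state set to True
INFO: queue 'debug' started state set to True
INFO: class to node mapping enabled for queue 'debug'
"""

print("new cluster:", queue_load_order(NEW_LOG))
print("old cluster:", queue_load_order(OLD_LOG))
```

On the excerpts above this reports debug before batch on the new cluster and batch before debug on the old one, which is the difference I was describing.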
--
Jason Williams