I'm having trouble with job reservations in 6.2u3.
We're trying to achieve two things with reservations:
1) Service days - there's a calendar entry setting the queue to off for
a few hours every six weeks. This works as it should: jobs whose h_rt
would carry them into the service period don't start until the service
period is over.
2) Forcing the system to drain when parallel jobs get to the front of
the queue, stopping smaller jobs with lower queue priority from filling
the gaps (other than by back fill).
To achieve the second goal, we set the default for all jobs to use "-R
y" and then allow a small number of reservations. My understanding of
this is that the first few jobs in the queue should have reservations
created and the scheduler will only back fill jobs around these.
I've never been completely convinced this works and so, a few days ago,
when the queue state was such that the reservation was important (a 64
cpu job was stuck at the front of the queue) I enabled scheduler
monitoring and looked at the "schedule" file to see when the reservation was
created.
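
A minimal sketch of how the reservation entries can be pulled out of the
schedule file. It assumes the colon-separated record layout described in
sched_conf(5) for the MONITOR parameter (job id, task id, state, start
time, duration, level, object name, resource name, utilization); the
function name and the sample records are made up for illustration and
should be checked against a real file.

```python
from datetime import datetime, timezone

# Field order as described in sched_conf(5) for the MONITOR schedule file.
# Treat this as an assumption and compare it with your own file.
FIELDS = ("job_id", "task_id", "state", "start_time", "duration",
          "level", "object_name", "resource", "utilization")


def reservation_starts(lines):
    """Return {job_id: earliest reserved start (UTC datetime)} for RESERVING records."""
    starts = {}
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed lines
        rec = dict(zip(FIELDS, parts))
        if rec["state"] != "RESERVING":
            continue  # also skips the "::::::::" interval separator
        start = datetime.fromtimestamp(int(rec["start_time"]), tz=timezone.utc)
        job = rec["job_id"]
        # A job has one record per debited resource; keep the earliest start.
        if job not in starts or start < starts[job]:
            starts[job] = start
    return starts


# Hypothetical records, not taken from a real cluster.
SAMPLE = """\
::::::::
101:0:RUNNING:1246000000:86400:Q:all.q:slots:8.000000
102:0:RESERVING:1249000000:86400:P:mpi:slots:64.000000
102:0:RESERVING:1249000000:86400:Q:all.q:slots:64.000000
"""

if __name__ == "__main__":
    for job, start in sorted(reservation_starts(SAMPLE.splitlines()).items()):
        print(job, start.isoformat())
```

On a live system the same function could be fed the lines of
<sge_root>/<cell>/common/schedule instead of SAMPLE.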
What I discovered was that all of the reservations started at the end of
the next service period, which is still five weeks away. This was despite
it being obvious from the state of what was running that there should be
a suitably sized slot on the system within a few days. The stuck 64 cpu
job also only requested 1 day of runtime, so it would complete well
before the next service day once started.
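
The expectation here is just arithmetic, sketched below with made-up
dates: a job whose start time plus h_rt ends before the service window
opens should not need to wait for that window. The function name and all
of the values are hypothetical, and this is not a model of how the
scheduler actually places reservations.

```python
from datetime import datetime, timedelta


def fits_before_window(start, h_rt, window_start):
    """True if a job started at `start` with runtime limit `h_rt` ends by `window_start`."""
    return start + h_rt <= window_start


# Hypothetical numbers matching the situation described above.
now = datetime(2009, 7, 1, 9, 0)          # arbitrary "today"
window_start = now + timedelta(weeks=5)   # next service period
slot_free = now + timedelta(days=3)       # when 64 slots should be free
h_rt = timedelta(days=1)                  # runtime requested by the job

if __name__ == "__main__":
    print(fits_before_window(slot_free, h_rt, window_start))
```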
My assumption from this is that, because the service day calendar
setting relies on creating a reservation (if you set the allowed number
of reservations to 0, service days with calendars don't work), the
reservations take on a priority such that the job reservations can't
happen until after the calendar reservation.
Is my assumption correct? If so, is this the intended behaviour? Again,
if so, how do I achieve what I was trying to do by another method?
Regards,
Chris
--
Dr Chris Rudge - Research Computing Services Manager
IT Services, University of Leicester, LE1 7RH
Tel. +44 (0)116 2522223
chris.rudge at le.ac.uk