I don't want 2 processors from one machine, 3 processors from another, and so on. I have a quadcore cluster and I want to reserve 4 complete machines, each having 4 slots. I cannot just specify that I want 16 slots because it does not guarantee that I will have 4 slots on 4 machines each.

Changing the allocation rule to FILL_UP isn't enough because if there are no machines that are completely idle, SGE will simply "fill up" the least loaded machines as much as possible instead of waiting for 4 idle machines and then scheduling the task.

Is there any way I can do this? Is there a better place to ask this question?

7 Answers

SGE is weird with this, and I haven't found a good way to do this in the general case. One thing that you can do, if you know the memory size of the node you want, is to qsub while reserving an amount of memory almost equal to the full capacity of the node. This will ensure it grabs a system with nothing else running on it.
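A minimal sketch of the memory trick, assuming a single-node job and that your site lets you request the `mem_free` resource (the resource name, the 90% headroom factor and `job.sh` are assumptions, not something SGE mandates). The helper only prints the `qsub` command so you can inspect it before submitting:

```shell
# Print (do not run) a qsub command that requests most of one node's memory.
# Usage: near_full_node_qsub NODE_MEM_GB JOB_SCRIPT
near_full_node_qsub() {
  node_mem_gb=$1
  job_script=$2
  # Ask for 90% of the node's memory, in megabytes, leaving headroom for the OS.
  request_mb=$(( node_mem_gb * 1024 * 9 / 10 ))
  echo "qsub -l mem_free=${request_mb}M ${job_script}"
}

# Hypothetical 16 GB node and job script name.
near_full_node_qsub 16 job.sh
```

If your site uses a consumable such as `h_vmem` instead, the request is typically counted per slot for parallel jobs, so the number would need to be divided accordingly.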

Thanks, I considered something similar earlier except it was with load_avg instead of memory. Basically, like you say, it should be possible to specify a hard limit on a resource, that would probably only be satisfied if the machine is idle.
– artif, May 6 '11 at 8:40

After thinking about it some more, I think using resource limits is probably the best solution if your SGE doesn't support exclusive scheduling. Otherwise, the person should refer to my link to Sun's wiki. Anyways, thanks for the input!
– artif, May 6 '11 at 8:51

Another possibility I've considered, but quite error prone, is to use qlogin instead of qsub and manually reserve 4 slots on each desired quadcore machine. Understandably, automating this is not particularly easy or fun.

Lastly, maybe this is a situation where hostgroups can be used: for example, create a hostgroup with 4 quadcore machines in it, then qsub to this specific subset of a queue, requesting a number of processors equal to the total number in the group. Unfortunately, this amounts to hardcoding and has a lot of drawbacks, e.g., having to wait for people to vacate that particular hostgroup, and having to change the configuration if you want to switch to 8 machines instead of 4.

I'm trying to do almost exactly the same thing and am looking for ideas. I think a pe_hostsfile is the best option, but I'm not a manager of our SGE system, and there are no hosts files configured, so I need a quick workaround. I just checked out the Configuring Exclusive Scheduling link, and see that it also requires managerial rights...

I think a wrapper script could do it. I wrote a bash one-liner to figure out the number of available cores left on a machine (below). Our grid is heterogeneous, with one node having 24 cores, some 8, and the majority only 4, which makes things a little awkward.
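One possible way to count free slots per machine, not necessarily the approach used in this answer, is to parse the slot column of `qstat -f`. The sketch below assumes the `resv/used/tot.` column layout of SGE 6.x (older versions print only `used/tot.` and would be skipped), and the sample line fed to it is made up for illustration:

```shell
# Read `qstat -f` output on stdin and print "queue@host free_slots" per queue instance.
# Assumes the third column has the form resv/used/tot (SGE 6.x layout).
free_slots_per_queue_instance() {
  awk '$1 ~ /@/ {
    n = split($3, s, "/")
    # free = total - reserved - used
    if (n == 3) print $1, s[3] - s[1] - s[2]
  }'
}

# Hypothetical sample; on a real cluster use: qstat -f | free_slots_per_queue_instance
printf '%s\n' \
  'queuename qtype resv/used/tot. load_avg arch states' \
  'all.q@node01 BIP 0/2/4 0.50 lx24-amd64' \
  | free_slots_per_queue_instance
```

A machine is completely idle in this sense when the printed number equals its total slot count.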

The problem now is how to get this bash variable into an SGE startup script preprocessing directive.
Maybe I'll just provide the below arg in my shell script, as the pvm environment ships with SGE. Doesn't mean it's configured though...

We set the allocation rule to the number of slots available on the node (in this case, 4). This means you can only start jobs with n*4 CPUs, but it will achieve the desired result: 16 CPUs will be allocated as 4 nodes with 4 CPUs each.
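Because a fixed allocation rule only accepts slot counts that are a multiple of the per-node value, a small wrapper can catch bad requests before they reach the scheduler. This is a sketch only: the PE name `mpi_4` and the script name `job.sh` are hypothetical, and the function prints the `qsub` line rather than running it.

```shell
# Print a qsub command for whole-node allocation, or fail if the slot count
# is not a multiple of the PE's fixed allocation_rule.
# Usage: whole_node_qsub PE_NAME SLOTS_PER_NODE TOTAL_SLOTS JOB_SCRIPT
whole_node_qsub() {
  pe_name=$1
  slots_per_node=$2
  total_slots=$3
  job_script=$4
  if [ $(( total_slots % slots_per_node )) -ne 0 ]; then
    echo "error: ${total_slots} is not a multiple of ${slots_per_node}" >&2
    return 1
  fi
  echo "qsub -pe ${pe_name} ${total_slots} ${job_script}"
}

# 16 slots with 4 per node, i.e. 4 complete quadcore machines.
whole_node_qsub mpi_4 4 16 job.sh
```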

Note here that the allocation_rule is set to 12, this means that the job MUST use 12 cores on a node. If you submit a job requesting 48 CPUs, it will wait and grab 4 FULL nodes when they are available.
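For reference, a parallel environment along these lines might look like the sketch below. Only the name `mpich_12` and `allocation_rule 12` come from this answer; every other field value (slot total, start/stop procedures, `control_slaves` and so on) is an assumed placeholder that a manager would adapt. The script just writes the definition to a temporary file; the manager-only `qconf` steps are left as comments.

```shell
# Write a PE definition with a fixed allocation rule of 12 slots per node.
# All values except pe_name and allocation_rule are assumed placeholders.
pe_file=$(mktemp)
cat > "$pe_file" <<'EOF'
pe_name            mpich_12
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    12
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
EOF
echo "PE definition written to $pe_file"

# With manager rights, the PE could then be registered and attached to a queue
# (the queue name all.q is an assumption):
#   qconf -Ap "$pe_file"
#   qconf -aattr queue pe_list mpich_12 all.q
# A 48-slot job would then be submitted as:
#   qsub -pe mpich_12 48 job.sh
```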

I still use the -l excl=True option, but I suspect this is irrelevant now.

If I have jobs that require only one CPU (and I do), I submit them to the same queue, but without the -l excl=True option, and I use my original parallel environment, which has allocation_rule set to $fill_up.
Any job submitted with the mpich_12 environment will wait till there are complete nodes free.
My cluster works so much better now.