Hello! If I have to maintain a host, I would like to take it temporarily offline, but afterwards put it back online in the cluster. I need to do this because I want all running jobs to finish, but no new job should start on this host.

In PBS I could do it with one command: pbsnodes -o hostname1
And back online: pbsnodes -c hostname1
Is there something similar in GE?

Read the manpage for 'qmod'. You want 'qmod -d all.q@<nodename>'; this will (d)isable the node, which allows running jobs to finish but will prevent new work from landing.

Wildcards work as well: qmod -d 'all.q@*'

-Chris
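
For the way back, qmod also has an -e option that (e)nables what -d disabled. The following is only a minimal sketch of a disable/enable pair, not anything posted in the thread: it assumes the queue pattern '*@<host>' should cover every queue instance on the host, and the function names and the QMOD override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: drain a host before maintenance and return it afterwards.
# QMOD is a hypothetical override hook (useful for dry runs and tests);
# when it is unset, the real qmod binary is called.
QMOD="${QMOD:-qmod}"

# Disable every queue instance on the given host. Running jobs carry on
# until they finish; no new jobs are dispatched to the host.
disable_host() {
    "$QMOD" -d "*@$1"
}

# Re-enable the queue instances once maintenance is done.
enable_host() {
    "$QMOD" -e "*@$1"
}
```

Typical use would be 'disable_host hostname1' before the maintenance and 'enable_host hostname1' afterwards; 'qstat -f' should show the affected queue instances in state 'd' in between.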

Post by craffi
Hi Sebastian, this will (d)isable the node which allows running jobs to finish but will prevent new work from landing.

See http://www.nw-grid.ac.uk/LivScripts for a simple script (disable-nodes) which does that, and the reverse. However, you might find it better to restrict the nodes to a specific ACL to allow testing them (e.g. sge-restrict-nodes from the same page); we'll often have such nodes running HPL to see if they stand up. Note those scripts use node numbers, and work cross-cluster, courtesy of genders. (I think there are similar things by other people lying around, but those are the ones I can point to.)

-- Dave Love, Advanced Research Computing, Computing Services, University of Liverpool (AKA ***@gnu.org)

Another approach is to create a host group, e.g. @disabled, and an RQS that limits "hosts @disabled to slots=0". To disable a host, just add it to the host group. The benefit of this approach is that it works for all queues on the host without needing to enumerate them.
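
A rough sketch of what the host group plus RQS setup might look like follows. It is an illustration under assumptions, not a tested recipe: the rule name disabled_hosts, the function names and the QCONF override variable are invented here, and the @disabled host group is assumed to exist already (e.g. created once with 'qconf -ahgrp @disabled').

```shell
#!/bin/sh
# Sketch of the host-group + RQS approach.
# QCONF is a hypothetical override hook (useful for dry runs and tests);
# when it is unset, the real qconf binary is called.
QCONF="${QCONF:-qconf}"

# Print a resource quota set that leaves hosts in @disabled with zero
# slots. The rule name "disabled_hosts" is an assumption.
print_disabled_rqs() {
    cat <<'EOF'
{
   name         disabled_hosts
   description  "No new jobs on hosts in @disabled"
   enabled      TRUE
   limit        hosts @disabled to slots=0
}
EOF
}

# Add a host to the @disabled host group (stops new jobs landing there).
hold_host() {
    "$QCONF" -aattr hostgroup hostlist "$1" @disabled
}

# Remove the host from @disabled again after maintenance.
release_host() {
    "$QCONF" -dattr hostgroup hostlist "$1" @disabled
}
```

The quota set would be loaded once, for example with 'print_disabled_rqs > disabled.rqs' followed by 'qconf -Arqs disabled.rqs'. After that, 'hold_host hostname1' and 'release_host hostname1' are the only per-host steps.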

Post by templedf
[...] it to the host group. The benefit of this approach is that it works for all queues on the host without needing to enumerate them.

The approach I recommended uses a host group. Do people not normally test nodes in batch before letting users back on them, which the additional ACL allows?

I should have mentioned the refinement of maintaining a host comment complex recording why the node is (semi-)disabled, which isn't in the version I referred to. I.e. sge-restrict-nodes should have a --reason arg, which sets the string-valued `problem' complex, and sge-unrestrict-nodes nullifies it. (The hostgroup isn't redundant with the complex defined, because an RQS can't restrict on the basis of the complex as far as I know.)

-- Dave Love, Advanced Research Computing, Computing Services, University of Liverpool (AKA ***@gnu.org)