Hi,

Is there an option by which SGE can check for mount points, licenses, etc. before starting a job on a node? By doing this I want to restrict SGE from submitting jobs to nodes which do not satisfy these requirements.

Thanks,

Please check out complexes: http://wikis.sun.com/display/gridengine62u5/Configuring+Resource+Attributes

-- Reuti

Hi Reuti,

Thank you for the reply. I read about configuring resource attributes, but I'm not understanding how this can be set up for mount points.

You define a complex of type RESTRING and fill it with the mount points available on a machine in each exec host's specification:
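A minimal declaration for such a complex might look like this (a sketch only; the shortcut and urgency values are arbitrary assumptions):

$ qconf -mc
#name    shortcut  type      relop  requestable  consumable  default  urgency
mounts   mnt       RESTRING  ==     YES          NO          NONE     0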

$ qconf -me node01
...
complex_values mounts=:/nfs/fubar1:/nfs/fubar2:

and then you request e.g.

$ qsub -l mounts="*:/nfs/fubar2:*" ...

-- Reuti

Thank you very much for the reply; I will try this out and get back to you.

Hi Reuti,

I have a doubt about setting the complexes. If I set the mount points in the complex and configure it in the node configuration, how does SGE understand it? Will SGE check for the presence of these mount points before submitting the job to the node?

-Bharani

No, it's just a fixed string - SGE doesn't know what it means, and that's also not necessary. Normally I would assume that you don't change mount points twice an hour, so they are effectively bound to the machines. There is nothing for SGE to check.

You could nevertheless set up a load sensor which reports the string of found mount points in a generic way for all machines. Using the format described above (to avoid a substring matching a different mount point), you can then fill in the values automatically.
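A load sensor along these lines might look like the sketch below. It assumes the `mounts` complex from earlier, a Linux-style `mount -t nfs` output with the mount point in the third field, and the usual begin/end load-sensor protocol:

#!/bin/sh
# Hypothetical load sensor: report NFS mount points as :/a:/b: for this host.
HOST=`uname -n`
while :; do
    # SGE writes a line per load interval; "quit" means terminate.
    read input || exit 1
    [ "$input" = quit ] && exit 0
    # Build the :/mnt1:/mnt2: string from the currently mounted NFS shares.
    MOUNTS=`mount -t nfs | awk '{ printf ":%s", $3 }'`
    echo "begin"
    echo "$HOST:mounts:$MOUNTS:"
    echo "end"
done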

-- Reuti

Hi,

Oh OK, now I understand it better. Instead of using a load sensor, what if I use a prolog which runs the mount commands to mount the NFS shares before the job starts? Will this work?

-Bharani

You would have to tell the job which particular mount points are necessary for it. If I understand you correctly, you don't want to mount all mount points all the time.

A place for such information (which is unrelated to SGE in any way) is the job context. This is so-called meta-information and is not used by SGE itself, but you can set and access it on your own:

$ qsub -ac MOUNTS=/nfs/app1,/nfs/app2 myjob.sh

Then you can access this information with `qstat -j $JOB_ID` in the line with the entry "context:". It may be necessary to run the prolog and epilog as root, which can be achieved by prefixing the script path with the user, e.g. root@/usr/sge/cluster/myprolog.sh
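A prolog built on this could look roughly like the sketch below. Everything here is an assumption for illustration: the shares are passed as one colon-separated value (note that -ac uses commas to separate context variables, so e.g. qsub -ac MOUNTS=/nfs/app1:/nfs/app2 is safer), and "fileserver" stands in for your NFS server:

#!/bin/sh
# Hypothetical prolog: mount the NFS shares named in the job's MOUNTS context.
CTX=`qstat -j $JOB_ID | sed -n 's/^context: *//p'`
MOUNTS=`echo "$CTX" | tr ',' '\n' | sed -n 's|^MOUNTS=||p' | tr ':' ' '`
for mp in $MOUNTS; do
    if ! mount | grep -q " on $mp "; then
        # "fileserver" is a placeholder; exit 99 asks SGE to reschedule (sge_conf(5)).
        mount -t nfs "fileserver:$mp" "$mp" || exit 99
    fi
done
exit 0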

Pitfall: when you have more than one job on a node at a time, it might be necessary to check whether any other job running on this particular node is still using a mount point which you would like to unmount in an epilog. To also avoid a race condition, the clean solution in this case would be to disable the queue instance in the epilog, check for other jobs using the mount point, unmount the unused ones, and enable the queue instance again.
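As a rough illustration of that sequence (the queue name "all.q" is an assumption, and this sketch only checks for any other running job on the host rather than for actual use of the mount point):

#!/bin/sh
# Hypothetical epilog: unmount only when no other job runs on this host.
HOST=`uname -n`
qmod -d "all.q@$HOST"        # close the queue instance first
OTHERS=`qstat -s r -u '*' | grep "@$HOST" | grep -vc "^ *$JOB_ID "`
if [ "$OTHERS" -eq 0 ]; then
    umount /nfs/fubar2
fi
qmod -e "all.q@$HOST"        # reopen the queue instance
exit 0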

Missing mount points representing OS and cluster problems are usually checked by non-SGE cluster tools, although you could presumably write a JSV or prolog script that could check for these things.

The best implementation I saw was at a site where the admins had a script that probed for every OS issue they had ever encountered in the past. The script ran at node boot time and periodically afterwards. As soon as any problem was detected, the node was put into disabled state 'd' and the admins were notified. The same script also put the node into 'd' state for the first 5 minutes after boot, to make sure there was time for problems to show up and be detected before jobs started landing on it.

If the mounts are supposed to be missing (perhaps because different servers have different mounts configured by design) then you can attach a Boolean true/false attribute to the exec hosts, and users could submit jobs like "qsub -hard -l fastScratch=true ./myJob.sh" or whatever.
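Sketched out, that could be something like the following (the attribute name is from the example above; the column values are plain defaults, not anything site-specific):

$ qconf -mc
#name        shortcut  type  relop  requestable  consumable  default  urgency
fastScratch  fs        BOOL  ==     YES          NO          0        0

$ qconf -me node01
...
complex_values fastScratch=true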

For serious and transparent use, a JSV might work. The JSV can examine the user's job script and make changes on the fly, such as redirecting to a different queue or queue instance.

License-aware scheduling is another matter. Google "Olesen FlexLM" to see how it's done with SGE. Basically, the modern method involves declaring requestable/consumable resources for each license entitlement and making it dynamic via a script that polls the license server and constantly adjusts the value of the resource. This method has superseded the load-sensor method.
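In outline, the mechanism is a polling script that rewrites the complex value via qconf -mattr. A sketch follows; the complex name, lmstat feature name, and awk field position are illustrative assumptions, and what value to compute (e.g. accounting for licenses checked out outside SGE) is the subtle part covered in Olesen's write-up:

$ qconf -mc
#name    shortcut  type  relop  requestable  consumable  default  urgency
app_lic  al        INT   <=     YES          YES         0        0

$ qconf -me global
...
complex_values app_lic=10

# Hypothetical cron job: set the complex to the count the license server reports.
total=`lmstat -a -c "$LM_LICENSE_FILE" | awk '/Users of app/ { print $6 }'`
qconf -mattr exechost complex_values app_lic=$total global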

Hi craffi,

That's a lot of information, but I'm really not sure if I'll be able to set it up like this, because we are currently using DRMAA for submitting array jobs. The DRMAA binding is in Python, and it does not use any -l flag at the moment.
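For what it's worth, a -l request can normally be passed from Python DRMAA through the job template's nativeSpecification attribute. A minimal sketch, assuming the drmaa-python binding and the fastScratch complex from above:

import drmaa

s = drmaa.Session()
s.initialize()
jt = s.createJobTemplate()
jt.remoteCommand = './myJob.sh'
# Anything you would put on the qsub command line can go here.
jt.nativeSpecification = '-hard -l fastScratch=true'
jobids = s.runBulkJobs(jt, 1, 10, 1)   # array job: tasks 1..10, step 1
s.deleteJobTemplate(jt)
s.exit()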

Post by craffi
The best implementation I saw was at a site where the admins had a script that probed for every OS issue they had ever encountered in the past. The script ran at node boot time and periodically afterwards. As soon as any problem was detected, the node was put into disabled state 'd' and the admins were notified.

I'd have hoped that sort of thing was standard practice, for some value of `every OS issue'. (I use Nagios.) You do need to judge whether it's worth it for a particular failure mode, both in terms of the resources to write/organize a test, and the resources to run it, which might have a significant effect on the compute nodes, or on the head node, if you're running it there.

The SGE angle is that the job prolog/epilog are a convenient place to make tests just at the time they particularly matter, without putting a continual load on the node. You can either ensure the queue goes into an error state, check for that and rely on figuring out why, or use something like NSCA under Nagios. To have Nagios disable queues, for instance, you have to be careful either to run specific commands under sudo or to make sure nagios has appropriate SGE privileges, and it's not necessarily easy to test that it all works.
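As an example of the prolog-time test idea (a sketch: the mount list is the one from earlier in the thread, and `mountpoint` is the util-linux helper, so Linux is assumed):

#!/bin/sh
# Hypothetical prolog check: fail if a required mount is missing.
# A non-zero exit makes SGE flag the failure (see sge_conf(5) for the exact semantics).
for mp in /nfs/fubar1 /nfs/fubar2; do
    mountpoint -q "$mp" || { echo "missing mount: $mp" >&2; exit 1; }
done
exit 0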

--
Dave Love
Advanced Research Computing, Computing Services, University of Liverpool
AKA ***@gnu.org