We have a few clusters at our site, and they all run PBS/Torque. We
want to set up a gatekeeper node so that we only have to install
Globus on the gatekeeper and not on the head node of each
cluster. I tried implementing this before without success
because the routing queue functionality in PBS/Torque does not work as
documented.
I don't have any problems with Torque routing jobs from one queue
to another, BUT only if the other queue is also local to the machine
I submitted the job to.
create queue router
set queue router queue_type = Route
set queue router route_destinations = batch@localhost
set queue router enabled = True
set queue router started = True
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodect = 1
set queue batch resources_default.nodes = 1
set queue batch enabled = True
set queue batch started = True
Here "router" is a routing queue and "batch" is an execution
queue. If I submit a job to router, it is passed on to batch, which
executes the job as expected.
But if I set "router@localhost" to route jobs to a queue on another
machine, "batch@anothermachine.mydomain.com", PBS no longer runs the
jobs. It reports "Jobs rejected by all possible destinations", and I
see no sign that it even tried to contact anothermachine.mydomain.com.
# a queue setup on my local machine
create queue router
set queue router queue_type = Route
set queue router route_destinations = batch@anothermachine.mydomain.com
set queue router enabled = True
set queue router started = True
# a queue setup on anothermachine.mydomain.com
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodect = 1
set queue batch resources_default.nodes = 1
set queue batch enabled = True
set queue batch started = True
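
To make the comparison explicit, the two router setups above differ in a
single directive. Here is a minimal Python sketch that generates both
variants and prints only the line that changes. The helper name
router_directives is hypothetical (it is not part of Torque), and the
script only builds text; it does not talk to pbs_server or run qmgr.

```python
def router_directives(destination, queue="router"):
    """Build the qmgr directives for a routing queue.

    `router_directives` is a hypothetical helper for illustration only;
    it just formats the same directives shown in the post.
    """
    return [
        f"create queue {queue}",
        f"set queue {queue} queue_type = Route",
        f"set queue {queue} route_destinations = {destination}",
        f"set queue {queue} enabled = True",
        f"set queue {queue} started = True",
    ]


# The working setup (local destination) and the failing one (remote destination).
local = router_directives("batch@localhost")
remote = router_directives("batch@anothermachine.mydomain.com")

# Collect the directives that are not identical in the two setups.
differing = [(a, b) for a, b in zip(local, remote) if a != b]

for a, b in differing:
    print("local :", a)
    print("remote:", b)
```

So whatever breaks the remote case is triggered by the route_destinations
value alone, or by something outside these queue definitions.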
Has anybody on this list got this functionality *(routing jobs
from a queue on your local machine to a queue on a remote machine)*
working?
Thanks,
Gerson