Hi all,
I have a small cluster with 3 nodes; each node has 2 CPUs with 4 cores each.
I have been using the cluster for a few months now and it works mostly great
with pbs and open-mpi.
One problem I have been running into for a while is the following:
Starting a job with a script containing
#PBS -l nodes=1:ppn=8
works perfectly. The job starts on 1 node on all 8 cores.
However
#PBS -l nodes=2:ppn=8
will start the job. qstat -f tells me that it is running on 16 cores, but
checking with "top" shows that the job is only running on 1 core on 1
node (the node listed second in the nodes file). I could not find
any errors in the MOM logs.
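
A minimal sketch of a test job that might help narrow this down. It assumes
that PBS sets PBS_NODEFILE inside the job and that Open MPI's mpirun is on the
PATH; count_slots is a hypothetical helper name, not part of PBS or Open MPI.
It compares the slots PBS allocated with the hosts that mpirun actually
reaches.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=8

# Hypothetical helper: count how many slots each host has in a PBS-style
# node file (one hostname per line, repeated once per allocated core).
count_slots() {
    sort "$1" | uniq -c | awk '{print $2, $1}'
}

# Only run the checks inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Slots allocated by PBS:"
    count_slots "$PBS_NODEFILE"

    if command -v mpirun >/dev/null 2>&1; then
        # One process per line of the node file.
        nprocs=$(grep -c . "$PBS_NODEFILE")
        echo "Hosts reached by mpirun:"
        mpirun -np "$nprocs" -machinefile "$PBS_NODEFILE" hostname | sort | uniq -c
    fi
fi
```

If the first list shows 8 slots on each of two nodes but the second list shows
only one host, the allocation is probably fine and the problem is more likely
in how mpirun is launched inside the job script.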
Any help would be much appreciated.
Cheers, Jan
--
Jan Dettmer, Postdoctoral Fellow
School of Earth and Ocean Sciences, University of Victoria
Victoria, BC V8W 3P6
office: (250) 472-4342 email: jand at uvic.ca http://web.uvic.ca/~jand/