On Wed, 2012-01-04 at 09:53 -0700, David Beer wrote:
> ----- Original Message -----
> > I’ve started configuring torque 3.0.3 on an SGI UV system (following
> > http://www.clusterresources.com/torquedocs/1.7torqueonnuma.shtml )
> > and am having problems.
> >
> > I started with a working non-numa 3.0.3 setup as a sanity check.
> >
> > I configured it with --enable-numa-support and made a nodes file:
> >
> > cherax-1 np=48 num_numa_nodes=6
> >
> > and mom.layout
> >
> > #cpus=0-15 mem=0-1 /boot
> > cpus=16-23 mem=2
> > cpus=24-31 mem=3
> > #cpus=32-47 mem=4-5 /user
> > cpus=48-55 mem=6
> > cpus=55-63 mem=7
> > cpus=64-71 mem=8
> > cpus=72-79 mem=9
> >
> > (note that some of the blades are set aside for io etc. and not all
> > are currently on or configured).
>
> For me this is the first red flag. I don't know that we have anyone
> successfully using non-sequential layouts (skipping a blade in the
> middle). I know we have other sites, in fact it is typical, that skip
> some at the beginning or end for the boot set, but I don't think
> anyone is skipping in the middle. Would it be possible to move that
> user either to the front or to the back?
The way I've handled this is to leave all the NUMA nodes in the
mom.layout file, then fence off the ones I don't want used by jobs by
placing standing reservations on them in Moab and/or marking them
offline in TORQUE.
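
As a rough sketch of that approach (not a tested recipe), the Python
below takes a layout like the one quoted above, restores the
commented-out lines so the layout is sequential again, and lists a
"pbsnodes -o" command for each entry that was commented out. The
function name fence_plan is made up for illustration, and it assumes
the NUMA nodes show up in TORQUE as <host>-<index> in layout order, so
check the names against your own pbsnodes output first. The
num_numa_nodes and np values in the nodes file would presumably need
to match the full layout as well.

```python
def fence_plan(layout_text, host):
    """Return (full_layout, offline_cmds) for a mom.layout with '#' entries.

    Hypothetical helper. Assumes NUMA nodes are named "<host>-<index>",
    numbered from 0 in the order the layout lines appear.
    """
    full_layout = []
    offline_cmds = []
    for raw in layout_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fenced = line.startswith("#")
        # Keep only the cpus=/mem= fields; drop trailing notes such as "/boot".
        fields = [f for f in line.lstrip("#").split()
                  if f.startswith(("cpus=", "mem="))]
        if not fields:
            continue  # an ordinary comment, not a layout entry
        index = len(full_layout)
        full_layout.append(" ".join(fields))
        if fenced:
            # pbsnodes -o marks the named node offline.
            offline_cmds.append(f"pbsnodes -o {host}-{index}")
    return full_layout, offline_cmds


if __name__ == "__main__":
    example = """\
#cpus=0-15 mem=0-1 /boot
cpus=16-23 mem=2
cpus=24-31 mem=3
#cpus=32-47 mem=4-5 /user
cpus=48-55 mem=6
"""
    layout, cmds = fence_plan(example, "cherax-1")
    print("\n".join(layout))
    print("\n".join(cmds))
```

The same list of fenced-off node names could feed a standing
reservation host list in Moab instead of pbsnodes.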
--Troy
--
Troy Baer, HPC System Administrator
National Institute for Computational Sciences, University of Tennessee
http://www.nics.tennessee.edu/
Phone: 865-241-4233