Parallel Computers: Cluster

Basic setup

The cluster is a Linux-based Rocks cluster
for exclusive use by the class. There are six nodes including the head
node. Each node consists of two quad-core Intel Xeon E5504 chips for a
total of eight cores per node. Jobs are submitted using the Sun Grid Engine (SGE). Each node
will only run one job at a time, so be polite and don't hog the machine
with very long runs!

Logging in

The cluster head node is crocus.csuglab.cornell.edu. It
should be accessible from any on-campus IP address (or you can reach it
if you're logged into the campus VPN). Log into
crocus using ssh with the account name and
password that were sent to you via Dropbox (your account name is your
netid). On the first login, you will be prompted for passphrases for
ssh keys; just hit enter, as these keys are used purely for
private communication inside the cluster. Once you've done this, you
should be delivered to a Unix prompt. Welcome to the head node!
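
For example, from an on-campus terminal the login looks like this (with
netid replaced by your own NetID):

    ssh netid@crocus.csuglab.cornell.edu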

After your initial login, you should probably change your password to
something that you can remember. Alternatively, you may want to set up
password-less ssh authentication between your machine and the cluster.
The details will depend on which ssh client you use. On Linux, I suggest
looking into keychain. On OS X
10.5 onward, ssh-agent runs for you automatically, so you
can simply add keys at the command line using ssh-add and
then not worry about it. Under Windows, you may want to look into
PuTTY,
which apparently has support for ssh keys; see
this tutorial, for example, keeping in mind as you read it that I
don't typically use Windows myself.
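
If you do go the key route from a Linux or OS X machine, a minimal setup
looks roughly like the following (this assumes ssh-copy-id is installed; if
it isn't, you can append the contents of ~/.ssh/id_rsa.pub to
~/.ssh/authorized_keys on crocus by hand):

    # Generate a key pair on your own machine (pick a passphrase you like)
    ssh-keygen -t rsa

    # Copy the public key to your account on the head node
    ssh-copy-id netid@crocus.csuglab.cornell.edu

    # Load the key into your agent so you aren't asked for the passphrase
    # on every connection
    ssh-add ~/.ssh/id_rsa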

Directory setup

You have access to two types of storage on the cluster. Your home
directory is hosted on the head node, and is mounted by NFS on all the
other nodes. You can read or write files in your home directory on one node
and see the changes on the others (eventually). However, NFS uses a lot
of bandwidth, and it is easy to swamp the server. For big files, use
/state/partition1, a user-accessible local partition that
exists on each node (on the head node, this is where the home directories
live, but the head node is a special case).
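
As a rough sketch of how you might use the local partition from inside a job
(the directory and program names here are just illustrative placeholders):

    # Make a scratch directory on the compute node's local disk
    mkdir -p /state/partition1/$USER/myrun
    cd /state/partition1/$USER/myrun

    # Run from local disk so large output files stay off NFS
    $HOME/project/my_program > output.dat

    # Copy back only what you need, then clean up
    cp output.dat $HOME/project/
    rm -rf /state/partition1/$USER/myrun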

Software is provided in the usual locations (/usr/bin and
/usr/local/bin), but there are also common installations in
/share/apps/local. In particular, this is where GCC 4.4 (and
gfortran) and ATLAS are installed. This directory is not on the default PATH,
so you will either need to edit your PATH or type the fully-qualified command
names to use these compilers.
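
For example, you could add a line like the following to your ~/.bashrc
(this assumes the binaries live in /share/apps/local/bin; check the actual
layout with ls /share/apps/local):

    # Put the shared installs ahead of the system compilers
    export PATH=/share/apps/local/bin:$PATH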

Hardware

Compute nodes on the cluster have two quad-core Intel Xeon E5504 chips
running at 2.0 GHz. This is the "Gainestown" family fabricated in the 45
nm process, based on the Nehalem architecture. For more details on the
processor type, try cat /proc/cpuinfo (followed by
Googling!). There are some slides
on this architecture from another class that you can read to find out
more.

The nominal peak per core is 8 GFlop/s, if one issues two SSE
instructions per cycle (each of which can handle two double-precision
floating point operations). There are 16 GB of physical RAM per node.
Each core has a 4-way associative 32 KB L1 cache and an unshared 8-way
256 KB L2 cache. There is also a shared (within a processor) 16-way 4 MB
L3 cache.
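
To spell out the arithmetic behind that peak figure:

    2.0 GHz x 2 SSE instructions/cycle x 2 double-precision flops/instruction
        = 8 GFlop/s per core
    8 GFlop/s per core x 8 cores = 64 GFlop/s per node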

Queueing

The command to submit jobs to the queue is qsub; try
qsub -help or man qsub at the command line to
see the basic documentation. Running qsub scriptname will
schedule scriptname to be run on one of the compute nodes.
scriptname is usually a shell script; in addition to the
normal shell operations, one can use comment lines starting with
#$ to set execution options (these options can also be set
via the command line).
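
As a sketch, a minimal submission script might look like the following (the
job and program names are just placeholders):

    #!/bin/bash
    # Run the job from the directory where qsub was called
    #$ -cwd
    # Give the job a name and merge stderr into stdout
    #$ -N myjob
    #$ -j y

    # The actual work
    ./my_program

You would then submit it with qsub myjob.sh and check on its progress with
qstat.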

Some good options to know are:

-cwd - run from the current working directory (where
qsub was called).

-wd name - specify that the named directory should be
the working directory where the job is run