How to Use the Trestles Cluster

This is a brief “how to” summary of Trestles for users familiar with the Razor cluster.

Equipment

Trestles has about 240 identical usable nodes, each with four AMD 6136 8-core 2.4 GHz processors (32 cores per node). Each node has 64 GB of memory and a flash drive with about 90 GB usable as temporary space in /local_scratch/$USER/, which is much less local storage than Razor nodes have, but faster. Nodes are interconnected with Mellanox QDR InfiniBand.

Each 32-core node has about the same computational power as a 16-core Intel node in the Razor cluster, so an AMD core delivers roughly half the power of an Intel core for these models. Because Trestles has more, but less powerful, cores, it is best suited to highly scalable codes (codes with good parallel performance) and to codes that need more than the 24 to 32 GB of memory available on most Razor nodes. There is no serial queue on Trestles; use Razor's serial queue instead. If you have a serial job that needs more than 32 GB of memory, you can use a full Trestles node (nodes=1:ppn=32), though a Razor 96 GB node would be faster.

Usage

The login node is trestles.uark.edu, which load-balances across two identical login nodes with local names tres-l1 and tres-l2. You can also reach Trestles from Razor with ssh bridge ; ssh tres-l2.

Initially there are three queues with maximum runtimes of 30 minutes, 6 hours, and 72 hours: q30m32c, q06h32c, and q72h32c. All nodes are identical, with a few reserved for shorter jobs. Only whole-node access (nodes=N:ppn=32) is supported initially. Serial (single-core) jobs should be run on the Razor cluster unless you specifically need the 64 GB memory capacity of Trestles; in that case allocate the whole node with ppn=32.
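As a sketch, a minimal batch script requesting a whole node in the 6-hour queue might look like the following (the job name and program name are placeholders, and walltime syntax is standard PBS/Torque):

```shell
#!/bin/bash
#PBS -N myjob              # job name (placeholder)
#PBS -q q06h32c            # 6-hour, 32-core queue
#PBS -l nodes=1:ppn=32     # whole-node allocation is required on Trestles
#PBS -l walltime=6:00:00

cd $PBS_O_WORKDIR          # start in the directory the job was submitted from
mpirun -np 32 ./myprog     # myprog is a placeholder for your MPI executable
```

Submit it with `qsub myjob.pbs`.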

The user environment is as similar as possible to the Razor cluster's. Most codes will need to be recompiled because (a) the different InfiniBand network may affect the low-level links for some MPI versions, and (b) anything compiled with the Intel compilers at a processor vectorization level higher than SSE2 won't run on an AMD processor. When recompiling, the compiler modules will handle the low-level links, and -xSSE2 is the highest Intel compiler vectorization option that will run on Trestles.
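A recompile might look like the sketch below. The module names and versions are illustrative only; run `module avail` to see what is actually installed on Trestles.

```shell
# Load an Intel compiler and MPI stack (names/versions are hypothetical).
module purge
module load intel mvapich2

# -xSSE2 is the highest Intel vectorization level that runs on the AMD nodes.
mpicc -O2 -xSSE2 -o myprog myprog.c
```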

File Systems

Parallel file systems are Lustre /storage/$USER and /scratch. Your home area is located at /storage/$USER/home and is symlinked to /home/$USER. For most applications home behaves transparently, in contrast to the slightly different “autohome” setup of the Razor cluster. There are also additional reserved condo storage areas: /storaged (Douglas group), /storageb (Bellaiche group), and /storage2 (Track II).

UPDATE: Trestles /scratch and /local_scratch are very small, and there will be no directories corresponding to your userid. Existing /scratch/$USER directories and their data will be wiped Thursday 3/17/16. There will be only per-job directories, created by the job prologue script, which expire 14 days after each job ends. In your batch scripts you can use the environment variables defined for the job, or reconstruct the names of the already-created scratch areas at runtime as /scratch/$PBS_JOBID and /local_scratch/$PBS_JOBID on the head compute node.
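A sketch of how a batch script might use the per-job scratch directories (input/output file names and the program name are placeholders):

```shell
#!/bin/bash
#PBS -N scratch_example
#PBS -q q30m32c
#PBS -l nodes=1:ppn=32

# The prologue script has already created these per-job directories;
# we only reconstruct their names from $PBS_JOBID.
SCRATCH=/scratch/$PBS_JOBID
LOCAL_SCRATCH=/local_scratch/$PBS_JOBID

cd $SCRATCH
cp $PBS_O_WORKDIR/input.dat .     # input.dat is a placeholder
./myprog input.dat                # myprog is a placeholder

# Copy results home before the per-job directory expires (14 days after job end).
cp output.dat $PBS_O_WORKDIR/
```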

Trestles /storage quota is 900 GiB soft / 1000 GiB hard. Condo storage partitions /storage[x] have no quotas. Unlike Razor, your Trestles /home area is part of /storage (/home/username is a symlink to /storage/username/home) and does not have its own quota. Future backup schemes that pull from /home may limit its size. At this time nothing on Trestles is backed up.
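To see how your usage compares with the /storage quota, the standard Lustre quota command should work (exact output format varies by Lustre version):

```shell
# Report block and inode usage for your user on the /storage file system.
lfs quota -u $USER /storage
```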

We expect to port these /scratch changes back to Razor to help with the space crisis there.

File Transfer to/from Razor

UPDATE: tbridge/rbridge have been renamed bridge on both sides.

There is an interface node, bridge, for transferring files between Razor and Trestles. The interface node is only for moving data; it cannot submit jobs. Please log in to the interface node and use cp/mv to move files between the parallel file systems over InfiniBand instead of using scp, which sends files over the slower Ethernet network. You may also use rsync, which on a single node performs a local copy much like cp. On the bridge node, Trestles file systems are mounted at /storage, /scratch, and /home, and Razor file systems are mounted at /razor/storage, /razor/scratch, and /razor/home. Trestles file systems are also mounted at /trestles/storage and so on. To copy /storage/$USER/mydir from Razor to Trestles, starting on Razor:

ssh bridge
cd /razor/storage/$USER
rsync -av mydir /storage/$USER/

File Transfer to/from World

The Trestles network doesn't yet have a file transfer node to the outside world; one is on the to-do list. Until then, please transfer through Razor's tgv node, then through bridge. Please avoid sending huge files through the login nodes on either Trestles or Razor.

Condo nodes are selected in the qcondo queue by PBS node properties. Only a sufficient property is required (e.g. m256gb is unique among the currently installed nodes, so it alone selects those nodes). Nodes with Intel Broadwell E5-26xx v4 CPUs have the property “v4”.
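For example, requests along these lines should select condo nodes by property (the ppn value depends on the core count of the node you are targeting, and myjob.pbs is a placeholder script):

```shell
# Select a 256 GB condo node; m256gb alone is sufficient to identify it.
qsub -q qcondo -l nodes=1:ppn=32:m256gb myjob.pbs

# Select an Intel Broadwell (E5-26xx v4) condo node via the "v4" property.
qsub -q qcondo -l nodes=1:ppn=32:v4 myjob.pbs
```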