ARC2

Operating system

ARC2, the second phase of the ARC service here at Leeds, is a Linux-based HPC service built on the CentOS 6 distribution. ARC2 has been in service since January 2014.

There are significant improvements to the batch scheduler, particularly to the syntax for submitting parallel jobs; this is referred to as the node syntax.
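As an illustrative sketch only (the exact resource names and module names below are assumptions; consult the ARC2 scheduler documentation for the definitive node syntax), a parallel job submission script might look like:

```shell
#!/bin/bash
# Hypothetical example of a parallel job script using the node syntax.
# The resource names (nodes, ppn) and the module name are illustrative
# assumptions, not a definitive reference.
#$ -cwd                  # run the job from the current working directory
#$ -l h_rt=1:00:00       # request a one-hour runtime limit
#$ -l nodes=2,ppn=16     # request two whole nodes, 16 processes per node
module load openmpi      # hypothetical module name
mpirun ./my_parallel_program
```

The script would then be submitted with `qsub`, and the scheduler allocates whole nodes rather than individual slots.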

Hardware

ARC2 consists of a constellation of HP-based servers and storage. A schematic of the rack layout is shown below; it is separated into a high-density component geared towards computation and a low-density portion providing mostly infrastructure:

InfiniBand — 40Gbit/s interconnect to the compute blades, with access to infrastructure (e.g. Lustre storage) on the edge.

Gigabit — Management and general networks facilitating system boot. All user traffic is carried over the InfiniBand network.

Network topology

All user-facing systems (login and compute) are connected to the InfiniBand network and use it to transfer all user data. This is a layered network, with the latency of communication dependent upon the number of (36-port) switch hops required to route between the source and destination devices. The diagram below shows the cluster’s topology, sometimes described as a half-Clos network:

ARC2 Network Topology

Each server has a 4X quad-data-rate (QDR) connection which can send and receive data at 3.6GB/s. Each switch has two 4X QDR links up to the core, able to transfer data at ~8GB/s.

The latency between servers connected to the same switch is around 1.1 microseconds. Between servers connected to different switches, the latency is around 1.5 microseconds.

By default, jobs will be dispatched to any compute node in the cluster. The following resource requests can be used to control how a job's nodes are distributed across the network topology:

-l placement=optimal
    Minimises the number of switch hops between the nodes allocated to the job.

-l placement=scatter
    Ignores topology and places the job on any available nodes, potentially introducing more latency than necessary to all communications (default).
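For example, a tightly coupled MPI job could request optimal placement so that its communication crosses as few switches as possible (a sketch; the runtime limit and application name are illustrative):

```shell
#!/bin/bash
# Request topology-aware placement: the scheduler will try to minimise
# the number of switch hops between the nodes allocated to this job.
#$ -cwd
#$ -l h_rt=2:00:00            # illustrative runtime request
#$ -l placement=optimal       # keep the job's nodes close together
mpirun ./latency_sensitive_app   # hypothetical application name
```

Jobs that communicate little, or not at all, can safely leave the default scatter placement, which helps the scheduler start them sooner.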

Lustre file system

A large amount of infrastructure is dedicated to the Lustre parallel filesystem, which is mounted on /nobackup. It is served over InfiniBand and is configured to deliver ~4GB/s from a 170TB filesystem. The filesystem could be tuned more aggressively (or more conservatively); the current configuration achieves a sensible compromise between data integrity and performance.
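Within that site-wide configuration, users can still inspect and adjust how their own files are striped across Lustre's object storage targets using the standard `lfs` tool (the directory path below is illustrative):

```shell
# Show the current stripe settings of a directory on /nobackup
lfs getstripe /nobackup/myproject      # path is illustrative

# Stripe new files created in this directory across 4 object storage
# targets, which can improve bandwidth for large sequential I/O
lfs setstripe -c 4 /nobackup/myproject
```

Wider striping helps large files read or written by parallel jobs; small files are usually better left at the default stripe count to avoid unnecessary metadata overhead.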