The ARC1 service closes on May 31st, 2017.

Operating system

Hardware

ARC1 consists of Sun Microsystems x86-64 based servers and storage, with the later addition of a phase of HP hardware. The racks are separated into a high-density component geared towards computation and a low-density portion providing mostly infrastructure.

The system also includes additional compute nodes purchased by individual groups and dedicated to their use.

A schematic of the rack layout is shown below:

ARC1 Rack Layout

Network topology

All user-facing systems (login, compute and datamover nodes) are connected to the InfiniBand network and use it to transfer all user data. This is a layered network, with the latency of communication dependent upon the number of (36-port) switch hops required to route between the source and destination devices. The diagram below shows a full Clos network capable of supporting 1536 CPU cores; this is the network topology in place for the two fully populated high-density racks.

A schematic of the ARC1 Full Clos network is shown below:

ARC1 Full Clos network

Each server has a quad-data-rate (QDR) 4X connection, which can send and receive data at 2.6GB/s. Each 12X uplink to the core is the equivalent of three 4X links and can therefore transfer data at around 8GB/s (3 x 2.6GB/s).

The latency between servers, including a single switch hop, is around 1.8 microseconds. Each additional switch hop introduces around 60ns of latency. As the 72-port switches are internally a tiered set of six 36-port switches, they introduce 300ns (5 hops) to the latency.
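As a rough worked example using the figures above (an estimate only; the actual path depends on where a job is placed), a message between servers on different shelves that is routed via a leaf switch, a 72-port core switch and a second leaf switch would see approximately:

      1.8 microseconds   (baseline, including the first leaf switch hop)
    +  300ns             (72-port core switch: 5 internal hops)
    +   60ns             (second leaf switch hop)
    = ~2.16 microseconds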

By default, jobs are dispatched so as to span the minimum number of switch hops for a job of that size. This is controlled through the placement parameter, as shown in the table below:

attribute                 comments
-l placement=optimal      Minimises the number of switch hops (default)
-l placement=scatter      Ignores topology concerns and runs anywhere, potentially introducing more latency than necessary to all communications
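For illustration, a minimal job script requesting scatter placement might look like the following. This is a sketch assuming a Grid Engine-style scheduler (suggested by the -l resource syntax above); the parallel environment name ib, the core count and the program name are hypothetical placeholders, not taken from this document:

    #!/bin/bash
    # Run the job from the current working directory
    #$ -cwd
    # Hypothetical parallel environment and core count
    #$ -pe ib 32
    # Ignore network topology when placing processes
    #$ -l placement=scatter

    mpirun ./my_mpi_program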

The partially filled rack has a 2:1 blocking factor. This is similar to the full Clos topology above; however, only one uplink (rather than two) connects each shelf to the core M2-72 switches.

Jobs that will span more than one shelf are preferentially placed on the non-blocking InfiniBand island, while smaller jobs are directed towards the blocking island.

Lustre file system

A large amount of infrastructure is dedicated to the Lustre parallel filesystem, which is mounted on /nobackup. This is accessed over InfiniBand and is configured to deliver ~3.2GB/s from a 100TB filesystem. It is possible to tune the filesystem in a more aggressive (or more conservative) manner; however, this configuration achieves a sensible compromise between data integrity and performance.
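Although the filesystem-wide configuration is fixed, users can tune striping for their own files and directories with the standard Lustre lfs tool. A minimal sketch, assuming a reasonably recent Lustre client (the path and stripe values below are illustrative placeholders, not recommendations from this document):

    # Stripe new files created in this directory across 4 OSTs,
    # with a 1MB stripe size (illustrative values)
    lfs setstripe -c 4 -S 1m /nobackup/example_project

    # Inspect the stripe settings of an existing file
    lfs getstripe /nobackup/example_project/results.dat

Striping a large, parallel-accessed file across several object storage targets spreads its I/O load, while small files are usually best left on a single target.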