ROGER Technical Summary

System Overview

ROGER is a Dell cluster with 108 Intel Xeon E5-2660 v3 processors and 13.3 TB of system memory available for computation. Each Xeon E5 v3 processor is capable of performing up to 416 Gflops, yielding a peak performance of 44.9 Tflops. In addition to the CPUs, there are 12 Nvidia Tesla K40M graphics units available for tasks that require GPUs. Each GPU is capable of performing at up to 1.68 Tflops, bringing the total cluster-wide performance to ~65 Tflops (theoretical). The cluster is connected by a high-speed network with 40Gb/s switches in the core and 10Gb/s uplinks to each node.
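These peak figures follow directly from the core counts and clock rate. A minimal sanity-check sketch in Python (the 16 double-precision flops per cycle per core is our assumption for Haswell-era AVX2 FMA, not a figure stated here):

    # Back-of-the-envelope check of ROGER's theoretical peak performance.
    # Assumes 16 double-precision flops/cycle/core (Haswell AVX2 FMA).
    CORES_PER_CPU = 10
    CLOCK_GHZ = 2.6
    FLOPS_PER_CYCLE = 16  # 2 FMA units x 4 doubles x 2 ops per FMA (assumption)

    gflops_per_cpu = CORES_PER_CPU * CLOCK_GHZ * FLOPS_PER_CYCLE  # 416 Gflops
    cpu_tflops = 108 * gflops_per_cpu / 1000                      # ~44.9 Tflops
    gpu_tflops = 12 * 1.68                                        # ~20.2 Tflops

    print(f"per-CPU peak:  {gflops_per_cpu:.0f} Gflops")
    print(f"CPU total:     {cpu_tflops:.1f} Tflops")
    print(f"cluster total: {cpu_tflops + gpu_tflops:.1f} Tflops")  # ~65 Tflops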

Compute Nodes

Batch Compute Nodes

The compute nodes are Dell PowerEdge R730 servers with two Intel Xeon E5-2660 v3 chips. Each chip has 10 cores, each running at 2.6GHz, and a 25MB cache. Each server has 256GB of physical RAM and 500GB of local storage for swap and scratch space. There are 24 batch compute nodes. Their node names are in the format cg-cmpXX (where XX is the node number).

GPU Nodes

The GPU compute nodes are identical to the traditional compute nodes with the addition of an Nvidia Tesla K40M graphics processing unit. There are 12 GPU compute nodes. Their node names are in the format cg-gpuXX (where XX is the node number).

High Memory Nodes

The High Memory compute nodes are identical to the traditional compute nodes, except for being equipped with 800GB of local storage using SSDs. They are called "high memory" because before the upgrade of April 2016, they were the only nodes with 256GB of RAM. There are 16 high memory nodes. These nodes were designed to run Hadoop workloads, and can sustain an 11TB Hadoop filesystem with their SSDs. However, the production Hadoop system currently uses the GPFS shared filesystem for storage, which still has excellent performance and allows much greater capacity: currently 175 TB is allocated. Their node names are in the format cg-hmXX (where XX is the node number).
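The node-name formats above (cg-cmpXX, cg-gpuXX, cg-hmXX) follow a single scheme. A small illustration (the two-digit zero padding, e.g. cg-cmp01, is our assumption and may differ on the actual system):

    # Illustrative sketch: enumerate node names from the cg-<class>XX scheme.
    # Two-digit zero padding is an assumption, not confirmed by this document.
    node_classes = {"cmp": 24, "gpu": 12, "hm": 16}

    for prefix, count in node_classes.items():
        names = [f"cg-{prefix}{n:02d}" for n in range(1, count + 1)]
        print(f"{prefix}: {names[0]} ... {names[-1]}  ({count} nodes)")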

Compute Hardware

All four node types use the same CPU configuration: two Intel Xeon E5-2660 v3 processors per node (10 cores per processor at 2.6GHz with a 25MB cache, for 20 cores total) and 256 GB of RAM.

Node Type           Qty  Connectivity  Storage                            GPU
Compute[1]          24   (1x) 10Gb/s   500GB 7.2K rpm Nearline SAS 6Gb/s  n/a
GPU                 12   (1x) 10Gb/s   500GB 7.2K rpm Nearline SAS 6Gb/s  Nvidia Tesla K40M
Hadoop/High Mem[2]  16   (1x) 10Gb/s   800GB SSD, SAS 6Gb/s               n/a
GridFTP              2   (1x) 40Gb/s   800GB SSD, SAS 6Gb/s               n/a

[1] Seven of these 24 nodes are currently assigned to OpenStack.
[2] These were originally the only nodes with 256GB of RAM. They are currently allocated to Hadoop (11/16) and OpenStack (5/16).

The installation of seven more nodes is pending as of May 2016.
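As a cross-check against the System Overview: the 108-processor count includes the two GridFTP service nodes, while the 13.3 TB of memory available for computation covers only the batch, GPU, and high memory nodes. A quick sketch of the arithmetic:

    # Cross-check the overview totals against the hardware table.
    # Each entry: (node quantity, CPUs per node, RAM per node in GB).
    rows = {
        "compute":  (24, 2, 256),
        "gpu":      (12, 2, 256),
        "high_mem": (16, 2, 256),
        "gridftp":  ( 2, 2, 256),
    }

    total_cpus = sum(qty * cpus for qty, cpus, _ in rows.values())  # 108
    compute_only = ("compute", "gpu", "high_mem")
    ram_tb = sum(rows[n][0] * rows[n][2] for n in compute_only) / 1000

    print(total_cpus, f"{ram_tb:.1f} TB")  # 108 CPUs, 13.3 TB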

Service Nodes

In addition to the computing resources, the cluster also has 10 GPFS filesystem servers, 2 high memory service nodes, and an administration node. Of the GPFS server nodes, 8 are identical to the high memory nodes except for local storage, where the single SSD is replaced by three 600GB 15K rpm SAS drives for OS and filesystem use. The 2 remaining GPFS servers have an additional 6 SSD drives connected via an internal 12Gb/s SAS link, allowing extremely fast filesystem metadata access.

The additional 2 high memory service nodes have a 40Gb/s network connection and provide GridFTP services for data transfer into and out of the cluster. These nodes will also run various VMs, such as the user login node(s) and other cluster services.

Filesystem

All cluster nodes have access to the cluster-wide filesystems. The filesystems are built using the General Parallel File System (GPFS) software from IBM and backed by NetApp E2700 storage units. Each E2700 has 180 SATA drives. Total usable disk space is 4.5PB.
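Users can confirm the capacity visible from any node with a standard filesystem query; a minimal sketch (the mount point /gpfs is a placeholder, as the actual GPFS mount names are not given here):

    # Report the capacity of a GPFS mount as seen from a cluster node.
    # "/gpfs" is a hypothetical mount point; substitute the real path.
    import shutil

    total, used, free = shutil.disk_usage("/gpfs")
    print(f"total: {total / 1e15:.2f} PB  used: {used / 1e15:.2f} PB  "
          f"free: {free / 1e15:.2f} PB")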