François Trahay


24-core Odroid-U2 cluster

This cluster is composed of 6 Odroid-U2 boards. Each node is equipped with:
- a quad-core ARM Cortex-A9 CPU (running at 1.7 GHz, with 1 MB of L2 cache)
- a Mali-400 GPU (which we do not use)
- a 10/100 Ethernet network card
- a 16 GB (class 10) micro SD card that hosts the system filesystem

Each node is powered by a 5 V, 2 A power adapter.
The nodes are connected to a Gigabit Ethernet switch.

Installing the system

Download and install the Ubuntu image from the HardKernel website.
The cluster is connected to a server that serves as a frontend. This server (an old Xeon desktop machine) hosts:
- an NFS server that exports /home and /opt (where compiled software is installed)
- an LDAP server
- a DNS+DHCP server

Once the server is configured, installing the nodes is straightforward, since most Debian packages are available (nfs-common, libpam-ldapd, etc.).
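For reference, the NFS client side of the node setup boils down to two fstab entries; a minimal sketch, assuming the frontend is reachable under the (hypothetical) hostname `frontend`:

```
# /etc/fstab on each Odroid node ('frontend' hostname is an assumption)
frontend:/home  /home  nfs  defaults  0  0
frontend:/opt   /opt   nfs  defaults  0  0
```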

Installing HPC software

Since we mainly work on MPI, we installed the latest versions of MPICH and Open-MPI from source without any problem. They are installed in /opt so that they are shared by all the nodes of the cluster.
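The build is the standard autotools procedure; a sketch for Open-MPI (the version number and install prefix are assumptions, adjust them to the release you download):

```shell
# Build Open-MPI from source and install it into the NFS-shared /opt
# (version 1.6.3 matches the one used below; the prefix is an assumption)
tar xjf openmpi-1.6.3.tar.bz2
cd openmpi-1.6.3
./configure --prefix=/opt/openmpi-1.6.3
make -j 4          # the Cortex-A9 has 4 cores
sudo make install
```

Since /opt is exported over NFS, building once on a single node makes the installation visible to the whole cluster.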

We also installed performance analysis tools such as EZTrace to analyze the behavior of applications (Pthreads, memory consumption, MPI messages, etc.) on the ARM processors.

Quick performance evaluation

Disclaimer: the network performance of the Stark cluster is really poor (we still need to investigate why). These figures are only meant to give a rough idea of the cluster's performance.

We compare the performance obtained on the Odroid cluster with that obtained on the Stark cluster.
The Stark cluster is composed of 4 nodes connected through a Gigabit Ethernet network. Each node is equipped with:
- a quad-core Intel Xeon E5-2603 (Sandy Bridge) CPU running at 1.80 GHz (10 MB L3 cache)
- 8 GB of RAM

NAS Parallel Benchmark

We ran the MPI version of the NAS Parallel Benchmark (version 3.3) using both MPICH (version 3.0.1) and Open-MPI (version 1.6.3). Here are the results we obtained with Open-MPI; the results with MPICH are similar.
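A typical run looks like the following; a sketch, assuming a hostfile listing the four nodes and the NPB binaries built in the usual `bin/` directory (both names are assumptions):

```shell
# Run the class A LU kernel on 16 processes spread over 4 nodes
# (hostfile name and binary path are assumptions)
mpirun --hostfile hosts -np 16 ./bin/lu.A.16
```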

Performance for Class=A, Nprocs=16. Only 4 nodes were used on each cluster.

Kernel | Execution time on Stark (s) | Execution time on Odroid (s)
-------+-----------------------------+-----------------------------
BT     |                      127.78 |                        99.19
CG     |                        4.76 |                         6.39
EP     |                        1.80 |                         4.73
FT     |                       24.48 |                        24.14
IS     |                       12.00 |                         8.09
LU     |                       25.67 |                        91.59
MG     |                        4.35 |                         4.75
SP     |                      203.66 |                       142.69
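A couple of ratios put the gap in perspective; a quick computation, with the execution times taken from the table above:

```shell
# Slowdown of the Odroid cluster relative to Stark for the two kernels
# where it loses the most (times taken from the table above)
awk 'BEGIN {
  printf "EP: %.2fx slower\n", 4.73 / 1.80
  printf "LU: %.2fx slower\n", 91.59 / 25.67
}'
```

The compute-bound kernels (EP, LU) suffer the most on the ARM cores, while the communication-heavy ones (IS, BT, SP) actually run faster on the Odroid cluster, which is consistent with the poor network performance of Stark.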

Performance for Class=A, Nprocs=4. Only 1 node was used on each cluster.

Energy consumption

Using a simple wattmeter, we measured the power consumption of the nodes:

1 node:
- when idle: 2 W
- when computing: approx. 7 W

4 nodes:
- when idle: 8 W
- when computing: approx. 24 W

Changing the CPU frequency

On the Odroid-U2 boards, it is possible to set the minimum and maximum CPU frequencies in order to control the power consumption.
The CPU frequency varies from 200 MHz (when idle) to 1.7 GHz (when computing). The frequency can be raised up to 2 GHz, but this generates too much heat for the board's passive cooling.
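On Linux this is done through the cpufreq sysfs interface; a sketch (the exact frequency values are an assumption for this kernel — check scaling_available_frequencies — and the commands must be run as root, once per core):

```shell
# Cap the frequency range of cpu0 (values are in kHz)
echo 1704000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo  200000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
# List the frequencies supported by this kernel
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
```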