Building a Linux-Based High-Performance Compute Cluster

The Rocks clustering package from the University of California, San Diego, makes it easy to build and maintain a high-performance compute cluster with off-the-shelf hardware.

You have an application running on a relatively new dual-core
workstation. Unfortunately, management wants it either to complete
faster or to handle a larger dataset in the same amount of time. A
bit of investigating reveals that both SMP and cluster versions of
the application are available; you are currently using the SMP
version on the workstation. You could speed things up by moving to a
quad-core (or larger) workstation, but the boss is not receptive to
that expenditure in the current economic climate.
But wait, you do have a pile of 32 single-socket servers that were
replaced earlier in the year. They're only single core, but 32 of
them should have more capacity than the dual-core workstation, if you
can just find a way to get them all to play together—that would be a
cluster.

So, what is a cluster? Here's one accepted definition: a cluster is a group
of computers all working together on the same problem. To accomplish
this, the machines in the cluster must be appropriately
interconnected (a network) and trust each other.
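
In practice, that trust usually means each machine can reach every other machine over the network without being challenged for a password. As a rough sketch of what a healthy cluster looks like (the compute-0-0 style names are placeholders, borrowed from the naming convention Rocks uses by default):

    # From the head node, reach a compute node with no password prompt
    # (node names here are examples only):
    ssh compute-0-0 hostname

    # The same trust should hold from one compute node to another:
    ssh compute-0-0 "ssh compute-0-1 hostname"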

It is possible to configure the networking and security manually, but
it is easier to use one of a number of cluster provisioning and
management systems. At the moment, one of the more popular packages
is Rocks, maintained by a team at the University of California, San
Diego, under a grant from the National Science Foundation.

Rocks is termed a cluster provisioning, management and maintenance
package. It helps you set up the cluster in the first place (from
bare metal), provides the tools to run parallel programs, and
provides the tools to maintain and extend the cluster after it is
created.
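
Much of that day-to-day management happens through a single rocks command on the head node. The calls below are a brief sketch based on the Rocks 5.x command line; exact subcommands and output vary between releases:

    # Show every host the cluster database knows about:
    rocks list host

    # Run the same command on every node at once (a quick health check;
    # some releases want the command passed as command="..."):
    rocks run host "uptime"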

The package is delivered as a set of .iso images that you burn
onto CDs or DVDs. You then boot the machine that will
become the head node from the appropriate DVD or CD, and the
installation routine guides you from there. After asking a
minimum number of questions in an interactive phase, the installation
program builds the head node. Upon reboot, you invoke a single
routine (insert-ethers) to add the rest of the machines as compute
nodes. To add a compute node, you simply network boot it, and it will
be added to the cluster, loaded and configured automatically. After
the last node is complete, you have a functional cluster, ready to
execute parallel applications.
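
As a rough sketch of that workflow (insert-ethers is interactive, and the compute-0-N names follow the default Rocks convention; details vary by release):

    # On the newly built head node, as root:
    insert-ethers
    # Choose "Compute" as the appliance type, then power on each server
    # with network (PXE) boot as its first boot device.  As each node's
    # DHCP request arrives, insert-ethers assigns it a name such as
    # compute-0-0 and starts the automated install.

    # Once the last node finishes, a quick sanity check from the head node:
    rocks run host "uname -r"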

So, with all of this in mind, let's build a cluster with those
otherwise unloved machines.

Step 1. Hardware Setup

The first item on the agenda is setting up the hardware. The overall
idea is to have a set of connected computers. Ideally, the machines in
the cluster should be as identical as possible, so no single machine
or group of machines will be the weak link in any parallel
computation. The same homogeneity should apply to the network, because
most parallel computation relies on continuous communication between
all of the nodes within the cluster.

Find a spot to set up your 32 servers, and make sure you have enough
power and cooling to support them. As you connect all of the servers
to power, label both ends of each power cord so you can keep track of
what is connected to each power strip in the rack.

Because you are starting with a clean sheet, now is a good time to
update and configure the BIOS on each system. Set the BIOS clock to
the current time as closely as practical (plus or minus five minutes is
a good goal). Most clustering packages keep the BIOS clocks
synchronized during operation, but only if the clock is reasonably
close to the correct time at the beginning.
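
Once the cluster is up, it is worth confirming that the clocks really did converge. A quick check, again assuming the Rocks command line and a standard ntp installation:

    # Ask every node for its idea of the time (epoch seconds) and
    # compare the values by eye:
    rocks run host "date +%s"

    # Or check NTP synchronization on a single node:
    ssh compute-0-0 ntpq -p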

Because the machines are used, it's prudent to wipe all the
disks before loading the cluster software. There are many ways to
accomplish this. One fairly thorough method is to use DBAN (Darik's
Boot and Nuke). This self-contained application can perform several
disk wipe techniques, including two that have some level of Department
of Defense approval.
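
If booting DBAN on 32 machines is impractical, a live Linux image and coreutils can do a comparable (if less formally blessed) job. The sketch below assumes the target disk is /dev/sda; confirm the device name first, because the command is destructive:

    # Destructive!  Verify the device with lsblk or fdisk -l first.
    # Three passes of random data followed by a final pass of zeros:
    shred -v -n 3 -z /dev/sda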

Remember, the goal here is to make all the machines in the cluster
as identical as possible. But this is a goal, not a hard-and-fast
requirement. Heterogeneous clusters will work, but you may need to be
careful about how you deploy workloads across the machines to get the
best performance.

Step 2. The Network

Now that you have all the compute nodes configured and in the rack,
it's time to set up the communications network. Figure 1 shows a
typical networking setup for a simple compute cluster. In this
configuration, the Ethernet fabric most likely would be used for
administrative purposes, while the InfiniBand fabric would carry the
compute traffic. If you don't have InfiniBand hardware
available, you can just ignore the bottom section of the diagram. The
Ethernet fabric can carry both the administrative and compute
traffic.

Figure 1. Network Setup for a Compute Cluster
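
Later, once the nodes have an operating system on them, you can confirm that both fabrics came up as expected. The commands below are a sketch; the interface name eth0 and the infiniband-diags package (which provides ibstat) are assumptions about your particular install:

    # Ethernet link state and negotiated speed on a compute node:
    ethtool eth0 | grep -E "Speed|Link detected"

    # InfiniBand port state, if an HCA is installed:
    ibstat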

The best Ethernet network configuration for your cluster would be a
single 48-port switch. If a switch like that is not available, you
can always fall back on a set of smaller federated switches forming a
full fat-tree network for the cluster. Like the compute nodes themselves,
the network should be as uniform as possible.

Plan all the cable runs, remembering that Ethernet cables have a
nonzero cross section. Before you install them, test each cable.
There is nothing as aggravating as finding that a cable is bad
after it has been tied into the rack in a dozen places. Once again,
label both ends of each cable to make any later troubleshooting
simpler.