Turn your desktop computer into a high-performance cluster with PelicanHPC

Clutter to Cluster

Crunch big numbers with your very own high-performance computing cluster.

If your users are clamoring for the power of a data center but your penurious employer tells you to make do with the hardware you already own, don't give up hope. With some some time, a little effort, and a few open source tools, you can transform your mild-mannered desktop systems into a number-crunching super computer. For the impatient, the PelicanHPC Live CD will cobble off-the-shelf hardware into a high-performance cluster in no time.

The PelicanHPC project is the natural evolution of ParallelKnoppix, which was a remastered Knoppix with packages for clustering. Michael Creel developed PelicanHPC for his own research work. Creel was interested in learning about clustering, and because adding packages was so easy, he added PVM, cluster tools like ganglia monitor, applications like GROMACS, and so forth. He also included some simple examples of parallel computing in Fortran, C, Python, and Octave to provide some basic working examples for beginners.

However, the process of maintaining the distribution was pretty time consuming, especially when it came to updating packages such as X and KDE. That's when Creel discovered Debian Live, spent time wrapping his head around the live-helper package, and created a more systematic way to make a Live distro for clustering. So in essence, PelicanHPC is a single script that fetches required packages off a Debian repository, adds some configuration scripts and example software, and outputs a bootable ISO.

Boot PelicanHPC

Later in the article, I'll use the script to create a custom version. For now, I'll use the stock PelicanHPC release (v1.8) from the website [1] to put those multiple cores to work. Both 32-bit and 64-bit versions are available, so grab the one that matches your hardware.

The developer claims that with PelicanHPC you can get a cluster up and running in five minutes. However, this is a complete exaggeration – you can do it in under three.

First, make sure you get all the ingredients right: You need a computer to act as a front-end node, and others that'll act as slave computing nodes. The front-end and the slave nodes connect via the network, so they need to be part of a local LAN. Although you can connect them via wireless, depending on the amount of data being exchanged, you could run into network bottlenecks. Also, make sure the router between the front end and the slaves isn't running a DHCP server because the front end doles out IP addresses to the slaves.

Although you don't really need a monitor or keyboard or mouse on the slave nodes, you need these on the front end. If you have a dual core with enough memory, it wouldn't be a bad idea to run the front end on a virtual machine and the slave on physical machines. Primarily, PelicanHPC runs on memory, so make sure you have plenty. If you're doing serious work on the cluster, you can make it save your work on the hard disk, in which case, make sure you have a hard disk attached. In fact, to test PelicanHPC, you can run it completely on virtual hardware with virtual network connections, provided you have the juice on the physical host to power so much virtual hardware.

With the hardware in place, pop in the Live CD in the front-end node and let it boot. If you want to choose a custom language or turn off ACPI or tweak some other boot parameters, you can explore the boot options from the F1 key; press Enter to boot with the default options.

During bootup, PelicanHPC prompts you thrice. First it wants you to select a permanent storage device that'll house the /home directory. The default option ram1 stores the data on the physical RAM. If you want something more permanent, you just need to enter the device, such as hda1 or sda5. The device can be a hard disk partition or a USB disk – just make sure it's formatted as ext2 or ext3. If you replace the default option ram1 with a device, PelicanHPC will create a user directory at the root of that device.

Next, PelicanHPC asks whether it should copy all the configuration scripts and the examples to the home directory on the specified device. If this is the first time you are running PelicanHPC, you'll want to choose Yes. If you've selected a permanent storage location, such as a partition of the disk, on subsequent boots, you should choose No here. Of course if you are running PelicanHPC from RAM, you'll always have to choose Yes.

Finally, you're prompted to change the default password. This password will be for the user user on the front-end nodes, as well as on the slave nodes. PelicanHPC is designed for a single user, and the password is in cleartext.

When it has this info, PelicanHPC will boot the front-end node and drop off into the Xfce desktop environment.

Set Up the Cluster

Now that the front-end node is up and running, it's time to set it up for clustering. PelicanHPC has a set of scripts for this purpose. Either call the scripts manually or use the master pelican_setup script, which calls all the other scripts that start the various servers and connects with the slave nodes.

To start setting up the cluster, open a terminal window and type:

sh pelican_hpc

If you have multiple network interfaces on the machine, you'll be asked to select the one that is connected to the cluster. Next, you're prompted to allow the scripts to start the DHCP server, followed by confirmation to start the services that'll allow the slave nodes to join the cluster. At first, the constant confirmations seem irritating, but they are necessary to prevent you from throwing the network into a tizzy with conflicting DHCP services or from accidentally interrupting on-going computations.

Once it has your permission to start the cluster, the script asks you turn on the slave nodes.

Slave nodes are booted over the network, so make sure the Network boot option is prioritized over other forms of booting in the BIOS. When it sees the front-end node, the slave displays the PelicanHPC splash screen and lets you enter any boot parameters (language, etc.), just as it did on the front-end node earlier.

Instead of booting into Xfce, when it's done booting, the slave node displays a notice that it's part of a cluster and shouldn't be turned off (Figure 1). Of course, if your slave nodes don't have a monitor, just make sure the boot parameters in the BIOS are in the correct order and turn it on.

When the node is up and running, head back to the front end and press the No button, which rescans the cluster and updates the number of connected nodes (Figure 2). When the number of connected nodes matches the number of slaves you turned on, press Yes. PelicanHPC displays a confirmation message and points you to the script that's used to reconfigure the cluster when you decide to add or remove a node (Figure 3).

To resize the cluster, run the following script:

sh pelican_restarthpc

That's it. Your cluster is up and running, waiting for your instructions.

Crunchy Bar

The developer, Creel, is a professor of economics at the Autonomous University of Barcelona in Catalonia, Spain. He works in econometrics, which involves a lot of number crunching. Therefore, you'll find some text and example GNU Octave code related to Creel's research and teaching. If you're interested in econometrics, the econometrics.pdf file under the /home/user/Econometrics directory is a good starting point. Also check out the ParallelEconometrics.pdf file under /home/user/Econometrics/ParallelEconometrics. This presentation is a nice introduction to parallel computing and econometrics.

For the uninitiated, GNU Octave [2] is "a high-level computational language for numerical computations." It is the free software alternative to the proprietary MATLAB program, both of which are used for hardcore arithmetic.

Some sample code is in the /home/user/Econometrics/Examples/ directory for performing tests such as kernel density [3] and maximum likelihood estimations, as well as for running the Monte Carlo simulations of how a new econometric estimator performs.

Related content

The server space is changing, bringing new opportunities for the agile admin. If you plan on taking up residence in the new data center, you'd better get your toolkit ready. This month we explore tools and techniques for data center and server room environments.