Cluster Distributions

The Rocks project began at the San Diego Supercomputer Center, with collaboration from the Millennium project at UC Berkeley, and now has a large group of maintainers involving labs, universities, and companies from around the world. The Rocks user registry currently shows 168 clusters registered with the Rocks project, ranging from small clusters in high schools to very large clusters at national labs. As with OSCAR, the strength of Rocks lies in its installation process and its SQL-based configuration database.
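To make the idea of a SQL-based configuration database a little more concrete, here is a minimal sketch in Python. It is not the Rocks schema or the Rocks tooling: the `nodes` table, its columns, and the helper names `make_cluster_db`, `render_hosts`, and `nodes_with_role` are all hypothetical, and it uses an in-memory SQLite database purely for illustration. The point is only that once node information lives in one database, configuration files can be generated from queries rather than edited by hand on every node.

```python
import sqlite3

# Hypothetical schema for illustration only; this is NOT the actual Rocks layout.
SCHEMA = """
CREATE TABLE nodes (
    name TEXT PRIMARY KEY,
    ip   TEXT NOT NULL,
    role TEXT NOT NULL
)
"""


def make_cluster_db(nodes):
    """Create an in-memory database and load (name, ip, role) tuples into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", nodes)
    conn.commit()
    return conn


def render_hosts(conn):
    """Generate hosts-file style lines from the database, ordered by node name."""
    rows = conn.execute("SELECT ip, name FROM nodes ORDER BY name")
    return [f"{ip}\t{name}" for ip, name in rows]


def nodes_with_role(conn, role):
    """Return the names of all nodes that have the given role."""
    rows = conn.execute(
        "SELECT name FROM nodes WHERE role = ? ORDER BY name", (role,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Made-up node names and addresses.
    db = make_cluster_db([
        ("frontend-0", "10.1.1.1", "frontend"),
        ("compute-0-0", "10.1.255.254", "compute"),
        ("compute-0-1", "10.1.255.253", "compute"),
    ])
    print("\n".join(render_hosts(db)))
    print(nodes_with_role(db, "compute"))
```

Adding a node in this toy model is a single `INSERT`, after which every generated file picks up the change the next time it is rendered.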

Rocks also provides a full suite of tools for running your cluster, including queuing and scheduling software, the usual MPI packages, tools for simplifying administration, etc. Rocks features the Ganglia cluster monitoring system, developed in many of the same labs that developed Rocks (although Ganglia is now available in other distributions as well). A screenshot of Ganglia is shown in Figure Two, showing memory and processor usage on cluster nodes. Ganglia is highly configurable, and supports collection of a number of different kinds of data (Ganglia will get more attention in a future column as well). The newest versions of Rocks include support for the latest 64-bit processors, including the Opteron and IA-64 chips.

Rocks and OSCAR have brought administration of clusters a quantum leap forward. Where administrators were previously left using a standard Linux distribution, then cobbling together a collection of home-grown and downloaded scripts to build a cluster, now they have full-featured software that comprehensively addresses the problem of running clusters. Perhaps even more importantly, OSCAR and Rocks have created support communities for cluster administrators. Like so many open source projects, there are mailing lists, FAQs, and web pages filled with information on running these tools. A number of commercial vendors have started distributing OSCAR, or Rocks, or both, and these vendors can provide professional support for all of your cluster software.

The Third Generation - Re-envisioning the OS

While most of the second generation solutions are still effective means of using and running even large clusters, they still fundamentally rely on an OS designed for a single system. A few groups of researchers have attempted to re-envision the basic functions of an operating system to tailor it specifically to a cluster environment. One of the most significant innovations in the cluster OS space is the bproc distributed process space, as described in detail in another column. The bproc concept grew out of the original Beowulf project at NASA's Goddard Space Flight Center. Scyld Computing (a subsidiary of Penguin) sells and supports a version of the bproc system. Scyld also provides the beoboot system for booting and maintaining clients with a minimal OS, plus associated resource management and administration tools. If you purchase the Scyld version, you also receive extensive documentation and professional support, as well as a few more GUI tools for management.

Figure Two: Ganglia monitoring system, distributed with Rocks

Figure Three shows the Scyld setup and node status graphical tools. Both bproc distributions make administration of large clusters much simpler. The minimal OS running on the nodes and the capabilities of bproc remove most of the problems of maintaining user accounts, authentication, consistent versions of libraries across nodes, etc.

The simplicity of maintaining clusters with this dramatically different approach comes at a cost, however. Since bproc systems are so different and not yet extremely widespread, there is sometimes a lag before the latest versions of your favorite MPI implementation, scheduler, or commercial application become available. Since bproc changes some fundamental assumptions about the OS, there is frequently some porting to do, and it takes time for the busy engineers at Scyld to keep up. There is also not yet general agreement on where the responsibilities of, for instance, the OS kernel end and those of the message passing libraries begin, so there are occasional clashes. For example, the latest MPICH versions provide a daemon for high-speed process creation, but bproc itself provides this functionality. Differences like these will take time to iron out. However, if you do not need the absolute latest version of every tool, the Scyld methodology can let you run very large clusters with relatively little effort.

Another substantial re-envisioning of the cluster OS is the Open MOSIX approach. MOSIX provides a set of kernel patches which, like bproc, provide for automatic migration of processes from the head of your cluster to the computing nodes. MOSIX uses a set of algorithms to attempt to balance the load on all the nodes of the cluster, assigning and moving processes as it sees fit. MOSIX will migrate processes around your cluster to balance CPU, memory usage, or I/O usage. MOSIX has some of the same drawbacks as the bproc systems: it is different enough from the normal Linux model that certain packages, for instance resource managers and schedulers, won't work right away with MOSIX clusters, although MOSIX provides its own alternatives for these functions.
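The general idea of balancing load by migrating processes can be illustrated with a toy model. The sketch below is not the MOSIX algorithm, which runs inside the kernel and weighs CPU, memory, and I/O; it is a deliberately simple greedy scheme, and the function name `rebalance`, the `threshold` parameter, and the node names are all made up for illustration. It treats each node's load as a plain process count and moves one process at a time from the busiest node to the idlest.

```python
def rebalance(loads, threshold=1):
    """Greedy toy balancer (an illustration only, not what MOSIX does).

    loads:     dict mapping node name -> number of runnable processes
    threshold: largest acceptable gap between the busiest and idlest node
    Returns (final_loads, migrations), where migrations is a list of
    (source_node, destination_node) pairs, one per migrated process.
    """
    if threshold < 1:
        # With a gap of 1 a single process would bounce back and forth forever.
        raise ValueError("threshold must be at least 1")

    loads = dict(loads)  # work on a copy, leave the caller's dict untouched
    migrations = []
    while True:
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] - loads[idlest] <= threshold:
            break  # balanced closely enough
        # "Migrate" one process from the busiest node to the idlest one.
        loads[busiest] -= 1
        loads[idlest] += 1
        migrations.append((busiest, idlest))
    return loads, migrations


if __name__ == "__main__":
    # Six processes started on the head node, two idle compute nodes.
    final, moves = rebalance({"head": 6, "n1": 0, "n2": 0})
    print(final)
    print(moves)
```

Even in this simplified form, the sketch shows why an external scheduler can be surprised by such a system: processes end up on nodes the scheduler never assigned them to.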

Much more research work is being done in the cluster OS space, and new ideas, tools, and distributions spring up all the time. Next time, we'll hit a few more and get more in detail about installation procedures for some of these tools. Happy clustering.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Dan Stanzione is currently the Director of High Performance Computing for the Ira A. Fulton School of Engineering at Arizona State University. He previously held appointments in the Parallel Architecture Research Lab at Clemson University and at the National Science Foundation.