The Orion desktop cluster, low-power processors, and distros
for newbies.

The Beowulf mailing list provides detailed discussions about
issues concerning Linux HPC clusters.
In this article I review some postings to the
Beowulf list about
the new (at the time) Orion desktop cluster, which bring up some
observations about low-power processors in clusters, as well as the
Linux distributions people recommend for getting started with clusters.

96 Processors Under Your Desktop

On August 30, 2004, the ever-present Eugen Leitl forwarded the Orion
Multisystems announcement to the Beowulf mailing list. If you recall,
Orion is (was) producing a desktop cluster and a desk-side cluster.
Russell Nordquist was the first to respond, with two main observations:
(1) the top brass at Orion came from Transmeta, and
(2) the cluster wasn't a shared-memory setup, which he thought would
increase latency. Russell also asked about the Efficeon CPU,
particularly compared to the Opteron. He noted that a 4-way Opteron
box was about the same price as the 12-CPU Orion box and was curious
how the performance compared.

In response to Russell's query, Jim Lux posted that he thought Orion's
market (offices) might have power consumption and noise issues with a
quad Opteron and wondered what the numbers were. Jim then postulated
that if a 4-way Opteron were equal to 12 Efficeons, then the 96-CPU
Orion box would be like 32 Opterons. He went on to say that if the
Opterons were 100W each, then 32 Opterons would draw 3200W, which was
much larger than the 1500W-1800W range claimed by Orion.
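Jim's back-of-the-envelope scaling is easy to reproduce. A minimal sketch, using only the CPU counts and the ~100W-per-Opteron figure from the thread:

```python
# Back-of-the-envelope scaling from the thread: if a 4-way Opteron box
# roughly matches the 12-CPU Orion desktop, how many Opterons would it
# take to match the 96-CPU Orion box, and what would they dissipate?

orion_desktop_cpus = 12   # Efficeons in the desktop box
orion_deskside_cpus = 96  # Efficeons in the larger box
opterons_per_desktop = 4  # Opterons estimated to match one desktop box

# 96 Efficeons is 8 desktop-equivalents, so 8 * 4 = 32 Opterons
equivalent_opterons = orion_deskside_cpus // orion_desktop_cpus * opterons_per_desktop
print(equivalent_opterons)  # 32

# At ~100 W per Opteron:
total_watts = equivalent_opterons * 100
print(total_watts)  # 3200 W, versus the 1500-1800 W Orion claimed
```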

Russell responded that he had found some data for a 4-way Opteron at
1.8 GHz. With some magic to scale the numbers to 2.4 GHz, he thought
the Linpack performance of a 4-way Opteron would be similar to the
12-CPU Orion desktop box. So Russell thought that Jim's comments were
well taken and that the Orion box wins on heat and noise.

Mark Hahn posted that he thought the estimate of Efficeon performance
relative to the Opterons was about right. He also mentioned that, by
the standards of typical HPC clusters, the Orion's memory capacity and
bandwidth were low. So he thought the Orion box might be good for
cache-friendly things like sequence-oriented bioinformatics codes or
Monte-Carlo work that uses small models. He went on to say that he
thought the main appeal of the Orion machines was the
tidiness/integration/support. For comparison, he noted you could put
together 18 Apple Xserves that would deliver about the same GFLOPS,
but dissipate 2-3 times as much power and take up about twice the
space. But he thought that
"chicks" would dig the stack of Xserves more (I didn't know chickens
were into clustering. Hmmm... I need to rewatch "Chicken Little".)

Glen Gardner posted that he had been touting the virtues of low-power
clusters for a while (along with many other people). He found them to
be very effective, and low power was the only way to fit his 14-node
cluster in his apartment. They cost him about $20 a month in
power/cooling, are on 24/7, and are in use much of the time. Glen
thought that the ability to have a good-performing cluster in your
cubicle (or apartment), with low power requirements and low noise,
was very attractive. He also thought the price/performance of the
desktop unit was very good.

Mark Hahn responded that the Orion would be good at certain tasks but
not at more traditional HPC applications. Mark also questioned the
necessity of having a cluster on your desk, mentioning that he could
do most things on his cluster remotely.

In response, Robert Brown said that he loved his home cluster and
felt it served many useful purposes. He said he did lots of
challenging things on his home cluster and even did production work
on it.

Michael Will also mentioned that AMD has two low-power Opteron
versions: an HE version specified at 55W, but at twice the cost of
standard Opterons, and an EE version specified at 30W (Note: these
are the Socket 940 CPUs. The newer Socket F CPUs may fall into other
power ranges). Michael also asked what the per-year cost range was
for powering and air conditioning a 1U dual Opteron.

A fairly long thesis about power consumption and cooling was provided
by Robert Brown. As with virtually all of Robert's postings (except
the ones disparaging Fortran), it is well worth reading. Robert made a
very good initial estimate of about $200 a year to power and cool
a dual Opteron machine ($1/Watt/year). He went on to estimate when a
more energy-efficient but slower CPU would be worth the monetary
savings in energy (power and cooling).
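Robert's $1/Watt/year rule of thumb is easy to sanity-check. A minimal sketch, assuming an illustrative electricity rate of $0.06/kWh and that cooling costs about as much as the power itself (these rates are my assumptions for the example, not Robert's exact figures):

```python
# Sanity-checking the ~$1/Watt/year rule of thumb for a machine
# running 24/7. The rate and cooling overhead are assumed values.

HOURS_PER_YEAR = 24 * 365   # 8760 hours
RATE = 0.06                 # $/kWh for electricity (assumed)
COOLING_FACTOR = 2.0        # assume cooling doubles the energy bill

def annual_cost(watts):
    """Yearly power + cooling cost in dollars for a constant load."""
    kwh = watts * HOURS_PER_YEAR / 1000.0
    return kwh * RATE * COOLING_FACTOR

print(round(annual_cost(1), 2))  # ~$1.05 per Watt per year
print(round(annual_cost(200)))   # ~$210/year for a ~200 W dual Opteron
```

Under these assumptions a ~200W dual Opteron lands right around Robert's $200/year estimate.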

Jim Lux had a companion posting to Robert's, answering Michael's
question. Jim made the very good point that computing costs based
purely on energy prices is not correct; cost is a nonlinear function
of load. For example, he said he could add a cluster dissipating
1500W to his office and the electricity would only cost about
$400/year. However, it would mean installing an additional air
conditioning unit that would cost as much as the computer. As Jim
put it,
"I doubt that anyone can cost justify using more lower powered
processors (i.e. fewer watts/GFLOPS) on a purely dollars/completed
machine operation basis, except in some fairly unusual environments
(i.e. a spacecraft, where every watt/joule is precious and expensive).
The real value (to my mind) is making a cluster a minimal-hassle
item, the same way a desktop PC is perceived today."

Jim went on to explain in more detail what he means by making a
cluster an appliance rather than something that is created when
one needs it.

At this point the discussion moved into one of COTS (Commercial
Off The Shelf) versus a more proprietary cluster. The discussion
was interesting and broke down into a discussion of what COTS really
means. Overall this thread was very interesting because Orion is (was)
trying something new and exciting and it appears that in general
the cluster community appreciates that.

A Good Linux Distribution for Clusters?

Discussing Linux distributions for clusters is always a fun time. There
are plenty of personal opinions, but good information always seems
to surface. On Sept. 8, 2004, Jeff Dragovich said that he had a small
10-node cluster for running a finite element program using MPI and
wondered what flavor of Linux would be best to use.

Tim Mattox was the first to reply and recommended choosing a Linux
distribution that is supported by the cluster management tool you
select. Tim recommended using
Warewulf,
which only requires an RPM-based
distribution. He mentioned that he preferred the new cAos
distribution; he uses cAos-1 and has found it to be very stable
and very easy to install and maintain. He also said that he was
anxiously awaiting cAos-2 (note: cAos-3 is under development and a
"beta" is available).

Robert Brown (rgb) posted some good points. He started by saying that
whatever distribution Jeff chooses, he should become adept at using
it. Rgb went on to say that he had some trouble with FC1, but FC2 was
working fine. He gave a quick summary of other distributions, including
Debian. He then put on his "rant" cap and talked about his problems with
the major cluster distributions. In his opinion, the best way to install
a cluster is from a repository via PXE or something like kickstart,
where the only things different between a head node and a
compute node are the packages chosen and some post-install scripts.

There were some further discussions about using Debian as a cluster
distribution. Not to belittle Debian, but indications are that it
hasn't been used in many clusters up to this point. However, that
could change given the wonderful plethora of desktop Debian
distributions available.

Erwan from Mandrakesoft posted to correct some comments from rgb. He
went on to discuss how
CLIC,
Mandrakesoft's GPL-ed cluster distribution
could help those new to clusters. From Erwan's description it sounds
like a very good cluster distribution.

This article was originally published in ClusterWorld Magazine. It has been
updated and formatted for the web. If you want to read more about HPC
clusters and Linux you may wish to visit
Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far
too much time reading mailing lists. He can be found hanging around the Monkey
Tree at ClusterMonkey.net (don't stick your arms through the bars though).