Kris Boulez wrote:
>What resources (books, websites, tools) do people find good for learning
>more about administering a compute cluster? I'm not looking for general
>unix sysadmin material (been doing this for 10 years), but stuff that
>comes into play when administering large numbers of machines.
>I looked at the biocluster install diary Chris posted a few days ago,
>but was wondering if people know of other resources.
>Kris,
Hey Kris-
Most of the available printed or online clustering resources are either
totally out of date or, more often, written from the perspective of
people who:
o want to build tightly coupled, supercomputer-like systems on the cheap
that will only really run parallel apps ('beowulf')
o are willing to do silly and complicated things to squeeze out the
fastest possible raw performance at the expense of literally everything
else, including reliability and ease of management
Both of these approaches are generally a poor fit for life science
clusters, which typically are not "beowulf-style" systems anyway.
With some exceptions, biologists don't build clusters designed to run a
single instance of some massively parallel application at supercomputer
speeds. Biologists tend to use clusters as a way of distributing huge
numbers of independent ("embarrassingly parallel") jobs across many
inexpensive, loosely coupled systems. The software layer that handles
job scheduling, remote execution and dispatch is typically something
like PBS, GridEngine or Platform's LSF suite.
This is why I tend to use the term "compute farm" rather than "cluster"
for most of the stuff I build.
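To make that model concrete, here is a rough sketch of what the farming
pattern looks like in practice: split the input up front, then submit one
independent job per chunk and let the scheduler worry about where each
one runs. The paths, the job script name and the use of a PBS-style
'qsub' are assumptions made up for illustration, not anything
site-specific.

# Sketch: farm an "embarrassingly parallel" workload out as one
# independent scheduler job per input chunk.
import glob
import subprocess

for i, chunk in enumerate(sorted(glob.glob("/data/queries/chunk_*.fasta"))):
    # Each chunk becomes its own job; the scheduler decides which node
    # runs it, so the user never needs to know or care where it lands.
    subprocess.run(
        ["qsub", "-N", "search%04d" % i,
         "-v", "QUERY=" + chunk,
         "/data/scripts/run_search.sh"],
        check=True,
    )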
When it comes to administering large, loosely coupled systems used for
life science research I have not found any good comprehensive books or
online references. I do know that people are working on such things for
O'Reilly and other publishers though...
You may want to check whether there is anything useful up at the
SourceForge Clustering foundry: http://foundries.sourceforge.net/clusters/
Anyone else have links?
From my experience, here are the 2 biggest pain points I have found from
a cluster admin perspective. If you can solve these to your (and your
manager's) satisfaction then you are in a very good position!
Knowing how to tackle these 2 things before you purchase your cluster is
even better. heh.
(**1**) Reducing administrative burden as much as possible
This is your #1 concern as a cluster administrator. The goal is to do
everything possible to avoid having to treat and manage your cluster as
dozens or hundreds of individual machines. When I was at Blackstone one
of my internal research interests was figuring out how to make a 1,000
node cluster require only one half-time administrator to operate.
It boils down to ruthlessly automating and scripting everything that is
humanly possible (there's a rough sketch of what I mean right after the
list below). In an ideal world your cluster compute elements will then
become:
o anonymous (users should never care where their job actually runs)
o interchangeable (if a node dies the workload is migrated and a new
server is brought online)
o disposable (if a node breaks send it back to the vendor and pop in a
cold spare *whenever convenient*)
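Here's the automation sketch I promised above: sweep every node with the
same cheap health probe over ssh and only report the stragglers, so you
never end up logging into boxes one at a time. The node names, the
passwordless-ssh setup and the probe command are all assumptions for the
example, not anything from a real site.

import subprocess

NODES = ["node%03d" % i for i in range(1, 101)]

def node_is_healthy(node):
    # A cheap liveness probe: ask the node for its load average over ssh,
    # with a short connect timeout so dead boxes don't stall the sweep.
    result = subprocess.run(
        ["ssh", "-o", "ConnectTimeout=5", node, "cat /proc/loadavg"],
        capture_output=True, text=True,
    )
    return result.returncode == 0

bad_nodes = [n for n in NODES if not node_is_healthy(n)]
print("nodes needing attention:", ", ".join(bad_nodes) or "none")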
There are lots of methods for easing cluster administration. Some are
commercial and some are free. I saw a company at the O'Reilly
Bioinformatics Conference called LinuxNetworx
(http://www.linuxnetworx.com/) that had these amazing "ICE boxes" in
their rack that combined serial console, remote power control and
temperature monitoring into one small package. Very cool - wish I could
buy those as a standalone product.
My biggest tools in this area are (a) SystemImager and (b) remote power
control
SystemImager (www.systemimager.org) kicks all kinds of ass. Using it I
can completely install a cluster node from scratch without having to
attach a keyboard or anything else. Just boot off an autoinstall CDROM
or floppy, or in some cases a network-based PXE boot will do the trick.
Besides automating the process of partitioning disks and installing the
operating system and layered software, SystemImager also allows you to
incrementally push out changes, which makes installing or upgrading
software or libraries pretty trivial.
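As a rough illustration of the incremental-push part, a loop like this is
all it takes to tell every node to resync itself against the image
server. The image name, server name and node list are placeholders, and
the exact client command and flags (shown here as 'updateclient') vary
between SystemImager versions, so check its docs rather than trusting my
memory.

import subprocess

IMAGE_SERVER = "imageserver"
IMAGE_NAME = "compute-node-v2"
NODES = ["node%03d" % i for i in range(1, 101)]

for node in NODES:
    # Tell each node to resync itself against the golden image; keep
    # going even if a node happens to be down at the moment.
    subprocess.run(
        ["ssh", node,
         "updateclient -server %s -image %s" % (IMAGE_SERVER, IMAGE_NAME)],
        check=False,
    )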
Remote power control is nice because I can remotely kill or reboot nodes
that are misbehaving, and I can also turn the entire cluster on and off
in a staged manner (so you don't blow your power circuits!).
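The staged power-up is easy to script too. In this sketch 'powerctl' is
just a stand-in for whatever your remote power controller actually
provides (ICE box, serial PDU, whatever), and the batch size and delay
are numbers you'd tune to your own circuits.

import subprocess
import time

NODES = ["node%03d" % i for i in range(1, 101)]
BATCH_SIZE = 10      # nodes per power-on wave
DELAY_SECONDS = 30   # let the inrush settle before the next wave

for i in range(0, len(NODES), BATCH_SIZE):
    for node in NODES[i:i + BATCH_SIZE]:
        subprocess.run(["powerctl", "on", node], check=False)
    time.sleep(DELAY_SECONDS)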
With these 2 tools in hand, this is what my admin philosophy becomes:
(1) If a node is behaving, don't touch it
(2) If a node acts strangely, use SystemImager to automatically wipe the
disk and reinstall the OS from scratch (remotely)
(3) If a node acts strangely after it has been freshly imaged then
remotely kill the power and leave it dead.
(4) Whenever it is _convenient_ for me as an administrator, take the dead
node out and pop in a spare. Thanks to SystemImager, in about 6 minutes
I'll have a fully operational cluster node that is again performing
useful work. The dead node can either be diagnosed onsite (if you feel
like it) or sent back to the vendor for replacement.
No muss, no fuss. The key is to never waste time dealing with any
individual machine.
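If you want it in code form, the whole philosophy collapses to one small
routine. The three helpers here (reimage, health probe, power-off) are
stand-ins for your SystemImager trigger, your own health check and your
power controller; nothing below is a real command as written.

def handle_misbehaving_node(node, reimage_node, node_is_healthy, power_off):
    # Step (2): wipe and reinstall the OS remotely.
    reimage_node(node)
    if node_is_healthy(node):
        # Node came back clean -- leave it alone (step 1).
        return "back in service"
    # Step (3): still flaky after a fresh image -- kill the power.
    power_off(node)
    # Step (4): swap in a cold spare whenever it's convenient.
    return "dead; swap in a cold spare when convenient"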
(**2**) Research and install your load management software carefully
It makes me sad to see people go out and spend tens of thousands of
dollars (or even more) on cluster hardware only to turn around and
neglect the software side of things by throwing on a half-assed default
PBS RPM install and walking away.
PBS may be free but it requires care and attention to get it configured
and to keep it online. Many people who don't do their due diligence end
up screwing themselves because they find that they need someone almost
full-time just to keep the darn load management layer running. This is
especially true for PBS, where people are constantly finding themselves
patching and recompiling the code from source.
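To give a flavor of the babysitting I mean, even a dumb watchdog like
this (poll the server with a PBS-style 'qstat -B' and complain when it
stops answering) is something you end up writing yourself. The
five-minute interval and the print-instead-of-paging are obviously
placeholders.

import subprocess
import time

while True:
    # `qstat -B` asks the PBS server for its status; a non-zero exit
    # usually means the server is down or unreachable.
    result = subprocess.run(["qstat", "-B"], capture_output=True, text=True)
    if result.returncode != 0:
        print("PBS server is not answering -- go check pbs_server!")
    time.sleep(300)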
This is why I recommend LSF software from Platform. It may be expensive
(really expensive...) but it installs in minutes, is easily configured
and is way more stable than any of the competition (PBS, PBSPro,
GridEngine, etc.). In the long run the reduced administrative burden and
serious fault tolerance that LSF provides can make the cost of the
commercial license very reasonable.
Another alternative that is cheaper than LSF is to build the cluster
yourself but hire professional consultants to come in and handle the
tricky part of getting the load management system configured and
tweaked. The good people at Veridian Systems sell a commercial version
of PBS called "PBSPro" that is reasonably priced. They'll even give you
the source code if you need it. Paying Veridian for a few days of
consulting time may be worth it if they leave you with a fully
configured system that does not require lots of ongoing care and feeding.
Damn, I'm long-winded today.
-Chris
--
Chris Dagdigian, <dag@sonsorol.org>
Life Science IT & Research Computing Geek; http://BioTeam.net
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi