A summary of past postings from the Beowulf mailing list up to October 10, 2003

In the fall of 2003, a general theme emerged on the Beowulf mailing list: the environment in which our clusters live, that is, the machine room. The discussions covered the design of machine rooms and how to save our dear clusters from imminent disaster when the cooling fails. Join us as we take a look at killing power (quickly), building machine rooms, and environment monitoring.

Beowulf: Kill the Power Faster than poweroff?

On the 11th of September 2003, David Mathog asked about ways to shut down
a system and kill the power faster than the poweroff command allows.
He was interested in ways to shut down systems in emergency overheating
conditions: he had some Athlon systems that he wanted to shut down in
the event of a cooling fan failure. The ensuing discussion was very
interesting because it covered not only fast system shutdowns but also
some old Unix habits.

Initially, David mentioned he wanted something like running the
sync command and then powering off the system. The sync command
flushes the file system buffers to get a consistent file system
state, hopefully completely flushing the journal on a journaled
file system. The first suggestion, from Ariel Sabiguero, was to
use either the halt -p -f command or poweroff -f. He said that
in his tests this took only 3 seconds to shut down his system instead
of 20 seconds. David responded that this approach did indeed work
quickly, but was not a clean shutdown, forcing the file system to
be repaired via fsck upon reboot, including fixing inodes. He
didn't necessarily mind this since, in his opinion, an fsck is better
than fried hardware. Bernd Schubert added that on a 2.4.21 or
later kernel, a series of writes to /proc/sysrq-trigger would
force a shutdown in less than a second on his machine.
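Bernd's approach can be sketched as a short script. The thread did not spell out the exact key sequence, so the one below (emergency sync, remount read-only, power off via the magic SysRq interface) is an illustrative assumption, though it is the common pattern; requires root and a kernel built with magic SysRq support. It defaults to a dry run so it can be read and tested safely:

```shell
#!/bin/sh
# Sketch of a sub-second shutdown via /proc/sysrq-trigger.
# Defaults to a dry run; set DRY_RUN=0 and run as root to really do it.

DRY_RUN=${DRY_RUN:-1}

sysrq() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would write '$1' to /proc/sysrq-trigger"
    else
        echo "$1" > /proc/sysrq-trigger
    fi
}

sysrq s   # emergency sync: flush dirty buffers to disk
sysrq u   # remount all file systems read-only
sysrq o   # power the machine off immediately
```

Note that this skips the normal shutdown scripts entirely, which is exactly why it is fast, and why, as David found with halt -f, you may still meet fsck on the next boot.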

At this point the discussion turned to the question of how to sync the
file system prior to shutdown. Alan Grossfield mentioned the ever
popular system administrator approach of running the sync command
three times before shutting down. Donald Becker and others said that this
sysadmin habit predates the advent of good journaling file systems.
Now, just one sync should be sufficient to ensure a consistent
file system before shutting down. Who says you can't teach an old
sysadmin new tricks?


The final piece of the discussion was how Linux unmounted file systems
during the shutdown. The esteemed Robert Brown (Bob or rgb to people
on the list) started off the discussion by mentioning that applications
with open files would have to be killed quickly to avoid a race and to
satisfy David's initial request for a very fast shutdown.
Greg Lindahl provided some great insight into how Linux shuts down. He
pointed out that Linux nicely kills the processes during shutdown. He
also mentioned that if you want to do it faster, using the kill -9
command will greatly speed things along. Robert Brown also added that
during a fast shut down you might get some of the
infamous .nfs20800200 leftover files if the system had an active
nfs mount.

The moral of these discussions is that if you have to do a very fast
shutdown, you should first make sure you are using a journaling
file system on all disks on the system in question, and then follow
one of the suggested methods to shut down the system. However, you
could still end up having to fsck the file systems. The final moral is
that you don't need to run sync three times before shutting down a
system.

Beowulf Q: Building a small machine room? Materials/costs/etc.

There was a very interesting discussion about designing machine rooms
for clusters, initiated by Brian Dobbins on the 16th of
September 2003. He wanted to solicit the advice of people who had experience
designing small machine rooms for their clusters. Of course the first
reply was from Robert Brown, who has lately taken machine room requirements,
especially electrical, to heart. He responded with many good comments
about power, airflow, structural integrity (primarily weight), sound
and light, networking, security, comfort and convenience. Michael Stein
and Bob Brown added many more details to the electrical requirements for
supplying power to the room including estimating the power required,
what kind of room power supplies to use, and where to put the power
distribution panels. Bob went on to add items such as a
thermal kill switch for the machine room in the event of a complete
cooling failure. He pointed out that in the event of a room cooling
failure, the temperature can go from a reasonable temperature to system
thermal failure in just a few minutes (it's better to spend some time
fixing file systems than to have to purchase all new equipment). He also
extended his comments about a raised floor for the machine room. Bob also
made a very good point that it is highly recommended to get facilities
people involved very early in the design process not only for the design
of the room but also operational issues such as not shutting down the
chillers in the winter just because it's cold outside!

There was some more discussion about the thermal kill switch for the room
pertaining to the amount of time from room cooling failure to system
thermal damage. It was discussed that in some cases, it could only be a
matter of tens of seconds before systems start having all kinds of thermal
problems. Jim Lux added a very important point that when the power is killed,
the power to the lights and a few receptacles needs to be left on. He also
asked the proverbial question, "you also don't want the thermal kill to shut
down the power to the blowers, now, do you?" Jim went on to say that there
is reason for shutting down the blowers to prevent the HVAC (heating,
ventilation, and air conditioning) system from spreading the fire around
the building. He went on to suggest that one might want to consider a
configuration with staged responses to overheating. For example, a moderate
overheat would shut down the computer equipment, a bigger overheat
such as a fire would shut down the blowers, and then you would have the big red
emergency button next to the door that shuts down all power. There was
some humorous discussion about the location of the big red button close
to the light switch and the door opening switch and what has and could
happen. Robert Brown added that one could also use scripts and lm_sensors
as part of the first stage to shut down nodes before they figuratively
melt down and before the emergency facility crews can identify and fix
the cooling problem.
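As a rough illustration of the scripts-plus-lm_sensors first stage Bob describes, a node-level check might look like the sketch below. The 70C threshold, the stubbed temperature reader, and the commented-out poweroff are all illustrative assumptions, not anything posted to the list; a real version would parse the output of the sensors command for your particular chip.

```shell
#!/bin/sh
# Hypothetical first-stage thermal response: check the node's own
# temperature and shut it down before it cooks.

THRESHOLD=70   # degrees C; pick a value appropriate for your hardware

# read_temp prints the CPU temperature in whole degrees C. In
# production this would parse `sensors` output; here it is a stub
# (override with FAKE_TEMP) so the logic can be exercised anywhere.
read_temp() {
    echo "${FAKE_TEMP:-45}"
}

check_and_act() {
    t=$(read_temp)
    if [ "$t" -ge "$THRESHOLD" ]; then
        echo "temperature ${t}C >= ${THRESHOLD}C: shutting down"
        # /sbin/poweroff   # uncomment on a real node
    else
        echo "temperature ${t}C: ok"
    fi
}

check_and_act
```

Run from cron every minute or so, a script like this buys the node a chance to die cleanly before the room's thermal kill switch, or the hardware itself, makes the decision for it.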


Andrew Latham also added some general guidelines including contacting a
halon installation company (halon is a gas used to suppress a fire so
that a sprinkler system is not needed - water and electronics don't mix
well). There was some discussion about whether halon could be used in
new installations, and Joel Jaeggli pointed out that halon has been banned
because it is a CFC (chlorofluorocarbon) and can damage the ozone layer.
Joel gave several replacements for halon. Finally, Luc Vereecken gave
everyone a lesson in how halon works by describing the chemistry of the
combustion (fire) and how halon disrupts the combustion. Luc also pointed
out that he uses his cluster for doing research in combustion chemistry!

This was a very good discussion about many of the things that go into
making a good machine room for clusters. If you are planning a new machine
room or want to upgrade or retrofit an old one, you would be wise to
review the postings to the Beowulf mailing list and perhaps ask further
questions on the list.

Beowulf: Environment Monitoring

To go along with the discussion of designing a machine room for clusters
was the discussion of environmental monitoring of clusters. On the 30th of
September, Mitchel Kagawa started this discussion by asking about environmental
monitoring appliances like
NetBotz/RackBots
that email or call you in the event of a problem in the machine room (Mitchel's
machine room hit 145 degrees because the cooling shut down, but amazingly
20 of his 64 nodes were still running!). Robert Brown (who is that masked
man?) responded that the NetBotz boxes would work fine, but were a bit
expensive in his opinion. He suggested using a temperature probe on the
serial port of a select number of nodes and then a series of scripts to
perform whatever action you desire based on the readings. Bob also went
on to describe a do-it-yourself (DIY) setup using a PC-TV card and an X10
camera to monitor the room remotely (finally a use for those stupid pop-up
ads!).

Several people suggested using lm_sensors
and scripts to monitor and shut down nodes appropriately. This allows you
to address each node in addition to an overall room monitoring system.
Robert Brown and others suggested using lm_sensors with a polling cron
script to watch the systems and take appropriate action if and when needed
(please see the previous discussion). If you get one or two emails
from a script based on lm_sensors you might not have a problem, but if
you start to get a number of them, this might indicate a room problem. There
was some discussion about how lm_sensors gathers the monitoring
information and how it is presented to the users. Robert Brown, Don Becker,
Rocky McGaugh and others joined in the discussion which spilled over
to the lm_sensors mailing list.
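The idea that a pile of per-node alerts points at the room rather than any one node can be sketched as a simple aggregation. Everything specific here, the 65C per-node threshold, the three-node alert level, and the sample temperatures, is invented for illustration; a real version might gather readings by running sensors over ssh on each node.

```shell
#!/bin/sh
# Hypothetical room-level check: if many nodes are hot at once,
# suspect the room cooling rather than an individual machine.

ROOM_ALERT=3   # this many hot nodes at once suggests a cooling failure

# room_check takes a space-separated list of node temperatures (C)
# and reports whether the pattern looks like a room problem.
room_check() {
    hot=0
    for temp in $1; do
        [ "$temp" -ge 65 ] && hot=$((hot + 1))
    done
    if [ "$hot" -ge "$ROOM_ALERT" ]; then
        echo "ROOM ALERT: $hot nodes over 65C"
    else
        echo "$hot hot node(s); probably local"
    fi
}

room_check "45 72 71 73"   # sample readings from four nodes
```

This is the script-side complement to a standalone probe: the nodes report what they feel, and the aggregation decides whether to page a human.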

Bill Broadley presented an alternative idea to using lm_sensors. If a
system using lm_sensors goes down, you can no longer receive any information
from the sensors. Bill mentioned an inexpensive stand-alone temperature
monitoring probe that can be used to monitor temperature even if a node is
shut down. The monitoring data even includes a time stamp, and the device
can build a temperature histogram for you. In his cluster he puts one
behind the machine (what he calls the rack temperature), one on top of
the rack (what he calls the room temperature), and one in the air conditioner
output, all on the same wire connector. He has found them
useful for convincing facility people that the room was getting hot more
often than they thought.

This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far
too much time reading mailing lists. He occasionally finds time to perform
experiments on clusters in his basement.
