The Heart of the Penguin

The Heart of Darkness

June 7, 2002

By
Brian Proffitt

In a small server room on the campus of UAB sits a kitchen supply
rack, with four big shelves, each of which holds four dual-processor
Dell computers. These sixteen boxes are hooked together in a standard
Beowulf cluster, so that all 32 CPUs work in concert on the same
computations, providing, their administrator believes, performance
that rivals a supercomputer at a fraction of the cost.

Dr. Andy Pollard is a Biomedical Engineer who is very familiar with
using supercomputers and mainframes. Throughout much of the last 15
years, he has used them to perform the billions of calculations needed
to simulate the effects of electrical fields on human heart tissue in
an effort to understand exactly what triggers a fibrillation event.

Pollard is now using what he describes as a "boilerplate Beowulf"
configuration of machines that are all running Red Hat Linux 7.2 to
run the computational software he uses. And one of the exciting things
for him is that with the tools that come with Red Hat,
he is already achieving 91 percent parallelism *practically out of the
box*.

And this power is certainly needed. The research itself targets three
areas: observing the effect of electrical fields on heart tissue to
learn how and why defibrillation works (and, in so doing, track down
why fibrillations occur in the first place); determining more directly
why fibrillations start; and tracing how a fibrillation event
progresses from start to finish.

In order to simulate these events, models of heart cells are created
in great detail, and connected to each other (virtually) as if they
were in a huge resistive network. Then, in the models, differential
equations are used to determine how each cell might react when an
ionic current is applied. It is in solving these differential
equations that the real power of the Linux cluster is needed.
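The resistive-network model can be sketched in miniature. The sketch
below is a hypothetical illustration, not Pollard's actual software:
the grid size, coupling conductance, and simple linear "leak" kinetics
are stand-ins for the far more detailed ionic cell models the article
describes.

```python
import numpy as np

# Hypothetical sketch: heart cells on a 2-D sheet, coupled to their
# neighbors through a resistive (gap-junction) network, with membrane
# voltage V advanced by explicit Euler steps. The linear leak term
# stands in for the detailed ionic differential equations.

N = 16                 # cells per side of this (tiny) sheet
dt = 0.01              # time step, ms
g_gap = 0.5            # resistive coupling conductance (arbitrary units)
g_leak = 0.1           # leak conductance pulling V back toward rest
V_rest = -85.0         # resting membrane potential, mV

V = np.full((N, N), V_rest)
V[N // 2, N // 2] = 20.0   # stimulate the center cell

def step(V):
    # Current from the four resistively coupled neighbors,
    # with zero-flux boundaries via edge padding.
    Vp = np.pad(V, 1, mode="edge")
    I_coupling = g_gap * (Vp[:-2, 1:-1] + Vp[2:, 1:-1]
                          + Vp[1:-1, :-2] + Vp[1:-1, 2:] - 4 * V)
    I_ionic = -g_leak * (V - V_rest)   # stand-in for the ionic ODEs
    return V + dt * (I_coupling + I_ionic)

for _ in range(1000):   # advance 10 ms of simulated time
    V = step(V)
```

Even at this toy scale, every time step touches every cell; the real
simulations solve far stiffer equations over far larger sheets, which
is where the cluster's CPUs come in.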

To give a rough idea of just how much computing muscle is needed,
Pollard described the amount of real time it would take to run a heart
simulation. With one CPU, the amount of time it would take to simulate
a 1 ms event on a two-dimensional sheet (measuring 2 cm X 2 cm) of
heart cells would be about 450 seconds. Given that a good simulation
should cover about 1-2 seconds of a simulated event, the amount of
computational time jumps to 125-250 hours (roughly 5-10 days).

With their current cluster setup, Pollard estimates that 1 ms of
simulation can be done in 50 seconds, which immediately pulls the time
of a full one-second simulated run to 13.9 hours.
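The arithmetic behind those figures is straightforward; a quick check
using only the numbers quoted in the article:

```python
# Back-of-the-envelope check of the timing figures quoted above.
SECONDS_PER_MS_SINGLE = 450   # one CPU: wall-clock seconds per 1 ms simulated
SECONDS_PER_MS_CLUSTER = 50   # the 16-node, 32-CPU cluster

# A useful run covers 1-2 seconds (1000-2000 ms) of heart activity.
single_low = 1000 * SECONDS_PER_MS_SINGLE / 3600    # hours
single_high = 2000 * SECONDS_PER_MS_SINGLE / 3600   # hours
cluster_one_second = 1000 * SECONDS_PER_MS_CLUSTER / 3600

print(single_low, single_high)       # 125.0 250.0 hours (roughly 5-10 days)
print(round(cluster_one_second, 1))  # 13.9 hours
```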

Thanks to the open-source nature of Linux, Pollard has already
identified areas within Red Hat that can be tweaked to
improve the performance of the Message Passing Interface (MPI) and
Parallel Virtual Machine (PVM) libraries. These tweaks, coupled with
improvements to the application software, may get the parallelism
number up to 98 percent--which could reduce that 1 ms of simulated
time down to 20 seconds of computational time and bring the time for a
total one-second event to a mere five and a half hours.
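Those parallelism figures line up neatly with Amdahl's law, which
bounds the speedup of a program by its remaining serial fraction. The
article does not name the law, so this is one plausible reading of the
numbers rather than a description of Pollard's own analysis:

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: overall speedup when a fraction of the work
    parallelizes perfectly across n_cpus and the rest stays serial."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cpus)

base = 450  # single-CPU seconds per 1 ms of simulated time

for p in (0.91, 0.98):
    s = amdahl_speedup(p, 32)
    print(f"p={p:.2f}: speedup {s:.1f}x, about {base / s:.0f} s per simulated ms")
# At 91 percent parallelism, 32 CPUs give roughly an 8.4x speedup
# (about 53 s per ms, close to the observed 50 s); at 98 percent the
# speedup rises to roughly 19.8x (about 23 s per ms, in line with the
# projected 20 s).
```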

With this kind of performance, Pollard could increase the simulated
times of his events, or increase the area of virtual cells that are
being tested and gain far more data in the same amount of time it used
to take him to run far simpler experiments.

Nor is his program settling for this level of performance.

In the past, Pollard would have to rely on proprietary hardware and
software on mainframes and supercomputers to do his work for these
research programs. Like many such projects in academia, grants are the
financial source for everything. Many times, justifying the $50,000
expense for a new mainframe might be done at the very first stages of
getting grants for a project. But, after the project was underway,
very rarely could Pollard ever get money to upgrade these proprietary
systems--leaving him and his students stuck using computers that grew
more obsolete as the months passed.

Now, Pollard says, this kind of problem no longer affects him. "The
thing that I think is great," he said, "is that this is the first
solution that provides real independence." Pollard went on to explain
that because he is using machines with ordinary Intel CPUs, his
initial hardware costs are much lower than they would be for a
mainframe. Now he can obtain the same kind of performance for around
$10,000--a number that is much easier to get from a grant.

And, because the hardware number is so much lower, Pollard feels free
to ask for new hardware on subsequent rounds of funding. With the
ability to roll out machines or even just processors on a regular
basis, Pollard is assured he can keep his work running on relatively
cutting-edge hardware.

"Now all I have to do is assume these machines over just the lifetime
of the grant," Pollard said.

Some of this upgradeability is due to more than just cost. Because of
the standard nature of Red Hat, Pollard can shift his applications to
the new or upgraded machines with ease. In the past, variations
between different flavors of UNIX, for example, would delay porting
his apps from older to newer machines as he had to tweak his software
to play well with the new operating system.

This standardized platform served Pollard well even before he
installed his cluster. During the viability testing for the initial
grant proposal for this cluster, he was able to go from system to
system around the campus and get performance numbers from other Red
Hat machines.

Red Hat also helped him reduce system maintenance costs. In the past,
a dedicated support staff member would have to handle many (if not
all) administrative tasks on the mainframe Pollard used.

"I don't need a systems person to do everything like before," he
said, adding that he now can handle most of the administrative tasks
with Red Hat by himself, which saves his grant money for something
else--like more students to think of more ways to find a solution to
the problem of fibrillation.