CRAYS, CLUSTERS, AND A CROWDED HOUSE:
A LOOK AT SCD'S NEW HARDWARE
Bob Niffenegger is definitely relieved. The manager of operations in
the Scientific Computing Division's Operations Section is good-
humored by nature, but the trials of installing a supercomputer could
rattle anyone. And SCD has installed not one, but as many as seven
major computers (depending on which systems one considers major).
"It's too good to be true how trouble-free the hardware installation was.
Something has to be wrong," Bob laughs.
Another relieved man is Gene Schumacher, the group head for
supercomputer systems in SCD's High Performance Systems Section,
who shepherded the software installation and integration of six new
Crays into the network of NCAR Cray computers. "It's been a grueling
summer for my group and the Cray analysts and engineers that we
work with--lots of long hours and long days. Now that the hectic pace
has let up some, we're starting to tidy things up."
The new installations give NCAR's computing power a major boost.
Allocations of computer time to the university community have risen
20%, and in-house scientists are enjoying similar increases. Beyond
their sheer number-crunching ability, the machines--including a
massively parallel system and a workstation cluster--expand the range
of computers on hand in SCD. As usual, the driver is the continual
demand from scientists both within and without UCAR for increased
computer time, speed, and storage capacity.
Over the past couple of years, the supercomputing world has been
gravitating toward two newer technologies: massively parallel
systems, in which a task can be subdivided and sent to many processors
at once; and clusters, which distribute work among entire machines
rather than among processors. SCD opted to explore both of these new
directions. The installation action began in March and the pace is only
now beginning to ease.
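For readers who want the distinction in concrete terms, the sketch
below is a minimal illustration (plain C, not code from any SCD or
NCAR system) of the parallel approach: one task, here a long
summation, is divided among several processors of a single machine,
each working on its own slice. In a cluster, by contrast, each machine
would be handed a complete, independent job, such as a separate model
run.

    /*
     * Illustrative sketch only: subdividing one task across several
     * processors of one machine (the parallel approach). A cluster
     * would instead dispatch whole, independent jobs to separate
     * machines. NPROC and the summation are hypothetical.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NPROC 4              /* assumed number of processors */
    #define N     1000000L       /* size of the shared task */

    static double partial[NPROC];

    /* Each processor sums its own slice of the same overall task. */
    static void *sum_slice(void *arg)
    {
        long id = (long)arg;
        long lo = id * (N / NPROC), hi = lo + N / NPROC;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += 1.0 / (double)(i + 1);
        partial[id] = s;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NPROC];
        double total = 0.0;

        for (long id = 0; id < NPROC; id++)
            pthread_create(&t[id], NULL, sum_slice, (void *)id);
        for (long id = 0; id < NPROC; id++) {
            pthread_join(t[id], NULL);
            total += partial[id];
        }
        printf("sum over %d processors = %f\n", NPROC, total);
        return 0;
    }

A cluster could reach the same answer by giving each machine an
entirely separate run and collecting the output afterward; the
difference lies in whether the work is split inside one job or spread
across many.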
New hardware notwithstanding, SCD already has its eyes on the next
generation of machines. "Leading-edge computing is often a
prerequisite to leading-edge science," says SCD director Bill Buzbee.
Other research centers in atmospheric science are forging ahead with
their own acquisitions. Instead of massively parallel machines, the
emphasis seems to be on ever-larger, moderately parallel versions of
the traditional supercomputer. Several centers--England's Hadley
Centre and Meteorological Office, the Max Planck Institute in Germany,
and the U.S. National Meteorological Center (which becomes the
National Centers for Environmental Prediction on 1 October)--are
acquiring 16-processor CRAY C-90 machines. These have roughly five
times the capability of NCAR's CRAY Y-MP 8/864.
SCD's immediate strategy for keeping up with the Joneses will be to
upgrade existing machines and to dedicate an entire supercomputer to
climate modeling. The Model Evaluation Consortium for Climate
Assessment (MECCA) began using a dedicated CRAY Y-MP 2 in 1991
and cut the time needed to complete large simulations by as much as a
factor of six. Now, with MECCA having run its course, SCD is upgrading
that Y-MP 2 to a Y-MP 8 this fall and devoting it entirely to coupled
climate model runs that will connect land, ocean, ice, and atmosphere.
This will be the Climate Simulation Laboratory (CSL), NCAR's
computing contribution to the Climate Modeling, Analysis, and
Prediction Program (CMAP) of the Global Change Research Program.
"We expect that the CSL should be capable of completing several 100-
year simulations [with all four physical components linked] in a
calendar year," says Bill. "You can count on the fingers of one hand the
number of air-ocean-land coupled simulations that have been
performed out to 100 years."
What's on the wish list for SCD? Bill wouldn't complain if a next-
generation supercomputer were dropped (gently) into the operations
room sometime in the next year or so. Fully configured, such a
machine should offer at least 24 processors and 15 times the power of
the Y-MP 8/T3D combo. A bonus is that traditional supercomputers are relatively
easy to program. Massively parallel machines often require a major
investment in converting established models to a wholly different
environment.
There's always next year, and the next order-of-magnitude
improvement. In the meantime, SCD will have its hands full
managing several times its former load of computers with roughly the
same number of support staff. So far, so good, reports Bob, who gives
credit beyond SCD for the virtually flawless installations this summer.
"The Facilities Support People were on top of it all along. They did a
superb job." --BH
**********************************************
WHAT'S WHAT IN THE SCD OPERATIONS ROOM
CM-5
littlebear
This 32-node computer (named littlebear, in the tradition of naming
SCD machines after high peaks in the Colorado Rockies) arrived from
the Thinking Machines Corporation in April 1993. It became available
for users to conduct parallel experiments later in the year and has since
been linked to the CRAY-3 (see below). The CM-5 is being used for
turbulence modeling, ocean modeling, and climate simulations.
CRAY-3
graywolf
The sleek CRAY-3, boasting gallium arsenide circuits and ethereal
beauty, came to NCAR last year on loan from the Cray Computer
Corporation for testing and experimentation. It returned to Colorado
Springs this summer for refinement and is expected to begin a second
stay in SCD in the near future. To accommodate its return--and the
presence of more machines than ever in the newly crowded operations
room--SCD and FSS have been working to make the room self-
sufficient in its cooling capacity. A new cooling tower and chiller
installed in SCD 26-29 August make the room independent from the
rest of the Mesa Lab (and give FSS a chance to remodel the lab's main
cooling system without having to bring all of SCD's computers down).
CRAY T3D
NCAR's most extensive foray to date into massively parallel
computing began in July as SCD acquired a CRAY T3D system with 64
processing elements. The computer was attached to antero, SCD's
recently upgraded Cray (see below). (The T3D has no informal name
because it is networked via antero.) SCD staff have been working with
Cray Research to gain experience in T3D programming environments.
A class offered 19-23 September by Cray Research will introduce the
first wave of NCAR users to T3D programming.
CRAY Y-MP 2/216
antero
The Y-MP has undergone one metamorphosis already this year, with
another to go. In May the two-processor Cray was upgraded to a Y-MP
5; in October it will be upgraded to an eight-processor machine that will
be functionally the same as a Y-MP 8/864. Already, this Cray serves as
the front end for the massively parallel T3D machine, and their
combined capabilities will increase substantially after this fall's
upgrade. The system is being used for long-running coupled
simulations of climate.
EL Cluster
echo, monarch, st-elmo, alpine
In March came NCAR's first CRAY EL 92; on its heels in June and July
came three EL 98s. Each of these is an entry-level supercomputer
compatible with bigger Crays yet standing less than five feet tall. An EL
processor is about one-fifth as fast as a processor on the CRAY Y-MP
8/864. To simplify maintenance and maximize usage, the four ELs will
be joined as a Cray cluster, with two-thirds of their time devoted to
climate simulations and the other third allocated to the user
community at large.
IBM RS/6000 Cluster
arapahoe, comanche, navaho
A set of four IBM RS/6000 model 550 workstations was first installed in
SCD in 1992, with a fifth model 550, chief, serving as the cluster's front-
end machine. Two members of the cluster (arapahoe and comanche)
were upgraded to model 590s in May and opened to the general user
community in August; the third cluster member, navaho, is being
upgraded to a 590 and dedicated to modeling work of the Climate and
Global Dynamics Division. The fourth of the model 550s has been
broken off for other tasks. With processor speed comparable to that of the ELs
but less hard-disk storage, these machines are primarily used for
relatively short jobs of less than five hours. A model 990, now called
wildhorse, will be added to the cluster in a few weeks.