ORNL is working with IBM to develop the Blue Gene supercomputer for relating protein shapes to disease.

ORNL, IBM, and the Blue Gene Project

Advanced cellular architecture
in the next-generation supercomputer will help scientists better understand
the makeup and purpose of different genes and proteins in living cells.
Massive computing power and the intricacies of biological matter at the
molecular level will be colliding through a cooperative research and development
agreement (CRADA) announced August 22, 2001, by ORNL and IBM and funded
by IBM and the Department of Energy.

ORNL
is involved in a cooperative research and development agreement with
IBM to help develop the Blue Gene supercomputer that will improve
our understanding of how living cells work. This supercomputer will
use advanced cellular architectures to allow 1000 trillion calculations
per second (petaflops computing). ORNL researchers will write programs
to help the machine run effectively. (Ross Toedte)

At the heart of the agreement
is IBM’s Blue Gene research project, which combines advanced protein science
with IBM’s next-generation cellular architecture supercomputer design.
Unlike today’s computers, cellular servers will run on chips containing
“cells,” which are processors that contain memory and communications circuits.
Cellular architecture will help scale computer performance from a teraflop
(1 trillion calculations per second) to a petaflop (1000 trillion calculations
per second).

The new supercomputer will be a
petaflop machine. The fastest existing computer, ASCI White, unveiled
by IBM in early August 2001, can perform about 12 trillion calculations
per second, or 12 teraflops. That computer is being used for nuclear weapons
stockpile stewardship research at DOE’s Lawrence Livermore National Laboratory.
IBM, also known as Big Blue, began its five-year, $100 million Blue Gene
project at the end of 1999; its goal is to create a supercomputer that
can handle large-scale computing projects.
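
To put the gap between ASCI White's 12 teraflops and a 1-petaflop machine in perspective, the arithmetic can be sketched in a few lines of Python. The two rates come from the figures quoted above; the workload size is a purely hypothetical number chosen for illustration, and sustained performance on real applications would differ from these peak figures.

```python
# Rates quoted in the article, in calculations per second.
TERAFLOP = 1e12
PETAFLOP = 1e15
ASCI_WHITE_RATE = 12 * TERAFLOP        # about 12 teraflops
BLUE_GENE_TARGET_RATE = 1 * PETAFLOP   # the 1-petaflop goal

# Assumption: an illustrative workload size, not a figure from the article.
HYPOTHETICAL_WORKLOAD = 1e21           # total calculations


def speedup(new_rate, old_rate):
    """How many times faster new_rate is than old_rate."""
    return new_rate / old_rate


def runtime_days(total_calculations, rate):
    """Days needed to finish total_calculations at a constant peak rate."""
    return total_calculations / rate / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    factor = speedup(BLUE_GENE_TARGET_RATE, ASCI_WHITE_RATE)
    print(f"Petaflop target vs. ASCI White: about {factor:.0f}x")
    for name, rate in [("ASCI White", ASCI_WHITE_RATE),
                       ("1-petaflop machine", BLUE_GENE_TARGET_RATE)]:
        days = runtime_days(HYPOTHETICAL_WORKLOAD, rate)
        print(f"{name}: {days:.1f} days for the hypothetical workload")
```

Under these assumptions, a job that would occupy ASCI White for years would finish on a petaflop machine in a matter of days.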

Supercomputing power of this
magnitude (1 petaflop) will improve scientists’ ability to predict future
climate, advance the field of nanotechnology, and gain a better understanding
of how gene sequences and the folding of proteins relate to diseases.

“Proteins control all processes
occurring in the cells of the body,” says Joe Jasinski, manager of the
Computational Biology Center for IBM Research. “These proteins are made
up of a vast array of different combinations of amino acids that fold
and bend into very complex, three-dimensional shapes that determine the
exact function of each protein. If the shape of a protein changes because
of some environmental, physical, or biological factor, the protein may
turn from being beneficial to one that causes a specific disease.”

Understanding the protein-folding
phenomenon is a recognized “grand challenge problem” of great interest
to the life sciences. The scientific knowledge derived from research on
protein folding can potentially be applied to a variety of problems of
great scientific and commercial interest, including protein-drug interactions,
enzyme catalysis, and refinement of protein structures created through
other methods.

“Our collaboration with Oak
Ridge National Laboratory is vital to IBM’s work to extend the boundaries
for applications of large-scale computing, focusing on the combination
of IBM and ORNL’s deep scientific capabilities,” says David McQueeney,
vice president of Emerging Business for IBM Research. “Together we have
built a common roadmap for an ambitious, multi-year evolution of the simulation
and modeling of many complex systems. We are confident that we will break
new ground in several domains, including life sciences.”

A
Blue Gene IBM supercomputer with 100,000 processors could lose processors
to failure at a rate of one every few seconds. ORNL researchers are establishing
a theoretical foundation for a whole new class of fault-tolerant algorithms
that are scalable beyond 100,000 processors; that is, they allow
the supercomputer to proceed with calculations and work around the
processors that have failed. Each program is a cell in a larger job
where each cell interacts with a fixed number of other cells.
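
The idea of cells that each interact with a fixed number of other cells, and that keep going when some of them fail, can be illustrated with a toy example. The sketch below is not the ORNL or IBM algorithm; it is a minimal illustration, assuming a ring of cells in which every cell repeatedly averages its value with whichever of its four nearest neighbors are still alive. The function names, the ring layout, and the set of failed cells are all hypothetical.

```python
def neighbors(i, n, k=2):
    """Indices of the k cells on each side of cell i on a ring of n cells."""
    return [(i + d) % n for d in range(-k, k + 1) if d != 0]


def step(values, alive, k=2):
    """One update: each live cell averages itself with its live neighbors."""
    n = len(values)
    new_values = list(values)
    for i in range(n):
        if not alive[i]:
            continue  # a failed cell does no work and is never waited on
        live = [j for j in neighbors(i, n, k) if alive[j]]
        total = values[i] + sum(values[j] for j in live)
        new_values[i] = total / (1 + len(live))
    return new_values


def run(n=20, failed=(3, 4, 11), steps=2000, k=2):
    """Run the averaging job and return the values held by surviving cells."""
    values = [float(i) for i in range(n)]
    alive = [i not in failed for i in range(n)]
    for _ in range(steps):
        values = step(values, alive, k)
    return [v for v, ok in zip(values, alive) if ok]


if __name__ == "__main__":
    survivors = run()
    print(f"{len(survivors)} cells survived")
    print(f"spread between survivors: {max(survivors) - min(survivors):.2e}")
```

Because each cell looks only at a fixed, small neighborhood and simply skips neighbors that have failed, the work per cell does not grow with the size of the machine, and the surviving cells still settle on a common value as long as they remain connected to one another.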

“The complexity of the protein-folding
problem, nanoscale science, and climate dynamics will require computational
resources at a scale not yet achieved by any scientific application,”
says Thomas Zacharia, ORNL’s associate laboratory director for Computing
and Computational Sciences. “This is an exciting next step in ORNL’s history
of evaluating new computational architectures and pushing the computational
science envelope.” Before it will be possible to solve problems in biology,
climate, and nanotechnology, scientists must devise methods to run applications
that use tens of thousands of processors in the Blue Gene supercomputer.
Each processor forms a cell with memory, communication, and input/output
built in. This approach departs from past designs and offers a glimpse
of what’s to come in high-performance computing.

“The world of supercomputing is
rapidly changing,” says Ed Oliver, associate director in the Department
of Energy’s Office of Advanced Scientific Computing Research. “We need
to develop approaches to solving computational problems that are able
to scale to thousands of processors and at the same time be tolerant of
failures of some of these processors.”

Working with IBM, ORNL researchers
led by Al Geist of the Computer Sciences and Mathematics Division will
develop fault-tolerant algorithms to allow the Blue Gene supercomputer
to work around processors that fail, as well as other capabilities, to
ensure that the machine operates effectively. ORNL scientists led by Ying
Xu of the Life Sciences Division will collaborate with IBM on how the
supercomputer should be programmed to analyze proteins and predict their
structures.

IBM and ORNL hope to use this
enormous computing power to explore numerous other areas, as well. This
effort merely represents the beginning of what is expected to be a long
relationship.