ORNL researchers are devising ways to move large data files faster over computer networks and to reduce delays in data delivery so supercomputers are not idle.

Networking: Making Faster Connections Among Supercomputers

Some computational scientists’
high-performance computing jobs get done even while the scientists
themselves are not working, thanks in part to networking. Networks,
particularly high-speed networks, allow supercomputer nodes to “talk” to
each other, send messages to other nodes asking for data, and transmit
large data files across the country. In addition, networks allow
computational scientists to keep tabs on the progress of jobs that often
run for days at a time.

DOE’s Center for Computational
Sciences (CCS) at ORNL has supercomputers, as do the National Science
Foundation center in Pittsburgh, the National Center for Supercomputing
Applications facility at San Diego, and DOE’s National Energy Research
Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory
(LBNL) in California. But, according to Bill Wing of ORNL’s Computer Science
and Mathematics Division, the similarities stop there.

“Other supercomputer centers slice
up their resources on a fine scale and run hundreds of jobs for thousands
of users,” he says. “We are different. We focus our computational resources
on a few high-end users who need massive computing capacity for climate
prediction, human genome analysis, and materials science simulations.
Our customer base, which includes many users outside ORNL, has different
needs, including different network needs, so we use a different model.”

At CCS the computational scientists
modeling future climate or exploding stars or searching for genes in DNA
sequences run jobs for days or weeks at a time and generate huge files
of calculated results that are transmitted between ORNL and NERSC’s data
archives. A single climate modeling run can sometimes produce 1 trillion
bytes (1 terabyte) of data. These data are sent between ORNL and
NERSC in chunks of 250 million bytes (250 megabytes).
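
As a rough check on the figures above, the number of chunks in such a run
can be worked out directly. This is a minimal sketch that assumes decimal
units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes); the function name
`chunk_count` is illustrative only and does not come from any ORNL software.

```python
TERABYTE = 10**12  # bytes, decimal convention (assumption)
MEGABYTE = 10**6   # bytes, decimal convention (assumption)


def chunk_count(total_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to carry total_bytes, rounding up."""
    return (total_bytes + chunk_bytes - 1) // chunk_bytes


chunks = chunk_count(1 * TERABYTE, 250 * MEGABYTE)
print(chunks)  # 4000
```

Under these assumptions, a 1-terabyte run travels as 4,000 separate chunks.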

ORNL
is a hub on DOE’s Energy Sciences Network (ESnet), which connects
DOE’s national laboratories. (Illustration by Gail Sweeden)

"We focus on moving large
files of data," Wing says. "ORNL and LBNL are writing computer
programs to ensure that these data packages slide through the network
rather than clog it. In addition, we are developing the ability to allow
users to monitor the progress of these long-running jobs, and steer
them if necessary, from a variety of portable access points, including
laptops and personal digital assistants like Palm Pilots or iPAQs."

Data are sent over the network
mostly using the transmission control protocol (TCP), a predefined protocol
that computers use to communicate over a network. LBNL and ORNL researchers
are devising ways to improve the ability to send large files so that supercomputers
are not idle because of delays in data delivery.

To reduce delays in data delivery,
Nageswara Rao of ORNL’s Computer Science and Mathematics Division has
developed a computer program called NetLet that is being tested on 12
free telnet and university sites serving as monitors and routers. “NetLet
allows computers to efficiently talk with each other, ‘predict’ the delay
in getting the message to the receiver, and suitably route the message,”
Rao says. “This algorithm enables the computers to measure connection
speeds and the delays of pathways and then identify the best combination
of pathways to get the information delivered efficiently in the time or
at the rate guaranteed.”
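
The idea Rao describes, measuring the speed and delay of candidate pathways
and then choosing the best one, can be illustrated with a small sketch. This
is not NetLet's actual algorithm, which the article does not detail. It is a
hypothetical example that assumes each path is summarized by one measured
delay and one measured bandwidth, and that delivery time is roughly the delay
plus the message size divided by the bandwidth. The `Path` class, the path
names, and the measurements are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str             # hypothetical label for a route through relay sites
    delay_s: float        # measured one-way delay, in seconds
    bandwidth_bps: float  # measured throughput, in bits per second


def estimated_delivery_s(path: Path, message_bytes: int) -> float:
    """Simple model: fixed delay plus the time to push the bits through."""
    return path.delay_s + (message_bytes * 8) / path.bandwidth_bps


def best_path(paths: list[Path], message_bytes: int) -> Path:
    """Pick the path with the smallest estimated delivery time."""
    return min(paths, key=lambda p: estimated_delivery_s(p, message_bytes))


# Made-up measurements, for illustration only.
paths = [
    Path("direct", delay_s=0.080, bandwidth_bps=10e6),
    Path("via-relay", delay_s=0.120, bandwidth_bps=40e6),
]

small = best_path(paths, 1_000)        # 1-kilobyte message
large = best_path(paths, 250_000_000)  # 250-megabyte chunk
print(small.name, large.name)          # direct via-relay
```

In this toy model a short message favors the low-delay path, while a large
file favors the high-bandwidth path even though its delay is longer. This is
one reason the best route can depend on what is being sent.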

Demonstrations of NetLet have shown
that the algorithm has cut data delivery time by about 40%
without any additional support from the Internet routers. “Some of our
data files used to take 10 seconds to get from our computer to a destination
computer,” Rao says. “Those same data files can now get there in 6 seconds.
That means that a huge data file that took 10 hours to arrive at a destination
computer can now get there in 6 hours.”
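
Rao's figures can be read two ways, and a short calculation makes the
distinction explicit: going from 10 to 6 units of time is a 40% reduction in
delivery time, which corresponds to roughly a 67% increase in the amount of
data delivered per unit time. The function names below are illustrative.

```python
def time_reduction(old_time: float, new_time: float) -> float:
    """Fraction by which delivery time shrinks."""
    return (old_time - new_time) / old_time


def throughput_gain(old_time: float, new_time: float) -> float:
    """Fractional increase in data delivered per unit time (same file size)."""
    return old_time / new_time - 1


print(f"{time_reduction(10, 6):.0%}")   # 40%
print(f"{throughput_gain(10, 6):.0%}")  # 67%
```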

The data files transmitted
from ORNL's Eagle (IBM RS/6000 SP supercomputer) to the NERSC data archive
fly over DOE’s Energy Sciences Network (ESnet), a semiprivate part of
the Internet. Currently, DOE facilities such as ORNL and LBNL are using
the new ESnet (OC12), a high-speed link operated by Qwest that supports
data transmission at 622 megabits per second, 4 times faster than
the old ESnet.
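
To put the link speed in perspective, the sketch below estimates the
best-case time to move a 1-terabyte climate run at the full 622 megabits per
second. It assumes decimal units and ignores protocol overhead, competing
traffic, and TCP behavior, so real transfers would take longer. The old ESnet
rate is not stated in the article; here it is simply taken as one quarter of
the new rate, following the "4 times faster" comparison.

```python
NEW_ESNET_BPS = 622e6              # bits per second, from the article
OLD_ESNET_BPS = NEW_ESNET_BPS / 4  # inferred from "4 times faster" (assumption)


def transfer_hours(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time in hours at a constant line rate."""
    return size_bytes * 8 / rate_bps / 3600


one_terabyte = 1e12  # bytes, decimal convention (assumption)
print(round(transfer_hours(one_terabyte, NEW_ESNET_BPS), 1))  # 3.6
print(round(transfer_hours(one_terabyte, OLD_ESNET_BPS), 1))  # 14.3
```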