Data transfer demo sets speed mark

By Kimberly Patch, Technology Research News

Researchers from Northwestern University
and the University of Illinois at Chicago have set a new speed record
for transmitting data: 2.8 gigabits, or billion bits, per second. The
researchers set the record while transmitting information between Amsterdam
and Chicago at the iGRID 2002 conference in Amsterdam on September 24.

The high-speed networking technology is a component of grid computing,
which allows people to tap computer power and resources like databases
from computers distributed around the world. The speed makes it possible
to work with very large amounts of remote data, much like the Web's Hypertext
Transfer Protocol makes it easy to access and interact with remote documents.

The record data rate is equivalent to transmitting 350 million characters,
or 700 books, per second. Each character requires eight bits, or one byte.
The standard protocol used to send information over the Internet is around
500 times slower, even over fast fiber-optic lines.
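The conversion behind those figures can be checked with a few lines of Python. This is only an arithmetic sketch: the book size of 500,000 characters is an assumption inferred from the article's own numbers (350 million characters equal to 700 books), and the function names are illustrative.

```python
BITS_PER_CHAR = 8          # one character = one byte = eight bits
CHARS_PER_BOOK = 500_000   # assumption: implied by 350 million characters ~ 700 books


def chars_per_second(bits_per_second):
    """Convert a data rate in bits per second to characters per second."""
    return bits_per_second / BITS_PER_CHAR


def books_per_second(bits_per_second):
    """Convert a data rate in bits per second to books per second."""
    return chars_per_second(bits_per_second) / CHARS_PER_BOOK


record_bps = 2.8e9  # the 2.8-gigabit-per-second record rate
print(chars_per_second(record_bps))  # 350000000.0
print(books_per_second(record_bps))  # 700.0
```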

The researchers carried out the feat by combining three data protocols,
or communications layers. The combination forms a software architecture
the researchers dubbed Photonic Data Services.

Photonic Path Services set up, check the status of, and tear down the
photonic paths, or particular routes in an optical data network that the
data will take, said Robert Grossman, director of the Laboratory for Advanced
Computing and National Center for Data Mining at the University of Illinois
at Chicago. "Applications can request specialized photonic paths as they
are needed," he said.

The network protocol moves bits over long-haul networks. It combines a
more efficient data protocol -- User Datagram Protocol (UDP) -- with the
standard and reliable Transmission Control Protocol (TCP) used by the
Internet, said Grossman.

The Data Space Transfer Protocol does for data what the Web's Hypertext
Transfer Protocol does for text. It makes it as easy to work with remote
databases as it is to work with remote documents. The protocol enables
users to mine and analyze data that's stored on computers distributed
throughout a network, said Grossman. "DSTP is compatible with Web services,
but also has specialized functionality to work with data -- it supports
keys, metadata and data, [and] can sample data, select rows and columns
of data, et cetera," he said. Database operations like queries have traditionally
been the most difficult for Grid applications to work out.

The three layers work together like this: the network layer provides the
speed to access large amounts of data quickly, the data transfer layer
gives users the ability to execute functions on that data, and the photonic
path layer gives them the flexibility to do this on a per-application
basis, said Grossman.

The combination of layers one and three -- Photonic Path Services and
Data Space Transfer Protocol -- allows users to work with remote data
sets as large as several gigabytes as if they were local, and even work
conveniently with remote terabyte-size data sets, said Grossman. A gigabyte
of data is one billion characters, which would fill 2,000 books; a terabyte
is one trillion characters, which would fill 2 million books.
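To give a feel for those sizes, the sketch below estimates how long a gigabyte and a terabyte would take to cross the link at the record rate. It is an idealized calculation that assumes the full 2.8 gigabits per second is sustained and ignores protocol overhead; the names are illustrative.

```python
RECORD_BPS = 2.8e9     # record rate in bits per second
GIGABYTE = 10**9       # bytes (one billion characters)
TERABYTE = 10**12      # bytes (one trillion characters)


def transfer_seconds(num_bytes, bits_per_second=RECORD_BPS):
    """Ideal transfer time in seconds, ignoring overhead and packet loss."""
    return num_bytes * 8 / bits_per_second


print(round(transfer_seconds(GIGABYTE), 2))       # 2.86 seconds
print(round(transfer_seconds(TERABYTE) / 60, 1))  # 47.6 minutes
```

Under these assumptions a gigabyte arrives in about three seconds, while a terabyte takes most of an hour.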

In prior work, the researchers combined the second and third layer and
showed they could transfer data as fast as 622 megabits, or million bits,
per second, but the conference marked the first time all three layers
were combined to produce higher data-transfer rates, said Grossman. "At
iGRID for the first time we showed that new implementations of [the network
protocol layer and data transfer layer] could scale to 2.8 gigabits" per
second and could be integrated with the Photonic Path Services layer,
he said.

The Internet's Transmission Control Protocol does not work at very high
speeds on long haul networks because the protocol requires acknowledgment
of each packet of information. When a file is transferred from one computer
to another over the Internet, it is broken up into many packets, which
can take different routes to go from one computer to another. The packets
are reassembled into the file at the end of their travels. Acknowledging
each packet makes the protocol reliable, but "limits the bandwidth to
be a function of... the time required to send a packet and receive an
acknowledgment," said Grossman.
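The effect Grossman describes can be illustrated with a simple window model: a sender that may have only a fixed amount of unacknowledged data in flight cannot go faster than that amount divided by the round-trip time. The sketch below is a simplification, not a full model of Transmission Control Protocol. The 65,535-byte window is the classic maximum without window scaling, and the 100-millisecond round-trip time is an assumed value for a transatlantic path, not a figure from the researchers.

```python
def window_limited_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(target_bps, rtt_seconds):
    """Data that must be in flight to sustain target_bps (bandwidth-delay product)."""
    return target_bps * rtt_seconds / 8


CLASSIC_WINDOW = 65_535  # bytes: largest window without window scaling
ASSUMED_RTT = 0.1        # seconds: assumption, not a measured value

print(round(window_limited_bps(CLASSIC_WINDOW, ASSUMED_RTT) / 1e6, 2))  # 5.24 (megabits/s)
print(round(required_window_bytes(2.8e9, ASSUMED_RTT) / 1e6, 1))        # 35.0 (megabytes)
```

With these assumed values a single connection is held to a few megabits per second, and filling a 2.8-gigabit-per-second path would require tens of megabytes of unacknowledged data in flight.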

Transmission Control Protocol can be made to work faster by using multiple
network connections, but still tops out at about 155 megabits per second,
said Grossman. This works out to about 19 million bytes, or characters,
per second, or about 39 books.

The researchers' Sabul network layer uses the fast User Datagram Protocol
that does not require acknowledgment from the receiving computer in order
to send data at high rates. As data arrives, the receiving computer sends
information about packet loss using a separate channel and Transmission
Control Protocol, and the sender dynamically adjusts its sending rate
based on the packet loss numbers in order to minimize lost packets.
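The feedback loop described above can be sketched as a small simulation. This is not the researchers' Sabul algorithm, whose details the article does not give. The increase step, backoff factor, link capacity and function names are all assumptions chosen for illustration, with rates in megabits per second.

```python
def loss_fraction(send_rate, capacity):
    """Fraction of packets dropped when the sender exceeds the path capacity."""
    if send_rate <= capacity:
        return 0.0
    return (send_rate - capacity) / send_rate


def adjust_rate(send_rate, reported_loss, increase=50.0, backoff=0.9):
    """Sender rule (assumed): back off on reported loss, otherwise probe upward."""
    if reported_loss > 0.0:
        return send_rate * (1.0 - reported_loss) * backoff
    return send_rate + increase


def simulate(capacity=900.0, start_rate=100.0, intervals=40):
    """Run the loop and return a list of (send_rate, loss) pairs per interval."""
    rate = start_rate
    history = []
    for _ in range(intervals):
        # Data goes out unacknowledged (as over UDP); the receiver measures
        # loss and reports it back on a separate control channel.
        loss = loss_fraction(rate, capacity)
        history.append((rate, loss))
        rate = adjust_rate(rate, loss)
    return history


for rate, loss in simulate()[-6:]:
    print(f"rate={rate:7.1f} Mbps  loss={loss:.3f}")
```

In this toy model the sender climbs toward the capacity, overshoots slightly, and then settles into a narrow band just below and above it, so losses stay small without per-packet acknowledgments.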

The system can achieve 900 megabits per second through a 1-gigabit-per-second
network interface card, said Grossman. Using clusters of computers connected
to a router through 1-gigabit-per-second links, the researchers were able
to send information at 2.8 gigabits per second over the long haul network
from Amsterdam to Chicago, he said.
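A quick calculation shows why clusters were needed. The sketch below computes only a lower bound on the number of sending machines, assuming each sustains the quoted 900 megabits per second; the article does not say how many computers the researchers actually used.

```python
import math

PER_NODE_BPS = 900e6   # rate achieved through one 1-gigabit-per-second card
TARGET_BPS = 2.8e9     # aggregate rate reached at the conference


def min_nodes(target_bps, per_node_bps):
    """Smallest number of senders whose combined rate reaches target_bps."""
    return math.ceil(target_bps / per_node_bps)


print(min_nodes(TARGET_BPS, PER_NODE_BPS))  # 4
```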

The method could prove valuable initially for industries that need large
data flows, said Joe Mambretti, director of the International Center for
Advanced Internet Research at Northwestern University. These include
bioinformatics, medical imaging, digital industrial design, financial
services and drug design, he said. The protocols could be applied practically
within six months, he said.

The 2.8 gigabit-per-second speed from Chicago to Amsterdam is impressive,
said Mario Gerla, a professor of computer science at the University of
California at Los Angeles. Although the basic ideas the researchers tapped
have been used before, the way they put the scheme together is new, he
said. "What is novel here is the bringing together of all these known
technologies and efficiently integrating them to support a meaningful,
important application."

Key to the method's success are the ideas of using dedicated network paths
and using fast User Datagram Protocol for data transport, then taking
care of error correction and loss recovery at the application level, said
Gerla. "The simplicity is key to its success," he said.

There is work to be done, however, to allow the scheme to operate on a
packet-switched network like the Internet rather than dedicated lines,
he said.

At the same time, there are parallel efforts to extend Transmission Control
Protocol to make it work for Grid computing, Gerla said. "There is currently
very active research in the area of TCP for gigabit channels over long
connections," he said.

The researchers are currently optimizing the method and are working on
ways to implement the protocols in next-generation all-optical
networks, said Mambretti. "We are also developing new techniques for network
service provisioning," he said. These would allow network companies to
offer the services to their customers.

The research was funded by the National Science Foundation (NSF), the
Advanced Photonic Network Testbed (OMNInet), and Nortel.