Computing Grid Helps Get to the Heart of Matter

By Paula Musich
Posted 05-20-2007

In November, when physicists at CERN in Switzerland begin their grand experiment using the world's largest particle accelerator, the Large Hadron Collider (LHC), computer scientists there and across the globe will also put the world's largest scientific computing grid through its paces.

The success of the experiment, which is intended to answer such questions as what other particles exist in the universe that we don't know about, will rely in large part on a worldwide, high-speed network that will allow scientists to harness the power of 100,000 computers (mostly PCs) to process the tons of data generated by the experiment.

The universe consists of particles of matter, and scientists currently know only a tiny fraction of them. Gaining greater insight into what else makes up the universe will give them a greater understanding of the universe itself.

The network's 10G bps backbone, linking to 11 scientific data centers, forms the core of the world's largest international scientific grid service, which was created to enable scientists to handle the huge amount of data that will come out of the experiment.

"The LHC is a 27-kilometer ring underground that accelerates protons to high energy and smashes them together in the ring to produce a fireworks of particles," said Francois Grey, director of IT communications at CERN, in Geneva. "Huge underground detectors will pick up the signals [from the collisions] using millions of channels that will read out every 25 nanoseconds. The rate at which [the data] will come out [of the four detectors in place] to be stored is in the hundreds of megabytes per second."

Along with lessons about what the universe comprises, the LHC Computing Grid project will teach network engineers valuable lessons about what it takes to run and manage one of the largest 10G-bps networks in the world.

"Everyone is looking to see who's installing a large backbone on that scale. We've become a reference for other people waiting to see what happens," Grey said. "We have no choice because we need that speed. We're also learning a lot about shipping data at high rates and how to optimize a grid between 10G bps and slower links."

About 200 institutions in 80 countries, some with their own large data centers, will participate in the grid to help process an expected 15 petabytes of data per year generated by the LHC.
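As a rough check on these figures, the short Python sketch below converts 15 petabytes per year into an average data rate and compares it with a single 10G-bps link. It assumes decimal petabytes (10^15 bytes) and data spread evenly over a full calendar year, neither of which the article states, so the result is only an order-of-magnitude estimate. The function names are illustrative.

```python
# Back-of-the-envelope estimate; assumes decimal units and an evenly
# spread, year-round data flow (both are simplifying assumptions).
SECONDS_PER_YEAR = 365 * 24 * 3600
PETABYTE = 10**15  # bytes


def average_rate_bytes_per_s(petabytes_per_year):
    """Average rate in bytes per second for a given yearly data volume."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR


def link_utilization(rate_bytes_per_s, link_bits_per_s=10e9):
    """Fraction of a link's capacity consumed by the given byte rate."""
    return rate_bytes_per_s * 8 / link_bits_per_s


rate = average_rate_bytes_per_s(15)
print(f"{rate / 1e6:.0f} MB/s average")                      # 476 MB/s average
print(f"{link_utilization(rate):.0%} of one 10G-bps link")   # 38% of one 10G-bps link
```

Under these assumptions the yearly volume works out to several hundred megabytes per second, in line with the storage rate Grey describes.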

"We realized early on there was no way we could store all that data and analyze it here at CERN," Grey said. "The idea was to pull those resources together in a grid."

The grid is organized in a three-tiered hierarchy, with CERN serving as the Tier 0 "fountainhead" from which data subsets will be dispersed to 11 Tier 1 data centers in Europe, North America and Asia, according to Grey.

Tier 2 data centers, located mostly at more than 250 universities around the globe, will serve as the locations where physicists analyze the data subsets they receive.

The LHC Computing Grid rides on dark fiber used in national and international research networks to interconnect each of the 11 Tier 1 sites at 10G bps for continuous paths to the different locations. Commercial links are used to connect participants in Canada, Taiwan and the United States.

In North America, Tier 1 sites include two in the United States (Fermi National Accelerator Laboratory, in Batavia, Ill., and Brookhaven National Laboratory, on Long Island, N.Y.) as well as the TRIUMF laboratory, in Vancouver, British Columbia.

Because of the nature of the computing task, PCs used in the grid don't have to communicate with one another at very high speeds, so they are linked via grid middleware for "trivially parallel" processing, according to Grey.

Detectors read out images of the collisions, which are analyzed for particular patterns. "Each collision is independent from the next one, which is why trivially parallel processing works," Grey said.
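The idea of trivially parallel processing can be sketched in a few lines of Python. This is only an illustration, not CERN's software: the event records, the `analyze_event` function and its energy threshold are all hypothetical, and a local thread pool stands in for the independent PCs of the grid. The point is that each event is analyzed with no reference to any other, so the work can be split across any number of workers without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_event(event):
    """Hypothetical pattern check: flag events whose summed energy is high.

    The function reads only its own event, so events never need to
    exchange data with one another.
    """
    return sum(event["energies"]) > 100.0


def analyze_all(events, workers=4):
    """Analyze every event independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_event, events))


# Made-up sample data for illustration only.
events = [
    {"id": 0, "energies": [40.0, 70.0]},
    {"id": 1, "energies": [10.0, 20.0]},
    {"id": 2, "energies": [99.0, 1.5]},
]
flags = analyze_all(events)
print(flags)  # [True, False, True]
```

Running the same events with one worker or many gives identical flags, which is what lets the grid hand separate batches to machines that are only loosely connected.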

At CERN, the PCs, CPU servers and disks are linked on a 1G-bps network provided by Hewlett-Packard ProCurve switches. CERN itself will contribute about 8,000 processors, or 10 percent of the total needed for the job. These include 3,500 PCs, with the rest single- or dual-core processors, all running a version of Linux called Scientific Linux CERN.

The PCs used at CERN are commodity systems from a mix of smaller vendors, including Elonex, of Bromsgrove, England, and Scotland Electronics, of Moray, Scotland.

"We buy them cheap and stack them high," said David Foster, communications systems group leader in CERN's IT department. "The physics applications can run in parallel, but independently on separate boxes, so any PC which fails can be replaced and just that job restarted."

"Our typical workhorses" are dual-processor PCs in a one-rack-unit "pizza box" form factor stacked in 19-inch racks, according to Helge Meinhard, technical coordinator for server procurements at CERN.

Although most of the roughly 8,000 PCs are single-socket machines that run single-core chips, about 750 are two-socket systems that use dual-core processors.

Administering all the PCs is a batch scheduler, which identifies available units and assigns a job to them.
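A batch scheduler of this kind can be illustrated with a minimal Python sketch. It is not the scheduler CERN runs; the `BatchScheduler` class, its method names and the node and job labels are hypothetical. It shows the two behaviors described in the article: jobs are assigned to whichever machines are idle, and when a PC fails, only the job it was running is put back in the queue.

```python
from collections import deque


class BatchScheduler:
    """Toy scheduler: matches queued jobs to idle nodes, first come first served."""

    def __init__(self, nodes):
        self.idle = deque(nodes)   # nodes ready to accept work
        self.pending = deque()     # jobs waiting for a node
        self.running = {}          # node -> job currently assigned

    def submit(self, job):
        self.pending.append(job)

    def dispatch(self):
        """Assign pending jobs to idle nodes; return the (node, job) pairs made."""
        assigned = []
        while self.idle and self.pending:
            node = self.idle.popleft()
            job = self.pending.popleft()
            self.running[node] = job
            assigned.append((node, job))
        return assigned

    def finish(self, node):
        """A job completed: the node becomes idle again."""
        self.running.pop(node)
        self.idle.append(node)

    def fail(self, node):
        """A node died: remove it from service and requeue only its job."""
        job = self.running.pop(node)
        self.pending.appendleft(job)


sched = BatchScheduler(["pc1", "pc2"])
for name in ["job-a", "job-b", "job-c"]:
    sched.submit(name)

first = sched.dispatch()    # pc1 takes job-a, pc2 takes job-b
sched.fail("pc2")           # pc2 breaks; job-b goes back to the front of the queue
sched.finish("pc1")         # pc1 completes job-a and is idle again
second = sched.dispatch()   # pc1 picks up the restarted job-b
```

Because the jobs are independent, the failure of one node affects no other job, which is why cheap commodity PCs are workable for this kind of load.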