Ever since his days as a graduate student in Germany, Oliver Gutsche has wanted to combine research in particle physics with computing for the large experiments that probe the building blocks of matter.

“When I started working on the physics data coming from one of the experiments at DESY, I was equally interested in everything that had to do with large-scale computing,” said Gutsche of his time at the German laboratory. Gutsche now works at DOE’s Fermi National Accelerator Laboratory. “So I also began working on the computing side of particle physics. For me that was always the combination I wanted to do.”

Gutsche’s desire to merge the two pursuits has paid off. For the past four years Gutsche has been in charge of worldwide computing operations for the Large Hadron Collider’s CMS experiment, one of the two experiments credited with the 2012 Higgs boson discovery. In December he received the CMS Collaboration Award for his contributions to the global CMS computing system. And more recently, he was promoted to assistant head of the Scientific Computing Division at Fermilab.

As head of CMS Computing Operations, Gutsche orchestrates data processing, simulations, data analysis and data transfers, and manages infrastructure and many other central tasks. Monte Carlo simulations of particle interactions, for example, are a key deliverable of the CMS Computing Operations group. Monte Carlo simulations use randomness to model the collisions of the LHC and their products in a statistical way.

“You have to simulate the randomness of nature,” explained Gutsche. “We need Monte Carlo collisions to make sure we understand the data recorded by the CMS experiment and to compare them to the theory.”
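The idea can be illustrated with a toy sketch in Python. The distribution, selection threshold, and numbers below are invented for illustration; they are not CMS's actual simulation, which models full detector physics:

```python
import random

def selection_efficiency(n_events, threshold=125.0, seed=42):
    """Toy Monte Carlo: draw a random 'mass' for each simulated event
    from an assumed distribution and count how many pass a selection cut."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    passed = 0
    for _ in range(n_events):
        mass = rng.gauss(100.0, 30.0)  # hypothetical smeared mass, in GeV
        if mass > threshold:
            passed += 1
    return passed / n_events

efficiency = selection_efficiency(100_000)
```

With enough simulated events, the estimated efficiency converges on the true probability; this is how Monte Carlo samples let physicists predict what the detector should record under a given theory and compare that prediction to real data.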

When Gutsche received his Ph.D. from the University of Hamburg in 2005, he was looking for a job to combine LHC work, large-scale computing and a U.S. postdoc experience.

“Fermilab was an ideal place to do LHC physics research and LHC computing at the same time,” he said. His postdoc work led to his appointment as an application physicist at Fermilab and as the CMS Computing Operations lead.

Today Gutsche interacts regularly with people at universities and laboratories across the United States and at CERN, host laboratory of the LHC, often starting the day at 7 a.m. for transatlantic or transcontinental meetings.

“I try to talk physics and computing with everyone involved, even those in different time zones, from CERN to the west coast,” he said. Late afternoon in the United States is a good time for writing code. “That’s when everything quiets down and Europe is asleep.”

Gutsche expects to further strengthen the cooperation between U.S. particle physicists and their international colleagues, mostly in Europe, through the U.S. Department of Energy’s newly extended Energy Sciences Network, announced in anticipation of the LHC’s restart in spring 2015 at higher energy.

Helping connect the research done by particle physicists around the world, Gutsche finds excitement in all the work he does.

“Of course the Higgs boson discovery was very exciting,” Gutsche said. “But in CMS Computing Operations everything is exciting, because we prepare the basis for the hundreds of physics analyses done so far and the many more to come, not only for the major discoveries.”

ESnet to build high-speed extension for faster data exchange between United States and Europe. Image: ESnet

Scientists across the United States will soon have access to new, ultra-high-speed network links spanning the Atlantic Ocean thanks to a project currently under way to extend ESnet (the U.S. Department of Energy’s Energy Sciences Network) to Amsterdam, Geneva and London. Although the project is designed to benefit data-intensive science throughout the U.S. national laboratory complex, heaviest users of the new links will be particle physicists conducting research at the Large Hadron Collider (LHC), the world’s largest and most powerful particle collider. The high capacity of this new connection will provide U.S. scientists with enhanced access to data at the LHC and other European-based experiments by accelerating the exchange of data sets between institutions in the United States and computing facilities in Europe.

DOE’s Brookhaven National Laboratory and Fermi National Accelerator Laboratory—the primary computing centers for U.S. collaborators on the LHC’s ATLAS and CMS experiments, respectively—will make immediate use of the new network infrastructure once it is rigorously tested and commissioned. Because ESnet, based at DOE’s Lawrence Berkeley National Laboratory, interconnects all national laboratories and a number of university-based projects in the United States, tens of thousands of researchers from all disciplines will benefit as well.

The ESnet extension will be in place before the LHC at CERN in Switzerland—currently shut down for maintenance and upgrades—is up and running again in the spring of 2015. Because the accelerator will be colliding protons at much higher energy, the data output from the detectors will expand considerably—to approximately 40 petabytes of raw data per year, compared with 20 petabytes for all of the lower-energy collisions produced over the three years of the LHC’s first run between 2010 and 2012.

The cross-Atlantic connectivity during the first successful run for the LHC experiments, which culminated in the discovery of the Higgs boson, was provided by the US LHCNet network, managed by the California Institute of Technology. In recent years, major research and education networks around the world—including ESnet, Internet2, California’s CENIC, and European networks such as DANTE, SURFnet and NORDUnet—have increased their backbone capacity by a factor of 10, using sophisticated new optical networking and digital signal processing technologies. Until recently, however, higher-speed links were not deployed for production purposes across the Atlantic Ocean—creating a network “impedance mismatch” that can harm large, intercontinental data flows.

An evolving data model
This upgrade coincides with a shift in the data model for LHC science. Previously, data moved in a more predictable and hierarchical pattern strongly influenced by geographical proximity, but network upgrades around the world have now made it possible for data to be fetched and exchanged more flexibly and dynamically. This change enables faster science outcomes and more efficient use of storage and computational power, but it requires networks around the world to perform flawlessly together.
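As a schematic illustration of that shift (the site names and bandwidth figures here are invented, not real measurements), the old model resolved data through a fixed, geography-driven order, while the new model can pick whichever replica currently performs best:

```python
# Hypothetical replica sites mapped to currently measured transfer
# rates in MB/s (numbers are made up for illustration).
replicas = {"BNL": 120.0, "CERN": 95.0, "FNAL": 140.0}

def pick_source_hierarchical(order=("CERN", "BNL", "FNAL")):
    """Old model: always fetch from the first site in a fixed,
    geography-driven lookup order."""
    return order[0]

def pick_source_dynamic(bandwidths):
    """New model: fetch from whichever replica currently looks fastest,
    wherever in the world it happens to be."""
    return max(bandwidths, key=bandwidths.get)

best = pick_source_dynamic(replicas)
```

The dynamic choice is only viable when every link in the chain is fast and reliable, which is why the new model "requires networks around the world to perform flawlessly together."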

“Having the new infrastructure in place will meet the increased need for dealing with LHC data and provide more agile access to that data in a much more dynamic fashion than LHC collaborators have had in the past,” said physicist Michael Ernst of DOE’s Brookhaven National Laboratory, a key member of the team laying out the new and more flexible framework for exchanging data between the Worldwide LHC Computing Grid centers.

Ernst directs a computing facility at Brookhaven Lab that was originally set up as a central hub for U.S. collaborators on the LHC’s ATLAS experiment. A similar facility at Fermi National Accelerator Laboratory has played this role for the LHC’s U.S. collaborators on the CMS experiment. These computing resources, dubbed Tier 1 centers, have direct links to the LHC at the European laboratory CERN (Tier 0). The experts who run them will continue to serve scientists under the new structure. But instead of serving as hubs for data storage and distribution only among U.S.-based collaborators at Tier 2 and 3 research centers, the dedicated facilities at Brookhaven and Fermilab will be able to serve data needs of the entire ATLAS and CMS collaborations throughout the world. And likewise, U.S. Tier 2 and Tier 3 research centers will have higher-speed access to Tier 1 and Tier 2 centers in Europe.

“This new infrastructure will offer LHC researchers at laboratories and universities around the world faster access to important data,” said Fermilab’s Lothar Bauerdick, head of software and computing for the U.S. CMS group. “As the LHC experiments continue to produce exciting results, this important upgrade will let collaborators see and analyze those results better than ever before.”

Ernst added, “As centralized hubs for handling LHC data, our reliability, performance and expertise have been in demand by the whole collaboration, and now we will be better able to serve the scientists’ needs.”

An investment in science
ESnet is funded by DOE’s Office of Science to meet networking needs of DOE labs and science projects. The transatlantic extension represents a financial collaboration, with partial support coming from DOE’s Office of High Energy Physics (HEP) for the next three years. Although LHC scientists will get a dedicated portion of the new network once it is in place, all science programs that make use of ESnet will now have access to faster network links for their data transfers.

“We are eagerly awaiting the start of commissioning for the new infrastructure,” said Oliver Gutsche, Fermilab scientist and member of the CMS Offline and Computing Management Board. “After the Higgs discovery, the next big LHC milestones will come in 2015, and this network will be indispensable for the success of the LHC Run 2 physics program.”

This work was supported by the DOE Office of Science.

Fermilab is America’s premier national laboratory for particle physics and accelerator research. A U.S. Department of Energy Office of Science laboratory, Fermilab is located near Chicago, Illinois, and operated under contract by the Fermi Research Alliance, LLC. Visit Fermilab’s website at www.fnal.gov and follow us on Twitter at @FermilabToday.

Brookhaven National Laboratory is supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

One of ten national laboratories overseen and primarily funded by the Office of Science of the U.S. Department of Energy (DOE), Brookhaven National Laboratory conducts research in the physical, biomedical, and environmental sciences, as well as in energy technologies and national security. Brookhaven Lab also builds and operates major scientific facilities available to university, industry and government researchers. Brookhaven is operated and managed for DOE’s Office of Science by Brookhaven Science Associates, a limited-liability company founded by the Research Foundation for the State University of New York on behalf of Stony Brook University, the largest academic user of Laboratory facilities, and Battelle, a nonprofit applied science and technology organization.


There’s a software tool I use almost every day, for almost any work situation. It’s good for designing event selections, for brainstorming about systematic errors, and for mesmerizing kids at outreach events. It’s good anytime you want to build intuition about the detector. It’s our event viewer. In this post, I explain a bit about how I use our event viewer, and also share the perspective of code architect Steve Jackson, who put the code together.

The IceCube detector is buried in the glacier under the South Pole. The signals can only be read out electronically; there’s no way to reach the detector modules after the ice freezes around them. In designing the detector, we carefully considered what readout we would need to describe what happens in the ice, and now we’re at the stage of interpreting that data. A signal from one detector module might tell us the time, amplitude, and duration of light arriving at that detector, and we put those together into a picture of the detector. From five thousand points of light (or darkness), we have to answer: where did this particle come from? Does the random detector noise act the way we think it acts? Is the disruption from dust in the ice the same in all directions? All these questions are answerable, but the answers take some teasing out.

To help build our intuition, we use event viewer software to make animated views of interesting events. It’s one of our most useful tools as physicist-programmers. Like all bits of our software, it’s written within the collaboration, based on lots of open-source software, and unique to our experiment. It’s called “steamshovel,” a joke on the idea that you use it to dig through ice (actually, dig through IceCube data – but that’s the joke).

Meet Steve Jackson and Steamshovel

Steve Jackson’s job on IceCube was originally maintaining the central software, a very broad job description. His background is in software development, including visualization, and he’s worked as The Software Guy in several different physics contexts, including medical physics, nuclear physics, and astrophysics. After becoming acquainted with IceCube’s software needs, he narrowed his focus to building an upgraded version of the event viewer from scratch.

The idea of the new viewer, Steamshovel, was to write a general core in the programming language C++, and then higher-level functionality in Python. This splits the problem of drawing physics in the detector into two smaller problems: how to translate physics into easily describable shapes, like spheres and lines, and how to draw those spheres and lines in the most useful way. Separating these two levels makes the code easier to maintain, makes the core easier to update, and makes it easier for other people to add new physics ideas, but it doesn’t make the viewer easier to write in the first place. (I’ll add: that’s why we hire a professional!) Steve says the process took about as long as he could have expected, considering Hofstadter’s Law, and he’s happy with the final product.

A Layer of Indirection

As Steve told me, “Every problem in computer science can be addressed by adding a layer of indirection: some sort of intermediate layer where you abstract the relevant concepts into a higher level.” The extra level here is the set of lines and spheres that get passed from the Python code to the C++ code. By separating the defining from the drawing, this intermediate level makes it simpler to define new kinds of objects to draw.

A solid backbone, written with OpenGL in C++, empowers the average grad student to write software visualization “artists” as Python classes. These artists can connect novel physics ideas, written in Python, to the C++ backbone, without the grad student having to get into the details of OpenGL or, hopefully, any C++.
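A minimal sketch of that split might look like the following. The class and method names here are hypothetical, not Steamshovel's real API: the "artist" translates detector hits into primitive shapes, and a stand-in for the C++/OpenGL backbone merely collects them for drawing.

```python
class Sphere:
    """A primitive shape: all the backbone needs to know how to draw."""
    def __init__(self, position, radius, color):
        self.position, self.radius, self.color = position, radius, color

class Scene:
    """Stand-in for the C++/OpenGL backbone: it collects primitive
    shapes and (in the real tool) would render them."""
    def __init__(self):
        self.shapes = []
    def add(self, shape):
        self.shapes.append(shape)

class HitArtist:
    """A physics-level 'artist': translates detector hits into spheres.
    Charge sets the sphere size; arrival time sets the color."""
    def draw(self, hits, scene):
        for hit in hits:
            radius = 0.2 + 0.1 * hit["charge"]  # bigger charge, bigger sphere
            scene.add(Sphere(hit["position"], radius, color=hit["time"]))

hits = [{"position": (0, 0, 0), "charge": 3.0, "time": 0.1},
        {"position": (1, 2, 3), "charge": 1.0, "time": 0.4}]
scene = Scene()
HitArtist().draw(hits, scene)
```

The artist never touches OpenGL; a new physics idea only has to decide which spheres and lines to emit, and the backbone handles everything below that.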

Here’s a test of that simplicity: as part of our week-long, whirlwind introduction to IceCube software, we taught new students how to write a new Steamshovel artist. With just a week of software training, they were able to produce working artists, a testament to the usability of the Steamshovel backbone.

This separation also lets the backbone include important design details that might not occur to the average grad student but that make the final product more elegant. One such detail is that the user can specify zoom levels much more easily, so graphics are not limited to the size of your computer screen. Making high-resolution graphics suitable for publication is both possible and easy. Using these new views, we’ve made magazine covers, t-shirts, even temporary tattoos.

Many Platforms, Many People

IceCube is in an interesting situation: we support, and have users running, our software on many different UNIX-like operating systems, including Mac, Ubuntu, Red Hat, Fedora, Scientific Linux, even FreeBSD. But we don’t test our software on Windows, which is the standard platform for many complex visualization packages: yet another good reason to use the simpler OpenGL. “For cross-platform 3D graphics,” Steve says, “OpenGL is the low-level drawing API.”

As visualization software goes, the IceCube case is relatively simple. You can describe all the interesting things with lines and spheres: dots for detector modules, lines and cylinders for the cables connecting them or for particle tracks, and spheres of configurable color and size for hits within the detector. There’s relatively little motion beyond appearing, disappearing, and changing sizes, and the light source never moves. I would add that this is nothing – nothing! – like Pixar. These simplifications meant that the more elaborate packages Steve could have chosen were unnecessarily complex, full of options he would never use, and that the simple, open-source OpenGL was perfectly sufficient.

The process of writing Steamshovel wasn’t just a one-man job (even though I only talked to one person for this post). Steve solicited, and received, ideas for features from all over the collaboration. I personally remember that when he started working here, he took the diligent and kind step of sitting and talking to several of us while we used the old event viewer, just to see what the workflow was like, the good parts and the bad. One particularly collaborative sub-project started when one IceCube grad student, Jakob, had the clever idea of displaying Monte Carlo true Cherenkov cones. We know where the simulated light emissions are, and how the light travels through the ice – could we display the light cone arriving at the detector modules and see whether a particular hit occurred at the same time? Putting together the code to make this happen involved several people (mainly Jakob and Steve), and wouldn’t have been possible coding in isolation.

Visual Cortex Processing

The moment that best captured the purpose of a good event viewer, Steve says, was when he animated an event for the first time. Specifically, he made the observed phototube pulses disappear as the charge died away, letting him see what happens on a phototube after the first signal. Animating the signal pulses made the afterpulsing “blindingly obvious.”

We know, on an intellectual level, that phototubes display afterpulsing, and it’s especially strong and likely after a strong signal pulse. But there’s a difference between knowing, intellectually, that a certain fraction of pulses will produce afterpulses and seeing those afterpulses displayed. We process information very differently if we can see it directly than if we have to construct a model in our heads based on interpreting numbers, or even graphs. An animation connects more deeply to our intuition and natural instinctive processes.
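One way to picture why the animation made afterpulsing so obvious: if each displayed pulse fades as its charge dies away, a small afterpulse becomes visible once the big primary pulse has faded. A toy decay model (the time constants and charges here are invented, not real phototube parameters) makes the point:

```python
import math

def displayed_radius(t, t_start, charge, tau=1.0):
    """Toy model: a pulse's on-screen size decays exponentially
    after its start time, so old pulses fade from the frame."""
    if t < t_start:
        return 0.0
    return charge * math.exp(-(t - t_start) / tau)

# A large primary pulse at t=0 and a small afterpulse at t=6:
primary_at_6 = displayed_radius(6.0, 0.0, charge=10.0)
afterpulse_at_6 = displayed_radius(6.0, 6.0, charge=1.0)
# By t=6 the primary has faded well below the afterpulse, so the
# afterpulse dominates the frame even though its charge is 10x smaller.
```

In a static picture, the ten-times-larger primary pulse would swamp the afterpulse; letting pulses fade over time is what makes the smaller, later signal jump out.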

As Steve put it: “It brings to sharp relief something you only knew about in sort of a complex, long thought out way. The cool thing about visualization is that you can get things onto a screen that your brain will notice pre-cognitively; you don’t even have to consciously think to distinguish between a red square and a blue square. So even if you know that two things are different, from having looked carefully through the math, if you see those things in a picture, the difference jumps out without you even having to think about it. Your visual cortex does the work for you. […] That was one of the coolest moments for me, when these people who understood the physics in a deep way nonetheless were able to get new insights on it just by seeing the data displayed in a new way.”

Millions around the world, both scientists and non-scientists, use Scientific Linux, an operating system developed for particle physics. Photo: Reidar Hahn

When a handful of developers at Fermilab modified a computer operating system for use in particle physics, they had no idea their creation would eventually be used by millions inside and outside of science.

Today’s version of the system, called Scientific Linux, runs on computers around the world: at top universities, national laboratories and even in low Earth orbit on the International Space Station. An alternative to Windows or Mac, it has attracted the attention of people from a variety of fields. For example, at the University of Wisconsin at Madison, where the majority of the campus grid is running Scientific Linux, students in fields as diverse as statistics, chemical engineering, economics and avian research use the operating system.

Lauren Michael, a research computing facilitator at UW-Madison’s Center for High Throughput Computing, calls Scientific Linux a powerful tool “enabling researchers from all disciplines.”

When Fermilab Lead Scientific Linux Developer Connie Sieh started the development of the first iteration of the system in 1997, though, she was just looking for cheaper hardware.

In the early 1990s, Fermilab scientists used proprietary operating systems from companies like IBM and SGI, Sieh says. But in 1997, as personal computers became more commonplace, Linux and other free operating systems did, too—for everyday people and, especially, scientists.

So when a computing-heavy project came up at Fermilab, Sieh opted to replace the more expensive IBM and SGI hardware and the software that came with those machines. The new software she decided on was a version of Linux distributed by software company RedHat Inc., mostly because it was free and had the option to be installed in batches, which would save a ton of time. At the same time, RedHat’s Linux was simple enough for scientists to install at their desktops on their own. The computing project, running on Linux, was successful, so the laboratory kept using it.

In 1998, Fermilab released a product called FermiLinux, tailored to fit the lab’s needs.

It was possible to modify the operating system only because, in addition to being free, RedHat’s Linux comes with its source code fully included. This would be a little like a car company supplying detailed blueprints of its cars to every customer and its competitors. Open-source software allows customers to customize a product to meet their exact specifications.

“They go above and beyond what they have to do, as far as releasing the source code,” Sieh says.

Fermilab continued to use FermiLinux until 2003, when RedHat announced that it would start charging money for its product. It took only about a week for Fermilab to use the source code from RedHat’s no-longer-free product to get its own, freely accessible version up and running—what would become Scientific Linux.

By early 2004, a collaboration of developers from Fermilab, CERN and a few other labs released Scientific Linux for the entire high-energy physics community to use. That operating system is the same one that millions of scientists and non-scientists use, free of charge, to this day.

Whenever RedHat releases an update, about once every six months, Fermilab purchases it, and the lab’s tiny team of developers—currently, just Fermilab’s Sieh, Pat Riehecky and Bonnie King—work in overdrive to get their version out soon after, adding tools and customizations they think will be useful.

Aside from big users like the national labs, Sieh says, about 140,000 others run Scientific Linux. And, of course, the program is still widely used in the field it was first meant to serve. Its global presence ensures some consistency and unity across many large institutions.

Alec Habig, a physics professor at the University of Minnesota, Duluth, says when his students visit other institutions to do research, “they know what they’re doing already,” having become familiar with the operating system at the university.

“It’s a good tool for the job,” he says. “It helps our students get a leg-up on the research.”

The art of data mining is about searching for the extraordinary within a vast ocean of regularity. This can be a painful process in any field, but especially in particle physics, where the amount of data can be enormous, and ‘extraordinary’ means a new understanding about the fundamental underpinnings of our universe. Now, a tool first conceived in 2005 to manage data from the world’s largest particle accelerator may soon push the boundaries of other disciplines. When repurposed, it could bring the immense power of data mining to a variety of fields, effectively cracking open the possibility for more discoveries to be pulled up from ever-increasing mountains of scientific data.

Advanced data management tools offer scientists a way to cut through the noise by analyzing information across a vast network. The result is a searchable pool that software can sift through and use for a specific purpose. One such hunt was for the Higgs boson, the last remaining elementary particle of the Standard Model that, in theory, endows other particles with mass.

With the help of a system called PanDA, or Production and Distributed Analysis, researchers at CERN’s Large Hadron Collider (LHC) in Geneva, Switzerland, discovered such a particle by slamming protons together at relativistic speeds hundreds of millions of times per second. The data produced from those trillions of collisions—roughly 13 million gigabytes of raw information—was processed by the PanDA system across a worldwide network and made available to thousands of scientists around the globe. From there, they were able to pinpoint an unknown boson with a mass between 125 and 127 GeV, a characteristic consistent with the long-sought Higgs.

The sheer amount of data arises from the fact that each particle collision carries unique signatures that compete for attention with the millions of other collisions happening nanoseconds later. These must be recorded, processed, and analyzed as distinct events in a steady stream of information.

This is the first part of a two-part series on the contribution Tevatron-related computing has made to the world of computing. This part begins in 1981, when the Tevatron was under construction, and brings us up to recent times. The second part will focus on the most recent years, and look ahead to future analysis.

Few laypeople think of computing innovation in connection with the Tevatron particle accelerator, which shut down earlier this year. Mention of the Tevatron inspires images of majestic machinery, or thoughts of immense energies and groundbreaking physics research, not circuit boards, hardware, networks, and software.

Yet over the course of more than three decades of planning and operation, a tremendous amount of computing innovation was necessary to keep the data flowing and physics results coming. In fact, computing continues to do its work. Although the proton and antiproton beams no longer brighten the Tevatron’s tunnel, physicists expect to be using computing to continue analyzing a vast quantity of collected data for several years to come.

When all that data is analyzed, when all the physics results are published, the Tevatron will leave behind an enduring legacy. Not just a physics legacy, but also a computing legacy.

In the beginning: The fixed-target experiments

This image of an ACP system was taken in 1988. Photo by Reidar Hahn.

1981. The first Indiana Jones movie is released. Ronald Reagan is the U.S. President. Prince Charles makes Diana a Princess. And the first personal computers are introduced by IBM, setting the stage for a burst of computing innovation.

Meanwhile, at the Fermi National Accelerator Laboratory in Batavia, Illinois, the Tevatron has been under development for two years. And in 1982, the Advanced Computer Program (ACP) formed to confront key particle physics computing problems. ACP tried something new in high-performance computing: building custom systems using commercial components, which were rapidly dropping in price thanks to the introduction of personal computers. For a fraction of the cost, the resulting 100-node system doubled the processing power of Fermilab’s contemporary mainframe-style supercomputers.

“The use of farms of parallel computers based upon commercially available processors is largely an invention of the ACP,” said Mark Fischler, a Fermilab researcher who was part of the ACP. “This is an innovation which laid the philosophical foundation for the rise of high throughput computing, which is an industry standard in our field.”

The Tevatron fixed-target program, in which protons were accelerated to record-setting speeds before striking a stationary target, launched in 1983 with five separate experiments. When ACP’s system went online in 1986, the experiments were able to rapidly work through an accumulated three years of data in a fraction of that time.

Entering the collider era: Protons and antiprotons and run one

1985. NSFNET (National Science Foundation Network), one of the precursors to the modern Internet, is launched. And the Tevatron’s CDF detector sees its first proton-antiproton collisions, although the Tevatron’s official collider run one won’t begin until 1992.

CDF’s central computing architecture filtered incoming data by running Fortran-77 algorithms on ACP’s 32-bit processors. But for run one, the experiment needed more powerful computing systems.

By that time, commercial workstation prices had dropped so low that networking them together was simply more cost-effective than a new ACP system. ACP had one more major contribution to make, however: the Cooperative Processes Software.

CPS divided a computational task into a set of processes and distributed them across a processor farm – a collection of networked workstations. Although the term “high throughput computing” was not coined until 1996, CPS fits the HTC mold. As with modern HTC, farms using CPS are not supercomputer replacements. They are designed to be cost-effective platforms for solving specific compute-intensive problems in which each byte of data read requires 500-2000 machine instructions.
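The pattern CPS pioneered survives in every modern high throughput system: independent per-event tasks fanned out across workers. As a rough modern sketch (threads on one machine stand in for CPS's farm of networked workstations, and the per-event function is a made-up stand-in for real reconstruction code):

```python
from concurrent.futures import ThreadPoolExecutor

def process_event(event):
    # Stand-in for a compute-heavy per-event calculation.
    return sum(x * x for x in event)

# Each "event" is independent of the others, so the farm can
# process them in any order and on any worker.
events = [[1, 2, 3], [4, 5], [6], [7, 8]]
with ThreadPoolExecutor(max_workers=2) as farm:
    results = list(farm.map(process_event, events))
```

Because no event depends on another, throughput scales by simply adding workers, which is exactly what made cheap networked workstations competitive with supercomputers for this class of problem.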

CPS went into production-level use at Fermilab in 1989; by 1992 it was being used by nine Fermilab experiments as well as a number of other groups worldwide.

1992 was also the year that the Tevatron’s second detector experiment, DZero, saw its first collisions. DZero launched with 50 traditional compute nodes running in parallel, connected to the detector electronics; the nodes executed filtering software written in Fortran, E-Pascal, and C.

Gearing up for run two

"The Great Wall" of 8mm tape drives at the Tagged Photon Laboratory, circa 1990 - from the days before tape robots. Photo by Reidar Hahn.

1990. CERN’s Tim Berners-Lee launches the first publicly accessible World Wide Web server using his URL and HTML standards. One year later, Linus Torvalds releases Linux to several Usenet newsgroups. And both DZero and CDF begin planning for the Tevatron’s collider run two.

Between the end of collider run one in 1996 and the beginning of run two in 2001, the accelerator and detectors were scheduled for substantial upgrades. Physicists anticipated more particle collisions at higher energies, and multiple interactions that were difficult to analyze and untangle. That translated into managing and storing 20 times the data from run one, and a growing need for computing resources for data analysis.

Enter the Run Two Computing Project (R2CP), in which representatives from both experiments collaborated with Fermilab’s Computing Division to find common solutions in areas ranging from visualization and physics analysis software to data access and storage management.

R2CP officially launched in 1996, in the early days of the dot-com era. eBay had existed for a year, and Google was still under development. IBM’s Deep Blue defeated chess master Garry Kasparov. And Linux was well established as a reliable open-source operating system. The stage was set for experiments to get wired and start transferring their irreplaceable data to storage via Ethernet.

The high-tech tape robot used today. Photo by Reidar Hahn.

“It was a big leap of faith that it could be done over the network rather than putting tapes in a car and driving them from one location to another on the site,” said Stephen Wolbers, head of the scientific computing facilities in Fermilab’s computing sector. He added ruefully, “It seems obvious now.”

The R2CP’s philosophy was to use commercial technologies wherever possible. In the realm of data storage and management, however, none of the existing commercial software met their needs. To fill the gap, teams within the R2CP created Enstore and the Sequential Access Model (SAM, which later stood for Sequential Access through Meta-data). Enstore interfaces with the data tapes stored in automated tape robots, while SAM provides distributed data access and flexible dataset history and management.
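The division of labor described above can be sketched in a few lines: files are cataloged with metadata, and an analysis requests a dataset by query rather than by physical tape location. Everything below (the field names, the catalog, the `define_dataset` helper) is illustrative, not SAM’s actual schema or API.

```python
# Toy sketch of metadata-driven data access in the spirit of SAM.
# Field names and file names are hypothetical.

CATALOG = [
    {"file": "d0_run1001_raw.dat",  "run": 1001, "stream": "raw",  "tape": "VOL001"},
    {"file": "d0_run1001_reco.dat", "run": 1001, "stream": "reco", "tape": "VOL002"},
    {"file": "d0_run1002_reco.dat", "run": 1002, "stream": "reco", "tape": "VOL002"},
]

def define_dataset(**criteria):
    """Return the files whose metadata matches every given criterion."""
    return [f["file"] for f in CATALOG
            if all(f.get(k) == v for k, v in criteria.items())]

print(define_dataset(stream="reco"))           # every reconstructed file
print(define_dataset(run=1001, stream="raw"))  # ['d0_run1001_raw.dat']
```

The point of the design is that users never name tapes: the catalog maps a metadata query to files, and the storage layer (Enstore, in the real system) worries about where those files physically live.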

By the time the Tevatron’s run two began in 2001, DZero was using both Enstore and SAM, and by 2003, CDF was also up and running on both systems.

Linux comes into play

The R2CP’s PC Farm Project targeted the issue of computing power for data analysis. Between 1997 and 1998, the project team successfully ported CPS and CDF’s analysis software to Linux. To take the next step and deploy the system more widely for CDF, however, they needed their own version of Red Hat Linux. Fermi Linux was born, offering improved security and a customized installer, and CDF migrated to the PC Farm model in 1998.

The early computer farms at Fermilab, when they ran a version of Red Hat Linux (circa 1999). Photo by Reidar Hahn.

Fermi Linux enjoyed limited adoption outside of Fermilab until 2003, when Red Hat stopped providing its distribution free of charge. The Fermi Linux team rebuilt Red Hat Enterprise Linux into the prototype of Scientific Linux and formed partnerships with colleagues at CERN in Geneva, Switzerland, as well as a number of other institutions. Scientific Linux was designed for site customization, so that in supporting the base distribution the team also supported variants such as Scientific Linux Fermi and Scientific Linux CERN.

Today, Scientific Linux is ranked 16th among open source operating systems; the latest version was downloaded over 3.5 million times in the first month following its release. It is used at government laboratories, universities, and even corporations all over the world.

“When we started Scientific Linux, we didn’t anticipate such widespread success,” said Connie Sieh, a Fermilab researcher and one of the leads on the Scientific Linux project. “We’re proud, though, that our work allows researchers across so many fields of study to keep on doing their science.”

Grid computing takes over

As both CDF and DZero datasets grew, so did the need for computing power. Dedicated computing farms reconstructed data, and users analyzed it using separate computing systems.

“As we moved into run two, people realized that we just couldn’t scale the system up to larger sizes,” Wolbers said. “We realized that there was really an opportunity here to use the same computer farms that we were using for reconstructing data, for user analysis.”

A wide-angle view of the modern Grid Computing Center at Fermilab. Today, the GCC provides computing to the Tevatron experiments as well as the Open Science Grid and the Worldwide Large Hadron Collider Computing Grid. Photo by Reidar Hahn.

Today, the concept of opportunistic computing is closely linked to grid computing. But in 1996 the term “grid computing” had yet to be coined. The Condor Project had been developing tools for opportunistic computing since 1988. In 1998, the first Globus Toolkit was released. Experimental grid infrastructures were popping up everywhere, and in 2003, Fermilab researchers, led by DZero, partnered with the US Particle Physics Data Grid, the UK’s GridPP, CDF, the Condor team, the Globus team, and others to create the Job and Information Management system, JIM. Combining JIM with SAM resulted in a grid-enabled version of SAM: SAMGrid.

“A pioneering idea of SAMGrid was to use the Condor Match-Making service as a decision-making broker for routing jobs, a concept that was later adopted by other grids,” said Fermilab-based DZero scientist Adam Lyon. “This is an example of the DZero experiment contributing to the development of the core Grid technologies.”
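The match-making idea Lyon describes pairs each job’s requirements against advertisements of what each resource offers (Condor’s “ClassAds”), then ranks the matches. The sketch below is a toy illustration of that broker pattern; the attribute names, jobs, and ranking rule are all hypothetical, not Condor’s actual interface.

```python
# Toy ClassAd-style match-making broker. All names are hypothetical.

def matches(job, resource):
    """A job matches a resource when every requirement is satisfied."""
    return all(resource.get(k, 0) >= v for k, v in job["requirements"].items())

def rank(resource):
    """Among matching resources, prefer the one with the most memory."""
    return resource.get("memory_mb", 0)

def broker(jobs, resources):
    """Route each job to the best-ranked matching resource, if any."""
    assignments = {}
    for job in jobs:
        candidates = [r for r in resources if matches(job, r)]
        if candidates:
            assignments[job["name"]] = max(candidates, key=rank)["name"]
    return assignments

jobs = [{"name": "reco", "requirements": {"cpus": 4, "memory_mb": 2048}}]
resources = [
    {"name": "farm-a", "cpus": 8, "memory_mb": 4096},
    {"name": "farm-b", "cpus": 2, "memory_mb": 8192},  # too few CPUs to match
]
print(broker(jobs, resources))  # {'reco': 'farm-a'}
```

The appeal of the pattern is symmetry: both sides declare constraints and preferences, and the broker, not the user, decides where a job runs.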

By April 2003, the SAMGrid prototype was running on six clusters across two continents, setting the stage for the transition to the Open Science Grid in 2006.

From the Tevatron to the LHC – and beyond

Throughout run two, researchers continued to improve the computing infrastructure for both experiments, and a number of computing innovations emerged before the run ended in September 2011. Among these was CDF’s GlideCAF, a system that used the Condor glide-in mechanism and Generic Connection Brokering to let CDF submit jobs to the Open Science Grid. GlideCAF served as the starting point for the subsequent development of the more generic glidein Workload Management System. Today glideinWMS is used by a wide variety of research projects across many disciplines.

Another notable contribution was the Frontier system, which was originally designed by CDF to distribute data from central databases to numerous clients around the world. Frontier is optimized for applications where there are large numbers of widely distributed clients that read the same data at about the same time. Today, Frontier is used by CMS and ATLAS at the LHC.
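Frontier’s sweet spot, many clients reading the same data at about the same time, is the classic case for a read-through cache: only the first request reaches the central database, and every later request is served from the cache. The sketch below illustrates that access pattern only; the query strings and fetch function are hypothetical, not Frontier’s protocol.

```python
# Toy read-through cache illustrating the read-mostly pattern Frontier targets.

class ReadThroughCache:
    def __init__(self, fetch_from_db):
        self.fetch_from_db = fetch_from_db  # expensive central-database lookup
        self.store = {}
        self.db_hits = 0

    def get(self, query):
        if query not in self.store:         # only the first client pays
            self.store[query] = self.fetch_from_db(query)
            self.db_hits += 1
        return self.store[query]

def central_db(query):
    return f"payload for {query}"

cache = ReadThroughCache(central_db)
# A thousand clients ask for the same calibration data at about the same time:
results = [cache.get("calib/run42") for _ in range(1000)]
print(cache.db_hits)  # the central database was queried only once
```

In the real system the cache is a hierarchy of HTTP proxies rather than an in-process dictionary, but the economics are the same: load on the central database grows with the number of distinct queries, not the number of clients.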

“By the time the Tevatron shut down, DZero was processing collision events in near real-time and CDF was not far behind,” said Patricia McBride, the head of scientific programs in Fermilab’s computing sector. “We’ve come a long way; a few decades ago the fixed-target experiments would wait months before they could conduct the most basic data analysis.”

One of the key outcomes of computing at the Tevatron was the expertise developed at Fermilab over the years. Today, the Fermilab computing sector has become a worldwide leader in scientific computing for particle physics, astrophysics and related fields. Many of the field’s top experts worked on computing for the Tevatron; some have since moved on to work elsewhere, while others remain at Fermilab, where work continues on Tevatron data analysis, a variety of Fermilab experiments and, of course, the LHC.

The accomplishments of the many contributors to Tevatron-related computing are noteworthy. But there is a larger picture here.

“Whether in the form of concepts, or software, over the years the Tevatron has exerted an undeniable influence on the field of scientific computing,” said Ruth Pordes, Fermilab’s head of grids and outreach. “We’re very proud of the computing legacy we’ve left behind for the broader world of science.”

On May 26, 2005, a new supercomputer, a pioneering giant of its time, was unveiled at Brookhaven National Laboratory at a dedication ceremony attended by physicists from around the world. That supercomputer was called QCDOC, for quantum chromodynamics (QCD) on a chip, and it was capable of handling the complex calculations of QCD, the theory that describes the nature and interactions of the basic building blocks of the universe. Now, after a career of state-of-the-art physics calculations, QCDOC has been retired and will soon be replaced by a new, next-generation machine.

Here I am at CERN, for the first time in more than three months. When I was here this summer, I stayed for five weeks and had my family along with me. Now I’m just here for a short stay and rooming in the hostel again. But in some ways, it feels like I never left. (Except for the jet lag, of course.) The exciting times continue on the LHC experiments. We are under two weeks from the end of this year’s proton run, and we are eager to gather every last bit of data we can before the heavy-ion run and then a technical stop that won’t end until sometime in March. The dataset that we will end with will be more than twice as big as that which we analyzed for results that went to conferences this summer, so it will be very interesting to see what emerges with the additional data.

You might not have heard, but since the last time I posted, Apple co-founder and CEO Steve Jobs died. Obviously Jobs had a huge impact on how we live in our technological world. In the days after his death, I read articles discussing his influence on computing, design, music, publishing, politics, and so forth. Eager to jump onto the bandwagon, I decided to take a pilgrimage to the CERN visitor center at the Globe, located across the street from the Meyrin site. There, you can find this computer in a display area:

A NeXT computer, from 1990.

The ratty sticker on the front implores passers-by not to shut down the computer. The computer is a NeXT, a product of the company that Jobs founded after he was forced out of Apple in the 1980s. This happens to be the computer that belonged to Tim Berners-Lee, the first developer of what we now know as the World Wide Web, and it hosted the first Web server. (Do not shut down, indeed! Someone on the other side of the world might be using that computer.)

It’s true, we trot this one out a lot in particle physics, but the Web was invented by particle physicists to be used as an information and document sharing system, and it ended up changing the world. Particle physics has driven many developments in computer science over the years, as we’ve long had large datasets and computationally intensive problems. These days, I feel like I see a lot of back and forth between particle physics and the computing world. Because of the scale of the data volume that we serve and the number of users who want to access it, and because we’re trying to do it relatively cheaply, we’ve moved to a model of distributed computing that is realized in the Worldwide LHC Computing Grid. Grid computing, which allows straightforward access to computing resources owned by others that aren’t being used at the moment, has been adopted across sciences that do large-scale computing, and cloud computing is an offshoot of this development.

At the same time, we are definitely making use of computing technologies that have been developed in the commercial world. My favorite example of this is Hadoop. It’s a very powerful set of tools, and many US LHC computing sites are using its disk-management system, the Hadoop Distributed File System, which is also used by Web sites like Facebook. It has good scaling properties and is easy to maintain, making life easier for site operators. We’re always on the lookout for new ideas that we can bring in from the computing world that will make it easier for physicists to make the most out of the LHC data.

Thanks to all of these tools, someone — perhaps very soon — will be making a plot that could show evidence for new physical phenomena. It wouldn’t be possible without the computing systems that I just described. Will this plot be viewed for the first time on the screen of an Apple product? Will that very screen end up in a display at the Globe? We’ll see.

It’s Labor Day weekend here in the US, but over at CERN it’s the end of the August technical stop for the LHC. To rework a common saying, this is the first day of the rest of the 2011 run. We have two months left of proton-proton collisions, followed by one month of lead-lead collisions, and then in December we’ll have the holiday “extended technical stop” that will probably extend to the spring.

We’re expecting an important change in running conditions once we return from the technical stop, and that is a change in how the beams are focused. This will lead to an increased rate of collisions. Remember that the proton beams are “bunched”; the beam is not a continuous stream of particles but bunches with a large separation between them. The change in the focusing will help make the bunches more compact, and that in turn will mean that there will be more proton collisions every time a pair of bunches pass through each other. When our detectors record data, they record an entire bunch crossing as a single event. Thus, each individual event will be busier, with more collisions and more particles produced.
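The effect of busier bunch crossings can be illustrated with a toy Poisson model: the number of interactions in each crossing fluctuates around a mean set by the beam focusing, so tightening the focus shifts the whole distribution upward. The mean values below are purely illustrative, not actual LHC running conditions.

```python
# Toy pileup model: interactions per bunch crossing drawn from a Poisson
# distribution. The means (5 and 15) are illustrative, not LHC values.
import math
import random

def simulate_crossings(mean_interactions, n_crossings, seed=1):
    """Draw a Poisson-distributed interaction count for each bunch crossing."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's inversion method: multiply uniforms until below exp(-lam).
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                return k
            k += 1

    return [poisson(mean_interactions) for _ in range(n_crossings)]

before = simulate_crossings(mean_interactions=5, n_crossings=10000)
after = simulate_crossings(mean_interactions=15, n_crossings=10000)
print(sum(before) / len(before))  # close to 5 interactions per crossing
print(sum(after) / len(after))    # close to 15 interactions per crossing
```

Since the detector records a whole crossing as one event, tripling the mean roughly triples the particles per event, which is exactly why CPU time and RAM per event grow along with the collision rate.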

This is good news from a physics perspective — the more collisions happen, the greater the chance that there will be something interesting coming out. But it’s a challenge from an operational perspective. We try to record as many “interesting” events as possible, but we’re ultimately limited by how quickly we can read out the detector and how much space we have to store the data. Given that we’re going to have more data coming into fixed resources, we’re going to have to limit our definition of “interesting” a little further. The busier events are also a greater strain on the software and computing for the experiments (which I focus on). Each event takes more CPU time to process and requires more RAM. Previous experience and simulations give us some guidance as to how all of this will scale up from what we’ve seen so far, but we can’t know for sure without actually doing it. (The original plan for the machine development studies period before the technical stop was supposed to include a small-scale test of this, so that we could put the computing and everything else through its paces. But that got cancelled. I had originally planned to blog about that. Oh well.)

However, all of this will be worth the trouble. Remember all of the excitement of the EPS conference? That was at the end of July, just a little more than a month ago. There is now about twice as much data that can be analyzed. With the increases in collision rate, we might well be able to double the dataset once again just in these next two months. Or, we might do even better. This will have a critical impact on our searches for new phenomena, and could allow the LHC experiments to discover or rule out the standard-model Higgs boson by the end of this year. Coming soon, to a theater near you.

The field of high-energy physics has always considered itself a family. To address some of the largest questions, such as how were we and the universe formed, it takes building-sized machines, enormous computing power and more resources than one nation can muster. This necessary collaboration has forged strong bonds among physicists and engineers across the globe.

So naturally, when a tsunami and a series of earthquakes struck Japan on March 11, physicists in the U.S. started asking how they could help the country that is home to one of the world’s largest high-energy physics laboratories and an accelerator research center. It turns out that they have a unique resource to offer: computing power.

Lattice quantum chromodynamics (QCD) is a computational technique used to study the interactions of quarks and gluons, and it requires vast computing power. To help the Japanese continue this analysis, Fermilab and other U.S. labs will share their lattice QCD computing resources.

“We’re very happy that the shared use of our resources can allow our Japanese colleagues to continue their research during a time of crisis,” said Fermilab theoretical physicist Paul Mackenzie, spokesperson for the USQCD collaboration.

From now until the end of 2011, while computing facilities in eastern Japan face continuing electricity shortages, a percentage of the computing power at Brookhaven National Laboratory on Long Island, Fermi National Accelerator Laboratory near Chicago and Thomas Jefferson National Accelerator Facility in Virginia will be made available to the Japanese lattice QCD community.

“We appreciate the support from the U.S. QCD community,” said University of Tsukuba Vice President Akira Ukawa, spokesperson of the Japanese Lattice QCD community. “The sharing of resources will not only be instrumental to continue research in Japan through the current crisis, but will also mark a significant step in strengthening the international collaboration for progress in our field.”