What to Do With 1,000,000,000,000,000,000 Bytes of Astronomical Data per Day

Over the next 12 years, thousands of antennas will be built and installed across a 5,000-kilometer stretch of the southern hemisphere. Satellite dishes, tripod-like dipole antennas, and tiled circular stations will dot arid savannas and together form the biggest, most sensitive radio telescope ever constructed: the Square Kilometer Array (SKA).

The ambitious project, which brings together 67 scientific teams from 20 countries, is the next big thing in global scientific collaboration. (To clarify, the antennas are spread over continent-wide distances, but it is their combined signal-collecting area that totals one square kilometer, the equivalent of a single dish with a square kilometer of surface area.) Like CERN’s Large Hadron Collider (LHC), the SKA is a multiyear, multibillion-dollar enterprise aimed at answering some of the most fundamental questions about deep time and the very nature of the universe. According to Ronald Luijten, a senior manager at IBM’s Zurich Research Lab, “SKA is very similar to the CERN project in terms of the complexity of project itself, the size of the scientific community, and the global nature of the operation.”

Despite these structural and cultural similarities, the SKA breaks new ground in data management and project coordination. The instrument will generate an exabyte of data every day (that is, 1,000,000,000,000,000,000 bytes), more than twice the information sent around the internet on a daily basis and 100 times more than the LHC produces.
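To put an exabyte a day into perspective, the short Python sketch below converts it into the average rate the system would have to sustain around the clock. It assumes decimal (SI) units, where one exabyte is 10^18 bytes, and that the data arrive at an even pace over 24 hours, so it gives an average rather than a peak rate. The function name is illustrative only.

```python
# Back-of-the-envelope conversion of the SKA's quoted output
# (one exabyte per day) into a sustained average data rate.

BYTES_PER_EXABYTE = 10**18        # decimal (SI) exabyte, as in the headline
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def sustained_rate_bytes_per_second(bytes_per_day: int) -> float:
    """Average rate needed to move a daily volume, assuming an even flow over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


rate = sustained_rate_bytes_per_second(BYTES_PER_EXABYTE)
print(f"{rate / 10**12:.1f} TB/s")  # terabytes (10**12 bytes) per second
```

Run as written, it prints a sustained rate of roughly 11.6 terabytes per second, every second of every day.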

This enormous volume of data is a godsend for scientists, but simply storing, sorting, and transferring it is proving to be a major headache. To help solve the problem, the SKA team at the Netherlands Institute for Radio Astronomy (ASTRON) is partnering with IBM on DOME (an enthusiastic non-acronym), a five-year, €32.9 million initiative that aims to lay the foundation for effective data management once the SKA comes online.

“The challenge is fundamentally one of scaling,” notes Luijten, “and the only little issue is that we don’t know how to do this. Today’s technology will not scale with density and energy in order to build the SKA.” Luijten describes the necessary advances as a quantum leap in data storage techniques, “comparable to going from an optical microscope to an electron microscope,” a jump that opened a world of new opportunities to nanotechnologists and biologists.

The DOME team is exploring several approaches. One of the first involves reconfiguring how multiple computer chips are arranged within a server. In most contemporary architectures, individual chips are about 10 centimeters apart. And since 98% of a server’s energy goes to moving information around (just 2% is needed to actually do the computation), even a small reduction in the distance electronic signals must travel can yield significant savings in cost, speed, and energy use.

With this in mind, the DOME team is proposing three-dimensional chip stacking, essentially placing chips directly on top of one another, which would cut the distance between them to a few millimeters. This is low-hanging fruit, to be sure, but risk-averse companies haven’t had a compelling reason to pursue different arrangements. Until now, that is.
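The energy argument can be illustrated with a deliberately simplified model, sketched below in Python. It takes the 98%/2% split quoted above and assumes, hypothetically, that all data-movement energy scales in direct proportion to the distance signals travel while compute energy stays fixed; the 3-millimeter stacked spacing is likewise an illustrative assumption standing in for “a few millimeters.” Real servers also spend energy moving data within chips and to memory, which stacking does not shorten in this way, so the result is an idealized best case, not a prediction.

```python
# Toy model (hypothetical): what share of a server's energy remains if the
# energy spent moving data scales linearly with the distance signals travel?

MOVE_SHARE = 0.98     # share of energy spent moving data (figure quoted in the article)
COMPUTE_SHARE = 0.02  # share spent on computation (figure quoted in the article)


def relative_energy(old_distance_mm: float, new_distance_mm: float) -> float:
    """Total energy after the change, relative to before (1.0 = unchanged).

    Assumes, as a simplification, that ALL data-movement energy is
    proportional to distance and that compute energy is unaffected.
    """
    scale = new_distance_mm / old_distance_mm
    return MOVE_SHARE * scale + COMPUTE_SHARE


# 10 cm (100 mm) apart today versus an assumed 3 mm when stacked
print(f"{relative_energy(100.0, 3.0):.3f}")
```

Under these assumptions it prints 0.049: the toy server's total energy falls to about 5% of its original level, with the fixed 2% compute share now making up a large part of what remains.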

A pretty picture and, if you must know, a map of SKA sensitivity at certain radio frequencies based on the antenna geometry. Image: iAntConfig SKA SA

So what will these exabytes of information tell us? According to ASTRON’s Dr. Albert Jan Boonstra, the SKA will be “about two orders of magnitude more sensitive than the current generation of radio telescopes,” allowing the team to look farther out into the universe, and therefore farther back in time, than any other instrument. Among other projects, analyses of dust clouds forming around stars will show us how planets form and how life-amenable chemical cocktails might be mixed. And in the spirit of extreme optimism, the SKA might even pick up direct radio transmissions from any broadcasting ETs…

And what’s in it for IBM? Dr. Martin Schmatz of IBM Research in Zurich notes that “big data analysis is not only important for astronomers, but more and more important for many industrial applications, like for example in health care.” As more industries generate enormous data sets, curating and analyzing information gets more complicated. IBM envisions bringing exascale technology to some of these more profitable sectors in the coming years, and the SKA provides a convenient testing ground.

To the scientists involved, however, the SKA is no testbed; it’s a transformative instrument that, according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came into existence. As a scientist, this is a once in a lifetime opportunity.”