Federating and Correlating Sky Surveys

Astronomers are receiving a flood of information about the universe from spacecraft and ground-based observatories. The comprehensive digital sky surveys of the last decade, compiled with the latest generation of telescopes and electronic detectors for x-rays, ultraviolet, visible light, infrared, and radio-frequency radiation, are amenable to computer analysis and form an unprecedented resource for researchers, far more useful than the photographic atlases and printed catalogs of previous generations. But are important discoveries being overlooked because each survey, covering a limited region of the electromagnetic spectrum, gives only part of the picture? The Digital Sky project is federating individual sky surveys into a comprehensive digital archive of the entire sky. The latest efforts by NPACI researchers implement new types of correlations between surveys, which will not only enable astronomers to mine the data for specific types of objects but also facilitate the discovery of new and unexpected classes of astronomical phenomena.

Figure 1. One Galaxy, Three Wavelengths
Various catalogs differ in resolution, accuracy, and limiting magnitude. These images, centered on the galaxy UGC 00480, were taken in blue light (POSS-II, top), near-infrared (2MASS, middle), and radio (NVSS, bottom) wavelengths.

Among the surveys being federated in Digital Sky are the visible-light Digital Palomar Observatory Sky Survey (DPOSS), the near-infrared 2-Micron All Sky Survey (2MASS), the NRAO VLA Sky Survey (NVSS) and VLA FIRST radio surveys, and the ROSAT faint x-ray source catalog. Digital Sky is working closely with the Infrared Science Archive (IRSA), which has supplied archiving services and expertise.

Work is proceeding on several fronts. At Caltech's Infrared Processing and Analysis Center (IPAC), John Good is leading an effort to mosaic 2MASS images. Reagan Moore's DICE group at SDSC is working with the 2MASS and IRSA projects to repackage more than 10 terabytes of high-resolution 2MASS data into SDSC Storage Resource Broker (SRB) "containers" to facilitate information retrieval from SDSC's and Caltech's HPSS mass storage systems. Roy Williams at Caltech's Center for Advanced Computing Research is adapting these images, along with classical star maps, to the Virtual Sky educational resource. And Caltech researcher Robert Brunner is leading an effort to apply new cross-correlations to the federated catalogs.

THE CROSS-IDENTIFICATION CHALLENGE

"Cross-identification of survey catalogs involves billions of sources over large areas of the sky," said Brunner. "The challenge is compounded by the fact that various surveys have intrinsically different resolutions, coordinate systems, and data representations that must be reconciled. We'll also need to identify sources that change over time (transient events, intrinsically variable sources, and moving objects), and this eventually will increase the amount of data from terabytes to petabytes."
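Reconciling coordinate systems is, at its simplest, a rotation on the celestial sphere. The Python sketch below converts between equatorial (J2000/ICRS) and galactic coordinates using the standard rotation matrix quoted to ten decimal places; the function names are illustrative and not part of any Digital Sky tool. It handles only the frame rotation: precession between epochs (for example B1950 to J2000), proper motion, and survey-specific astrometric calibration are deliberately left out.

```python
import math

# Standard rotation matrix taking equatorial (ICRS/J2000) unit vectors to
# galactic unit vectors. Rows are the galactic x (toward the Galactic Center),
# y, and z (toward the North Galactic Pole) axes in equatorial coordinates.
EQ_TO_GAL = (
    (-0.0548755604, -0.8734370902, -0.4838350155),
    (+0.4941094279, -0.4448296300, +0.7469822445),
    (-0.8676661490, -0.1980763734, +0.4559837762),
)

def _to_vector(lon_deg, lat_deg):
    """Spherical angles in degrees -> Cartesian unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def _to_angles(v):
    """Cartesian unit vector -> (longitude in [0, 360), latitude) in degrees."""
    x, y, z = v
    lon = math.degrees(math.atan2(y, x)) % 360.0
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))  # clamp rounding
    return lon, lat

def _rotate(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def equatorial_to_galactic(ra_deg, dec_deg):
    """(RA, Dec) in degrees -> galactic (l, b) in degrees."""
    return _to_angles(_rotate(EQ_TO_GAL, _to_vector(ra_deg, dec_deg)))

def galactic_to_equatorial(l_deg, b_deg):
    """Galactic (l, b) in degrees -> (RA, Dec) in degrees."""
    # A rotation matrix is orthogonal, so its inverse is its transpose.
    transpose = tuple(zip(*EQ_TO_GAL))
    return _to_angles(_rotate(transpose, _to_vector(l_deg, b_deg)))
```

Working with unit vectors rather than spherical trigonometry keeps the conversion free of special cases near the poles, and the inverse transformation comes for free as the matrix transpose.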

Until recently, sky surveys have been federated only by spatial proximity: items in different catalogs were associated because they occupied the same location on the sky (Figure 1). Building on earlier work at IRSA, Digital Sky researchers are now developing a more powerful approach that uses all available data, applying Bayesian statistics to estimate the probability that objects in different catalogs are the same source.
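To make "probability of association" concrete, here is a minimal Python sketch of one widely used formulation for two catalogs, assuming each position carries a circular Gaussian error. The Bayes factor compares the hypothesis that two detections are the same object with the hypothesis that they are unrelated, and a prior turns it into a posterior probability. The prior value used below is hypothetical (in practice it depends on source densities and survey overlap), and this illustrates the general approach rather than the specific algorithm Digital Sky or IRSA implemented.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for the tiny separations of matches.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, h)))

def bayes_factor(psi, sigma1, sigma2):
    """Likelihood ratio 'same object' vs 'unrelated' for two detections with
    circular Gaussian position errors sigma1, sigma2 and separation psi
    (all in radians, small-angle limit):
        B = 2 / (s1^2 + s2^2) * exp(-psi^2 / (2 (s1^2 + s2^2)))"""
    s2 = sigma1 ** 2 + sigma2 ** 2
    return (2.0 / s2) * math.exp(-psi ** 2 / (2.0 * s2))

def association_probability(bf, prior):
    """Posterior P = 1 / (1 + (1 - prior) / (B * prior)), with prior in (0, 1]."""
    if bf <= 0.0:
        return 0.0  # exp() underflowed: the pair is far too distant to match
    return 1.0 / (1.0 + (1.0 - prior) / (bf * prior))

if __name__ == "__main__":
    sigma = 0.3 * ARCSEC  # assumed 0.3" position error in each catalog
    prior = 1e-9          # hypothetical prior probability of a true match
    for offset in (0.5, 5.0):  # separations in arcseconds
        psi = angular_separation(150.0, 2.0, 150.0, 2.0 + offset / 3600.0)
        p = association_probability(bayes_factor(psi, sigma, sigma), prior)
        print(f'{offset}" apart: P(match) = {p:.3g}')
```

With these assumed errors, detections 0.5 arcseconds apart come out as a near-certain match, while a 5-arcsecond offset drives the probability essentially to zero. Secondary parameters such as colors or redshift could in principle enter the same framework as additional likelihood terms.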

"Partly because the individual catalogs differ in spatial resolution, calibration accuracy, and limiting magnitude," Brunner explained, "we will associate objects in the multi-wavelength federated archive using a priori astrophysical knowledge and secondary parameters such as redshift, colors, or variability, in addition to location on the sky. We plan to produce a data federation toolkit that will enable end users to rapidly cross-identify both image and catalog datasets using either pre-defined association rules or their own custom-defined rules. We will publicly release our databases and tools as soon as they are scientifically verified."
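A toolkit that supports both pre-defined and custom association rules could, for instance, treat each rule as a function that accepts or rejects a candidate pair of sources. The Python sketch below is hypothetical throughout: `Source`, `within`, `color_between`, `all_of`, and `cross_identify` are invented names, not the Digital Sky toolkit's API. It combines a positional cut with a color cut, and its brute-force pairing loop stands in for the spatial indexing a billion-source catalog would require.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Source:
    """Hypothetical catalog entry: position plus secondary parameters."""
    id: str
    ra: float   # degrees
    dec: float  # degrees
    params: Dict[str, float] = field(default_factory=dict)  # magnitudes, redshift...

# An association rule decides whether two sources may be the same object.
Rule = Callable[[Source, Source], bool]

def separation_arcsec(a: Source, b: Source) -> float:
    """Great-circle separation in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a.ra, a.dec, b.ra, b.dec))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0

def within(radius_arcsec: float) -> Rule:
    """Pre-defined rule: the two positions agree to within a radius."""
    return lambda a, b: separation_arcsec(a, b) <= radius_arcsec

def color_between(band_a: str, band_b: str, lo: float, hi: float) -> Rule:
    """Pre-defined rule: color (band_a of a minus band_b of b) lies in [lo, hi].
    Pairs missing either band are rejected."""
    def rule(a: Source, b: Source) -> bool:
        if band_a not in a.params or band_b not in b.params:
            return False
        return lo <= a.params[band_a] - b.params[band_b] <= hi
    return rule

def all_of(*rules: Rule) -> Rule:
    """Combine rules: every rule must accept the pair."""
    return lambda a, b: all(r(a, b) for r in rules)

def cross_identify(cat_a: List[Source], cat_b: List[Source],
                   rule: Rule) -> List[Tuple[str, str]]:
    """Return id pairs accepted by the rule (brute force, O(N*M))."""
    return [(a.id, b.id) for a in cat_a for b in cat_b if rule(a, b)]
```

Because a custom rule is just any function of two sources, a user whose catalogs both carry a redshift `z` could pass something like `lambda a, b: abs(a.params["z"] - b.params["z"]) < 0.01`, alone or inside `all_of`.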

Large numbers of objects with similar characteristics (normal stars such as the Sun, for example) form "data clouds" when their parameters are plotted along multiple axes of a graph. Points in dense clouds most likely represent well-understood phenomena. Outside these clouds, however, sparse groups of points and even isolated points may indicate rare or unusual objects, perhaps of unknown types.

"Opportunities for new discoveries come from mining the data for these anomalous nuggets," Brunner said. "On the other hand, anomalous points may indicate catalog errors or equipment malfunctions, which we also need to know about."
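One simple way to find the sparse points outside the data clouds is to score each object by the distance to its k-th nearest neighbor in parameter space: objects inside dense clouds have close neighbors, isolated ones do not. The Python sketch below illustrates that generic technique, not the method Brunner's group used; `k`, the threshold `factor`, and the function names are assumptions, and in practice each parameter (magnitudes, colors, redshift) should first be scaled to comparable units.

```python
import math
import statistics
from typing import List, Sequence

def knn_distance_scores(points: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """For each point, the Euclidean distance to its k-th nearest neighbor.
    Points inside dense data clouds get small scores; isolated points get
    large ones."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_outliers(points: Sequence[Sequence[float]], k: int = 3,
                  factor: float = 3.0) -> List[int]:
    """Indices of points whose k-NN distance exceeds `factor` times the
    median score (a hypothetical, tunable threshold)."""
    scores = knn_distance_scores(points, k)
    threshold = factor * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > threshold]
```

Computing all pairwise distances scales as n squared, which is fine for a sketch but not for survey-sized catalogs; those would need a spatial index such as a k-d tree.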

"We plan to use our multi-wavelength statistical approach to investigate several 'hot' astrophysical problems," Brunner said. "What is the relationship between quasars and large-scale structure, and how does it evolve with redshift? How do active galaxies form and evolve? What is the history of star formation in galaxies?"

TOWARD A VIRTUAL OBSERVATORY

"A long-term goal is to grow and evolve the Digital Sky project into a virtual observatory, a comprehensive resource for all astronomers," Brunner said. Such a virtual observatory would be accessible to researchers on the computational grid and would not only encompass current sky surveys and catalogs but also be able to federate new information. Users will need analysis and information discovery tools to access the federated catalogs.

"We expect that both the Digital Sky metadata catalog and our data federation toolset will become cornerstones of a future National Virtual Observatory," Brunner said. -MG