Automated methods for discovering astrophysical phenomena by sifting through massive amounts of cosmological data are being developed by researchers at Carnegie Mellon University, Johns Hopkins University and the University of Washington under a new three-year, $1.6 million grant from the U.S. Department of Energy (DOE).

The methods proposed by the research team will enable astrophysicists to capitalize on a new generation of telescopes that will be built over the next decade. The tools, based on machine learning principles, would be unique in their ability not only to spot strange new objects that merit in-depth study but also to identify larger patterns in observational data that could provide insights into the evolution of the universe. The techniques also could find applications in biology and in other physical sciences.

Jeff Schneider, an associate research professor in the Robotics Institute who will lead the new initiative, said that telescopes now on the horizon, such as the Large Synoptic Survey Telescope (LSST) in Chile and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) in Hawaii, will increase the rate of astronomical data gathering a thousandfold and therefore will require new automated discovery methods.

“The amount of data will be overwhelming,” said Schneider, a faculty member of Carnegie Mellon’s interdisciplinary McWilliams Center for Cosmology. “The datasets, measured in quadrillions of bytes, will be so large that no astronomer or group of astronomers could fully explore them, much less comprehend them. Computers have long helped scientists make discoveries by processing and analyzing observational data, but now we will need computer programs that also can make discoveries on their own.”

Working with cosmologists Alexander Szalay of Johns Hopkins and Andrew Connolly of the University of Washington, Schneider will develop computational methods capable of learning and using models with thousands or even millions of variables. Other automated discovery methods have focused on sifting through thousands of observations in search of a single anomalous object that merits follow-up study. But the methods being developed by Schneider, Szalay and Connolly also will look at the larger picture, searching for patterns and trends in the data that could reveal the physics of how the universe was formed and the dynamics of how it is evolving.

Finding those larger patterns is made difficult by the very nature of astrophysical phenomena, which typically evolve over huge time scales. “Most algorithms for discovering dynamic evolution assume that you have a sequence of observations to analyze,” Schneider said. “But in cosmology, you never get to see things evolve. Instead, you see a bunch of objects that are at different points on the evolutionary path. We need a way to look at those objects and use them to infer the evolutionary path and where each object might be on that path.”

That problem is ubiquitous in science, he added. Alzheimer’s disease, for instance, may take a decade to develop, which is far too long for medical researchers to practically follow the same individuals through the course of the disease. Even in laboratory studies of cells, biologists often must destroy a cell to measure it, so they can never both observe and measure the development of a single cell. New automated methods for inferring connections between observations of different subjects at different stages of development thus could have broad application in many scientific disciplines, Schneider said.

The researchers will be using data from the Sloan Digital Sky Survey, which used a New Mexico telescope to amass a dataset of 930,000 galaxies, 120,000 quasars and 460,000 stars during its first eight years of operation.