Big Data Grant from Moore, Sloan Aims to Make Pi-Shaped Scientists

Scientists in nearly every field are confronting the blessing and the curse of big data.

New scientific instruments and simulations such as the Large Hadron Collider and the cabled ocean observatory in the northeastern Pacific promise more and better observations of places and phenomena that were difficult or impossible to access before. Sociologists and economists can now test their theories against vast quantities of real-world data from online social networks and point-of-sale terminals.

But this proliferation of data is valuable only to the extent that it leads to new discoveries. The fundamental challenge for many researchers—and the proponents of this way of thinking say it is actually facing all researchers, whether or not they yet recognize it—is no longer about capturing scientific data, but managing the vast, diverse, and rapidly expanding datasets they’ve already got.

A major grant from the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation, announced at a White House event Tuesday, seeks to unite scientists and computer scientists to figure out better ways to handle scientific data, and to address structural barriers to the adoption of new methods and tools. The University of Washington; the University of California, Berkeley; and New York University will share the $37.8 million grant over five years, working together on a challenge that is at its essence about the future of knowledge and discovery.

“If any organization wants to lead in any field of discovery in the next few decades, they have got to lead in data science, because that’s what’s going to propel new discoveries in every field,” says computer science professor Ed Lazowska, leader of the Moore/Sloan effort at UW and director of the university’s eScience Institute, which has been working on this issue for five years.

At NYU, the effort is led by computer science professor Yann LeCun, founding director of the university’s Center for Data Science. From UC Berkeley comes Saul Perlmutter, a Nobel laureate, physics professor, and astrophysicist at Lawrence Berkeley National Laboratory.

Borrowing from the late computer science luminary Jim Gray, Lazowska describes data-driven discovery as the fourth paradigm of scientific research. For thousands of years, discovery was made through observation and experiment. Then scientists began building models and theories. Then, about 50 years ago, computational science was added to the researcher’s tool kit, with increasingly powerful supercomputers performing simulations that allowed researchers to work faster and to study events they couldn’t before, such as the first few milliseconds following the Big Bang.

This fourth paradigm, which Lazowska describes as the “semi-automated analysis of vast amounts of data,” is being adopted by different fields at different speeds. To really harness it, domain scientists need to be proficient in a new discipline based on computer science, applied mathematics, and statistics.

Before the data revolution, scientists could be thought of as “T-shaped”: broadly skilled, with depth in their fields of expertise. But in this data-rich era of discovery, Lazowska says, scientists will need to be “Pi-shaped” (π): still broadly skilled, but now with deep expertise in data science in addition to their scientific domain.

The Moore/Sloan grant aims to help create conditions at universities that will encourage more Pi-shaped researchers. But, according to the foundations, “substantial systemic challenges” stand in the way.

The traditional university organizational structure, which separates people by academic discipline, is one such challenge.

“A huge proportion of the excitement sits in the spaces between traditional fields,” Lazowska says. But someone who works between computer science and astronomy, for example, may not appear to be the best computer scientist or the best astronomer to people in those fields. That can be a hindrance when it comes to things like hiring and tenure.

The UW is already taking steps to address this particular challenge, allocating a set of half-faculty positions to the eScience Institute. If the sociology department wants to hire someone who has a strong methodological bent, someone who can contribute broadly applicable research tools, they can effectively split the cost with eScience, and the campus as a whole benefits, Lazowska says.

“It encourages academic programs to hire these Pi-shaped people at the interfaces of two fields,” he says.