“The multidisciplinary approach for addressing the increasing volume and complexity of data enabled through the TRIPODS+X projects will have a profound impact on the field of data science and its use,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering (CISE). “This impact will be sure to grow as data continues to drive scientific discovery and innovation.”

A total of $8.5 million in TRIPODS+X grants was awarded this year, supporting 19 collaborative projects at 23 universities and bringing new perspectives to complex and entrenched data science problems in science, engineering, and mathematics.

The three Georgia Tech projects span three different NSF priorities: education, visioning, and research.

Education: Data-driven Discovery and Alliance

Prasad Tetali and his multi-institutional team, which includes traditional women’s colleges and historically black colleges and universities (HBCUs), are developing undergraduate courses for STEM majors to give more students access to a data-driven future. With this grant, the collaborative alliance, grounded in math, statistics, and computer science theory, will develop a toolkit of data science modules to integrate into the science curricula at Agnes Scott College, Morehouse College, and Spelman College. They will also hold boot camps and workshops. The educational outreach will enrich the knowledge of these institutions’ faculty, and later, the team plans to adapt the initiative to serve other research-intensive women’s and HBCU institutions.

“The NSF-supported educational alliance is exciting in many ways,” Tetali says. “It gives opportunity to infuse the foundational data science curriculum with real-world applications from the physical and life sciences. It will also likely catalyze collaborative research in data science and related fields between Georgia Tech and Atlanta area colleges.”

Investigators:

Prasad Tetali (lead), Georgia Tech School of Mathematics and School of Computer Science

Brandeis Marshall (collaborative lead), Spelman College

Chris DePree, Agnes Scott College

Alan Koch, Agnes Scott College

Wenjing Liao, Georgia Tech School of Mathematics

Chuang Peng, Morehouse College

David Sherrill, Georgia Tech School of Chemistry and Biochemistry

Joshua Weitz, Georgia Tech School of Biological Sciences

Award Amount: $200,000

Vision: Creating an Annual Data Science Forum

Dana Randall and colleagues from Carnegie Mellon University and Columbia University are creating a week-long Data Science Forum built around the Second Symposium on Machine Learning in Science and Engineering (MLSE). The forum combines multiple events aimed at catalyzing communication across foundations, applications, and disciplinary fields, and at fostering diversity and inclusion. Two new workshops that complement the conference are part of the forum: a Women in Data Science Workshop and a Foundations of Data Driven Discovery workshop.

MLSE, begun last year by Georgia Tech and Carnegie Mellon, was the first annual machine learning conference organized to co-locate tracks for traditional disciplines that use machine learning while allowing an exchange of ideas across disciplines. This cross-disciplinary breadth, combined with efforts to build diversity in attendance, will permeate all MLSE events and enable a visioning working group at the meeting to develop an inclusive report on the future of machine learning.

“The first MLSE last summer was a great success, providing a new forum for machine learning discussions among scientists and engineers,” said Randall. “It’s very exciting that this grant allows us to expand the event for the next two years by including more students, women, and adding a workshop promoting theoretical foundations, consistent with the goals of TRIAD and IDEaS.”

Investigators:

Dana Randall (lead), Georgia Tech School of Computer Science

Srinivas Aluru, Georgia Tech School of Computational Science and Engineering

For the research project, Santosh Vempala and researchers at the University of Washington are working on a sampling problem whose solution will help researchers across many disciplines. Sampling from a given distribution over a space with many attributes, that is, a high-dimensional space, is a fundamental problem in computer science. Over the past two decades, practical applications of sampling have proliferated in statistics, networking, biology, differential privacy, and, most notably, machine learning. Sampling is used to evaluate models, as a subroutine for optimization, and more generally for exploring large, complex spaces.

The researchers will help develop a large-scale, high-dimensional toolkit for sampling smooth and non-smooth distributions, together with a suite of functions that can be computed or estimated using access to samples, and will evaluate the toolkit on real data sets. It will be developed by working with domain experts in health metrics and systems biology.
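
To make the idea of sampling concrete, here is a minimal sketch of one classical technique, random-walk Metropolis, written in Python. It is not the researchers' toolkit. The target distribution (a standard Gaussian in five dimensions), the function names, the step size, and the burn-in length are all assumptions chosen for illustration. The sketch draws samples and then uses them to estimate a function of the distribution, which is the kind of task a sampling toolkit supports.

```python
import math
import random


def log_density(x):
    """Unnormalized log-density of a standard Gaussian (illustrative target only)."""
    return -0.5 * sum(v * v for v in x)


def metropolis(log_p, dim, n_samples, step=0.8, burn_in=2000, seed=0):
    """Random-walk Metropolis: returns points approximately distributed as exp(log_p).

    step, burn_in, and seed are hypothetical defaults picked for this example.
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    lp = log_p(x)
    samples = []
    for i in range(burn_in + n_samples):
        # Propose a nearby point by adding Gaussian noise to every coordinate.
        proposal = [v + step * rng.gauss(0.0, 1.0) for v in x]
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        # Discard the early iterations, which still depend on the start point.
        if i >= burn_in:
            samples.append(list(x))
    return samples


def estimate(samples, f):
    """Estimate the expected value of f by averaging it over the samples."""
    return sum(f(s) for s in samples) / len(samples)


samples = metropolis(log_density, dim=5, n_samples=20000)
mean_sq_norm = estimate(samples, lambda s: sum(v * v for v in s))
print(mean_sq_norm)
```

For this particular target, the exact expected squared norm equals the dimension, 5, so the printed estimate should land close to that value. Simple samplers like this one tend to slow down as the number of dimensions grows, which is one reason sampling in high dimensions remains an active research problem.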

Georgia Tech’s TRIAD, part of a community of TRIPODS institutes that share expertise and work together, integrates research and education in mathematical, statistical, and algorithmic foundations for data science. TRIAD also hosts focused working groups, national and international workshops, and organized innovation labs to share data science insights and resources locally and nationally.

“The TRIPODS program, and with it our own TRIAD institute, were established to expand our collective capabilities and accelerate progress,” said Xiaoming Huo, executive director of TRIAD. “Whether it is for education, defining a vision for the future, or pushing the frontiers of research, the new ideas we need come from bridging the boundaries of science, engineering and mathematics.”