Aristotle: A federated cloud for academic research

Cornell will lead a five-year, $5 million project sponsored by the National Science Foundation (NSF) to build a cloud computing resource for academic research.

The project, known as the Aristotle Cloud Federation, will combine the resources of the Cornell Center for Advanced Computing (CAC) with those at the University at Buffalo and the University of California, Santa Barbara. CAC director David Lifka will lead the project with colleagues Tom Furlani, director of the University at Buffalo Center for Computational Research, and Rich Wolski, professor of computer science at UCSB.

The name “Aristotle” was chosen to reflect the Greek philosopher’s concept that “the whole is greater than the sum of its parts.”

“Sharing cloud computing and storage assets between institutions and bursting to commercial clouds when appropriate is definitely a model worth a serious trial,” said Robert Buhrman, senior vice provost for research. “Creating federated clouds has the potential to increase multi-institutional and multi-disciplinary research collaborations, enhance data-driven insights, and reduce capital expenditures.”

“The overarching goal is optimizing ‘time to science’: the actual time it takes a researcher to obtain scientific results,” Lifka said. “The elasticity provided by sharing resources means researchers don’t have to wait for local resources to become available to get their science started.”

In cloud computing, users store their data and run calculations on remote computers rather than buying and maintaining their own servers. Besides cost savings, the cloud offers scalability: jobs involving massive datasets and intensive processing can expand into the cloud as far as needed, while on quieter days researchers don’t have to maintain a huge, underused resource on their own campus.

Commercial cloud services from such providers as Microsoft and Amazon are widely used by businesses, and some academic researchers purchase space in commercial clouds. Aristotle, by contrast, will be built by academic institutions and designed specifically to support academic research.

“The idea behind a federated cloud is that when local resources are busy you can move your job to a collaborating institution's resources (trading resource access instead of dollars) before you have to go to a public cloud provider and spend money,” Lifka said.

The NSF funding will buy new servers and data storage hardware for all three participating institutions and support the seven science project teams, including four at Cornell, that have signed on so far to use the system.

At Cornell, early participants include Sara Pryor, professor of earth and atmospheric sciences, who studies how atmospheric aerosols (tiny solid or liquid particles, ranging from windblown soil to power plant emissions) affect climate. Each day, NASA satellites generate four terabytes of data on atmospheric conditions, which Pryor uses to build simulations of climate effects across North America at resolutions down to 12 kilometers. Even Cornell’s high-performance computing resources are not up to the task, Pryor said.

“I’m enthused about this project because it removes a significant bottleneck,” she said. “Big data represents a challenge but also an opportunity for tremendous scientific advancement.”

“We plan to use Aristotle for interactive analysis of complex environmental models that generate thousands of data files,” said Patrick Reed, professor of civil and environmental engineering.

Reed works with municipalities on their management of water resources, building computer models of the watershed, infrastructure, demand and human behavior. It’s a problem in optimization, where the computer must try out many combinations to find the one that best satisfies all the constraints.

The calculations could take minutes, hours or even years on an ordinary computer, he said. “The other exciting thing is that it’s not just scaling the science at universities,” he said. “The people we are working with may be able to utilize these methods themselves in the commercial cloud.”

When computations become too large even for the combined university resources, Aristotle users can “burst” into a commercial cloud or apply for grants through a collaboration with Amazon Web Services (AWS).

“We are excited to work with the Aristotle team to provide cost-effective and scalable infrastructure that helps accelerate the time to science,” said Jamie Kinney, AWS senior manager for scientific computing.

“This award continues NSF’s multi-year strategy to stimulate exploration of scalable and sustainable data infrastructure models that facilitate collaborative research across disciplines and institutions,” said Amy Walton, program director at the Advanced Cyberinfrastructure Division of NSF. “By experimenting with cloud usage metrics, collaborating with a commercial cloud vendor, and exploring pricing/trading allocation mechanisms, the project will provide valuable information about how the innovations work in a range of situations, and how this ‘market approach’ integrates within the larger research ecosystem.”

Furlani and Wolski have developed and are enhancing tools to monitor resource availability across the federation, show how effectively the resources are being used, and guide users to the most cost-effective place to run their work.

All software, tools, documentation and training materials developed in the project will be made publicly available so that others can join the Aristotle federation or create their own. “It is our hope and that of NSF that this will serve as a model that the national research community can adopt and enhance over time,” Lifka said, “making more computing and data analysis resources available in a very scalable and sustainable way.”