Using Google Compute Engine, an oceanographer collaborates with a Chicago startup to dramatically reduce the processing time for complex ocean simulations from two weeks to one hour.

As an oceanographer tracking the movements of ocean currents and marine life in the North Atlantic, Stefan Gary faces several challenges. His research subjects, the cold-water coral larvae that disperse widely across the Atlantic before settling to form important deep-sea reef habitats, are very numerous but have never been observed in real conditions. “These coral species are effectively ecosystem engineers at the bottom of the sea,” he explains, although no one knows exactly how and where their larvae spread. Their distribution is crucial to understanding the biodiversity of the ocean because coral serves as a vital habitat, feeding ground, and shelter for other marine species, which in turn affects the fishing industry and marine conservation efforts. With principal investigators at the University of Edinburgh and the Scottish Association for Marine Science (SAMS) and funding from EU/ATLAS, a major North Atlantic assessment project, Gary embarked on an ambitious project to track the movements of cold-water coral larvae.

Through ATLAS work at the German lab GEOMAR, Gary has access to a four-terabyte dataset of velocity vectors for the North Atlantic’s circulation over the past fifty years, but running the many models that estimate how coral larvae might move through those currents would demand intensive data processing and a great deal of time. “We estimated each simulation would take several hours,” he says. “Very quickly you add up your variables and realize it will take two weeks to a month to run all these data and estimate where all these coral babies are going to go.”

"Partnering with Parallel Works to run coral larvae simulations on Google Compute Engine, we sped up the larval swimming parameter sweep calculation from two weeks to one hour!"

Accelerating scientific research from weeks to hours

Through his father’s neighbor, Gary happened to hear about Michael Wilde and his company, Parallel Works. Wilde co-founded Parallel Works with Matthew Shaxted in 2015 to create a platform where non-programmers can easily automate complex workflows to run large-scale data processing in the cloud. Wilde calls it “supercomputing as a service”: “We take application codes and stitch them together into workflow scripts that orchestrate ordinary software applications running at very large scale on cloud resources. These workflows take a complex science or engineering computational process and automate it.” The service is based on the Swift parallel scripting language, an open-source research tool created and used at the University of Chicago and Argonne National Laboratory with support from the National Science Foundation and the Department of Energy. By bundling Swift with Google Compute Engine, Wilde says, “we bring together parallel resources, dynamic scaling based on each workload, and the ease of an automated computing recipe.”

Using Parallel Works’ interface and Google Cloud Platform’s (GCP) infrastructure, Gary was able to process the massive dataset of North Atlantic velocity vectors directly from a bucket on GCP, sending 80 gigabytes of data at a time to 200 nodes running concurrently (6,400 virtual CPUs in total). Each node used its 32 application cores to run different simulations of coral larvae movements, varying parameters such as swimming speed or time spent at the surface. That meant Gary could run 6,400 simulations with astonishing speed: “Partnering with Parallel Works to run coral larvae simulations on Google Compute Engine, we sped up the larval swimming parameter sweep calculation from two weeks to one hour!”
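To make the scale of that parameter sweep concrete, here is a minimal Python sketch of how 6,400 parameter combinations could be split across 200 nodes of 32 cores each. Only the node count, the core count, and the two variable names come from the account above. The parameter values, the grid sizes, and the helper names `build_sweep` and `assign_to_nodes` are assumptions made for illustration, and this is not the Swift workflow that Parallel Works actually ran.

```python
from itertools import product

NODES = 200          # figure quoted in the article
CORES_PER_NODE = 32  # figure quoted in the article

# Hypothetical parameter grids: the article names these variables
# but does not give the values that were actually swept.
swim_speeds_mm_s = [round(0.1 * i, 1) for i in range(1, 81)]  # 80 assumed speeds
surface_days = list(range(80))                                # 80 assumed durations


def build_sweep(speeds, durations):
    """Return one task description per (speed, duration) combination."""
    return [
        {"swim_speed_mm_s": speed, "surface_days": days}
        for speed, days in product(speeds, durations)
    ]


def assign_to_nodes(tasks, nodes, cores_per_node):
    """Split tasks into one batch per node, one task per core."""
    if len(tasks) > nodes * cores_per_node:
        raise ValueError("more tasks than available cores")
    return [
        tasks[i * cores_per_node:(i + 1) * cores_per_node]
        for i in range(nodes)
    ]


sweep = build_sweep(swim_speeds_mm_s, surface_days)
batches = assign_to_nodes(sweep, NODES, CORES_PER_NODE)

total_vcpus = NODES * CORES_PER_NODE  # 200 * 32 = 6,400
speedup = (14 * 24) / 1               # two weeks in hours, divided by one hour

print(f"{len(sweep)} simulations on {total_vcpus} vCPUs in {len(batches)} batches")
print(f"Reported wall-clock speedup: about {speedup:.0f}x")
```

With one task per core, every simulation in the sweep can start at the same time, which is what makes the reported drop from two weeks (336 hours) to one hour plausible for a workload of independent runs.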

“Google Cloud has incredible bandwidth between its storage resources and its compute resources,” Wilde explains. “Whereas Stefan’s single computer was churning a vast amount of data through a very thin pipe, Google Cloud essentially has a massive set of pipes, and that was a big benefit. We were also able to take advantage of Google Compute’s custom instances so Stefan could tailor an instance to exactly match what his computation needed. It was not only fast but economical.” Gary adds, “It was astonishing for me to press that button and say Go! and see 200 computers spin up at the same time and start crunching away.”

"The combination of Swift, Parallel Works, and Google Cloud brings a whole new level of scientific productivity to bear."

Michael Wilde, Founder and CEO, Parallel Works

Supercomputing for everyone

For Gary, working with Parallel Works opens the door to further collaboration with other oceanographers. He hopes to enlist colleagues to use these tools to create an ocean particle-tracking lab, so that even scientists with less experience in these techniques can get speedy answers to their own questions using other parameters. According to Wilde, these methods can have broad applications and impact: “Not only can Stefan get an answer faster now but he can explore many, many more variables and he can ask a lot more questions from the simulations. The combination of Swift, Parallel Works, and Google Cloud brings a whole new level of scientific productivity to bear.”

Note: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 678760 (ATLAS). This output reflects only the author’s view and the European Union cannot be held responsible for any use that may be made of the information contained therein.

Organization Profile

Based in Oban, the Scottish Association for Marine Science (SAMS) was founded in 1884 to promote the study and appreciation of the marine environment. It is one of the leading marine laboratories in the UK, sponsoring its own research and partnering on 70 projects, including the European H2020 ATLAS project, an international assessment of the North Atlantic’s deep-sea ecosystems.