A Global HPC-Powered Hub for Natural Hazard Research

Climate change is fueling more extreme weather worldwide, and DesignSafe aims to help research keep pace. Credit: Image courtesy of National Oceanic and Atmospheric Administration/Department of Commerce (NOAA).

Given the accelerating pace of climate change, the race is on to find better ways to predict the impact of natural disasters, mitigate their damage, and enable faster, more targeted recoveries. In 2018, Hurricane Michael was the strongest storm to hit the mainland United States since Hurricane Andrew in 1992 and the strongest ever to hit the Florida Panhandle. Hurricane Florence, which made landfall just a few weeks before Michael, was the worst rainstorm on record to impact the U.S. East Coast. This followed the 2017 Atlantic hurricane season, which saw 17 named storms, causing more than 3,000 fatalities and $265 billion in damage.

But a race against climate change does not have to be a race between researchers. If researchers collaborate, they can avoid needless duplication of effort, leverage a common set of data, share tools, extend each other’s work, and apply the same metrics, definitions, and standards — all of which accelerates both the pace of research and the pace at which research saves lives and property.

Questions researchers want to answer include:

• Given a certain storm surge, wind speed, and wind direction, which buildings in which locations will probably experience flooding or wind damage, and how much?

• How well will different building methods and materials withstand different impacts (storm surge, high wind, etc.)?

The ability to answer these questions has real-world consequences. For example, if you could accurately simulate flooding conditions within a hurricane’s projected impact zone, you could alert emergency managers where to prioritize rescue efforts, down to specific buildings. That would cut the time storm victims wait to be rescued, because responders would no longer spend it searching areas they don’t need to search.

That’s the idea behind DesignSafe, a web-based research platform within the U.S.-based Natural Hazards Engineering Research Infrastructure (NHERI). DesignSafe enables researchers to manage, analyze, and understand critical information about natural hazards — from earthquakes and tornadoes to hurricanes and sinkholes. The platform is funded by grants from the National Science Foundation (NSF) and developed by the Texas Advanced Computing Center (TACC) and Cockrell School of Engineering at The University of Texas at Austin in collaboration with partners at Rice University and Florida Institute of Technology.

A Central Cloud-Based Research Platform

DesignSafe is a single cloud-based platform whose primary purpose is to bring together in one place the resources necessary to conduct and collaborate on natural hazard research:

• A central repository capable of holding the massive amounts of field data collected during hurricane impact field surveys

• A library of advanced mapping, visualization, simulation, and data analysis tools that users can apply directly against any data in the repository

• A virtual community channel on Slack (a cloud-based collaboration tool) used to coordinate field surveys and research activities

• A project portal where researchers can post research results, including formal research papers with supporting data and code, such as MATLAB or Python scripts

In addition to field reconnaissance studies, DesignSafe also supports many other types of research, including experimental and numerical simulation projects.

A good example of DesignSafe in action came after Hurricane Florence, when researchers used the platform to capture and analyze vast numbers of individual surveys and photographs of damaged buildings — all collected by reconnaissance teams whose efforts were coordinated through DesignSafe.

High Performance Computing Challenges

David Roueche, an assistant professor of civil engineering at Auburn University, was one of the researchers coordinating that effort. He says that the complexity and size of such surveys — in terms of both teams and data — demand a centralized approach with high performance computing as its base.

“These datasets are getting huge, so being able to process them and utilize high-performance computing is key. For Hurricane Florence we collected upwards of two to three hundred gigabytes. That amount of data is just not feasible for everyone to be sharing back and forth on their personal workstations while trying to communicate. It’s a much more efficient process to host and process the data on a central hub and not have to worry about downloading it to do your analysis. You can do all that in the cloud through DesignSafe and TACC.”

And just as DesignSafe makes work shared across collaborators more efficient, it does the same for working across datasets and applications, says Dr. Roueche.

“I have done some work, for example, with geotagged photos, including drone photos using HazMapper [a web-based application for creating, visualizing, and analyzing geospatial data developed by the DesignSafe team]. I can directly import that image data from DesignSafe into HazMapper to quickly visualize where we collected the photographs and then use that HazMapper output as a reference to other factors we’re looking at from another database. The advantage is that we can connect the dots directly between one research domain and another.”
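The import Dr. Roueche describes starts from coordinates embedded in each photo. As an illustration only (this is generic geotagging logic, not DesignSafe or HazMapper code), the sketch below converts the degrees/minutes/seconds GPS values typically stored in photo EXIF metadata into the signed decimal degrees that mapping tools consume; the sample coordinates are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds GPS values (as commonly stored in
    photo EXIF metadata) to the signed decimal degrees mapping tools expect."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative in decimal notation
    return -value if ref in ("S", "W") else value

# A hypothetical geotagged survey photo: 30 deg 17' 6" N, 97 deg 44' 24" W
lat = dms_to_decimal(30, 17, 6, "N")
lon = dms_to_decimal(97, 44, 24, "W")
```

Once every photo carries a decimal latitude/longitude pair like this, plotting the whole collection on a shared basemap is a straightforward join against other georeferenced datasets.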

Another DesignSafe advantage is the ability of researchers to build on each other’s work, which accelerates research in two ways: first, by adding knowledge and, second, by streamlining the research process itself. A good example, says Dr. Roueche, is using TACC’s supercomputers to train machine learning algorithms that auto-tag disaster images, so the images don’t have to be tagged manually after future disasters.

“We collect all these photographs post-disaster from which we essentially try to parse out engineering failures, and relate those failures to the pre-storm condition of the buildings. This process is done manually at the present, but we are at the same time creating large sets of labeled data that can be used by others to train machine learning or deep learning processes for automatically identifying building attributes from photographs. So once we’ve created this huge labeled dataset, others can easily access it and use it to develop new algorithms. That in turn leads to a much more efficient process going forward and, again, it really speaks to why I think DesignSafe will accelerate research in so many different ways.”
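A labeled dataset like the one Dr. Roueche describes is typically consumed as a manifest pairing each photo with its human-assigned label, split reproducibly into training and validation sets before any model sees it. The sketch below shows that first step under stated assumptions: the filenames, labels, and 80/20 split are illustrative, not real DesignSafe data or the team’s actual pipeline.

```python
import random

# Hypothetical manifest: photo filename plus the damage label a surveyor
# assigned. Filenames and labels are illustrative only.
labeled = [
    ("IMG_0001.jpg", "minor"),
    ("IMG_0002.jpg", "destroyed"),
    ("IMG_0003.jpg", "no damage"),
    ("IMG_0004.jpg", "severe"),
    ("IMG_0005.jpg", "moderate"),
]

def train_val_split(records, val_fraction=0.2, seed=42):
    """Shuffle with a fixed seed (so the split is reproducible) and
    hold out a fraction of the labeled records for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train, val = train_val_split(labeled)
```

The fixed seed matters for exactly the reason in the quote: if the split is reproducible, a second team downloading the same labeled dataset can benchmark a new algorithm against the same held-out images.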

Massively Scaled Quality Control

Solutions like these that accelerate research are imperative, given the massive data collection and analysis challenges researchers face.

“When you think about a disaster like Hurricane Michael,” says Dr. Roueche, “there are hundreds of thousands of structures that are damaged and affected. So, how can we possibly have a team of a few researchers go out and not only collect data but collect sufficient quantities of data to represent the actual conditions, the diverse structural typology, the hazard conditions that each one of them experienced, and then also do that before it all gets cleaned up and repaired? So it’s this balance of: how can we collect as much data as we can and as quickly as possible and then also be able to process that into something useful that the communities that are recovering can use?”

Added to the data collection challenge is the challenge of capturing data consistently, even when it is gathered at different times by different researchers, so that data from different surveys and insights from different studies can be combined and correlated.

“What we tried to do in our approach,” says Dr. Roueche, “is to not only include the raw data, the photographs, the notes, and so on, but to also include the processed data in a CSV or JSON file. So, for example, for every structure we’ve investigated we have the latitude and longitude of the actual structure, as well as the latitude and longitude of all photographs taken of the structure. We have also QC’d or double-checked all of our damage ratings — i.e., as no damage, minor, moderate, severe, or destroyed — to ensure consistent criteria were followed. We publish those criteria. And we’ve gone back and QC’d all our values to make sure we actually followed that criteria in the published dataset. We’ve defined damage to individual building components, such as roofs, walls and windows or doors, so that researchers studying these individual components can quickly parse the data they need. And we’ve also aggregated data from public records to attach building attributes such as year built, construction materials, roof shape, and more so that building performance can be analyzed within the proper context.”
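A processed CSV of the kind Dr. Roueche describes can be QC’d and filtered with a short script. The sketch below is a minimal, hypothetical example: the ordinal damage scale comes from the quote, but the column names and sample rows are invented for illustration, not taken from a published DesignSafe dataset.

```python
import csv
import io

# The published ordinal scale, least to most severe (from the quote above)
RATING_SCALE = ["no damage", "minor", "moderate", "severe", "destroyed"]

# Hypothetical processed-data CSV; column names and values are illustrative
sample = """\
building_id,latitude,longitude,roof_damage,wall_damage
B001,30.1588,-85.6602,severe,moderate
B002,30.1712,-85.6455,minor,no damage
B003,30.1499,-85.6731,destroyed,severe
"""

def load_surveys(text):
    """Parse the CSV and enforce the QC rule that every per-component
    rating comes from the published ordinal scale."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for field in ("roof_damage", "wall_damage"):
            if row[field] not in RATING_SCALE:
                raise ValueError(
                    f"unknown rating {row[field]!r} in {row['building_id']}")
    return rows

surveys = load_surveys(sample)

# Because the scale is ordinal, "severe or worse" is an index comparison
severe_roofs = [r["building_id"] for r in surveys
                if RATING_SCALE.index(r["roof_damage"])
                >= RATING_SCALE.index("severe")]
```

Encoding the rating scale once and validating every row against it is what makes datasets from different teams and different storms safely comparable, which is the point of publishing the criteria alongside the data.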

Quality control like this on such a massive scale would not be practical were it not for the rare combination of resources that DesignSafe offers, Dr. Roueche says. “Regarding the platform — its integration, its research community, the drive for involvement coming from the National Science Foundation, and then also the simulation, the computing facilities, as well as the storage repositories, and the publishing capabilities — yes, that combination is pretty unique.”

He continues: “At most universities there are repositories where a researcher could host their own datasets; there are high-performance computing centers at most universities where one could run simulations. But until DesignSafe, there has not been a central hub where natural hazards researchers from any university can go to find datasets, to collaborate, to run simulations. And so, if I am not using DesignSafe, then I’m probably not reaching my full potential in terms of being able to collaborate and analyze and share data and maximize the potential of that data.”

A New Frontier

“In my experience, prior to DesignSafe it was a little bit of a Wild West in the research community following a destructive natural hazard event,” Dr. Roueche says. “You’d have multiple teams going out into these disaster sites and maybe you knew about somebody else going out and maybe you didn’t. But there typically wasn’t much coordination. You tended to do your own thing. It was competitive. It seemed as if researchers were trying to get their own data, hold onto it, and then eventually analyze and publish it before anybody else does so you get the credit for it. Whereas now, in part through DesignSafe as a central hub, and the emphasis on more open data from funding agencies like the National Science Foundation, there appears to be a rapidly changing paradigm and a new more integrated process of data collection and sharing.”

The fruits of that integration include a much faster, more efficient, and more beneficial research effort into the impacts of natural hazards, like hurricanes.

“We can certainly look back and point to clear steps of progress that we’ve been able to make,” Dr. Roueche says. “We're coordinating and communicating much better so there’s no overlap. We're extending our coverage. We’re using consistent data collection and data standards so we’re getting higher quality and more consistent datasets out of it that have broader coverage. We’re involving more members of the research community that wouldn’t have taken part otherwise. So, we’re expanding our community. Expanding our reach. Expanding both the quantity and quality of the data that we’re collecting. Ultimately, we are expanding the impact this research has on society. And I can point to DesignSafe, and the capabilities that it has, as a huge part of that.”

Randall Cronk is the owner of greatwriting, LLC, a technology marketing writing company in Boston, Massachusetts.

# # #

This article was produced as part of Intel’s HPC editorial program, with the goal of highlighting cutting-edge science, research and innovation driven by the HPC community through advanced technology. The publisher of the content has final editing rights and determines what articles are published.
