
May 31, 2016 | The rise of big data has big implications for the advancement of science. It also has big implications for the clogging of bandwidth.
The growing deluge of geoscience data is in danger of maxing out the existing capacity to deliver that information to researchers. In response, scientific institutions are experimenting with storing data in the cloud, where researchers can readily get the relatively small portion of the data they actually need.
Helping blaze the way is Unidata, which partnered with Amazon Web Services last year to make Next Generation Weather Radar (NEXRAD) data from the National Oceanic and Atmospheric Administration (NOAA) available in the cloud in near real time. The project is one of the ways Unidata, a community program of the University Corporation for Atmospheric Research (UCAR), is exploring what the future of data access may look like.
"One of the roles we play at Unidata is to see where the information technology world is going and monitor the new technologies that can advance science," said Unidata Director Mohan Ramamurthy. "In the last 10 years, we've watched the cloud computing environment mature. It's become robust and reliable enough that it now makes sense for the scientific community to begin to adopt it."
Inside an Amazon Web Services data center. (Photo courtesy Amazon.)
The data deluge
Since 1984, Unidata has been delivering geoscience data in near real time to researchers who want it. Today, Unidata also offers those scientists tools they can use to analyze and visualize the data.
In 2008, Unidata's servers delivered 2.7 terabytes of data a day to 170 institutions. Just five years later, the program was providing 13 terabytes—or the equivalent of about 4.5 million digital photos—a day to 263 institutions.
Today, Unidata is delivering about 33 terabytes of data a day. And the volume is only expected to grow.
For example, NOAA's new weather satellite, GOES-R (Geostationary Operational Environmental Satellite R-Series), is scheduled to launch in October. When GOES-R is up and running, it alone will produce a whopping 3.5 terabytes of data a day.
"We've been pushing out data for 30-plus years here at Unidata," said Jeff Weber, who is heading up Unidata's collaboration with Amazon. "What we're finding now is that the volume of available data is just getting to be too large," We can't keep putting more and more data into the pipe and pushing it out—there are physical constraints."
The physical constraints are not just on Unidata's side. Many universities and other institutions that rely on Unidata do not have the local bandwidth to handle a huge increase in the incoming stream of data.
To address the problem, Unidata decided a few years ago to begin transitioning its services to the cloud—a network of servers hosted on the Internet that allows users to access and process data from anywhere.
The vision is a future in which scientists go to the cloud, access the data they need, and then use cloud-based tools to process and analyze that data. At the end of their projects, scientists would download only their finished products: a map or graph, perhaps, or the results of a statistical analysis.
"With cloud computing, you can bring all your science and the analytic tools you use to the data, rather than the old paradigm of bringing the data to your tools," Ramamurthy said.
'Navigating the waters'
These advantages were part of the motivation behind the U.S. Department of Commerce's announcement last spring that NOAA would collaborate with Amazon, Google, IBM, Microsoft, and the Open Commons Consortium with the goal of "unleashing its vast resources of environmental data" using cloud computing.
A NEXRAD data product available to researchers through Unidata. (Image courtesy Unidata.)
Amazon Web Services was one of the first out of the gate on the NOAA Big Data Project, uploading the full archive of NEXRAD data to the cloud last summer. But to figure out how to continue to feed the archive with near real time observations and to help make sense of the data — how people might want to use it and what kinds of tools they would need — Amazon turned to Unidata.
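The archive itself is publicly readable, so pulling a single radar volume takes only a few lines. The sketch below assumes the documented layout of the public noaa-nexrad-level2 bucket (keys organized by date and radar site); exact file names vary.

```python
# Sketch: list and download one NEXRAD Level II volume scan from the
# public Amazon S3 archive. Requires: pip install boto3
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The bucket is public, so no AWS credentials are needed.
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# List one site's scans for a given day (year/month/day/site/...).
resp = s3.list_objects_v2(Bucket='noaa-nexrad-level2',
                          Prefix='2015/05/15/KTLX/')
keys = [obj['Key'] for obj in resp.get('Contents', [])]
print(len(keys), 'volume scans available')

# Download just the first scan (a few megabytes, not terabytes).
if keys:
    s3.download_file('noaa-nexrad-level2', keys[0], 'ktlx_scan')
```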
"It made a lot of sense for Unidata to partner with Amazon and vice versa," Ramamurthy said. "They wanted expertise in atmospheric science data. We wanted an opportunity to introduce cloud-based data services to our community and raise awareness about what it can do."
The scientific community is perhaps more hesitant to rely on the cloud than other user groups. Datasets are the lifeblood of many research projects, and knowing that the data are stored locally offers a sense of security for many scientists, Ramamurthy said. Losing access to some data could nullify years of work.
But the truth is that the data are likely more secure in the cloud than on a local hard drive, Ramamurthy said. "Mirroring" by multiple cloud servers means that data are always backed up.
If the Amazon project, and the NOAA Big Data Project in general, succeed in winning scientists over, they could go a long way toward helping Unidata make its own transition to the cloud. Unidata will be studying and learning from the project, including how to develop a business model that works, with an eye toward its own future.
"We're navigating the waters to find out what works and what doesn't so we can report back to the National Science Foundation," Weber said. "We want to see how this paradigm shift might play out — if it makes sense, if it doesn't, or if it makes sense in a few ways but not others."
Writer/contact: Laura Snider, Senior Science Writer and Public Information Officer

April 4, 2016 | If scientists could directly measure the properties of all the water throughout the world’s oceans, they wouldn’t need help from NCAR scientist Alicia Karspeck. But since large expanses of the oceans are beyond the reach of observing instruments, Karspeck’s work is critical for those who want estimates of temperature, salinity, and other properties of water around the globe.
Scientists need these estimates to better understand the world’s climate system and how it is changing. “It’s painstaking work, but my hope is it will lead to major advances in climate modeling and long-term prediction,” Karspeck said.
She is one of a dozen or so researchers at NCAR who spend their days on data assimilation, a field that is becoming increasingly important for the geosciences and other areas of research.
Broadly speaking, data assimilation is any method of enabling computer models to utilize relevant observations. Part science and part art, it involves figuring out how to get available measurements—which may be sparse, tightly clustered, or irregularly scattered—into models that tend to simplify the world by breaking it into gridded boxes.
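Many of those methods share a single mathematical heart: blend the model's prior estimate with an observation, weighting each by its estimated error. The sketch below shows that linear analysis update (the formula behind Kalman-filter and optimal-interpolation schemes) with invented numbers; it is an illustration, not NCAR's production code.

```python
# Minimal analysis step: x_a = x_b + K(y - Hx_b), with Kalman gain
# K = B H^T (H B H^T + R)^-1. All values are invented for illustration.
import numpy as np

x_b = np.array([280.0, 282.0, 284.0])  # background: model grid values (K)
B = np.diag([1.0, 1.0, 1.0])           # background error covariance

y = np.array([281.5])                  # one observation (K)
R = np.array([[0.25]])                 # observation error covariance

# H maps model space to observation space; here the observation falls
# between the first two grid points, so H interpolates linearly.
H = np.array([[0.5, 0.5, 0.0]])

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain
x_a = x_b + (K @ (y - H @ x_b)).ravel()        # analysis state
print(x_a)   # grid points near the observation move the most
```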
Commonly used in weather forecasting, the technique can improve simulations and help scientists predict future events with more confidence. It can also identify deficiencies in both models and observations.
As models have become more powerful and observations more numerous, the technique has become so critical that NCAR last year launched a Data Assimilation Program to better leverage expertise across its seven labs.
“Activities in data assimilation have grown well beyond traditional applications in numerical weather prediction for the atmosphere and now span across NCAR’s laboratories,” said NCAR Director Jim Hurrell. “The Data Assimilation program is designed to enhance data assimilation research at NCAR, while at the same time serving the broader U.S. research community.”
Scientists are using data assimilation techniques to input a range of North American observations into experimental, high-resolution U.S. forecasts. These real-time ensemble forecasts are publicly available while they're being tested. (©UCAR. This image is freely available for media & nonprofit use.)
Improving prediction
Created by the NCAR Directorate, the Data Assimilation Program is designed to advance prediction of events ranging from severe weather and floods to air pollution outbreaks and peaks in the solar cycle.
One of its goals is to encourage collaborations among data assimilation experts at NCAR and the larger research community. For example, scientists in several labs are joining forces to apply data assimilation methods to satellite measurements to create a database of global winds and other atmospheric properties. This database will then be used for a broad range of climate and weather studies.
The program also provides funding to hire postdocs at NCAR who focus on data assimilation projects, as well as a software engineer to support such activities.
"By bringing money to the table, we’re building up data assimilation capability across NCAR,” said NCAR Senior Scientist Chris Snyder, who coordinates the Data Assimilation Program. “This is critical because data assimilation provides a framework to scientists throughout the atmospheric and related sciences who need to assess where the uncertainties are and how a given observation can help.”
NCAR Senior Scientist Jeff Anderson, who oversees the Data Assimilation Research Testbed (DART), says that data assimilation has become central for the geosciences. DART is a software environment that helps researchers develop data assimilation methods and observations with various computer models.
“I think the Data Assimilation Program is a huge win for NCAR and the entire atmospheric sciences community,” Anderson said. “The scientific method is about taking observations of the world and making sense of them, and data assimilation is fundamental for applying the scientific method to the geosciences as well as to other research areas.”
From oceans to Sun
Here are examples of how data assimilation is advancing our understanding of atmospheric and related processes from ocean depths to the Sun’s interior:
Oceans. Karspeck is using data assimilation to estimate water properties and currents throughout the world's oceans. This is a computationally demanding task that requires feeding observations into the NCAR-based Community Earth System Model, simulating several days of ocean conditions on the Yellowstone supercomputer, and using those results to update the conditions in the model and run another simulation.
The good news: the resulting simulations match well with historical records, indicating that the data assimilation approach is working. “My goal is to turn this into a viable system for researchers,” Karspeck said.
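The workflow Karspeck describes is a cycle: update the model with observations, integrate it forward, and repeat. The toy sketch below captures only that structure, with a stand-in one-variable "model" and a simple nudging update; the real system uses CESM and far more sophisticated methods.

```python
# Toy forecast-analysis cycle, schematic only.
import numpy as np

def model_step(x):
    """Stand-in 'model': relax toward 285 K with a small drift."""
    return x + 0.1 * (285.0 - x) + 0.05

def assimilate(x, y, weight=0.4):
    """Nudge the forecast toward an observation y."""
    return x + weight * (y - x)

rng = np.random.default_rng(0)
x = 280.0                              # initial temperature estimate (K)
for cycle in range(5):
    for _ in range(5):                 # 'several days' of integration
        x = model_step(x)
    y = 284.0 + rng.normal(0.0, 0.2)   # synthetic observation
    x = assimilate(x, y)               # analysis step
    print(f'cycle {cycle}: analysis = {x:.2f} K')
```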
Air quality. Atmospheric chemists at NCAR are using data assimilation of satellite observations to improve air quality models that currently draw on limited surface observations of pollutants. For example, assimilating satellite observations would show the effect of emissions from a wildfire in Montana on downwind air quality, such as in Chicago.
“We've done a lot of work to speed up the processing time and the results are promising," said NCAR scientist Helen Worden. “The model simulations after assimilating satellite carbon monoxide data are much closer to actual air quality conditions.”
Weather forecasting. Data assimilation is helping scientists diagnose problems with weather models. For example, why do models consistently overpredict or underpredict temperatures near the surface? Using data assimilation, NCAR scientist Josh Hacker discovered that models incorrectly simulate the transfer of heat from the ground into the atmosphere.
“With data assimilation, you’re repeatedly confronting the model with observations so you can very quickly see how things go wrong,” he said.
Solar cycle. Scientists believe the 11-year solar cycle is driven by mysterious processes deep below the Sun’s surface, such as the movements of cells of plasma between the Sun’s lower latitudes and poles. To understand the causes of the cycle and ultimately predict it, they are turning to data assimilation to augment observations of magnetic fields and plasma flow at the Sun’s surface and feed the resulting information into a computer model of subsurface processes.
“We are matching surface conditions to the model, such as the pattern and speed of the plasma flows and evolving magnetic fields,” said NCAR scientist Mausumi Dikpati.
Capturing data. In addition to helping scientists improve models, the new Data Assimilation Program is also fostering discussions about observations. NCAR Senior Scientist Wen-Chau Lee and colleagues who are experts in gathering observations are conferring with computer modelers over how to process the data so that models can readily ingest it.
One challenge, for example, is that radars may take observations every 150 meters, whereas the models often have a resolution of 1 to 3 kilometers. Inputting the radar observations into the models requires advanced quality control techniques, including coordinate transformation (converting observation coordinates into model coordinates) and data thinning (reducing the density of observations while retaining the basic information).
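In its simplest form, thinning can be a block average of fine-resolution radar gates onto the model's coarser spacing, as in the sketch below; the reflectivity values are synthetic, and operational quality control is considerably more involved.

```python
# Sketch: thin ~150 m radar gates onto a ~1.5 km model grid by
# block-averaging, cutting the observation count tenfold.
import numpy as np

gate_spacing_m = 150
model_resolution_m = 1500
block = model_resolution_m // gate_spacing_m      # 10 gates per cell

rng = np.random.default_rng(1)
reflectivity = rng.normal(20.0, 5.0, size=1000)   # synthetic dBZ values

# Trim so the gate count divides evenly, then average each block.
n = (len(reflectivity) // block) * block
thinned = reflectivity[:n].reshape(-1, block).mean(axis=1)

print(len(reflectivity), '->', len(thinned), 'observations')
```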
“We are modifying our quality control procedures to make sure that the flow of data is smooth,” Lee said.
“With data assimilation, the first word is ‘data’,” he added. “Without data, without observations, there is no assimilation.”
Writer/contact: David Hosansky, Manager of Media Relations
Funders: NCAR Directorate, National Science Foundation, and additional funding agencies for specific projects

October 12, 2015 | We're excited it's Earth Science Week, and even more excited about this year's theme—visualizing Earth systems—because it happens to be one of the things NCAR does best. NCAR visualizations cover the spectrum, from Earth to air to fire to water.
Clockwise from top left: EARTH (ground movement for an earthquake in California), AIR (wind trajectories during a marine cyclone), FIRE (behavior of a Colorado wildfire), and WATER (sea surface temperature anomalies during El Niño and La Niña).
Scientists across NCAR and at collaborating universities create visualizations to help make sense of their research, often with the help of the Computational and Information Systems Lab (CISL). CISL houses the VisLab (the Scientific Visualization Services Group); VAPOR (the Visualization and Analysis Platform for Ocean, Atmosphere and Solar Researchers group); and NCL (the NCAR Command Language group). These teams of software engineers and other professionals are resources for scientists who want to make their research come alive.
Earth Science Week was launched by the American Geosciences Institute in 1998. #EarthSciWeek 2015 runs from Oct. 11 through Oct. 18.
Writer/contact: Laura Snider, Senior Science Writer and Public Information Officer

September 3, 2015 | The El Niño brewing in the tropical Pacific is on track to become one of the strongest such events in recorded history and may even warm its way past the historic 1997-98 El Niño.
While it's too early to say if the current El Niño will live up to the hype, this new NCAR visualization comparing sea surface temperatures in the tropical Pacific in 1997 to those in 2015 gives a revealing glimpse into the similarities, and differences, between the two events. Sea surface temperatures are key to gauging the strength of an El Niño, which is marked by warmer-than-average waters.
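One common strength measure, the Niño 3.4 index, can be computed from any gridded SST dataset. The sketch below assumes a hypothetical monthly SST file with 0-360 longitudes and uses the xarray library to average over the Niño 3.4 region (5°S-5°N, 170°W-120°W) and subtract a monthly climatology.

```python
# Sketch: Niño 3.4 SST anomaly from a hypothetical monthly SST file.
# Requires: pip install xarray netcdf4
import xarray as xr

ds = xr.open_dataset('sst_monthly.nc')   # hypothetical dataset
sst = ds['sst']                          # assumed variable name

# Niño 3.4 region: 5S-5N, 170W-120W (longitudes on a 0-360 grid,
# latitudes assumed ascending).
nino34 = sst.sel(lat=slice(-5, 5), lon=slice(190, 240)).mean(['lat', 'lon'])

# Anomaly relative to each calendar month's long-term mean.
clim = nino34.groupby('time.month').mean('time')
anomaly = nino34.groupby('time.month') - clim
print(anomaly.sel(time='2015-08').values)
```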
Even if this year's El Niño goes on to take the title for strongest recorded event, there's no guarantee that the impacts on weather around the world will be the same as they were in 1997-98. Like snowflakes, each El Niño is unique. Still, experts are pondering whether a strong El Niño might ease California's unrelenting drought, cause heatwaves in Australia, cut coffee production in Uganda, and impact the food supply for Peruvian vicuñas.
This video animation was created by Matt Rehme at NCAR's Visualization Lab, part of the Computational & Information Systems Lab. It uses the latest data from the National Oceanic and Atmospheric Administration. Rehme had previously created a similar visualization of the 1997-98 El Niño. When comparisons between this year's El Niño and that event began flying around, he decided to make a second animation and compare the two.
"I was a little shocked just how closely 2015 resembles 1997 visually," Rehme said.
More on El Niño
El Niño, La Niña & ENSO FAQ
Here comes El Niño—but what exactly is it?
El Niño or La Nada? The great forecast challenge of 2014
¡Hola, La Nada! What happens when El Niño and La Niña take a break?
Writer/contact: Laura Snider

September 3, 2014 | As observing instruments and computer modeling become increasingly refined, the amount of data generated by field studies has grown tremendously. Storing and archiving the data is a challenge in itself, but scientists also need the data to be easily accessible and connected to other relevant resources.
Researchers use terrestrial laser scanning technology to analyze a dinosaur track site in Denali National Park and Preserve. Geophysical data from ground-based imaging coordinated by UNAVCO is being incorporated in a two-year EarthCube initiative. (Image courtesy UNAVCO and the Perot Museum of Nature and Science.)
To help address this issue, UCAR is launching a project with two partners—Cornell University and UNAVCO—that aims to connect the dots among field experiments, research teams, datasets, research instruments, and published findings.
The two-year project, titled "Enabling Scientific Collaboration and Discovery through Semantic Connections," is funded by the National Science Foundation’s EarthCube initiative, which supports transformative approaches to data management across the geosciences.
The project will demonstrate the benefits of a linked open data tool, known as VIVO, for managing scientific information and data. Developed by Cornell University Library in collaboration with a number of partners, VIVO is being used by over 100 organizations to create authoritative research profiles for faculty and staff as well as to link to their published studies and other relevant research. Other organizations, such as the Laboratory for Atmospheric and Space Physics at the University of Colorado Boulder, are extending VIVO to manage information related to scientific projects and research instruments.
Cold air pouring over the Bering Sea from the south coast of Alaska on April 7, 2013, formed these cloud streets, associated with parallel cylinders of spinning air. The Bering Sea Project is studying the potential effects of climate change on marine ecosystems across the eastern part of the sea. Observations from the project are informing a two-year study of data management. (Image courtesy NASA Earth Observatory.)
The project aims to adapt VIVO so it can be applied to large-scale field experiments involving many investigators from a wide range of institutions. This would create a network of information linking field experiments with particular datasets, authors, publications, and even research tools that result from or are associated with each experiment.
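The result is essentially a graph of typed links among experiments, datasets, people, and papers. The sketch below builds such a graph with Python's rdflib library; the URIs and predicate names are illustrative placeholders, not VIVO's actual ontology.

```python
# Sketch: a tiny linked-data graph connecting an experiment, a dataset,
# a researcher, and a paper. Requires: pip install rdflib
from rdflib import Graph, Namespace, URIRef

EX = Namespace('http://example.org/research/')   # placeholder vocabulary
g = Graph()

experiment = URIRef(EX['bering-sea-project'])
dataset = URIRef(EX['mooring-temperature-2010'])
author = URIRef(EX['jane-researcher'])           # hypothetical researcher
paper = URIRef(EX['ecosystem-response-paper'])

g.add((dataset, EX.partOf, experiment))   # dataset came from the experiment
g.add((dataset, EX.createdBy, author))    # dataset has a creator
g.add((paper, EX.usesDataset, dataset))   # paper drew on the dataset

# Starting from the paper, walk back to datasets and their creators.
for ds in g.objects(paper, EX.usesDataset):
    for person in g.objects(ds, EX.createdBy):
        print(ds, 'created by', person)
```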
"Someone coming from the outside would be able to find a particular paper that emerged from a field experiment and very quickly track down datasets, instruments, researchers, and so on," said Matthew Mayernik, an expert on research data services in the NCAR/UCAR Library who is the principal investigator on the project. "This is really about increasing the traceability of research and making it easier for people to find, assess, and use data."
To demonstrate the effectiveness of the approach, Mayernik and his colleagues will use VIVO for data from two sources: a recent NSF-supported interdisciplinary field program whose data archive is hosted by NCAR’s Earth Observing Laboratory (the Bering Sea Project), and a set of diverse research projects informed by geodetic tools, such as GPS networks and ground-based imaging, that are operated and maintained by UNAVCO.
Mayernik said that, if the approach proves successful, it could be expanded to other field experiments, including their datasets, researchers, publications, and research resources.
Writer: David Hosansky, NCAR & UCAR Communications
Collaborating institutions: Cornell University, National Center for Atmospheric Research/University Corporation for Atmospheric Research, and UNAVCO
Funder: National Science Foundation (EarthCube initiative)

BOULDER—A program that provides unique data support to geoscientists worldwide will expand its services over the next five years, under a renewal of its grant with the National Science Foundation (NSF).
Unidata, managed by the University Corporation for Atmospheric Research (UCAR), provides atmospheric science data to university departments in near real time. Its services encompass a wide range of cyberinfrastructure technologies that make geoscience data more useful and accessible for scientists and educators at more than 3,000 educational, government, and research institutions worldwide, including 700 U.S. universities.
This 3-D depiction of the flow in and around 2008's Hurricane Gustav was created using Unidata's Integrated Data Viewer. (Visualization courtesy Unidata.)
Under the new NSF award of up to $25 million, Unidata will tap emerging technologies to better serve the geoscience community. This includes using cloud computing in ways that will enable researchers worldwide to access data and collaborate more effectively with colleagues at distant organizations and across scientific disciplines in order to tackle major scientific challenges.
“We’re working to leverage the advantages of and advances in cloud-based computing paradigms that have emerged and become robust in recent years,” said Unidata director Mohan Ramamurthy. “The goal is to help advance scientific understanding of the physical world by better enabling scientists to extract knowledge from a deluge of observations and other data.”
By gathering information into a cloud environment, the Unidata approach will also reduce the amount of data that must be transferred over computer networks and ease the computing requirements at universities and other research organizations.
Unidata focuses on enabling scientists to better access, analyze, and integrate large amounts of data. It has also developed sophisticated tools to visualize information.
Although Unidata’s core activities focus on serving scientists and educators in the atmospheric and related sciences, virtually every project that Unidata undertakes has a broader impact on the geosciences community and society at large. Unidata-developed cyberinfrastructure is in wide use among U.S. federal agencies, private industry, and non-governmental and international organizations, including the National Oceanic and Atmospheric Administration, the Department of Energy, Department of Defense, and NASA.
More than 100,000 university students across the country are expected to use Unidata’s products and services, and hundreds of scholarly articles reference Unidata annually.
Professors and other Unidata users said its services are critical for geoscience education and research.
“Unidata provides the superhighway needed to connect my students to critical weather observations used for education and teaching in the atmospheric and related sciences,” said Jim Steenburgh, professor of atmospheric sciences at the University of Utah.
At Millersville University, scientists and education experts in the Earth Sciences and Computer Science departments used a Unidata analysis and visualization tool to create a 3-D virtual immersion experience known as GEOpod. This allows the user to navigate a virtual probe within a computer simulation of the atmosphere, capturing temperature, humidity, and other parameters while using navigational aids and tracking capabilities.
"With the help of Unidata, we can essentially bring students into a numerical weather model, helping them better understand the actual atmosphere as well as the modeling process,” said Richard Clark, chairman of the Earth Sciences Department at Millersville University.
Unidata is a community data and software facility for the atmospheric and related sciences, established in 1984 by U.S. universities with sponsorship from NSF.