Earth Science Data Analytics 101

Abstract/Agenda:

The broad set of techniques called Earth Science Data Analytics (ESDA) has a clear meaning to everyone, though the meaning often differs depending on how the data are used. Data Analytics discussions can range from developing custom code for discovering signatures in data to leveraging tools that enable predictions to be derived from heterogeneous datasets. This session, uniquely presented by field experts, attempts to introduce the scope, complexities, and possibilities presented by ESDA to further facilitate Earth science.

Guest speakers during this session will describe data analytics use cases that they employ in their work. The goal of this session is to help organize and stimulate Federation partners in thinking about how we can facilitate Earth Science Data Analytics through information technologies and tools.

Thanks to all who attended this session to learn more about Earth Science Data Analytics (ESDA). We hope the session sparked your interest in helping to mature this relatively new area of Earth data and information science. If you can contribute a small amount of time to the activities of this cluster (see ESDA 201 Session notes - http://commons.esipfed.org/node/2723), please e-mail Steve Kempler ([email protected]) to express your interest and, in particular, to share your ideas and thoughts on the subject.

The ESDA Cluster, which is attracting a lot of interest, continues to ‘churn’ through the process of maturing its understanding of this new paradigm, Data Analytics and Data Science, and of its impacts. In keeping with the purpose of this session, participants attended to ‘learn’ what Data Analytics means in the Earth science context and to discuss the goal of the ESDA Cluster through Federation science and technology expertise:

2. Provided real use cases of how different types of data analytics are employed

3. Provided a view into ESDA technologies available

All presentations are available through links provided below.

The message of Steve Kempler’s introductory presentation was that, although Earth science users have worked with heterogeneous datasets for a while and technology has accommodated that usage, what is new is the need to advance and implement infrastructure, technologies, and tools that efficiently analyze data and information in order to extract knowledge. This is best addressed through:

Data Preparation – Making heterogeneous data compatible so that they can ‘play’ together

Data Reduction – Smartly removing data that do not fit research criteria

Data Analysis – Applying techniques/methods to derive results

Data Analytics Definition: The process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information.
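The three steps above (preparation, reduction, analysis) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything shown in the session: the two sources, their units, the quality flags, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical records: (ISO date, value, quality flag). Source A reports
# temperature in kelvin, source B in degrees Celsius. All values are made up.
source_a = [("2014-01-01", 273.15, "good"), ("2014-01-02", 275.15, "good"),
            ("2014-01-03", 999.0, "bad")]
source_b = [("2014-01-01", 0.5, "good"), ("2014-01-02", 1.0, "good"),
            ("2014-01-03", 3.0, "good")]


def prepare(records, to_celsius):
    """Data preparation: put every record on a common unit, keyed by date."""
    return {date: (to_celsius(value), flag) for date, value, flag in records}


def reduce_records(prepared):
    """Data reduction: keep only records that meet the quality criterion."""
    return {date: value for date, (value, flag) in prepared.items()
            if flag == "good"}


def analyze(a, b):
    """Data analysis: mean difference (a - b) over dates present in both."""
    common = sorted(set(a) & set(b))
    return common, mean(a[d] - b[d] for d in common)


a = reduce_records(prepare(source_a, lambda k: k - 273.15))
b = reduce_records(prepare(source_b, lambda c: c))
dates, bias = analyze(a, b)
print(dates, round(bias, 2))
```

Keeping the three stages as separate functions mirrors the list above: the reduction step only sees data that preparation has already made comparable, and the analysis step only sees records that passed reduction.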

Dave Bolvin discussed the techniques used to merge datasets from different sources into a high-quality product: inter-calibration based on relative quality, morphing, forward/backward propagation, Kalman filtering, and more. Dave’s presentation, linked below, is well worth seeing for the details. This might be considered multi-dataset Descriptive Analytics.
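To give a flavor of one of the techniques named above, here is a minimal sketch of a scalar Kalman measurement update, which blends two estimates of the same quantity according to their error variances. It is a textbook form and is not taken from Dave’s presentation; the function name, the input values, and the assumed variances are hypothetical.

```python
def kalman_update(estimate, est_var, measurement, meas_var):
    """One scalar Kalman measurement update.

    Blends the current estimate with a new measurement, weighting each by the
    inverse of its error variance, and returns the new estimate and variance.
    """
    gain = est_var / (est_var + meas_var)
    new_estimate = estimate + gain * (measurement - estimate)
    new_var = (1.0 - gain) * est_var
    return new_estimate, new_var


# Hypothetical values of one quantity from two sources, with assumed error
# variances; the numbers are illustrative only.
merged, merged_var = kalman_update(10.0, 4.0, 12.0, 4.0)
print(merged, merged_var)
```

When the two variances are equal, the update reduces to a simple average; when one source is much noisier, the merged value stays close to the more reliable one, which is the sense in which merging is "based on relative quality".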

David Gallaher followed with a discussion of how he extracted Nimbus data from 1960s-vintage magnetic tape to examine sea ice extent and compare it with the sea ice extent measured by Terra/MODIS. David’s work might be considered single-dataset Descriptive Analytics.

Thomas Hearty’s analysis might be considered Diagnostic Analytics. His work entailed comparing Total Precipitable Water Vapor from two sources: the AIRS instrument and the MERRA reanalysis. Thomas walked us through the steps he took to match up the datasets (e.g., co-registration) and to understand why the measurements were not matching. His results can be used to improve the AIRS processing algorithms and thus provide a better product.
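The match-up step can be illustrated with a small sketch that pairs point values with the nearest cell of a regular grid and then computes the mean difference. This is an assumed nearest-neighbor approach for illustration only, not a description of Thomas’s actual procedure; the grid, the points, and the function names are hypothetical.

```python
def nearest_index(value, start, step, count):
    """Index of the regular-grid cell centre nearest to value, clamped to the grid."""
    idx = round((value - start) / step)
    return max(0, min(count - 1, idx))


def match_up(points, grid, lat0, lon0, step):
    """Pair each (lat, lon, value) point with the nearest grid-cell value."""
    pairs = []
    for lat, lon, value in points:
        i = nearest_index(lat, lat0, step, len(grid))
        j = nearest_index(lon, lon0, step, len(grid[0]))
        pairs.append((value, grid[i][j]))
    return pairs


# Hypothetical 2x2 gridded field (cell centres at 0 and 1 degrees) and three
# point retrievals of the same quantity; all numbers are made up.
grid = [[20.0, 22.0],
        [24.0, 26.0]]
points = [(0.1, 0.2, 21.0), (0.9, 0.1, 23.0), (0.8, 0.9, 27.0)]

pairs = match_up(points, grid, lat0=0.0, lon0=0.0, step=1.0)
mean_diff = sum(p - g for p, g in pairs) / len(pairs)
print(pairs, mean_diff)
```

Once the two datasets are paired this way, differences can be examined point by point (for example, by region or by condition) to look for the cause of a mismatch.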

Presentations were followed by a short discussion focused on the depth and breadth of what performing Earth science data analytics might include. Most noteworthy was the introduction of the UV-CDAT Project, a set of climate data analysis tools developed by a team led by Lawrence Livermore National Laboratory (http://uvcdat.llnl.gov/index.html).

Actions:

This was an information/learning session. The only intended action was to participate in the next ESDA session, ESDA 201. See the ESDA 201 meeting link (http://commons.esipfed.org/node/2723) to review the ESDA cluster discussion and activities, and the resulting actions.