CINERGI

Community Inventory of EarthCube Resources for Geosciences Interoperability

Science Challenges

Finding appropriate data is the difficulty most often articulated by geoscientists during EarthCube end-user workshops. It becomes especially challenging when researchers work on interdisciplinary problems. Researchers finding and interpreting data across domain boundaries have to deal with unfamiliar terminology and research designs, implicit measurement assumptions, and disparate metadata and data formats. Despite the wealth of geoscience information available in digital form, and a plethora of databases, services and data portals already developed, there is no single inventory of available information across domains. The goal of CINERGI is to compile such an inventory, developing mechanisms to ensure that different resources have consistent and easy-to-interpret descriptions, traceable origins, and documentation that is as complete as possible. The scope includes datasets commonly catalogued by many organizations, as well as documentation for catalogs, vocabularies, data services, process models, repositories, etc. This inventory will help researchers answer both relatively simple and complex queries. Complex queries may require several iterations and a link to a domain data catalog for additional search options.

Technical Approach

Compiling and curating a large inventory of geoscience information resources requires integrating metadata records from standards-compliant catalogs maintained by domain data facilities and large projects with information about data sources that are used and/or generated by a multitude of smaller research projects, typically referred to as the “long tail of science”. While there is a relatively limited number of protocols for harvesting metadata from such catalogs, we found little consistency in metadata content. To address this challenge, we are developing a CINERGI metadata processing pipeline: metadata documents are harvested using a number of adapters, loaded into a staging database, validated against content standards, and then processed to improve metadata content before being republished via a standard interface. Metadata enhancements include checking and validating spatial extent, assigning an extent based on available information if applicable; analyzing and adding keywords to make the metadata easier to discover across domains; making the dataset title more descriptive; correcting temporal extent as needed; validating organization names against standard vocabularies; and adding standard thematic category and resource type classification terms. Whenever an enhancer changes the content of a record, a corresponding provenance record is created and made accessible via the CINERGI search interface.
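
The enhancement stage described above can be sketched roughly as follows. This is a minimal illustration in Python, not the CINERGI implementation: the Record fields, the two enhancer functions, and the tiny gazetteer and keyword tables (including the bounding-box numbers) are hypothetical placeholders chosen only to show how each enhancer could report its change so that a provenance entry is written alongside it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass
class Record:
    """Simplified metadata record; the field names are illustrative only."""
    title: str
    keywords: List[str] = field(default_factory=list)
    bbox: Optional[BBox] = None
    provenance: List[Dict] = field(default_factory=list)


# Hypothetical lookup tables standing in for real gazetteers and vocabularies.
# The coordinates are rough placeholder values, not authoritative data.
GAZETTEER: Dict[str, BBox] = {"san diego": (-117.3, 32.5, -116.9, 33.1)}
KEYWORD_TERMS: Dict[str, str] = {"streamflow": "hydrology", "basalt": "geochemistry"}


def bbox_is_valid(bbox: Optional[BBox]) -> bool:
    """Check that a bounding box exists and lies within valid lon/lat ranges."""
    if bbox is None:
        return False
    west, south, east, north = bbox
    return -180 <= west <= 180 and -180 <= east <= 180 and -90 <= south <= north <= 90


def enhance_spatial_extent(rec: Record) -> Optional[Dict]:
    """Assign an extent from a place name in the title if the current one is invalid."""
    if bbox_is_valid(rec.bbox):
        return None
    text = rec.title.lower()
    for place, bbox in GAZETTEER.items():
        if place in text:
            old = rec.bbox
            rec.bbox = bbox
            return {"field": "bbox", "old": old, "new": bbox}
    return None


def enhance_keywords(rec: Record) -> Optional[Dict]:
    """Add domain keywords triggered by terms found in the title."""
    text = rec.title.lower()
    added = sorted({kw for term, kw in KEYWORD_TERMS.items()
                    if term in text and kw not in rec.keywords})
    if not added:
        return None
    old = list(rec.keywords)
    rec.keywords = old + added
    return {"field": "keywords", "old": old, "new": list(rec.keywords)}


ENHANCERS = [enhance_spatial_extent, enhance_keywords]


def run_pipeline(rec: Record) -> Record:
    """Apply each enhancer in turn; log a provenance entry for every change made."""
    for enhancer in ENHANCERS:
        change = enhancer(rec)
        if change is not None:
            change["enhancer"] = enhancer.__name__
            rec.provenance.append(change)
    return rec
```

For example, run_pipeline(Record(title="Streamflow gauges near San Diego")) would fill in the placeholder extent, add the keyword "hydrology", and leave two provenance entries recording the old and new values. Keeping each enhancer as a small function that returns a description of its change makes it straightforward to add further steps, such as title or temporal-extent corrections, without touching the provenance logic.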

Science Drivers

Assembling and validating a large collection of geoscience metadata cannot be done without direct involvement of many groups of geoscientists, who specify which data resources are important for members of their domain and which metadata elements are important to expose for cross-disciplinary search, and who validate assembled metadata and query responses. This engagement comes in several forms: (1) working with EarthCube Research Coordination Network (RCN) projects to jointly assemble resources used in their domains and make them searchable through the CINERGI system; (2) describing and registering resources mentioned by geoscientists in the course of EarthCube end-user workshops, in responses to EarthCube member surveys, and appearing in similar inventories; (3) interacting with managers of domain data facilities; (4) registering resources developed by EarthCube partners; and (5) exploring more complex query scenarios through collaboration with several geoscience researchers in paleogeology, hydrology, and critical zone science.

Benefits to Scientists

CINERGI will reduce the burden of finding, interpreting and evaluating fitness-for-use of different types of information resources across geoscience domains. A number of geoscience data facilities and projects (in geochemistry, hydrology, ocean sciences, ecology and other fields) have developed excellent data repositories and metadata catalogs. CINERGI will provide access to them via a single standards-based catalog interface, and improve metadata descriptions to make data discovery more uniform and less time-consuming.

Resources

There are multiple ways to access or contribute to the CINERGI inventory; choose the one that is best for you. The following is a selection of the newest interfaces. The application is in development, and changes are expected. To see all registry viewers that have been developed, including legacy Silverlight-based interfaces, see the CINERGI viewer page.

CINERGI Resource Inventory: a large, broad inventory of resources harvested from catalogs and community contributions. Here you can see metadata documents after enhancement through the CINERGI pipeline. The metadata search portal is under construction, and its content is constantly changing as we refine the underlying ontology, update the metadata processing pipeline, and bring in more harvested data.

CINERGI Viewers: see them on a separate page. The main resources are also listed below:

High-level Resource Catalog: a continuously updated collection of information resources of different types suggested by geoscientists: High Level Resource Catalog

CINERGI Community Resource Viewers: community-built, domain-specific viewers for searching, updating and expanding community resource catalogs (a product of our joint work with several Research Coordination Network projects):

Student posters. In summer 2014 we hosted 6 high school students from San Diego who worked on various aspects of CINERGI metadata compilation: Anoushka Bose, Cole Pavelchek, Nick Lograsso, Erica Liu, Amar Haqqi, and Grace Chen. This work was supported by our undergraduate interns Raquel Calderon, Azfar Alam and Nick Nizhnikov. The REHS (Research Experience for High School) posters about these projects are [pdf], [pdf], [pdf]. In 2015, three new high school students worked on CINERGI during the summer: Divya Mohan, Ibrahim Ali and Edric Xiang. See their posters, focused on machine learning for better named entity recognition, and on a metadata crawler, here: [pdf], [pdf]. This work was supported by our undergraduate interns Alice Giliarini, Aaron Gong and Adam Schachne.

EarthCube is a collaboration between the Division of Advanced Cyberinfrastructure (ACI) and the Geosciences Directorate (GEO) of the US National Science Foundation (NSF). For official NSF EarthCube content, please see: http://www.nsf.gov/geo/earthcube/.