Last week Google announced the launch of its Dataset Search Tool, which represents a key milestone in the increasingly widespread use of open web approaches to the exploration of scientific data sets. The advent of this tool, which is sure to be widely used, is an encouraging first step in enabling rich and expressive data discovery approaches, and EarthCube is ready to leverage this new capability for the benefit of the geosciences.

Leveraging existing tools and keeping data FAIR

Throughout its development, EarthCube's Project 418 has been in communication with Google and others across the Earth Science community to ensure alignment of recommendations and to leverage existing work and experience.

Google built its Dataset Search Tool upon years of development work by commercial, academic, and non-profit groups. One of the most notable aspects of the new tool is Google’s use of established web architecture patterns and vocabulary work such as Schema.org in its efforts to make scientific datasets more discoverable. This development further spotlights ways in which commercial sector best practices can be brought to the scientific community to address search and discovery goals, such as those described by the FAIR data principles (Findable, Accessible, Interoperable, and Reusable).

The Schema.org vocabulary is maintained by an open and collaborative community of commercial and other participants. It is a broad and general schema but has built-in extensibility to support detailed domain-specific descriptions, including descriptions of data sets. As data become more mobile, the range of community standards and vocabularies describing data becomes apparent, and the growth in the amount of geoscience data on the internet is paralleled by the need to address issues of FAIR data.

EarthCube has been promoting its own dataset search tool for NSF-funded data facilities

The EarthCube program, funded by NSF, has been directly involved in supporting data discovery among NSF-funded data facilities. Through its seed project known as “Project 418” (P418), EarthCube has directly engaged a set of 10 NSF data facilities, covering nearly 48,000 datasets, to demonstrate and highlight to the community how they can employ and be part of this exciting development.

Project 418’s key focus was to provide enhanced capabilities around the domain-specific needs of the various Earth science communities by having a set of NSF data facilities use the Schema.org vocabulary and extensions to it. Another focus was on describing data set resources and evaluating the use of this structured metadata to address discovery.
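
As a rough illustration of what describing a data set with Schema.org can look like, the sketch below builds a minimal Dataset description and serializes it as JSON-LD, a common way to embed such metadata in a web page. It is not taken from Project 418 or from any facility's actual records: the helper name build_dataset_jsonld, the example values, and the example.org URL are all hypothetical, and only a few basic properties of the Schema.org Dataset type are used.

```python
import json


def build_dataset_jsonld(name, description, url, keywords):
    """Return a minimal Schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration; real facility metadata would
    typically carry many more properties and domain-specific extensions.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
    }


# Placeholder values only; this is not a real dataset record.
record = build_dataset_jsonld(
    name="Example ocean temperature profiles",
    description="Illustrative placeholder description of a geoscience dataset.",
    url="https://example.org/datasets/123",
    keywords=["oceanography", "temperature"],
)

# Wrap the JSON-LD in a script tag, as it might appear in a landing page.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(snippet)
```

Because the description is ordinary structured data published on the open web, any crawler or search service that understands Schema.org could read the same markup.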

The work of P418 complements and engages a growing Earth sciences community that is developing practices around connecting vocabularies and describing data using Schema.org. These combined approaches build on common patterns and standards and provide an open approach that commercial and academic sectors alike can use.

P418 alignment with Google Dataset Search Tool

Project 418’s approach directly aligns with the Google Dataset Search Tool and enables NSF data facilities to be indexed and discovered by it. Additionally, because the approach is open and standards-based, this same work enables any other commercial, academic or non-profit group to leverage the exact same resources to provide alternative and even enhanced search options to various communities. The Project 418 team was able to accomplish this by maintaining close communication with Google and others across the Earth Science community during its development to ensure alignment of recommendations and to leverage existing work and experience.

The NSF-funded data facilities involved in Project 418 that are present in the Google Dataset Search Tool include BCO-DMO, OpenTopography, UNAVCO, IRIS, and others.