Community question from the contact form:
DataONE collects metadata from its member nodes. Those nodes accept data with different metadata schemas relevant to the submitting community's expertise. I read that DCMI is acceptable, as is FGDC, EML, etc. How does DataONE facilitate discovery and make sense of the disparate schemas without losing important information? Do you have the mother of all crosswalks???

2 answers

DataONE works like this: each individual data repository (we call them Member Nodes once they are part of our network) installs Member Node repository software that enables the repository to communicate with our Coordinating Nodes. A couple of products are available (Metacat, GMN), which can be used as is or, in the case of our “Generic Member Node” (GMN), configured to interoperate with an existing repository system. Custom development is also possible. Metacat uses Ecological Metadata Language (EML) as its metadata format, while the GMN accepts multiple formats, as you mentioned. You can read more about the GMN here: https://www.dataone.org/software-tool.... Information on our Member Node deployment routes can be found at: https://www.dataone.org/member-node-d....

The DataONE Coordinating Nodes host a complete copy of the metadata from each of the Member Nodes and support indexing and replication services. They do not host the data themselves; the data remain on the Member Nodes.

Data are discovered by running a search query against the metadata catalog in the Coordinating Nodes via DataONE Search (search.dataone.org). When data are downloaded through the results page of DataONE Search, they come from the Member Nodes.
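
For readers who want to query the catalog programmatically, here is a minimal sketch of building such a search request in Python. The endpoint URL and the field names (identifier, title, formatId) are assumptions made for illustration and should be checked against the DataONE Architecture documentation; q, fl, rows, and wt are standard SOLR query parameters. The code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed Coordinating Node query endpoint; verify against the DataONE docs.
SOLR_QUERY_URL = "https://cn.dataone.org/cn/v2/query/solr/"


def build_search_url(phrase, rows=5):
    """Build a search URL that looks for `phrase` in the common title field."""
    params = {
        "q": f'title:"{phrase}"',           # search the common title field
        "fl": "identifier,title,formatId",  # fields to return (assumed names)
        "rows": rows,                       # maximum number of results
        "wt": "json",                       # response format
    }
    return f"{SOLR_QUERY_URL}?{urlencode(params)}"


url = build_search_url("lake temperature")
print(url)
```

Because every metadata standard is mapped to the same title field, one query of this shape covers records that originated as EML, FGDC, or any other indexed format.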

DataONE supports community metadata standards through a series of crosswalks that map each metadata standard to a common SOLR schema. For example, even though the title of a data set is found in different locations in FGDC, EML, and ISO 19139, all are mapped to a common title field in SOLR. The SOLR schema that we map to is described in the DataONE Architecture documentation, which also describes the mappings for EML, FGDC, and Dryad as examples.

When an incoming metadata document is received, it is parsed and key fields from the metadata are extracted and indexed in SOLR according to the crosswalk, making them available in the metadata search service, and through https://search.dataone.org.
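
The crosswalk idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataONE's actual indexer: the CROSSWALK table, the extract_fields function, and the sample documents are all hypothetical, and only two fields are mapped. The element paths for the title (dataset/title in EML, idinfo/citation/citeinfo/title in FGDC) follow the respective standards.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: for each metadata format, the path (relative to
# the document root) where each common index field is found.
CROSSWALK = {
    "eml": {
        "title": "dataset/title",
        "abstract": "dataset/abstract/para",
    },
    "fgdc": {
        "title": "idinfo/citation/citeinfo/title",
        "abstract": "idinfo/descript/abstract",
    },
}

# Tiny sample documents, invented for this example.
EML_SAMPLE = """
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Lake temperature profiles</title>
    <abstract><para>Weekly temperature readings.</para></abstract>
  </dataset>
</eml:eml>
"""

FGDC_SAMPLE = """
<metadata>
  <idinfo>
    <citation><citeinfo><title>Stream gauge records</title></citeinfo></citation>
    <descript><abstract>Daily discharge values.</abstract></descript>
  </idinfo>
</metadata>
"""


def extract_fields(xml_text, fmt):
    """Parse a metadata document and return the common index fields."""
    root = ET.fromstring(xml_text)
    fields = {}
    for name, path in CROSSWALK[fmt].items():
        node = root.find(path)
        if node is not None and node.text:
            fields[name] = node.text.strip()
    return fields


eml_record = extract_fields(EML_SAMPLE, "eml")
fgdc_record = extract_fields(FGDC_SAMPLE, "fgdc")
print(eml_record)
print(fgdc_record)
```

Both records end up with the same keys (title, abstract) even though the source documents store those values in different places, which is what lets a single search cover every format.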

The SOLR metadata crosswalk among the various metadata standards that DataONE indexes does not contain all fields from all schemas. Each of the original metadata documents is available for download, so all metadata are preserved, but viewing specialized fields may require downloading the full metadata document for a data package.
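
To make the trade-off concrete, here is a small Python sketch that lists the elements of a document that a crosswalk does not cover. Everything in it is hypothetical: INDEXED_PATHS stands in for a crosswalk that maps only the title, and the methods element is treated as unindexed purely for illustration, not as a statement about what DataONE actually indexes.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the only path this toy crosswalk maps to the index.
INDEXED_PATHS = {"dataset/title"}

# Sample document, invented for this example.
SAMPLE = """
<eml>
  <dataset>
    <title>Lake temperature profiles</title>
    <methods>
      <methodStep><description>Sampled weekly at 1 m depth.</description></methodStep>
    </methods>
  </dataset>
</eml>
"""


def leaf_paths(elem, prefix=""):
    """Return the paths of all leaf elements below `elem`."""
    paths = []
    for child in elem:
        path = f"{prefix}/{child.tag}" if prefix else child.tag
        if len(child) == 0:
            paths.append(path)                      # no children: a leaf
        else:
            paths.extend(leaf_paths(child, path))   # descend further
    return paths


def unindexed_paths(xml_text):
    """List leaf elements that the crosswalk does not map to the index."""
    root = ET.fromstring(xml_text)
    return [p for p in leaf_paths(root) if p not in INDEXED_PATHS]


missing = unindexed_paths(SAMPLE)
print(missing)
```

Fields reported this way are not searchable through the index, but they are still present in the original metadata document, which can be downloaded in full.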