Research Data Repository Interoperability

The initial idea of establishing this working group was presented during P6 in Paris in the Repository Platforms for Research Data IG session. Shortly after P6 a telephone conference was held, which concluded that a case statement should be prepared and then finalized during a BoF session at P7. The initial co-chairs are David Wilcox and Thomas Jejkal. Contact with potential co-chairs from Asia was already made during P6 and will be finalized during P7.

The Research Data Repository Interoperability Working Group will establish standards for interoperability between different research data repository platforms, focusing on machine-to-machine communication. These standards may include (but are not limited to) a generic API specification and import/export formats, summarized in a document serving as an implementation guide for adoption. The scope of this document and all the WG’s activities will be defined by the following list of initial use cases:

Migration/Replication of a Digital Object between research data repository platforms (platform, data model and/or version may differ between source and destination)

Retrieval of information related to the platform and/or its contents (e.g. to register the system in a repository registry or to harvest contents)

This initial list might be extended in the first phase of the WG’s operational time.

In order to cover these use cases, existing standards and technologies will be identified and evaluated in the second phase. Evaluation results will be summarized in a separate deliverable and will form the basis of the final deliverable. During the evaluation phase, the preparatory work of other RDA WGs will be used as far as possible along with experiences gathered by the RDRI WG’s members during their work with and on existing research data repository platforms.

In the final phase the WG will strive for a consensus regarding a generic API specification and/or import/export formats needed for offering the listed functionalities. The final deliverable will then contain this consensus in a form such that it can be used as an implementation guide for later adoption.

Value Proposition

The Research Data Repository Interoperability working group will provide recommendations and implementation guidelines (e.g. for a generic API or import/export formats) for research data repository interoperability that can be integrated by platform developers and service providers. To this end, existing standards and technologies will be evaluated and integrated where possible. Once adopted widely, these outcomes will allow institutions and organizations with research data repositories to deposit, access and share their data in a common way and to disseminate repository resources and contents to clients and services easily. For adopters and their users this means:

Removing Barriers: Defining and implementing interoperability standards for the use cases mentioned above could help researchers identify and acquire datasets stored on other platforms that were previously unavailable to them, enriching their own research.

Easier Collaboration: Having a common way to exchange datasets stored in different research data repository platform instances from different institutions or even disciplines can help to identify new starting points for (inter-)disciplinary collaborations.

Creating Commonalities: Agreeing on and implementing common standards for typical research data repository tasks might bring adopters closer together. In the future this could result in fruitful collaborations that extend the basic set of functionalities proposed by this WG.

Since everything stands or falls with the adoption of the results, repository platform developers contributing to this group have agreed to implement the results as early adopters.

Engagement with Existing Work

A number of related standardization efforts have already taken place; for example, the OAI protocol for metadata harvesting, the SWORD protocol for repository deposits, and the re3data.org schema for collecting information on research data repositories for registration. The Research Data Repository Interoperability WG will review these and other related standards to see how they might be adopted or extended to support our goals. This review period will ensure that we do not duplicate existing efforts.
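As a rough illustration of the kind of machine-to-machine exchange one of these existing standards already supports, the following minimal sketch builds OAI-PMH request URLs and extracts record identifiers from a ListIdentifiers response. The verbs (Identify, ListIdentifiers), the metadataPrefix argument and the XML namespace come from the OAI-PMH 2.0 specification; the base URL, the sample response and the function names are assumptions made up for this example and do not refer to any real repository or to anything the WG has specified.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical endpoint, used only to show the shape of a request.
BASE_URL = "https://repository.example.org/oai"


def build_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


def record_identifiers(response_xml):
    """Extract the record identifiers from a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:header", OAI_NS)
    ]


# Invented, abbreviated response used instead of a live network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    # Platform-level information (relevant to registry-style use cases).
    print(build_request(BASE_URL, "Identify"))
    # Content-level harvesting of record identifiers.
    print(build_request(BASE_URL, "ListIdentifiers", metadataPrefix="oai_dc"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

The sketch covers only metadata harvesting; deposit (as addressed by SWORD) and registry descriptions (as addressed by the re3data.org schema) would need separate handling.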

Presentation of the specification draft at P10 and identification of open points and potential improvements (session participants in an open discussion).

September – March 2018: Find consensus regarding the final specification and write the final deliverable serving as implementation/adoption guideline (registered members/co-chairs, writing).

March 2018: Present final results at P11 (co-chairs).

Deliverables

D1. Research Data Repository Interoperability Primer (M6): This document describes targeted use cases, needed functionalities, as well as existing technologies and their feasibility for adoption. Gaps not covered by existing technologies are also described in this document.

D2. Interface Specification Draft (M12): A first draft document of the final specification. The document gives a basic overview of functionalities, exchange formats and intended behavior targeted by the WG to cover the defined use cases. This document will be the basis for finding a consensus between all WG members.

D3. Interface Specification (M18): This specification represents a consensus of all partners regarding an interoperable repository interface. It describes all functionalities provided by this interface, including exchange formats and the expected behavior of a repository platform implementing the interface. This document serves as a guideline for adopting the results of this working group.

Mode and Frequency of Operation

The Research Data Repository Interoperability WG will primarily communicate asynchronously online using the mailing list functionality provided by RDA. Online voice meetings will be scheduled as needed; likely once per month. When possible, in-person meetings will also be scheduled; these will take place at RDA plenaries and at other conferences where a sufficient number of group members are in attendance.

Addressing Consensus and Conflicts

Group consensus will be achieved primarily through mailing list discussions, where opposing views will be openly discussed and debated amongst members of the group. If consensus cannot be achieved in this manner, the group co-chairs will make the final decision on how to proceed.

The co-chairs will keep the working group on track by setting milestones and reviewing progress relative to these targets. Similarly, scope will be maintained by tying milestones to specific dates, and ensuring that group work does not fall outside the bounds of the milestones or the scope of the working group.

Community Engagement

The working group case statement will be disseminated to mailing lists in communities of practice related to research data and repositories in an effort to cast a wide net and attract a diverse, multi-disciplinary membership. Group activities, where appropriate, will also be published to related mailing lists and online forums to encourage broad community participation.

Adoption Plan

Representatives of several major repository platforms have already joined this working group, including:

These representatives have agreed to consider implementing the standards recommended by the Research Data Repository Interoperability WG in their respective repository platforms. We will continue to seek representatives from a variety of repository platforms and services to ensure that this working group’s deliverables are widely adopted.

I think this group is a very interesting idea. Congratulations, and count me in for P8. However, on reading the Related RDA Groups, I am really missing its relationship with the metadata WG. I think we should look for synergies.

Of course, there is also overlap with other RDA groups not explicitly mentioned in the Case Statement, and there are definitely contact points with the metadata groups. Therefore, it would be great if we could stay in contact for information exchange.

What is outlined is obviously an essential stage in realizing All Data for All Researchers, but there is rather more to it than enabling repositories to contact and communicate seamlessly. That is only one small step, and it is a long way down the stream of eventual confluence of 'dissimilar' data-sets. In theory it sounds great, but how is it going to work in practice, and to what uses can merely transported data be put when there are so very many other variables in the works? Different sciences use different interpretations of the same word to describe features of their observations. It isn't as if all researchers use identical computers and identical reduction software or modelling tools. Data formats sound misleadingly alike, but can refuse to conform even within the same science. A simple example: different instruments will deliver either fluxes or intensities, or some machine-uncorrected version of either, and it can be crucial to sort that kind of trivial-sounding matter out before drawing erroneous conclusions. Of course, you will reply, all those things will be properly sorted out in due time. But when, and by whom, and will they all be? It only takes one publication to present wrong conclusions that resulted from not fully understanding the subtle differences between different types of data to place the whole effort in jeopardy. I therefore believe that, while the topic in question is worthy of deep consideration, it does also need to be placed very precisely within its rightful place along the whole chain of actions from inter-departmental agreements on format unification, language unification and metadata unification, via inter-university or country-wide or international agreements of the same kind, with ample trials and feedback at every stage and involving users at every stage, until it could be claimed that the data scientists have done their work thoroughly. 
It will then also take inordinate amounts of dedicated time for the other half of the population, the users, to come up with their own judgements at every step. All of that cannot be swept up in one RDA IG for 'interoperability', though trying to get a full perspective of the total procedure will help to place the intentions of this particular (would-be) IG more nearly into its correct context.

I have the same concerns as Elizabeth. To achieve something within a short timeframe you will need to keep the scope narrow and focused. As Elizabeth points out it is a big issue. But we have to start somewhere. Perhaps there are some outputs around general principles and approaches rather than specific solutions for every situation.

From your perspective, having the overall goal of "All Data for All Researchers" in mind, I totally agree with you. This is something a single WG cannot possibly achieve. Of course, the proposed WG contributes only a very small piece to the ultimate vision of sharing all data with everybody. However, we think that this small piece is worth tackling and may contribute (on a more technical level) to improving data sharing and exchange.

All other aspects, such as format, language and metadata unification, are out of the scope of this WG. However, if other groups, inside or outside RDA, issue recommendations in these directions, those recommendations will definitely be taken into account as far as possible.

I believe that this WG's proposed undertaking is a very worthwhile effort, having personally encountered the challenge of "Research Data Repository Interoperability" (or lack thereof) in investigating how to mirror data submitted to a data visualization platform, for interactive access to data (namely, opendata.american.edu) into a data archiving platform (namely, dra.american.edu).

Disclosure: I am one of the co-chairs of the Repository Platforms for Research Data IG (with David Wilcox & Ralph Müller-Pfefferkorn).

-- Stefan

P.S.: there seems to be a glitch in this platform - I posted this comment on June 3, but it was datestamped May 19, the same date that the case statement was posted for review. As are all the other previous comments.