Integrating research data management and digital preservation systems at the University of Sheffield

The University Library at the University of Sheffield is taking the leading role in supporting the active management and curation of research data within the institution. We have recently implemented a research data catalogue and repository, ORDA (Online Research Data, powered by figshare for institutions). We have also begun safeguarding library collections and key administrative assets of the University using ArchiveUs, our locally branded instance of Rosetta, a digital preservation platform from Ex Libris. We are now working with figshare and Ex Libris to integrate the two services so that published research data is preserved seamlessly across the research lifecycle.

In the longer term, this work will enable us to provide a complete lifecycle data management service for the University’s research community; identify, understand and act on risks associated with preserving datasets; give better-informed advice and guidance on the use of data formats for sharing and preservation; and encourage researchers to share their data more openly by guaranteeing the long-term sustainability of that data.

At a technical level, initial integration work focuses on using the OAI-PMH protocol to transfer data and metadata efficiently as METS packages. While figshare remains the interface for researchers and external users, ArchiveUs will act as a dark archive, giving us an additional, secure preservation copy of all data published in ORDA. ORDA will display a badge on each item confirming that it has been preserved in Rosetta.
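A minimal sketch of the harvesting side may make the protocol concrete. OAI-PMH is a simple HTTP protocol: the harvester issues ListRecords requests, and long result lists are paged with a resumptionToken, which must be sent on its own in the following request. The sketch uses only the Python standard library; the "mets" metadata prefix, the endpoint URL and the injected fetch callable are assumptions for illustration, not the actual ORDA or ArchiveUs configuration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="mets", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL ("mets" prefix is an assumption)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: nothing else may accompany it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", OAI_NS):
        header = rec.find("oai:header", OAI_NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            # deleted records carry no metadata but are still reported
            "deleted": header.get("status") == "deleted",
            # the metadata payload (e.g. a METS document) as an Element, or None
            "metadata": rec.find("oai:metadata", OAI_NS),
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
    token = token.strip() if token else ""
    # an absent or empty resumptionToken marks the final page
    return records, token or None


def harvest(base_url, fetch, metadata_prefix="mets"):
    """Yield every record, following resumption tokens until the list ends.

    fetch is any callable (hypothetical here) that takes a URL and returns
    the response body as text.
    """
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        page, token = parse_list_records(fetch(url))
        yield from page
        if token is None:
            return
```

Passing fetch in as a parameter keeps the paging logic testable without a live endpoint; in practice it could wrap urllib.request.urlopen(url).read().decode("utf-8"). Deleted records are kept in the output so that a preservation system could note withdrawals rather than silently miss them.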

File format issues

One of the main issues encountered so far in the project has been file formats. Unlike the relatively stable and predictable formats of research outputs, research data tends to come in niche and proprietary formats. Of the material currently deposited in ORDA, only a small percentage was identified by a DROID survey. The reason for this is not yet clear: these formats may simply be absent from the PRONOM registry, or the problem may lie in the way the files were created. By investing time in identifying and confirming some of these uncommon file types, we hope to offer reliable examples to the relevant format registries, which should also be of use to the wider digital preservation community.
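The identification rate from a DROID survey can be computed directly from DROID's CSV export. The sketch below is a hypothetical helper, not part of DROID; it assumes the export's TYPE, EXT and PUID columns (check the column names against the DROID version in use) and also tallies the extensions of unidentified files, giving a rough priority list of formats worth investigating.

```python
import csv
import io
from collections import Counter


def summarise_droid_export(csv_text):
    """Return (identification_rate, Counter of extensions of unidentified files).

    Assumes DROID CSV column names TYPE, EXT and PUID.
    """
    total = identified = 0
    unidentified = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # folders appear in the export but are not files to identify
        total += 1
        if (row.get("PUID") or "").strip():
            identified += 1  # DROID assigned a PRONOM identifier
        else:
            ext = (row.get("EXT") or "(no extension)").lower()
            unidentified[ext] += 1
    rate = identified / total if total else 0.0
    return rate, unidentified
```

For example, calling most_common(10) on the returned Counter lists the ten most frequent unidentified extensions. Extensions are only a hint, since DROID identifies formats by signature, so these counts suggest where to look rather than what the formats actually are.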

We also plan to improve the quality and volume of metadata accompanying research data. Since ORDA was launched, researchers have in many cases deposited material without accompanying structural metadata. This could make the data harder to access and interpret in future, and we are investigating possible solutions, whether through manual deposit requirements or automated processes.
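As one illustration of an automated process, a deposit could be flagged when it contains no recognisable documentation file. This sketch assumes, purely for illustration, that structural metadata arrives as a README, manifest, data dictionary or codebook file; the naming pattern and the deposit IDs are hypothetical and do not reflect any ORDA policy.

```python
import re

# Hypothetical convention: file names treated as structural documentation
DOC_FILE = re.compile(r"^(readme|manifest|data[_ -]?dictionary|codebook)\b", re.IGNORECASE)


def has_structural_doc(filenames):
    """True if any file in the deposit, at any folder depth, looks like documentation."""
    return any(DOC_FILE.match(path.rsplit("/", 1)[-1]) for path in filenames)


def flag_deposits(deposits):
    """Return the sorted IDs of deposits with no recognisable documentation file.

    deposits maps a (hypothetical) deposit ID to the list of file paths in it.
    """
    return sorted(dep_id for dep_id, files in deposits.items()
                  if not has_structural_doc(files))
```

A file-name check like this can only detect that documentation appears to be missing, not judge its quality, so flagged deposits would still need human follow-up.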