Mapping the route to open data sharing

Librarians interested in advancing open science may find that their researchers are reluctant to share research data. A recent study found a disconnect between how important researchers believe data sharing to be and what they actually do. By understanding the underlying issues, librarians can play a thoughtful role in advancing data sharing at their institutions.

Mapping data sharing practices

The Elsevier Research Data Management team helped frame Elsevier’s research data principles, the first of which is that research data should be made available free of charge to all researchers wherever possible, with minimal reuse restrictions. The team aims to support researchers in reaching this goal through data sharing tools and through collaborations with libraries, researchers, data centers and government agencies around the world. In a recent collaboration with the CWTS (Centre for Science and Technology Studies) at the University of Leiden, we mapped the current landscape of data sharing practices. The year-long, large-scale study, along with its underlying data,¹ is available online.

One third of respondents did not share their data at all. Source: Open Data: the researcher perspective - survey and case studies.

Uncovering the barriers to data sharing

Sixty-nine percent of the 1,167 respondents agreed that sharing data was very important in their field, and 73 percent wanted access to other researchers’ data. However, only 37 percent believed that sharing data earned credit, and only 25 percent felt they had adequate training to share their data properly. This illustrates the data sharing gap: although researchers would like to access other people’s data, they often lack the time, training or incentives to share their own data properly.

Respondents’ main barriers to sharing data were:

Privacy concerns

Ethical issues

Intellectual property rights

Lack of training in data sharing practices

Mandates from publishers or funding agencies were largely not seen as a driving force for sharing.

Together, a lack of training in how to share data, concerns about privacy and reuse, and a perceived lack of urgency around mandates drive the gap between researchers’ desire to share data and their actual practice. Bridging this gap clearly requires a multi-faceted approach.

Implementing data citation standards

The bibliometric analysis showed that although the number of citations to data journals is growing, these citations are still a small portion of overall citations, and adoption of data journals is highly domain-specific. In addition, there is a lack of standards for citing data in regular journal articles, which makes it difficult to assess data citation and reuse. Acknowledgement sections do not mention data sharing and use consistently, limiting insight into how widely data is shared and used across domains.
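One reason consistent data citation matters is that it makes reuse countable. The Python sketch below is a minimal illustration built on an assumed, hypothetical `DatasetRecord` that holds the six properties the DataCite Metadata Schema marks as mandatory (identifier, creator, title, publisher, publication year and resource type). `format_data_citation` renders them in one fixed order, and `count_dataset_citations` tallies how many articles cite each dataset DOI. The function names, the exact citation layout and the example DOI are illustrative assumptions, not an established tool or citation style.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical record holding the six properties DataCite marks as mandatory."""
    identifier: str               # a DOI, without the https://doi.org/ prefix
    creators: tuple[str, ...]
    title: str
    publisher: str                # e.g. the repository or data center
    publication_year: int
    resource_type: str            # e.g. "Dataset"


def format_data_citation(record: DatasetRecord) -> str:
    """Render one citation string in a fixed element order (the layout is an assumption)."""
    creators = "; ".join(record.creators)
    return (
        f"{creators} ({record.publication_year}). {record.title}. "
        f"{record.publisher}. {record.resource_type}. "
        f"https://doi.org/{record.identifier}"
    )


def count_dataset_citations(
    reference_lists: list[list[str]], dataset_dois: set[str]
) -> Counter:
    """Count, for each known dataset DOI, how many articles cite it.

    Each inner list holds one article's reference strings. A DOI counts at most
    once per article, using a plain substring match, so a DOI that is a prefix
    of another DOI may be over-counted.
    """
    counts: Counter = Counter()
    for references in reference_lists:
        cited = {doi for doi in dataset_dois if any(doi in ref for ref in references)}
        counts.update(cited)
    return counts


if __name__ == "__main__":
    record = DatasetRecord(
        identifier="10.1234/example-data",  # hypothetical DOI
        creators=("Doe, J.", "Roe, R."),
        title="Example survey responses",
        publisher="Example Data Repository",
        publication_year=2017,
        resource_type="Dataset",
    )
    print(format_data_citation(record))
    # Doe, J.; Roe, R. (2017). Example survey responses. Example Data Repository. Dataset. https://doi.org/10.1234/example-data
    articles = [
        [format_data_citation(record)],
        ["A paper that cites no data."],
    ]
    print(count_dataset_citations(articles, {record.identifier}))
    # Counter({'10.1234/example-data': 1})
```

The count depends on articles citing datasets by persistent identifier rather than by free-text mentions in acknowledgements. When that standard is missing, a tally like this undercounts reuse.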

To strengthen incentives for data sharing, the Make Data Count project (spearheaded by the California Digital Library and DataCite) aims to develop a shared set of data metrics that give researchers and institutions credit for following proper data management and sharing practices. To encourage data sharing further, funding agencies are developing clear and unavoidable data mandates that go beyond the requirement to create a Data Management Plan (DMP). For more information on funding agencies’ current mandates, see Stony Brook University Library’s helpful overview and the California Digital Library’s detailed overview within the DMPTool.

Looking beyond mandates, publishers, libraries and member organizations are joining forces to support data sharing practices through guidelines and tools. Some of these initiatives include:

Development of Machine-Actionable DMPs: These can support the monitoring of data sharing practices within institutional data management systems by checking whether planned data sharing and storage has actually occurred. Several groups are developing formats that make machine-actionable DMPs easy to implement in current systems and that offer a guided, transparent model for enabling best practice in data sharing across disciplines. These include the UK’s Digital Curation Centre, the FAIR DMP Group at Force11, and the California Digital Library, which was recently awarded an NSF EAGER grant to develop actionable DMPs.
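Here is a minimal Python sketch of the monitoring idea: an institutional system checks a machine-actionable DMP against the deposits that actually exist. The JSON fields (`datasets`, `planned_repository`, `share_by`), the `Deposit` record and `check_dmp_compliance` are all hypothetical and chosen only for illustration. They are not taken from the formats that the Digital Curation Centre, Force11 or the California Digital Library are developing. The sketch labels each planned dataset as shared, pending or overdue.

```python
import json
from dataclasses import dataclass

# Hypothetical machine-actionable DMP fragment; field names are illustrative only.
EXAMPLE_DMP = """
{
  "project": "Example project",
  "datasets": [
    {"title": "Interview transcripts",
     "planned_repository": "Institutional repository",
     "share_by": "2018-06-30"},
    {"title": "Sensor readings",
     "planned_repository": "Discipline repository",
     "share_by": "2018-12-31"}
  ]
}
"""


@dataclass(frozen=True)
class Deposit:
    """A dataset actually deposited, e.g. as harvested from a repository (hypothetical)."""
    title: str
    repository: str
    deposited_on: str  # ISO 8601 date, e.g. "2018-05-01"; kept for reporting only


def check_dmp_compliance(dmp_json: str, deposits: list[Deposit], today: str) -> dict[str, str]:
    """Report, per planned dataset, whether the planned sharing has happened.

    "shared":  a deposit with the same title exists in the planned repository
    "pending": no matching deposit yet, but the share-by date has not passed
    "overdue": no matching deposit and the share-by date has passed
    Dates are ISO 8601 strings, which compare correctly as plain strings.
    """
    plan = json.loads(dmp_json)
    deposited = {(d.title, d.repository) for d in deposits}
    status: dict[str, str] = {}
    for dataset in plan["datasets"]:
        key = (dataset["title"], dataset["planned_repository"])
        if key in deposited:
            status[dataset["title"]] = "shared"
        elif today <= dataset["share_by"]:
            status[dataset["title"]] = "pending"
        else:
            status[dataset["title"]] = "overdue"
    return status


if __name__ == "__main__":
    deposits = [Deposit("Interview transcripts", "Institutional repository", "2018-05-01")]
    print(check_dmp_compliance(EXAMPLE_DMP, deposits, today="2018-09-01"))
    # {'Interview transcripts': 'shared', 'Sensor readings': 'pending'}
```

Matching on title and repository keeps the example short. A production system would more plausibly match planned and deposited datasets on persistent identifiers.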

The domain case studies showed that national and regional differences in data sharing practice hampered widespread sharing and reuse, because laws and customs vary between regions and countries. The good news is that collaborative research projects, by their distributed nature, naturally enable and enhance data sharing and storage practices.

In some fields, data sharing is ingrained in research practice. In the digital humanities, for example, sharing code through GitHub is commonplace, and code sharing translates easily into data sharing. In other fields, such as human genetics, raw (sample) data and processed (analyzed) data were used by different people at different moments. That division created an “endemic” data sharing structure, which can be used to scale up sharing and publishing practices.

Leveraging librarians’ expertise

Our study has shown that data sharing is very much a practice in flux: researchers perceive a need for better ways to share more data, but standards, drivers and training are still lacking. Many survey respondents were unaware of institutional and funder requirements around data sharing, and were concerned about the additional time needed for data sharing and reporting. Along with other key stakeholders, librarians can play a critical role by:

Facilitating a better shared understanding of the ownership and licensing of research data, and of who is responsible for and in control of it (38 percent of surveyed researchers erroneously believed that ownership of their research data shifted to publishers after publication)

Training researchers in the use of data sharing tools and advising them on the practicalities of research data management

The Elsevier Research Data Management team is keen to support librarians and others at their institutions in these efforts. We cordially invite librarians to discuss partnerships for driving community participation in data curation, storage and sharing. To that end, we are developing a suite of tools, including DataSearch and Mendeley Data, to improve research data management practices. We are also working on a basket of data metrics to ensure researchers receive credit for sharing data, and will share more on that initiative soon. We invite librarians interested in these issues to comment below or to contact us at anita.dewaard@reedelsevier.com or h.cousijn@elsevier.com.