Workshop on Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution

January 29-30, 2015

Washington, D.C.

Software is as essential as data in the modern practice of science. When scientists share with each other not only research results, but also data and software, it vastly amplifies the reach, relevance, and transparency of science. Yet there are substantial social, systemic, and technological barriers that prevent scientists from sharing data and software. Scientific researchers, particularly academics, are embedded in a reputation economy in which tenure, promotion, and acclaim are achieved through influential research results. Tenure and promotion decisions are typically blind to a researcher’s contributions to shared data or software, despite the crucial role of these activities in the scientific endeavor. Compounding the problem, there are no standard practices for citing data and software, giving appropriate credit to contributors, or measuring the impact and value of data and software contributions. Although numerous data and software sharing repositories exist, each uses a slightly different approach, and many scientists still distrust the public access model. They prefer to share data and software only by personal request, which assures attribution through personal contact and an implicit social contract but substantially limits the reach and benefit of shared data and software.

The research community urgently needs new practices and incentives to ensure data producers, software and tool developers, and data curators are credited for their contributions. This National Science Foundation (NSF)-sponsored workshop facilitated a national, interdisciplinary discussion and exploration of new norms and practices for software and data citation and attribution to inform the Software Infrastructure for Sustained Innovation (SI2) and Science of Science and Innovation Policy (SciSIP) NSF programs. Participants identified social and technical challenges facing current software development and data generation efforts and explored viable methods and metrics to support software and data attribution in the scientific research community. There was strong consensus throughout the workshop that it is time to move beyond discussion of the issues and to establish pilot projects that implement and experiment with actionable ideas. Section 3 of the workshop report presents a full listing of actionable plans discussed at the workshop; highlights of these include:

Request that publishers and repositories interlink their platforms and processes so that article references and data set or software citations cross-reference each other.

Request that the research community develop a primary, consistent citation record format to support data and software citation.

Data and software repository landing pages should describe the full provenance of the data using appropriate standards.

Authors should be able to cite data and software in their articles at an appropriate level of granularity.

Federal funding agencies should support an effort to convene key players to identify and harmonize standards on roles, attribution, value, and transitive credit (in an extensible framework). All key sponsors would be recognized.

Agencies, publishers, societies, and foundations should fund implementation grants to identify and measure data and software impacts in a way that is relevant to stakeholders and research communities.

This material is based upon work supported by the National Science Foundation under grant number 1448360 with additional support from the Alfred P. Sloan Foundation. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Sloan Foundation.