Data Provenance, Curation, and Storage

From the Department of Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD (M.E.A., S.C.R.); and Department of Physiology and the Program in Cellular and Molecular Medicine, The Johns Hopkins School of Medicine, Baltimore, MD (M.E.A.).

High-integrity data retention and curation are critical for preserving the scientific record and informing future discovery.1 However, these steps are often neglected or inadequately performed for lack of a tractable, easily operated approach. We offer general guidelines and an exemplar method that is applicable to many, but by no means all, laboratories.

Data Retention and Provenance

Data generated with National Institutes of Health funding should be stored for 3 years after the end of the last competitive renewal. In some cases, data related to patients and patents carry longer storage obligations. Data storage rules are evolving and may differ among funding agencies, institutions, and journals. The data belong to the host institution, but the responsibility for storage (ie, stewardship) is typically transferred to individual investigators, many of whom have insufficient understanding of, or infrastructure for, this important role.2 Although authors are routinely asked to affirm their accountability for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved,3 there is no standard infrastructure for sharing all primary data among all authors concurrently during drafting of an article, much less after publication.

If you have worked in a laboratory for any length of time, you will know that it is sometimes difficult to locate original data. These difficulties are magnified by moving laboratory locations, storing data on proprietary and outmoded platforms, lack of a clear paper trail after laboratory personnel move on, or multiple collaborators generating data at various sites. In general, older data are harder to trace than newer data. These truths became starkly evident to me (M.E. Anderson) after a former laboratory member was discovered to have engaged in scientific misconduct. While there was no doubt about his transgressions involving manipulated …