Data Sharing Under the Genomic Data Sharing (GDS) Policy

Data sharing allows data generated from one research study to be used to explore a range of additional research questions. Enabling the combination of data from multiple projects amplifies the scientific value of data.
NCI supports and complies with all NIH data sharing policies. The NIH Genomic Data Sharing (GDS) Policy was issued to:

promote broad and robust sharing of human and non-human data from a wide range of genomic research

ensure appropriate protections for research involving human data and oversight of research conduct, data quality, data management, data sharing, and data use

Share Genomic Data With NIH/NCI Repositories

Because NCI intramural and extramural programs operate differently, the data submission process depends on whether you are an intramural investigator or an extramural grantee.

Data Sharing Expectations

Data reuse is facilitated when the data conform to accepted GDS data sharing practices. This helps minimize potential errors from misunderstanding the data or metadata. Those depositing data to GDS repositories are encouraged to use existing, well-documented data standards to help ensure the quality and usefulness of the submitted datasets and to create a more efficient process.

GDS data sharing practices

Terms for disease, cell type, tissue type, and other annotations should be linked to the NCI Thesaurus (NCIt).

Wherever possible, use existing common data elements (CDEs). For clinical specimens, the same data elements reported to clinicaltrials.gov are required.

Data should generally be submitted once they have been cleaned (e.g., the analytical dataset is finalized).

Data pertinent to the interpretation of genomic data, such as associated phenotype data (e.g., clinical information), exposure data, and descriptive information (e.g., protocols or methodologies used), should be shared. Metadata about the experiment or study and annotations that are necessary to reproduce any published table or analysis must be included with genomic data submissions.
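As an illustration only, the expectations above can be turned into a simple pre-submission checklist. The sketch below is not an NIH or NCI tool. The category names (`phenotype_data`, `exposure_data`, `descriptive_information`, `study_metadata`) and the `Submission` class are hypothetical labels chosen for this example, and not every study will have data in every category.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical category names: the guidance lists these kinds of
# information but does not define a schema or field names.
EXPECTED_COMPANIONS = (
    "phenotype_data",           # e.g., clinical information
    "exposure_data",
    "descriptive_information",  # e.g., protocols or methodologies used
    "study_metadata",           # needed to reproduce published tables/analyses
)


@dataclass
class Submission:
    """A genomic data submission plus the files that accompany it."""
    genomic_files: List[str]
    companions: Dict[str, List[str]] = field(default_factory=dict)


def missing_companions(sub: Submission) -> List[str]:
    """Return the expected categories that have no files attached."""
    return [name for name in EXPECTED_COMPANIONS if not sub.companions.get(name)]


if __name__ == "__main__":
    draft = Submission(
        genomic_files=["cohort.vcf"],
        companions={"phenotype_data": ["clinical.csv"], "exposure_data": []},
    )
    # Lists the categories to discuss with the program officer before submitting.
    print(missing_companions(draft))
```

A check like this only flags empty categories. Whether a category applies to a given study is a question for the program officer.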

Examples of Data Submission Formats

Different data types undergo different levels of data processing, which determine expectations for data submission and data release. Please work with your program officer to determine specific data submission requirements as they may differ based on individual program and data type.
The Office of Science Policy provides the following guidance by level of genomic data:

Level 0: Raw data generated directly from the instrument platform.

Level 1: Initial sequence reads, the most fundamental form of the data after the basic translation of raw input

Level 2: Data after an initial round of analysis or computation to clean the data and assess basic quality measures

Level 3: Analysis to identify genetic variants, gene expression patterns, or other features of the dataset

Level 4: Final analysis that relates the genomic data to phenotype or other biological states

Metadata: Information around the experiment or study

Table 1 describes examples for each level. NIH will review these expectations at regular intervals, publish updates on the GDS website, and notify the research community through appropriate communication methods (e.g., NIH Guide for Grants and Contracts).
Note that information necessary to interpret controlled-access genomic data, such as study protocols, data instruments, and survey tools, should be submitted for sharing through unrestricted access, concurrent with the relevant Level 1, 2, 3, or 4 genomic data.
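For teams that track datasets in code, the levels above map naturally onto an enumeration. The following is a minimal sketch, not an official NIH data model. The member names and the helper `needs_supporting_docs` are hypothetical, and the descriptions are abbreviated from the list above.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """Genomic data levels as listed above (member names are illustrative)."""
    RAW = 0             # raw data generated directly from the instrument platform
    SEQUENCE_READS = 1  # initial sequence reads
    CLEANED = 2         # data after initial cleaning and basic quality assessment
    FEATURES = 3        # variants, expression patterns, or other features
    FINAL_ANALYSIS = 4  # genomic data related to phenotype or biological state


def needs_supporting_docs(level: DataLevel) -> bool:
    """True for Levels 1-4, where interpretive documents (protocols, data
    instruments, survey tools) are submitted concurrently with the data."""
    return level >= DataLevel.SEQUENCE_READS


if __name__ == "__main__":
    for level in DataLevel:
        print(level.value, level.name, needs_supporting_docs(level))
```

Using `IntEnum` keeps the levels ordered, so a range such as "Level 1 through 4" becomes a plain comparison.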

Data Type

Level 1

Level 2

Level 3

Level 4

SNP array data from >500K single nucleotide polymorphisms (SNPs) (e.g., GWAS data)

.CEL

.TXT

.IDAT

Note: submission of .IDAT files for human sample data will be decided on a case-by-case basis

Study metadata and annotations necessary to reproduce any published table or analysis must be included with genomic data submissions. In particular, data pertinent to the interpretation of genomic data are expected to be shared, such as:

associated phenotype data (e.g., clinical information)

exposure data, relevant metadata

descriptive information (e.g., protocols or methodologies used)

NIH/NCI Genomic Data Repositories Help Resources

Investigators may use the following resources to submit datasets to National Institutes of Health (NIH), National Cancer Institute (NCI), and National Center for Biotechnology Information (NCBI) data repositories. For additional questions about data sharing, please contact the NCI Office of Data Sharing (NCIOfficeofDataSharing [at] mail.nih.gov).

Extramural Programs:

Extramural investigators submit their data sharing plan (DSP) as part of their funding application. DSP requirements should be discussed as early in the research planning process as possible. The approved DSP should be submitted at Just-in-Time (JIT), along with the Institutional Certification. Program officers must approve the DSP prior to funding.

Intramural Programs:

Intramural investigators submit their DSP in accordance with scientific review. Differences in study type (e.g., studies involving model organisms) and how scientific review takes place within the NCI intramural research programs will dictate when the DSP can be reviewed.

Prospective Scientific Review: The DSP should be submitted to, and reviewed by, the scientific director (SD), or delegate, and genomic program administrator (GPA) at the time the funding decision is made.

Institutional Certifications

The Institutional Certification assures that projects planning to submit genomic data to NIH will meet the expectations of the GDS policy. The certification, provided by the principal investigator and the institutional signing official (SO) of the submitting institution, clearly delineates any “data use limitations (DULs)” on the research use of the data, as agreed to in the informed consent documents signed by study participants.
For multicenter studies (with samples collected at several institutions), NIH understands that the submitting institution is not necessarily the local institution or IRB of record for all sites. However, the submitting institution should assure NIH that it believes, based on either its own review or assurance from other institutions, that the expectations of the policy are met for the entire dataset. Institutions may choose to collect and submit a single-site certification from each site contributing samples or submit a multi-site certification. The Institutional Certifications for both intramural and extramural studies can be found on the GDS website.
An Institutional Certification should be submitted as early as possible. The certification should be provided to NCI prior to award, along with any other JIT information (for extramural researchers), or at the time of scientific review (for intramural researchers).