69 Genomes Data

A diverse data set of whole human genomes is freely available for public use to enhance any genomic study or to evaluate Complete Genomics data results and file formats. It includes 69 DNA samples sequenced using our Standard Sequencing Service, which covers whole genome sequencing, mapping of the resulting reads to a human reference genome, comprehensive detection of variations, scoring, and informative annotation. The 69 genomes data set includes:

A Yoruban trio

A Puerto Rican trio

A 17-member CEPH pedigree across three generations

A diversity panel representing unrelated individuals from nine different populations

The CEPH samples within the pedigree and diversity sets are from the NIGMS Repository; the remaining samples are from the NHGRI Repository. Both repositories are housed at the Coriell Institute for Medical Research. These samples were sequenced with an average genome-wide coverage of 80X (range: 51X to 89X).

For each publicly available genome, the following whole genome information is provided:

Variation Reporting, including:

SNPs, small insertions, deletions, substitutions, and complex small variants

Copy number variations (CNVs) and structural variations (SVs)

Mobile element insertions (MEIs)

Annotations, including:

Genes and functional impact

dbSNP

Minor allele frequency (MAF) from the 1000 Genomes Project

Non-coding RNA

Known segmental duplication

Type of SV

Extensive scoring

Evidence files, including read support information for the variation

Per-base-pair coverage for each position in the reference genome

Additionally, Complete Genomics provides aggregate information for the small variants identified across the 69 genomes data set, as well as across a 55-genome subset that excludes any closely related individuals. This aggregate information includes:

A list of all of the variants in the set of genomes, generated using the CGA Tools (v1.5) listvariants command

A table indicating which variants are present in which genomes, generated using the CGA Tools (v1.5) testvariants command

A table indicating which variants are present in which genomes, including variant frequency, in VCF format
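As an illustration of how a variant-by-genome presence table can be summarized into a variant frequency, the following Python sketch counts how often each variant allele appears among the called alleles. The table layout here is an assumption made for the example, not the documented testvariants output: a tab-separated table with a hypothetical `variantId` column and one column per genome, where each cell is a two-character code with `1` for "allele present", `0` for "allele absent", and any other character (such as `N`) for a no-call. The function name and the genome column names are likewise hypothetical. Consult the CGA Tools documentation for the actual file formats.

```python
import csv
import io


def allele_frequencies(table_text, id_column="variantId"):
    """Return {variant id: fraction of called alleles that carry the variant}.

    Assumes a hypothetical tab-separated layout: one id column plus one
    column per genome, each cell holding one character per allele
    ('1' = present, '0' = absent, anything else = no-call).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    frequencies = {}
    for row in reader:
        present = 0
        called = 0
        for column, cell in row.items():
            if column == id_column:
                continue
            for allele in cell:
                if allele == "1":
                    present += 1
                    called += 1
                elif allele == "0":
                    called += 1
                # Any other character is treated as a no-call and skipped.
        # A variant with no called alleles has no defined frequency.
        frequencies[row[id_column]] = present / called if called else None
    return frequencies


# Made-up example data in the assumed layout (three genomes, two variants).
example = (
    "variantId\tgenomeA\tgenomeB\tgenomeC\n"
    "1\t01\t00\t11\n"
    "2\t0N\tNN\t00\n"
)

if __name__ == "__main__":
    print(allele_frequencies(example))
```

With the made-up data above, variant 1 is carried by 3 of 6 called alleles and variant 2 by 0 of 3 called alleles. No-calls are excluded from the denominator in this sketch; other conventions are possible, and the frequencies in the provided VCF may be computed differently.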