Gorilla genome - data download

Gorillas, the largest living primates, are humans' closest living relatives after chimpanzees and bonobos, and are
important for the study of human origins and evolution. Today they survive only in several endangered populations in
the equatorial forests of central Africa.

We generated a reference genome assembly for gorilla using DNA sampled from a single individual: a female western
lowland gorilla (Gorilla gorilla gorilla) named Kamilah, resident at San Diego Zoo. We collected 5.4 Gbp of
capillary sequence and 166.8 Gbp of Illumina read pairs, and combined both data sets in an initial hybrid de novo
assembly. Long-range structure was then improved using human homology: contigs were placed into scaffolds wherever
read pairs confirmed collinearity between the gorilla and human genomes. Base-pair contiguity was improved by local
reassembly within each scaffold, in which Illumina read pairs were used to merge or extend contigs. Finally, we used
additional capillary end-pair sequences from Kamilah bacterial artificial chromosome (BAC) and fosmid clones to
provide longer-range scaffolding. Base errors were corrected by mapping all Illumina reads back to the assembly and
rectifying apparent homozygous variants.

In addition to data from Kamilah, we collected sequence data for three other gorillas, including an eastern lowland
gorilla (Gorilla beringei graueri) from the eastern gorilla species, to enable a study of diversity within the Gorilla
genus. We also generated gorilla RNA-seq and ChIP-seq data to support studies of great ape transcriptomic and
regulatory evolution.

The assembly, analysis and other results of the Gorilla Genome Project are described in the publication below.

Accession numbers for all primary sequencing data are listed there. The assembly itself, together with annotation of
genes and transcripts and predictions of gene orthologues and paralogues, is available at Ensembl. The RNA-seq data
are available from the European Nucleotide Archive (ENA) under accession ERP002094. More information about the
results of the project is also available here.