Assembly

Zv9 (GCA_000002035.2) is the ninth integrated assembly
of the zebrafish genome. This assembly is used by UCSC to create their danRer7
database. It is based on nearly 90% clone sequence (data freeze April 2010),
with remaining gaps being filled using sequence from a novel whole genome shotgun
assembly, WGS31. Project coordination, genome sequencing and assembly are
provided by the Wellcome Trust Sanger Institute.

An overview of the assembly is available here, and frequently asked
questions about the assembly process and terminology are addressed here.

Previous assemblies

Annotation

The zebrafish Zv9 assembly was annotated using a modified
Ensembl pipeline. Predictions from zebrafish proteins have been given
priority over predictions from other non-mammalian vertebrate species. All
UniProt proteins were filtered to remove predictions (PE levels 3 and
above). Aligned zebrafish cDNAs have been used to add UTR regions. A total of
8,374 RNASeq models made from a range of zebrafish developmental stages and
tissues were added into the gene build where they contributed a novel model or
splice variant. Genes are named based on the alignment of their coding
regions to known entries in public databases; ZFIN genes have priority in
this process.
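
The protein filtering step can be illustrated with a minimal sketch. UniProt assigns each entry a protein existence (PE) level from 1 (evidence at protein level) to 5 (uncertain); removing levels 3 and above keeps only entries supported at the protein or transcript level. The record layout, the accession strings and the function name below are hypothetical and are not taken from the Ensembl pipeline.

```python
# Hypothetical records: each protein is a dict with an accession and a PE level.
# UniProt PE levels: 1 = protein evidence, 2 = transcript evidence,
# 3 = inferred from homology, 4 = predicted, 5 = uncertain.

def keep_supported(proteins, max_pe=2):
    """Return only the proteins whose PE level is at most max_pe."""
    return [p for p in proteins if p["pe"] <= max_pe]


# Made-up example entries, for illustration only.
proteins = [
    {"accession": "EXAMPLE_A", "pe": 1},
    {"accession": "EXAMPLE_B", "pe": 2},
    {"accession": "EXAMPLE_C", "pe": 3},
    {"accession": "EXAMPLE_D", "pe": 4},
]

kept = keep_supported(proteins)
kept_accessions = [p["accession"] for p in kept]
```

Here only the first two entries pass the filter, since PE levels 3 and 4 are treated as predictions.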

The Ensembl annotations were then merged with Vega annotations at the
transcript level. Transcripts were merged if they shared the same internal
exon-intron boundaries (i.e. had identical splicing pattern) with slight
differences in the terminal exons allowed. Importantly, all Vega source
transcripts (regardless of merge status) were included in the final merged
gene set.
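
The merge criterion described above can be sketched as a comparison of intron chains. This is an illustration only, not the actual Ensembl/Vega merge code: the coordinate convention (1-based, inclusive exon start and end), the function names and the handling of single-exon transcripts are assumptions. The sketch also places no limit on how far the terminal exon ends may differ, whereas the text allows only slight differences.

```python
def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exons.

    Exons are assumed to be 1-based inclusive and non-overlapping.
    The outer ends of the first and last exon do not appear in the chain,
    so differences there do not affect the comparison.
    """
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1)
        for i in range(len(ordered) - 1)
    )


def can_merge(transcript_a, transcript_b):
    """True if both transcripts share identical internal exon-intron boundaries."""
    chain_a = intron_chain(transcript_a)
    chain_b = intron_chain(transcript_b)
    # Assumption: single-exon transcripts have no introns to compare,
    # so this sketch never merges them.
    return len(chain_a) > 0 and chain_a == chain_b


# Made-up coordinates for illustration.
ensembl_tx = [(100, 200), (300, 400), (500, 650)]
vega_tx = [(90, 200), (300, 400), (500, 600)]      # differs only at the outer ends
shifted_tx = [(100, 200), (310, 400), (500, 650)]  # internal boundary differs

same_splicing = can_merge(ensembl_tx, vega_tx)
different_splicing = can_merge(ensembl_tx, shifted_tx)
```

In this example `ensembl_tx` and `vega_tx` would be merged, while `shifted_tx` would remain a separate transcript because one internal exon boundary differs.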