New zebrafish developmental RNA-seq data in Ensembl

For our latest release (e87) we’ve produced annotations from some new embryonic zebrafish RNA-seq data using the Ensembl genebuild RNA-seq pipeline. The new data consist of gene sets and alignments for 18 separate embryonic developmental stages, from the single-celled zygote up to 120 hours post-fertilisation. As usual, these features can be viewed in our browser as separate tracks, or downloaded from our FTP site.

The RNA-seq data we used were produced by the Vertebrate Genetics and Genomics Group at the Sanger Institute. The team collected 96 embryos from each of the 18 stages, examining their morphology to ensure that every embryo was at the correct stage of development. Such an undertaking, although extensive, is more achievable in zebrafish than in many other vertebrates because of features such as large clutch size and external fertilisation and development. The team made 5 libraries for each developmental stage, each comprising a pool of 12 embryos. All 90 libraries were made simultaneously by a robot to reduce batch effects, and strand-specific sequencing was used to reveal information on genes that overlap on opposite strands. The data were released to ENA directly after sequencing, to allow public access as early as possible. Variation in gene structure across development can be viewed in Ensembl, and changes in expression level can be viewed in Expression Atlas. A manuscript describing the changes in gene structure and expression level across development is currently in preparation.
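
The library counts quoted above can be checked with a few lines of arithmetic. The Python sketch below is purely illustrative: the constants come from the figures in this post, while the function and key names are our own. It makes no assumption about how embryos not pooled into libraries were used.

```python
# Figures quoted in the post; all names here are illustrative.
N_STAGES = 18                     # embryonic developmental stages
LIBRARIES_PER_STAGE = 5           # libraries made per stage
EMBRYOS_PER_LIBRARY = 12          # embryos pooled into each library
EMBRYOS_COLLECTED_PER_STAGE = 96  # embryos collected per stage


def design_summary():
    """Return headline counts for the sequencing design."""
    pooled_per_stage = LIBRARIES_PER_STAGE * EMBRYOS_PER_LIBRARY
    return {
        "total_libraries": N_STAGES * LIBRARIES_PER_STAGE,
        "embryos_pooled_per_stage": pooled_per_stage,
        "embryos_pooled_total": N_STAGES * pooled_per_stage,
        "embryos_collected_total": N_STAGES * EMBRYOS_COLLECTED_PER_STAGE,
    }


if __name__ == "__main__":
    for name, value in design_summary().items():
        print(f"{name}: {value}")
```

The total of 18 × 5 libraries matches the 90 libraries mentioned above.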

The alignments and annotations generated from the data are viewable in the Ensembl browser, and the individual tracks can be configured using the RNA-seq tissue matrix, which was introduced in a previous blog post. The new zebrafish entries appear in chronological order under the heading ‘WTSI stranded RNA-seq’. A merged set, which contains all of the new developmental RNA-seq data, can also be selected.

We expect that these RNA-seq data will expose new isoforms of previously annotated genes, some of which may be especially prevalent during, and perhaps even unique to, early embryonic development. The alignments may also reveal interesting expression patterns for specific genes.

We’d like to encourage our users to take full advantage of these exciting new data, and we hope they’ll facilitate some interesting new research.