Abstract: Over the last few years, RNA sequencing (RNA-Seq) has become the technology of choice for measuring gene expression. The rapid growth in RNA-Seq studies has produced a large number of RNA-Seq data sets for different organisms under a variety of experimental and environmental conditions. It is only natural to begin exploring how this body of existing data can help the analysis of future experiments. In this talk, we discuss identifying stably expressed genes from multiple existing RNA-Seq data sets based on a numerical measure of stability. We envision that the stably expressed genes identified in this way can be used as a reference set or as prior information for count normalization and differential expression (DE) analysis of future RNA-Seq data sets obtained from similar or comparable experiments. We also fit a random-effect model to the read counts for each gene and decompose the total variance into between-sample, between-treatment and between-experiment variance components. This variance component analysis is a first step towards understanding the sources and nature of RNA-Seq count variation. To illustrate our methods, we examine RNA-Seq data on 211 Arabidopsis samples from 24 different experiments carried out by different labs.
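
As a rough illustration of the variance decomposition mentioned above, one way such a per-gene random-effect model could be written is sketched below. The abstract does not give the model, so the Poisson log-linear form, the normalization offset, and all symbols are assumptions made here for illustration rather than the speakers' stated specification.

```latex
% Hypothetical sketch: Y_{ij} is the read count of gene i in sample j;
% sample j belongs to treatment k(j) nested in experiment l(j).
% N_j is an assumed normalized library size used as an offset.
\begin{align*}
  Y_{ij} \mid \mu_{ij} &\sim \mathrm{Poisson}(\mu_{ij}), \\
  \log \mu_{ij} &= \log N_j + \xi_i + a_{i,l(j)} + b_{i,k(j)} + \varepsilon_{ij}, \\
  a_{i,l} &\sim \mathcal{N}\bigl(0, \sigma^2_{i,\mathrm{exp}}\bigr), \quad
  b_{i,k} \sim \mathcal{N}\bigl(0, \sigma^2_{i,\mathrm{trt}}\bigr), \quad
  \varepsilon_{ij} \sim \mathcal{N}\bigl(0, \sigma^2_{i,\mathrm{sam}}\bigr), \\
  \sigma^2_{i,\mathrm{total}} &= \sigma^2_{i,\mathrm{exp}} + \sigma^2_{i,\mathrm{trt}} + \sigma^2_{i,\mathrm{sam}}.
\end{align*}
```

Under these assumptions, the three random effects are taken to be independent, so the total variance on the log scale is the sum of the between-experiment, between-treatment and between-sample components. A gene with a small total variance would be a natural candidate for "stably expressed", although the abstract does not say which numerical measure of stability is actually used.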

(This talk is based on joint work with Bin Zhuo, Sarah Emerson and Jeff Chang.)