Time: 10:45am Tuesdays. Venue: Level 7 Seminar Room 2, WEHI1

4 September 2018

In most of our routine RNA-seq analyses we have found it profitable to analyse differential expression at the gene level rather than at the transcript level. However, the recent arrival of very fast, lightweight transcript quantification software (kallisto and Salmon) is likely to make transcript-level RNA-seq analyses more popular worldwide. Transcript-level quantifications suffer very seriously from read assignment uncertainty: reads that overlap multiple transcripts of the same gene cannot be unambiguously assigned to any one of them. The level of uncertainty varies greatly between genes and transcripts, so it needs to be estimated and built into the differential expression analysis; otherwise the statistical analyses will be inefficient and possibly biased. The best way to estimate read assignment uncertainty is to bootstrap the reads for each sample.
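The abstract does not say how the bootstrap output is summarised. A minimal sketch, assuming (not confirmed by the abstract) that each transcript's read assignment uncertainty is summarised as a pooled variance-to-mean ratio of its bootstrap counts across samples: a ratio near 1 means Poisson-like bootstrap variation, while larger values flag ambiguously assigned transcripts. The function name `overdispersion` and the bootstrap counts are hypothetical, and this is not necessarily the exact estimator used in edgeR.

```python
from statistics import mean, variance


def overdispersion(boot_by_sample: list[list[float]]) -> float:
    """Pooled bootstrap variance-to-mean ratio for one transcript (hypothetical helper).

    boot_by_sample[j] holds the bootstrap counts of this transcript in sample j.
    Each sample contributes (B_j - 1) residual degrees of freedom to the pooled ratio.
    """
    num = 0.0  # sum over samples of (squared deviations / bootstrap mean)
    df = 0     # total residual degrees of freedom
    for boots in boot_by_sample:
        m = mean(boots)
        if m > 0 and len(boots) > 1:
            num += variance(boots) * (len(boots) - 1) / m
            df += len(boots) - 1
    # Floor at 1: less-than-Poisson bootstrap spread is treated as no extra uncertainty.
    return max(num / df, 1.0) if df else 1.0


# Hypothetical bootstrap counts: 2 samples x 5 bootstrap replicates each.
unique_tx = [[100, 102, 98, 101, 99], [50, 51, 49, 50, 50]]
ambiguous_tx = [[100, 140, 60, 120, 80], [50, 70, 30, 60, 40]]
print(overdispersion(unique_tx))     # 1.0
print(overdispersion(ambiguous_tx))  # 7.5
```

The transcript whose reads are cleanly assigned gets a ratio of 1, while the ambiguous one shows bootstrap variance 7.5 times larger than its mean, which is the kind of transcript-specific uncertainty the abstract argues must be built into the analysis.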

I have previously outlined how transcript-level differential expression (DE) analyses can be done efficiently with the limma and edgeR software packages by incorporating bootstrap output from kallisto or Salmon into the limma or edgeR pipelines. The edgeR approach is particularly simple and requires almost no new software. In this talk, I will describe the edgeR approach in more detail and will demonstrate its improved performance and sensitivity compared with competitors such as sleuth. I will also describe how previous simulation comparisons of transcript DE methods have been based on incorrect probability models.
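The abstract does not spell out how bootstrap output enters the edgeR pipeline. One simple mechanism, assumed here rather than taken from the abstract, is to divide each transcript's counts by a per-transcript overdispersion factor phi estimated from the bootstraps. If Var(y) is approximately phi times E[y], then Var(y / phi) = Var(y) / phi^2 is approximately E[y] / phi = E[y / phi], so the scaled counts recover a Poisson-like mean-variance relationship that a count-based pipeline can model. The helper `scale_counts`, the transcript names and the numbers below are hypothetical, and the downstream edgeR steps are not shown.

```python
def scale_counts(counts: dict[str, list[float]],
                 phi: dict[str, float]) -> dict[str, list[float]]:
    """Divide every sample's count for a transcript by that transcript's phi.

    Hypothetical helper: phi[tx] >= 1 is an overdispersion factor assumed to have
    been estimated from bootstrap replicates; phi == 1 leaves counts unchanged.
    """
    return {tx: [c / phi[tx] for c in row] for tx, row in counts.items()}


# Hypothetical counts for two transcripts across three samples.
counts = {"tx_unique": [120.0, 80.0, 200.0], "tx_ambiguous": [300.0, 150.0, 450.0]}
phi = {"tx_unique": 1.0, "tx_ambiguous": 7.5}
scaled = scale_counts(counts, phi)
print(scaled["tx_unique"])     # [120.0, 80.0, 200.0]
print(scaled["tx_ambiguous"])  # [40.0, 20.0, 60.0]
```

Scaling leaves cleanly assigned transcripts untouched and shrinks the effective counts of ambiguous ones, so that a transcript with highly uncertain quantification carries less statistical weight. This fits the abstract's point that the approach needs almost no new software.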