Abstract

Background

Ultra-high-throughput sequencing technologies provide opportunities both for the discovery
of novel molecular species and for detailed comparisons of gene expression patterns.
Small RNA populations are particularly well suited to this kind of analysis, as many
different small RNAs can be completely sequenced in a single instrument run.

Results

We prepared small RNA libraries from 29 tumour/normal pairs of human cervical tissue
samples. Analysis of the resulting sequences (42 million in total) defined 64 new
human microRNA (miRNA) genes. Both arms of the hairpin precursor were observed in
23 of the newly identified miRNA candidates. We tested several computational
approaches for the analysis of class differences between high-throughput sequencing
datasets and describe a novel application of a log-linear model that provided
the most effective analysis for these data. This method identified 67 miRNAs
that were differentially expressed between the tumour and normal samples
at a false discovery rate of less than 0.001.
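
To make the idea of a log-linear test for differential representation concrete, the following is a minimal sketch and not the model used in this study. It assumes a simple Poisson log-linear model, log E[count] = log(library size) + mu + beta * class, tests beta = 0 for one miRNA with a likelihood-ratio test, and then applies a Benjamini-Hochberg adjustment across miRNAs. The function names, the choice of Poisson errors, the use of library size as an offset, the Benjamini-Hochberg step, and the toy numbers are all assumptions made for illustration; the sketch also ignores the tumour/normal pairing of the samples.

```python
import math


def poisson_loglinear_lrt(counts, lib_sizes, is_tumour):
    """Likelihood-ratio test for the class effect (beta) in the assumed model
    log E[count_j] = log(lib_size_j) + mu + beta * is_tumour_j.

    With a single two-level factor and an offset, the fitted values have a
    closed form, so no iterative fitting is needed. Assumes every class has
    at least one library with a positive size.
    """
    observed = [0.0, 0.0]  # summed counts per class: [normal, tumour]
    depth = [0.0, 0.0]     # summed library sizes per class
    for count, size, tumour in zip(counts, lib_sizes, is_tumour):
        group = 1 if tumour else 0
        observed[group] += count
        depth[group] += size

    total_observed = observed[0] + observed[1]
    total_depth = depth[0] + depth[1]

    # Deviance between the null model (beta = 0) and the two-class model.
    stat = 0.0
    for group in (0, 1):
        expected = total_observed * depth[group] / total_depth
        if observed[group] > 0:
            stat += 2.0 * observed[group] * math.log(observed[group] / expected)
    stat = max(stat, 0.0)  # guard against tiny negative rounding error

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy, made-up read counts for two hypothetical miRNAs in four libraries.
    lib_sizes = [1_000_000, 1_200_000, 900_000, 1_100_000]
    is_tumour = [False, False, True, True]
    toy_counts = {
        "mirna_a": [120, 150, 480, 610],
        "mirna_b": [300, 350, 280, 330],
    }
    names = list(toy_counts)
    p_values = [
        poisson_loglinear_lrt(toy_counts[name], lib_sizes, is_tumour)[1]
        for name in names
    ]
    for name, q in zip(names, benjamini_hochberg(p_values)):
        print(name, q)
```

A plain Poisson model of this kind tends to understate biological variability between samples, so on real data it would usually need to be replaced by an overdispersed or paired formulation.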

Conclusions

This approach can potentially be applied to any kind of RNA sequencing data for analysing
differential sequence representation between biological sample sets.