High-throughput sequencing of RNA has emerged in
the last few years as a powerful method that enables discovery of
novel transcripts and alternatively spliced isoforms of genes, along
with accurate estimates of gene expression. In this work, we study
the fundamental limits of de novo transcriptome assembly using
RNA shotgun-sequencing, where the sequencing technology extracts
short reads from the RNA transcripts. We propose a new linear-time
algorithm for transcriptome reconstruction and derive sufficient
conditions on the length of reads under which the algorithm will succeed.
We also derive fundamental information-theoretic conditions
for reconstruction by any algorithm, and show that the proposed
algorithm is near-optimal on a real data set. Along the way, we show that
the NP-hard problem of decomposing a flow into the fewest
paths can be solved in linear time for a family of instances, and that
biologically relevant instances tend to fall in this family. We also describe
the construction of a software package for RNA assembly based on
this theory and show that it obtains significant improvements in reconstruction
accuracy over state-of-the-art software.

Biography

Sreeram Kannan is currently a postdoctoral researcher at
the University of California, Berkeley. He received his Ph.D.
in Electrical Engineering and M.S. in Mathematics from the University
of Illinois Urbana-Champaign. He is a co-recipient of the Van
Valkenburg research award from UIUC, the Qualcomm Roberto Padovani
Scholarship for outstanding interns, the Qualcomm Cognitive Radio Contest
first prize, the S.V.C. Aiya medal from the Indian Institute of Science,
and the Intel India Student Research Contest first prize. His research
interests include applications of information theory and approximation
algorithms to wireless networks and computational biology.