Leveraging a Cray Supercomputer for Parallel De Novo Transcriptome Assembly Using Trinity

The Trinity RNA-Seq de novo assembly software has become a popular application for reconstructing transcriptomes from RNA-Seq data in diverse model and non-model organisms. Trinity consists of several software modules (Inchworm, Chrysalis, and Butterfly) that run in sequence, each with different computational requirements and capacity for parallel computing, and the pipeline typically requires high performance computing systems. Earlier work by Henschel et al. (http://dl.acm.org/citation.cfm?doid=2…) reported an initial effort to optimize Trinity by applying best practices for high performance computing, in part by better leveraging OpenMP for multithreading, which increased scalability and reduced runtime many-fold. Here, we extend efforts to improve parallelization and to optimize the code for massively parallel computing environments by introducing an enhanced version of the Inchworm software that uses MPI programming and a novel algorithm to reconstruct transcript contigs by greedy k-mer extension in distributed memory. We contrast the performance of this MPI-Inchworm implementation with the earlier (OpenMP) Inchworm on Cray XC30 hardware.
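
To make the idea of greedy k-mer extension concrete, here is a minimal single-process Python sketch in the spirit of Inchworm as described in the original Trinity work: count k-mers, seed a contig with the most abundant unused k-mer, and extend it in both directions by repeatedly taking the most abundant unused k-mer that overlaps the contig end by k-1 bases. The distributed-memory aspect is only imitated. K-mers are assigned to hypothetical "ranks" by a CRC32 hash (`owner_rank`), and `lookup` stands in for what would be a remote query under MPI. All function names, the hashing rule, and the `min_count` parameter are illustrative assumptions, not MPI-Inchworm's actual code or algorithm. Reverse complements and error trimming are ignored for brevity, and `k` is assumed to be at least 2.

```python
# Hypothetical sketch of Inchworm-style greedy k-mer extension; not MPI-Inchworm's code.
import zlib
from collections import Counter

BASES = "ACGT"

def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def owner_rank(kmer, n_ranks):
    """Hypothetical ownership rule: which rank would store this k-mer."""
    return zlib.crc32(kmer.encode()) % n_ranks

def partition(counts, n_ranks):
    """Split the k-mer table into per-rank shards, standing in for distributed memory."""
    shards = [dict() for _ in range(n_ranks)]
    for kmer, c in counts.items():
        shards[owner_rank(kmer, n_ranks)][kmer] = c
    return shards

def lookup(shards, kmer):
    """Fetch a k-mer's count from its owning shard (a remote query under MPI)."""
    return shards[owner_rank(kmer, len(shards))].get(kmer, 0)

def best_extension(shards, used, edge, to_right):
    """Return the most abundant unused k-mer overlapping `edge` by k-1 bases, or None."""
    best, best_count = None, 0
    for b in BASES:
        cand = edge + b if to_right else b + edge
        c = lookup(shards, cand)
        if c > best_count and cand not in used:
            best, best_count = cand, c
    return best

def greedy_assemble(reads, k, n_ranks=4, min_count=1):
    shards = partition(count_kmers(reads, k), n_ranks)
    # Visit candidate seeds from most to least abundant.
    seeds = sorted(((c, km) for s in shards for km, c in s.items()), reverse=True)
    used, contigs = set(), []
    for count, seed in seeds:
        if seed in used or count < min_count:
            continue
        used.add(seed)
        contig = seed
        # Extend to the right first, then to the left, until no unused extension exists.
        for to_right in (True, False):
            while True:
                edge = contig[-(k - 1):] if to_right else contig[:k - 1]
                nxt = best_extension(shards, used, edge, to_right)
                if nxt is None:
                    break
                used.add(nxt)
                contig = contig + nxt[-1] if to_right else nxt[0] + contig
        contigs.append(contig)
    return contigs
```

For example, `greedy_assemble(["ATGGCGTACG", "GCGTACGTTAA"], 5)` returns the single contig `ATGGCGTACGTTAA`, and the result does not depend on `n_ranks`, because sharding only changes where a k-mer's count is stored, not the count itself.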

What is RNA-Seq?

Long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation. Sequencing adaptors (blue) are subsequently added to each cDNA fragment, and a short sequence is obtained from each cDNA using high-throughput sequencing technology. The resulting sequence reads are aligned with the reference genome or transcriptome and classified into three types: exonic reads, junction reads, and poly(A) end-reads. These three types are used to generate a base-resolution expression profile for each gene. (Source: Nat Rev Genet 10(1):57-63, 2009.)
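
The read classes and the per-base expression profile in the caption can be illustrated with a toy Python model. The following are assumptions, not details from the caption: a gene is a list of half-open exon intervals in genomic coordinates; each aligned read is given as its sequence plus the genomic blocks it aligns to (more than one block means a spliced alignment); and a read is called a poly(A) end-read if its sequence ends in a run of at least `min_polya` A's, a hypothetical threshold of 10. Real pipelines work from SAM/BAM alignments and apply more careful rules.

```python
# Toy model of RNA-Seq read classification and per-base coverage; all rules are illustrative.
from dataclasses import dataclass

@dataclass
class AlignedRead:
    seq: str
    blocks: list  # half-open (start, end) genomic blocks; more than one means spliced

def classify(read, exons, min_polya=10):
    """Label a read as 'polyA', 'junction', 'exonic', or 'other' (hypothetical heuristic)."""
    # A trailing poly(A) run is assumed to be soft-clipped, so it need not appear in blocks.
    if read.seq.endswith("A" * min_polya):
        return "polyA"
    # Indices of exons that overlap any aligned block of the read.
    touched = {i for i, (es, ee) in enumerate(exons)
               for bs, be in read.blocks if bs < ee and es < be}
    if len(read.blocks) > 1 and len(touched) > 1:
        return "junction"
    return "exonic" if touched else "other"

def coverage(reads, gene_start, gene_end):
    """Per-base read depth over [gene_start, gene_end): a base-resolution profile."""
    depth = [0] * (gene_end - gene_start)
    for read in reads:
        for bs, be in read.blocks:
            for pos in range(max(bs, gene_start), min(be, gene_end)):
                depth[pos - gene_start] += 1
    return depth
```

With exons `[(0, 10), (20, 30)]`, a read aligned to `[(5, 10), (20, 25)]` is classified as `junction`, and only the aligned blocks (never the intron between them) contribute to `coverage`.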