De novo genome assembly using next-generation sequencing (NGS) short reads is increasing rapidly. However, several major challenges must still be overcome to make it efficient, accurate, and versatile. Early assemblers, designed around the very short read lengths available in the early stage of NGS, were successfully applied to assemble some published genomes but failed to leverage the reads generated by newer-generation sequencers. The new reads are not only longer but also exhibit improved profiles and patterns that have made some previously prohibitive genome studies feasible. Exploiting them, however, requires new algorithms.
SOAPdenovo2 was developed with a new algorithm design that 1) reduces memory consumption in graph construction; 2) resolves more complex repetitive regions in contig assembly; 3) increases coverage and length in scaffolding; 4) improves gap closing; and 5) is optimized for large genomes. Benchmarks on public datasets showed that SOAPdenovo2 greatly surpasses its predecessor, SOAPdenovo, and is competitive with other assemblers in both assembly length and accuracy.
SOAPdenovo2 was developed with versatility as a top priority. Working alone or as part of a pipeline, SOAPdenovo2 demonstrated its power by 1) presenting detailed structural variation (SV) maps of an Asian and an African genome, showing that whole-genome de novo assembly could serve as a new route to a more comprehensive SV map; 2) drafting the highly polymorphic and repetitive oyster genome, showing that complex oceanic species could be assembled by SOAPdenovo2 together with a hierarchical assembly strategy; and 3) finishing the assembly of a haplotype-resolved diploid genome without using a reference genome. The community has also successfully applied SOAPdenovo2 to assemble more than a hundred species.
The versatility of SOAPdenovo2 was also exemplified by the development of SOAPdenovo-Trans, an assembler tailored for transcriptome assembly using RNA sequencing data. In benchmarks on known transcripts from well-annotated genomes, SOAPdenovo-Trans outperformed two other software packages in identifying alternative splicing and differential expression levels.


dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: Creative Commons: Attribution 3.0 Hong Kong License
dc.rights: The author retains all proprietary rights, (such as patent rights) and the right to use in future works.