Special Issue Information

Dear Colleagues,

We are currently in the midst of a revolution in DNA sequencing that promises unprecedented access to genetic data. Next Generation Sequencing (NGS) and follow-on technologies such as single molecule sequencing (next-next gen or 3rd generation) instruments and methods are advancing rapidly. These advances open the door to new applications for sequencing such as large-scale association studies, comparative genomics, metagenomics and the study of ancient DNA. These technologies have also been used in chromatin immunoprecipitation studies to determine DNA binding sites and in quantitative RNA expression analysis by sequencing. These instrument platforms create unique challenges both upstream and downstream of data generation. Improved methods for library construction and amplification strategies that produce more uniform and complete coverage are required. Methods are also evolving for the processing and analysis of extremely large numbers of (often short) reads to better understand quality, improve alignments and generate whole-genome assemblies.

Given the huge impact of these developments, a special issue of the journal Genes is being published to explore the methods and applications of these new sequencing technologies. Authors are encouraged to submit original manuscripts describing the use of Next Generation Sequencing to answer scientific questions. Also encouraged are papers describing new methods or instruments, reviews or comparisons of Next Generation Sequencing technologies, and manuscripts dealing with bioinformatic methods of analysis.

Abstract: Next Generation Sequencing (NGS) refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger) sequencing, in which labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of base-level incorporation events across a massive number of reactions (on the order of millions, versus 96 for capillary electrophoresis, for instance). The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche), the Illumina (formerly Solexa) Genome Analyzer, the SOLiD system (Applied Biosystems/Life Technologies) and the Heliscope (Helicos Corporation). The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. [...]

Abstract: Polysaccharides are an important source of organic carbon in the marine environment, and degradation of the insoluble and globally abundant cellulose is a major component of the marine carbon cycle. Although a number of species of cultured bacteria are known to degrade crystalline cellulose, little is known of the polysaccharide hydrolases expressed by cellulose-degrading microbial communities, particularly in the marine environment. Next-generation 454 pyrosequencing was applied to analyze the microbial community that colonizes and degrades insoluble polysaccharides in situ in the Irish Sea. The bioinformatics tool MG-RAST was used to examine the randomly sampled data for taxonomic markers and functional genes, and showed that the community was dominated by members of the Gammaproteobacteria and Bacteroidetes. Furthermore, the identification of 211 gene sequences matched to a custom-made database comprising the members of nine glycoside hydrolase families revealed an extensive repertoire of functional genes predicted to be involved in cellulose utilization. This demonstrates that the use of an in situ cellulose baiting method yielded a marine microbial metagenome considerably enriched in functional genes involved in polysaccharide degradation. The research reported here is the first designed to specifically address the bacterial communities that colonize and degrade cellulose in the marine environment and to evaluate the glycoside hydrolase (cellulase and chitinase) gene repertoire of that community, in the absence of the biases associated with PCR-based molecular techniques.

Abstract: The recent arrival of ultra-high throughput, next generation sequencing (NGS) technologies has revolutionized the genetics and genomics fields by allowing rapid and inexpensive sequencing of billions of bases. The rapid deployment of NGS in a variety of sequencing-based experiments has resulted in fast accumulation of massive amounts of sequencing data. To process this new type of data, a torrent of increasingly sophisticated algorithms and software tools is emerging to support the analysis stage of NGS applications. In this article, we strive to comprehensively identify the critical challenges that arise at all stages of NGS data analysis and provide an objective overview of what has been achieved in existing works. At the same time, we highlight selected areas that need much further research to improve our current capabilities to extract the most information possible from NGS data. The article focuses on applications dealing with ChIP-Seq and RNA-Seq.

Abstract: The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of software tools already exist for analyzing NGS data. These tools fall into several general categories, including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in choosing among the available computational tools for the various steps of the data analysis workflow.

Abstract: This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads can be created with our read simulator. They can be of differing length and coverage, consist of paired reads with varying distance, and include sequencing errors such as color space miscalls to imitate SOLiD data. The simulated or real reads are mapped to their reference genome and our assembly simulator is then used to obtain optimal assemblies that are limited only by the distribution of repeats. By way of this mapping, the assembly simulator determines which contigs are theoretically possible, or conversely (and perhaps more importantly), which are not. We illustrate the application and utility of our new simulation tools with several experiments that test the effects of genome complexity (repeats), read length and coverage, word size in De Bruijn graph assembly, and alternative sequencing strategies (e.g., BAC pooling) on sequence assemblies. These experiments highlight just some of the uses of our simulators in the experimental design of sequencing projects and in the further development of assembly algorithms.
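The role of word size in de Bruijn graph assembly, mentioned in the abstract above, can be illustrated with a small Python sketch. This is not the authors' simulator; the function names `de_bruijn_edges` and `branching_nodes` and the toy sequence are assumptions made purely for illustration. The sketch builds a de Bruijn graph from reads and lists the nodes with more than one successor, which is where a repeat forces an assembler to break a contig.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph):
    """Return nodes with more than one successor (ambiguous extensions)."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Hypothetical toy "genome" in which the word ACG occurs twice.
toy_reads = ["ACGTACGA"]
small_k = branching_nodes(de_bruijn_edges(toy_reads, 3))
large_k = branching_nodes(de_bruijn_edges(toy_reads, 5))
```

With a word size of 3 the repeated ACG creates a branch at node CG, while a word size of 5 spans the repeat and leaves no branching node. This is a simplified picture: real assemblers must also handle sequencing errors, reverse complements and uneven coverage.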

Abstract: The invention of next-generation sequencing has revolutionized almost all fields of genetics, but few have profited from it as much as the field of ancient DNA research. From its beginnings as an interesting but rather marginal discipline, ancient DNA research is now on its way into the centre of evolutionary biology. In less than a year from its invention, next-generation sequencing had increased the amount of DNA sequence data available from extinct organisms by several orders of magnitude. Ancient DNA research is now not only adding a temporal aspect to evolutionary studies and allowing for the observation of evolution in real time, it also provides important data to help understand the origins of our own species. Here we review progress that has been made in next-generation sequencing of ancient DNA over the past five years and evaluate sequencing strategies and future directions.

Abstract: Viruses, the most abundant biological entities on the planet, are capable of infecting organisms from all three branches of life, although the majority infect bacteria, where the greatest degree of cellular diversity lies. However, the characterization and assessment of viral diversity in natural environments is only beginning to become a possibility. Through the development of a novel technique for the harvest of viral DNA and the application of 454 pyrosequencing, a snapshot of the diversity of the DNA viruses harvested from a standing pond on a cattle farm has been obtained. A large number of viral genotypes (785) were present within the virome. The absolute numbers of lambdoid and Shiga toxin (Stx) encoding phages detected suggested that the depth of sequencing had enabled recovery of only ca. 8% of the total virus population, numbers that agreed within less than an order of magnitude with predictions made by rarefaction analysis. The most abundant viral genotypes in the pond were bacteriophages (93.7%). The predominant viral genotypes infecting higher life forms found in association with the farm were pathogens that cause disease in cattle and humans, e.g., members of the Herpesviridae. The techniques and analysis described here provide a fresh approach to the monitoring of viral populations in the aquatic environment, with the potential to become integral to the development of risk analysis tools for monitoring the dissemination of viral agents of animal, plant and human diseases.
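For readers unfamiliar with the rarefaction analysis mentioned in the abstract above, the following Python sketch shows one standard form of it, the expected number of distinct genotypes in a random subsample drawn without replacement. The abstract does not say which rarefaction method the authors used, so this formula, the function name `expected_genotypes` and the example counts are assumptions for illustration only.

```python
from math import comb


def expected_genotypes(counts, n):
    """Expected number of distinct genotypes in a random subsample of n reads.

    counts: reads observed per genotype in the full sample.
    Each genotype contributes the probability that at least one of its
    reads is included when n reads are drawn without replacement.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    all_draws = comb(total, n)
    return sum(1 - comb(total - c, n) / all_draws for c in counts)


# Hypothetical read counts for two equally abundant genotypes.
one_read = expected_genotypes([5, 5], 1)
all_reads = expected_genotypes([5, 5], 10)
```

Evaluating this function over increasing values of `n` gives a rarefaction curve. A curve that is still rising at the full sample size suggests that deeper sequencing would recover further genotypes.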

Abstract: Epigenetic modifications play an important role in lymphoid malignancies. This has been evidenced by the large body of work published using microarray technologies to generate methylation profiles for numerous types and subtypes of lymphoma and leukemia. These studies have shown the importance of defining the epigenome so that we can better understand the biology of lymphoma. Recent advances in DNA sequencing technology have transformed the landscape of epigenomic analysis, as we now have the ability to characterize the genome-wide distribution of chromatin modifications and DNA methylation using next-generation sequencing. To take full advantage of the throughput of next-generation sequencing, many methodologies have been developed and many more are currently in development. Choosing the appropriate methodology is fundamental to the outcome of next-generation sequencing studies. In this review, published technologies and methodologies applicable to studying the methylome are presented. In addition, progress towards defining the methylome in lymphoma is discussed, along with prospective directions that have been made possible by next-generation sequencing technology. Finally, methodologies are introduced that have not yet been published but that are being explored in the pursuit of defining the lymphoma methylome.

Abstract: DNA methylation is a major form of epigenetic modification and plays essential roles in physiology and disease processes. In the human genome, about 80% of cytosines in the 56 million CpG sites are methylated to 5-methylcytosines. The methylation pattern of DNA is highly variable among cell types and developmental stages and is influenced by disease processes and genetic factors, which poses considerable theoretical and technological challenges for its comprehensive mapping. Recently, various high-throughput approaches based on bisulfite conversion combined with next generation sequencing have been developed and applied to the genome-wide analysis of DNA methylation. These methods provide quantitative DNA methylation data at single base pair resolution with genome-wide coverage. We review these methods here and discuss some technical points of special interest, such as the sequence depth necessary to reach conclusions, the identification of clonal DNA amplification after bisulfite conversion and the detection of non-CpG methylation. Future application of these methods will greatly facilitate the profiling of DNA methylation in the genomes of different species, individuals and cell types under healthy and disease states.
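The quantitative readout of bisulfite sequencing described in the abstract above can be sketched in a few lines of Python. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the fraction of C calls at a cytosine position estimates its methylation level. The function name `methylation_level` and the `min_depth` threshold of 5 are assumptions for illustration, not values taken from the review.

```python
def methylation_level(c_reads, t_reads, min_depth=5):
    """Estimate the methylation level at one cytosine from bisulfite reads.

    c_reads: reads showing C (cytosine protected from conversion, i.e. methylated).
    t_reads: reads showing T (cytosine converted, i.e. unmethylated).
    Returns None when coverage is below min_depth (an assumed cutoff),
    because the estimate is too noisy to support a conclusion.
    """
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


# Hypothetical counts at two sites.
well_covered = methylation_level(8, 2)
poorly_covered = methylation_level(1, 1)
```

The depth cutoff reflects the point the abstract raises about the sequence depth needed to reach conclusions: with only a handful of reads, the possible estimates are too coarse to distinguish partial from full methylation.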

Abstract: miRNAs constitute a family of small RNA species that have been demonstrated to play a central role in regulating gene expression in many organisms. With the advent of next generation sequencing, new opportunities have arisen to identify and quantify miRNAs and elucidate their function. The unprecedented sequencing depth reached by next generation sequencing technologies makes it possible to get a comprehensive miRNA landscape but also poses new challenges for data analysis. We provide an overview of strategies used for miRNA sequencing, public miRNA resources, and useful methods and tools that are available for data analysis.

Abstract: In the years since the first complete human genome sequence was reported, there has been a rapid development of technologies to facilitate high-throughput sequence analysis of DNA (termed “next-generation” sequencing). These novel approaches to DNA sequencing offer the promise of complete genomic analysis at a cost feasible for routine clinical diagnostics. However, the ability to more thoroughly interrogate genomic sequence raises a number of important issues with regard to result interpretation, laboratory workflow, data storage, and ethical considerations. This review describes the current high-throughput sequencing platforms commercially available, and compares the inherent advantages and disadvantages of each. The potential applications for clinical diagnostics are considered, as well as the need for software and analysis tools to interpret the vast amount of data generated. Finally, we discuss the clinical and ethical implications of the wealth of genetic information generated by these methods. Despite the challenges, we anticipate that the evolution and refinement of high-throughput DNA sequencing technologies will catalyze a new era of personalized medicine based on individualized genomic analysis.