Thursday, November 1, 2012

Sequence Technologies: Recent Advances and Implications for the Future

1.Introduction

Sequencing in genetics is the determination of the primary structure of biopolymers such as nucleic acids and proteins. Several sequencing techniques have been developed and extensively commercialised, and companies worldwide now offer sequencing services, reagents, sequencers and analytical services to research and industry as a ‘commodity’.
Recent developments in sequencing, annotation and sequence-based technologies, supported by bioinformatics, are driving a step-wise revolution in our knowledge across the biological sciences. They are expected to bridge the gap between classical (forward) and reverse genetics approaches and to deepen our understanding of different aspects of genotype-phenotype correlation (Figure 1).

In 1902, Fischer and Hofmeister proposed that proteins are linear chains of amino acids joined by peptide bonds (Lichtenthaler, 2002); it took another half century to decipher the amino acid sequence of a protein. Most proteins fold into unique three-dimensional structures, which are as important to their function as the amino acid sequence itself. Edman degradation, peptide mass fingerprinting, protease digests and mass spectrometry can be used for protein sequencing, the last being the most advanced and widely used. However, thanks to the central dogma and the genetic code, it is much easier to infer the protein sequence when the gene encoding it is known, and, within the limits imposed by codon degeneracy, vice versa. Large amounts of proteomic data are now available for diverse organisms, allowing researchers to predict secondary structure, efficiently identify homologous proteins by sequence alignment, construct phylogenetic trees and so on.
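
The genetic-code argument can be made concrete with a short sketch. It uses the standard genetic code (NCBI translation table 1) to translate a coding sequence into protein, and counts how many codon sequences could encode a given protein, which shows why inferring the gene from the protein is ambiguous. The function names are illustrative, and the sketch deliberately ignores introns, alternative genetic codes and reading-frame detection.

```python
from math import prod

BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI table 1), listed for codons
# enumerated in T, C, A, G order at the first, second and third positions.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate a coding sequence, read in frame from its first base.

    Translation stops at the first stop codon; a trailing partial codon is ignored.
    """
    seq = coding_dna.upper().replace("U", "T")  # also accept mRNA input
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def count_possible_coding_sequences(protein: str) -> int:
    """Number of distinct codon sequences (stop codon excluded) encoding `protein`.

    Because the code is degenerate, most proteins have many possible coding sequences.
    """
    codons_per_aa = {}
    for amino_acid in CODON_TABLE.values():
        codons_per_aa[amino_acid] = codons_per_aa.get(amino_acid, 0) + 1
    return prod(codons_per_aa[aa] for aa in protein)
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")` returns `"MAIVMGR"` (translation stops at TGA), while `count_possible_coding_sequences("MAIVMGR")` returns 1152, because alanine, valine and glycine each have four codons, isoleucine three and arginine six.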

2.2.Nucleic acid sequencing

The first attempts to decipher the nucleotide sequence of nucleic acids were made on bacteriophage RNA in 1969, and eventually led to the sequencing of the first complete gene in 1972 and the first complete genome in 1976 (Adams et al., 1969; Min Jou et al., 1972; Fiers et al., 1976). The major constraints at that time were the difficulty of purification and the large size of the polymers. The milestones in classical sequencing approaches are summarised in Table 1.

Table 1 Classical Sequencing Techniques

There have been remarkable improvements in nucleic acid sequencing technologies and data-production pipelines in recent years. Today companies offer overnight DNA sequencing services with read lengths on the order of 1,000 bases. The trend in DNA sequencing costs, tracked by the National Human Genome Research Institute (NHGRI) to assess improvements in DNA sequencing technologies, is shown in Figure 2.

The sudden and profound outpacing of Moore's Law (the observation that the number of transistors on a chip doubles roughly every two years, widely used as a benchmark of rapid technological progress) in 2008 reflects the transition of sequencing centres from Sanger-based to ‘second-generation’ or ‘next-generation’ DNA sequencing technologies.
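
To make the comparison concrete, both trends can be expressed as halving times. If a cost falls as C(t) = C0 · 2^(−t/T), then an observed fall from C0 to C1 over t years implies a halving time T = t · ln 2 / ln(C0/C1), and a technology outpaces Moore's Law when T is shorter than Moore's period. The sketch assumes, as a simplification, that Moore's Law corresponds to a cost that halves every two years (the constant `MOORE_PERIOD_YEARS`); the function names and example figures are hypothetical and are not NHGRI data.

```python
import math

# Assumption: Moore's Law modelled as a cost that halves every ~2 years.
MOORE_PERIOD_YEARS = 2.0


def moore_projected_cost(start_cost: float, years: float,
                         halving_years: float = MOORE_PERIOD_YEARS) -> float:
    """Cost expected after `years` if it halves every `halving_years`."""
    return start_cost * 2 ** (-years / halving_years)


def implied_halving_time(start_cost: float, end_cost: float, years: float) -> float:
    """Halving time (in years) implied by a fall from start_cost to end_cost."""
    if not (start_cost > end_cost > 0):
        raise ValueError("expects a cost that fell to a positive value")
    return years * math.log(2) / math.log(start_cost / end_cost)


def outpaces_moore(start_cost: float, end_cost: float, years: float,
                   halving_years: float = MOORE_PERIOD_YEARS) -> bool:
    """True if the observed cost fell faster than the Moore's Law benchmark."""
    return implied_halving_time(start_cost, end_cost, years) < halving_years
```

With hypothetical figures, a cost that falls from 100 to 25 units in two years has a halving time of one year and so outpaces the benchmark, whereas Moore's Law alone would take four years to produce the same fall.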

2.3.Next Generation Sequencing (NGS)

Next-generation high-throughput sequencing (HT-NGS) technologies were developed to overcome the limitations of the earlier technologies, offering higher speed, less labour and lower cost.
The 454 FLX pyrosequencer from Roche Applied Science was the first next-generation sequencer to become commercially available, in 2004, followed by the Solexa 1G Genetic Analyzer from Illumina, the SOLiD (Supported Oligonucleotide Ligation and Detection) System from Applied Biosystems and the HeliScope from Helicos BioSciences in 2006, 2007 and 2008, respectively. These HT-NGS platforms use different detection principles, as summarised in Table 2.

Table 2 Second/next generation sequencing techniques

The last decade has seen a race among these industry giants to improve sequencing technologies. In 2006, the X Prize Foundation, in collaboration with the J. Craig Venter Science Foundation, established the Archon X Prize for Genomics, a US$10 million award to “the first Team that can build a device and use it to sequence 100 human genomes within 10 days or less, with an accuracy of no more than one error in every 1,000,000 bases sequenced, with sequences accurately covering at least 98% of the genome, and at a recurring cost of no more than $1,000 per genome” (http://www.xprize.org). The major players in the field have been upgrading their technologies and lowering their prices ever since. However, the X Prize still remains unclaimed.
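
The four quantitative thresholds in the prize wording can be checked mechanically. The sketch below is only an illustration of those quoted thresholds, not the foundation's actual judging procedure; the `SequencingRun` record, its field names and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SequencingRun:
    """Hypothetical summary of a sequencing campaign (field names are illustrative)."""
    genomes_sequenced: int
    elapsed_days: float
    bases_sequenced: int
    errors: int
    genome_fraction_covered: float  # 0.0 to 1.0
    recurring_cost_per_genome_usd: float


def unmet_criteria(run: SequencingRun) -> list:
    """Return the quoted prize thresholds this run fails (empty list = all met)."""
    failures = []
    if run.genomes_sequenced < 100 or run.elapsed_days > 10:
        failures.append("100 genomes within 10 days")
    # "no more than one error in every 1,000,000 bases": errors * 1e6 <= bases,
    # compared in integer arithmetic to avoid rounding issues.
    if run.errors * 1_000_000 > run.bases_sequenced:
        failures.append("at most 1 error per 1,000,000 bases")
    if run.genome_fraction_covered < 0.98:
        failures.append("at least 98% genome coverage")
    if run.recurring_cost_per_genome_usd > 1000:
        failures.append("at most $1,000 per genome")
    return failures
```

For a hypothetical run of 100 genomes in 9.5 days at US$1,500 per genome, with the accuracy and coverage thresholds met, `unmet_criteria` returns only `["at most $1,000 per genome"]`.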
Today, different sequencing technologies are recommended for specific needs. Roche 454 is preferred for ultra-deep sequencing and for producing reference sequences in whole-genome or transcriptome sequencing, whereas SOLiD is recommended for targeted re-sequencing. The automated Sanger method continues to be used for sequencing PCR products and plasmids, and for gap closure or finishing of genomes.
Third-generation sequencing technologies are now being reported. These ‘next-next-generation’ technologies include nanopore sequencing, in which a nanopore is immersed in a conducting fluid and a potential is applied across it, so that the ionic current through the pore changes characteristically as DNA passes through; real-time monitoring of polymerase activity through fluorescence resonance energy transfer (FRET); single-molecule real-time sequencing using zero-mode waveguides; pay-as-you-go sequencing; and direct-to-consumer whole-genome sequencing (Clarke et al., 2009; Wabuyele et al., 2003; Levene et al., 2003; Pollack, 2012; Vorhaus, 2012). Scalability, simplicity, efficiency and economy are their key features, with real-time results being the central objective.

3.Implications for the future

With recent advances in sequencing technologies, the quantity that can be sequenced per unit time and per unit cost is increasing by the day. Improvements in the speed, accuracy and availability of high-throughput DNA sequencing technologies have caused a meteoric rise in the volume of ‘omic sequences available in the public domain (Figure 3). Ongoing research is developing virtual environments to explore genomic space at the level of genes, proteins, functions and pathway networks. The large volumes of data thus generated are creating powerful resources for scientific research in all areas of life; a few cases are described below (Collins et al., 2003).

Unprecedented progress in genomics, elucidating the genetic and genomic basis of health, illness, disease risk and treatment response, applies to both biomedical research and clinical medicine. It could revolutionize healthcare through earlier diagnosis, identification of the genetic factors associated with diseases, more effective prevention, the design of targeted drugs, treatments tailored to the individual patient and the avoidance of adverse drug reactions. Genomic medicine brings humanity closer to offering a better quality of life to high-risk individuals and to finding cures for many life-threatening diseases such as cancer (Guttmacher and Collins, 2002).

3.2.Agri-genomic Revolution

The genomics revolution is at the core of plant and animal breeding. Complete genomic sequences of model plants and crops, for example Arabidopsis, rice, wheat and date palm; the availability of omic databases; and high-throughput, parallel approaches for analysing mutations allow us to understand the function of genes in terms of their relationship to the phenotype. These technologies may soon be able to decipher the relationship between genetic variation in gene sequences and phenotypic variation in traits, rather than just between a gene and a mutant phenotype. Genomic approaches may also help in studying quantitative trait variation and the molecular diversity of genes.
New approaches to quantitative trait locus (QTL) and quantitative trait nucleotide (QTN) mapping, candidate-gene approaches and whole-genome scans have emerged from these advances in sequencing. Association studies based on existing populations or germplasm collections will be a major advance for species in which experimental populations are difficult to obtain, e.g. oil palm. Future prospects lie in improved plant breeding efficiency through marker-assisted selection (MAS), identification of new trait-supporting alleles in wild germplasm, targeted mutagenesis and more (Morgante and Salamini, 2003).
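
The simplest form of such an association study is a single-marker scan: for each SNP, regress the trait on allele dosage and record the slope and the proportion of variance explained (r²). The sketch below is a naive version under stated assumptions: genotypes coded as 0/1/2 allele counts, no missing data, and no correction for population structure or kinship, which real association studies on germplasm collections need. The function name is hypothetical.

```python
from statistics import mean


def single_marker_scan(genotypes, phenotypes):
    """Naive single-marker association scan (no population-structure correction).

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP.
    phenotypes: list of trait values, one per individual.
    Returns one (slope, r_squared) pair per SNP from a least-squares
    regression of phenotype on allele dosage.
    """
    y_bar = mean(phenotypes)
    ss_y = sum((y - y_bar) ** 2 for y in phenotypes)
    results = []
    for j in range(len(genotypes[0])):
        x = [individual[j] for individual in genotypes]
        x_bar = mean(x)
        ss_x = sum((xi - x_bar) ** 2 for xi in x)
        if ss_x == 0 or ss_y == 0:
            # Monomorphic marker or constant trait: no association can be estimated.
            results.append((0.0, 0.0))
            continue
        s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotypes))
        slope = s_xy / ss_x                      # trait change per extra allele copy
        r_squared = s_xy ** 2 / (ss_x * ss_y)    # share of trait variance explained
        results.append((slope, r_squared))
    return results
```

Markers with the largest r² would be the first candidates for follow-up, but in structured germplasm collections such naive hits often reflect ancestry rather than causal variants, which is why mixed-model methods are normally used instead.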
In animal breeding, genome-wide SNP panels are now available for an increasing number of livestock species, enabling breeders to determine a genomic estimated breeding value (GEBV) cost-effectively and accurately, which is making traditional approaches obsolete and revolutionizing the global livestock industry.
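
A GEBV is commonly computed as the sum, over all SNPs in the panel, of the animal's allele count (0, 1 or 2) multiplied by the estimated effect of that allele: GEBV_i = Σ_j x_ij · ĝ_j. The sketch assumes the marker effects have already been estimated from a genotyped and phenotyped reference population (for example by ridge regression, not shown here); the animal IDs and effect values are hypothetical.

```python
def gebv(genotype, marker_effects):
    """Genomic estimated breeding value: sum of allele counts times SNP effects."""
    if len(genotype) != len(marker_effects):
        raise ValueError("genotype and marker_effects must cover the same SNPs")
    if any(count not in (0, 1, 2) for count in genotype):
        raise ValueError("allele counts must be 0, 1 or 2")
    return sum(count * effect for count, effect in zip(genotype, marker_effects))


def rank_candidates(candidates, marker_effects):
    """Rank selection candidates (dict: animal ID -> genotype) by GEBV, best first."""
    scores = [(animal, gebv(geno, marker_effects)) for animal, geno in candidates.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With hypothetical effects `[0.5, -1.0, 0.25]` and candidates `{"A": [0, 1, 2], "B": [2, 2, 0], "C": [2, 0, 2]}`, `rank_candidates` returns `[("C", 1.5), ("A", -0.5), ("B", -1.0)]`. Because only a genotype is needed, candidates can be ranked at birth, before they or their offspring have any performance records.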

3.3.Ecology and Evolution Studies

Comparative genome analysis in a phylogenetic context can provide the most meaningful insights into both germplasm characterization and the processes of evolution. Genomic and metagenomic sequencing techniques are beginning to transform the study of ecology and evolution, starting with our understanding of Bacteria and Archaea. NGS technologies have the potential to bring the genomics revolution to whole populations, and to endangered and ecologically and evolutionarily important species (Hudson, 2008; Shokralla et al., 2012).

3.4.Practical Difficulties

Biology is in the middle of a paradigm shift towards becoming a fully data-driven science. Analysing the growing volume of gene expression data from the various post-genomic technologies will be a challenge, both for generating the necessary annotations and for providing large-scale computational support.

3.5.Public Concerns

Sequencing technologies have improved our understanding of the genetic makeup of living organisms. However, many aspects of public policy must be addressed before such advances can be put into practice. Concerns regarding privacy, discrimination, biological terrorism, equitable access, intellectual property, validation of tests and products, ethics, economics and public awareness are only a few of them.

4.Conclusion

NGS technologies provide practical, massively parallel sequencing at lower cost and without the need for large, automated facilities, making genome and transcriptome sequencing and re-sequencing possible for small and large endeavours in research and practice (Morozova and Marra, 2008). Still, many ethical, legal and social issues surround access to genetic information.
The ramifications of rapid advances in sequencing technology for our daily lives will surely be profound and lasting, even if they are unpredictable; as Eric Lander[1] reflected, “it was easier to predict 10 years ago what we will be doing today than to predict today what is going to be possible in a few years’ time”.