A single instance of an experimentally obtained subsequence representing a (possibly erroneous, likely biased) subsampling of a sequence space of (generally larger) target nucleotide molecules, which may also have some computed associated measure of quality.

SANGER SEQUENCING

2.1 Next Generation Sequencing and Sequence Assembly Algorithms

Contigs and Sanger Automated Sequencing

Figure labels: Chromosome; Large-insert library; Large-insert clones; Shotgun cloning; Sequencing; Sequencing reads from subclones; Sequence reads (CAGACTACCGTTAGACTT); Contigs

Dideoxy chain-termination (“Sanger”) Method

NGS Trend

PubMed was “searched in two-year increments for key words and the number of hits plotted over time.”

Early genomes were sequenced on the basis of decomposition of genomes and chromosomes into tractable sizes of (subcloned) DNA (~100 kb), ordered and oriented by detailed genetic and physical maps.

Clone-by-clone sequence assembly was a simpler computational problem, given the relatively small size and reduced complexity of the sequence target (subclone) and the relatively long Sanger reads, but it was extremely costly due to the experimental overhead of the DNA decomposition.

“Classical” Sequence Assembly

Read, edit & trim DNA chromatograms

Remove overlaps & ambiguous calls

Read in all sequence files (10-10,000)

Reverse complement all sequences (doubles # of sequences to align)

Remove vector sequences (vector trim)

Remove regions of low complexity

Perform multiple sequence alignment & merge

Fill (“finish”) gaps using a variety of experimental procedures.
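Two of the steps above (reverse complementing every read and trimming vector sequence) can be sketched in a few lines of Python. This is only an illustration: the function names and the exact-prefix vector match are assumptions made for the example, and real pipelines trim vector by alignment rather than by exact string match.

```python
from typing import List

# Translation table for DNA bases (N = ambiguous call).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def with_reverse_complements(reads: List[str]) -> List[str]:
    """Return each read followed by its reverse complement.

    This doubles the number of sequences to align, as noted in the step list.
    """
    doubled = []
    for read in reads:
        doubled.append(read)
        doubled.append(reverse_complement(read))
    return doubled


def trim_vector_prefix(read: str, vector: str) -> str:
    """Toy vector trim (assumption): drop `vector` only if the read starts with it exactly."""
    if vector and read.startswith(vector):
        return read[len(vector):]
    return read


if __name__ == "__main__":
    reads = ["CAGACTACCGTTAGACTT"]
    for seq in with_reverse_complements(reads):
        print(seq)
```

Applying `reverse_complement` twice returns the original read, which is a quick sanity check for the translation table.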

Contig Alignment Process

ATCGATGCGTAGCAGACTACCGTTACGATGCCTT…
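As a rough sketch of how overlapping reads are merged into a contig, the Python below joins two reads on their longest exact suffix/prefix overlap. The two reads are hypothetical fragments cut from the start of the sequence shown above, and `min_overlap` is an assumed parameter. Real assemblers tolerate mismatches and use quality values; this toy version does not.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_pair(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two reads on an exact overlap, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


if __name__ == "__main__":
    # Hypothetical reads taken from the beginning of the example contig.
    read_a = "ATCGATGCGTAGCAGACT"
    read_b = "GCAGACTACCGTTACG"
    print(overlap_length(read_a, read_b))  # overlap on "GCAGACT"
    print(merge_pair(read_a, read_b))
```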

Sanger Automated Sequence Assembly Software

Phred: base calling program that performs detailed statistical analysis on Sanger chromatogram (“trace”) files
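The quality values associated with Phred are conventionally defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The short Python sketch below converts in both directions; the function names are chosen for this example and are not part of Phred itself.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10*log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

Under this definition, Q20 corresponds to roughly 1 expected error in 100 base calls and Q30 to roughly 1 in 1,000.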

Target genomes are generally full of subtly divergent repetitive content of diverse nature, often longer than NGS read lengths, so it is not always easy to tell a sequencing error from a sequence variant.
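A toy illustration of the repeat problem, in Python: a read that falls entirely inside a repeat matches the target at more than one position, while a read long enough to reach unique flanking sequence matches only once. The genome string and both reads are invented for this example, and matching is exact, so it ignores the "subtly divergent" case where repeat copies differ by a few bases.

```python
from typing import List


def find_occurrences(genome: str, read: str) -> List[int]:
    """Return every 0-based start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # step by 1 to allow overlapping hits
    return positions


# Made-up target containing the repeat "ACGTACG" twice.
GENOME = "TTACGTACGGAACGTACGCC"

if __name__ == "__main__":
    short_read = "CGTAC"      # lies entirely inside the repeat
    long_read = "TACGTACGG"   # spans the repeat plus unique flanking bases
    print(find_occurrences(GENOME, short_read))
    print(find_occurrences(GENOME, long_read))
```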