Abstract

The completion of the human genome draft has taken several years and is only the beginning of a period in which large amounts of DNA and RNA sequence information will be required from many individuals and species. Conventional sequencing technology has limitations in cost, speed, and sensitivity, with the result that the demand for sequence information far outstrips current capacity. There have been several proposals to address these issues by developing the ability to sequence single DNA molecules, but none have been experimentally demonstrated. Here we report the use of DNA polymerase to obtain sequence information from single DNA molecules by using fluorescence microscopy. We monitored repeated incorporation of fluorescently labeled nucleotides into individual DNA strands with single base resolution, allowing the determination of sequence fingerprints up to 5 bp in length. These experiments show that one can study the activity of DNA polymerase at the single molecule level with single base resolution and a high degree of parallelization, thus providing the foundation for a practical single molecule sequencing technology.

The Sanger method of DNA sequencing (1) and subsequent developments in automation (2) and computation (3) revolutionized the world of biological sciences and eventually led to the sequencing of the consensus human genome (4, 5). The successes of this and other genome projects have only whetted the appetite of the scientific community, and many applications of DNA sequencing have been proposed that will require cheaper, faster, or more sensitive sequencing technology than conventional methods currently provide. After the determination of the consensus human genome, there is a desire to sequence many individual human genomes to provide high-resolution genotypes that can be used to determine the complex relationships among disease, pharmaceutical efficacy, and genetic variability (6–8). Similarly, aggressive technological innovation is required for the field of comparative genomics to reach its full potential (4). Finally, mRNA sequencing is valuable to determine exon splicing patterns (9) and as a tool to discover gene function from context-specific expression data (10).

There have been many proposals to develop new sequencing technologies based on single molecule measurements, generally either by observing the interaction of particular proteins with DNA (6, 11–13) or by using ultra high-resolution scanned probe microscopy (14). Although none of these methods has been demonstrated experimentally, they are interesting because they promise high sensitivity, low cost, and in some cases a high degree of parallelization (15). Unlike conventional technology, their speed and read length would not be inherently limited by the resolving power of electrophoretic separation. Single molecule sensitivity might permit direct sequencing of mRNA from rare cell populations or perhaps even individual cells.

A major obstacle in the development of single molecule sequencing schemes is that DNA has an extraordinarily high linear data density, with a pitch of only 3.4 Å between successive bases. Scanned probe microscopes have not yet been able to demonstrate simultaneously the resolution and chemical specificity needed to resolve individual bases (14). Other proposals turn to nature for inspiration and seek to combine optical techniques with enzymes that have been fine-tuned by evolution to operate as machines that assemble and disassemble DNA with inherent single-base resolution (6, 11, 12). Although there have been single molecule studies of DNA polymerase (16, 17), RNA polymerase (18, 19), and exonuclease (20, 21), measuring the activity of these enzymes with single-base resolution has been an elusive goal. We took advantage of the exquisite discrimination and fidelity of DNA polymerase to image sequence information in a single DNA template as its complementary strand is synthesized. Angstrom spatial resolution is not necessary because the nucleotides are inserted sequentially; only the time resolution to discriminate successive incorporations is required. After each successful incorporation event, a fluorescent signal is measured and then nulled by photobleaching. This method lends itself to massive parallelism, and in the experiments described here we were able to monitor hundreds of templates simultaneously.

Observations of single molecule fluorescence were made by using a conventional microscope equipped with total internal reflection (22) illumination, which reduces background fluorescence (Fig. 1). The surface of a quartz slide was chemically treated to specifically anchor DNA templates while preventing nonspecific binding of free nucleotides, and a plastic flow cell was attached to the surface to exchange solutions. DNA template oligonucleotides were hybridized to a fluorescently labeled primer and bound to the surface via streptavidin and biotin with a surface density low enough to resolve single molecules. The primed templates were detected through their fluorescent tags, their locations were recorded for future reference, and the tags were photobleached. Labeled nucleotide triphosphates and DNA polymerase enzyme were then washed in and out of the flow cell while the known locations of the DNA templates were monitored for the appearance of fluorescence. With this technique we show that DNA polymerase is active on surface-immobilized DNA templates and can incorporate nucleotides with high fidelity.

Fig. 1. The experimental system. (a) Schematic drawing of the optical setup. The green laser illuminates the surface in total internal reflection mode while the red laser is blocked. Both Cy3 and Cy5 fluorescence spectra are recorded independently by the intensified charge-coupled device. (b) Single-molecule images obtained by the system. The two images show colocalization of Cy3- and Cy5-labeled nucleotides in the same template. (Scale bar = 10 μm.) (c) Schematic of primed DNA template attached to the surface of a microscope slide via streptavidin-biotin.

A confounding factor in previous attempts to sequence single DNA molecules with fluorescence microscopy has been an inability to control background fluorescence and fluorescent impurities (20, 23). In this work we used a combination of evanescent wave microscopy and single-pair fluorescence resonance energy transfer (spFRET; refs. 24–26) to reject unwanted noise. The donor fluorophore excites acceptors only within the Förster radius, thus effectively creating an extremely high-resolution near-field source. Because the Förster radius (27) of this fluorophore pair is ≈5 nm, the spatial resolution of this method exceeds the diffraction limit by a factor of 50 and conventional near-field microscopy by an order of magnitude. With this spFRET method we were able to obtain single molecule sequence fingerprints up to 5 bp in length.
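
The way the donor acts as a near-field source can be illustrated with the standard Förster relation, E = 1/(1 + (r/R0)^6). The sketch below is illustrative only and is not part of the reported analysis. It assumes R0 = 5 nm as quoted above, treats the DNA as a rigid rod with a rise of 0.34 nm per base pair, and ignores dye linkers and helical geometry. The function name is hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """Standard Forster relation: E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


RISE_NM_PER_BP = 0.34  # assumed rise per base pair (rigid-rod approximation)

# Transfer efficiency for an acceptor placed n_bp bases away from the donor.
for n_bp in (3, 6, 9, 15, 30):
    r = n_bp * RISE_NM_PER_BP
    print(f"{n_bp:2d} bp  r = {r:5.2f} nm  E = {fret_efficiency(r):.3f}")
```

Under these assumptions the efficiency is 0.5 at r = R0 and falls off steeply beyond it, which is why labeled nucleotides stuck to the surface more than a few nanometers from a donor contribute little acceptor signal.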

Experimental Procedures

Detection and Data Analysis.

The optical setup is shown in Fig. 1. An upright microscope (BH-2, Olympus, Melville, NY) equipped with total internal reflection (TIR) illumination served as a platform for the experiments. Two laser beams, 635 nm (Coherent, Santa Clara, CA) and 532 nm (Brimrose, Baltimore), with nominal powers of 8 and 10 mW, respectively, were circularly polarized by quarter-wave plates and underwent TIR in a dove prism (Edmund Scientific, Barrington, NJ). The prism was optically coupled to the fused silica bottom (Esco, Oak Ridge, NJ) of a hybridization chamber (Sigma) so that evanescent waves illuminated up to 150 nm above the surface of the fused silica. An objective (DPlanApo, 100 UV 1.3oil, Olympus) collected the fluorescence signal through the top plastic cover of the chamber, which was deflected by the objective to ≈40 μm from the silica surface. An image splitter (Optical Insights, Santa Fe, NM) directed the light through two bandpass filters (630dcxr, HQ585/80, HQ690/60; Chroma Technology, Brattleboro, VT) to an intensified charge-coupled device (I-PentaMAX; Roper Scientific, Trenton, NJ), which recorded adjacent images of a 120- × 60-μm section of the surface in two colors. Typically, eight exposures of 0.5 sec each were taken of each field of view to compensate for possible intermittency in the fluorophore emission. Custom IDL software (28) was modified to analyze the locations and intensities of fluorescent objects in the intensified charge-coupled device images. We inspected the resulting traces to determine incorporation information and hence the template sequences.

Sample Preparation.

The target DNA was composed of a DNA primer, [Cy3–5′-tagaacctccgtgt-3′], which was annealed to template 1 [3′-atcttggaggcacaATCATCGTCATCGTCATCG-(TCATCG)7-5′-biotin], template 2 [3′-atcttggaggcacaATCGTCATCATCGTCGTCA-(TCATCG)7-5′-biotin], or template 3 [3′-atcttggaggcacaCTACTGACT-(ACTGACT)11-5′-biotin] (all oligonucleotides were synthesized by Operon Technologies, Alameda, CA). Surface chemistry based on polyelectrolytes (29, 30) and biotin-streptavidin bonding was used to anchor the DNA molecules to the fused silica surface of the hybridization chamber and to minimize nonspecific binding of the nucleotides to the surface. Slides were sonicated in 2% MICRO-90 soap (Cole–Parmer, Vernon Hills, IL) for 20 min and then cleaned by immersion in boiling RCA solution (6:4:1 high-purity H2O/30% NH4OH/30% H2O2) for 1 h (31). They were then immersed alternately in polyallylamine (positively charged) and polyacrylic acid (negatively charged; both from Aldrich) at 2 mg/ml and pH 8 for 10 min each and washed intensively with distilled water in between. The carboxyl groups of the last polyacrylic acid layer served to prevent the negatively charged labeled nucleotide from binding to the surface of the sample. In addition, these functional groups were used for further attachment of a layer of biotin. The slides were incubated with 5 mM biotin-amine reagent (Biotin-EZ-Link, Pierce) for 10 min in the presence of 1-[3-(dimethylamino)propyl]-3-ethylcarbodiimide hydrochloride (EDC, Sigma) in MES buffer, followed by incubation with Streptavidin Plus (Prozyme, San Leandro, CA) at 0.1 mg/ml for 15 min in Tris buffer. The biotinylated DNA templates were deposited onto the streptavidin-coated chamber surface at 10 pM for 10 min in Tris buffer that contained 100 mM MgCl2. For incorporations, the reaction solution contained Klenow fragment Exo-minus polymerase (New England Biolabs) at 10 nM (100 units/ml) in the reaction buffer (EcoPol buffer, New England Biolabs) and a nucleotide triphosphate. dATP, dGTP, dTTP, and dCTP from Roche Diagnostics, dCTP-Cy3, dUTP-Cy3, and dUTP-Cy5 from Amersham Pharmacia, dCTP-Cy5, dATP-Cy3, dGTP-Cy3, dATP-Cy5, and dGTP-Cy5 from Perkin–Elmer, and dCTP-Alexa647 from Molecular Probes were used at 0.2 μM for the Cy3-labeled and 0.5 μM for the Cy5-labeled and unlabeled nucleotides. Incubation times were 6–15 min, with the longer incubation times at the later stages of the experiment. To reduce bleaching of the fluorescent dyes, an oxygen scavenging system (27) was used during all green illumination periods, with the exception of the bleaching of the primer tag.

Reagent Exchange Sequence for Single-Pair FRET Sequencing.

The positions of the anchored Cy3-primed DNA were recorded, and then the tags were bleached by the green laser illumination (Fig. 3a1). dUTP-Cy3 and polymerase were introduced and washed out. An image of the surface was then analyzed for incorporated U-Cy3 (Fig. 3a2). If there were none, the process was repeated with dCTP-Cy3. If there was still no incorporation, incubation was repeated with unlabeled dATP and dGTP and then cycled again from the beginning until the first fluorescently labeled base had been incorporated. The Cy3 dye of this incorporated nucleotide was kept unbleached. Next, a mix of dATP, dGTP, and polymerase was incubated to ensure that the primer was extended until the next A or G of the template. At this point we switched to Cy5-labeled nucleotides, except for one successful reaction in which the label was Alexa-647, a Cy5 analogue (Molecular Probes). The incorporation and observation process was repeated, except that each observation with green illumination was followed by an observation with red illumination to photobleach any incorporated Cy5 fluorophores. After bleaching the acceptor, we incubated the mix of dATP, dGTP, and polymerase again, washed it out, and observed the sample briefly with green illumination to record the recovery of the donor (Fig. 3a4). The alternation between incorporation reactions with U-Cy5, C-Cy5, and a G and A spacer was repeated several times.
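
The logic of this reagent cycle can be sketched as a small simulation of a single template. This is a simplified model under stated assumptions, not the procedure's actual control software: it ignores the donor/acceptor distinction and photobleaching steps, allows at most one labeled incorporation per incubation, assumes the unlabeled fill step always goes to completion, and uses a hypothetical `step_yield` parameter for the chance that a matching labeled nucleotide is incorporated.

```python
import random


def simulate_read(template_3to5, cycles, step_yield=0.5, seed=0):
    """Return the A/G fingerprint read from one template (hypothetical model).

    template_3to5: template bases downstream of the primer, written 3'->5'.
    step_yield: assumed chance that a matching labeled nucleotide is
                incorporated during one incubation.
    """
    rng = random.Random(seed)
    pos = 0      # index of the next unpaired template base
    read = []

    def fill_with_unlabeled_a_and_g():
        # Unlabeled dATP and dGTP pair with template T and C, so the primer
        # advances until the next template A or G (or the template end).
        nonlocal pos
        while pos < len(template_3to5) and template_3to5[pos] in "TC":
            pos += 1

    fill_with_unlabeled_a_and_g()
    for _ in range(cycles):
        # Labeled U reads a template A; labeled C reads a template G.
        for template_base in ("A", "G"):
            matches = pos < len(template_3to5) and template_3to5[pos] == template_base
            if matches and rng.random() < step_yield:
                read.append(template_base)
                pos += 1
            fill_with_unlabeled_a_and_g()
    return "".join(read)


print(simulate_read("ATCATCGTCATCG", cycles=3, step_yield=1.0))  # AAGAG
print(simulate_read("ATCATCGTCATCG", cycles=6, step_yield=0.5, seed=1))
```

In this model a missed incorporation only delays the read: whatever the yield, the output is always a prefix of the true order of A's and G's in the template.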

Results and Discussion

A series of experiments was performed to prove that the DNA polymerase enzyme can operate with high fidelity and discrimination when using the modified nucleotide triphosphates and anchored DNA templates. DNA polymerase and a mismatched species of labeled nucleotide were incubated in the flow cell for 5 min and washed out. The surface was imaged and the positions of the fluorescent molecules that appeared on the surface were compared with the positions of the DNA molecules that were detected beforehand (Fig. 2). A second reaction was then performed with the correct labeled nucleotide triphosphate. When the images are superimposed, a high correlation between the primer position and the nucleotide position was found for the correct match (i.e., when dUTP-Cy3 matches the available template base, A) and a low correlation for mismatched bases (dCTP-Cy3 does not match A). The pairwise relationships between the molecules can be summarized in a correlogram in which the positions of detected molecules in the two fields of view are cross-correlated with each other (Fig. 2a6). Variations of this experiment were successfully completed >20 times, using each of the four labeled bases as positive and negative controls, and DNA templates of differing sequences. There is no significant correlation in the absence of either the polymerase enzyme or the correct complementary nucleotide triphosphate. Thus, specific template-dependent incorporation of labeled nucleotide into the anchored DNA molecules is catalyzed by the DNA polymerase and can be detected at the single molecule level. We have also shown that multiple incorporations of fluorophores in a single template can be quantitated by their intensity and stepwise photobleaching (data not shown). This ability can be used to measure consecutive repetitions of a particular base in a sequence.
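
One way to build such a correlogram is to tally the displacement between every template position and every nucleotide position, so that colocalized molecules pile up at zero displacement. The sketch below is a minimal illustration, not the modified IDL software used in this work. The function name, the integer-pixel binning, and the coordinates in the example are all invented for illustration.

```python
from collections import Counter


def correlogram(templates, nucleotides, max_shift=3):
    """Count template/nucleotide pairs at each integer pixel displacement."""
    counts = Counter()
    for tx, ty in templates:
        for nx, ny in nucleotides:
            dx, dy = round(nx - tx), round(ny - ty)
            if abs(dx) <= max_shift and abs(dy) <= max_shift:
                counts[(dx, dy)] += 1
    return counts


# Invented coordinates (pixels), not measured data.
templates = [(10, 10), (40, 25), (70, 60)]
nucleotides = [(10.2, 9.9), (40.1, 25.3), (55, 5)]

peak = correlogram(templates, nucleotides)
print(peak[(0, 0)], "pairs at zero displacement")
```

A sharp peak at (0, 0) corresponds to a correct match, whereas nonspecifically bound nucleotides produce no preferred displacement.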

Fig. 2. The polymerase is active on anchored single DNA templates. (a) Correlation between the locations of the DNA templates and the labeled nucleotides. (a1) Image of the Cy3-labeled template locations. (Scale bar = 10 μm.) (a2) Positions of each molecule in a1, found by software. (a3) Image of the surface after the template fluorophores are photobleached and an incorporation reaction is performed. (a4) Positions of the molecules in a3, found by software. (a5) Overlay of the template positions with the labeled nucleotide positions. (a6) There is a high degree of correlation between template and nucleotide positions. (b) The polymerase maintains selectivity and fidelity in these experiments. In consecutive incorporations (b1), the polymerase correctly refused to incorporate C-Cy3. (b2) The next reaction correctly incorporated U-Cy3. (b3) After filling the gap with unlabeled A and G and by using FRET from the first incorporation, the polymerase correctly refused to incorporate U-Cy5. (b4) The next reaction correctly incorporated C-Cy5.

While attempting to iterate this scheme to determine the sequence of the DNA templates, we discovered that multiple washings with the labeled nucleotides led to increasing nonspecific binding of unincorporated nucleotides, rendering interpretation of the experiment ambiguous beyond two or three incorporations. We therefore suppressed this background noise by the use of single-pair FRET (24, 25) as a highly localized excitation source to monitor incorporation of nucleotides in the templates. The first labeled nucleotide to be incorporated contained a donor fluorophore (Cy3), and successive nucleotides were labeled with an acceptor fluorophore (Cy5) (Fig. 2b). The acceptor fluorescence was detected by exciting the donor, and the acceptors thus fluoresced only if they were in the vicinity of a donor (Fig. 3a3). The noise from a nonspecific attachment of labeled nucleotides to the surface became very small because the effective illumination region was only a few nanometers. After each incubation and FRET signal detection, the surface was illuminated with a red laser that bleached the acceptor but left the donor unharmed (Fig. 3a4).

Fig. 3. Sequencing single molecules with FRET. (a) Schematic illustrating extension of the template through the first few steps of sequencing. (b) Intensity trace from a single template molecule through the entire session. The green and red lines represent the intensity of the Cy3 and Cy5 channels, respectively. The label at each column indicates the last nucleotide to be incubated, and successful incorporation events are marked with an arrow. (c) FRET efficiency as a function of the experimental epoch.

This method was used to determine the order of appearance of A's and G's in a template sequence by alternating between incorporation of labeled U and C while filling the gaps with unlabeled A and G. A trace of the emission intensities of an individual DNA molecule as a function of time is shown in Fig. 3b. A simultaneous drop in the donor emission and rise in the acceptor emission indicates a FRET event and, hence, an incorporation. These events can be assessed in Fig. 3c, where the FRET efficiency (25), Ia/(Ia + Id), is calculated; Id and Ia are the average intensities of the donor (Cy3) and the acceptor (Cy5), respectively. The FRET efficiency has a higher signal-to-noise ratio than quantitation of either channel alone because it combines information from both fluorophores while simultaneously normalizing the relative intensities. The particular trace shown reads out the correct sequence fingerprint for template 1 (AAGAGA). Note the skip after the first G. This demonstrates that the sequencing scheme is asynchronous, an important feature that distinguishes sequencing at the single molecule level from the ensemble averaging inherent in macroscopic schemes. Thus, when an incorporation reaction is not carried to completion on a particular template molecule, it can be successfully completed in a later cycle without producing false information or interfering with data from other DNA templates in the field of view.
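
The ratio Ia/(Ia + Id) and its use for calling incorporation events can be sketched as follows. This is an illustration only: the traces in this work were inspected by eye, so the fixed threshold, the function names, and the intensity values in the example are assumptions invented for the sketch, not measured data or the authors' procedure.

```python
def fret_ratio(i_acceptor, i_donor):
    """FRET efficiency estimate Ia / (Ia + Id); returns 0.0 if both are zero."""
    total = i_acceptor + i_donor
    return i_acceptor / total if total > 0 else 0.0


def call_incorporations(trace, threshold=0.5):
    """Return the labels of epochs whose FRET ratio exceeds the threshold.

    trace: list of (label, donor_intensity, acceptor_intensity) per epoch.
    threshold: assumed cutoff, chosen here only for illustration.
    """
    return [label for label, i_d, i_a in trace if fret_ratio(i_a, i_d) > threshold]


# Invented intensities (arbitrary units), one tuple per incubation epoch.
trace = [("U", 100, 5), ("C", 20, 80), ("U", 95, 8), ("C", 90, 10), ("U", 25, 75)]
print(call_incorporations(trace))
```

Because the ratio is normalized by the total signal, a fluctuation that scales both channels together leaves it unchanged.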

The sequence fingerprinting experiment was also performed with an independent template DNA sequence (template 2). In pooled data from experiments on templates 1 and 2, a total of 40 molecules reached 4 incorporations. Comparing the measured sequences to the set of all possible 4-mer sequences shows that the correct sequences for templates 1 and 2 can be discriminated with a 97% confidence level (Fig. 4). Moreover, these data show that a priori sequencing in which the template sequence is not known can be accomplished with an error rate of 0.04. For template 1, seven traces continued to the fifth incorporation and one continued to the sixth, all with the correct sequence. Taken together, these data show that the incorporation fidelity of the DNA polymerase is high enough for reliable readout of the template sequence and prove the principle of single molecule DNA sequencing by polymerase extension.

Fig. 4. Histogram of sequence space for 4-mers composed of A and G. All traces that reached at least four incorporations are included. (a) Results for template 1 (actual sequence fingerprint: AAGA). (b) Results for template 2 (actual sequence fingerprint: AGAA).
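
The expected fingerprints and the size of the 4-mer sequence space follow directly from the template sequences given in Sample Preparation. The short sketch below derives them; the template strings are the uppercase portions downstream of the primer, written 3′ to 5′, and the function name is hypothetical.

```python
from itertools import product

# Template regions downstream of the primer, written 3'->5'.
TEMPLATE_1 = "ATCATCGTCATCGTCATCG" + "TCATCG" * 7
TEMPLATE_2 = "ATCGTCATCATCGTCGTCA" + "TCATCG" * 7


def fingerprint(template_3to5, length):
    """Order of A's and G's in the template, truncated to `length` bases."""
    return "".join(base for base in template_3to5 if base in "AG")[:length]


# All possible 4-mers over the two-letter alphabet {A, G}.
all_4mers = ["".join(p) for p in product("AG", repeat=4)]

print(len(all_4mers))              # 16
print(fingerprint(TEMPLATE_1, 4))  # AAGA
print(fingerprint(TEMPLATE_2, 4))  # AGAA
print(fingerprint(TEMPLATE_1, 6))  # AAGAGA
```

The two templates occupy different bins among the 16 possible 4-mers, which is what allows the histograms to discriminate between them.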

What are the prospects for turning this method into a practical DNA sequencing technology? The experiments are already highly parallel and the reagent exchanges are straightforward to automate, either with conventional or microfluidic plumbing. The inherent limitation of FRET in readout length (5 nm, and thus 15 bp) may be solved either by incorporating a new donor-labeled base at regular intervals or by placing the donor on the polymerase (32). The read length of successive incorporations is limited by the stepwise incorporation yield of the labeled nucleotides, which was ≈50% for the experiments described in Figs. 3 and 4. We believe this yield is largely determined by the interaction of the DNA polymerase with the modified nucleotide triphosphates and is one of the reasons why we chose to measure sequence fingerprints of only two of the four nucleotides. This interpretation is supported by sequence data taken on template 3, which was designed so that labeled nucleotides (Fig. 5) would be incorporated in adjacent positions. The yield was reduced to ≈10% for the second incorporation. Others have shown that nucleotide analogs with longer linker arms can be incorporated into DNA templates with significantly higher yields (33). It is also possible to use a more promiscuous polymerase (34, 35) or nucleotide analogues whose dye can be chemically removed at each step. Because some of these ideas have already been used to synthesize long DNA molecules with every base replaced with nucleotide analogues (20, 34–36), we believe that there are no fundamental or practical obstacles to extending our results to create a highly parallel and sensitive single molecule sequencing technology.
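
The two read-length limits discussed above can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a fit to the data: it assumes a rise of 0.34 nm per base pair, treats each incorporation attempt as an independent trial with a fixed yield, and uses hypothetical function names.

```python
RISE_NM_PER_BP = 0.34  # assumed rise per base pair


def fret_limited_read_bp(forster_radius_nm=5.0):
    """Bases spanned by one Forster radius along a rigid DNA rod."""
    return forster_radius_nm / RISE_NM_PER_BP


def fraction_reaching(n, step_yield=0.5):
    """Fraction of templates with n successes in n consecutive attempts."""
    return step_yield ** n


def expected_attempts(n, step_yield=0.5):
    """Mean number of attempts needed for n incorporations (independent trials)."""
    return n / step_yield


print(f"FRET range: about {fret_limited_read_bp():.1f} bp")
for n in (1, 2, 4, 6):
    print(f"n = {n}: fraction with no missed step = {fraction_reaching(n):.4f}, "
          f"mean attempts = {expected_attempts(n):.0f}")
```

Under this model a 50% stepwise yield leaves only a small fraction of templates with an uninterrupted run after a handful of steps, although the asynchronous scheme lets the remaining templates catch up in later cycles at the cost of more attempts.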

There are several practical implications that result from this work. For de novo genome sequencing, this method allows a high degree of parallelization and sparing use of reagents. For example, in the work described here a few hundred templates were anchored in a 100-μm diameter field of view. With an automated scanning stage, one can extrapolate to 12 million templates simultaneously sequenced in a 25-mm square, using only a few microliters of reagents. With so many templates the ability to asynchronously sequence, and thus not having to drive every enzymatic reaction to completion in each cycle, becomes a crucial advantage that will allow one to choose incorporation times that minimize unwanted side reactions. The capability to sequence many single molecules in parallel means that it should be possible to make direct measurements of gene expression from single cells. If the mRNA from a cell is bound to the glass surface, it can either be sequenced directly with reverse transcriptase or sequenced with DNA polymerase after a DNA strand is synthesized. In many cases it will only be necessary to sequence 15–20 nucleotides to get a unique gene fingerprint that can be used in conjunction with a genome sequence to determine the identity of the expressed gene. Alternatively, sufficiently long sequence fingerprints of two of the four bases may also be used to uniquely identify genes. Finally, it should be possible to use this assay system to study basic biochemical questions concerning DNA polymerase activity in general and fidelity in particular.
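
The extrapolation to 12 million templates can be checked with simple area arithmetic. In the sketch below the field of view is taken as a 100-μm diameter circle tiled without overlap or gaps over a 25-mm square. The value of 150 templates per field is an inference chosen because it reproduces the quoted figure; the text itself says only "a few hundred". The function name is hypothetical.

```python
import math


def templates_on_chip(per_field, field_diameter_um=100.0, chip_side_mm=25.0):
    """Templates on a square chip, scaling one circular field by area."""
    field_area_um2 = math.pi * (field_diameter_um / 2.0) ** 2
    chip_area_um2 = (chip_side_mm * 1000.0) ** 2
    n_fields = chip_area_um2 / field_area_um2
    return per_field * n_fields


# 150 templates per field is an assumed value, not stated in the text.
print(f"{templates_on_chip(150):.3g} templates")
```

With these assumptions the chip holds roughly 80,000 fields of view, so the total scales linearly with whatever per-field density the surface chemistry can support while keeping single molecules resolvable.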

Acknowledgments

We thank Henry Lester and Marc Unger for fruitful discussions. Partial financial support was provided by the Lester Deutsch fellowship, National Institutes of Health Grant HG01641-01, the Packard Foundation, and the Burroughs–Wellcome Foundation.

Footnotes

* To whom correspondence should be addressed. E-mail: quake{at}caltech.edu.
