Reading DNA with silicon—or via the glow of fireflies

In the latest installment of our series on DNA sequencing, we discuss …

Many of the modern DNA sequencing techniques involve partially overlapping methods, so our past articles have provided a nice foundation. In this case, the tethered PCR approach that we mentioned in covering SOLiD sequencing, in which beads wound up covered with multiple copies of an identical sequence, is also used by an otherwise unrelated approach, employed by Roche's 454 sequencing machines. The chemistry of the 454 sequencing reaction is radically different, and goes by the catchy name pyrosequencing.

Pyrosequencing is distinct from the rest of the techniques we'll discuss, which focus on attaching labels to the DNA bases that are added to a growing DNA strand. Instead, the pyrosequencing chemistry focuses on something that's an afterthought in most sequencing methods: the phosphate group that gets kicked off each time a DNA base is added. In our initial diagram for DNA polymerization, we showed what happens when a new triphosphate base is added to a growing DNA strand.

Adding a base to DNA results in the release of a pyrophosphate molecule.

Left off that image was a byproduct of this reaction: two linked phosphates (called a pyrophosphate), which are cleaved off as the sugar-phosphate-sugar bond forms. As it turns out, it's possible to measure the amount of pyrophosphate produced, and therefore the progress of DNA production. The pyrophosphate group is what gives pyrosequencing its catchy name (disappointed readers who were expecting flames to be involved are encouraged to read on regardless).

Building DNA one base at a time

The traditional Sanger sequencing technique provides the DNA polymerase with all the bases it needs to copy DNA; the polymerase will continue to add bases until it hits a terminator and stops. 454 sequencing is very different, in that it relies on feeding the polymerase only one base at a time. This changes the dynamics of the reaction dramatically. There's a chance that the base that's been added to the reaction won't be the appropriate one to incorporate: it doesn't base pair with the template at the first open position. In this case, the polymerase will simply sit on the DNA and wait.

If it is the appropriate one, then there will be a burst of activity, as the polymerase quickly adds it to the growing strand. If there's a run of identical bases, the polymerase will add as many as it can until it runs into a position where it can't find the right match. When it reaches the first mismatch, it will stall and start waiting for the right base to come along again.

In 454 sequencing, the sequencing machines wash each of the four bases across the DNA beads, one by one. The polymerases on those beads move forward in fits and starts, incorporating bases as they can, and often waiting out periods where they're supplied with nothing but a mismatched base.
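The flow-by-flow behavior described above can be sketched as a short simulation. This is a minimal sketch under simplifying assumptions: every polymerase on a bead behaves identically, incorporation is always complete, and the template is written in the order it is copied. The function name `simulate_flowgram` is hypothetical, and the GATC flow order in the example is purely illustrative, not a claim about the order any particular machine uses.

```python
from typing import List, Tuple

# Watson-Crick pairing: the base the polymerase adds opposite each template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def simulate_flowgram(template: str, flow_order: str, n_cycles: int) -> List[Tuple[str, int]]:
    """Return (flowed_base, bases_incorporated) for every flow.

    Hypothetical helper: `template` is read in the order the polymerase
    copies it, so the strand being built is its base-by-base complement.
    """
    strand = "".join(COMPLEMENT[b] for b in template.upper())
    pos = 0  # first open position on the growing strand
    flows = []
    for _ in range(n_cycles):
        for base in flow_order:
            run = 0
            # Keep adding through a run of identical bases; stop at the first
            # position that needs a different base (the polymerase then waits).
            while pos < len(strand) and strand[pos] == base:
                run += 1
                pos += 1
            flows.append((base, run))
    return flows
```

For the template AGGGT, the polymerase must build TCCCA. With a GATC flow order, `simulate_flowgram("AGGGT", "GATC", 2)` records nothing on the first G and A flows, one base on the T flow, three on the C flow, and the final A during the second cycle.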

Each time the right base is supplied, however, hundreds of polymerases will quickly incorporate it. This creates a burst of pyrophosphates, liberated as the bases are added. The size of the peak will be roughly proportional to the number of bases that get added. So, for example, a run of four identical bases will produce a larger and more sustained burst of pyrophosphate than a single base will. And, because the sequencing machine keeps track of which base was present in the reaction solution when each pyrophosphate peak is generated, it's easy to associate each of the peaks with a specific base.
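Going the other way, from peak sizes back to bases, amounts to noting which base was flowed and rounding each peak to a whole number of bases. The sketch below assumes the signals have already been normalized so that a single incorporated base gives a value near 1.0; `call_bases` is a hypothetical helper, and real base callers are considerably more sophisticated than a single rounding step.

```python
from typing import List


def call_bases(signals: List[float], flow_order: str) -> str:
    """Convert per-flow signal intensities into a called sequence.

    Assumes (hypothetically) that signals are pre-normalized so one
    incorporated base is about 1.0; rounding estimates the run length.
    """
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]  # which base was in solution
        run_length = max(0, round(signal))      # noise below 0.5 counts as "no incorporation"
        called.append(base * run_length)
    return "".join(called)
```

For example, `call_bases([0.05, 0.1, 1.02, 2.9, 0.08, 0.95, 0.12, 0.0], "GATC")` gives back `"TCCCA"`. Rounding gets harder as runs get longer: a run of five bases differs from a run of four by only a quarter of the four-base signal, whereas two bases differ from one by a full 100 percent.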

When a base added to the reaction chamber matches the one from the target sequence (upper left), it results in a burst of pyrophosphate. The size of the burst corresponds to the number of identical consecutive bases.

Measuring the pyrophosphate

Having a burst of pyrophosphate present is only useful if you have a way of measuring it. The 454 machines use a pair of enzymes to convert the presence of pyrophosphate into a burst of light. These enzymes are packed onto beads that get mixed in with the DNA-covered beads on which the reactions take place.

The first enzyme is sulfurylase, which combines the pyrophosphate with a relative of ATP, the cell's standard energy-carrying molecule. The enzyme takes an energy-depleted form of ATP, called AMP, that has a sulfate group attached to it (adenosine 5'-phosphosulfate, or APS). It replaces the sulfate with the pyrophosphate, producing a molecule of ATP. That ATP provides the energy for a light-producing reaction.

That reaction is catalyzed by the luciferase enzyme, which is famous for giving fireflies their glow. This enzyme uses ATP to power a chemical reaction in which a molecule called luciferin is oxidized, emitting a photon in the process. If there is plenty of luciferin around, the amount of ATP becomes the rate-limiting factor. And, under the reaction conditions used in 454 sequencing, the amount of ATP available is limited by the pyrophosphate produced by DNA polymerization. So, ultimately, the production of light depends on whether or not the DNA polymerase is adding bases to the DNA strand. The more bases that are added at once, the longer and more intense the burst of light.
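The cascade can be turned into back-of-the-envelope arithmetic. Each incorporated base frees one pyrophosphate, sulfurylase converts each pyrophosphate into at most one ATP, and luciferase oxidizes one luciferin per ATP, emitting at most one photon. The sketch below assumes ATP stays the limiting ingredient, as the text describes; the function `expected_photons`, the copy count, and the two efficiency parameters are hypothetical placeholders, not measured values.

```python
def expected_photons(bases_added: int, template_copies: int,
                     sulfurylase_yield: float = 1.0,
                     luciferase_yield: float = 1.0) -> float:
    """Rough photon count for one flow on one bead.

    1 incorporated base -> 1 pyrophosphate (PPi)
    1 PPi -> at most 1 ATP (scaled by the hypothetical sulfurylase_yield)
    1 ATP -> at most 1 photon (scaled by the hypothetical luciferase_yield)
    """
    pyrophosphate = bases_added * template_copies
    atp = pyrophosphate * sulfurylase_yield
    return atp * luciferase_yield
```

Because every step is a fixed ratio, the light from a run of four bases comes out four times the light from one base whatever efficiencies you plug in, which is what lets peak size stand in for run length.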

Two enzymes (sulfurylase and luciferase) are needed to convert pyrophosphate released by DNA polymerase to light.

A simple photodetector can pick up the bursts of light from a tiny reaction chamber and relate each one to the presence of one of the four bases.

When 454 Sequencing first hit the market, it provided short reads and was rather error-prone. Over time, the company has refined its reaction conditions and improved the hardware to make it more effective. For example, the slides that contain the reactions have been coated in titanium dioxide, which is highly reflective and stops signals from one reaction from bleeding over into neighboring reaction chambers. The net result is that, by this time last year, the Broad Genome Center was regularly getting reads of over 400 bases from its machines, and there are plans to extend them further, toward Sanger-like read lengths.

In the meantime, researchers have gained enough experience comparing the signals from 454 machines against well-defined sequences to get a better sense of how errors creep in and how to recognize them in software. This has made it possible to attach a "quality score" to each base, something that's very helpful for analysis.
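The text doesn't say how the quality score is scaled. A common convention in sequencing, assumed here rather than confirmed for this particular pipeline, is the Phred scale, Q = -10 log10(p), where p is the estimated probability that a base call is wrong. A minimal conversion in both directions, with hypothetical function names:

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(P(base call is wrong))."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse of phred_quality: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)
```

On this scale, each additional 10 points means a tenfold lower chance of error: Q20 corresponds to a 1 in 100 chance of a wrong call and Q30 to 1 in 1,000.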

Light is overrated

The DNA polymerase reaction liberates a hydrogen from the end of the growing strand (highlighted in red), which is detected by the Ion Torrent sensor chip.

I started writing this article a few months ago, and have been swamped with other work since. That turned out to be lucky, because in the intervening time, a company called Ion Torrent exited stealth mode. The company appears to take the rough outlines of the 454 process, but uses a different reaction byproduct to get rid of everything about 454 that's complicated. The pyrophosphate, ATP, luciferase, photodetectors? They're gone in Ion Torrent machines.

For biomolecules like DNA, hydrogen atoms are so common that the diagrams we use rarely ever bother to include them. But just about any time you rearrange a few bonds, there are hydrogen atoms to be accounted for when balancing the chemical books, and that's definitely the case when it comes to the addition of a new base to the DNA backbone. That hydrogen would normally pair with the pyrophosphate, but phosphate is a fairly strong acid. As a result, the hydrogen remains in the solution as a free proton. So, adding a base to a bunch of DNA molecules results in a burst of protons appearing in the reaction solution.
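To get a feel for why a burst of protons is measurable at all, here is a rough pH calculation. It is a sketch under deliberately artificial assumptions: the solution is unbuffered (real reactions contain pH buffers), the protons stay in the well, and the well volume and copy number in the example are hypothetical round numbers, not Ion Torrent specifications. `ph_after_release` is a hypothetical helper.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def ph_after_release(protons: float, volume_liters: float, start_ph: float = 7.0) -> float:
    """pH after adding `protons` free H+ ions to an UNBUFFERED volume.

    Ignores buffering, diffusion, and the water equilibrium, so it
    overstates the change a real, buffered reaction would see.
    """
    start_h = 10.0 ** (-start_ph)                   # mol/L of H+ before the flow
    added_h = protons / (AVOGADRO * volume_liters)  # mol/L of H+ released by incorporation
    return -math.log10(start_h + added_h)
```

With a hypothetical 1-picoliter well and a million template copies each adding one base, `ph_after_release(1e6, 1e-12)` comes out at about 5.75, more than a full pH unit below neutral. Buffering makes the real change smaller, but the number of protons released still scales with the number of bases added.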

Ion Torrent's trick is that they have a chip that can register the passage of these protons (it's not clear from their literature whether they rely on them diffusing to their sensor, or use a voltage difference to drive them there). As the protons hit, they create an electrical signal that can be read out much like the burst of light from luciferase. And, as new solutions are added to the reaction chamber, the protons will quickly be diluted out to background levels or taken out of solution by pH buffers.

The key advantage here isn't just the simplicity of getting rid of all the additional chemical reactions used by 454 machines. Eliminating those reactions also gets rid of lots of consumables (the enzymes, the luciferin, etc.) that help make a sequencing reaction expensive. Provided that the ion-sensitive chips have a decent lifetime, that can make owning an Ion Torrent machine significantly cheaper.

Once again you have written a wonderfully informative article. I work with Solid, 454, and Illumina sequencing technologies, but it is difficult for me to explain what exactly I do to my friends/family. I have been pointing them to your past few articles with great results.

It is almost correct. The bottom data represents the illumination evoked by the machine. First it tried adding a G. No light, so that was wrong. Then it tried an A. Wrong. Then a T: Light! Small burst, so one T is the first correct base added to the primer, reflecting the first base (A) listed as the template. The next addition that yielded light was a C, corresponding to a G in the template. But I think it should have happened four cycles earlier, immediately after the successful T addition. OK?

Edit: Oh! Magic! The picture is now corrected! Never mind the now-missing error.

The only other quibble, for non-biologists, is the caption of the sequence figure. Rather than "matches", you might say "is complementary to, reflecting the base-pairing rules". But excellent article, and I hope we meet one day.

I know that it is considered "3rd generation", but there is much interest in the field of "Strobe Sequencing", specifically PacBio's system. Can we expect an article focusing on this technology sometime in the future?


Yes, I can do PacBio. In the meantime, send everyone you can to this story; the more traffic, the sooner I can justify spending the time on the next installment.

Does the fact that nucleotides are fed to the PCR emulsion in a specific ordered cycle (GATC here), which is important for determining what bases are present on the complementary strand, create any potential bias in base calling?


I don't think so, based on the apparent efficiency and accuracy of incorporation. Here and there, individual molecules may end up with a mismatch or fail to incorporate, but that will just leave them out of sync with the larger population, and their signal should be swamped. Over time, I'd imagine that the out-of-sync population grows, which is probably what limits the reads to four hundred bases.
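The reasoning about out-of-sync molecules can be made concrete with a toy model. It assumes, hypothetically, that each molecule on a bead independently fails to extend on an incorporating flow with a fixed probability, and that a molecule that misses once never catches up. Real dephasing is messier (molecules can also get ahead of the population), so treat this only as an illustration of why the in-sync signal decays with read length. Both function names are hypothetical.

```python
import math


def in_phase_fraction(miss_probability: float, incorporating_flows: int) -> float:
    """Fraction of molecules still in sync after `incorporating_flows` flows,
    if each one independently misses each incorporation with probability
    `miss_probability` and never catches up afterwards."""
    return (1.0 - miss_probability) ** incorporating_flows


def flows_before_signal_halves(miss_probability: float) -> int:
    """Largest number of incorporating flows for which at least half of
    the molecules are still in sync under the same toy model."""
    if not 0.0 < miss_probability < 1.0:
        raise ValueError("miss probability must be in (0, 1)")
    return math.floor(math.log(0.5) / math.log(1.0 - miss_probability))
```

With a hypothetical 0.2 percent miss rate, `in_phase_fraction(0.002, 400)` is roughly 0.45, and `flows_before_signal_halves(0.002)` returns 346: past that point, most of the signal on the bead comes from molecules that are out of step.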