Adleman

The tools of molecular biology are used to solve an instance of the directed Hamiltonian path problem. A small graph is encoded in molecules of DNA, and the 'operations' of the computation are performed with standard protocols and enzymes. This experiment demonstrates the feasibility of carrying out computations at the molecular level.

In 1959 Richard Feynman gave a visionary talk describing the possibility of building computers that were "sub-microscopic" (1). Despite remarkable progress in computer miniaturization, this goal has yet to be achieved. In this report the possibility of computing directly with molecules is explored.

A directed graph $G$ with designated vertices $v_{in}$ and $v_{out}$ is said to have a Hamiltonian path (2) if and only if there exists a sequence of compatible 'one-way' edges $e_1, e_2, \ldots, e_z$ (that is, a 'path') that begins at $v_{in}$, ends at $v_{out}$ and enters every other vertex exactly once. Figure 1 shows a graph that, for $v_{in} = 0$ and $v_{out} = 6$, has a Hamiltonian path, given by the edges 0→1, 1→2, 2→3, 3→4, 4→5, 5→6. If the edge 2→3 were removed from the graph, the resulting graph with the same designated vertices would not have a Hamiltonian path. Similarly, if the designated vertices were changed to $v_{in} = 3$, $v_{out} = 5$, there would be no Hamiltonian path (since, for example, there are no edges entering vertex 0).

There are well-known algorithms for deciding whether an arbitrary directed graph with designated vertices has a Hamiltonian path. However, all known algorithms for this problem have exponential worst-case complexity, and hence there are instances of modest size for which these algorithms require an impractical amount of computer time to render a decision. Since the directed Hamiltonian path problem has been proven to be NP-complete, it seems likely that no efficient (that is, polynomial-time) algorithm exists for solving it (2,3).

The following (non-deterministic) algorithm solves the directed Hamiltonian path problem:

Step 1: Generate random paths through the graph.
Step 2: Keep only those paths that begin with $v_{in}$ and end with $v_{out}$.
Step 3: If the graph has $n$ vertices, keep only those paths that enter exactly $n$ vertices.
Step 4: Keep only those paths that enter all of the vertices of the graph at least once.
Step 5: If any paths remain, say "Yes"; otherwise say "No".
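The five steps can be mimicked in software to make the logic concrete. The following is a minimal Python sketch, not the molecular procedure: random walks stand in for randomly ligated molecules, and list filters stand in for PCR, gel extraction and bead separation. The edge list is an assumption: it contains only the edges named in this report (Figure 1 is not reproduced here), and the names `EDGES`, `random_paths` and `hamiltonian_path_exists` are hypothetical.

```python
import random

# Hypothetical edge list: only the edges named in this report, not all of Figure 1.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 3), (0, 3), (3, 2)]


def random_paths(edges, n_vertices, count, max_len, rng):
    """Step 1: random walks along directed edges (stand-ins for ligation products)."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    paths = []
    for _ in range(count):
        path = [rng.randrange(n_vertices)]
        target_len = rng.randint(1, max_len)
        # Extend until the target length is reached or the walk hits a dead end.
        while len(path) < target_len and successors.get(path[-1]):
            path.append(rng.choice(successors[path[-1]]))
        paths.append(tuple(path))
    return paths


def hamiltonian_path_exists(edges, n_vertices, v_in, v_out, count=20_000, seed=0):
    """Steps 2-5 as filters; returns (answer, set of surviving paths)."""
    rng = random.Random(seed)
    paths = random_paths(edges, n_vertices, count, 2 * n_vertices, rng)
    # Step 2: keep paths that begin with v_in and end with v_out.
    paths = [p for p in paths if p[0] == v_in and p[-1] == v_out]
    # Step 3: keep paths that enter exactly n vertices.
    paths = [p for p in paths if len(p) == n_vertices]
    # Step 4: keep paths that enter every vertex at least once (one filter per vertex).
    for vertex in range(n_vertices):
        paths = [p for p in paths if vertex in p]
    # Step 5: any survivor means "Yes".
    return bool(paths), set(paths)


if __name__ == "__main__":
    found, survivors = hamiltonian_path_exists(EDGES, 7, v_in=0, v_out=6)
    print(found, survivors)
```

As in the molecular version, a "Yes" is never wrong here, but a "No" can be: if too few random paths are generated, a Hamiltonian path may simply never be formed. Raising `count` plays roughly the role of adding more oligonucleotide to the ligation reaction.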

The graph shown in Figure 1 with designated vertices $v_{in} = 0$ and $v_{out} = 6$ was solved using the algorithm above, implemented at the molecular level. Note that the vertices are labeled so that the (unique) Hamiltonian path enters them in sequential order only for convenience in this exposition; this labeling provides no advantage in the computation. The graph is small enough that the Hamiltonian path can be found by visual inspection; however, it is large enough to demonstrate the feasibility of this approach. It seems clear that the methods described here could be scaled up to accommodate much larger graphs.

To implement Step 1 of the algorithm, each vertex $i$ in the graph was associated with a random 20-mer sequence of DNA denoted $O_i$. For each edge $i \to j$ in the graph, an oligonucleotide $O_{i \to j}$ was created that was the 3' 10-mer of $O_i$ (unless $i = 0$, in which case it was all of $O_i$) followed by the 5' 10-mer of $O_j$ (unless $j = 6$, in which case it was all of $O_j$). Notice that this construction preserves edge orientation. For example, $O_{2 \to 3}$ will not be the same as $O_{3 \to 2}$. The 20-mer sequence Watson-Crick complementary to $O_i$ was denoted $\bar{O}_i$. Figure 2 contains examples.
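The encoding rule can be checked with a short Python sketch. It assumes every sequence is written 5'→3', so the first ten characters are the 5' 10-mer and the last ten the 3' 10-mer; the seed, the helper names (`random_oligo`, `complement`, `edge_oligo`, `ligate`) and the choice to represent $\bar{O}_i$ as a reverse complement written 5'→3' are illustrative assumptions, not details from the report.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def random_oligo(rng, length=20):
    """Random DNA sequence, written 5'->3'."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def complement(seq):
    """Watson-Crick complementary strand, also written 5'->3' (reverse complement)."""
    return "".join(PAIR[base] for base in reversed(seq))


def edge_oligo(o, i, j, v_in, v_out):
    """O_{i->j}: 3' 10-mer of O_i, then 5' 10-mer of O_j (whole 20-mer at v_in/v_out)."""
    left = o[i] if i == v_in else o[i][10:]
    right = o[j] if j == v_out else o[j][:10]
    return left + right


def ligate(o, path, v_in, v_out):
    """Strand formed when the edge oligos along a path are joined end to end."""
    return "".join(edge_oligo(o, a, b, v_in, v_out) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed
    O = {v: random_oligo(rng) for v in range(7)}
    strand = ligate(O, [0, 1, 2, 3, 4, 5, 6], v_in=0, v_out=6)
    print(len(strand))  # 140
    print(edge_oligo(O, 2, 3, 0, 6) != edge_oligo(O, 3, 2, 0, 6))
```

Because every vertex contributes exactly 20 nucleotides to a ligated strand, a path through $k$ vertices from $v_{in}$ to $v_{out}$ is $20k$ bp long, which is why the seven-vertex Hamiltonian path corresponds to a 140 bp band. For an interior vertex $j$, the last ten nucleotides of $O_{i \to j}$ and the first ten of $O_{j \to k}$ together spell $O_j$, so $\bar{O}_j$ can pair across the junction and act as a splint.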

For each vertex $i$ in the graph (except $i = 0$ and $i = 6$), 50 pmol of $\bar{O}_i$, and for each edge $i \to j$ in the graph, 50 pmol of $O_{i \to j}$ were mixed together in a single ligation reaction (4). The $\bar{O}_i$ served as splints to bring oligonucleotides associated with compatible edges together for ligation (see Figure 2). Hence the ligation reaction resulted in the formation of DNA molecules encoding random paths through the graph.

The scale of this ligation reaction far exceeded what was necessary for the graph under consideration. For each edge in the graph, approximately $3 \times 10^{13}$ copies of the associated oligonucleotide were added to the ligation reaction. Hence it is likely that vast numbers of DNA molecules encoding the Hamiltonian path were created. In theory, the creation of a single such molecule would be sufficient. Hence, for this graph, sub-attomole quantities of oligonucleotides would probably have been sufficient. Alternatively, a much larger graph could have been processed with the pmol quantities employed here.

To implement Step 2 of the algorithm, the product of Step 1 was amplified by polymerase chain reaction (PCR) using primers $O_0$ and $\bar{O}_6$ (5). Thus only those molecules encoding paths that begin with vertex 0 and end with vertex 6 were amplified.

To implement Step 3 of the algorithm, the product of Step 2 was run on an agarose gel, and the 140 bp band (corresponding to dsDNA encoding paths entering exactly seven vertices) was excised and soaked in ddH2O to extract DNA (6). This product was PCR amplified and gel purified several times to enhance purity.

To implement Step 4 of the algorithm, the product of Step 3 was affinity purified using a biotin-avidin magnetic bead system. This was accomplished by first generating single-stranded DNA from the dsDNA product of Step 3 and then incubating the ssDNA with $\bar{O}_1$ conjugated to magnetic beads (7). Only those ssDNA molecules that contained the sequence $O_1$ (and hence encoded paths that entered vertex 1 at least once) annealed to the bound $\bar{O}_1$ and were retained. This process was repeated successively with $\bar{O}_2$, $\bar{O}_3$, $\bar{O}_4$ and $\bar{O}_5$.

To implement Step 5, the product of Step 4 was PCR amplified and run on a gel.

Figure 3 shows the results of these procedures. In panel A, lane 1 is the result of the ligation reaction in Step 1. The smear with striations is consistent with the construction of molecules encoding random paths through the graph (8). In panel A, lanes 2-5 show the results of the PCR reaction in Step 2. The dominant bands correspond to the amplification of molecules encoding paths that begin at vertex 0 and end at vertex 6.

Panel B shows the results of a 'graduated PCR' performed on the ssDNA molecules generated from the band excised in Step 3. Graduated PCR is a method for 'printing' results. It is performed by running six different PCR reactions, using $O_0$ as the right primer and $\bar{O}_i$ as the left primer in the $i$th tube. For example, on the molecules encoding the Hamiltonian path 0→1, 1→2, 2→3, 3→4, 4→5, 5→6, graduated PCR will produce bands of 40 bp, 60 bp, 80 bp, 100 bp, 120 bp, 140 bp in successive lanes. On the molecules encoding the path 0→1, 1→3, 3→4, 4→5, 5→6, graduated PCR will produce bands of 40 bp, x, 60 bp, 80 bp, 100 bp, 120 bp in successive lanes, where x denotes the absence of a band in lane 2 (corresponding to the omission of vertex 2 along this path). On molecules encoding the path 0→3, 3→2, 2→3, 3→4, 4→5, 5→6, graduated PCR will produce bands of x, 60 bp, 80 bp/40 bp, 100 bp, 120 bp, 140 bp in successive lanes, where 80 bp/40 bp denotes that both a 40 bp and an 80 bp band will be produced in lane 3 (corresponding to the double passage of vertex 3 along this path).

The most prominent bands in panel B appear to be those that would arise from superimposing the bands predicted for the three paths described above. The bands corresponding to the path 0→1, 1→3, 3→4, 4→5, 5→6 were not expected and suggest that the band excised in Step 3 contained contamination from 120 bp molecules. However, such low-weight contamination is not a problem, since it does not persist through Step 4. Panel C shows the results of graduated PCR applied to the molecules in the final product of Step 4. The bands demonstrate that these molecules encode the Hamiltonian path 0→1, 1→2, 2→3, 3→4, 4→5, 5→6 (9).

The computation above required approximately 7 days of lab work. Step 4 (magnetic bead separation) was the most labor-intensive, requiring a full day at the bench. In general, using the algorithm above, the number of procedures required should grow linearly with the number of vertices in the graph. The labor required for large graphs might be reduced by using alternative procedures, automation, or less labor-intensive molecular algorithms.

The number of different oligonucleotides required should grow linearly with the number of edges. The quantity of each oligonucleotide needed is a rather subtle graph-theoretic question (8). Roughly, the quantity used should be just sufficient to ensure that, during the ligation step (Step 1), a molecule encoding a Hamiltonian path will be formed with high probability if such a path exists in the graph. This quantity should grow exponentially with the number of vertices in the graph. The molecular algorithm used here was rather naive, and, as with classical computation, finding improved algorithms will extend the applicability of the method.

As the computation is scaled up, the possibility of errors will need to be examined carefully. During Step 1, the occasional ligation of incompatible edge oligonucleotides may result in the formation of molecules encoding 'pseudo-paths' that do not actually occur in the graph. While such molecules may be amplified during Step 2 and persist through Step 3, they seem unlikely to survive the separation in Step 4. Nonetheless, at the completion of a computation, it would be prudent to confirm that a putative Hamiltonian path actually occurs in the graph. During the separation step, molecules encoding Hamiltonian paths may fail to bind adequately and be lost, while molecules encoding non-Hamiltonian paths may bind nonspecifically and be retained. The latter problem might be mitigated by more stringent or repeated separation procedures. The former problem might be dealt with by periodically applying PCR with primers designed to amplify Hamiltonian paths (in the example above, primers $O_0$ and $\bar{O}_6$). The balanced use of these techniques may be adequate to control such errors.

The choice of random 20-mer oligonucleotides for encoding the graph was based on the following rationale. First, since $4^{20}$ different 20-mer oligonucleotides exist, choosing randomly made it unlikely that oligonucleotides associated with different vertices would share long common subsequences that might result in 'unintended' binding during the ligation step (Step 1). Second, it was guessed that, with high probability, potentially deleterious (and presumably rare) features such as severe hairpin loops would not arise. Finally, choosing 20-mers assured that binding between 'splint' and 'edge' oligonucleotides would involve ten nucleotide pairs and would consequently be stable at room temperature. This approach was successful for the small graph considered above; however, how best to proceed for larger graphs may require additional research.

What is the power of this method of computation? It is premature to give definitive answers; however, some remarks seem in order. A typical desktop computer can execute approximately $10^{6}$ operations per second. The fastest supercomputers currently available can execute approximately operations per second. If the ligation (concatenation) of two DNA molecules is considered as a single operation, and if it is assumed that about half of the approximately 'edge' oligonucleotides in Step 1 were ligated, then during Step 1 approximately operations were executed. Clearly, this step could be scaled up considerably and or more operations seems entirely plausible (for example, by using μmol rather than pmol quantities). At this scale, the number of operations per second during the ligation step would exceed that of current supercomputers by more than a thousandfold.

Further, hydrolysis of a single molecule of ATP to AMP plus pyrophosphate provides the energy ($\Delta G = -8$ kcal/mol) for one ligation operation (10,11); hence, in principle, 1 joule is sufficient for approximately $2 \times 10^{19}$ such operations. This is remarkable energy efficiency, considering that the second law of thermodynamics dictates a theoretical maximum of approximately $3.5 \times 10^{20}$ (irreversible) operations per joule (at 300 K) (12,13). Existing supercomputers are far less energy efficient, executing at most $10^{9}$ operations per joule. The energy consumed during other parts of the molecular computation, such as oligonucleotide synthesis and PCR, should also be small in comparison to that consumed by supercomputers. Finally, storing information in molecules of DNA allows for an information density of approximately 1 bit per cubic nanometer, a dramatic improvement over existing storage media such as videotape, which store information at a density of approximately 1 bit per cubic nanometers.

Thus the potential of molecular computation is impressive. What is not clear is whether such massive numbers of inexpensive operations can be productively used to solve real computational problems. One major advantage of electronic computers is the variety of operations they provide and the flexibility with which these operations can be applied. While two 100-digit integers can be multiplied quite efficiently on an electronic computer, it would be a daunting task to do such a calculation on a molecular computer using currently available protocols and enzymes (14). Nonetheless, for certain intrinsically complex problems, such as the directed Hamiltonian path problem, where existing electronic computers are very inefficient and where massively parallel searches can be organized to take advantage of the operations that molecular biology currently provides, it is conceivable that molecular computation might compete with electronic computation in the near term. It is a research problem of considerable interest to elucidate the kinds of algorithms that are possible using molecular methods and the kinds of problems that these algorithms can efficiently solve (12,15,16).

For the long term, one can only speculate about the prospects for molecular computation. It seems likely that a single molecule of DNA can be used to encode the 'instantaneous description' of a Turing machine (17) and that currently available protocols and enzymes could (at least under idealized conditions) be used to induce successive sequence modifications that would correspond to the execution of the machine. In the future, research in molecular biology may provide improved techniques for manipulating macromolecules. Research in chemistry may allow for the development of synthetic 'designer' enzymes. One can imagine the eventual emergence of a general-purpose computer consisting of nothing more than a single macromolecule conjugated to a ribosome-like collection of enzymes that act upon it.
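The band patterns predicted for graduated PCR follow mechanically from the encoding: every vertex occupies 20 bp of a path molecule, and in lane $i$ a product runs from the $O_0$ priming site to the end of each copy of vertex $i$. The Python sketch below assumes, as in the three example paths discussed in this report, that each molecule starts at vertex 0 and that lanes 1 to 6 use the primers for vertices 1 to 6; the function names `graduated_pcr`, `superimpose` and `describe` are hypothetical.

```python
def graduated_pcr(path, lanes=range(1, 7), bp_per_vertex=20):
    """Predicted band sizes (bp) per lane for one molecule species starting at vertex 0."""
    return {
        lane: sorted(bp_per_vertex * (pos + 1) for pos, v in enumerate(path) if v == lane)
        for lane in lanes
    }


def superimpose(*band_sets):
    """Bands expected when several molecule species are present in the same lanes."""
    merged = {}
    for bands in band_sets:
        for lane, sizes in bands.items():
            merged[lane] = sorted(set(merged.get(lane, [])) | set(sizes))
    return merged


def describe(bands):
    """Render lanes in the text's notation: 'x' for no band, '80bp/40bp' for two bands."""
    cells = []
    for lane in sorted(bands):
        sizes = sorted(bands[lane], reverse=True)
        cells.append("/".join(f"{s}bp" for s in sizes) if sizes else "x")
    return ", ".join(cells)


if __name__ == "__main__":
    print(describe(graduated_pcr([0, 1, 2, 3, 4, 5, 6])))
    # 40bp, 60bp, 80bp, 100bp, 120bp, 140bp
    print(describe(graduated_pcr([0, 3, 2, 3, 4, 5, 6])))
    # x, 60bp, 80bp/40bp, 100bp, 120bp, 140bp
```

`superimpose` is one way to reason about a mixture of species such as the one analyzed in panel B; it predicts band positions only, not band intensities.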
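The quantity and energy estimates in this report rest on a few stated values: 50 pmol of each oligonucleotide, $\Delta G = -8$ kcal/mol per ligation, and a temperature of 300 K. A minimal Python sketch of that arithmetic follows. It assumes the thermochemical kilocalorie (4184 J) and reads the second-law limit as the Landauer bound of $kT \ln 2$ per irreversible operation; that reading is an interpretation, not a quotation of references 12 and 13.

```python
import math

AVOGADRO = 6.02214076e23   # molecules per mole
BOLTZMANN = 1.380649e-23   # J/K
J_PER_KCAL = 4184.0        # thermochemical kilocalorie, in joules


def copies(pmol):
    """Number of molecules in a given number of picomoles."""
    return pmol * 1e-12 * AVOGADRO


def ligations_per_joule(delta_g_kcal_per_mol=8.0):
    """Operations per joule if each ligation consumes |dG| (ATP -> AMP + PPi)."""
    return AVOGADRO / (delta_g_kcal_per_mol * J_PER_KCAL)


def landauer_ops_per_joule(temperature_k=300.0):
    """Upper bound on irreversible operations per joule: 1 / (k T ln 2)."""
    return 1.0 / (BOLTZMANN * temperature_k * math.log(2))


if __name__ == "__main__":
    print(f"{copies(50):.1e}")                # 3.0e+13 copies per 50 pmol
    print(f"{ligations_per_joule():.1e}")     # 1.8e+19 ligations per joule
    print(f"{landauer_ops_per_joule():.1e}")  # 3.5e+20 operations per joule
    print(4 ** 20)                            # 1099511627776 distinct 20-mers
```

The ligation figure sits roughly a factor of 20 below the thermodynamic bound, while the $10^{9}$ operations per joule quoted for existing supercomputers is about ten orders of magnitude lower still.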

ticles as above and washed 3 times in 400 μl of 0.5x SSC. ssDNA was then incubated with these particles in 150 μl 0.5x SSC for 45 minutes at room temperature with constant shaking. Particles were washed 4 times in 400 μl of 0.5x SSC to remove unbound ssDNA and then heated to 80 °C in 100 μl ddH2O for 5 minutes to release ssDNA bound to $\bar{O}_1$. The aqueous phase with ssDNA was retained. This process was then repeated for $\bar{O}_2$, $\bar{O}_3$, $\bar{O}_4$ and $\bar{O}_5$.

[8] From a graph-theoretic point of view, the use of equal quantities of each oligonucleotide in the ligation reaction is not optimal and leads to the formation of excess numbers of molecules encoding paths that do not start at vertex 0 or do not end at vertex 6. A better way to proceed is to first calculate a flow on the graph and use the results to determine the quantity of each oligonucleotide that is necessary.

[9] On an $n$-vertex graph $G$ with designated vertices $v_{in}$ and $v_{out}$, there may be multiple Hamiltonian paths. If it is desirable to have an explicit description of some Hamiltonian path, that can be accomplished by extending the algorithm as follows. At the end of Step 4, one has a solution (in the chemistry sense) containing molecules encoding all Hamiltonian paths for $\langle G, v_{in}, v_{out} \rangle$. The graduated PCR performed at the end of Step 4 will produce the superimposition of the bands corresponding to all of these Hamiltonian paths in the $n - 1$ successive lanes. For some lane $i$, a band of least weight (40 bp) will appear. This indicates that some Hamiltonian path begins with $v_{in}$ and proceeds directly to vertex $i$. By PCR amplifying the solution with primers $O_i$ and $O_n$, running a gel and excising the $20(n - 1)$ bp band, only those molecules encoding such Hamiltonian paths will be retained. One now has a solution containing molecules encoding all Hamiltonian paths for $\langle G', i, v_{out} \rangle$, where $G'$ is the graph in which vertex $v_{in}$ has been removed. One now iterates.

[10] J.D. Watson, N.H. Hopkins, J.W. Roberts, J.A. Steitz and A.M. Weiner, Molecular Biology of the Gene (The Benjamin/Cummings Publishing Co., Menlo Park, CA, ed. 3, 1987).

[11] M.J. Engler and C.C. Richardson, in The Enzymes, P.D. Boyer, Ed. (Academic Press Inc., New York, NY, ed. 3, 1982), vol. XVB, pp.
