The 17th Conference on DNA Computing and Molecular Programming (DNA 17) was held Sept. 19-23 in Pasadena, California, at Caltech. In previous years the conference was held in early summer, but from now on it will be held in late summer or early fall so that it falls six months apart from its sister conference FNANO (Foundations of Nanoscience), held every April in Snowbird, Utah.

The conference is not dedicated to theoretical computer science, of course, but like many inter-disciplinary fields such as algorithmic game theory or computational biology, theoretical computer science finds its way into many results in the field. As Luca Cardelli said during the conference, while the computing revolution was about the systematic manipulation of information, nanoscience is about the systematic manipulation of matter, so it is not surprising that theoretical computer scientists are finding interesting problems in this area. I find it fascinating to watch a speaker prove a result relating DNA self-assembly to context-free grammars, just before the next speaker shows atomic force microscopy images of a self-assembled DNA nanostructure.

There’s always some impedance mismatch when experimentalists and theorists in any field get together to talk, but I believe our field promotes excellent cross-communication. The program committee, for instance, had members from the following university departments:

Biological Chemistry and Molecular Pharmacology (1)

Computer Science (17)

Biophysics (1)

Chemistry (4)

Mathematics (1)

Electrical Engineering (3)

Bioengineering (3)

Bioinformatics (4)

Physics (2)

Computation & Neural Systems (1)

Cognitive Science (1)

Systems Biology (1)

Interesting Conference Features

First I want to discuss some interesting features of the conference that I think could be beneficially adopted by general TCS conferences (some of these we are already seeing in TCS). As one of the local organizers, I was partially responsible for implementing some of these ideas, and I think it was worth the effort.

Tutorials

The first day was dedicated to three 90-minute lecture-style tutorials (slides available), and in parallel, there was an all-day wet-lab tutorial run by Elisa Franco, Josh Bishop, and Jongmin Kim,1 in which the students constructed a chemical oscillator based on Jongmin’s and Elisa’s work on constructing chemical oscillators from DNA and transcription enzymes. Most of the tutorial attendees were theoreticians who wanted to see what all the fuss was about in the lab. It seems that about half of the oscillators worked properly on the first try. (They only got one try because the period of the oscillation is a few hours, so they had to run overnight.)

Tracks

One interesting aspect of the conference is the tracks, designed to appeal to both theoretical and experimental researchers. Track A looks familiar to TCS people: 15-page extended abstracts that appear in the conference’s LNCS proceedings. These are usually later submitted to CS journals such as SICOMP or TCS, or perhaps to the special issue of invited papers in Natural Computing associated with the DNA conference. Track B submissions are 1-page abstracts submitted for oral presentation only. Authors must provide a full paper for the program committee to judge, but the paper is not published. This is because Track B submissions are experimental results destined for eventual publication in physical science journals such as Nature, Science, or PNAS. These journals have much stricter requirements than CS journals regarding prior publication, so it is critical to the Track B presenters that nothing they submit can be construed as a publication.

Track C is posters, which are very common at physical science conferences and starting to make some headway at TCS conferences. I think posters are a great way to present your research, and I think the TCS community should adopt poster sessions at every conference. Maybe the person you most wanted to see your talk won’t attend it, but you can always grab them in the hallway and drag them over to your poster. It’s a great way to meet big shots in the field. There were three 90-minute poster sessions, with every poster at every session, and we encouraged the presenters to keep their posters up the whole week. This way, you could stand by your poster for a while, but you could also feel free to walk around to other posters without worrying that someone won’t get a chance to hear you explain your poster while you are away.

Panels

The panels consisted of four top researchers sitting at a table. Each gave a 5-minute talk about their vision for the future of the field, and then the audience could ask questions or heckle them for the next 25 minutes. These were a lot of fun. I think students especially benefited from the perspective given by high-level discussion of long-term research goals.

Impromptu Sessions

The impromptu sessions were a great idea, and I think all conferences could benefit from them. I think of them as a formalization of the idea that “the real conference interaction happens in the hallways” (as Lance Fortnow likes to remind us). Often graduate students are intimidated by the idea of walking up to a couple of famous big-wigs talking in the hallway, even if they are talking about the student’s research area. For the impromptu sessions, there was a wiki where over the course of the week, anyone could schedule a session on any topic in a number of rooms that were reserved for the sessions. The sessions were required to be public, and I found it to be a great way for people to get together to chat about interesting problems, while inviting anyone else interested in the same problem to listen in or participate.

Theoretical Computer Science Results

I will highlight a few theoretical results that I found interesting. There were of course many great experimental results, and a lot of great CS talks on topics such as simulation, but since this is a TCS blog, I will focus on my favorite TCS-style results.

Self-Assembly and Context-Free Grammars

The winners of the best student paper award were Andrew Winslow and Sarah Eisenstat, for their excellent paper One-Dimensional Staged Self-Assembly, with Erik Demaine and Mashhood Ishaque.2 Fix a finite alphabet $latex \Sigma$ and a finite set $latex G$ of “glues”. A tile type is a square labeled with a symbol from $latex \Sigma$, with its east and west sides labeled with (different) glues from $latex G$ (such tiles, both 1D and 2D, can be experimentally implemented with DNA). Initially all tile types start in separate test tubes. When two test tubes are mixed, any tile can bind to the west of any other tile if the first tile’s east glue matches the second tile’s west glue. Subsequent mixing may bind whole rows of tiles together. After each mixing, it is assumed that individual tiles are washed away so that only terminal assemblies (assemblies that cannot attach to anything else in the tube) remain.

The goal: design a fixed set of tile types so that any string over $latex \Sigma$ can be “spelled” by efficiently mixing the tiles in the correct order. How efficiently? The authors show that if each intermediate test tube is required to contain only one terminal assembly, then the number of mixing stages required to spell the string $latex x$ is within a constant multiplicative factor of the size of the smallest context-free grammar that produces the singleton language $latex \{x\}$ (and they show that this bound is tight).

What does this mean? There is a linear-time $latex O(\log n)$-approximation algorithm for finding the smallest context-free grammar representing a string in this way (due to Sakamoto), which automatically translates to a linear-time algorithm for finding efficient mixing protocols for self-assembling one-dimensional patterns (implemented by the authors; here is an efficient mixing to spell the final verse of Edgar Allan Poe’s “The Raven” with DNA tiles).

However, if intermediate mixing stages are allowed to contain multiple terminal assemblies, even though the final stage is required to have only one terminal assembly (the assembly spelling $latex x$), then the number of mixing stages can be dramatically reduced (by a multiplicative factor of at least $latex \frac{n}{\log n}$).
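To make the grammar connection concrete, here is a minimal Python sketch of a context-free grammar that derives exactly one string. It is only an illustration, not the authors' construction: the rule names (`X1`, `X2`, ...) and the helpers `expand` and `doubling_grammar` are made up for this example, and it ignores how glues must be chosen so that each mixing really yields a single terminal assembly. Loosely, each rule plays the role of one mixing step that joins two previously built rows.

```python
def expand(grammar, symbol):
    """Return the single string derived from `symbol`.

    `grammar` maps each nonterminal to the list of symbols on its
    right-hand side; anything not in `grammar` is a terminal (a tile label).
    """
    rhs = grammar.get(symbol)
    if rhs is None:
        return symbol
    return "".join(expand(grammar, s) for s in rhs)


def doubling_grammar(k):
    """Build k rules whose start symbol derives 'ab' repeated 2**(k-1) times."""
    grammar = {"X1": ["a", "b"]}
    for i in range(2, k + 1):
        # X_i -> X_{i-1} X_{i-1}: one rule doubles the length of the string.
        grammar[f"X{i}"] = [f"X{i - 1}", f"X{i - 1}"]
    return grammar, f"X{k}"


grammar, start = doubling_grammar(5)
x = expand(grammar, start)
print(len(grammar), "rules derive a string of length", len(x))
```

Highly repetitive strings like this one have grammars exponentially smaller than the string itself, which is the kind of saving a good mixing protocol can exploit; a string with little repetition has no such shortcut.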

Anne, Alan, Jan, and Chris showed how to implement a simple and pervasive computation — a counter that iterates through $latex 2^n$ different states using $latex O(n)$ different species — while consuming only $latex O(n^3)$ total fuel molecules (and producing the same amount of total waste molecules). A naïve implementation would consume fuel at every step, using $latex \Omega(2^n)$ fuel.
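As a rough illustration of the counting claim (and emphatically not the authors' strand displacement construction), the following Python sketch steps an ordinary $latex n$-bit binary counter whose state is written as one "species" per bit. The species labels such as `b2_1` and the function `run_counter` are hypothetical names for this example. It only shows that $latex 2n$ species suffice to name $latex 2^n$ states, and that a scheme paying one fuel molecule per step would pay exponentially many; it does not model the $latex O(n^3)$ fuel scheme.

```python
def run_counter(n):
    """Step an n-bit counter from all zeros until it overflows.

    The state is one species per bit, e.g. 'b2_1' means bit 2 is 1.
    Returns (number of species ever used, distinct states visited, steps).
    """
    bits = [0] * n
    species_used = set()
    states = set()
    steps = 0
    while True:
        state = tuple(f"b{i}_{v}" for i, v in enumerate(bits))
        species_used.update(state)
        states.add(state)
        # Increment: flip trailing 1s to 0, then the first 0 to 1.
        i = 0
        while i < n and bits[i] == 1:
            bits[i] = 0
            i += 1
        if i == n:  # overflow: every state has been visited
            break
        bits[i] = 1
        steps += 1
    return len(species_used), len(states), steps


for n in (4, 8):
    species, states, steps = run_counter(n)
    print(f"n={n}: {species} species, {states} states, {steps} steps")
```

Note that this sketch keeps exactly one molecule per bit position, which is the same single-copy assumption at issue in the robustness question.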

However, their counter requires that certain species have exactly one molecule present in solution, a tall order to implement experimentally. A more robust counter would work even with many copies of each species present, i.e., if many counters were thrown in together, they would each independently iterate from $latex 1$ to $latex 2^n$, without interfering with each other.

My favorite theorem in the paper shows this task to be impossible.4 In particular, they show a contrapositive result: any chemical reaction system (not just those implemented by DNA strand displacement) with $latex n$ species that is tolerant to having many copies of the system all reacting at once, has the property that any species is producible after $latex O(n^2)$ steps. In other words, if there is some species $latex S_{\text{end}}$ whose presence signifies the “end” of computation, there is no way to deterministically visit more than quadratically many states that do not contain a copy of $latex S_{\text{end}}$.

As an example, if we wanted to implement a chemical system simulating an $latex O(n^3)$-time Turing machine with only $latex O(n)$ species,5 it could not possibly work unless some species are present in small quantities; i.e., multiple copies of the system would provably interfere with each other if placed in the same test tube.

This is not a complexity theory result (it tells us nothing about the relationship between $latex \mathsf{P}$ and $latex \mathsf{NP}$, for instance), and its proof does not use any classical complexity theorems such as the time hierarchy theorem. Nonetheless, only a complexity theorist would even think to conjecture such a statement about chemistry. This is why TCS is often needed to study molecular systems.

In their variant of the problem, the rectangle grows “rectilinearly” from an L-shaped “seed”, where all tiles attach via their west and south glues, and both glues must match for them to attach. The authors use heuristics combined with an exponential-time branch-and-bound search algorithm to find small (not necessarily minimal) tile sets.

They also analyze the reliability of the tiles in the face of errors (tiles attaching by only a single matching glue at some small rate $latex \varepsilon$), finding that the tile sets their algorithm produces become more reliable on average, the longer the algorithm runs before manual termination.
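The rectilinear growth rule described above (a tile attaches only when both its west and south glues match) can be sketched in a few lines of Python. This is a minimal, error-free simulation under my own assumptions: the function `grow`, the encoding of a tile set as a dictionary keyed by `(west_glue, south_glue)`, and the XOR tile set used as input are all illustrative choices, not the authors' search algorithm or their tile sets.

```python
def grow(width, height, tiles, south_seed, west_seed):
    """Fill a width x height rectangle from an L-shaped seed.

    tiles maps (west_glue, south_glue) -> (color, east_glue, north_glue).
    south_seed[x] is the glue the seed presents below column x;
    west_seed[y] is the glue the seed presents to the left of row y.
    Rows are returned bottom row first. A missing key raises KeyError,
    meaning no tile type can attach at that position.
    """
    north = list(south_seed)  # glues currently exposed to the next row up
    pattern = []
    for y in range(height):
        east = west_seed[y]   # glue exposed to the next position in this row
        row = []
        for x in range(width):
            color, east, north[x] = tiles[(east, north[x])]
            row.append(color)
        pattern.append(row)
    return pattern


# Four tile types: color and both output glues are the XOR of the inputs.
xor_tiles = {(w, s): (w ^ s, w ^ s, w ^ s) for w in (0, 1) for s in (0, 1)}
pattern = grow(4, 4, xor_tiles, south_seed=[0, 0, 0, 0], west_seed=[1, 0, 0, 0])
for row in reversed(pattern):  # print the top row first
    print("".join(str(c) for c in row))
```

Because each position has exactly one west neighbor and one south neighbor by the time it is filled, the pattern is fully determined by the seed and the tile set. With these four XOR tiles the result is a Sierpinski-like pattern, so a small tile set can color a large rectangle.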

In one well-characterized variation of this problem, the input is a shape $latex S$ rather than a coloring, and the question is what is the smallest tile set that is guaranteed to place tiles on exactly the points in $latex S$. If we require only one terminal assembly, the problem is $latex \mathsf{NP}$-complete (see here). If we allow multiple terminal assemblies, but require that they all have the shape $latex S$, then the problem is $latex \mathsf{NP^{NP}}$-complete (see here).

I strongly suspect that the pattern version of the problem is $latex \mathsf{NP}$-complete (as are variants of it, say, if the tiles grow from a single seed tile, or if they are merely required to stay inside the $latex m \times n$ rectangle but do not have to fill the whole rectangle). However, the main technique for the hardness results on shapes crucially uses the fact that optimal tile sets for tree shapes are very well-characterized (and can be computed in polynomial time, see here). These techniques do not seem to work at all with patterns. I would be very excited by any progress on hardness results for this question.

Cooperative binding in tile assembly refers to the requirement that a tile with two strength-1 glues cannot attach to an assembly unless both glues match the assembly. So-called “temperature 1” self-assembly models the situation in which all individual glues have sufficient strength to bind tiles stably, so that cooperative binding cannot be enforced. We conjectured that universal computation (e.g., the ability to simulate a Turing machine) in self-assembly requires cooperative binding. This is known to be false in 3D, but the proof crucially uses the third dimension to allow tiles to “escape” a closed region in one plane by growing into the adjacent plane. In a planar self-assembling system, deterministic computation seems very difficult to do, but proving its impossibility is an open problem.

Matt, Robbie, and Scott show that temperature 1 universal computation is possible if we introduce negative glues. Specifically, they need only introduce a single type of negative glue, so we could imagine it being implemented, for instance, by magnets that repel any other copy of the glue.6 Essentially, the negative glue is put in place where cooperation is desired in advance of any neighboring positive-strength glues, guaranteeing that by the time any tile could bind, it must bind to two positive-strength glues to overcome the repulsive force of the negative glue already present. With this cooperation comes universal computation, the ability to assemble large structures (e.g., $latex n \times n$ squares) from a small ($latex \frac{\log n}{\log \log n}$) number of tile types, and other hallmarks of the computational power of cooperative binding.
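The attachment rule behind all of this is simple enough to sketch in Python. The version below is a simplification under stated assumptions: a side contributes its glue's strength only when the tile's glue label equals the neighbor's, the negative glue interacts only with copies of itself at strength -1, and the names `can_attach`, `"a"`, `"b"`, and `"neg"` are hypothetical. The exact model in the paper may differ in its details.

```python
def can_attach(tile_glues, neighbor_glues, strength, temperature):
    """Return True if the tile binds stably at a site.

    tile_glues and neighbor_glues map a side ('N', 'E', 'S', 'W') to a
    glue label; a side with no glue or no neighbor is simply absent.
    strength maps a glue label to an integer (negative means repulsive).
    """
    total = 0
    for side, glue in tile_glues.items():
        if neighbor_glues.get(side) == glue:
            total += strength[glue]
    return total >= temperature


strength = {"a": 1, "b": 1, "neg": -1}

# Temperature 2: two strength-1 glues must cooperate.
print(can_attach({"W": "a", "S": "b"}, {"W": "a", "S": "b"}, strength, 2))
print(can_attach({"W": "a", "S": "b"}, {"W": "a"}, strength, 2))

# Temperature 1: a single matching glue is enough, so no cooperation...
print(can_attach({"W": "a", "S": "b"}, {"W": "a"}, strength, 1))

# ...unless a negative glue is already waiting at the site.
tile = {"W": "a", "S": "b", "E": "neg"}
print(can_attach(tile, {"W": "a", "E": "neg"}, strength, 1))
print(can_attach(tile, {"W": "a", "S": "b", "E": "neg"}, strength, 1))
```

In the last two calls, the repulsive glue cancels one unit of attraction, so the tile binds at temperature 1 only when both positive glues match. That is the sense in which a negative glue placed in advance recovers cooperation.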

But the original question stands: what is the computational power of deterministic, planar, positive-strength, temperature 1 self-assembly? Is cooperative binding truly necessary to compute by planar self-assembly?

Future DNA Conferences

Next year, the DNA conference will be at Aarhus University in Denmark, hosted by Kurt Gothelf. The following year, it will be held at Arizona State University, hosted by Hao Yan. To avoid instances of heat stroke in Tempe, Arizona, the conference will likely be later in the fall, but that is yet to be determined. I hope to see you there!

Footnotes

1
Jongmin’s web presence, like that of many experimentalists, is minimal.

2
Erik and Mashhood are not students. Since almost all experimental papers have the lab PI as last author, to avoid automatically excluding students in experimental labs, the DNA conference allows the best student paper award to go to papers with non-student authors, as long as a student is the main author and the PI writes a letter of support stating this.

3
i.e., you can write down a list of chemical reactions such as $latex A + B \to C + D, C + X \to C + Y, \ldots$, and you can give them as input to a compiler that will output a list of DNA complexes, some of which correspond to the abstract chemical species $latex A,B,C,X,Y,\ldots$, and the dynamic evolution of the DNA concentrations will mimic that described by the abstract reactions.

5
If the Turing machine uses linear space, this is easy to implement if single-copy molecules are allowed, by having a constant number of species for each tape cell to represent its symbol and, if the tape head is there, the current state.

6
There has been some work (here and here) attaching magnets to DNA, so this is not an infeasible idea.

2 Comments

Great post Dave. I especially wanted to say that as a first-time DNA attendee, I was sort of shocked by how ‘thoroughly mixed’ the community is; the barrier to talking to biologists and chemists as a computer scientist was zero.

Thanks again to the organizers, and for doing the (unconventional?) poster sessions and impromptu sessions, which I thought turned out to be valuable.
