Archive for October, 2011

The 17th Conference on DNA Computing and Molecular Programming (DNA 17) was held Sept. 19-23 in Pasadena, California, at Caltech. In previous years the conference was held in early summer, but from now on it will be late summer/early fall in order to stagger 6 months apart from its sister conference FNANO (Foundations of Nanoscience), held every April in Snowbird, Utah.

The conference is not dedicated to theoretical computer science, of course, but like many inter-disciplinary fields such as algorithmic game theory or computational biology, theoretical computer science finds its way into many results in the field. As Luca Cardelli said during the conference, while the computing revolution was about the systematic manipulation of information, nanoscience is about the systematic manipulation of matter, so it is not surprising that theoretical computer scientists are finding interesting problems in this area. I find it fascinating to watch a speaker prove a result relating DNA self-assembly to context-free grammars, just before the next speaker shows atomic force microscopy images of a self-assembled DNA nanostructure.

There’s always some impedance mismatch when experimentalists and theorists in any field get together to talk, but I believe our field promotes excellent cross-communication. The program committee, for instance, had members from the following university departments:

Biological Chemistry and Molecular Pharmacology (1)

Computer Science (17)

Biophysics (1)

Chemistry (4)

Mathematics (1)

Electrical Engineering (3)

Bioengineering (3)

Bioinformatics (4)

Physics (2)

Computation & Neural Systems (1)

Cognitive Science (1)

Systems Biology (1)

Interesting Conference Features

First I want to discuss some interesting features of the conference that I think could be beneficially adopted by general TCS conferences (some of these we are already seeing in TCS). As one of the local organizers, I was partially responsible for implementing some of these ideas, and I think it was worth the effort.

Tutorials

The first day was dedicated to three 90-minute lecture-style tutorials (slides available), and in parallel, there was an all-day wet-lab tutorial run by Elisa Franco, Josh Bishop, and Jongmin Kim,1 in which the students constructed a chemical oscillator based on Jongmin’s and Elisa’s work on constructing chemical oscillators from DNA and transcription enzymes. Most of the tutorial attendees were theoreticians who wanted to see what all the fuss was about in the lab. It seems that about half of the oscillators worked properly on the first try. (They only got one try because the period of the oscillation is a few hours, so they had to run overnight.)

Tracks

One interesting aspect of the conference is the tracks, designed to appeal to both theoretical and experimental researchers. Track A looks familiar to TCS people: 15-page extended abstracts that appear in the conference’s LNCS proceedings. These are usually later submitted to CS journals such as SICOMP or TCS, or perhaps in the special issue of invited papers in Natural Computing associated with the DNA conference. Track B submissions are 1-page abstracts submitted for oral presentation only. Authors must provide a full paper for the program committee to judge, but the paper is not published. This is because Track B submissions are experimental results destined for eventual publication in physical science journals such as Nature, Science, or PNAS. These journals have much stricter requirements than CS journals regarding prior publication, so it is critical to the Track B presenters that nothing they submit can be construed as a publication.

Track C is posters, which are very common at physical science conferences and starting to make some headway at TCS conferences. I think posters are a great way to present your research, and I think the TCS community should adopt poster sessions at every conference. Maybe the person you most wanted to see your talk won’t attend it, but you can always grab them in the hallway and drag them over to your poster. It’s a great way to meet big shots in the field. There were three 90-minute poster sessions, with every poster at every session, and we encouraged the presenters to keep their posters up the whole week. This way, you could stand by your poster for a while, but you could also feel free to walk around to other posters without worrying that someone won’t get a chance to hear you explain your poster while you are away.

Panels

The panels consisted of four top researchers sitting at a table. Each gave a 5 minute talk about their vision for the future of the field, and then the audience could ask questions or heckle them for the next 25 minutes. These were a lot of fun. I think students especially benefited from the perspective given by high-level discussion of long-term research goals.

Impromptu Sessions

The impromptu sessions were a great idea, and I think all conferences could benefit from them. I think of them as a formalization of the idea that “the real conference interaction happens in the hallways” (as Lance Fortnow likestoremindus.) Often graduate students are intimidated by the idea of walking up to a couple of famous big-wigs talking in the hallway, even if they are talking about the student’s research area. For the impromptu sessions, there was a wiki where over the course of the week, anyone could schedule a session on any topic in a number of rooms that were reserved for the sessions. The sessions were required to be public, and I found it to be a great way for people to get together to chat about interesting problems, while inviting anyone else interested in the same problem to listen in or participate.

Theoretical Computer Science Results

I will highlight a few theoretical results that I found interesting. There were of course many great experimental results, and a lot of great CS talks on topics such as simulation, but since this is a TCS blog, I will focus on my favorite TCS-style results.

Self-Assembly and Context-Free Grammars

The winners of the best student paper award were Andrew Winslow and Sarah Eisenstat, for their excellent paper One-Dimensional Staged Self-Assembly, with Erik Demaine and Mashhood Ishaque.2 Fix a finite alphabet $latex \Sigma$ and a finite set $latex G$ of “glues”. A tile type is a square labeled with a symbol from $latex \Sigma$, with its east and west sides labeled with (different) glues from $latex G$ (such tiles, both 1D and 2D, can be experimentally implemented with DNA). Initially all tile types start in separate test tubes. When two test tubes are mixed, any tile can bind to the west of any other tile if the first tile’s east glue matches the second tile’s west glue. Subsequent mixing may bind whole rows of tiles together. After each mixing, it is assumed that individual tiles are washed away so that only terminal assemblies (assemblies that cannot attach to anything else in the tube) remain.

The goal: design a fixed set of tile types so that any string over $latex \Sigma$ can be “spelled” by efficiently mixing the tiles in the correct order. How efficiently? The authors show that if each intermediate test tube is required to contain only one terminal assembly, then the number of mixing stages required to spell the string $latex x$ is within a constant multiplicative factor of the smallest context-free grammar that produces the singleton language $latex \{x\}$ (and they show that this bound is tight).

What does this mean? There is a linear-time $latex O(\log n)$-approximation algorithm for finding the smallest context-free grammar representing a string in this way (due to Sakamoto), which automatically translates to a linear-time algorithm for finding efficient mixing protocols for self-assembling one-dimensional patterns (implemented by the authors; here is an efficient mixing to spell the final verse of Edgar Allen Poe’s “The Raven” with DNA tiles).

However, if intermediate mixing stages are allowed to contain multiple terminal assemblies, even though the final stage is required to have only one terminal assembly (the assembly spelling $latex x$), then the number of mixing stages can be dramatically reduced (by a multiplicative factor of at least $latex \frac{n}{\log n}$).

Anne, Alan, Jan, and Chris showed how to implement a simple and pervasive computation — a counter that iterates through $latex 2^n$ different states using $latex O(n)$ different species — while consuming only $latex O(n^3)$ total fuel molecules (and producing the same amount of total waste molecules). A naïve implementation would consume fuel at every step, using $latex \Omega(2^n)$ fuel.

However, their counter requires that certain species have exactly one molecule present in solution, a tall order to implement experimentally. A more robust counter would work even with many copies of each species present, i.e., if many counters were thrown in together, they would each independently iterate from $latex 1$ to $latex 2^n$, without interfering with each other.

My favorite theorem in the paper shows this task to be impossible.4 In particular, they show a contrapositive result: any chemical reaction system (not just those implemented by DNA strand displacement) with $latex n$ species that is tolerant to having many copies of the system all reacting at once, has the property that any species is producible after $latex O(n^2)$ steps. In other words, if there is some species $latex S_{\text{end}}$ whose presence signifies the “end” of computation, there is no way to deterministically visit more than quadratically many states that do not contain a copy of $latex S_{\text{end}}$.

As an example, if we wanted to implement a chemical system simulating an $latex O(n^3)$-time Turing machine with only $latex O(n)$ species,5 it could not possibly work unless some species are present in small quantities; i.e., multiple copies of the system would provably interfere with each other if placed in the same test tube.

While this is not a complexity theory result (telling us nothing about the relationship between $latex \mathsf{P}$ and $latex \mathsf{NP}$, for instance), nor did they use any classical complexity theorems such as the time hierarchy theorem, nonetheless, only a complexity theorist would even think to conjecture such a statement about chemistry. This is why TCS is often needed to study molecular systems.

In their variant of the problem, the rectangle grows “rectilinearly” from an L-shaped “seed”, where all tiles attach via their west and south glues, and both glues must match for them to attach. The authors use heuristics combined with an exponential-time branch-and-bound search algorithm to find small (not necessarily minimal) tile sets.

They also analyze the reliability of the tiles in the face of errors (tiles attaching by only a single matching glue at some small rate $latex \varepsilon$), finding that the tile sets their algorithm produces become more reliable on average, the longer the algorithm runs before manual termination.

In one well-characterized variation of this problem, the input is a shape $latex S$ rather than a coloring, and the question is what is the smallest tile set that is guaranteed to place tiles on exactly the points in $latex S$. If we require only one terminal assembly, the problem is $latex \mathsf{NP}$-complete (see here). If we allow multiple terminal assemblies, but require that they all have the shape $latex S$, then the problem is $latex \mathsf{NP^{NP}}$-complete (see here).

I strongly suspect that the pattern version of the problem is $latex \mathsf{NP}$-complete (and variants of it, say, if the tiles grow from a single seed tile, or if they are merely required to stay inside the $latex m \times n$ rectangle but do not have to fill the whole rectangle). However, the main technique for the hardness results on shapes crucially use the fact that optimal tile sets for tree shapes are very well-characterized (and can be computed in polynomial time, see here). These techniques do not seem to work at all with patterns. I would be very excited by any progress on hardness results for this question.

Cooperative binding in tile assembly refers to the requirement that a tile with two strength-1 glues cannot attach to an assembly unless both glues match the assembly. So-called “temperature 1” self-assembly models the situation in which all individual glues have sufficient strength to bind tiles stably, so that cooperative binding cannot be enforced. We conjectured that universal computation (e.g., the ability to simulate a Turing machine) in self-assembly requires cooperative binding. This is known to be false in 3D, but the proof crucially uses the third dimension to allow tiles to “escape” a closed region in one plane by growing into the adjacent plane. In a planar self-assembling system, deterministic computation seems very difficult to do, but proving its impossibility is an open problem.

Matt, Robbie, and Scott show that temperature 1 universal computation is possible if we introduce negative glues. Specifically, they need only introduce one single type of negative glue, so we could imagine it being implemented, for instance, by magnets that repulse any other copy of the glue.6 Essentially, the negative glue is put in place where cooperation is desired in advance of any neighboring positive-strength glues, guaranteeing that by the time any tile could bind, it must bind to 2 positive strength glues to overcome the repulsive force of the negative glue already present. With this cooperation comes universal computation, the ability to assemble large structures (e.g., $latex n \times n$ squares) from a small ($latex \frac{\log n}{\log \log n}$) number of tile types, and other hallmarks of the computational power of cooperative binding.

But the original question stands: what is the computational power of deterministic, planar, positive-strength, temperature 1 self-assembly? Is cooperative binding truly necessary to compute by planar self-assembly?

Future DNA Conferences

Next year, the DNA conference will be at Aarhus University in Denmark, hosted by Kurt Gothelf. The following year, it will be held at Arizona State University, hosted by Hao Yan. To avoid instances of heat stroke in Tempe, Arizona, the conference will likely be later in the fall, but that is yet to be determined. I hope to see you there!

Footnotes

1
Jongmin’s web presence, like many experimentalists, is minimal.

2
Erik and Mashhood are not students. Since almost all experimental papers have the lab PI as last author, to avoid automatically excluding students in experimental labs, the DNA conference allows the best student paper award to go to papers with non-student authors, as long as a student is the main author and the PI writes a letter of support stating this.

3
i.e., you can write down a list of chemical reactions such as $latex A + B \to C + D, C + X \to C + Y, \ldots$, and you can give them as input to a compiler that will output a list of DNA complexes, some of which correspond to the abstract chemical species $latex A,B,C,X,Y,\ldots$, and the dynamic evolution of the DNA concentrations will mimic that described by the abstract reactions.

5
If the Turing machine uses linear space, this is easy to implement if single-copy molecules are allowed, by having a constant number of species for each tape cell to represent its symbol and, if the tape head is there, the current state.

6
There has been some work (here and here) attaching magnets to DNA, so this is not an infeasible idea.

I saw a nice result sketched a few weeks ago, on the “online matching” problem. Below I try to re-explain the result, using some idiomatic (and as a bonus, inoffensive) terminology which I find makes it easier to remember what’s going on.

Online Matching. There is a set of items, call them Lunches, which you want to get eaten. There is a set of people who will arrive in a sequence, call them Diners. Each Diner is willing to eat certain Lunches but not others. Once a Diner shows up, you can give them any remaining Lunch they like, but that Lunch cannot be re-sold again. The prototypical problem is that a Diner might show up but you already sold all of the Lunches that they like. Your task: find a (randomized) strategy to maximize the (expected) number of Lunches sold, relative to the maximum Lunch-Diner matching size.

This is an online problem: you don’t have all the information about Diner preferences before the algorithm begins, and you need to start committing to decisions before you know everyone’s preferences in full.

Consider the following strategy:

Algorithm: pick a random ordering/permutation of the lunches; then when a diner arrives, give her the first available lunch in the ordering that she likes.

How does this perform? In expectation at least (1-1/e)M lunches are sold, where M is the maximum (offline) matching size in the Lunch-Diner graph. But proving so is tricky. (This ratio 1-1/e turns out to be the best possible.) Birnbaum and Mathieu recently gave a simpler proof that this ratio is achieved. Amin Saberi, whose talk sketched this, rightly pointed out that it feels like a “proof from The Book!”

With a small lemma (e.g. Lemma 2 in Birnbaum-Mathieu), we may assume that the number of diners and lunches are equal, and that there is a perfect matching between diners and lunches. Let the number of diners and lunches each be n, and so M=n.

Let xt be the probability that in a random lunch permutation, upon running Algorithm, the lunch in position t is eaten.

Experiment 1. Take a random lunch permutation. Score 1 if the lunch at position t is not eaten, 0 if it is eaten. The expected value of this experiment is 1-xt.

Experiment 2. Take a random lunch permutation, and a random diner. Score 1 if that diner eats a lunch in one of the first t positions, 0 otherwise. The expected value of this experiment is (x1 + … + xt)/n.

The key is to show that the second experiment has expected value greater than or equal to the first one, then we will finish up by an easy calculation. We do this with a joint experiment, similar to “coupling” arguments.

Joint experiment. Let t be fixed. Take a random lunch permutation π1, and look at the lunch L in position t. Let D be matched to L in the perfect matching. Obtain π2 from π1 by removing L and re-inserting L in a random position.

Key claim: π2 and D are independent, uniformly distributed random variables. This implies that we can run both experiments at the same time using this joint distribution on π1, π2, L, D. Moreover,

Deterministic Lemma. For any π1, and for an uneaten lunch L at position t in π1, obtain π2 from π1 by removing L and re-inserting L in any position. Let D be matched to L in the perfect matching. Then in π2, D eats one of the first t lunches.

This implies that whenever experiment 1 scores a point, so does experiment 2. So taking expectations,

(x1 + … + xt)/n ≥ 1-xt.

This equation is the crux. Let x≤t = x1+…+xt, then we see x≤t ≥ (n/n+1)(1+x≤t-1). This recursion is easy to unravel and it gives a lower bound of n(1-(n/n+1)n) ≥ n(1-1/e) on x≤n, the expected size of the matching!

There is a new page to this blog: Theoretical Computer Science Conferences and Workshops. It provides names and links to theory venues — with theory broadly defined, including, for example, the theory of programming languages, which is usually considered outside the SIGACT purview (EDIT: see comments). This page is the result of dozens of contributions to a community wiki question on CSTheory, and Joe and I would like to thank everyone who participated.

Those interested in lists of TCS venues may also want to check out Microsoft Research’s list of computer science conferences. That list considers all of CS, not just theory, and provides some added information, like citations per conference. The list on this blog contains some conferences that the MR list does not.

If you see errors, or can think of useful additions, please comment here or on the new page, or edit the community wiki answer that contains the conference information. Thanks, and we hope you find this helpful.