Neither models nor miracles: a look at synthetic biology

Ars takes an in-depth look at the field of synthetic biology—the computer …

The 20th century broke open both the atom and the human genome. Physics deftly imposed mathematical order on the upwelling of particles. Now, in the 21st century, systems biology aims to fit equations to living matter, creating mathematical models that promise new insight into disease and cures. But, after a decade of effort and growth in computing power, models of cells and organs remain crude. Researchers are retreating from complexity towards simpler systems. And, perversely, ever-expanding data are making models more complicated instead of more accurate. To an extent, systems biology, rather than climbing upwards to sparkling mathematical vistas, is stuck in a mire of its own deepening details.

Synthetic biology aims to do away with systems biology's untidiness by focusing on individual parts, creating a tool set for engineering organisms unconstrained by biology as we know it and making the discipline more like software programming. But instead of modularity, synthetic biology often encounters messiness. What a particular “part” actually does depends on the rest of the system, so synthetic biology rediscovers the complexity it hoped to escape.

A dream deferred

Systems biology takes over where the sequencing of the human genome left off. Reading DNA, we once believed, would disclose the genes underlying disease. While hundreds of genes have been implicated in cancer, for instance, the idea that genes directly cause complex diseases like cancer, diabetes, or atherosclerosis is too simple. Systems biology seeks to understand how biology really works by looking beyond genes, aspiring to a universal understanding of biological organization and to models that precisely predict biological events—like disease.

Focused use of systems biology has been very successful, drawing information out of the huge bodies of data produced by DNA chips and proteomic gels. But our growing genomic knowledge and powerful computers have tempted a number of researchers to skip past analyzing data from wet-lab biology and attempt to reproduce a realistic cell entirely in silico.
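
To make concrete what a cell "entirely in silico" involves at the very bottom, here is a minimal sketch, with invented rate constants, of the kind of equation such models are built from: a single gene's mRNA and protein levels as a pair of coupled differential equations. This is an illustration, not any group's actual model.

```python
# A minimal sketch of "in silico" biology at its simplest: one gene's
# mRNA (m) and protein (p) as coupled ODEs, integrated by Euler steps.
# All rate constants here are invented for illustration.

def step(m, p, dt, k_tx=2.0, k_tl=5.0, d_m=0.2, d_p=0.05):
    """Advance mRNA and protein one Euler step of size dt."""
    dm = k_tx - d_m * m        # transcription minus mRNA decay
    dp = k_tl * m - d_p * p    # translation minus protein decay
    return m + dm * dt, p + dp * dt

m, p = 0.0, 0.0
for _ in range(20000):         # 200 time units at dt = 0.01
    m, p = step(m, p, dt=0.01)

# Analytic steady state: m* = k_tx/d_m = 10, p* = k_tl*m*/d_p = 1000
print(f"mRNA ~ {m:.1f}, protein ~ {p:.1f}")
```

Multiply this by yeast's roughly 6,000 genes, with rates that depend on context and on one another, and the difficulties described below come into focus.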

These efforts began with a cell related to those that make up humans, but much simpler: yeast. The first yeast model appeared in 2001, created by the Institute for Systems Biology. In 2002, the Alpha Project began its trek toward omega, with yeast also as the first step. In 2004 came the unveiling of another model, YeastNet, which researchers expected would become more and more accurate "simply by continued addition of functional genomics data..."

Genomics data cooperated, propagating with exponential fervor. Data on proteins obligingly followed a similar path, according to a Moore’s law for proteomics. Consequently, a 2007 update to YeastNet encompassed 82 percent of yeast genes and more than 95 percent of the yeast proteome. But it showed no real gain in accuracy.

Edward Marcotte, who leads the effort at the University of Texas, said in March that his team had updated YeastNet. He claimed to see "a nice increase in predictive power over v. 2," but he has "yet to write it up or release it," suggesting that, even with improvements, the results might not be so prepossessing. YeastNet’s publication arc has tailed down, beginning from the commanding heights of Science, descending in version two to the egalitarian plain of PLoS ONE, and now apparently ending in a file drawer at UT Austin. Similar difficulties beset the Alpha Project, which repeatedly scaled back its ambitions until it winked out of existence in 2008.

Premonitions of yeast’s unruliness came earlier in some quarters. In 2004, the Institute for Systems Biology turned to a much simpler species it believed to be "better suited for the initial phase" of systems biology. The new organism, a bacterium called H. salinarum, had just 2,400 genes, compared to over 6,000 in yeast. A group at the European Molecular Biology Lab switched in 2009 to the even smaller M. pneumoniae, which has a mere 687 genes. As "-omics" data grew, the complexity of the organisms modeled by systems biology dropped.

Running ≠ hiding

But retreat from complexity is not escape. Looking at M. pneumoniae, scientists concluded that "there is no such thing as a 'simple' bacterium." The tiny organism's modest genetic machinery proved baffling, causing a mismatch between what the genomic model predicted and the proteomic reality. M. pneumoniae stayed mum about its own inner workings, and it disclosed no fundamental principles of biology, which researchers said "remain elusive…"

The outpouring of data has failed to coalesce into a solid theoretical foundation from which to build. As Anne-Claude Gavin, senior author on the M. pneumoniae paper, said: "I believe what we still miss in the majority of the cases is a structuring frame on which to integrate or superimpose the large datasets gathered..."

Of course, the search continues, but each iteration adds to the difficulties. For example, to better understand the swirling protein complexes seen through the window of proteomics, we could zoom in and supply any number of biophysical details, such as how quickly each interaction takes place. But this adds layers to the model. And more complicated models are less likely to work or to reveal simple, elegant principles of biology.
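
A back-of-the-envelope sketch shows why each layer of kinetic detail is costly: every interaction added to the model brings rate constants that must be measured or guessed. The interaction fraction below is an assumption for illustration, not a measured value.

```python
# Rough illustration (assumed numbers) of how biophysical detail
# inflates a model: a network of n molecular species in which even a
# small fraction of the possible pairwise interactions is real needs
# forward and reverse rate constants for each one, plus a decay rate
# per species.

def parameter_count(n_species, interaction_fraction=0.01):
    pairs = n_species * (n_species - 1) // 2
    interactions = int(pairs * interaction_fraction)
    return 2 * interactions + n_species   # k_on and k_off each, + decay

for n in (100, 1_000, 10_000):
    print(f"{n:>6} species -> ~{parameter_count(n):,} free parameters")
```

At ten thousand species, this toy count already passes a million free parameters, and every one of them is a place for the model to go wrong.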

Yet the cycle almost necessarily continues. As two researchers put it in the pages of Nature: "The inescapable reality in systems biology is that models… will continue to grow in size, complexity, and scope."

A growth industry

There’s not really an upper limit on model size, either. Because living organisms progress through time, which can be sliced arbitrarily thin, the data space is effectively infinite. Time presents serious difficulties for systems biology. To create a virtual physiologic human, the discipline would need to span 17 or more orders of magnitude, from the nanoseconds of molecular motion all the way up to the years and decades of human life spans.

The number of spatial scales is daunting, too: from nanometers to meters, at least nine orders of magnitude. And as we’ve studied this enormous biological time-space in more detail, it has produced a profusion of discoveries, such as the many new kinds of RNA. Estimates for the total number of molecular species in a human cell range as high as one million.
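
The arithmetic behind those spans is easy to check. A back-of-the-envelope sketch, assuming roughly 80 years as the upper end of the time scale:

```python
import math

# Back-of-the-envelope check of the scale spans quoted above,
# assuming ~80 years as the upper end of the time scale.
nanosecond = 1e-9                       # molecular motion, in seconds
lifespan = 80 * 365.25 * 24 * 3600      # ~80 years, in seconds
print(f"time:  ~{math.log10(lifespan / nanosecond):.0f} orders of magnitude")

nanometer, meter = 1e-9, 1.0            # molecules up to whole bodies
print(f"space: ~{math.log10(meter / nanometer):.0f} orders of magnitude")
```

That works out to roughly 18 orders of magnitude in time and nine in space, consistent with the figures above.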

"There are so many unknowns that it seems we are condemned to spend many years collecting data before we can even start to think about modelling what is going on," as Mike Williamson at the University of Sheffield put it. In the meantime, concluded Williamson, "it is only reasonable to expect that the model can predict something that it was designed to predict…"

That’s a rather large concession. Lee Hood, whose lab at Caltech invented the automated DNA sequencer, once envisioned predicting the behavior of a system "given any perturbation. Not just the ones you’ve seen before, which we’re really good at, right? But any perturbation." That was in 2003, shortly after Hood founded the Institute for Systems Biology.

Reader Comments

I just registered a profile here at Ars to congratulate Robert on an excellent article. I am a researcher associated with systems and synthetic biology, and I thoroughly enjoyed this poignant analysis of two promising fields that have been getting a reality check in the last few years.

First, and most importantly, I'm glad to see such a thoughtful analysis of the progress of systems and synthetic biology. This is a good thing, and I hope to see more of it.

That being said, I have to disagree with the overall representation of these fields. These have been tremendous growth fields, and their capacity has increased simply remarkably in the relatively few years that the fields have existed in their current forms. This isn't to say that they've achieved all of their goals--obviously, they haven't, but neither has any active research field. But the achievements have really been quite remarkable, and they show that many of the goals laid out, even the large ones, are potentially feasible.

I'll just give two counterpoints to those discussed in the article, but I think they serve to make the point:

(1) regarding the lack of increased complexity of designed circuits.

Perhaps the growth in the numbers of promoters in designed circuits isn't tremendous (it's apparently stalled at 6 promoters in the plot above), but the _capacity_ of the field to make more complex circuits has increased far, far beyond that. Just to name two recent examples, consider Venter's synthesis of an entire bacterial genome, and Boeke's syntheses of entire yeast chromosomes (still to be published). For that matter, Ryan Gill's lab just reported they could engineer 8,000 E. coli promoters in a single experiment (http://www.nature.com/nbt/journal/v28/n8/abs/nbt.1653.html), which obviously blows the so-called 6 promoter limit of designed circuits out of the water. So, just because people aren't explicitly constructing large circuits (this too will change very soon) doesn't mean the field has not grown dramatically in this department. The hard problems in this field are now shifting from construction and synthesis to design. This is an important conceptual shift and marks real progress.

(2) regarding yeast's unruliness and being forced back to simpler systems.

Again, this belies simply tremendous progress in the field. Since YeastNet was mentioned, I'll just say there's simply no point reporting every new incremental version of such networks. That doesn't mean they aren't getting much better; rather, the emphasis changes from the initial big concept (Science paper) to making new versions available (PLoS One or annual NAR database paper) and actually applying them to advance biology (many downstream application papers).

For example, while we published YeastNet v. 2 in PLoS One (incidentally, just as an aside regarding the "egalitarian plain of PLoS One", I like the journal, and while the spread of article quality is large, they publish quite a few nice articles that would have been equally at home in Science, especially in fields like paleontology. PLoS One has the notable advantage of being able to publish results remarkably quickly when circumstances warrant), the results of applying the network to new discoveries have since come out in at least 6 other journals, e.g. testing ~100 predicted new ribosome biogenesis proteins, a large fraction of which we could successfully validate (http://dx.doi.org/10.1371/journal.pbio.1000213). It's not the construction of big gene networks that is so interesting now (we know we can do this), it's their application. Again, this is an important conceptual shift and marks important progress.

Researchers like Nitin Baliga aren't "deferring their dreams", they're figuring out how to build good computer models in systems simple enough to understand, then advancing the models into more complex systems. For example, in our case, we followed up YeastNet with networks for worms (http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html) and plants (http://www.nature.com/nbt/journal/v28/n2/abs/nbt.1603.html). These systems are obviously far more complex than yeast, and it wasn't obvious that these sorts of computer models could really predict highly specific traits in multicellular organisms. Nonetheless, they did, and we successfully predicted genes to e.g. reverse model tumors and increase lifespan in worms and control root and pigment formation in plants.

The short message--people aren't shying away from using these approaches in increasingly complex systems, including humans, and the models are showing real-world utility. And they keep getting better.

So, in my opinion, these fields have actually already fulfilled much of their early promise and then some (in spite of the hype). The fun part now, again IMHO, is the impending collision of these two sister fields, with the systems biology approaches now good enough to find and model many players in important biological pathways, and the synthetic biology approaches now powerful enough to make and modify many pathways at genome-scale. That's going to be really exciting...

It's like I always said: biology is just like code. The problem with the analogy people normally take with code is they think of biology as GOOD code. Code that is modular (not spaghetti), readable (has good names), and basically understandable (simple, elegant logic). But what this analogy misses is that the more stupid and incompetent the programmer, the less these things hold, until at some point it is no longer worth fixing bugs in the code rather than scrapping it altogether and starting from scratch. The thing with biology is that the programmer is natural selection. A blind agent with absolutely no intelligence whatsoever. Any code written by natural selection, almost by definition, has to be horrible. And this is exactly what we're finding.

Excellent... my thoughts exactly. In fact not being traceable or easily understandable is ALSO a property of more complex genetic-algorithm generated computer programs and circuit designs.

Anything that works sticks... so it's pure spaghetti. Its primitives are ANY physical phenomena and structures that happen to produce more desired results. Our primitives are variables, methods, and abstracted representations of ideas and objects. Biology has no such limitations, and even if it did work using such primitives, the "code" would be utter and complete spaghetti. Anything that works sticks.
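
For readers who haven't run one, here is a minimal sketch of the blind process both comments describe: random mutation plus selection, with no design anywhere. The target string and parameters are arbitrary choices for illustration; GA-evolved programs and circuits in the literature are vastly more opaque than this toy.

```python
import random

# Toy genetic algorithm: blind mutation plus selection, no design.
# Nothing in the winning genome records *why* any bit is set --
# selection kept whatever happened to work.

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.05):
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    best = population[0]
    if fitness(best) == len(TARGET):
        break
    # keep the top 10, refill the population with their mutants
    population = [mutate(random.choice(population[:10])) for _ in range(50)]

print(f"generation {generation}: best fitness {fitness(best)}/{len(TARGET)}")
```

"Anything that works sticks" is exactly what the selection line implements.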

Another problem is that DNA is not exactly like software at all, as it interacts with a particular environment in a particular way to produce the results (as the article mentioned). There are things that are inherited not because of DNA but simply as features of the cytoplasm that are passed on during division. So putting a plant genome into a human zygote will not produce a plant embryo, and vice-versa.

The blind-idiot designer that is evolution (Azathoth as God?) also helps to explain why the human brain is so hard to understand.

So what are our options?

1- Go with evolutionary approaches in silico or IRL and forget about too much "intelligent engineering"
2- Try to understand things like the brain or how a cell works (what if we are not able to?)
3- Go with 1 until we invent AIs that can help us with 2.

A blind agent with absolutely no intelligence whatsoever. Any code written by natural selection, almost by definition, has to be horrible. And this is exactly what we're finding.

I don't necessarily buy that.

It's horrible from the perspective of a rational agent trying to comprehend/change it. But from a functional perspective? It's quite elegant in places.

Biology wastes very little; it can't afford to, what with all that competition out there. Human beings have fairly few active genes, but the amino acid sequences those genes encode are deployed in such a staggering diversity of places and ways that we don't need a lot of genes.

Indeed, I think of it not as the stereotypical bad code, where a team has been hacking away at a codebase for decades, with the original developers long since removed, and nobody knows how it works anymore. Instead, I think of it more as the pinnacle of hyper-efficient programming. You know the kind, where people optimize code just to get a meager 0.001% efficiency increase. The kind where people use oddball programming logic and global variables galore just so that the executable takes up slightly less memory. Etc.

It's good code from the point of view of efficiency (sometimes); it's bad code from the point of view of understandability and maintainability (by humans). Modularity and structure go right out the window. You get massive coupling, genes involved in many physiological phenomena, and single traits that huge numbers of genes "participate" in.

Firstly, thanks for the timely article. I'm a computational biophysicist; we work a few layers more detailed, and more limited in scope, than the systems people, but this has always been a subject of great interest to me.

WesGordon wrote:

I can relate to the scale of the problem biologists face [sic]. . . .I first met this conflict when I briefly worked with a firm designing electronic circuitry. The behaviour of the circuit could be described and simulated in very concrete, rational terms. Problems could be solved by relatively tiny algorithms that simultaneously weighed up all combinations and permutations of possible design solutions (although sometimes consuming huge amounts of RAM). Those who had worked in this field all their lives could not conceive of a problem that couldn't be reduced to an algorithm. It strikes me that this kind of thinking started biologists down this track, and I'm beginning to believe the attempt is misguided. This article confirms my concerns.

I wanted to make a subtle point here: biology culture is actually the opposite of electrical engineering's; it's top-down. Anyone who has worked on a living system (even a single-celled one) is resigned to the fact that you can never truly change only one thing at a time, and oversimplification is the quickest pathway to irrelevance. You are always hip-deep in noisy signals, crosstalk, and contradictory lines of evidence. The few who are able to fish out some new universal underlying principle that actually explains reams of confusing results are the ones who get free trips to Stockholm. Biologists are under no illusion that they are working on intrinsically reducible problems that can be black-boxed, even if they secretly hope it will turn out that way. Context always matters, and sometimes just coming up with an appropriate set of control experiments is actually more challenging than the "interesting" experiment itself.

So it's probably fair to say that when the funding agencies started throwing big money at people who model complex (but reducible!) systems for a living (i.e., engineers and mathematicians) and charged them with making sense of the mountains of biological data being accumulated, these projects were not exactly positioned for success from the get-go. I still have hope that there might be some diamonds in the rough for which these approaches prove fruitful, but I think what everyone secretly fears is that the reams of data produced by consortium "-omics" projects are simply not good enough for these everything-but-the-kitchen-sink models to succeed.

As for the concluding sentence, which implies biology does not have a "Bohr" model to build up from, I have to disagree. We have plenty of models here at the bottom, as it were, from Monod's theory of allostery (the lac operon) to all of the beautiful work done on hemoglobin allostery (Wyman/Gill and others) back in the '60s. What we lack is any promise of a new framework that will bridge the gap between test-tube biochemistry and the real, noisy, interacting systems in a cell, just as quantum mechanics bridged the dual particle and wave nature of matter. And that's the missing link that is needed before the systems-type approaches will really flourish.
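
For reference, the Monod-Wyman-Changeux model mentioned above really is a compact "bottom" model. Its textbook expression for the fractional saturation of a protein with n binding sites (n = 4 for hemoglobin) is:

```latex
% Monod-Wyman-Changeux fractional saturation. Here \alpha = [S]/K_R is
% the ligand concentration scaled by the R-state dissociation constant,
% c = K_R/K_T, L is the T/R equilibrium constant, and n = 4 for hemoglobin.
\bar{Y} = \frac{\alpha(1+\alpha)^{n-1} + L c \alpha (1+c\alpha)^{n-1}}
               {(1+\alpha)^{n} + L(1+c\alpha)^{n}}
```

Three parameters suffice for hemoglobin in a test tube; it is the leap from formulas like this to the crowded, noisy cell that still lacks its quantum mechanics.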