Why would anyone want a complete simulation, anyway?

The NY Times is touting a computer simulation of Mycoplasma genitalium, the proud possessor of the simplest known genome. It’s a rather weird article because of the combination of hype, peculiar emphases, and cluelessness about what a simulation entails, and it bugged me.

It is not a complete simulation — I don’t even know what that means. What it is is a sufficiently complex model of a real cell that it can uncover unexpected interactions between components of the genome, and that is a fine and useful thing. But as always, the first thing you should discuss in a model is the caveats and limitations, and this article does no such thing.

I’d like to know how fine-grained the model is; I get the impression it’s an approximation of interactions between molecular components based on empirically determined properties of those elements. Again, I don’t think the authors have claimed otherwise, but it’s implied by the NY Times that now we have an electronic simulation that we can plug variables into and get cures for cancer and Alzheimer’s, without ever having to dirty our hands with real cells and animals anymore.

That’s nonsense. Everything in this model has to be a product of analyses of molecules from living organisms; they certainly aren’t deriving the functions and interactions of individual proteins from sequence data and first principles. We can’t do that yet! The utility of a model like this is that it might be able to generate hypotheses: upregulating gene A leads to downregulation of gene Z, a gene distantly removed from A, in the model, and therefore we get a preliminary clue about indirect ways to modulate genes of interest. The next necessary step would be to test potential drug agents in real, living cells. This model will have a huge mountain of assumptions built into it — and you can only build further on those speculations so far before it is necessary to cross-check against reality.
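To make the hypothesis-generating role concrete, here is a toy sketch in Python. Every gene name, weight, and number below is invented for illustration; this is not the published model, just the general idea of perturbing one gene in a small interaction network and watching the effect propagate to a distant gene.

```python
# Toy gene network, all values invented: A activates B, B activates C,
# and C represses Z. Upregulating A should therefore downregulate Z
# indirectly, which is the kind of hypothesis such a model can suggest.

def steady_state(a_level, steps=200, decay=0.5):
    """Iterate expression levels until they settle."""
    b = c = z = 0.0
    for _ in range(steps):
        b = (1 - decay) * b + 0.5 * a_level            # B activated by A
        c = (1 - decay) * c + 0.5 * b                  # C activated by B
        z = max((1 - decay) * z + 1.0 - 0.4 * c, 0.0)  # Z: basal production minus repression by C
    return z

z_baseline = steady_state(a_level=1.0)   # settles near 1.2
z_perturbed = steady_state(a_level=2.0)  # settles near 0.4
# Doubling A's expression lowers the distant gene Z via B and C --
# a preliminary clue, to be cross-checked in real, living cells.
```

The point of the toy is only that the interesting predictions are the indirect ones, and that each prediction is a hypothesis for the bench, not a result.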

Also, isn’t it a bit of a leap to jump from a single-celled, parasitic organism like M. genitalium to human cancers and brain disease? Yet there it is in the second paragraph, a great big bold exaggeration.

And then there’s the really weird stuff. Some people need to step back and learn some biology.

“Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data,” Dr. Covert wrote. “I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds. We often think of the DNA as the storage medium, but clearly there is more to it than that.”

What the hell…? Look, I could (if I had the skills) generate an hourglass simulator that calculated the shape and bounciness and stickiness of every grain of sand, and stored the trajectory of each as they fell, and by storing enough data for each grain, generate even more than half a gigabyte of data. So? This doesn’t mean that an hourglass is a denser source of information than a cell. The storage requirements for the output of this program do not tell us “how much data a living thing truly holds” — that statement makes no sense.
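The hourglass point is easy to demonstrate: a trivially simple system emits as much "data" as you choose to log. Here is a minimal Python sketch (the grain counts, step counts, and fake trajectories are all made up) in which the output size is determined entirely by the logging choices, not by anything about the hourglass:

```python
import io

def simulate_hourglass(n_grains=1000, n_steps=100):
    """Log x, y, z coordinates of every grain at every time step."""
    out = io.StringIO()
    for step in range(n_steps):
        for grain in range(n_grains):
            # A stand-in trajectory; the physics doesn't matter here.
            x, y, z = grain * 0.1, step * 0.01, (grain + step) * 0.001
            out.write(f"{step},{grain},{x:.6f},{y:.6f},{z:.6f}\n")
    return out.getvalue()

log = simulate_hourglass()
# Output size scales with grains x steps x logging precision --
# all choices about the simulation, none of them a measure of
# "how much data" the hourglass itself "holds".
print(len(log), "bytes")
```

Double the number of time steps and the "data" doubles, which says nothing about the hourglass and everything about the logger.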

As for “We often think of the DNA as the storage medium, but clearly there is more to it than that”…jebus, does a professor of bioengineering really need to go back and take some introductory cell biology courses, or what? Heh. “More to it than that.” I’m glad to see that someone needed an elaborate computer simulation to figure that out.

I am, for some reason, reminded of the time I attended a seminar by a computer scientist on an exciting new simulation of the genetic behavior of viruses that I was told would have great predictive power for epidemiology. One of the first things the speaker carefully explained to us was how they’d incorporated sexual reproduction into the model. I wish she’d waited until the end to say that, because it meant that I sat there listening to the whole hour-long talk with absolutely no interest in any other details.

Comments

“Everything in this model has to be a product of analyses of molecules from living organisms; they certainly aren’t deriving the functions and interactions of individual proteins from sequence data and first principles. We can’t do that yet!”

And never will be able to.

Anyone who’s even seen molecular modelling knows it’s computationally infeasible to calculate the possible interactions of every molecule in a cell, to model the quantum mechanical interactions of every atom in every molecule with sufficient accuracy to get anything like a “complete” model. It’s a meaningless phrase.

I have to agree with PZ’s paragraph here:

“What it is is a sufficiently complex model of a real cell that it can uncover unexpected interactions between components of the genome, and that is a fine and useful thing. But as always, the first thing you should discuss in a model is the caveats and limitations, and this article does no such thing.”

Indeed that is a fine and useful thing if it does what it says on the tin. But that’s still a HELL of a model. And if it’s just going to do genomic interactions, then sadly it’s going to miss the majority of the useful cell chemistry.

Not wishing to piss on any fireworks, but this sort of hype really irritates me. Congrats to the researchers for their work, definitely +100 cool points. -1000000 cool points for irresponsible reporting/press releasing if that’s what they did.

Again, I don’t think the authors have claimed otherwise, but it’s implied by the NY Times that now we have an electronic simulation that we can plug variables into and get cures for cancer and Alzheimer’s, without ever having to dirty our hands with real cells and animals anymore.

Not really. It’s not devoid of hype, but many of the claims are fairly modest, with implicit caveats:

For medical researchers and biochemists, simulation software will vastly speed the early stages of screening for new compounds. And for molecular biologists, models that are of sufficient accuracy will yield new understanding of basic cellular principles.

Still a bit breathless there in the “vastly speed the early stages of screening,” yet “early stages of screening” isn’t overpromising much.

The lab-bomber kooks are going to be canarding this for the next lifetime, aren’t they?

And the IDiots are going to be yapping about the huge amount of design that went into the simulation, like that means anything at all.

It would be pretty cool if you could run some kind of molecular-level simulation, given just an organism’s DNA, and observe everything going on in the cell. Unfortunately, that seems a bit beyond what our technology can do at this point.

I hate to say it, but the “early stage screening” thing is overpromising a HUGE amount.

Drug discovery has been here before. We get these “advances” every other minute, and they are usually trumpeted to the high heavens by pharma management desperate to have an impact on drug development costs (although this is the cheapest end to target; it’s when things get to people that it gets pricey). This is not where the bottleneck is at all. It’s not where the hard work is either. High-throughput screening of hundreds of thousands, millions even, of compounds is relatively simple. Data management and knowing that what the label says is really in the vial are the hard parts of this type of screening. Models to reduce the number of compounds screened are a dime a dozen, and all need to be tested by experiment.

The vast majority of drugs act at receptor sites/enzymes/proteins. Genomic linkages and relationships will find new areas for drugs perhaps, new biochemical motifs, but this hype grossly underestimates the work needed to translate that into actual drug targets.

Unfortunately I think the general pattern for this kind of research (not having read the paper, I’m generalising broadly, and perhaps unfairly) is to develop the underlying mathematical and computer science techniques for constructing models of complex biological systems (important! worthy! necessary!), and then to write the motivation section, in which they find it necessary to claim biological plausibility (ugh). Of course, the modelling technology is still in its infancy, which is why people tend to choose fairly simple mechanisms to model, or break off a nice self-contained bit of biology that they don’t necessarily understand well in order to validate the theory.

It’s probably a sign that responsible biologists should adopt a CS student today and teach them how natural sciences work.

As an ex-modeler, I think there’s some slippage between the work and the write-up. Meaning the article claims are probably inflated, and the expectations of the actual model are more modest and not presented as a replacement for the actual organism.

A cursory look at the article (which you can click through to from the NYT piece) pretty much supports my suspicions.

“Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data,” Dr. Covert wrote. “I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds.”

In a sloppy way, I think he’s trying to say that this drives home the amount of data connected to a simple process of a living thing. We all know that cellular instructions, ultimately encoded by DNA, drive cell division; but that’s abstract, whereas we have a really good feel for what 500 MiB is. That’s most of a movie. We know what 10 hours of computation time means. That connects the abstract with a more contextualized value.

My first thought was, “I bet the model breaks completely at the first point mutation in any of the proteins.”

Which makes me wonder how they’d ever translate it into cancer research.

I wonder. Do they have sufficient spatial resolution to determine this? If they are actually modeling protein shapes and such, I bet it wouldn’t break — or if it does, it’ll be very useful in refining future models. If the model is “element A does thing B”, then yeah, I think you’re right.

Yes, it annoyed me, too. No doubt some PR person at Stanford decided to call it a “complete model” whatever that means.

That said, PZ, you seem to have an unwarranted chip on your shoulder in regard to computer scientists working in biological research. Cut ’em a break — if they had years of study in biology they would be biologists, in which case their models wouldn’t exist at all.

Ignore the neo-Luddite in charge. This is an extraordinary advance. Complete simulation just means they’ve gotten the cell to run through an entire cell division cycle. Once extended in complexity to eukaryotic cells, this could indeed provide an amazing tool for fighting cancer. We could run batteries of either directed or random changes to cell regulation genes and see how they affect cell division and control. We could compare the simulated cell protein profiles with those of cells from cancerous tissue and form hypotheses about transcription errors, find targets for cancer drugs, and so on. The possibilities are endless.

In regard to how fine-grained this model is: from the description, I thought this might be a coarse-grained molecular dynamics simulation; however, it appears not to have any molecular resolution at all, and thus wouldn’t be able to handle any mutations or changes in cellular chemistry.