Tuesday, January 11, 2011

Secret Alien Messages in Your Genome

Today is the first day of my course on molecular evolution and I want the students to experience the give-and-take of scientific—and not so scientific—debate in the blogosphere.

Their first assignment is to read the following quotation from an article by Paul Davies and answer the question that follows.

Paul Davies is a professor at Arizona State University. He was trained as a physicist and he lists his interests as cosmology, quantum field theory, and astrobiology. The quotation is from an article he wrote last April in the Wall Street Journal [Is Anybody Out There?: After 50 years, astronomers haven't found any signs of intelligent life beyond Earth. They could be looking in the wrong places.]

Another physical object with enormous longevity is DNA. Our bodies contain some genes that have remained little changed in 100 million years. An alien expedition to Earth might have used biotechnology to assist with mineral processing, agriculture or environmental projects. If they modified the genomes of some terrestrial organisms for this purpose, or created their own micro-organisms from scratch, the legacy of this tampering might endure to this day, hidden in the biological record.

Which leads to an even more radical proposal. Life on Earth stores genetic information in DNA. A lot of DNA seems to be junk, however. If aliens, or their robotic surrogates, long ago wanted to leave us a message, they need not have used radio waves. They could have uploaded the data into the junk DNA of terrestrial organisms. It would be the modern equivalent of a message in a bottle, with the message being encoded digitally in nucleic acid and the bottle being a living, replicating cell. (It is possible—scientists today have successfully implanted messages of as many as 100 words into the genome of bacteria.) A systematic search for gerrymandered genomes would be relatively cheap and simple. Incredibly, a handful of (unsuccessful) computer searches have already been made for the tell-tale signs of an alien greeting.

Here's the question. Assume that the aliens inserted a 1000 bp message in the same place in the genomes of every member of our ancestral population from five million years ago. At that point every organism in the species had exactly the same message in a region of junk DNA.

If you were to sequence that very same region of your own genome, what would the message look like today? Would it be different from the original message of five million years ago? Is there a way of reconstructing the original message and interpreting it?

Comments will be held until tomorrow evening in order to give everyone a fair shot at coming up with an answer.

Since I have no measurable knowledge of biochemistry I won't embarrass myself by delving into how fast 2 kilobits would deteriorate for any given rate and which statistical method should be used to salvage information from a population sample. I'll probably make a severe methodological mistake early on by sheer ignorance of elementary biochemical facts.

So instead I will just say that if I had the job of encoding some message in non-coding DNA, I would try for erosion resistance and message distinctiveness.

Therefore, to detect anomalies I would run some SETI tests on junk DNA regions with significant preservation (and only those of unknown function).

Then again I'm just clueless, so I'm looking forward to informed comments.

The real question is, what kind of message could possibly be encoded in DNA by aliens? The aliens would first have to know how we would name nucleotides in the future (i.e., that we call adenosine A, and thus give the 'message' a much-needed vowel). Then, as if that isn't unrealistic enough, they would have to know what languages we speak and presumably how to give short-form versions of the words in these languages (they don't have the entire alphabet at their disposal).

To answer the original question: if the message was in a piece of junk DNA, and hence provides no heritable advantage, it should have mutated over time so that it no longer resembles the original message. However, if you know the mutation rate and you know where the insertion took place, you may be able to use bioinformatics to line up the sequences of a sample population and decipher the message based on the mutation rate and probabilities.

'Junk' DNA would have mutated over the 250k or so generations that have passed since the aliens' code was inserted in our genome. As a result, there would be significant changes (point mutations, insertions, deletions, duplications, etc.) from the original sequence.

It would be difficult to reconstruct the original message, even if there were thousands of available individual sequences of this genomic locus. Assume, for instance, there was a bottleneck some 100k years ago: one population of human ancestors survived. That population had acquired 4.9m years' worth of mutations at our locus of interest, and we have since accumulated 100k years' worth of mutations. However, the current genomic record cannot go back further than 100k years, since that is the oldest common ancestor. We might be able to determine the locus sequences at that point in time, but we could not look back further and determine, with certainty, what sort of mutations occurred before.

You didn't specify whether the sequence is functional. One could argue that it would be ridiculously implausible for the aliens to implant a functional, highly conserved, sequence, and also pointless because the source of such a sequence could not be distinguished from ordinary evolution (I'm guessing this is the answer you have in mind). Or one could argue that, given the intent of the aliens to leave a lasting message, they wouldn't bother leaving anything but a functional, highly conserved, sequence. Proponents of intelligent design would presumably argue that this is not only possible, but that the existence of conserved sequences constitutes evidence for exactly this sort of tampering :-)

Assuming that the region containing the 1000 bp message is actual junk DNA, and thus under no constraint from natural selection, I would think that the message I carry today would look very different from the original message of five million years ago. It is possible for all 1000 base pairs to have mutated, with each mutation passed on from parent to child over successive generations. Of course, it is very unlikely that everyone would have the same mutations, so it should be possible to reconstruct the original message by sequencing the genomes of all human populations and looking at the relative frequencies of each base at each position in each population (as defined by geographic region). The base with the highest frequency across the largest number of populations would most likely be the one that was in the original message. If this is done for all bases in the 1000 bp message, then a close estimate of the original message could be made.
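The frequency-based reconstruction proposed in this comment amounts to a majority vote at each position of an alignment. Here is a minimal Python sketch of that idea, assuming the sequences are already aligned and of equal length; the function name `consensus` and the toy 8 bp sample are invented for illustration. It can only recover the most common base among living individuals, so any change that has become fixed in everyone is invisible to it.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Return the most common base at each position of equal-length sequences."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) walks the alignment column by column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))

# Hypothetical sample: the same 8 bp region read from five individuals,
# four of whom carry a different private mutation.
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "TCGTACGT", "ACGTAGGT"]
print(consensus(sample))
```

If every individual in the sample carried the same substitution, the vote would return the substituted base, not the ancestral one.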

This of course assumes that the far more technologically advanced aliens would just leave the message in a piece of junk DNA without any means of self-preservation. I find it more likely that the aliens would have preserved the message by making it critical to the survival of the organism, so that any mutation to the message is lethal. There are many examples of this already in modern medicine, and I think that instead of looking in junk DNA people should be looking for messages in the genes which encode functions that are essential to life. Or possibly in genes which enable higher cognitive processes, which would be necessary if the aliens meant for the organisms they planted the message in to see it.

The message would certainly not look the same. Some changes would likely have been fixed by drift over the 5 million years, and a number of variants would exist in the population as well. With the statistical tools of population genetics, we may be able to sort through the existing variation of the message, but without the ability to compare the sequence in our genomes with other species, we would probably not be able to reconstruct the true ancestral sequence. Whether or not we could interpret the message would depend on how many mutations had actually been fixed in our lineage.

The message would be different, because neutral mutations could have been incorporated into the 1 kbp region at any point in time. However, reconstructing the alien DNA would be a hard task. Although one might be able to sequence this region from every single human, store the results in a database, and look for conserved positions, the conserved positions may themselves have descended from an earlier mutation (assuming alien DNA is subject to mutations and they had not developed machinery to ensure that the region remains intact throughout history). Reconstructing this region, therefore, is similar to building a phylogenetic tree: although we infer the evolutionary relationships based on similarities and differences, there is also a chance that all of our inferences are wrong.

Steve, you also need to consider how many rounds of cellular replication occur from the initial zygote to the cell that ultimately undergoes meiosis in each individual human generation. If the pre-meiotic mass were 1e6 cells, that would be 20 more rounds of replication.

@Anonymous you also need to consider how many rounds of cellular replication occur from the initial zygote to the cell that ultimately undergoes meiosis in each individual human generation.

It should be painfully obvious that I have no biochemistry/molecular genetic background.

My take on the MARSHALL 1999 paper, which is the source of the figure of 175 mutations per generation in a 7x10^9 bp genome (giving a mutation rate of 2.5x10^-8), is that this is an inter-generational mutation rate and should account for all the rounds of cellular replication between generations.
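For anyone checking the arithmetic, the per-base rate follows directly from the two figures quoted in this comment. The short Python sketch below only redoes that division and, as an added illustration, scales it to a 1000 bp region; the variable names are invented here.

```python
mutations_per_generation = 175   # figure quoted in the comment
genome_bp = 7e9                  # 7x10^9 bp, as quoted
message_bp = 1000                # size of the hypothetical alien message

# Mutations per base pair per generation.
rate_per_bp = mutations_per_generation / genome_bp

# Expected new mutations in the message region in one generation.
hits_per_generation = rate_per_bp * message_bp

print(f"{rate_per_bp:.1e} per bp per generation")
print(f"{hits_per_generation:.1e} in the message per generation")
```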

@Larry At this rate, given that our lineage has been through several bottlenecks, the alien message would be corrupted and probably impossible to read.

Strictly hypothetically, if the aliens had the wherewithal to implant a message in our genome 5 mya, presumably they had some idea of mutation rates and took steps to surmount this problem; otherwise, why bother in the first place?

@LM: Could you comment on the (apparently sourced) claim on Wikipedia of "highly conserved" regions of junk DNA? http://bit.ly/hcBh83

I don't know how junk DNA could achieve conservation. If those replicators were a program, i.e., a genetic heuristic, a quick-and-dirty solution would be to surround every encoding of a letter in the message alphabet with a lethal letter. So I would have error detection with scrapping.

A sophisticated approach would use error correction by copying any mutated letter back to the original, plus layered error detection to catch higher level errors. The latter would either be lethal or activate an overwrite of the faulty portion by the mirror (redundancy).

Also note that the message has to be learnable rather than arbitrary, otherwise we wouldn't understand it; maybe even be unable to distinguish it from a random string.

Therefore there wouldn't be a letter "A" in the message. What names you assign to letters doesn't matter. Also putting in "Sorry for the inconvenience" would require maybe an entire book preceding the message that explains the language. And by book I'm not referring to literature, not even Adams's.

Like Carl Sagan said: If you want to make an apple pie from scratch, you must first invent the universe.

I'm an engineer, not a biologist, so I'm pretty ignorant when it comes to this. I tried going through similar calculations to Steve Oberski, and don't get that high of a chance that the message could be corrupted. Could someone point out where I'm going wrong?

I started at the same point as Steve - 250,000 generations. Given 129 mutations per generation, that yields 32,250,000 mutations. Assuming a genome size of 3,000,000,000,000 base pairs, the odds of any given base pair having mutated over that time are 1.075e-5. Multiplying that by the 1000 base pairs of the message gives a less than 1 percent chance that the message would be corrupted.

That just doesn't seem right. Where's my mistake?

Hmm. I just followed Larry Moran's link to mutation rates. I see that my mistake is in assuming only 129 mutations per generation, when in fact it's 129 mutations per individual. But then, how would you go about calculating how corrupted the message would be? It seems to me that following any particular lineage would still give a chance of mutation similar to what I calculated, so I still must be missing something.

Well, let’s see: taking your estimate of 129 mutations per offspring, randomly spread out over a 3 billion bp genome, yields an estimated probability of 0.000043 that a given offspring will have a mutation in a particular 1000 bp region. That means the probability that a mutation won’t be in this 1000 bp message is 0.999957. Further, assuming 5 million years and a 20-year generation time, that’s 250,000 generations. 0.999957 to the 250,000 power is 0.0000214. So it is very unlikely that this region won’t suffer at least one (neutral) mutation in 5 million years.
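The calculation in this comment is easy to reproduce. The Python sketch below uses only the numbers given in the comment (129 mutations per offspring, a 3 billion bp genome, a 1000 bp message, 5 million years, 20-year generations) and follows a single line of descent; the variable names are invented, and the last line, the expected number of hits, is an added illustration and not part of the original comment.

```python
mutations_per_offspring = 129
genome_bp = 3e9
message_bp = 1000
years = 5_000_000
generation_years = 20

generations = years // generation_years

# Chance that one offspring picks up a new mutation inside the message,
# assuming mutations land uniformly at random across the genome.
p_hit = mutations_per_offspring * message_bp / genome_bp

# Chance that the message stays untouched in every generation of one lineage.
p_untouched = (1 - p_hit) ** generations

# Expected number of mutations in the message along that lineage.
expected_hits = p_hit * generations

print(f"generations:   {generations}")
print(f"p_hit:         {p_hit:.6f}")
print(f"p_untouched:   {p_untouched:.3g}")
print(f"expected hits: {expected_hits:.2f}")
```

With a 3 billion bp genome the expectation comes out at roughly ten changes per lineage, which is why a chance of "less than 1 percent" looked wrong.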

I haven’t thought about it too deeply, but maybe you need to divide the probability of a mutation in half if it’s autosomal DNA (but maybe not: you need another individual for this one to mate with). And I think the population size is not really relevant (it cancels out when factoring population mutation rate vs. fixation rate).

And once fixed in the population, there is no hope (other than access to ancient DNA sequences) of recovering the original message, assuming the message was coded such that errors are not identifiable (unlike an erlor in an English sentence).

Could you comment on the (apparently sourced) claim on Wikipedia of "highly conserved" regions of junk DNA? [http://bit.ly/hcBh83]

The Wikipedia article discusses functional elements in noncoding DNA. It lists a bunch of them. They are well-known, see: Genomes and Junk DNA for a partial list.

Some functional regions of noncoding DNA may not have been identified. One way of detecting them is to compare sequences in different genomes and look for conservation of regions that don't have a known function. Such regions have been detected. They're probably not junk.

The flip side of this coin is regions that are not conserved. Those regions are likely to be junk (with some exceptions). Most of the mammalian genome outside of transposons falls into this category. That's part of the evidence for non-functionality (i.e., junk).

Strictly hypothetically, if the aliens had the wherewithal to implant a message in our genome 5 mya, presumably they had some idea of mutation rates and took steps to surmount this problem; otherwise, why bother in the first place?

I think you're missing the point. The point is that an astronomer, Paul Davies, claims that aliens could leave a message in junk DNA. He says that DNA sequences are "little changed" in 100 million years.

I'm simply pointing out that his knowledge of genomes and evolution is about as good as my knowledge of quantum field theory.

If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

A friend of mine suggested the moon for something like this. It's geologically inactive, and a signal from there sent in response to radio signals from us would certainly spur technological advance to get us there.

The problem with setting it in DNA is having to know in advance which line is likely to develop to the point it can gaze at its own deoxyribonucleic navel, and, of course, ensuring the integrity of that message for the indeterminate length of time that would take. Like you, I don't think it's a likely choice. Just an intriguing suggestion.

I found the problem intriguing as an exercise in the reliable transmission of data through a channel that I don't deal with in my day job. This is a classic simplex data transmission scenario where there is no reverse channel that allows the receiver to signal the sender. If nothing else it was a chance to learn a bit about biochemistry and molecular genetics.

@Larry If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

Which reminds me, 2001: A Space Odyssey is being shown at the TIFF Lightbox theater in its original 70mm glory.

@Larry If I were an alien wanting to leave a message for humans I'd probably build a big shiny black slab and hide it in the desert.

When we finally get round the minor technical hurdles of traversing distances that start at 100 million trips to the moon and back, we will, of course, need to leave some kind of calling card if we only find unicellular slime. Rude not to! The message will read: "Called but you were out. Here's a big slab, plus we made some pyramids, and perched a few big stones on top of each other. Got bored waiting for you to evolve. The reading frame, by the way, is 4 bits."

Certainly, a billion-year-old alien message encoded into our DNA would be easy to crack, too. Surely it couldn't be more difficult than figuring out the workings of a mathematics based on the square root of a 4th quadrant rutabaga?

There are messages in every single painting. I have seen images that have been implanted in every kind of picture and painting that I have clicked on, including snaps taken from Google Earth. Most of the images are of different kinds of alien-looking creatures, and I have also found what I believe 100% is God's word, or words, letters, etc. These words/letters are in the shape of ancient creatures such as the giant sloth, woodpecker, rat, etc. These words/letters you can find anywhere and everywhere. I am no linguist, so I do not know what letters or words they represent, but I would say there are over 30 different ones. That's how I believe Jesus knew the name of Judas: he read his name from his face.

If you're going to leave a message in DNA for hundreds of thousands of generations to decode, why not just preserve samples of the DNA in something gooey that hardens (like amber) without worrying about the mutation rate?
