
Deconstructing metaphors

Oh, that’s right — that’s what philosophers are good for. They’re really good at questioning models. John Wilkins has been busily dismantling the cheap and easy metaphors we use to describe molecular biological concepts in a series of posts, taking on genes as language, other popular gene myths and metaphors, and explaining why genes aren’t information. The problem is that when we explain stuff we know well to students, we use metaphors and analogies to get across the initial ideas, and unfortunately, because scientists are human, the metaphors take on a life of their own and sometimes become the dominant paradigm for understanding the reality. And that can be hazardous.

I’ve lived through the era in which everyone started thinking of the genome as an elaborate computer program — we still have lots of people thinking that way, and in some ways it’s gotten worse as bioinformatics has brought in a synergy with computer science. But it’s not! It’s nothing like a series of instructions! This model has become a serious impediment to getting the new generation of advanced students to understand the biology, and worse, they try to shoehorn the biology into how they think a sophisticated computer program ought to work.

We’ve also got the problem of naive idiots thinking the metaphor is the thing and drawing false conclusions. The genome is a recipe, and every recipe needs a cook, therefore God, etc., etc., etc., ad nauseam.

Genes and DNA are one important component of a complex of compartmentalized biochemical reactions, in which every reaction product interacts with and influences the state of the whole. We’re seeing an excessive reductionism borne of the last 50 years of success in molecular biology, and it’s about time the pendulum swung back to a more balanced perspective. One gene tells us very little; you need to step back and look at the interactions of networks of gene products in a complex environment to understand what’s going on in the cell, and then you have to step back further to look at patterns of interactions between cells, and then further still to see how individuals interact with one another and the environment, and then you have to step way back to see how populations interact, and then, maybe then, you’re really talking about evolution.


Comments

Following Wiener here, DNA is a physical structure, and it is not “information” in the sense used by communications or computation theories.

And radio waves are quantized photons, a physical structure, and it is not “information” in the sense used by communications or computation theories. Bits are magnetic or electric potentials, a physical structure, and are not “information” in the sense used by communications or computation theories.

Most elaborate computer programs aren’t really sequences of instructions either… Programming is largely about the interactions between multiple components in a complex environment. Sure, bits of it are sequential, and ultimately all the code actually running on a single physical core ends up getting executed one instruction at a time, but almost nobody actually writes the stuff like that.

Which is not to say that the analogy is any more apt than it would be if programming was like that…

We’ve also got the problem of naive idiots thinking the metaphor is the thing and drawing false conclusions

That seems to happen in every field where metaphors are used as a means of explaining things. Nowadays, in my field, I try to discourage use of metaphors entirely – usually a field has a rich enough vocabulary that metaphors aren’t necessary; they’re just intellectual laziness on both sides of the discussion.

We’re seeing an excessive reductionism borne of the last 50 years of success in molecular biology, and it’s about time the pendulum swung back to a more balanced perspective.

Agree. Greedy reductionism is one of the major ways of first framing and then rejecting atheism. “If nature is fundamentally patterns of matter and energy in motion …then that’s all there is to talk about. It’s nothing BUT.”

I don’t think the computer language metaphor is inapt, as long as you add that it’s written in an esoteric language far worse than ‘brainf*ck’, running on notoriously unreliable parallel hardware and in an environment with scores of undocumented API calls.

DNA is ‘code’ to make a protein only. Even then there is some wiggle room on the exact chemical nature of the final protein product. DNA is also used to control when and where a protein is made. None of this can lead to simple explanations of phenotype. Most phenotypes we care to talk about are not even proteins.

This is how the metaphor to computer code breaks down. A gene does not code for blue eyes. Actually, even an entire collection of genes can not guarantee your eyes will be blue. There is so much nuance that takes place. Besides gene regulation, the availability of precursor molecules to make blue pigment must also be present to make eyes blue. Computer code rarely is ever subject to such wildly variable input signals and most computer code would crash, throw exceptions or give nonsensical behavior if it did not encounter a very precise controlled environment. Again, this is how the metaphor breaks down. DNA ‘code’ is clearly doing something different and we lose that appreciation if we take the metaphor too seriously.

I think Wilkins’ insistence on “not information” goes a little too far. I agree that a gene is not information, and in that sense his explanation is literally true. But information is more than a metaphor, and it is true and meaningful to observe that DNA is a molecule that can store information.

In a response to a comment, Wilkins notes “Information is too mystical for this physicalist,” which also makes me wonder what other well-defined mathematical abstractions would be “too mystical”. Much of mathematics deals with concepts that are not actually present in the physical world. These are still useful abstractions, not metaphors. It would be wrong, for instance, to say that a salt crystal is a cube. But a cube is a well-defined abstraction, as is its symmetry group. By forgetting about salt and studying cubic symmetry one can derive conclusions that may provide insight into salt crystals. That’s not mysticism; it’s mathematics.
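To make that concrete: here is a quick sketch of my own (an aside, not anything from Wilkins) that counts the rotational symmetries of a cube by enumerating signed permutation matrices of the three axes; the ones with determinant +1 are rotations, the rest are reflections.

```python
from itertools import permutations, product

def det3(m):
    # Determinant of a 3x3 matrix given as a list of rows.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Signed permutation matrices map the cube's axes onto themselves:
# 6 permutations x 8 sign patterns = 48 symmetries in all.
rotations = 0
for perm in permutations(range(3)):
    for signs in product((1, -1), repeat=3):
        m = [[signs[r] if c == perm[r] else 0 for c in range(3)]
             for r in range(3)]
        rotations += det3(m) == 1

print(rotations)  # 24 rotations (48 symmetries counting reflections)
```

Nothing about salt appears anywhere in the computation, yet the result constrains what a salt crystal can look like. That is the sense in which an abstraction, unlike a metaphor, lets you derive conclusions.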

Obviously all physical entities store information, but in the case of DNA it’s useful to restrict attention to the part of the information that determines the sequence of base pairs. What’s significant about this information is that it gets “copied” (scare quotes because I know I’m drifting into metaphor) during certain processes of biological interest.

I could store a phone number by encoding it as base pairs and that could be spliced into DNA. I could also store it by tying a series of knots in a string. True, a knotted string is not information. But information can be stored in the knots of a string. Am I allowed to say that information can be stored in knots on a string?
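For what it’s worth, the phone-number encoding is a one-liner to sketch. The two-bases-per-digit scheme below is arbitrary and my own invention for illustration; since 4² = 16 ≥ 10, two bases suffice per decimal digit, and the A/T, C/G pairing plays no role beyond supplying a four-letter alphabet.

```python
BASES = "ACGT"

def digits_to_dna(number: str) -> str:
    # Each decimal digit becomes two bases (a base-4 representation).
    out = []
    for d in number:
        n = int(d)
        out.append(BASES[n // 4] + BASES[n % 4])
    return "".join(out)

def dna_to_digits(seq: str) -> str:
    # Inverse mapping: read bases two at a time.
    out = []
    for i in range(0, len(seq), 2):
        hi, lo = BASES.index(seq[i]), BASES.index(seq[i + 1])
        out.append(str(hi * 4 + lo))
    return "".join(out)

phone = "5551234"
encoded = digits_to_dna(phone)
assert dna_to_digits(encoded) == phone  # round trip recovers the number
```

The same two functions would work unchanged if `BASES` were four knot types on a string, which is exactly the point: the information is in the mapping, not the medium.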

Conversely, the devices we are now most accustomed to thinking of as storing information (computer memory and persistent storage) have turned out to be tremendously useful in studying the sequence of base pairs of DNA. If “information” were just a metaphor, this would be a surprising coincidence. DNA as a molecule can be studied in many ways, but there is nothing wrong with restricting attention to its information content for certain purposes, encoding it as a sequence of bits, studying those bits (e.g. for homologies), and applying the conclusions to actual DNA.

I agree that “information” can be misleading if it causes people to make the leap to semantic content, but Shannon information is a well-defined abstraction having nothing to do with semantic content, and I don’t think people should hesitate to use “information” in the context of DNA, at least if they actually know what they mean by information.
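And because Shannon information is well defined, it is directly computable. A minimal sketch of the empirical entropy of a symbol sequence (there is nothing biology-specific about it; it works the same on bases, knots, or bits):

```python
from collections import Counter
from math import log2

def shannon_entropy(seq: str) -> float:
    # Empirical Shannon entropy in bits per symbol,
    # from the observed symbol frequencies of seq.
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(shannon_entropy("ACGTACGTACGT"))  # 2.0: a uniform mix of four bases
# A homopolymer such as "AAAAAAA" carries 0 bits per symbol.
```

Note that this number says nothing about semantic content, which is exactly why it stays well defined.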

…as long as you pretend it’s a computer language for which we have no analog, and that functions completely differently from any computer language.

This is the fucking stupidest thing I have read today.

Information is an abstraction. The cell doesn’t deal with abstractions. You could also pretend that pyruvate has a set of instructions coded in its pattern of chemical bonds, but that doesn’t mean that metabolism is an information processing exercise in its execution, except as a model human minds use to grasp it.

That’s the whole point. You’re insisting that your abstract model of information is the reality of the biology.

Did both Marcus #3 and PZ #11 not notice my qualifiers? I think I shot for pithy but should have explained in more depth.

It’s nominally a set of instructions, but transcription, regulation and so forth are heavily error-prone and largely stochastic, the ways different parts of it can affect one another are subtle and unpredictable, it relies on a biological context that can vary considerably, and the whole shebang is ruinously complicated to get a handle on.

You could make a computer language like that (esoteric languages are essentially extended CS jokes, after all), but it would be fundamentally useless.

I fully agree that it’s an unhelpful metaphor; I just wanted to make the aside that if you really, really wanted to you could create a computing environment with a similar level of opacity – and that such an environment would be useless for a designer, and any functional programme in such a messed up context would have much the same problems as genetics when it came to understanding how the thing worked.

I don’t think the computer language metaphor is inapt, as long as you add that it’s written in an esoteric language far worse than ‘brainf*ck’, running on notoriously unreliable parallel hardware and in an environment with scores of undocumented API calls.

…and then given to thousands of programmers and system admins who all have different quirks, don’t like comparing notes, and never document their code. For millions of years.

There’s one use for the analogy at least; it really doesn’t make “intelligent design” look good. If design worked through a DNA “code,” you’d need to propose a whole swarm of semi-competent designers working in an ill-suited language to approach the convoluted mess of real life.

I would distinguish between metaphor and abstraction. A metaphor can be useful in inspiring hypotheses, but you can’t really trust the conclusions you reach that way. An abstraction is an oversimplification that focuses on part of the object of study. An abstraction has the property that you can work exclusively within the abstraction and still derive conclusions that apply to what you started with, under certain simplifying assumptions. The distinction is often overlooked in popular explanations of science, leading people to think they understand something just because they have heard a metaphor.

wasn’t this discussion about metaphors already done, talking about analogies?
but this whole metaphor of DNA being computer code, was also done in the reverse. Computer Virus, was a metaphor to describe a piece of code, that can’t run by itself, but can insert itself into the functional code already running in a processor, causing the computer to produce more copies of the virus and propagate them throughout the network. This metaphor led me to put virii on the “not” side of the “living?” question. Cuz the virus doesn’t reproduce or metabolize by itself; it depends on insertion into a living cell, to do so. So a virus is just a code-fragment, not a complete program. Thus DNA is just code, to run the cell (just like a computer).

I probably can’t articulate this properly without thinking it through more fully, but I still think there is a sense in which genes are determinative and have a priority over things like proteins or cells. Of course everything works together in extremely complex ways. But if you alter a particular gene, you may die, or you may cause drastic changes to a protein or to cellular functioning. Huntington’s disease, for example, is caused by a change in a single gene. Why do cells deploy such an elaborate, energy-expensive proofreading apparatus, if not because DNA sequences are really important? Once you alter a gene, protein expression is altered and all the downstream stuff changes. Non-genetic changes, such as environmental changes, can of course have drastic effects too, but these don’t permanently alter cells and organisms the way genes do.

I think there has been a healthy pushback against the “selfish gene” view of evolution, but sometimes the pushback has gone to the other extreme, in which genes are seen as irrelevant, or as so dependent on complex interactions that the role of individual genes, to some biologists it seems, is negligible. We need a balanced approach that views individual genes and collections of genes as playing causal and determinative roles in organism traits, while at the same time fully accounting for how environmental, cellular, developmental, and ecological factors alter gene expression as well as traits.

I went to a relatively good high school and still pretty much all I got out of it was Mendel, Punnett squares, and some really general concepts like geographic isolation and genetic drift (and CONTROVERSY! – gradualism or punctuated equilibrium?!). Reading this site has disabused me of the notion of DNA ‘codes’ and genes for phenotypes, but I need a real biology course to actually understand any of it.

Fundamentally, this confusing of a symbol with the thing is built into language and the way we think. It can and does lead to many problems, as PZ shows here: problems of misunderstanding nature, as well as useful insights. We often go off the rails when someone comes up with a new metaphor, a new abstraction to describe some phenomenon encountered in reality.
Not just in biology, either. The abstract metaphor of “the Market” to describe how we distribute resources is an oversimplification of a very complex process of interaction between biological and cultural needs.
I would go so far as to say that the concept of a god is a metaphor that has been taken for a reality, and has led to some serious difficulties.
As we humans reach deeper into the nature of reality, we are confronted with the fact that in a very profound sense there are no things as such, just events in time, and our short linear experience of living makes our understanding prone to misunderstanding. http://en.wikipedia.org/wiki/This_is_not_a_pipe

A gene does not code for blue eyes. Actually, even an entire collection of genes can not guarantee your eyes will be blue. There is so much nuance that takes place. Besides gene regulation, the availability of precursor molecules to make blue pigment must also be present to make eyes blue. Computer code rarely is ever subject to such wildly variable input signals and most computer code would crash, throw exceptions or give nonsensical behavior if it did not encounter a very precise controlled environment.

Then, of course, there are the exceptions. Code using “fuzzy logic”, and things like natural language processing, at least try to fudge a semi-predictable result even if the inputs are seemingly garbage. Or, on the other end, you get the funny stuff like the recent experiment with, err… I don’t remember the exact gene. PR followed by a bunch of numbers, or something, which seems to be specific enough that it is only expressed in fish scales and, by extension, only in hair pigmentation if you transplant it into an animal. While it would do stranger things in animals with other pigmentation factors, a blond would end up with neon blue hair and, as far as they are aware so far, likely no other changes.

I am not sure the problem is that it’s not code, so much as that we are still trying to work out how to bloody write code like it, and only hitting the edges. lol. A few languages do interesting things. Erlang, for instance, can “restart” just about the entire program if errant data crashes it, but usually just restarts subprocesses. It uses “instanced” processing, which is to say that each execution of a function is its own static variable space, where sharing is done through something almost like a database rather than global variables, and everything being performed goes away when the process ends, unless saved via that external memory after execution is finished. Figure out how to do that while still accessing global data too, and it would get really crazy (and a lot more like what goes on in a cell).

But, interestingly, the whole signaling system that goes on in cells, where DNA is pulled out, translated, then pushed into the core to do its thing, while making dang sure it never interacts with anything else in transit, is a bit like calling subroutines. Sure, once it’s in the core it has to fight with whatever other “subroutine” is running in there, but outside of there it’s kept separate and non-interacting as much as possible, preventing transfer of RNA into the DNA, or the other way around, never mind stopping HGT from taking place, and allowing for a more stable mutation system (which includes the whole “many ways to encode this protein” business). Single-celled organisms without a nucleus have much more limited safeguards, allow HGT all the time, and retain function not by co-opting partly broken copies into doing the same function, but by actually deleting the ones that don’t work right, fairly aggressively, most of the time, and borrowing working copies from their neighbors.
I.e., they not only don’t, but can’t, encode a protein more than one way, unless placed under sufficient stress that their bug detection/repair system is loosened up and more errors are allowed through.

So, yeah, the real problem is assuming that any current analogy is accurate in the sense of how either one actually works *now*, which is a valid point. It’s not quite as certain that programs will not, at some point, develop the same traits, when someone comes across a situation where something as sloppy as what genes do will work better. And there are always exceptions, albeit minor and specific ones, to the “nothing in genes is a 1:1 process” rule. This is definitely true in complex forms, but stuff I have been reading on the differences suggests that, while still sloppy, it’s not that far off the mark with, say, E. coli, compared to even an earthworm. The step from a gene system that “has to be” the same generation to generation for anything to work at all, to one where you have dozens of alternative encodings for the same protein, kind of throws a wrench into the mix: it means you “get” those cross-linked things, and new functions out of the mix, which were not there before. The result would be a bit like the old Win 3.1 system, where “some” applications wouldn’t work without the earlier DLLs, because they “depended” on the glitches in the code to run at all, while new versions of the same DLL had fixed the code and hence broke the program. So you get 12 versions of, like, dothisstuff.dll, and 40 different programs, each dependent on the “correct” version.

Mind, this was considered “sloppy” coding, since, eventually, relying on the broken stuff would make your program extinct, but.. lol

And radio waves are quantized photons, a physical structure, and it is not “information” in the sense used by communications or computation theories.

Radio waves are NOT information. The information is in the pattern carried by the waves, which may or may not be meaningful.

Bits are magnetic or electric potentials, a physical structure, and are not “information” in the sense used by communications or computation theories.

Bits are NOT magnetic or electric potentials. Magnetic and electric potentials are ONE WAY of storing a bit. Other ways include beads on an abacus, knots on a string, holes in a punch card, graphite smudges on paper, impressions in clay, piles of rocks.

Bravo, my friend, you have just scored a rare Pharyngula double play. You make a fundamental category error in one sentence, and in the very next sentence you make the same category error going the other way.

Next time, before thinking yourself qualified to jump on motes of perceived stupidity in others’ eyes, check first for the beam in your own.

You can read the code of a computer program subroutine, and you can figure out the function of that subroutine, just from the code. You look at the base pair sequence of DNA and you CAN’T determine its function just from the sequence. Because the function is not actually encoded in the sequence. The function comes from the structure of the protein/RNA that is transcribed and translated from the DNA, a structure determined by the folding of that molecule.

And the DNA sequence does not contain any of the information pertaining to that folding pattern or the final 3D shape. THAT information, if it exists anywhere outside of the protein/RNA itself is encoded in the very laws of chemistry itself.

The information in DNA is not a set of instructions on how to perform functions, like a computer code. It is a parts list, plus labels for the parts. Nowhere in the DNA code is the information stored pertaining to HOW the parts should be put together and used to perform a function. And the labels (genetic switches) do not contain within the DNA sequence any information about when the labeled part should be made. It is just a flag. It does not contain within its sequence the equivalent of “in circumstance X go to flag A and make the part the flag points to”. THAT information, too, comes from elsewhere.

You can read the code of a computer program subroutine, and you can figure out the function of that subroutine, just from the code. You look at the base pair sequence of DNA and you CAN’T determine its function just from the sequence.

If you don’t know anything about how the computer works, then you can’t figure out anything about the function of a code. If you know a lot about embryology, then you can figure out what is likely to happen from the sequence of base pairs.

Nowhere in the DNA code is the information stored pertaining to HOW the parts should be put together and used to perform a function.

I could just as easily say that nowhere in a computer code is it prescribed how a transistor works.

You can read the code of a computer program subroutine, and you can figure out the function of that subroutine, just from the code.

Strictly speaking, you cannot even figure out in general if it will terminate. We just usually restrict our attention to code written by people with a specific purpose, and those people (if they’re competent and want to be understood) restrict their code to the subset of possibilities that is legible in the way you suggest above.

You can write unreadable code in any language, though there might be languages (mentioned in other comments) in which it is very hard to write readable code.

That said, I think the “code” and “recipe” metaphors don’t work very well for DNA except to get a general sense of what’s going on. This is in contrast to “information” as encoded in base pairs, which is one rigorously defined property of DNA, though not sufficient to understand everything about it.

This is how Eric Davidson puts it in the preface to his book The Regulatory Genome. Notice the strong statement that regulation of genes (by genes, i.e. regulatory genes) underlies the causality of development and evolution. I’m not sure he doesn’t understate the role of the genes themselves, and of the protein functions they underlie as well as the regulation of them (a point of contention and research). But anyway, genes are pretty essential and determinative.

“This book is about the system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution. Because the sequence of the DNA regulatory elements is the same in every cell of each organism, the regulatory genome can be thought of as hardwired, and genomic sequence may be the only thing in the cell that is. Indeed that is a required property of gene regulatory elements, for they must endow each gene with the information-receiving capacity that enables it to respond properly to every conditional regulatory state to which it might be exposed during all phases of the life cycle, and in all cell types. For development, and therefore for major aspects of evolution, the most important part of the core control system is that which determines the spatial and temporal expression of regulatory genes. As used here, “regulatory genes” are those encoding the transcription factors that interact with the specific DNA sequence elements of the genomic control apparatus. The reason that the regulation of genes encoding transcription factors is central to the whole core system is, of course, that these genes generate the determinant regulatory states of development.”

If you don’t know anything about how the computer works, then you can’t figure out anything about the function of a code.

You can know everything about the language and computer architecture and the problem of determining what the code outputs or whether it terminates at all is still undecidable. You can even write very simple and seemingly clear code that embeds something like the http://en.wikipedia.org/wiki/Collatz_conjecture and if you could just look at it and figure it out, then you’d be one up on every other mathematician that has tried so far. (Let alone actual undecidable problems expressed as code.)

If you know a lot about embryology, then you can figure out what is likely to happen from the sequence of base pairs.

You can presumably figure out more than you could if you didn’t know much about embryology, but it might also turn out that there is no more efficient way to figure out what it does than to let the embryo grow. This analogous case appears to hold for the vast majority of possible computer programs, just not the programs that people normally write.

If you don’t know anything about how the computer works, then you can’t figure out anything about the function of a code. If you know a lot about embryology, then you can figure out what is likely to happen from the sequence of base pairs.

But if you DO know everything about how a computer works, you CAN predict what the code will do, exactly. And the code will do the same thing every time you turn your computer off and on again and run the code fresh. And if you build a computer from scratch, and run the code on it, you can predict beforehand what the code will do.

But even if you know EVERYTHING about how an organism works, you can’t predict what the DNA code will do, not without knowing one additional piece of information that is not contained anywhere within the organism itself, which is the initial environmental condition, internal and external, when the DNA code “starts” running.

This analogous case appears to hold for the vast majority of possible computer programs, just not the programs that people normally write.

Perhaps we need to specify what precisely is the metaphor we are talking about?

Because based on the OP, my understanding of this discussion is that the metaphor in question specifically refers to computer code as produced by humans as the metaphor for DNA that is in dispute, ie the code that people write and only the code that people write.

Not every, any, and all possible types of code that the laws of information allow to exist.

I mean, I took that as THE POINT of PZ being careful to specify the phrase “nothing like a series of instructions” in the OP, and the point of his post at @11.

Because based on the OP, my understanding of this discussion is that the metaphor in question specifically refers to computer code as produced by humans as the metaphor for DNA that is in dispute, ie the code that people write and only the code that people write.

This seems obviously wrong, since DNA was not written by people, and there is no requirement for it to be understood, debugged, or maintained like a computer program, and no reason at all to expect it to look like one. I assumed this discussion was at least limited to things that cannot be dismissed trivially.

But if you DO know everything about how a computer works, you CAN predict what the code will do, exactly.

I guess so, in the sense that it is deterministic. For instance, if I ran a simple program to show that the Collatz sequence length of 596311 is 97, then I could predict that it will be 97 every time I run it again. But, similarly, if I grafted a branch from a red delicious apple tree, I could predict with some reasonable degree of likelihood that the fruits it will bear will be red delicious apples. But neither the computer code nor the DNA sequence (if I later sequenced it) is legible. The only known way of predicting the outcome without knowing it ahead of time is to allow the process to be carried out. (There may be a few shortcuts in the case of the Collatz sequence, but no simple, general formula as far as I know.)
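Incidentally, the exact count you get for a given number depends on whether you count steps or terms, but the underlying point is easy to demonstrate with a minimal version of that kind of program:

```python
def collatz_steps(n: int) -> int:
    # Iterations of the Collatz map (n -> n/2 if even, 3n+1 if odd)
    # needed to reach 1.
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# No general shortcut is known: to learn the answer, you run the process.
print(collatz_steps(27))  # 111, a famously long trajectory for so small a start
```

The code is only a few lines and perfectly deterministic, yet nobody can tell you by inspection whether the loop terminates for every input. That is what I mean by “not legible”.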

So, leaving aside whether “computer program” is a good metaphor for DNA (and I think it’s not good) even this is not the same as saying DNA is legible. Computer programs are not legible either.

My take on metaphors is that it depends on how you use them as well as your awareness of their limitations. But there are also well defined theories which are also used as metaphors, so we have to be careful to distinguish these. For example there are perfectly valid ways of using information theory in biology as well as making countless more misleading and confusing statements about it. We must not forget that entropy can be defined in terms of information in a perfectly mathematically consistent manner that however you want to put it, does actually describe real physical things. All biology is driven by entropic processes, so any of these processes can be translated in terms of information in a perfectly valid manner that doesn’t have anything to do with how humans think information works. In this manner we can determine the information content of DNA as well as any other biological process.

But even if you know EVERYTHING about how an organism works, you can’t predict what the DNA code will do, not without knowing one additional piece of information that is not contained anywhere within the organism itself, which is the initial environmental condition, internal and external, when the DNA code “starts” running.

hmm, same would hold for computer programs that intake all kinds of data from sensors, and have their behavior affected by the data…

Because based on the OP, my understanding of this discussion is that the metaphor in question specifically refers to computer code as produced by humans

My previous comment may have been too glib. I agree that that was probably PZ’s original point, and I accept his view that the metaphor is an impediment to learning biology.

But my point (since it’s more up my alley) is that there is nothing particularly tractable about a series of computer instructions (at least if they are surrounded by a loop with no fixed time bound). If someone thinks they understand biology because they understand computer programs, they’re “in for a bad time.” But if they thought they really understood computer programs, they were already in for a bad time.

“We’re seeing excessive reductionism… One gene tells us very little; you need to step back and look at the interactions of networks of gene products in a complex environment to understand what’s going on in the cell, and then you have to step back further to look at patterns of interactions between cells, and then further still to see how individuals interact with one another and the environment, and then you have to step way back to see how populations interact, and then, maybe then, you’re really talking about evolution”

But isn’t this statement also reductionist? Ultimately your statement acknowledges that there is a materialist/reductionist hierarchy, with biochemistry at the base. I have a quibble with anti-reductionist statements like “the whole is greater than the sum of its parts”. That too is a misleading metaphor when one has to ask how 2 + 2 + ? = 4. Society is made of interacting individuals, and each individual has a mind which is the result of the activity of interacting neural networks, which are the result of the activity of interacting neurons, etc. If you’re on board with that, well, you’re a reductionist in my books. If not, then it’s your job to explain the woo in 2 + 2 + ? = 4.

One more comment. Before I ever heard the “recipe” or “computer program” metaphor for DNA, I had heard the “blueprint” metaphor, which is very weak even as a metaphor. Clearly, you don’t decode the DNA into a picture that shows where the various parts of anatomy are located.

So I always took the computer program metaphor to mean that you can’t look at DNA to figure out what it does. It has to be run, like a computer program, over the whole development process to determine the outcome. Even then, the metaphor has lots of problems, but I was unaware of anyone seriously suggesting that it was legible in the sense that, say, an especially well-written computer implementation of the fast Fourier transform might be identified as such by inspection.

. . . And the DNA sequence does not contain any of the information pertaining to that folding pattern or the final 3D shape. THAT information, if it exists anywhere outside of the protein/RNA itself is encoded in the very laws of chemistry itself.

The information in DNA is not a set of instructions on how to perform functions, like a computer code. It is a parts-list, plus labels for the parts. Nowhere in the DNA code is the information pertaining to HOW the parts should be put together and used to perform a function stored . . .

It is true that DNA is not procedural code, but it isn’t true that it contains no information about its function, its final 3D shape, or when it is invoked. There are algorithms that predict, to a degree, structure and function from sequence alone. To say that this is dependent on the laws of chemistry is no different from saying that the function of a program depends on the compiler.

Of course, the information metaphor is horribly abused by creationist “biologists” who think they understand information theory, and by coders who think they understand biology, but they usually just restate a rehashed Paley’s-watchmaker argument. Anyone who understands the math would know that you can describe the information/complexity of any number of naturally occurring systems; it implies nothing about design.

There are algorithms that predict, to a degree, structure and function from sequence alone. To say that this is dependent on the laws of chemistry is no different from saying that the function of a program depends on the compiler.

Actually it’s hugely different:

When you say “to a degree”, that translates to “some”. The problem lies with what that “degree” is: compilers are not probabilistic the way chemistry is. Same code + same compiler + same computer + same operating system = same final product. Sit at your computer and recompile the same code 100,000 times and see if there are any differences between the resulting programs.

DNA will not give you the same final product every single time. Some genes apparently even rely on this fact to produce variant forms. Calling it “fuzzy” logic is a crude understatement.

Of course there’s variation. The probabilistic nature of the system is crucial to evolution. However, the laws of nature are regularities. A particular set of hydrophobic amino acids followed by a loop followed by another set of hydrophobic amino acids will give you the same tertiary structure 100,000 times over. It’s regular enough that a bacterium, trillions of generations removed from a human, can synthesize insulin whereas a 1980s Casio calculator will not be able to compile Windows 8.

Of course, this comes back to how you structure the metaphor. Above, I’m defining the scope, so my metaphor works. When it comes to the actual mathematics, which is abstraction rather than metaphor, the information content of DNA sequence has a specific meaning tied to its physical structure. That’s the abstraction vs. metaphor distinction that I want to emphasize.

I am not a biologist, but I believe I have a basic idea of how genes work. Then, occasionally, I meet people who do not have the faintest clue at all, and I find myself to be the one-eyed man in the land of the blind, unable to explain it to them.

I could explain in a few minutes how a mechanical clock works, or a four-stroke engine, planetary orbits, a transistor. These things are simple, and most people can understand them.

But how do you explain genes? What would you tell a 10-year old? Without relying on confusing metaphors and/or a lot of jargon? How exactly does DNA determine the shape of a living thing?

Ugh. Sorry PZ, but having a biologist “explain” information theory is exactly as painful as you would find having a civil engineer “explain” biology to you. I read through the Wilkins post and couldn’t find anything that was even remotely on point. It picks a few descriptions of information from some well-known papers (starting with Shannon, which is a pretty good start) but then disappears off into the weeds and doesn’t address information theory at all.

Almost everything in that post is, for some reason, talking about “instructions” and “programs” as if those things had anything at all to do with information theory. They don’t. Information theory is exclusively about the communication of messages via unreliable channels. If there is a good reason to not describe DNA as encoding information, it’s not contained in that post or in anything I’ve seen on this page.

Please, please, for the love of Clapton, would everybody who wants to comment about information theory read a book or something about what it’s about, rather than just talking about what you think it might be about?

Thanks PZ for succinctly stating what bothered me about The Selfish Gene. The metaphor of a short piece of DNA with feelings just seemed wrong, giving emotion to something that is incapable of it. Putting genes in their place as one part of a larger network makes them more comprehensible to me.

OMIF, I do hope this deconstruction is successful. Because in the feast of life, a beautiful and helpful creature, resplendent like sunrise above the sea, metaphor is; though it seems often to land in the gravy boat and flaps its wings, besmirching host, guest, and all possible discourse.

But seriously, I really do so hope the philosophers are successful… yet I fear for the enterprise, because they couldn’t do a damn thing for the fundamentals of electricity. (Or maybe they just didn’t bother to try… in which case, never mind.)

Information theory is exclusively about the communication of messages via unreliable channels. If there is a good reason to not describe DNA as encoding information, it’s not contained in that post or in anything I’ve seen on this page.

You’ve provided the “good reason to not describe DNA as encoding information” yourself. If information theory is, indeed, about “communication of messages”, it follows that information theory cannot apply to anything which doesn’t have both a message-sender and a message-receiver.
With DNA, who or what is the message-sender? And who or what is the message-receiver?

Actually, I think that this is precisely where religion went wrong in the first place.

The concept of “God” makes a certain amount of sense if you think of it as a metaphor for “good.” Goodness is something real, but it’s kind of abstract, and humans aren’t very good at dealing with abstractions. So goodness became a simile “like something your parents would want you to do” which became a metaphor “a person who is like a parent” until by the middle ages you were getting people put to death because they had a slightly unorthodox interpretation of the trinity. Metaphors can be useful tools, but they can go poisonous on you pdq.

Rasal @ 49, do explain to me how genes are information. Information, by tautological definition, informs; this does not make information the thing itself. For the same reason that a description of an apple is not an apple, the sequence of base pairs in a genetic sequence (even accounting for histone modifications, transcription/translation/replication errors, hell, even the occasional little cytosine which has deaminated into uracil) still merely describes the molecule. The molecule is what is doing the acting, not the sequence or the meticulously accurate descriptions we apply.

Is, for example, the book on my shelf discussing Kant’s philosophy information? Think about this carefully…

I would argue (as I think Wilkins sort-of is regarding the “DNA is Information” trope) that the book contains information in the patterns of printed words when we interpret it, it is subject to errors (or fictions, in some cases) but the book is not the information, it is an object and the thing in itself.

DNA? Sure, it’s information and instructions, but there’s so much more to it than that, and we’re barely starting to scratch the surface of just what’s there.

Along the way, well… people are going to try to understand it using imperfect terms and concepts they already have a grasp of, ones that are similar to (but also vastly different from) how DNA “works”.

The question then is how to convey all this STUFF to laypeople without those inaccurate, inadequate metaphors. Until then, we’re stuck with “It’s like a programming language (but it’s not)” and similar.

(On that note, I’d like to know more about the hows and whys and ins and outs of DNA without hopelessly entangling myself in field-specific jargon.)

You’ve provided the “good reason to not describe DNA as encoding information” yourself. If information theory is, indeed, about “communication of messages”, it follows that information theory cannot apply to anything which doesn’t have both a message-sender and a message-receiver.
With DNA, who or what is the message-sender? And who or what is the message-receiver?

I’m not seeing what you think the problem is, unless you’re implying that there are some kind of constraints on what those things need to be. The “message sender” would be whatever process generated the information, and it could be anything from a sentient being to a stochastic chemical process. There are no particular requirements for a “receiver” either. If you really want to identify a receiver, the most obvious thing to point to would be whatever chemical process is influenced by it.

Rasal @ 49, do explain to me how genes are information. Information, by tautological definition, informs, this does not make information the thing itself.

I don’t think genes are information; the only sensible description would be that genes encode information, which is why I described it in those terms.

Please note, however, what I actually said above, which is that I haven’t seen an argument as to why this is not true. I was choosing my words carefully.

Maybe it isn’t (I think it is, but I’m willing to be persuaded otherwise) but the discussion is totally off point; there’s a whole bunch of stuff about “programs” and “instructions” which is not what information theory is even about.

But those metaphors are now vocabulary and have specific meanings in that field. If I started elaborating on “transcription” to a molecular geneticist and started making florid references to medieval illuminated manuscripts, I’d expect them to look at me strangely.

You’re right, each field’s vocabulary is probably built out of metaphors, to keep language compact, but there are new specific meanings that divorce from the metaphor. In my field, computer security, nobody asks whether your “firewall” is rated for specific burn-through times, though that would be an appropriate question for a real firewall. We need to recognize that a field develops its own vocabulary that is unique, and, if we want to speak effectively in that field, we must invest the time to learn that vocabulary. That’s one of the reasons I try to encourage people not to use metaphors (or, as you may be pointing out, to layer metaphor atop metaphor*); it eventually obscures things to the point where we’re left arguing about the metaphors rather than the actual whatever-it-was.

A decade or so ago, some of my friends in the security field published a paper in which they argued that Microsoft’s dominance of operating systems amounted to a “monoculture” and consequently increased community risk. It was an interesting argument, but it quickly got lost in hashing and rehashing how the metaphor was to be applied, rather than assuming that anyone participating in the discussion at that level had enough of a basic understanding of the problem to be able to discuss it using computer security’s own terminology. The problem was, without the metaphor’s power, a lot of the argument went away; so I concluded that the metaphor was being used deliberately to obscure and manipulate rather than illuminate. To me, that’s the point we need to look for: if a metaphor is being used to explain something to someone who is completely unfamiliar with such-and-such field, are we doing our duty to them to provide them honest information if we reach for a metaphor that may actually manipulate their perception of the topic?

(* I was tempted to say something about “bad lasagna” here but I bit my tongue)

It’s also a misuse of metaphor. “Fuzzy logic” is a form of inference engine that uses probability distributions at the edge-nodes of its decision tree. “Fuzzing” is a form of error injection based on pseudorandomly varying inputs. That’s using vocabulary from late-1970s artificial intelligence and 1990s computer security flaw detection/quality assurance.(*)
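To make the second sense concrete, here’s a minimal sketch of a fuzzing loop in Python. The target parser (`parse_record`) is hypothetical, invented purely for illustration; it contains a classic trust-the-length-byte flaw of the kind fuzzers routinely shake out:

```python
import random

# "Fuzzing" in the 1990s security-QA sense: pseudorandomly vary inputs and
# watch for crashes. Toy target below (hypothetical, for illustration only):
# it trusts a length byte supplied by the input, a classic class of bug.
def parse_record(data: bytes) -> bytes:
    length = data[0]
    body = data[1:1 + length]
    _checksum = data[1 + length]   # IndexError when the length byte lies
    return body

random.seed(1)
crashes = 0
for _ in range(1000):
    blob = bytes(random.randrange(256) for _ in range(random.randrange(1, 8)))
    try:
        parse_record(blob)
    except IndexError:
        crashes += 1               # a real fuzzer would log blob for triage
print(f"{crashes} crashing inputs out of 1000")
```

Nothing here is clever; the point is that purely random variation of inputs, with no model of the format, is enough to find the flaw.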

I’m not trying to rathole on this particular term, but it’s a decent example of what I’m talking about. I’m not a molecular biologist, but maybe they have special words (probably evolved from metaphors) in their vocabularies that they use to describe the uncertainty or not-complete-predictability of some of these processes. If I were talking to a molecular biologist I’d be happy to have them use the vocabulary of that field; I’d ask for backfill where necessary, remember the new vocabulary I was given as being specific to that field, and try to avoid abusing it metaphorically in another. It doesn’t substantially lengthen a discussion, and it saves the huge amount of time that would otherwise be lost in a pointless argument about properties of metaphors that have no bearing on the topic at hand.(**)

I remember, back when I was a full-time programmer, an 8-hour long heated disagreement with my co-worker Mike S., in which we thoroughly thrashed out all of the details of a design decision only to discover that we were both using the same word to mean something different. We had to meticulously disassemble our vocabularies, re-achieve matching, and then repeat the entire discussion, which then took about 4 minutes. I’ll never forget that experience; it was surreal.

(* “Fuzzing” was not something QA people would talk about until the 1990s when it became a popular technique in security flaw-hunting; now QA vocabularies may include the term)
(** At this point, my internal mnemonic is the scene in “Enter the Dragon” where Bruce Lee tells the student ‘consider a finger pointing at the moon’ …)

@ Rasalhague
I agree. I haven’t seen any specific arguments against this understanding. There are two separate issues here. One is the analogy between genes and “programs” and “instructions”, which we all agree, I presume, is flawed (as are all analogies). Here I agree with Wilkins.

The second is the whole question of whether the information encoded in a sequence is a “real thing”. I don’t have a strong opinion on the subject, but I bristle at his references (in the comments) to information as “mystical”. It’s no more mystical than multiplication.

I think Wilkins conflates the two and proceeds to pooh-pooh the usefulness of any attempt at applying symbolic, abstract descriptions to how genes operate, saying “Why not just say that genes and organisms and the environment gives the later organism?” Well, if you want an algorithm to detect important proteins or pathways, you’re going to need to describe the system in terms of strings, signals and probabilities.

Both terms are so ambiguous in common usage that this is not a useful starting point. And obviously a gene is not information, just as a picture isn’t a pipe, but this does not make information irrelevant to DNA sequence (as distinguished from genes).

Actually the field of Information Theory (which Shannon apparently called communication theory based on some references I found) isn’t a great starting point either. I believe that the information metaphor is closer to what computer scientists think of as information and usually measure as a number of bits.

Information theory is a related concept, and doesn’t have to be restricted to communication (contrary to Rasalhague’s comment) though that was the original motivation. For example, I don’t think Rasalhague will object if I say that Hamming codes are a topic in information theory (will you?) but they are just as useful in constructing error correcting encodings in a faulty storage medium as they are in correcting errors that occur during communication.
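Since Hamming codes came up: here’s a minimal sketch of Hamming(7,4) in Python, using nothing beyond the standard construction. It corrects any single flipped bit, and the math doesn’t care whether the flip happened in a noisy channel or in a faulty storage medium:

```python
# Hamming(7,4): 4 data bits -> 7-bit codeword, correcting any 1-bit error.
# Codeword layout (1-based positions): [p1, p2, d1, p3, d2, d3, d4].

def encode(d):
    """d: list of 4 data bits -> 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """c: 7-bit codeword, possibly with one flipped bit -> 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based error position, 0 if clean
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1          # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = encode(data)
codeword[3] ^= 1                      # simulate one bit rotting in storage
assert decode(codeword) == data       # the data survives
```

Whether the flipped bit models line noise or a decaying disk sector is irrelevant to the decoder, which is the point being made above.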

What I found just now scanning Google books for information theory text books is that “information” is almost never used formally; entropy is used instead. There is a concept of “mutual information” (a measure of the dependence of random variables) and I could shoehorn this into a more CS-like information definition (e.g. if I copy the contents of one hard drive to another, then the distribution of bits between them will be highly correlated). But I’d prefer to drop the shoehorning entirely and think of information in terms of what people find practical in “information science”. That is, number of bits (either as storage or transfer rates).

I.e., information is the abstract concept associated with identifying one state out of many. A one megabit RAM is not information, but it can be abstracted as having 2^(2^20) possible states, and the thing we find useful about this RAM is that it will reliably hold exactly one of these states if it is functioning correctly (modulo a little variation introduced by cosmic rays and so forth). Informally, we just say it holds 2^20 bits. These bits don’t need to “inform” in any very satisfying sense. They could be the result of coin flips and they would still represent “information” in a commonly used sense of the word.

I think it goes well beyond metaphor and is an accurate abstraction to say that the DNA molecule (note, not “genes”) can be made to function like a storage medium. Actually, most of the base pairs of most DNA are non-coding anyway, so there are good reasons to reject any discussion of “genes” in this context. But it’s not a coincidence that the field of bioinformatics, which concerns itself among other things with sequence homologies, generally does this kind of pattern matching on a computer and not in the chemistry lab. DNA has lots of interesting properties, but in this context we concern ourselves with one property of DNA: that it has a sequence of base pairs, and that DNA derived from it mostly preserves the same sequence.

Do I hear any objection if I say that computers with large storage capacities are the workhorse of bioinformatics, and bioinformatics is a subfield of the study of DNA and has useful scientific things to say about it? If I’m mistaken about this, well, then I guess a couple of my past jobs were a waste of time, among other things. But I had been under the impression that I was hired to do information processing, and this information was specifically the sequence data encoded in the base pairs of DNA.

Information theory is a related concept, and doesn’t have to be restricted to communication (contrary to Rasalhague’s comment) though that was the original motivation. For example, I don’t think Rasalhague will object if I say that Hamming codes are a topic in information theory (will you?) but they are just as useful in constructing error correcting encodings in a faulty storage medium as they are in correcting errors that occur during communication.

No, I certainly don’t disagree, but there’s a terminology issue here: a “communication channel” is just something you put messages in and have them come out again, typically corrupted by noise. “Communication” is not literal in this context. It’s completely conventional to describe a hard drive, or flash memory, as a “communication channel”.

A one megabit RAM is not information, but it can be abstracted as having 2^(2^20) possible states, and the thing we find useful about this RAM is that it will reliably hold exactly one of these states if it is functioning correctly (modulo a little variation introduced by cosmic rays and so forth). Informally, we just say it holds 2^20 bits. These bits don’t need to “inform” in any very satisfying sense. They could be the result of coin flips and they would still represent “information” in a commonly used sense of the word.

Yes, but an important note of caution: “data” and “information” are not the same thing, even though both are measured in bits. In a message containing N bits of data, the maximum number of bits of information is N, but it could also be zero, or any number in between. The mutual information you mention elsewhere is important in quantifying the difference. In a typical case, you would look at the mutual dependence between data bits in a message; if they have a statistical dependency between them, the information they contain will be less than if they were independent.

But yes, it’s absolutely the case that information can be the result of coin flips. In fact, in the conventional description of information theory (rather than complexity theory) it’s required that information originates from a stochastic process. This does seem counter-intuitive.

A measure of the information content of a message is the uncertainty that is removed by its being observed. Observation of a variable taken from a random distribution removes uncertainty about its value. On the other hand, observing the output of a deterministic process removes no uncertainty, since the output was already known.
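The “uncertainty removed” notion has a standard formula: Shannon entropy, H = −Σ p·log2(p). A quick sketch in plain Python (the function name is mine; nothing beyond the stdlib is used):

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of a distribution given as probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin flip removes exactly 1 bit of uncertainty when observed...
print(entropy([0.5, 0.5]))   # 1.0
# ...a biased coin removes less...
print(entropy([0.9, 0.1]))   # ~0.469
# ...and a deterministic "process" removes none: the outcome was already known.
print(entropy([1.0]))        # 0.0
```

The last line is the counter-intuitive bit stated above: a process with only one possible outcome carries zero information, no matter how much data it emits.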

It’s exactly this confusion that is exploited by the likes of William Dembski with his “specified information” bullshit. The name by itself is essentially an oxymoron, but to anybody not well acquainted with the subject, it’s not at all obvious where the flaw in his argument is.

I should probably concede that even in computer science, “data” is probably a more useful and less loaded term than “information.” (Though “information” shows up as information science, information technology, information processing, and informatics.)

So am I allowed to use a “data” metaphor for DNA, namely that a measurable amount of data can be stored in and retrieved from the sequence of base pairs of a DNA molecule? The retrieval part is certainly true, in that DNA is now routinely sequenced and represented as strings from a 4-character alphabet in computer memory.

The other direction is clearly much harder, and I don’t know the length of the longest synthetic sequence that has been realized in physical DNA.

It’s not clear to me (as a computer scientist–that part is for real–but I don’t even play a chemist or biologist on TV) whether DNA can even “store” every possible sequence, because the sequence also affects physical conformation of the macromolecule (true of RNA anyway) and some sequences might be chemically unstable (anyone know?). But even in that case, you could use some kind of redundant encoding or chunking strategy to use DNA as a computer storage medium.

So if you can make a round trip from bit image on a hard drive to a molecule back to the same bit image, I don’t think it’s that crazy to talk about this as a molecule that stores data. Knowing that doesn’t mean you’ve even scratched the surface of what there is to know about DNA chemistry let alone genes or biology, but it does suggest that thinking of DNA sequence as a 4-character encoding is a sound and useful abstraction.
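The software half of that round trip is easy to sketch; the wet-lab synthesis half is where the real difficulty lives. A minimal 2-bits-per-base codec (a toy scheme of my own; practical DNA-storage encodings design around exactly the chemical constraints mentioned above, e.g. long homopolymer runs):

```python
# Round trip: bytes -> ACGT string -> the same bytes, at 2 bits per base.
# Toy encoding only: it ignores real synthesis/sequencing constraints.

BASE_FOR = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR = {b: n for n, b in BASE_FOR.items()}

def to_dna(data: bytes) -> str:
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):            # big-endian pairs of bits
            bases.append(BASE_FOR[(byte >> shift) & 0b11])
    return "".join(bases)

def from_dna(seq: str) -> bytes:
    out = bytearray()
    for i in range(0, len(seq), 4):           # 4 bases per byte
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BITS_FOR[base]
        out.append(byte)
    return bytes(out)

payload = b"hello"
assert from_dna(to_dna(payload)) == payload   # lossless round trip
```

This is exactly the sense in which a sequence of base pairs “is” a 4-character encoding: the mapping is trivial and invertible in software, whatever the chemistry then does with the molecule.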

The second is the whole question of whether the information encoded in a sequence is a “real thing”. I don’t have a strong opinion on the subject, but I bristle at his references (in the comments) to information as “mystical”. It’s no more mystical than multiplication.

Yes, that’s a bit bizarre.

I would actually say that the encoding of information is very much physical. What I think many people are getting hung up on is the difference between the physical thing and a property of the physical thing.

From an example mentioned earlier, you wouldn’t be looking at photons as information, but you might be looking at their wavelength. Sure, it’s not the photon itself, but I don’t see how that makes it in any way “abstract” and it sure as hell isn’t “mystical”.

The second is the whole question of whether the information encoded in a sequence is a “real thing”. I don’t have a strong opinion on the subject, but I bristle at his references (in the comments) to information as “mystical”. It’s no more mystical than multiplication.

Yes, that’s a bit bizarre.

Agreed (and I said this in #10 above). Mathematical formalisms are not “real” in the sense of being composed of physical matter or energy. An ellipse is not a “real thing”, but it can be useful in understanding orbits. The octahedral symmetry group isn’t either, but it can be useful in understanding crystal structure. I don’t think Wilkins is even trying to say that all such abstractions are “mystical”. I think he’s mostly just worried about vague metaphors being misused, and isn’t thinking about ways in which “information” is actually a very precise abstraction that may have something (though far from everything) to say about DNA. But I do think the “not information” part of his series presents the weakest arguments.

Let’s say we have a menagerie A of egg-laying animals, half of which have feathers and half of which have scales. We have another menagerie B of animals that hatched out of eggs laid by animals in A. We define a random variable by choosing an animal uniformly from A and determining whether it has feathers or scales (so the probability of feathers is 0.5). We define another random variable by choosing an animal uniformly from B and again determining whether it has feathers or scales. In the latter case, the probability might not be 0.5, but the distributions are independent.

Now, let’s say that after choosing from A, we instead choose uniformly from B but with the constraint that if the first choice is “feathered”, we must choose an animal from B that hatched from the egg of a feathered animal in A. Likewise if it is “scaly”, we must choose an animal from B that hatched from the egg of a scaly animal.

You don’t have to be Gregor Mendel to figure out that the distributions will not be independent in the second case.

Now I’m just an unfrozen caveman computer scientist and definitely not an information theorist, but I think this scenario can be analyzed in terms of http://en.wikipedia.org/wiki/Mutual_information and while various environmental factors could influence whether feathered or scaly things are chosen in this way, to a first order approximation, the correlation between these distributions is very much mediated by genes. So I don’t think it’s even that crazy to say “genes carry information” if this is what we mean by information. To say “genes are information” is poorly phrased, but I think the information metaphor actually holds up very well if you are clear about what you mean by it.
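For the curious, the menagerie scenario plugs straight into the mutual-information formula. Here’s a sketch with a made-up heredity fidelity of 90% (the 0.9 is purely an assumption for illustration, as is the function name):

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits, from a dict {(x, y): p} giving the joint distribution."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Parent chosen from menagerie A (P(feathers) = 0.5); in this toy model the
# offspring chosen from B matches the parent's covering 90% of the time.
fidelity = 0.9
joint = {
    ("feathers", "feathers"): 0.5 * fidelity,
    ("feathers", "scales"):   0.5 * (1 - fidelity),
    ("scales",   "scales"):   0.5 * fidelity,
    ("scales",   "feathers"): 0.5 * (1 - fidelity),
}
print(mutual_information(joint))   # ~0.53 bits carried between generations
```

Perfect heredity would carry the full 1 bit; independent distributions would carry 0. Under this definition, “genes carry information” is a measurable claim, not a metaphor.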

My 2 cents (from a molecular biology perspective):
Nucleic acids contain information in the same way in which antibodies contain information about what epitope to bind or hydrogen contains information about how to form water molecules or photons contain information about the speed of light.

in the same way in which antibodies contain information about what epitope to bind

I agree with the above (though I think you were going for reductio ad absurdum).

For example, if I define two random distributions over a human population, (1) has been exposed to Human cytomegalovirus and (2) tests positive for Human cytomegalovirus antibodies, then these distributions will be correlated, and this correlation is due to the information content of the antibody. And this holds for many other diseases as well. (Wikipedia: “Though the general structure of all antibodies is very similar, a small region at the tip of the protein is extremely variable, allowing millions of antibodies with slightly different tip structures, or antigen-binding sites, to exist.”) Assuming Wikipedia is correct about “millions” of tip structures, I would conclude that each antibody has at least 20 bits of information (probably much more) as I might use the term (though I agree that it is problematic).
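As a sanity check on that figure: the “20 bits” is just log2 of the number of distinguishable states:

```python
from math import log2

# "Millions" of distinct antigen-binding sites gives a lower bound on the
# information per antibody, under the log2(N states) definition used here.
print(log2(1_000_000))   # ~19.93 bits, hence "at least 20 bits"
```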

or hydrogen contains information about how to form water molecules

Doubtful, but maybe in context. If I define a random distribution over choices of atoms that could be either hydrogen or helium, say, this will correlate to whether the atoms can form water molecules or not. So the “hydrogen/not hydrogen” state of an atom encodes a single bit. It isn’t so much that this is untrue, but the abstraction is no longer very useful in this case, whereas it seems at least potentially useful in the case of antibodies.

or photons contain information about the speed of light.

This seems incorrect because the speed of light is generally understood as a universal constant, so I’m not sure what it would mean to have information about the speed of light. Possibly you could have a distribution over whether a particular human agent knew the speed of light, and define the information in these terms. But photons wouldn’t help convey this information. (You could run an experiment, and that experiment could involve photons, but the result depends on other laws, not just photons.)

In any case, antibodies do seem analogous as information-carrying molecules, but the other examples don’t seem analogous at all.

There is nothing spooky about information. If you can use something to distinguish between N states, it is reasonable and commonplace to define this as having log_2 N bits of information. All physical systems carry some information. DNA is unusual among macromolecules in having a base sequence that maps very well to strings from a finite alphabet, and the biotech industry would have to do things very differently if not for that fact. I still don’t buy the idea that it is harmful to think of DNA in this way.

Even though we can determine the speed of light by measuring photons, the photons don’t contain any information about c. They simply have to move at the speed of light because that is the way this universe works (“the way this universe works” is usually referred to as the laws of physics).
Hydrogen doesn’t form water under the right conditions because it contains any information about that; the formation simply follows the laws of physics (or physical chemistry, or chemistry, however you want to classify it). The epitope-paratope interaction of antibodies works the way it does (under the right conditions) because of the underlying chemistry/physics, not because the antibody carries any information about the epitope. Watson-Crick base pairing works under the right conditions because of hydrogen bonds, not because the bases in the nucleic acids contain any information.

That of course doesn’t mean “DNA contains information” is a bad metaphor. It’s really useful in bioinformatics, for example in the assembly of genomes. But it is still a metaphor, not the reality, and you have to keep that in mind. Otherwise the limits of the metaphor will hinder your understanding of what is really going on. Antibodies, for example, can be seen as containing information when we are talking about the amino acid sequence of the antigen-binding site, but not when talking about how they are made, because that involves random gene rearrangement and other things for which the information metaphor is useless.
I think, that’s what the OP is criticising.

It’s not clear to me […] whether DNA can even “store” every possible sequence, because the sequence also affects physical conformation of the macromolecule (true of RNA anyway) and some sequences might be chemically unstable (anyone know?).

No sequence is unstable at or near room temperature; no sequence affects the conformation of a double strand with matching sequences (whether DNA or RNA).

I still don’t buy the idea that it is harmful to think of DNA in this way.

I don’t think it is either. But it’s not the whole picture, and once this metaphor dries up, it won’t be useful for discovering new ideas. I think the question of which metaphor is “good” is too complex to solve. The goodness of fit depends entirely on the situation.

As far as bioinformatics is concerned, DNA contains information, and the models will build all kinds of useful things…but they will always be limited to that assumption. Civil engineers don’t consider gravity as the curvature of space-time by mass-energy when designing buildings. Their approach works just fine for building many kinds of useful things, but it won’t lend itself to relativity or astrophysics.

I think a lot of the problem here is lack of agreement on what we mean by “information.” I agree that a photon does not contain information about the speed of light (and I said so above). But I would consider an antibody (let alone DNA) to be an information-carrying molecule.

E.g., picking a person at random from the human population, I can say little about what specific diseases they have been exposed to. If I run tests of which antibodies are present, I can now narrow things down significantly. It is as reasonable to say that antibodies carry information as it is to say that the printout I might get after my hospital visit contains information about what diseases I’ve been exposed to.

This isn’t a metaphor. It is one generally understood definition of information.

Where do you draw the line, by the way? If I say that the human brain can store information, is that still a metaphor?

To say I have N bits of information is, roughly, to say that I can narrow down the possible set of states of an unknown by a factor of 2^N. This kind of information is all over the place, not just in DNA, but it is present there, and there is nothing mystical or even metaphorical about it.
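The “narrow down by a factor of 2^N” reading can be sketched the same way (illustrative Python, not anyone’s actual method): each bit learned rules out half of the remaining equally likely candidates.

```python
states = 16          # the unknown is one of 16 equally likely possibilities
bits_learned = 0
while states > 1:
    states //= 2     # each bit halves the remaining candidate set
    bits_learned += 1
print(bits_learned)  # 4 bits, i.e. log2(16), pins down a single state
```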

But it’s not the whole picture, and once this metaphor dries up, it won’t be useful for discovering new ideas.

I’m not sure metaphors actually dry up, but I agree that the applicability is limited.

That’s true of any scientific abstraction. Is a comet a “dirty snowball” or is it a coordinate triple that changes (more or less) according to Kepler’s laws? Depending on which kind of science you’re interested in, these might both be useful abstractions. Obviously, neither of them is going to give you the whole picture.

But not when talking about how they are made, because that involves random gene rearrangement and other things for which the information metaphor is useless.

The “information metaphor” for a hard drive is useless when I’m trying to figure out if I can plug it into an older machine that only has an IDE connection. That is the nature of abstractions. They only describe specific parts of the system being studied.

No sequence is unstable at or near room temperature; no sequence affects the conformation of a double strand with matching sequences (whether DNA or RNA).

That’s true in vitro.

However, its conformation is a product of its environment (pH, metal ions, solvent). In vivo, that environment is heavily modified by cellular processes that reference said sequence to build the proteins necessary for maintaining the appropriate cellular conditions. Hypothetically, one could modify the conformation by editing in/out genes coding for proteins that affect supercoiling. Or, in a multicellular organism, the genes coding for heat regulatory processes.

That depends on how you’re defining the term information. If you mean in the everyday sense, then sure, but in the information theory sense information isn’t “about” anything any more than the temperature of a rock is “about” anything. It’s a quantifiable property just the same as entropy or mass.

I think this is the big disconnect that is permeating this discussion. There’s a bias that necessarily follows from the choice of the word “information” that causes people to ask what the information “means”. There isn’t an expectation of meaning in the definition of information.

As mentioned earlier, I think this confusion is what causes creationists to latch on to information theory as one of the things that will finally give them the support for what they think must be true. If you ascribe “meaning” to information, and “senders” to messages, pretty soon you’ve arrived at the conclusion that the existence of information in the universe must mean that there’s an intelligent entity at the root of it all. And they know who it is. All that remains is to have some math guy fill in the blanks, and presto! He’s working on it right now, and all will be ok as long as he can keep up the pretense that he doesn’t already know that it just doesn’t work.

I think a lot of the problem here is lack of agreement on what we mean by “information.” I agree that a photon does not contain information about the speed of light (and I said so above). But I would consider an antibody (let alone DNA) to be an information-carrying molecule.

But both antibodies and photons simply follow the laws of physics (/chemistry/etc.). What is the difference between them?

E.g., picking a person at random from the human population, I can say little about what specific diseases they have been exposed to. If I run tests of which antibodies are present, I can now narrow things down significantly. It is as reasonable to say that antibodies carry information as it is to say that the printout I might get after my hospital visit contains information about what diseases I’ve been exposed to.

Yes it is a useful abstraction to say the mixture of antibodies in your blood carries information when we are talking about infections. But the thing is, the information you are talking about has to be derived by taking a blood sample, exposing it to the pathogens in question and then detecting any antibody-antigen binding.
The antibody itself just “blindly” binds its epitope, no matter whether it is part of a protein from the pathogen or artificially fused to a completely different protein. Under some conditions the antibody can even bind epitopes that are only similar to its “normal” epitope. And in these processes the information abstraction isn’t useful.

This isn’t a metaphor. It is one generally understood definition of information.

But is there anything that doesn’t contain information according to this definition?

Where do you draw the line, by the way? If I say that the human brain can store information, is that still a metaphor?

That’s a tricky question because it is not fully understood how this actually works. As far as I know the sensory input is “stored” in the pattern of the neural network, not in the neurons themselves. But I’m actually not sure if “information storage in the brain” isn’t a metaphor itself, because the reality is more like ion gradients traveling along a membrane (and jumping gaps through neurotransmitter signaling) shape a neuronal network. And “shaping” a neuronal network is again a metaphor.

To say I have N bits of information is, roughly, to say that I can narrow down the possible set of states of an unknown by a factor of 2^N. This kind of information is all over the place, not just in DNA, but it is present there, and there is nothing mystical or even metaphorical about it.

But again, that would mean hydrogen contains information. Of course it is possible to use such a definition, but what is that actually good for? A hydrogen atom can only do a limited number of things, and knowing the specific conditions we can determine which outcome is the most likely. But what do you gain by claiming this information is contained in the hydrogen atom? The hydrogen atom follows the laws of physics. The bits of information that tell you the most likely outcome have to be derived by someone through measurement of the specific conditions and knowledge of the involved laws of physics/chemistry/etc.

Another example: let’s take six hydrogen and two carbon atoms. There is only a finite number of molecules that can be formed from these atoms. If I understand you correctly, we can say the carbon and hydrogen atoms contain information because we can determine the molecules most likely to form under a specific set of conditions (pressure, temperature, etc.) by looking at bits of information like which orbitals can be involved in the covalent bonds. The question is: What have we gained by determining the atoms as “containing information”? Because the actual formation of the molecules is not a process of exchanging or passing along bits of information. (Or is it, according to information theory? If so, what do we gain by viewing the formation of the molecules as a process of exchanging or passing along bits of information?)

The “information metaphor” for a hard drive is useless when I’m trying to figure out if I can plug it into an older machine that only has an IDE connection. That is the nature of abstractions. They only describe specific parts of the system being studied.

The limits of abstractions are exactly the point that has been criticized.

@Rasalhague

That depends on how you’re defining the term information. If you mean in the everyday sense, then sure, but in the information theory sense information isn’t “about” anything any more than the temperature of a rock is “about” anything. It’s a quantifiable property just the same as entropy or mass.

I think this is the big disconnect that is permeating this discussion. There’s a bias that necessarily follows from the choice of the word “information” that causes people to ask what the information “means”. There isn’t an expectation of meaning in the definition of information.

Aren’t temperature and entropy abstractions as well? And quantifiable properties aren’t meaningless, they describe objects or systems, like the mass of an object describes its resistance to being accelerated (or if I understand that correctly, one aspect of mass describes this). So the informational content of a system or an object describes if and/or how many bits of information are contained within the system/object, right? And these bits of information can’t be empty, can they? So what is wrong with claiming that by this definition a photon contains (at least) one bit of information and the value of this bit is the photon’s speed?

To summarize this: I understand (or at least I think I do), that by the definition used in information theory DNA/antibodies etc contain information. But it is not always useful to apply the concepts of information theory to molecular biology, because not all processes in molecular biology can be explained by applying concepts of information theory. Nevertheless in many other cases it is indeed extremely useful, as the growing field of bioinformatics proves.

But is there anything that doesn’t contain information according to this definition?

Anything that can be in more than one state contains some information. So, sure, almost anything you can think of contains information. But this isn’t always a useful way of looking at things. It seems useful to me in the case of a DNA sequence because the sequence data corresponds to many bits of information (2 per base pair in fact). The information persists under a robust range of conditions. It can be copied. It can be read and even written experimentally.
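The “2 per base pair” figure is just log2 of the alphabet size. A quick sketch in Python (the sequence is a made-up example, and the count assumes all four bases are equally likely, which real genomes only approximate):

```python
import math

seq = "ATGCGTAC"                  # hypothetical example sequence
bits_per_base = math.log2(4)      # 4-letter alphabet -> 2.0 bits per base
print(len(seq) * bits_per_base)   # 16.0 bits for these 8 bases
```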

There is also a lot of information (in the strict sense) in a balloon full of air at room temperature. Every molecule has coordinates and velocity, and it would take many bits to record this to distinguish it from other states of this gas. But it’s not analogous to DNA sequence because it changes rapidly and there is no obvious way to copy it, to read it, or to write it. (And a given DNA molecule likewise has information in its physical conformation, but this is not the part that the metaphor focuses on, for the same reason.)

Even in the case above, the number of states the balloon gas can be in is relevant to its entropy, which is an equivalent concept to information. One of the worst problems with ID people is that they can’t keep it straight that high entropy is high information, not the other way around. I never said information is “meaningful information.”
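The “high entropy is high information” point can be checked directly with Shannon’s formula H = −Σ p·log2(p) (a sketch over discrete states; the example distributions are arbitrary): the uniform distribution, the most disordered one, carries the most bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25] * 4))                 # uniform over 4 states: 2.0 bits (the maximum)
print(entropy([0.97, 0.01, 0.01, 0.01]))   # highly ordered: well under 1 bit
```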

[does the brain contain information] That’s a tricky question because it is not fully understood how this actually works.

It doesn’t seem tricky to me at all. I can see a phone number on paper, walk away to where I don’t see it anymore and then write (usually) the same number down on another piece of paper. If the first phone number was determined by a statistical distribution, then the one I write down comes from a highly correlated statistical distribution. Ergo my brain conveyed information.
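The “highly correlated statistical distribution” claim is exactly what mutual information measures, and it’s easy to sketch (illustrative Python; the joint distributions are toy numbers, not data): a perfect recall of a fair bit conveys 1 bit, an uncorrelated scribble conveys 0.

```python
import math

def mutual_information(joint):
    """Mutual information in bits; joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]        # marginal of X (the original number)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (the written-down copy)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

perfect = [[0.5, 0.0], [0.0, 0.5]]          # copy always matches the original
independent = [[0.25, 0.25], [0.25, 0.25]]  # copy is unrelated noise
print(mutual_information(perfect))      # 1.0 bit conveyed
print(mutual_information(independent))  # 0.0 bits
```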

Do the papers in my example contain information? I could turn the question around: are you willing to say that anything contains information?

But again that would mean hydrogen contains information. Of course it is possible to use such a definition, but what is that actually good for?

I thought I said as much in #69: “If I define a random distribution over choices of atoms that could be either hydrogen or helium, say, this will correlate to whether the atoms can form water molecules or not. So the “hydrogen/not hydrogen” state of an atom encodes a single bit. It isn’t so much that this is untrue, but the abstraction is no longer very useful in this case …”

Note that an individual atom cannot have a hydrogen/not hydrogen state, so I meant this in terms of the random variable. But anyway, I agree on both counts. It is possible to use the definition, but in this case it doesn’t look like it is good for much. On the other hand, if I’m looking for homologous sequences, focusing on the DNA sequence as a series of discrete choices of base pairs does seem useful.

The limits of abstractions are exactly the point that has been criticized.

Sorry, I thought the point was metaphors that are obviously wrong, at best don’t give much insight, and at worst are misleading. DNA as a “computer program” belongs in this category, particularly if you start applying ideas that come from standard practices in software engineering. DNA as an “information-storing molecule” does not seem to belong in this category. DNA has many other properties (e.g. melting point), but it is possible to take the information abstraction and carry out useful science as long as you don’t kid yourself that you know everything about it.

I guess I have to agree with Rasalhague: if a biologist or chemist can imagine how excruciating it would be to listen to me (a computer scientist) explain their fields to them, they might be able to see how I view all the opinions that have been tossed around about what is and is not information (and my attempts are probably equally annoying to someone who knows more information theory in particular, but I’m doing my best).

But is there anything that doesn’t contain information according to this definition?

In the real world? No, I can’t think of an example.

The question is: What have we gained by determining the atoms as “containing information”?

I can’t think of an application in this instance. But what of it? They still do contain information, because of the way that information is defined, whether or not we happen to care about it in a particular instance. Information is defined in the way it is for very good reasons (outlined in great detail in many books and papers on the subject) and it happens to follow that information also exists in a bunch of other places too. This conclusion is not contingent on whether that is useful or not. In the same system, it probably wouldn’t be that interesting to evaluate the temperature of each of the electrons. That doesn’t mean that it isn’t true that those electrons have a temperature.

Aren’t temperature and entropy abstractions as well?

I have no idea what this means. Are mass and energy abstractions?

And quantifiable properties aren’t meaningless, they describe objects or systems, like the mass of an object describes its resistance to being accelerated (or if I understand that correctly, one aspect of mass describes this).

If you’re using a definition of “meaningless” that is something like “has no effect on anything” then I agree. But when people apply the term “meaning” to information it tends to get interpreted differently.

So the informational content of a system or an object describes if and/or how many bits of information are contained within the system/object, right? And these bits of information can’t be empty, can they? So what is wrong with claiming that by this definition a photon contains (at least) one bit of information and the value of this bit is the photon’s speed?

I don’t in any way disagree that photons encode information; they certainly do. I was objecting to your description of the information being “about” something.

To summarize this: I understand (or at least I think I do), that by the definition used in information theory DNA/antibodies etc contain information. But it is not always useful to apply the concepts of information theory to molecular biology, because not all processes in molecular biology can be explained by applying concepts of information theory. Nevertheless in many other cases it is indeed extremely useful, as the growing field of bioinformatics proves.

Completely agree. Just because the information in a system can be measured, doesn’t mean that that fact is useful.