Sunday, February 26, 2012

Yet Another Creationist Misunderstands Information Theory

It's always funny to see a creationist try to use information theory, because they almost always get it wrong. Here we have Joseph Esfandiar Hannon Bozorgmehr, who posts under the name "Atheistoclast", demonstrating his ignorance: "Matzke misunderstands what is meant by 'new information'.

He apparently thinks that new genes, produced by duplication, represent novel information. But if you copy one gene 1000 times over, the information content remains the same even though you have created many more genes.

Poor Bozorgmehr needs to sit in on my course CS 462 at the University of Waterloo, where we will shortly discuss this very issue. Then he can prove the following theorem: there are infinitely many strings x for which K(xx) > K(x); more generally, K(x^n) - K(x) is unbounded as n grows.

27 comments:

"Welcome to Jeopardy, Prof. Shallit. Why don't you tell us two things about yourself?"

"Well, I have a beard. Also, I have a beard."

There must be some sense of the word "information" in which another copy of the same sentence does not add information. It may not be true of Kolmogorov information, but it seems true of what most people (even most scientists) think of as information.

Is there a way to reconcile these two different ideas? Has Kolmogorov's definition failed to correctly formalise our intuitive concept of information? (GASP!)

The other question, of course, is which should apply to biological systems.

(Incidentally, is the reason (roughly) why K(x^n) - K(x) is unbounded because the program must specify n? We can make the program as long as we like by making n huge.)
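The parenthetical intuition can be checked empirically with an off-the-shelf compressor. A compressor's output length is only an upper bound on Kolmogorov complexity, but the trend is illustrative: the compressed size of n copies creeps up slowly, and the extra bits mostly go toward encoding n itself. A minimal Python sketch (the 64-byte "gene" is made up for the demo):

```python
import zlib

# A toy "gene": repeating it should add only about log(n) bits of information.
x = b"ACGTTGCA" * 8  # 64 bytes

sizes = {}
for n in (1, 10, 100, 1000):
    sizes[n] = len(zlib.compress(x * n, 9))
    print(f"n={n:5d}  raw={len(x) * n:6d} bytes  compressed={sizes[n]:4d} bytes")

# The raw size grows by a factor of 1000; the compressed size barely moves.
```

Note this only bounds K from above: a smart enough description could be even shorter, which is exactly the upper-vs-lower-bound point made later in this thread.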

If Kolmogorov complexity does not capture your notion of information, then it's up to you to provide a rigorous definition for discussion and debate. So far what creationists do is rely on our informal notions, which are vague and inconsistent.

Even in the example you give, it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once. For example, suppose there's a confederate listening on the television, and we've pre-arranged an action depending on whether I say it once or twice. "One if by land, two if by sea".

As for your final question, no, the program need not necessarily "specify n", whatever that means. But you are right that at least most of the additional information can ultimately be attributed to the number of copies.

The second line can be made arbitrarily long by making n arbitrarily large (ignoring such practicalities as the limits of the integer kind). So I can place an arbitrarily large amount of information into n itself.

Obviously, that's not how a mathematician would argue - I really should get around to reading Li and Vitanyi's textbook. Their article (with Kirchherr) you linked to a while back ("miraculous universal distribution") was excellent, though as a scientist I am a tad worried that the complexity of a string is non-computable. It doesn't seem like there will be a simple "numerical recipe" for hypothesis testing with Kolmogorov, and yet Kolmogorov's thesis says that his definition is provably better than yours.

"As for your final question, no, the program need not necessarily "specify n", whatever that means." I think rather concretely about this, so being a fortran programmer I had in mind:

Well, see, that's the point. You demonstrate a fundamental misunderstanding here: producing a single program to print x^n says nothing at all about the Kolmogorov complexity of x^n. You are confusing upper bounds with lower bounds - a common mistake among beginning students.

The correct argument goes the other way. Given a program to produce x^n, we can deduce from that a program to produce n. Since we know that infinitely many n are incompressible, this gives the result.
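To make the upper-bound point concrete: writing down a literal program that prints x^n only bounds K(x^n) from above, by roughly |x| plus the digits of n plus a constant. A hedged sketch (in Python rather than the commenter's Fortran; the helper name is my own, and real Kolmogorov complexity is defined over a fixed universal machine):

```python
def upper_bound_program(x: str, n: int) -> str:
    """Return source code for a program whose output is x repeated n times.

    The LENGTH of this source is an upper bound on K(x^n); it says nothing
    about a lower bound.
    """
    return f"print({x!r} * {n}, end='')"

src = upper_bound_program("ab", 1000)
print(len(src))          # grows like len(x) + digits of n, not n * len(x)
print(len("ab" * 1000))  # the output itself is 2000 characters
```

Exhibiting one short program never shows a string is complex; only the nonexistence of any shorter program would, and that is what makes lower bounds hard.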

You could think of it in terms of a physical problem. If you had a string of alternating black and white beads, it would take a certain amount of energy to make the initial string.

Suppose that you then used that string as a template for making copies. Even if copies were cheaper to make than the original, they would cost something in energy: it would take n times as much energy to make n copies as it would to make one copy.

I subscribe to one of those brain-dead Christian podcasts, and for the last couple of episodes they've been going over, in detail, the "book" Me, The Professor, Fuzzy, and The Meaning of Life, which starts with basic principles and builds up to a "proof" of the Christian God.

The syllogism basically goes like this:

1. The universe had a beginning.
2. Every event has to have a cause.
3. Entropy is always increasing.
4. There are two exceptions to entropy increasing: life and intelligence.
5. Entropy is the same as disorder. Complex things are ordered, therefore complexity is always decreasing.
6. Except when there's intelligence involved.
7. Since we see complex things in the world today, and complexity is always decreasing, then back at the beginning of the universe, it had to be much more complex.
8. Since the only way to get complexity is with intelligence, there must have been a great intelligence who put all that complexity into the universe at the beginning.

9. Therefore the cause of the universe had to be very intelligent, which is another way of saying God.

(He goes on later to show how it must specifically be the Christian version of God)

I see three main problems with this. The first one I'm good with, but Jeffrey, can you comment on the second and third?

Premise #2, that every event has to have a cause, I know to be false from my knowledge of quantum physics (I was an EE major in college back 30 years ago, so I had a fair amount of QM and some information theory).

If you relate complexity and entropy, they're not inverses of each other - complexity is an equivalent concept to entropy, isn't it? Can't you basically get Boltzmann's formula by taking the log2 of the probability of something or other?

And finally, does the 2nd Law of Thermodynamics apply to the information kind of entropy? If we say that complexity is kinda like entropy, does it follow that complexity must be increasing?
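For what it's worth, the log2-of-probability hunch is essentially Shannon's entropy, H = -sum p*log2(p), measured in bits; Boltzmann's S = k ln W is the thermodynamic analogue, with the constant k fixing the units. A small sketch of the empirical version (my own illustration, not from the thread):

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Empirical Shannon entropy in bits per symbol: H = sum p * log2(1/p)."""
    counts = Counter(s)
    total = len(s)
    return sum((c / total) * log2(total / c) for c in counts.values())

print(shannon_entropy("aaaa"))        # 0.0 bits: one symbol, no uncertainty
print(shannon_entropy("abab" * 10))   # 1.0 bit: two equally likely symbols
print(shannon_entropy("abcd" * 10))   # 2.0 bits: four equally likely symbols
```

Note this measures uncertainty under an assumed symbol distribution; like Kolmogorov complexity, it is blind to whether the string means anything to a cell or a reader.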

"Even in the example you give, it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once."

Has the potential, yes. In biology, too, it has the potential. (IOW, Atheistoclast may be wrong.) But Luke said "There must be some sense of the word "information" in which another copy of the same sentence does not add information." You didn't answer yes or no, whether in the realm of speaking words, or in the realm of biology. You avoided answering by demanding a "rigorous definition for discussion and debate."

You didn't answer yes or no, whether in the realm of speaking words, or in the realm of biology.

Miranda: I'm not a linguist or biologist. I'm a mathematician and computer scientist. In the standard Kolmogorov definition of information as used by mathematicians and computer scientists, and explained in dozens of papers and books, doubling a string is guaranteed to increase information infinitely often.

The technical understanding of "information" should not be confused with various vague folk understandings of the word - in the same way that the folk understanding of the word "field" has little to do with how it is used in algebra or vector analysis.

It is quite unwise to overload a term like "information", which is well-understood in mathematics and computer science, to mean something entirely different from what professionals in those subjects expect.

I could be misinterpreting you, but the following two sentences appear to contradict each other, or are at least talking about different things:
1) "...it's clear that repeating a phrase twice has the potential to provide more information than the phrase uttered once."
2) "... doubling a string is guaranteed to increase information infinitely often."

Most times I read nonsensical arguments in philosophy or theology, the underlying fallacy is of the same sort.

Usually there are two meanings of a word floating around: a jargon meaning (either pre-existing or cooked up specially for the purpose) and an everyday meaning. Then the error consists in managing to confuse the two: introducing an example in one sense, and using as if it were an example in the other sense.

In The Blind Watchmaker, Dawkins says that a tree shedding seeds is literally raining information - it wouldn't be more true if the tree were raining floppy disks.

Do biologists use Kolmogorov information when talking about the information in biological systems? Is it part of the biologist's mathematical toolkit? Does anyone know a good review paper or book on this?

Yes, Kolmogorov information (or variants of it) is used all the time in biology. One application I know of is in constructing phylogenies. You can read Ming Li's work on the subject, probably easily findable with a google search.

Interesting. I just happened to have one of Ming Li's papers on the table beside my desk this morning. He uses Kolmogorov information as an 'information distance' when clustering homologous sequences for, say, a protein family. He defines the information distance as 'the length of the shortest binary program that is needed to transform' two sequences into each other. I wouldn't say that Kolmogorov information is used in biology 'all the time', but it is an interesting distance measure for sequence clustering.

Whether Kolmogorov distance is 'meaningful' to the cell is another question (i.e., whether it makes any structural or functional difference). For example, some sequences can have a large Kolmogorov distance from each other, yet be indistinguishable to the cell as far as structure and function go. Another sequence may have a single 'knock out' mutation that renders the sequence non-functional for the cell, yet have a very short Kolmogorov distance. What I'm saying is that Kolmogorov information may be 'meaningful' to us, but not necessarily to biology. Sometimes yes, sometimes no ... an indication that more work needs to be done to come up with an information measure that is more relevant to biology.
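For readers who want to try this: the information distance described above is uncomputable, so in practice it is approximated by the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length under a real compressor. A sketch using zlib (zlib is my stand-in here; work in this area typically uses stronger, domain-tuned compressors):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: a computable approximation to
    information distance.  Near 0 for near-identical inputs; near 1 for
    unrelated ones (it can slightly exceed 1 due to compressor overhead)."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

seq_a = b"ACGTACGTACGTACGTACGT" * 20
seq_b = b"ACGTACGTACGTACGTACGT" * 20   # identical to seq_a
seq_c = bytes(range(256)) * 2          # unrelated, high-complexity data

print(round(ncd(seq_a, seq_b), 3), round(ncd(seq_a, seq_c), 3))
```

Note the caveat raised above still applies: two sequences with small NCD can differ by a single knock-out mutation that matters enormously to the cell.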

I think it unlikely that there will be a single information measure "relevant to biology". Instead, mathematicians and biologists will develop measures relevant & appropriate to the problem at hand. But when ID creationists "prove" bogus theorems about their measures and fail to admit the theorems are wrong, they do a disservice to science and mathematics.