Claude Shannon: the man who failed to transform our understanding of information

Shannon’s ‘mathematical theory’ sets out two big ideas. The first is that information is probabilistic. We should begin by grasping that information is a measure of the uncertainty we overcome, Shannon said – which we might also call surprise. What determines this uncertainty is not just the size of the symbol vocabulary, as Nyquist and Hartley thought. It’s also about the odds that any given symbol will be chosen. Take the example of a coin-toss, the simplest thing Shannon could come up with as a ‘source’ of information. A fair coin carries two choices with equal odds; we could say that such a coin, or any ‘device with two stable positions’, stores one binary digit of information. Or, using an abbreviation suggested by one of Shannon’s co-workers, we could say that it stores one bit.

But the crucial step came next. Shannon pointed out that most of our messages are not like fair coins. They are like weighted coins. A biased coin carries less than one bit of information, because the result of any flip is less surprising. Shannon illustrated the point with this graph. You see that the amount of information conveyed by our coin flip (on the y-axis) reaches its apex when the odds are 50-50, represented as 0.5 on the x-axis; but as the outcome grows more predictable in either direction depending on the size of the bias, the information carried by the coin steadily declines.
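The curve described above can be reproduced with a few lines of Python. This is only a sketch: the function name binary_entropy and the sample probabilities are chosen here for illustration and do not come from Shannon's paper or Goodman's article. It computes the information, in bits, carried by one flip of a coin that lands heads with probability p.

```python
import math

def binary_entropy(p: float) -> float:
    """Information in bits carried by one flip of a coin with P(heads) = p."""
    # A coin that always lands the same way carries no surprise at all.
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Illustrative probabilities (assumed values, not taken from the article).
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  information = {binary_entropy(p):.3f} bits")
```

The fair coin (p = 0.5) yields exactly one bit, and the value falls off symmetrically as the coin becomes more biased in either direction.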

…

This is where language enters the picture as a key conceptual tool. Language is a perfect illustration of this rich interplay between predictability and surprise. We communicate with one another by making ourselves predictable, within certain limits. Put another way, the difference between random nonsense and a recognisable language is the presence of rules that reduce surprise.

The unfortunate reality is that the vast majority of science writers, while working within an information paradigm even as they write, write as though the Big Fix (the universe as simply matter and energy) is still just around the corner, along with the multiverse and the success of string theory. And the sure-thing tale of human evolution.

5 Responses to Claude Shannon: the man who failed to transform our understanding of information

The thing you must understand is that Mr. Shannon was originally talking about “signal” and “noise” in WW2 radar signals. Later, he worked for AT&T on digital telephone signals. Reliably sorting Signal from Noise in a VERY narrow context is only distantly related to things like human speech.

But the fact that enhancing the quality of radar reception turned out to have direct application to making general statements about human speech and writing says a lot about how humans (who of course created RADAR and digital telephones) create Information generally. That is, having already created written languages, it was simple for humans to create machines that transmit, receive, and display information of interest to humans. Sorting signal from noise is of course a thing that human children learn as they build their vocabulary. Processing signal into information in context (is the wind “blue”, or is it that the wind “blew”?) requires a fair piece of processing.

I think Rob Goodman’s article is valuable. He introduces Shannon’s use of “weighted” probabilities in Shannon’s approach to ‘language’—which is what actually occurs in nature.

Taking in the bigger sweep of what Shannon is described to have done in this field of language, we see a “non-randomization” of machine language. Human intelligence is used to “constrain” the permissible, and these constraints are applied to an otherwise random process.

Hence, the bigger picture Goodman is giving us is that of ‘intelligence’ finding a way of overcoming random processes so that “information,” as defined by Shannon and accepted by the larger scientific and mathematical community, can be detected; or, to put it another way, be “transferred” from one intelligent agent to another.

E.g., if I arrange 50 bricks to form a large SOS, the constraints imposed on the bricks are not of natural, but human, origin, and have a purpose: namely, to “transfer” information from me to possible rescuers.

Shouldn’t this give us a fairly large hint at what ‘information’ should look like in general? IOW, in the case that Goodman talks about, the “bricks,” i.e., the numbers at first randomly assigned to letters, are ‘constrained’ in their sequencing based on intelligent instructions to the machine that produces the letters, said instructions taking the form of electronic messages.
[This corresponds to my arranging the bricks in such a fashion that SOS is spelled out. What if, instead, I simply took the fifty bricks and threw them randomly in the general area? What would that look like? Would this have anything to do with ‘information’? Of course not.]

(2) An element of ‘information’. [In Goodman’s example, this corresponds to the entire length of the ‘letters’.]

(3) ‘Encoding’ of this information into physical objects via their sequencing. [In Goodman’s example, we see this in “u” coming after “q” and “c” before “k”, as well as in certain ‘words’ coming before others.]

(4) Use of physical constraints to “fashion” the ‘information.’ [In Goodman’s example, we see this in the electronic messaging that tells the program how to sequence. The “electronic messaging” constrains what the machine is permitted to do.]

In everyday experience, this is like me taking a pencil and writing down a message. The path of the pencil tip is ‘guided’ by my hand, and constrains the space of the paper so as to convey ‘information.’

Then, finally, (5) the ‘transfer’ of information.

Without this last step, information cannot exist as an intelligent phenomenon. This should be obvious.

In fact, all of this stuff should be fairly straightforward.

Boiling down to essentials, here is what we need to have:
TWO intelligent agents, and a ‘transfer’ must take place.

If it isn’t already obvious, can a monkey tell the difference between what Shannon originally began with in Goodman’s example, and the final product? No. It’s not intelligent in the ways that humans are: i.e., it is not conscious and free. No ‘transfer,’ then, can take place if only one side of the ‘transfer’ is intelligent.

Now look at the cell.

First, there’s the DNA. It is obviously constrained. Just ask Craig Venter. [DNA is a product of intelligence in the same way that the final sequence of Goodman’s ‘Shannon’ example is the product of intelligence. Does anyone want to say that the change from the beginning sequence to the final one happened by chance? No.]

But, then there’s the ‘transfer of intelligent information’ that must take place.

So, we see, let’s say, Craig Venter inputting a DNA sequence into his primitive bacteria (again, an intelligent ‘product’), and must ask: how is his ‘information’ (already transferred from his learned mind to the nucleotide sequence he’s inputting) going to be ‘transferred’?

Quite simply: it requires another “intelligent” agent. This “intelligent agent” is the spliceosome/ribosome, which translates this ‘information’ into another physical form: proteins, and other cellular products.

It is the spliceosome/ribosome that is the “other intelligent agent” to which the ‘information’ in Venter’s stretch of DNA is ‘transferred.’ [That Craig Venter can communicate with ribosomes means only one thing: the ribosome is an intelligent agent.]

So the question becomes: can ‘random’ mutational processes bring about some physical object capable of ‘transferring’ INTELLIGENT information to another physical object that recognizes this ‘transfer’? (BTW, all this talk about entropy as information is no more than the use of a metaphor. In the physical world, where energy is ‘transferred’ and not information, only human beings, intelligent beings, can even begin to process such transfers.)

Differential pulse code modulation is one digital coding scheme that uses the novelty of information to reduce bandwidth and/or increase bit resolution. It predicts what the next symbol should be, based on a shared algorithm available to both the transmitter and receiver, and encodes only the distance (in symbol space) from the expected symbol to the actual symbol.

This concept is also the foundation of many forms of data compression. If one is compressing an image with lots of blue sky, for example, it’s a lot more efficient to represent the color information for sky pixels by encoding the color difference from a certain sky blue color value than either from white or black.
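The prediction-and-difference idea can be sketched in Python. This is a simplified, lossless illustration and not a description of any particular codec: the shared predictor is assumed to be "the next sample equals the previous one," the starting prediction of 0 is an arbitrary convention, and the sample values standing in for sky pixels are made up. Practical DPCM systems usually also quantize the differences, which this sketch omits.

```python
def dpcm_encode(samples):
    """Replace each sample with its distance from the predicted value."""
    prediction = 0  # assumed starting state, known to both sides
    residuals = []
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # assumed predictor: next sample == this one
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by running the same predictor as the encoder."""
    prediction = 0
    samples = []
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples

# Hypothetical brightness values for a run of similar sky pixels.
sky = [200, 201, 201, 202, 200, 199]
residuals = dpcm_encode(sky)
print(residuals)               # [200, 1, 0, 1, -2, -1]
print(dpcm_decode(residuals))  # [200, 201, 201, 202, 200, 199]
```

After the first value, every difference is small, so it can be stored in far fewer bits than the raw samples. The receiver recovers the original exactly because it runs the same predictor as the transmitter.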

This is also how inside jokes work. A group of friends may need only to say one word to each other to impart what to an outsider would take a great deal of explaining, simply because the group of friends knows what the others are most likely thinking. Only a single word is necessary to pin it down.

Marriages can work that way too, but there at least one of the partners relishes verbal communication for purposes in addition to the mere transfer of factual “information” :-).

If we say the goal is increasing mutual information between an organism and a complex feature such as an eye, and that evolution is not directed towards producing eyes, then evolution cannot account for complex functionality such as eyes.

Say Y is the current generation and Z is the next generation of an organism.

The data processing inequality (DPI) states that if X does not tell us anything about generating Z from Y (i.e. evolution is not directed towards X),

H(Z|Y) = H(Z|Y,X),

then the mutual information between organism and X can never increase during the course of evolution,

I(X;Y) >= I(X;Z).

So, evolution, insofar as it is undirected towards complex functionality, cannot account for its occurrence. The information that allows complex functionality, such as eyes, to form must come from outside of evolution.

This is really what Dembski’s CSI metric and conservation of information boil down to. CSI is the mutual information between the event and the target, and conservation of information says that without impartation of external information (exogenous) about the target, then CSI cannot be increased. So as you can see, Dembski’s work is a reformulation of the data processing inequality.
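The inequality I(X;Y) >= I(X;Z) cited above can be checked numerically for a small case. The sketch below is only an illustration of the inequality itself, under the stated condition that Z depends on X only through Y. The binary variables and all the probability tables are invented for the example, and the helper mutual_information is a name introduced here. It says nothing about whether the condition holds for any real biological process.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information in bits from a dict {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

# Assumed toy distributions: X -> Y -> Z, each variable binary.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_z_given_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}

# Joint of X and Y, and joint of X and Z obtained by summing over Y.
joint_xy = {
    (x, y): p_x[x] * p_y_given_x[x][y]
    for x, y in product((0, 1), repeat=2)
}
joint_xz = {
    (x, z): sum(p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z] for y in (0, 1))
    for x, z in product((0, 1), repeat=2)
}

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
print("I(X;Y) >= I(X;Z):", i_xy >= i_xz)
```

Because Z is built from Y alone, the condition H(Z|Y) = H(Z|Y,X) holds by construction, and the computed I(X;Z) does not exceed I(X;Y).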