One of the important functions fulfilled by governments is to
provide enforcement of standards, like weights and measures.
Having clear and objective measures is essential for trade.
History provides many instances of people putting rather a lot
of effort into making sure that measurements are made fairly.
I think that those who wish to make arguments whose claims are
couched in terms of quantities must provide the scale by which
those quantities are determined. If they do not do so, and it
appears that the arguments presented otherwise show certain
flaws or inconsistencies, I think it is perfectly appropriate
to classify the original claims as mistaken apologetics or
polemics.
From Lee Spetner:
LS>Thank you for forwarding me the questions that have arisen
LS>about how I define and measure genetic information. The
LS>presumption is that unless I quantify the information in a
LS>gene, I am not entitled to say, for any mutation, whether
LS>the gene gains or loses information. I reject that
LS>presumption.
The presumption is that if one asserts that certain
quantities differ, one can provide an objective measure
by which those quantities are determined. This seems a generally
reasonable sort of presumption. I might have guessed that
Spetner would reject the presumption. What remained was to
see how good an argument Spetner could muster for avoiding
quantifying information change such that everyone could
objectively do the job. I remain unconvinced that Spetner's
argument does more than show that certain classes of genetic
change must necessarily decrease information under
some relevant information measure. Spetner does not show that
the sub-class of genetic change mentioned comprises all
possible genetic change.
LS>Before addressing the issue of quantifying the information
LS>in a gene, let me point out that all the random mutations I
LS>discussed in my book (and by extension, all known mutations
LS>whose molecular structures have been examined) cannot serve
LS>as prototypes for the mutations that are supposed to make
LS>up the long series of evolutionary steps claimed by
LS>neo-Darwinists to have led to major evolutionary advances.
Spetner's opinion is noted.
LS>They cannot serve as prototypes of the mutations in the
LS>steps that are supposed to have led from a single cell to
LS>an insect, from a fish to a mammal, and so on. Most of
LS>these mutations are single-nucleotide substitutions that
LS>disable a control gene. Disabling a gene cannot be a recipe
LS>for evolutionary advance. Although sometimes, perhaps, a
LS>gene would have to be disabled in the course of evolving a
LS>new enzyme, such disabling cannot represent a major portion
LS>of what has to occur to achieve a new function. It cannot
LS>even represent a small fraction of what must occur. Most
LS>mutations in a putative series of evolutionary steps
LS>leading to a new species or a new order, class, or phylum,
LS>must add to the genome the information necessary to achieve
LS>that advance. It should be clear that information must be
LS>added to the genome to evolve a bacterium into a human, or
LS>even into a fruit fly. One who insists that it is not
LS>obvious that a human genome contains vastly more
LS>information than that of a bacterium is a sophist.
On the other hand, those who would argue that the human
genome contains more information than certain amoebae or
amphibians simply are not familiar with the data.
But already Spetner has introduced *meaning* into the
discussion. Information does not have any simple relationship
with meaning. Whether a point mutation disables, enables, or
re-enables a protein product makes no difference whatever
under a Shannon measure of information applied to the sequence
of base pairs; only the symbol statistics matter, not the
functional consequence. And so one possibility, that Spetner simply
utilizes a Shannon information measure as his scale, is shown
to be false.
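To make this concrete, here is a small Python sketch using a
zeroth-order (symbol-frequency) Shannon measure. The two
sequences are hypothetical: they contain exactly the same
multiset of bases, but the second has a premature stop codon
(TAA) in frame and would yield a disabled protein product. The
measure cannot tell them apart:

```python
from collections import Counter
from math import log2

def shannon_entropy(seq):
    """Zeroth-order Shannon entropy in bits per symbol,
    computed from the symbol frequencies of the sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Hypothetical coding sequences with identical base composition;
# the second reads ATG GCT TAA ... and stops at codon three.
functional = "ATGGCTGAATCAAGA"
disabled   = "ATGGCTTAAGCAAGA"  # same multiset of bases, rearranged

assert sorted(functional) == sorted(disabled)
print(shannon_entropy(functional) == shannon_entropy(disabled))  # True
```

The functional consequences differ drastically; the Shannon
measure, which sees only symbol statistics, does not change.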
LS>If no mutation that has been studied is of the type needed
LS>for neo-Darwinian macroevolution, then there is no
LS>molecular evidence that random mutations and natural
LS>selection can achieve that evolution.
Even if one credulously assumes the premise is true, this is a
non sequitur. Research can (and does) uncover the condition
of linkage disequilibrium, which is indicative of the action
of natural selection, although identifying the mutation or
type of mutation which led to that condition is a separate and
tougher problem. The molecular evidence for natural
selection, though, exists with or without that identification.
Nor do I accept that no such mutations have been studied.
LS>Sure, we know many single-nucleotide substitutions that can
LS>lead to microevolution. But there is no argument about
LS>microevolution. My argument is against the premise that
LS>random mutation, even with the help of natural selection,
LS>is the driving force behind an evolutionary advance from a
LS>primitive cell to human beings. There is no genetic
LS>evidence for such a premise.
A relevant question is whether Spetner can admit that any
such evidence is possible. If no amount of evidence taken
from modern organisms can be accepted by Spetner as having
relevance to the issue, then it would seem that the claim,
while sounding portentous, means nothing.
But I'm looking at a claim at a different level. It has been
said here, with the invocation of Spetner as an authority,
that no mutation can in principle increase genetic
information. It is this claim that most clearly needs the
quantification method specified so that we can resolve whether
to consider it true (supported by the available evidence),
false (contradicted by evidence), or rhetorical (evidence is
superfluous to acceptance of the statement).
[Quote]
"Information theory, which was introduced as a discipline
about half a century ago by Claude Shannon, has thrown new
light on this problem. It turns out that random variation
cannot lead to large evolutionary changes. The information
required for large-scale evolution cannot come from random
variations. There are cases in which random mutations do lead
to evolution on a small scale. But it turns out that, in these
instances, no information is added to the organism. Most
often, information is lost. A process that adds no heritable
information into the organism cannot lead to the grand
evolutionary advances envisioned by the neo-Darwinians."
(Spetner, 1997 p.vii)
[End Quote - SE Jones quoting Lee Spetner, post on 1998/06/18]
I find the invocation of Shannon in this manner amusing,
since his analysis cited all sorts of stochastic sources
as producing information.
LS>I submit that one need not measure the information in a
LS>gene to know if a particular mutation has added or
LS>subtracted information.
I submit that if one wants to make a claim about comparing
quantities, one had better provide a means of measurement.
LS>There is no general way of measuring the information in a
LS>single message without relating it to the ensemble of
LS>messages from which it was chosen.
If one's task is to precisely determine the absolute number of
bits of information in a message such that this number does
not change, then Spetner's comment is both relevant and
correct. But our task is to determine whether a change in
some message causes an information increase, decrease, or no
change. For this purpose, one does not need to have an
absolute measure of information; a relative measure may serve
quite well, even if we have to specify a number of assumptions
in order to apply it. Since both our original analyzand and
the changed version represent messages taken from the same
ensemble of messages (or ergodic source to use Shannon's
terminology), we could specify a minimal ensemble that covers
production of both messages as a basis for measuring their
information content. We then find the information content of
each message relative to this minimal ensemble, and can then
compare the numbers, which tell us which of the two messages
has the greater information content. Of course, this does not
tell us in some absolute sense how many bits of information
each message contains, but that data was not necessary to
resolve our problem of interest.
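The comparison procedure above can be sketched in a few lines
of Python. The sequences here are hypothetical, and the pooled
symbol statistics of the two messages stand in for the minimal
shared ensemble; the point is only the mechanics of a relative
comparison:

```python
from collections import Counter
from math import log2

def info_content(message, dist):
    """Total information (bits) of a message, scored against a
    shared symbol distribution `dist` mapping symbol -> probability."""
    return -sum(log2(dist[s]) for s in message)

def relative_comparison(original, changed):
    """Score both messages against a minimal shared ensemble:
    the symbol statistics of the two messages pooled together."""
    pooled = Counter(original) + Counter(changed)
    total = sum(pooled.values())
    dist = {s: c / total for s, c in pooled.items()}
    return info_content(original, dist), info_content(changed, dist)

# A deletion: the changed message is a shortened variant.
before, after = relative_comparison("ATGGCTGAATCAAGATGA", "ATGGCTTGA")
print(before > after)  # the shorter message scores fewer total bits
```

Neither number is an absolute information content, but the
comparison between them is objective once the shared ensemble
is specified.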
If we have knowledge of the ensembles of possible messages,
then H is determined with respect to the known statistical
structure of the ensemble. That statistical structure
concerns the probability of each symbol, and the conditional
probabilities by which prior symbols in a sequence are
correlated with the symbols that follow them. But this
statistical structure is a finding of
empirical research in the biological cases that we are
discussing, and thus results from a process of discovery. Our
initial state is ignorance of the statistical properties of
possible ensembles. But by Shannon's definition of an ergodic
source, we know that the statistical structure will be
reflected to some degree in every message generated by it. We
can characterize the ensemble by the properties of the
messages taken from it. If Spetner's measure of information
does not retain this property, that is an important and
telling datum.
If despite what I pose above, Spetner finds that even a relative
information measure is inappropriate, then I submit that broad
claims about whether mutational processes increase or decrease
information are also inappropriate, as no suitable metric can
be applied in general as a basis for making the claim.
LS>Similarly, there is no general way of measuring the entropy
LS>in a single message without relating it to the ensemble of
LS>messages of which it is a member.
So, under a Spetnerian information measure, no analysis can
occur without the complete knowledge of the ensemble of
messages from which some particular message of interest is
taken. This is just about completely inverted from Shannon's
analysis, where the properties of the message approach the
properties of the ensemble in the limit as the message length
approaches infinity. The longer the message, the less likely
it is that the statistical distribution of symbols within it
might differ significantly from those of the ensemble of
possible messages. (And for an infinite-length message, that
likelihood is demonstrably *zero* under Shannon's analysis.)
If Spetner's measure of information does not retain this
property whereby sources can be characterized mathematically
from the contents of messages, then it is highly questionable
whether it ought to be applied to any real-world situation
where our knowledge of such sources is the result of a process
of discovery. Specifically, it should not be applied to
genetic information if it does not have this property of
Shannon's measure.
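This convergence property is easy to demonstrate numerically.
The sketch below draws messages of increasing length from a
source with known (and here entirely made-up) symbol
probabilities, an i.i.d. stand-in for Shannon's ergodic
source, and shows the entropy estimated from the message
approaching the true source entropy:

```python
import random
from collections import Counter
from math import log2

def empirical_entropy(msg):
    """Entropy in bits per symbol estimated from a single message."""
    n = len(msg)
    return -sum((c / n) * log2(c / n) for c in Counter(msg).values())

# Illustrative source probabilities (an assumption, not data).
probs = {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3}
true_H = -sum(p * log2(p) for p in probs.values())

random.seed(0)
symbols, weights = zip(*probs.items())
for n in (100, 10_000, 1_000_000):
    msg = "".join(random.choices(symbols, weights=weights, k=n))
    print(n, round(abs(empirical_entropy(msg) - true_H), 4))
```

The error shrinks as message length grows, which is exactly
the property that lets us characterize an unknown source from
the messages it produces.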
LS>Shannon was careful to avoid relating the information
LS>measure he was defining to the meaning contained in a
LS>message. The communication engineer must build a
LS>communication channel that will faithfully transmit a
LS>message regardless of how much meaning the customer
LS>attaches to that message.
Yes, the distinction that Shannon made between information and
meaning is one clue that what Spetner means by "information"
is not what Shannon meant by "information".
[Quote]
"You can easily add symbols to a message and not add
information: just add random symbols. Then you won't be adding
information - you'll be adding only nonsense. Similarly, if
you add random nucleotides to the genome you add no
information. Symbols without meaning carry no information."
(Spetner, 1997 p83)
[End Quote - SE Jones quoting Lee Spetner, post on 1998/06/18]
LS>There is no adequate definition of the information in a
LS>message without relating it to the ensemble of messages
LS>that could have been sent.
This is a statement that there is no adequate definition of
information that would yield an *absolute* measure that was
not subject to change if our knowledge of the ensemble
changed.
But the lack of an *absolute* information measure may not be a
reason to avoid using some information measure to *compare*
two messages taken from the same ensemble, as described above.
If lack of this absolute information measure is asserted to
apply regardless, then claims about quantities of information
before and after mutation should be modified to reflect the
subjective and speculative nature of the claim.
LS>Thus I cannot expect to measure the information in an
LS>arbitrary paragraph of English text. Nor can I expect to
LS>measure the information in a section of a genome.
Spetner is overlooking the possibility of quantifying
information for comparative purposes. We can get a relative
information measure and thus objectively determine whether
Paragraph A contains more, less, or the same amount of
information as Paragraph A' which is Paragraph A after
alteration under a specified set of assumptions. And we can
do the same for two sections of a genome, one of which
represents a change from the other.
See
for a brief discussion of information entropy as applied to
sections of genomes.
LS>But whatever the information in a paragraph of text, if I
LS>struck out one or more sentences, I can be sure that I have
LS>not increased the information. Rather, I can confidently
LS>say that I have decreased the information.
This is true under a Shannon measure. The number of bits in
a message increases with the length of the message, and thus
a deletion reduces the number of bits.
But without seeing Spetner's information measure, it is
impossible to know whether information goes up, down, or
sideways with deletions when it is employed. It is always
possible that Spetner intended and believed his equation for
information to make information decrease with decreasing
message length, but that it might not actually do so. Until
Spetner produces the equation, this point is in doubt.
LS>(I exclude the case in which the paragraph was nonsense and
LS>didn't contain any information to begin with. In such a
LS>case the information was zero both before and after I
LS>struck out the sentences.)
Again, I have to wonder what information measure Spetner uses.
Under Shannon, information and meaning are separate concerns.
But Spetner's equation of "nonsense" and lack of information
content conflates and confuses meaning with information.
Under Shannon, the messages with zero information content are
those composed of a single symbol repeated over and over
again. Those messages with
maximal information content are those with the property of
equal proportions of each symbol within the message. This has
nothing whatever to do with the meaning that is taken from the
message. Once one introduces meaning into the discussion
(other than to exclude it from further discussion), it is not
clear that information theory is still being discussed.
LS>This example shows that indeed one can sometimes determine
LS>whether a change in a message has decreased the information
LS>without having quantified the information of the original
LS>message.
Only by knowing the properties of a relevant information
measure can this assertion be made. The assertion is
consistent with the properties of a Shannon information
measure. I don't know what the Spetner information measure
looks like, except that it cannot be the Shannon measure.
By symmetry, though, one can by Spetner's argument also
determine that certain changes necessarily increase
information without having a quantification in hand.
LS>I hold that the disabling of a genetic function is a
LS>decrease in information.
This is not necessarily so. Even if we allow meaning to be
conflated with information, it would depend upon the manner in
which disabling occurs.
Spetner has discussed the case in which a point mutation
causes a disabling of the function of the protein product.
This introduces an argument which is premised upon meaning
rather than information. A change of base due to a point
mutation does not necessarily change message length, and may
only alter the information content by some small fraction of
the total number of bits needed to represent the entire
allele. A point mutation is only occasionally going to alter
the information content of any biological allele by more than
just a few bits (and in those cases it will be due to changing
the allele length by altering some codon such that it either
ceases to be a stop codon or becomes a stop codon), even if it
rather drastically alters the function one obtains from its
protein product. Disabling a genetic function may thus, even
in the case of a point mutation, be associated with a decrease
in information (e.g., truncation of allele due to change to an
early stop codon or decrease in heterogeneity of bases within
the allele), no change in information (e.g., change of codon
in a critical region to code for an amino acid that reduces or
eliminates function but where the base changed to has the same
frequency of occurrence as the original base), or an increase
in information (e.g., change of the stop codon to another
codon effectively lengthening the allele or an increase in
heterogeneity of bases within the allele).
Another class of cases can be brought up, where a change in
one gene causes the production of a protein product which
acts to inhibit the function of a second gene's protein
product, or which causes that second gene to be repressed.
In these cases,
the inhibition of function of one protein or the repression of
a gene is actually the function of another protein, and thus
represents an increase in information.
LS>Disabling a repressor gene is a decrease of
LS>information.
It depends upon how it is done, as explained above. Take an
analogy from neurobiological modeling. A stable neural
circuit is much easier to implement when one has both
excitatory and inhibitory synapses to work with. The addition
of inhibitory synapses to an unstable circuit can stabilize
it. Function improves, and the underlying complexity of the
neural structure also increases. This kind of inhibition
represents an information increase, not a decrease. This can
be contrasted to increasing the effect of already existing
inhibition in such a circuit by the expedient of removing
excitatory synapses, which would correspond to the "striking
out" mentioned by Spetner before.
LS>It's like striking out a sentence in a paragraph.
OK, let's see how this goes.
Paragraph 1:
"Collect eggs. Warm skillet. Add pat of butter to skillet.
Break eggs into skillet. Season with salt and pepper. Turn
eggs once. Throw skillet at ceiling, swallow eggshells. Cook
until whites have light brown color at edges. Serve."
Paragraph 2: "Collect eggs. Warm skillet. Add pat of butter
to skillet. Break eggs into skillet. Season with salt and
pepper. Turn eggs once. Cook until whites have light brown
color at edges. Serve."
Paragraph 3: "Collect eggs. Warm skillet. Add pat of butter
to skillet. Break eggs into skillet. Season with salt and
pepper. Turn eggs once. Ignore the next sentence. Throw
skillet at ceiling, swallow eggshells. Cook until whites have
light brown color at edges. Serve."
Which paragraph has the most information? By Shannon, it is
number three. Paragraph number two has the least information,
and paragraph one has an intermediate amount of information.
Spetner's analysis is like looking at the change from P1 to P2
while ignoring the fact that changes like going from P1 to P3
can happen as well.
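One can check the paragraph comparison directly with the same
simple total-information measure (per-character Shannon
entropy times length). The ordering falls out of the lengths,
since the per-character statistics of the three variants are
nearly identical:

```python
from collections import Counter
from math import log2

def total_bits(text):
    """Length times per-character Shannon entropy: total
    information of a paragraph under a zeroth-order measure."""
    n = len(text)
    H = -sum((c / n) * log2(c / n) for c in Counter(text).values())
    return n * H

strike = "Throw skillet at ceiling, swallow eggshells. "
base = ("Collect eggs. Warm skillet. Add pat of butter to skillet. "
        "Break eggs into skillet. Season with salt and pepper. "
        "Turn eggs once. ")
tail = "Cook until whites have light brown color at edges. Serve."

p1 = base + strike + tail                            # original
p2 = base + tail                                     # strikeout applied
p3 = base + "Ignore the next sentence. " + strike + tail

print(total_bits(p3) > total_bits(p1) > total_bits(p2))  # True
```

Paragraph three carries the most bits and paragraph two the
fewest, regardless of which one makes the best breakfast.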
LS>The strikeout might improve the readability of the text,
LS>but it is not an addition of information. Certainly, one
LS>cannot write a book by starting with a few paragraphs and
LS>blue-penciling them. One might improve those paragraphs
LS>(analogous to microevolution), but one could never produce
LS>a book that way (analogous to macroevolution).
Fortunately, "strikeouts" are not the only sorts of genetic
changes that have been documented. Deletions of genomic
content do happen, to be sure, but this is not the same thing
as a point mutation altering an allele. A point mutation is
conceptually quite different from a deletion. It is a
substitution of one base for another. The length of
the genetic segment in question quite commonly remains the
same after the point mutation as it was before (where the
genetic segment is defined as continuing until a stop or
nonsense codon is reached). The *effect* of a point mutation
can sometimes be the loss of function of the protein product,
but that is a question of meaning, not information.
LS>This analogy applies to mutations like the disabling of a
LS>repressor gene (which can cause the overproduction of an
LS>enzyme) or degrading the specificity of an enzyme (which could
LS>increase the enzyme's activity on some other substrate), even
LS>though such mutations might be beneficial under special
LS>circumstances.
And again this objection has more to do with meaning than
with information.
LS>Neo-Darwinian macroevolution is supposed to proceed by
LS>getting rare lucky mutations, one after another, each
LS>installed in the population by natural selection.
Simple models look at serial changes, but there is nothing
in the theory to exclude alleles being acted upon in parallel.
I would suggest reading Sewall Wright with reference to
"interaction systems".
LS>Single isolated adaptive mutations of the types that have
LS>been found are not sufficient. Eventually some real
LS>information has to be added to achieve macroevolution. The
LS>classic scenario of the neo-Darwinist is to duplicate a
LS>gene and then have it evolve without losing the function of
LS>the original gene.
I think it fairer to say that the duplicate originally codes
for a protein product which has the same function as the
original. Once duplicated, there is no necessity that
it continue to retain the original function.
LS>The duplicate might first lose some of its function, but
LS>then it has to build up something new.
Or, given that the original copy provides sufficient function,
it could simply drift.
LS>To use our example of reducing the specificity of the gene,
LS>it might be beneficial first to reduce specificity so as to
LS>grant the enzyme some activity on a new substrate. But that
LS>can be only the beginning. The second job is to have random
LS>mutations increase the specificity of the enzyme for the
LS>new substrate. The first is easy and can be done
LS>quickly. The second is much harder, and we have no evidence
LS>that it has ever occurred, in spite of the necessity for
LS>orders of magnitude more of this kind of mutation than for
LS>those of the type that disable a gene.
Really? What about nylon-eating bacteria? On what grounds
is this example excluded as evidence of the sort Spetner
would want to see?
There are known mutations that definitely result in increases
in information under a Shannon measure. Autotetraploid
speciation in orchids is pretty common. (One can browse the
orchid society lists of species and note how often the
attribute "tetraploid" is applied.) Tetraploidy is the
condition in which the chromosome count doubles, usually by a
failure to divide somewhere in gametogenesis. Shannon's
information measure gives lower values for shorter messages,
but higher values for longer messages. The doubling of
genomic content yields a larger Shannon value, and this
follows from Spetner's own argument, which relies on exactly
this length dependence when discussing deletions leading to
lower values.
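The arithmetic is trivial to verify. Doubling a sequence
leaves its symbol frequencies, and hence its per-symbol
entropy, unchanged, while the length doubles, so the total
doubles. The genome string here is a hypothetical stand-in:

```python
from collections import Counter
from math import log2

def total_bits(seq):
    """Length times zeroth-order per-symbol Shannon entropy."""
    n = len(seq)
    H = -sum((c / n) * log2(c / n) for c in Counter(seq).values())
    return n * H

genome = "ATGGCTGAATCAAGATGCCGTTGA"  # hypothetical chromosome complement
doubled = genome * 2                 # autotetraploidy: every locus twice

# Symbol frequencies are unchanged, so per-symbol entropy is
# unchanged, and total information doubles with the length.
print(total_bits(doubled) / total_bits(genome))  # 2.0
```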
That's on the information side of things. What about meaning,
since Spetner seems to care about that? Well, tetraploid
orchid daughter species are typically larger and have more
robust structures than those of the parent species. This
means that the argument that doubling the same material
doesn't change the information content is simply (and visibly)
counterfactual. The fact is that morphology of the daughter
species changes, and it changes because of the change in
genome. The change in genome is well-characterized. The
change is that there are now two copies of every locus where
before there was one. By both a technical measure of
information (Shannon's) and by a more casual and common-sense
measure that incorporates discussion of meaning,
autotetraploid speciation in orchids where the parent and
daughter species differ in morphology represents an increase
in information.
Wesley