DECONSTRUCTING RECONSTRUCTION:
WHEN ARE THE RESULTS OF PARSIMONY AND
STATISTICAL PHYLOGENETIC ANALYSIS GENUINE ADVANCES?

Bryology Seminar, Missouri
Botanical Garden

Richard H. Zander, 19 Feb. 1999
(Revised Feb. 12, 2001)

INTRODUCTION

This work is intended for the average systematist with a cautious interest
in phylogenetics or the phylogeneticist with concern for reliability. The
past 30 years have seen the publication of many exact—even if sometimes
billed as "not well supported"—phylogenetic solutions. The
publication of an exact solution when not well supported involves a high
probability of Type I error: the acceptance, even temporarily, of a wrong
phylogenetic hypothesis (e.g., a particular internal branch, subclade, or
tree) because the null hypothesis of no support is wrongly considered
false. The philosophy that supports this practice is hypothetico-deductivism
(or falsificationism), which accepts a theory if it resists falsification.
This is opposed to verificationism, which rejects a theory that is not
supported by numerous observed instances.

It might be said that the two philosophical attitudes are pragmatically
equivalent, since one chooses for use those theories that are fruitful,
predictive in practice, and promote understanding, whether supported
hypothetico-deductively or by verification. The problem arrives with
retrodiction (or postdiction), the reconstruction of one-time past evolutionary
events that cannot be directly observed. Any internal branch in a rooted tree
has three possible arrangements of the two terminal and one basal branch (=
"nearest neighbor interchange"). Given that evolution happened and
limiting our attention to only those three lineages, only one arrangement is
true. Thus, there is twice the chance of a Type I error (acceptance of a
false phylogenetic hypothesis) as of a Type II error (the rejection of a true
phylogenetic hypothesis because one wrongly decided the null—of no support
for the optimal tree—was true) for each branch. Clearly,
hypothetico-deductivism is a philosophy that allows or even encourages
publication of exact solutions with their concomitant Type I errors, no
matter how poorly supported.

The relatively new microcomputer-based ability to analyze massive amounts
of data in developing exact solutions has been encouraging to systematists.
Phenetics, however, has been something of a dead end. Cladistics, in modeling
evolution (yes, it does . . . that is its attraction), is a popular way of
analyzing data, but statistical analysis and philosophy-based parsimony
analysis are generally considered oil and water. Statistics is the spine of
science. The rejection of statistics by most past and many present-day phylogeneticists
is a major problem, but "statistical phylogenetics" has its own
problems.

How did this situation come about? Is there a solution? This work is a
reprise of presentation and criticism at a Missouri Botanical Garden Bryology
Seminar (well attended by non-bryologists) of some of the philosophical bases
or at least tendentious arguments for present-day cladism and statistical
estimation of phylogenetic relationships. There is also a handout.

THE TALK

Botanists have had, in the past, little familiarity with statistics. When I
was introduced to botany, the symbol for any number greater than 64 was:

This is a working method of calculating the probability that a particular
evolutionary scenario happened, given certain information and certain
assumptions. For non-adepts, here are some simple examples of calculating
probabilities:

PROBLEM:
You have a coin (a two-sided die) and a 20-sided die.
The coin has a "1" and "2" on its sides; the 20-sided die
has a "1" and numbers up to "20" on its sides.
Someone tosses randomly one or the other until a "1" comes up.
What is your chance of selecting which object generated the data set
"1"?

ANSWERS: Parsimony: It is easier to generate a "1" with the coin than
with the die, and the coin is the simplest explanation.

Bayesian Estimation: One follows the dictum that the probability of
an hypothesis is the same as the probability of the data set, given that
hypothesis. Assuming prior probabilities are uniform for both coin and die
(fair throws and no loading), these simplified formulas apply:

Coin probability: (1/2) / (1/2 + 1/20) = 10/11 = 0.91

20-sided die probability: (1/20) / (1/2 + 1/20) = 1/11 = 0.09

Conclusion: The coin has a posterior probability of 0.91 of generating the
data set "1", while for the 20-sided die, the probability is only
0.09. The coin is ten times more likely to have generated the "1".

Classical Frequency-Based Statistics: Statistics based on single
throws are meaningless.
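The Bayesian arithmetic above can be checked in a few lines of Python, a
minimal sketch using the same fractions and the same uniform-prior assumption
as the example:

```python
# Posterior probabilities that the coin vs. the 20-sided die generated the
# data set "1", assuming uniform priors (fair selection, fair throws).
like_coin = 1 / 2    # P("1" | coin)
like_die = 1 / 20    # P("1" | 20-sided die)

# Bayes' theorem; with equal priors the priors cancel out.
post_coin = like_coin / (like_coin + like_die)
post_die = like_die / (like_coin + like_die)

print(round(post_coin, 2))  # 0.91
print(round(post_die, 2))   # 0.09 -- the coin is ten times more likely
```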

MAXIMUM LIKELIHOOD AND THE ABOVE EXAMPLE

Modern phylogenetic analysis places much emphasis on maximum likelihood as
a means of reconstructing evolution based on DNA sequence data. There are
problems with this, however.

The coin's likelihood of generating the data set "1" is 1/2.
This is greater than 1/20, and is the hypothesis of maximum likelihood. A
measure of statistical support for this solution is the difference between
likelihoods, with the likelihoods usually expressed as natural logarithms.

ln 1/2 = -0.69
ln 1/20 = -3.00

The difference = 2.30, which is the natural logarithm of the likelihood
ratio. This difference is 10 times in non-logarithmic terms: ln 10 = 2.30.

So the coin is ten times more likely than the 20-sided die to generate the
data set "1". Using natural logarithms is overkill here but convenient
when dealing with the very, very small likelihoods of particular tree
topologies.
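The log-likelihood arithmetic is easy to verify; a short sketch:

```python
import math

# Log-likelihoods of the data set "1" under the two hypotheses.
ll_coin = math.log(1 / 2)    # about -0.69
ll_die = math.log(1 / 20)    # about -3.00

log_diff = ll_coin - ll_die  # about 2.30, i.e. ln 10
ratio = math.exp(log_diff)   # back to non-logarithmic terms

print(round(log_diff, 2))    # 2.3
print(round(ratio, 1))       # 10.0 -- the coin is ten times as likely
```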

We will get back to maximum likelihood in a while, but first let's look
at:

CLADISTICS

Although cladistics offers "parsimony," "simplicity,"
and "best explanation" as logical bases for choosing the shortest
tree, these are only admissible as sole justification when there are clear
and few alternatives, and there is no possible great loss involved in
accepting the simplest solution. Wittgenstein has pointed out that simplicity
has no logical justification:

It is evident that accepting any exact solution leads to Type I errors
(accepting as right a wrong phylogenetic hypothesis). One can always choose,
however, not to choose when there is not far more evidence for than against.
The key, of course, is the words "in the absence of contrary
evidence" in the above quote. There is almost always contrary evidence,
usually not detailed or emphasized in press.

Cladistics also implies that a cladogram is a "discovered" nest
of relationships, something quite real in nature. But even Karl Popper, a
promoter of hypothetico-deductivism, was leery of realism:

Philosophers of science commonly urge particular logical or at least
rhetorically impressive criteria for choosing among competing theories, but
seldom suffer from any wrong scientific choices they make, since . . . they
don't make them. Those who actually make such choices can affect the quality
of their science with Type I errors unless a reliability measure is a
standard part of a study.

Salmon is another philosopher of science who championed simplicity and
best explanations:

Salmon (as well as Popper and others) is generally speaking of the
statistics of single events (which are usually dealt with in Bayesian
statistics by non-philosophers), such as reconstructing a past event. Salmon
also encouraged seeing an increase in probability with additional data
as indicating a correct choice even though the probability remains small.

Cladists and others who espouse the doctrine of "approximating"
truth with best explanations also strongly advocate the philosophy of realism.
Scientists are mostly realists, in that we believe that there really are
"things out there" that are sampled in our collections, and are
represented in our data sets, and are modeled in our theories. There is a
significant problem with "approximating" truth, however, that can
blind us when there is more summed evidence against than for a "best
theory," or there exists a well supported second-best alternative. There
is also the additional problem that a scientist can easily mistake a theory
for a reality.

So when is there enough evidence for one hypothesis that one can
reject another, or all others?

Consider the following cladogram:

The data set is given on the left. If there are five (advanced) traits
shared by A and B, and none by B and C or A and C, then there is no
contradiction in the data set.

How about:

Here B and C share one character ("1"). Thus we have 4
characters for and 1 against the above optimal cladogram ((AB)C). If all
shared characters are equal evidence for phylogenetic relationship, then the
researcher has a 4:1 chance of being right in selecting this cladogram rather
than ((BC)A).

How about:

Now A and B share 5 traits (numbers 1-5) but A and C share 4 traits
(numbers 6-9). Though ((AB)C) is a "best" explanation, the chance of
selecting the correct cladogram is 5:4. The cladogram on the left above is
one that philosophical cladists might term "poorly supported." Now,
is it poorly supported or actually little more than the result of flipping a
coin?

This example applies to one internal branch connecting A & B as the
cladogram ((AB)C) and its two alternatives ((AC)B) and ((BC)A). But
cladograms have generally many internal branches. Consider:

Suppose these were connected into a big cladogram, with A, B and C
representing different lineages attached to each internal branch. If each
optimal internal branch were selected as correct with a 5:4 (or 5 out of 9)
chance, then the whole cladogram has a chance of being correct of 5/9 to the
sixth power. This is a minuscule chance of being totally correct though the
cladogram still manages to meet the optimality criteria of most parsimonious,
best explanation, approximating the truth, and so on.
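The "(5/9) to the sixth power" figure works out as follows (a quick check of
the arithmetic in the paragraph above):

```python
# If each of six internal branches is independently chosen correctly with
# probability 5/9 (the 5:4 case above), the whole cladogram is correct with
# probability (5/9)^6.
per_branch = 5 / 9
whole_tree = per_branch ** 6
print(round(whole_tree, 3))  # 0.029 -- about a 3% chance of a fully correct tree
```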

If this was all there was to it, there would be no problem. We would tend
to reject all cladograms as improbable. One can, however, make probabilistic
or otherwise assured theories about relationships: there are the class of
relationships called "uncontested groups" that no one would
dispute. Extreme example: cows and horses are more closely related to each
other than either is to the sponge. Most cladograms have support (by some
measure) somewhere in between that of uncontested groups and arrangements
more akin to games of chance. But how to tell what support any one cladogram
really has? [Note: I present an exact method of gauging support for each
internal branch in issue 3 of Systematic Biology, 2001. See citation in the
bibliography of this talk's handout.]

Note: What about Bremer support (= decay index) and subsampling
(bootstrapping and jackknifing)? These are commonly used measures of branch
support. The reader is referred to the handout,
especially citations of Oxelman et al. (1999), Rice et al. (1997) and Yee
(2000) for disconcerting criticisms of these support values.

"Corroboration" has been a watchword in cladistics, but it is
commonly used when only congruence is meant:

CORROBORATION is significantly increasing
support, or maintaining the very high probability of one tree, with
additional data, and ultimately implies "no reasonable doubt."

CONGRUENCE is the same level of support of both
for and against a hypothesis.

CONSILIENCE is congruence of data sets produced
with somewhat different natural processes, such as data on morphology and
molecular analysis.

In "Congruence and corroboration" above, two data sets (of
advanced traits) corroborate the conclusion that A & B are more closely
related to each other than they are to C. But in "Congruence and no
corroboration," the two data sets are congruent, but do not corroborate
the hypothesis (there remains equal evidence for and against the optimal
solution).

BUT NOW, consider a data set with 6 characters shared by A & B, and 4
other characters shared by A & C; the optimal tree associates A & B
as terminal groups. A second data set has also 6 characters supporting the A
& B lineage, but 5 other characters supporting A & C. In this case,
the second data set corroborates the lineage A & C and falsifies
A & B, even though A & B continue to be the terminal groups in the
optimal tree. In my opinion, discussing congruence, corroboration and
falsification is splitting hairs, because no exact and reliable solution is
supported. A & B and A & C have nearly similar support and one might
as well toss a coin. Refusing to choose the exact, optimal tree may or may
not be a Type II error (rejecting the optimal tree when it is correct), but
it is a fail safe solution for the researcher, students and scientists in
other fields looking for reliable estimates of phylogenetic relationships.

QUESTION: If two consilient data sets produce
the same shortest tree, even though that one tree is poorly supported, surely
that cannot be rejected as random or sampling error?

ANSWER: Congruence supports all
reasonable trees. The same two data sets can together support two or more
different hypotheses, and these can be contradictory.

In a treatment of the taxonomy of giant lobelias in Africa, using
chloroplast DNA, researchers got the following optimization:

Note that branch length, though uncomfortably short (for gene trees) in
many branches, is often about the same as the decay index (= Bremer support).
This means that there is little conflicting evidence. In other branches,
however, Bremer support is rather less than the branch length. This implies
the existence of a contrary alternative branch of considerable length (the
branch length minus the Bremer support). In addition, it is well known that
what one gene gives, another takes away, since gene histories often conflict
(lineage sorting and the like). Check the following
conclusion of these authors:

In this case, chloroplast DNA gene history is presented as equivalent to
species evolutionary history. Out of context, this is a beautiful and
compelling illustration, sure to grace the pages of a textbook someday. There
are some qualifying words in the text, yet the potential for Type I error
(accepting a wrong hypothesis as true) is large not only for the authors of
this paper, but for fellow evolutionary phylogeneticists, students, teachers
and for scientists who might use these results to guide conclusions in
research efforts in other fields. The picture is, of course, not all wrong
and there is good support for many lineages, but how is the ambiguity
presented? In the interpretation of the interrelationships of branch length
and Bremer support, something not all readers will bother to do, and in the
context, familiar to phylogenetic experts but scarcely all readers, of the
multifarious problems and assumptions involved (see handout).

Let's combine two different data sets on a different group of taxa. Here
are the published cladograms resulting from:

Did conflicting data cancel each other out? It is well known that one can
get nice, sometimes well supported trees from totally random data. (Note: I
wrote a little DOS program that generates random data sets for those who want
to experiment. Write me at rzander@sciencebuff.org
for a copy.) My own take on this is that if one uses optimality criteria,
there are almost always "best" results, no matter how data is
combined. Chance will increase support for some results, decrease it for
others. This is especially problematic when differential lineage sorting
(different gene histories as mentioned above) is ignored.
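A random character matrix of the kind described is simple to produce; here is
a hypothetical Python stand-in for the DOS program mentioned above (my
illustration, not the author's software), with made-up taxon labels and
dimensions. Run through a parsimony program, such a matrix will still yield a
"best" (shortest) tree:

```python
import random

# Generate a matrix of random binary characters for a handful of taxa.
# All names and sizes here are illustrative defaults.
def random_data_set(n_taxa=8, n_chars=20, seed=42):
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return {f"taxon_{i}": [rng.randint(0, 1) for _ in range(n_chars)]
            for i in range(n_taxa)}

matrix = random_data_set()
for taxon, chars in matrix.items():
    print(taxon, "".join(map(str, chars)))
```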

When combining data sets, a "multiple tests" statistical problem
arises. A statistician can manipulate data in a myriad ways, looking for a
"significant" result, i.e. something that meets or exceeds a
pre-established confidence level. Correction for multiple tests is commonly
done (using Bonferroni correction) by dividing alpha (your toleration for
Type I errors) by the number of tests. Thus, some branches will be found to
be better supported with combined data sets, but that increase in support
must be lowered by some correction factor. Again, researchers who tend to
tolerate Type I errors also tend to ignore contrary evidence.
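The Bonferroni correction itself is one line of arithmetic; the values below
are illustrative, not taken from any study in the talk:

```python
# Bonferroni correction: divide alpha (the tolerated Type I error rate) by
# the number of tests, so the error rate over the whole family of tests
# stays at alpha.
alpha = 0.05
n_tests = 10                      # e.g., ten branches tested after combining data
per_test_alpha = alpha / n_tests  # each individual test must meet this level
print(round(per_test_alpha, 4))   # 0.005
```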

Conflicting data have been thrown out if the authors decide that, for
instance, only two particular data sets out of the three available combine to
make a nice composite cladogram, and that these are sufficient. Note in the
following that the authors feel it appropriate to use the ITS data set only
to address intragroup relationships . . . clearly a decision made to view
exact results as "approximating truth."

It is possible, of course, to throw out data sets for all kinds of good
reasons (wrong rate of evolution, tracking a demonstrably divergent gene
history, sample error, and so on). When such reasons are not given, or are
specious, the reader must be cautious about any exact solution.

The principle of total evidence is commonly cited as the justification for
combining data sets:

Why not use all evidence? My answer is that only data sets produced by the
same process can be logically combined. Thus, more data about a particular
gene history is good, but one cannot combine data about different gene histories
unless an impressive theory is available that explains what happens when gene
histories are somehow averaged to get a species phylogeny, and such a theory
is not available. Given the prevalence of conflicting gene histories, a large
number of data sets is probably needed to distinguish those that track
species history (as noted by Nei and others, see handout).

Also, optimality methods look only at the positive side of the results of
combining all evidence. If one actually weighs the evidence both for and
against a solution (one particular solution chosen beforehand) and
if support increases, then using total evidence can be okay. Choosing one
solution after combining data sets is a multiple tests problem.

Now:

MAXIMUM LIKELIHOOD

This is the method of hope of statistical phylogeneticists (versus the
philosophy-based analytical procedures of cladists).

Simple example:
You have a 4-sided die and a 6-sided die. You roll them randomly and look for
the first to generate a "1". The likelihoods for each die of
generating the data set "1" are:

ln 1/4 = -1.39
ln 1/6 = -1.79 (difference between ln 1/4 and ln 1/6 = 0.41)

Support for the 4-sided die is 1.5 times the second most likely solution
for generating the data set "1". That's the likelihood ratio.

[Note added February 24, 2001:]

BOTH PARSIMONY AND MAXIMUM LIKELIHOOD PRODUCE EXACT SOLUTIONS

A preference for an exact solution gets you something for nothing. Even a
bush (a multifurcating tree) is an exact solution that means nothing unless
the lineages below it (though not shown) are very well supported (so the
branches cannot be thought of as possibly positioned elsewhere).

Example: You suspect a coin is loaded. You toss it 100 times. It comes up
50 tails and 50 heads (a bush). Your best answer is that it is not loaded. A
different coin that you check for loading comes up 51 tails and 49 heads, so
your best answer is that it is loaded. Yet . . . the answers are
scientifically, if statistics means anything, equivalent. The null hypothesis
of not being loaded cannot be rejected, and you are left with nothing. With
most cladograms, the null hypothesis of no phylogenetic loading cannot be
rejected by the data presented (when looking at an entire tree of many taxa).
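The coin-loading example can be made concrete with an exact binomial
calculation (my own illustration, using the 100-toss numbers in the text):

```python
from math import comb

# Probability of exactly k heads in n fair tosses.
def prob_heads(k, n=100, p=0.5):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_exactly_50 = prob_heads(50)    # even a perfect 50/50 split is uncommon
p_off_center = 1 - p_exactly_50  # chance of any split other than 50/50

print(round(p_exactly_50, 2))    # 0.08
print(round(p_off_center, 2))    # 0.92 -- a 51/49 split gives no basis to reject "not loaded"
```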

But! What about support values? The decay index is always relative to the
length of the branch; thus a branch with length 40 and decay index 10 may
have one or two alternative branches (nearest neighbor interchange) that are
up to 30 steps in length. Bootstrapping is a wonderful tool (an analog of
exact binomial calculations) that gives a good basis for evaluating
pre-selected confidence levels but bootstrapping is calculated by examining
whole trees, and homoplasy (which is rampant in most molecular data sets)
affects it (undoubtedly lowering its calculated value). Also, bootstrapping
cannot deal with the problem of two conflicting apparent phylogenetic signals
(e.g. when one of the alternative branches calculated after nearest neighbor
interchange is nearly as well supported as the optimum branch and both are
significantly longer than the shortest branch).

On the other hand, what about the case when longer trees are much
less well supported . . . does this mean they can be ignored? Statistics is
the spine of science. Consider this example: you have a chicken yard. There
is a big chicken and 50 little chicks (each one dyed a different Easter egg
color) in the yard. You toss a kernel of corn into the yard and glance away,
and fttt it was eaten. Which bird ate the kernel? You toss more kernels
randomly and find from the data set you compile that the big chicken is 50
times more likely to eat a kernel than any chick, and each chick is about as
likely as any other chick to eat a kernel.

Maximum likelihood analysis would say that the big chicken ate the
original kernel with a likelihood ratio of 50! (i.e., comparing likelihoods
of the hypothesis of maximum likelihood and the secondmost likely.) Wow!
However . . . all the birds contributed to the data set, and any bird that
contributed to the data set cannot be ignored, can it? Therefore the chance
of the big chicken eating the original kernel was 50%. (Maximum likelihood
gives you something for nothing if you trust in likelihood ratios and you
have more than two possible hypotheses.)

But! Note that no one chick (alternative hypothesis) had a likelihood
anywhere as high as that of the big chicken! What is the chance we can
eliminate the likelihoods of the chicks as irrelevant and just too small to
matter? We can't, because they all contributed to the data set, and only if
we can eliminate them from the data set can we eliminate their summed
probabilities (summing to 50%). And there is no empirically based theory that
will allow us to do so (or to eliminate long trees, since these also must be
considered as contributing to a cladistic data set since any one of them
could have been solely responsible for it).
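The chicken-yard arithmetic can be laid out explicitly; a sketch using the
numbers from the example:

```python
# The chicken-yard data set: the big chicken eats kernels 50 times as often
# as any one chick, and there are 50 chicks.
weights = [50] + [1] * 50                    # big chicken first, then the 50 chicks
total = sum(weights)                         # 100

likelihood_ratio = weights[0] / weights[1]   # big chicken vs. any one chick
p_big_chicken = weights[0] / total           # but all 51 birds contributed

print(likelihood_ratio)  # 50.0 -- the impressive-sounding support
print(p_big_chicken)     # 0.5  -- the actual chance: a coin flip
```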

But! What is the chance that a 50:1:1:1...(50 ones)% probability
distribution would happen by chance alone? Well, the distribution of
likelihoods is not a data set of observations (not a sample), and we can't do
chi-squared or other non-parametric analyses on these. This probability
distribution would be approximately the same every time you created a data
set with these birds.

The situation with cladograms is worse than this extreme example because
there is doubtless no sharp distinction between the likelihood of the
shortest tree and that of the secondmost short tree and the thirdmost and
the fourthmost, etc. (unless we have very, very few taxa in the data set).

Therefore we really can get something for nothing, but not only chickens
will squawk. An exact solution is publishable through the magic of the
philosophy of parsimony, even though there are doubtless . . . doubtless many
almost as well supported alternative trees. I limit this comment to trees of
many taxa. Four-taxon trees are a special case and non-parametric tests of
support are possible.

Since cladistic and maximum likelihood analyses are optimizations, of
course they approximate general intuitive evaluations of phylogenetic
relationships (e.g., "uncontested groups"). However, the special
qualification for respect and attention of these methods of phylogenetic
analysis is that they are more exact than intuition. I submit that such
greater precision is largely an artifact of philosophy, rhetoric and statistical
gobbledegook. I'm sure that somewhere in published exact results there is
greater precision and as such it is an advance in knowledge, but it is very
hard to tell such an advance from nonsense.

From the above discussion, you can estimate my opinion of efforts in
creating a Phylocode as a substitute or even as an alternative for the
flexible-though-imperfect standard codes we have now.

A SPECIAL PROBLEM WITH MAXIMUM LIKELIHOOD ANALYSES IN PHYLOGENETIC
ANALYSIS

Note: If you see in a paper that support (of a solution against the second
likeliest solution) is -ln = 5000.0 versus -ln = 5002.0, this is a difference
of 2 log units, so the likelihood ratio is e squared. If e = 2.7, then e
squared = 7.4. So the solution of maximum likelihood (5000.0) should be 7.4
times as likely as the second most likely solution (5002.0).

But actual maximum likelihood analysis with sequence data cannot use
likelihood ratios to measure relative support for one tree over another. This
is because likelihoods of nucleotides are maximized on each topology, and
each topology is a different model (see handout for
relevant citations).

A really simple example:

You have a coin (labeled "1" and "2" on its sides), a
4-sided die, a 6-sided die and a 20-sided die.

Q: If these four are each randomly selected and
thrown randomly, which one has maximum likelihood of turning up a
"1"?

A: The coin, with a likelihood of 1/2 and a
likelihood ratio of (1/2)/(1/4) = 2 (the coin is twice as likely as the
4-sided die to generate the data set "1").

BUT
Suppose you had the same four objects, and the coin and 6-sided die were in
box A, and the 4-sided die and the 20-sided die were in box B. Which Box
would have the best chance of generating a "1" if the objects were
again randomly selected and randomly thrown? Box A. But what is the support?
One cannot simply compare the likelihoods of the coin and the 4-sided die to
get a likelihood ratio. Instead, the likelihoods of all the objects need to
be taken into account:

Box A (coin and 6-sided die):
likelihood of "1" = (1/2)(1/2) + (1/2)(1/6) = 1/3;
posterior probability = (1/3) / (1/3 + 0.15) = 0.69

Box B (4-sided and 20-sided die):
likelihood of "1" = (1/2)(1/4) + (1/2)(1/20) = 0.15;
posterior probability = 0.15 / (1/3 + 0.15) = 0.31

ln 0.69 = -0.37; ln 0.31 = -1.17; difference = 0.8; ln 2.2 = 0.8

SO: Box A is 2.2 times as likely as Box B in generating data set
"1", and 2.2 is the support for Box A being the solution. This is
an impossible calculation for sequence data sets involving more than a very
few taxa.
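The box calculation above, in a few lines of Python (a sketch of the same
mixture arithmetic, with each object selected with probability 1/2 and thrown
fairly):

```python
# Mixture likelihoods of generating a "1" from each box.
like_box_a = 0.5 * (1 / 2) + 0.5 * (1 / 6)    # coin + 6-sided die = 1/3
like_box_b = 0.5 * (1 / 4) + 0.5 * (1 / 20)   # 4-sided + 20-sided die = 0.15

# Posterior probabilities, with uniform priors on the boxes.
post_a = like_box_a / (like_box_a + like_box_b)
post_b = like_box_b / (like_box_a + like_box_b)

print(round(post_a, 2), round(post_b, 2))  # 0.69 0.31
print(round(post_a / post_b, 1))           # 2.2 -- Box A's support over Box B
```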

Monte Carlo sampling has been used in analyzing the relationships of
groups of many taxa.

FULL BAYESIAN ESTIMATION AND MARKOV CHAIN MONTE CARLO STUDIES

This is a way of using nucleotide sequence data from many taxa to evaluate
posterior probabilities of possible phylogenetic trees. It is less favored by
statistical phylogeneticists, largely because this method is new and the
software is not yet commonly available in an easy-to-use format. There are
major problems with this method, however.

Bayesian analysis is mathematically the most complex and abstruse of
phylogenetic statistical methods. Most statistical textbooks give a good
account of Bayes' Theorem, which is fairly simple. Very, very simply, with
Bayes' theorem one takes the likelihood of the data for a hypothesis (times
any prior probability), divides it by the sum of the likelihoods of the data
for all possible hypotheses (times any prior probabilities), and this equals
the "posterior probability" (or the probability of the hypothesis
being true). We did this above, with boxes A & B, with the assumption of
uniform prior probabilities (no loading and fair throws).

Bayesian analysis concerns the probability of single events, which are
figures that some statisticians do not believe in as "real"
probabilities (long-run frequency statistics can have very accurate
predictions, not so with single events). But Bayesian statisticians win
their bets on single events in the long run. The trouble is that science cannot
tolerate solutions that are correct only a little more than half the time;
that wins in the long run only in a casino.

Because of its complexity, straight Bayesian analysis of phylogenetic
relationships is computationally doable for only a very few taxa:

With larger data sets, instead, sampling methods called Monte Carlo
methods are used to estimate the probabilities of the most likely tree
topologies in explaining the data set.

There is a "credible zone" in Bayesian analysis, similar to the
confidence interval in classical statistical analysis. A hypothesis that
comprises a researcher-selected 95% credible zone (or interval) is one that
is very probable. Several hypotheses may add their probabilities to the
credible zone. For instance, suppose you have 7 trees. Their probabilities
are:

0.50, 0.30, 0.15, 0.03, 0.01,
0.005, 0.005

Then the first three trees, whose probabilities add to 0.95, comprise the
"credible zone." How these trees are similar is a probabilistic
solution.
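The credible-zone bookkeeping is just a running sum over the sorted tree
probabilities; a sketch with the seven trees above:

```python
# Accumulate tree probabilities (sorted from most to least probable) until
# the researcher-selected 95% credible zone is reached.
probs = [0.50, 0.30, 0.15, 0.03, 0.01, 0.005, 0.005]

credible = []
running = 0.0
for p in probs:
    credible.append(p)
    running += p
    if running >= 0.95 - 1e-9:   # small tolerance for floating-point sums
        break

print(len(credible))  # 3 -- the first three trees make up the credible zone
```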

The probabilities of the first three trees in the quoted paper above add
up to a pre-selected 99% credible zone, and one tree is much more likely than
the second best. Sounds good, even though the best tree has a posterior
probability of only 64.9%. There is 35.1% evidence against this solution, but
that evidence supports no one alternative tree. At this point one might muse
on the question that if there is one weakly supported optimal solution and a
host of other solutions that are each very poorly supported, does this mean
that one can ignore the other solutions (and in effect raise the probability
of the optimal solution to 100%)? Or might the optimal solution at least
sometimes be a blip or random combination of data that is unrelated to the
phylogenetic history? (Again, random data often can be used to generate well
resolved cladograms with some branches well supported by Bremer support—but
seldom by bootstrapping. To get a DOS executable of my random data set
generator drop me a note at rzander@sciencebuff.org).

The same researchers did a MCMC study of cichlid fish mtDNA data, of 32
species, yielding 10 to the 40th power different possible topologies:

The most likely clade had a posterior probability of 64.5%, and the next most
likely had a figure of 10.2%. Although this is an impressive achievement, one
must ask oneself if this result is sufficiently probable (reliable, true) to
base other research on, say, biogeographic conclusions? It might be, with
certain, specified qualifications.

One problem the researchers pointed out is that different techniques gave
different results:

An important problem with any such computation is that many trees have
probabilities too small to calculate. When a tree has 10 to the fortieth
different topologies, the sum of the probabilities of the many trees with
so-small-as-to-be-non-calculable probabilities may be significant compared to
the "credible zone." Thus, the probabilities of the most likely
clades in Monte Carlo sampling studies are always relative to the sum of the
probabilities of the trees actually calculated, not the sum of the
probabilities of all possible trees, including those not sampled and those
not calculated.

There are more kinds of topologies that must be taken into account in
likelihood and Bayesian calculation, where statisticians distinguish
between trees and dendrograms. The probabilities of
these must be summed to get a composite posterior probability of a clade:

The math of these treatments is impressive, but these remain optimal
solutions that depend on a complex of assumptions and data. But you can do these
yourself! Software is now available (PAML by Yang, and BAMBE by Simon and
Larget) for anyone to try their hand at Bayesian MCMC analyses. One needs to
be able to set a few options:

where the number of times a topology is visited is directly proportional
to the likelihood. One major stumbling block with both maximum likelihood and
Bayesian MCMC analyses is the many regularity assumptions and guesses in
regard to the evolutionary model that must be made (see handout).

ERROR

Error is a fact of life. This talk emphasizes that Type I errors are to be
avoided, since refusing to accept an exact solution, even when a Type II error
(rejecting a true phylogenetic solution because the null hypothesis of no
support for the optimal tree cannot be falsified) may be involved, is fail
safe. But there is no progress without taking the chance of Type I errors.
Some writers assume (incorrectly) that we must make (error-ridden)
decisions:

where minimax solutions (minimizing the maximum possible loss) are to be
sought.

What are the possible consequences of being in error in phylogenetic
reconstruction? What is driving the tendency to publish exact results of
phylogenetic analysis whether well supported or not? Why does one brave Type
I errors and convince oneself of "discovering" (approximately) an
apparently "real" nested hierarchy in nature? The gain/loss table
tells all:

GAIN/LOSS TABLE

                          HYPOTHESIS ACCEPTED      HYPOTHESIS REJECTED

IF PHYLOGENETIC           Type I error.            Eventual satisfaction.
HYPOTHESIS                Satisfaction.            No glory, no grants.
IS REALLY FALSE           Glory, grants.
                          Problems for others.

IF PHYLOGENETIC           Eventual satisfaction.   Type II error.
HYPOTHESIS                Glory, grants.           No problems but
IS REALLY TRUE                                     no glory, no grants.

Note: It is easy to point out error, and comb the literature to find
supportive quotes for one's own ideas. I have, however, in press (cited in
the handout) a means of measuring the reliability of each cladogram branch
using a non-parametric test (chi-squared) on the three alternative branch
lengths obtained from nearest-neighbor interchange. It points out where some
optimal branches may have been chosen by little more than flipping a coin (a
three-sided coin), and where there appears to be probabilistic support at an
acceptable confidence level for the optimal branch. Yee (2000, cited in the
handout) recently offered a similar non-parametric method of evaluating
branch support, but his method is problematic for two reasons: the whole
cladogram is involved in calculating each branch probability, not just the
three alternatives from nearest-neighbor interchange, and the signed-ranks
probabilities are used throughout as branch probabilities (a Bayes simple
proportion should be used to calculate the probability of selecting the
correct alternative branch arrangement when a pre-selected confidence level
is not attained).
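A rough sketch of the general idea (not the published method, and with
made-up branch lengths): test whether the three nearest-neighbor-interchange
branch lengths depart from the uniform "three-sided coin" expectation. With 3
categories there are 2 degrees of freedom, and the chi-squared survival
function at df = 2 is simply exp(-x/2), so no statistics library is needed:

```python
import math

def nni_chi_squared(lengths):
    # Chi-squared goodness-of-fit of the three alternative branch lengths
    # against equal expected counts (the "three-sided coin").
    expected = sum(lengths) / 3
    chi2 = sum((obs - expected) ** 2 / expected for obs in lengths)
    p_value = math.exp(-chi2 / 2)   # chi-squared survival function, df = 2
    return chi2, p_value

# Hypothetical lengths for the optimal branch and its two NNI alternatives:
chi2, p = nni_chi_squared([20, 5, 5])
print(round(chi2, 2), round(p, 4))  # 15.0 0.0006
```

Here the optimal branch is significantly longer than its two alternatives;
with lengths like [8, 7, 6] the p-value would be large and the "choice" of
the optimal branch little better than the coin flip described above.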

J. Lyons-Weiler (pers. com.) has pointed out, quite rightly, that the
generation of any optimal tree "throws away the error term." Luckily
parsimony still leaves plenty of error for my method to root out. He has
championed treeless analysis for phylogenetic signal (RASA), which has much
potential.

Many programs are available for phylogenetic reconstruction. Nearly all of
them will favor a Type I error over a Type II; thus, they may be fail-safe
for the researcher, but not for associated students or workers in other
fields who rely on the results. M. E. Siddall has a good summary of the
logical and statistical bases behind phylogenetic methods. He is not
particularly keen on any one method. On the other hand, the data is there,
and I am sure the potential for new knowledge is there too, when we can
distinguish a true advance from nonsense.

I thank Marshall Crosby and Robert Magill for hosting this seminar. Since
this was an informal talk, author citations were largely eliminated, and I
hope readers can distinguish between my own ideas and when I present those of
others. If in doubt, full attributions are given in papers that are cited in
the handout.