Universal Common Ancestry’s Smoking Gun

This is a repost of an article I first wrote up several months ago on my own blog.

In 2013, researchers from New Zealand published a paper in the journal PLOS ONE entitled “Beyond Reasonable Doubt: Evolution from DNA Sequences”. After reading it recently, I was surprised to find that it seems to have flown under the radar in popular science circles. I couldn’t find any references to the article in press releases or even in popular blogs that promote biological sciences and evolution. Perhaps this is the reason that the paper has also gone unchallenged by creationists and ID proponents – it’s never really been used in debates to support common ancestry. Let’s assume that’s the case; after all, it’s completely unheard of for creationists/ID proponents to ignore the best pieces of evidence for evolution, isn’t it? The only comment I could find by an ID proponent on this paper was a whining EvolutionNews article that didn’t even attempt to criticise or refute the paper’s methodology or results, but instead complained at length about the “sneering tone” of the authors. In other words, it complained that evolutionary biologists are so dismissive of ID and creationism. Why might that be, I wonder.

Anyway, that little tangent aside, let’s look at the contents of the paper.

What does the theory of universal common ancestry predict?

As the name implies, Universal Common Ancestry (UCA) is the theory that all extant life on earth shares a common ancestor if you go back far enough. This theory makes several quantitative predictions: trends in data that we really have to observe if UCA is correct. Among these is molecular convergence as you go back in time.

In other words, if you could travel back in time, say, 3 million years, and find the ancestors of today’s humans and chimpanzees, you would expect them to be more similar to each other than modern humans and chimps are today. This is because the two lineages have been evolving independently since they diverged about 6 million years ago. This is why we expect to find more dissimilarities at the molecular (DNA and protein) level with increasing time since the last common ancestor: we’re more similar to chimps than to gorillas, more similar to gorillas than to orangutans, and so on.

Of course, we can’t actually go back in time and take DNA samples of our ancient ancestors (although ancient DNA can be very informative), but we can do the next best thing: perform ancestral gene reconstruction analyses. I won’t go into detail about this method in this post; suffice it to say that it’s an accurate method of inferring the gene sequences of an ancestor from a given set of extant sequences.

So, the stage is set for a formal test of UCA. It’s possible to take a set of extant species and compare how similar their sequences are, then reconstruct the ancestral sequences and see if these are more similar or not. That’s exactly what White, Zhong, and Penny did in their paper.

Quantifying convergence between large taxonomic groups

As you can see in Table 1, White et al. compared a wide variety of taxonomic clades with divergence times ranging from 125 to 800 million years ago. The reason for this wide-ranging dataset was to demonstrate that molecular convergence is a trend found at all levels of ancestry, not just that some groups converge while others appear to have independent origins. This is why the results speak to universal common ancestry, not merely ancestry between particular groups. A number of species within each clade were chosen to give a good representation of that clade – for example, 25 species of Angiosperms. Gene sequences for a number of chloroplast, nuclear, or mitochondrial genes were obtained from existing databases.

Table 1. The taxonomic groups and data types used in the analysis. A subset of the paper’s Table 2. Data type refers to the type and number of genes used in the analysis, while the numbers in the Group X and Group Y columns refer to the number of species within each group for which sequences were obtained.

So, remember the procedure: first, the similarities between genes from species in a pair of these extant groups, for example Angiosperms (AS) and Gymnosperms (GS), were calculated on the basis of alignment score. This was done by comparing each individual in the AS to every individual in the GS, and essentially taking an average.
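To make that averaging step concrete, here is a minimal sketch in Python. It is not the authors' code: I'm assuming the sequences are already aligned to equal length, and I'm using plain percent identity as a stand-in for the alignment score used in the paper. The function names and the toy sequences are made up for illustration.

```python
from itertools import product


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences match.

    Assumption: percent identity stands in for the paper's alignment score.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_cross_similarity(group_x, group_y):
    """Average similarity over every (x, y) pair, one sequence from each group."""
    scores = [identity(x, y) for x, y in product(group_x, group_y)]
    return sum(scores) / len(scores)


# Made-up toy sequences, not real plant genes.
angiosperms = ["ACGTACGT", "ACGTACGA"]
gymnosperms = ["ACGAACGT", "TCGAACGT"]

print(mean_cross_similarity(angiosperms, gymnosperms))  # 0.75
```

Note that only cross-group pairs are scored: an Angiosperm is never compared with another Angiosperm, because the quantity of interest is how far apart the two groups are today.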

Next, the ancestral gene reconstruction was performed on each group, so they ended up with a single ancestral sequence per group for each gene, one ancestral to the AS and the other ancestral to the GS. Again, the similarity between this pair of ancestral sequences was calculated. If the pair of groups are related, these ancestral sequences should be more similar than the average of the extant sequences. If the groups are not related, which would correspond to some form of independent origin, such as special creation, we should expect to find no significant difference in the level of similarity. This principle is illustrated in Figure 1.
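Here is a toy version of that comparison in Python. A loud caveat: the "reconstruction" below is just a column-wise majority vote, which is my own crude stand-in and not the method used in the paper (proper ancestral reconstruction works from a phylogeny and a substitution model). The sequences and function names are invented; the point is only to show what "ancestors more similar than the extant average" means numerically.

```python
from collections import Counter
from itertools import product


def identity(a, b):
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consensus(seqs):
    """Column-wise majority vote: a crude stand-in for ancestral reconstruction."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def convergence(group_x, group_y):
    """Ancestral similarity minus mean extant similarity (positive = convergence)."""
    extant = [identity(x, y) for x, y in product(group_x, group_y)]
    ancestral = identity(consensus(group_x), consensus(group_y))
    return ancestral - sum(extant) / len(extant)


# Made-up toy sequences; each group is "reconstructed" without looking at the other.
group_x = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
group_y = ["ACGAACGT", "TCGAACGT", "ACGAACCT"]

print(consensus(group_x), consensus(group_y))
print(convergence(group_x, group_y))
```

In this toy case the two consensus sequences differ at a single site, while the extant sequences differ from each other at two or three sites on average, so the difference comes out positive. That is the direction UCA predicts.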

Figure 1. Contrasting predictions of UCA and independent origins in sequence space. A) Prediction from UCA. B) Prediction from independent origins. The red dotted lines represent the average pairwise similarity in sequence space of gene sequences from the extant species in each group, and the green nodes represent the location of the reconstructed ancestral sequences. The grey dotted line in A) represents the common ancestry of groups X and Y that is responsible for the convergence. C) A figure from creation.com representing the evolutionary tree of life (left) and the creationist “orchard” of life (right).

As you can see, the illustrations of the predictions from the different models in Figure 1A) and B) match perfectly with the illustration given by creationists for the competing models in C). In the orchard, the branches point straight down towards the creation event, while in the UCA tree the branches are oriented such that they converge as you move further down the tree. The only difference between the diagrams is that the one from creation.com is drawn in morphological space, while A) and B) are drawn in sequence space, but the principle remains the same.

In Table 2 of the paper the significance values for each of the pairwise tests are shown, and they’re all highly significant, ranging from 1.05×10⁻⁶ to 1.69×10⁻⁴⁴. For the combined dataset as a whole, the P-value is 2.59×10⁻¹³². That means the odds of the observed convergence occurring by chance are infinitesimally small. Appropriate null controls were run to justify the conclusion that convergence is a product of common ancestry.

While it wasn’t the focus of the paper (or this blog post), the authors also comment that the correlation between greater convergence and the length of the gene in question is consistent with the stochastic nature of substitutions in evolution. They also found that the significance of the convergence correlated with the number of genes used, so using an expanded dataset of genes is likely to produce even more confident results in a similar analysis.

Pre-empting creationists

I just want to re-emphasise: the ancestral sequences of groups X and Y were reconstructed completely independently, and only compared at the very end of the process. There was nothing about the methodology that “guided” the ancestral sequences to be more similar than the extant ones.

Since the results are unassailable, the only option creationists have is to take issue with the basic methodology – the reconstruction of ancestral gene sequences. They might argue that such a reconstruction inherently assumes that a group like Angiosperms has a common ancestor in the first place, and that there weren’t multiple independent lineages within these large groups. Don’t worry, the authors already thought of that, so they made sure to take this possibility into account. The null model against which the results were compared made absolutely no assumptions about the groups, other than that the groups themselves were unrelated. It explicitly allowed for all of the taxa within each group to be unrelated as well. To quote the paper:

Our null model can be considered in the following way – that the taxa in subgroup X are descended from an unknown number 1 ≤ r_X ≤ |X| of root sequences, the taxa in subgroup Y are descended from an unknown number 1 ≤ r_Y ≤ |Y| of root sequences, and that the r_X + r_Y root sequences are all independent from each other. This allows, at one end of the spectrum, the possibility that all |X|+|Y| taxa were independently created, and at the other end of the spectrum, the possibility that all taxa in one subgroup are descended from a single common ancestor for that subgroup, which was created independently of the single common ancestor for the other subgroup. In other words, this null model imposes no requirements on the presence or absence of internal (within-subgroup) evolution of the two subgroups of taxa; the only constraint is that there is no evolutionary link between the two subgroups. That is, neither subgroup contains taxa derived from the other, nor from a common ancestor.
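To get a feel for what this null model predicts, here is a small simulation sketch in Python. Everything in it is my own illustration rather than the authors' procedure: sequences are random strings of A, C, G and T, "evolution" is a uniform per-site redraw at a rate I picked arbitrarily, and the ancestor of each group is estimated by a simple majority-vote consensus. The parameters r_x and r_y play the role of r_X and r_Y in the quote.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def random_seq(rng, n):
    return "".join(rng.choice(BASES) for _ in range(n))


def mutate(rng, seq, rate):
    """Redraw each site with probability `rate` (the new base may equal the old)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)


def identity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)


def consensus(seqs):
    """Majority vote per column: a toy stand-in for ancestral reconstruction."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def gain(group_x, group_y):
    """Ancestral similarity minus mean extant cross-group similarity."""
    extant = [identity(x, y) for x, y in product(group_x, group_y)]
    return identity(consensus(group_x), consensus(group_y)) - sum(extant) / len(extant)


def make_group(rng, roots, size, rate):
    """`size` taxa, each descended from one of the given root sequences."""
    return [mutate(rng, roots[i % len(roots)], rate) for i in range(size)]


def simulate(rng, related, r_x=1, r_y=1, size=8, length=2000, rate=0.2):
    if related:
        # Common ancestry: both group roots descend from a single origin.
        origin = random_seq(rng, length)
        roots_x = [mutate(rng, origin, rate)]
        roots_y = [mutate(rng, origin, rate)]
    else:
        # Null model: r_x + r_y root sequences, all independent of each other.
        roots_x = [random_seq(rng, length) for _ in range(r_x)]
        roots_y = [random_seq(rng, length) for _ in range(r_y)]
    return gain(make_group(rng, roots_x, size, rate),
                make_group(rng, roots_y, size, rate))


rng = random.Random(0)  # fixed seed so the run is repeatable
uca_gain = simulate(rng, related=True)
null_one_root_each = simulate(rng, related=False)                # r_X = r_Y = 1
null_all_separate = simulate(rng, related=False, r_x=8, r_y=8)   # every taxon its own root
print(uca_gain, null_one_root_each, null_all_separate)
```

With these settings the related case should show the consensus ancestors sitting clearly closer together than the extant sequences, while both ends of the null spectrum should hover around zero: unrelated roots give you nothing to converge on, no matter how much or how little evolution happened inside each group. This is only a cartoon of the argument, not a replacement for the statistical test in the paper.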

When combining the dataset as a whole, they also made sure that the same pairwise relationships were recovered. To put it another way, they made sure that each of the groups was actually being compared with its “true” sister taxa, i.e. that Angiosperms were still more similar to Gymnosperms and not, say, Streptophyta. This is further evidence that the groups were real – to quote the paper again:

It is important to demonstrate that the two subgroups or clades (X and Y) are genuine, and we do this for each of the subgroups in Table 2 in two ways. Firstly, the two subgroups are determined by other data – for example by nuclear or by mitochondrial DNA sequences for the plant chloroplast data. Secondly, for each of the eight pairs of datasets in Table 2 we later combine the two datasets, and confirm that the same two subgroups are still found – for example, the monocots and eudicots. This independent selection of the two subgroups is necessary because if, for example, we formed one subgroup by randomly selecting half the monocots and half the eudicot sequences, and used the other taxa to form the second subgroup, then we could artefactually get similar ancestors. So both tests (selecting subgroups from independent data, and later showing that the subgroups are recovered with the data used) are important in demonstrating that the subgroups X and Y are natural.

Next, creationists might try to argue that ancestral gene reconstruction is inaccurate, therefore the results can’t be relied upon. I already cited a paper demonstrating how accurate the methodology is (as do the authors of the paper), but does that really matter? Even if we assume for a moment that all the ancestral gene reconstructions could be wrong, that still doesn’t explain why they would result in the obscenely significant convergence that the authors observed. As I said before, there is nothing inherent in the method that would bias it towards making the ancestral genes in two or more completely independent analyses more similar, so this argument leads to the inevitable claim that the results of this analysis were just a lucky result, despite the probability of that being around 2.59×10⁻¹³². An infinitely more likely explanation is that the ancestral gene reconstruction was accurate, and the results reflect the fact that universal common ancestry is true.

Summary

In their paper, White and colleagues lay out a rigorous quantitative test of a fundamental prediction of the theory of universal common ancestry. Such formal tests have been attempted previously, but this is by far the simplest and most elegant one that I’ve come across. The paper has been favourably mentioned by several other publications on the subject of testing UCA, including those which have offered in-depth critiques of previous attempts. I’m surprised this article has managed to fly under the radar for the last few years in the evolution/creation “debate” circles, so I hope that this blog post might help to elevate its profile just a little.