Wednesday, October 22, 2014

Is phylogenomics tree-like?

Phylogenomics, the idea of applying genomic data to phylogenetic studies, has been around for quite a while now (Eisen 1998), although it was probably Rokas et al. (2003) who first drew widespread attention to it among phylogeneticists. Molecular phylogenetics started off using the sequence of a single locus (often small-subunit rRNA) as the data, and slowly progressed from there to multiple loci. Currently, it is considered good practice to use half-a-dozen loci, sampling the main genomes (nucleus, mitochondrion, plastid). Genomics offers the possibility of a fast and cost-effective means of generating large amounts of multi-locus sequence data.

Review papers are beginning to appear based explicitly on next-generation sequencing (NGS), such as those of Lemmon & Lemmon (2013) and McCormack et al. (2013), replacing the earlier work of Philippe et al. (2005), and there are suggestions for how phylogenetic analyses might need to change in response to NGS data (Chan and Ragan 2013). These all treat phylogenomics as being very similar to traditional molecular phylogenetics, in the sense that many people are expecting phylogenomics to provide tree-like resolution of questions that remain unresolved with the current smaller datasets. In the words of Rokas et al. (2003), phylogenomics is intent on "resolving incongruence in molecular phylogenies". That is, incongruent gene trees are seen as the major obstacle to be overcome by phylogenetic data analysis (see also Jeffroy et al. 2006).

However, this might be a naive expectation. After all, the existing phylogenetic conflicts are there for a reason. If we cannot resolve certain parts of organismal history in terms of a phylogenetic tree when we use the current levels of multi-locus data (say <10 loci), then there is no real reason to think that this will happen just because we increase the number of loci. There are plenty of other reasons for incongruence among genes, the most obvious one being that the history is not tree-like in the first place. The advantage of phylogenomics, then, would be its ability to clarify the phylogenetic history rather than to resolve incongruence.

There are now quite a few published empirical phylogenomic studies, which allows us to provide a preliminary answer to the question of whether phylogenomic patterns are tree-like or not. There are a few published studies where the authors claim resolution in terms of a tree, at least for part of their phylogeny (e.g. Wang et al. 2012), but it seems to me that there are far more studies where the incongruence remains even with genomic data. Below, I briefly introduce a few arbitrarily chosen examples.

So, complex genealogical problems often remain complex even after using genomic data. We haven’t "solved" any of the so-called genealogy problems; we have simply made clear in what way they are complex. That is, genomic data generally reveal reticulate evolutionary histories, not simple tree-like ones.

This leads me to conclude that phylogenomics is about reticulate evolution, and it is thus time for phylogeneticists to abandon trees as a model for genealogies. We have probably already resolved most of the simple tree-like genealogical patterns, using non-genomic data, and from here on we will be using genomics to study gene flow in addition to parental gene inheritance.

Examples

(1) Galtier and Daubin (2008) were among the earliest researchers to try to "deal with incongruence in phylogenomic analyses", and one of their examples was the long-standing problem of deciphering the relationships among the closest relatives of humans. However, the genomic data make it clear that, while humans share slightly more genes with chimpanzees than with other great apes, we still share some with gorillas but not chimpanzees, and with orangutans but not either chimpanzees or gorillas. Also, chimpanzees share some genes with gorillas that we do not share. The situation is now clearer, but the tree incongruence remains.

(2) At the same time, Kuo et al. (2008) looked at the then-available genomes for members of the Apicomplexa, which are unicellular eukaryotic parasites. The genomic data confirmed the current groupings of Haemosporidians, Piroplasmids and Coccidians (shown as branches with high support in the diagram) but completely failed to resolve the relationships between these groups (shown as branches with low support). Things are no better today, when we have at least some data for 35 genomes.

(3) The relationships among mammal superorders, particularly the placentals, have been an ongoing area of debate. I have already covered this in some previous blog posts, notably Conflicting placental roots: network or tree? and Why are there conflicting placental roots? There are three possible ways of resolving a tree at the root of the placental phylogeny, and genomic datasets seem to support all three of them; the different published trees are therefore based on variation in the model used for data analysis. As Hallström and Janke (2010) have noted, there was probably incomplete lineage sorting and hybridization in the early placental mammalian divergences, rather than a truly tree-like history.

(4) Dell'Ampio et al. (2014) looked at the phylogenetic relationships of the wingless insects, and tried to come to grips with the incongruence among genes. They considered three main tree-based hypotheses for the relationships, and found that genomic support was pretty evenly spread among the three topologies. They dryly note that after their hard work the relationships "are still considered unresolved."

(5) Relationships among hominids have been a popular topic of study for many years, and not unexpectedly there has been a burst of activity as a result of genomic data, especially as there are now SNP micro-arrays available to simplify the data collection. I have covered this in previous posts as well, notably Why do we still use trees for the Neandertal genealogy? The bottom line is that the genomic data provide evidence of extensive introgression (or admixture) between humans and their nearest relatives throughout their time of co-existence. This example is from Reich et al. (2011).
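
Examples (3) and (4) both come down to the same kind of bookkeeping: three competing topologies, and genomic support spread among them. As a rough illustration only, the following Python sketch tallies what fraction of loci supports each of the three possible sister-pairings among three lineages. The lineage labels A, B and C, the function names, the 0.9 cut-off and the locus counts are all my own assumptions for the sake of the sketch; none of them comes from the studies cited above, and a real analysis would start from inferred gene trees rather than a hand-made list.

```python
from collections import Counter

# With three lineages (hypothetically labelled A, B, C) there are exactly
# three possible rooted topologies, identified here by the sister pair.
TOPOLOGIES = ("AB", "AC", "BC")


def topology_support(sister_pairs):
    """Return the fraction of loci supporting each of the three topologies.

    `sister_pairs` holds one two-letter string per locus, naming the pair of
    lineages that the locus's gene tree groups as sisters ("BA" counts as "AB").
    """
    counts = Counter("".join(sorted(pair)) for pair in sister_pairs)
    unknown = set(counts) - set(TOPOLOGIES)
    if unknown:
        raise ValueError(f"unexpected sister pairs: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no loci supplied")
    return {topology: counts[topology] / total for topology in TOPOLOGIES}


def is_tree_like(support, threshold=0.9):
    """Crude check: does a single topology hold at least `threshold` of the loci?

    The 0.9 default is an arbitrary assumption, not a published criterion.
    """
    return max(support.values()) >= threshold


# Made-up counts for illustration; not data from any cited study.
loci = ["AB"] * 40 + ["AC"] * 33 + ["CB"] * 27
support = topology_support(loci)
print(support)
print(is_tree_like(support))
```

With counts like these, no single topology dominates, and adding more loci in the same proportions would not change that. This is the sense in which more data can clarify the pattern of conflict without resolving it into a tree.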