Wednesday, November 13, 2013

Monophyletic groups in networks

I have noted before that taxonomic groups that are represented in any tree-like parts of a phylogeny can be considered to be monophyletic, but those that consist of hybrids cannot, unless we hypothesize a single hybrid origin for each group (How should we treat hybrids in a taxonomic scheme?). The issue arises because a monophyletic group must share an exclusive Most Recent Common Ancestor (MRCA), and that concept is less straightforward for a network than it is for a tree.

This topic has been tackled mathematically a couple of times (see Huson and Rupp 2008; Fischer and Huson 2010), resulting in the recognition that for a network there are three main types of MRCA: conservative MRCA (or stable MRCA), Lowest Common Ancestor (or minimal common ancestor), and Fuzzy MRCA (see Networks and most recent common ancestors). These have definitions based on the Least Upper Bound and Greatest Lower Bound of mathematical lattices.

Unfortunately, there has been very little discussion of the topic in the biological literature. However, Wheeler (2013) has recently made a start. He makes no reference to the mathematical work on MRCAs, but he does consider what to do about the concepts of monophyly, paraphyly and polyphyly with respect to networks.

Basically, he suggests three new types of phyletic group: periphyletic, epiphyletic, and anaphyletic. He provides algorithmic definitions of these groups, relating them to the previous algorithmic definitions of monophyly, paraphyly and polyphyly. These new types concern groups that are monophyletic on a tree, but have additional gains or losses of members from network edges — that is, they lie somewhere between monophyletic and paraphyletic.

For example, an epiphyletic group would be one that is otherwise monophyletic but also contains one or more hybrids that have one of their parents from outside the group, while a periphyletic group would be monophyletic but would have contributed as a parent to at least one hybrid outside the group. An anaphyletic group would have done both of these things. For clarification, Wheeler provides the following empirical example, based on Indo-European languages (where English is recognized as a "hybrid" of Germanic and Romance languages).

Reproduced from Wheeler (2013).
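
As a rough illustration of the verbal descriptions above (not Wheeler's own algorithmic definitions, which I have not reproduced here), the three group types can be sketched in a few lines of Python. The network is stored as a mapping from each node to its list of parents, so a hybrid is simply a node with two or more parents. The toy language network, the node names and the function `classify_group` are all my own assumptions for the sake of the example, and the sketch assumes that the group is already monophyletic on the underlying tree; it does not check this.

```python
# Hypothetical toy network: each node maps to its list of parents.
# "English" has two parents, so it is treated as a hybrid.
toy_network = {
    "Germanic": ["PIE"],
    "Romance": ["PIE"],
    "German": ["Germanic"],
    "Dutch": ["Germanic"],
    "English": ["Germanic", "Romance"],
    "French": ["Romance"],
    "Spanish": ["Romance"],
}


def classify_group(parents, group):
    """Label a group by the hybrid edges that cross its boundary.

    Assumes (without checking) that the group is monophyletic on the
    tree-like part of the network.
    """
    group = set(group)
    gained = set()  # hybrids inside the group with a parent outside it
    lost = set()    # hybrids outside the group with a parent inside it
    for node, node_parents in parents.items():
        if len(node_parents) < 2:
            continue  # only hybrid nodes matter here
        inside = [p for p in node_parents if p in group]
        outside = [p for p in node_parents if p not in group]
        if node in group and inside and outside:
            gained.add(node)
        elif node not in group and inside:
            lost.add(node)
    if gained and lost:
        return "anaphyletic"
    if gained:
        return "epiphyletic"
    if lost:
        return "periphyletic"
    return "monophyletic"


# Germanic including English: the hybrid brings in a Romance parent.
print(classify_group(toy_network, {"Germanic", "German", "Dutch", "English"}))
# prints: epiphyletic

# Romance excluding English: the group has contributed to an outside hybrid.
print(classify_group(toy_network, {"Romance", "French", "Spanish"}))
# prints: periphyletic
```

In this toy case, whether English is placed inside or outside the Germanic group decides which label the group receives, which is exactly the sort of choice a taxonomic scheme would have to make.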

In terms of MRCA, it seems to me that all three new group types use the Lowest Common Ancestor model, which is the shared ancestor that is furthest from the root along any path (i.e. the LCA is not an ancestor of any other common ancestor of the taxa concerned). However, this is only clear when we consider hybrids, in which the two (or more) parents contribute equally to the hybrid offspring. When dealing with introgression or horizontal gene transfer, where the parentage is unequal, we approach the Fuzzy MRCA model, in which only a specified proportion of the paths (representing some proportion of the genomes) needs to be accommodated by the MRCA, thus keeping the MRCA close to the main collection of descendants.
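
The Lowest Common Ancestor definition given above translates fairly directly into code. The following is only a minimal sketch under my own assumptions: the network is again a mapping from each node to its list of parents, the toy language network and the function names `ancestors` and `lowest_common_ancestors` are hypothetical, and no attempt is made at efficiency. It collects the common ancestors of a set of taxa and keeps those that are not an ancestor of any other common ancestor.

```python
# Hypothetical toy network: each node maps to its list of parents.
toy_network = {
    "Germanic": ["PIE"],
    "Romance": ["PIE"],
    "German": ["Germanic"],
    "English": ["Germanic", "Romance"],  # hybrid: two parents
    "French": ["Romance"],
}


def ancestors(parents, node):
    """Return the set of all proper ancestors of a node."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def lowest_common_ancestors(parents, taxa):
    """Return the common ancestors that are not ancestors of any other
    common ancestor. In a network this set may hold more than one node."""
    common = set.intersection(*(ancestors(parents, t) for t in taxa))
    return {
        c for c in common
        if not any(c in ancestors(parents, other) for other in common if other != c)
    }


print(lowest_common_ancestors(toy_network, ["English", "German"]))
# prints: {'Germanic'}
print(lowest_common_ancestors(toy_network, ["English", "French"]))
# prints: {'Romance'}
print(lowest_common_ancestors(toy_network, ["German", "French"]))
# prints: {'PIE'}
```

Note that the hybrid pairs with either parental lineage below the root, and that if two hybrids shared the same two parents then both parents would be returned, so the LCA of a network need not be unique.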

What is not yet clear is whether we would want to recognize any of these new group types in a taxonomic scheme. I guess that this is something that the PhyloCode will have to think about, since it is based strictly on clades (although they are allowed to overlap).