Wednesday, March 7, 2012

Why do we still use trees for the dog genealogy?

In my previous two posts on Georges-Louis Leclerc, comte de Buffon, and his original dog genealogy of 1755, and the model for it, my interest was in Buffon's pioneering spirit in developing new ideas about genealogies and their presentation. However, it also seems natural to wonder how much we have progressed in the 250 years since then.

Having looked at the recent literature, there currently seem to be three distinct trends within dog phylogenetics:

the study of whole-genome data, in which the results are presented solely as a neighbor-joining tree: Parker et al. (2004), von Holdt et al. (2010)

the study of mtDNA sequence data, in which the results are presented both as a tree and as a haplotype network: Brown et al. (2011), Kropatsch et al. (2011), Oskarsson et al. (2012), Ryabinina (2006)

It is difficult to look at this list and not feel that there is a great deal of historical inertia here regarding the choice of analysis method. People like Hans Bandelt have developed network methods explicitly for mtDNA data, such as median-joining and reduced-median networks, and the literature is replete with papers using these methods to analyze mtDNA sequences, especially the so-called "mitochondrial control region". On the other hand, these methods seem to be less commonly employed for other data types, where trees are instead de rigueur. So, people are apparently choosing their analyses based on historical convention within their field, rather than on the methods' suitability for the purposes at hand. Perhaps the papers where both methods are used should be seen as a compromise? Or should I be optimistic and see them as part of a move away from trees towards the use of networks?

I have shown the two dog trees here. Both of them make it abundantly clear, even to the casual observer, that a tree is inappropriate for the data at hand.

Dog phylogeny (Parker et al. 2004)

The tree from Parker et al. has extremely small bootstrap values for almost all of the branches (only those >50% are shown on the tree), and even the group of modern dog breeds does not get up to 50% support. Clearly, there is massive conflict in this dataset. [Do not ask me why there is a value of 100% for the single branch at the base of the tree, since its presence is illogical.]
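For readers unfamiliar with bootstrap values, here is a minimal Python sketch of the idea; it is not the analysis that Parker et al. ran. It resamples alignment columns with replacement and records how often each of the three possible groupings of four taxa is recovered. The toy alignment, the function names and the crude "closest pairs" rule (standing in for a real tree-building method such as neighbor-joining) are all assumptions made for illustration.

```python
import random
from collections import Counter

# Made-up alignment for four taxa A, B, C, D (illustrative only, not real data).
# Sites 0-3 favour AB|CD, sites 4-6 favour AC|BD, sites 7-9 favour AD|BC.
TOY_ALIGNMENT = [
    "AAAAAAAAAA",  # taxon A
    "AAAACCCCCC",  # taxon B
    "CCCCAAACCC",  # taxon C
    "CCCCCCCAAA",  # taxon D
]

TOPOLOGIES = ("AB|CD", "AC|BD", "AD|BC")


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def quartet_topology(seqs):
    """Choose the pairing whose two within-pair distances sum to the smallest value.

    Ties are broken in favour of the first topology listed, which is a
    simplification of this sketch.
    """
    a, b, c, d = seqs
    sums = {
        "AB|CD": hamming(a, b) + hamming(c, d),
        "AC|BD": hamming(a, c) + hamming(b, d),
        "AD|BC": hamming(a, d) + hamming(b, c),
    }
    return min(sums, key=sums.get)


def bootstrap_support(seqs, replicates=1000, seed=0):
    """Fraction of bootstrap replicates that recover each quartet topology."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    counts = Counter()
    for _ in range(replicates):
        # Resample the alignment columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        counts[quartet_topology(resampled)] += 1
    return {topo: counts[topo] / replicates for topo in TOPOLOGIES}


if __name__ == "__main__":
    for topo, support in bootstrap_support(TOY_ALIGNMENT).items():
        print(f"{topo}: {support:.0%}")
```

When the sites disagree about the grouping, as they do in this made-up alignment, the support tends to be split among the three groupings rather than concentrated on one. That is the kind of conflict that low bootstrap values on a tree are reporting.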

Dog phylogeny (von Holdt et al. 2010)

The tree from von Holdt et al. has broader coverage but is even more clearly non-tree-like. The dots indicate the branches with >95% bootstrap support and the colours indicate the 10 groups of dog breeds recognized by the Fédération Cynologique Internationale. As you can see, many of the breeds are scattered around the genetic tree, indicating cross-breeding in the genealogical history. This paper thus follows Buffon by nominating representative breed groups but fails by not showing the cross-breeding. So, it is drawn as a tree, not a network, even though we know the history is not a tree. The use of colouring in the phylogenetic tree is one interesting way to indicate cross-connections in the genealogy, but cross-connecting lines are more explicit. [Interestingly, later editions of Buffon's work sometimes used hand-colouring of the genealogy to emphasize the breed groups that Buffon discusses in his text, so even this is not original.]

In both of these cases the tree analysis seems wildly inappropriate. As Buffon wisely told us 250 years ago, domestic dog breeds do not have a simple tree-like ancestry. It almost seems insulting that 2.5 centuries later we are still trying to fit these very same breeds (plus their numerous more-recent descendant breeds) into the straitjacket of a tree. We need to learn from the past if we are to progress into the future.
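Rather than judging tree-likeness by eye, one can put a rough number on it. The Python sketch below uses the four-point condition: for any four taxa on a true tree, the two largest of the three pairwise-distance sums are equal. The quartet "delta" score, of the kind used in delta plots, measures how far a distance matrix departs from that, with 0 meaning perfectly tree-like. The two small matrices and the function names are hypothetical, chosen only to illustrate the calculation; none of the papers discussed here reports this score.

```python
from itertools import combinations

# Hypothetical distances for taxa A, B, C, D that fit the tree ((A,B),(C,D)) exactly.
TREE_LIKE = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Hypothetical distances in which A is as close to C as it is to B,
# so no single tree fits them exactly.
CONFLICTING = [
    [0, 2, 2, 3],
    [2, 0, 3, 3],
    [2, 3, 0, 2],
    [3, 3, 2, 0],
]


def quartet_delta(d, x, y, u, v):
    """Delta score for one quartet: 0 if tree-like, up to 1 if maximally conflicting."""
    sums = sorted([
        d[x][y] + d[u][v],
        d[x][u] + d[y][v],
        d[x][v] + d[y][u],
    ])
    spread = sums[2] - sums[0]
    if spread == 0:
        return 0.0  # all three sums are equal (a star-like quartet)
    return (sums[2] - sums[1]) / spread


def mean_delta(d):
    """Average the quartet delta score over every set of four taxa."""
    quartets = list(combinations(range(len(d)), 4))
    return sum(quartet_delta(d, *q) for q in quartets) / len(quartets)


if __name__ == "__main__":
    print("tree-like:  ", mean_delta(TREE_LIKE))
    print("conflicting:", mean_delta(CONFLICTING))
```

A score well above zero would suggest that a network, which can show the conflicting signals, is a more honest picture of the data than any single tree.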

By the way, the patterns discussed here for phylogenetic analysis seem to be true for all groups of domesticated organisms. [You could try searching for the horse genealogy on the web, and you will see what I mean.] I am thus using the dogs merely as one convenient example. Following Andersen (1990), I do not intend "to pillory the few for errors which many commit with impunity".

Added note:
Since writing this post, another paper has appeared that can be added to group 1 (whole-genome data, with the results presented solely as a neighbor-joining tree): Larson et al. (2012).

2 comments:

Very interesting, thanks! Phylogenetic trees may or may not be an adequate representation of evolution in some cases but it seems quite clear that they are completely inadequate to represent the divergence of "breeds" and, in particular, artificially selected breeds.

I skimmed the paper from Parker et al. and it seems that the wolf "taxon" is actually 8 different individuals. Perhaps the 100% support is simply for the split between those 8 wolves and the dog samples?

The inadequacy of a tree seems to be Buffon's main point in his genealogy. That is what makes the "first" phylogeny so interesting — we started with networks and only now are we returning to them.

Indeed, your explanation for the 100% support seems to be the correct one: "Wolves from eight different countries were combined into one population for simplicity on the tree ... When taken as individuals, all wolves split off from a single branch, which falls in the same place as the root." The countries were: China, Oman, Iran, Sweden, Italy, Mexico, Canada and the United States. Thanks for pointing that out.