A lot of us who work in well-established biological systems take for granted how those systems were first discovered or established. Sometimes this involves the choice by an individual to begin studying development using a small worm. Other times it’s the fortunate discovery of visible chromosomes that allowed physical maps of genomes to be constructed decades before genome sequencing. Or it could be the fortuitous choice of a taxon to use as a model system for molecular evolution. This post deals with that last scenario, specifically the Drosophila melanogaster species subgroup.

In a paper published in the December 2007 issue of Genetics, Jean David and colleagues summarize the history of the discovery of D. melanogaster and the species closely related to it. D. melanogaster was first described in 1830, and D. simulans and D. yakuba were described in 1919 and 1954, respectively. However, most of the species (six of the nine) were described after 1970, and one of them, D. santomea, was not described until 2000. Many of the species that are popular subjects in evolutionary genetics, in studies ranging from the genetics of speciation to sophisticated empirical population genetics, have been known to science for only a few decades.

While David et al. provide a very nice history of the discovery of each species, they manage to mix up the phylogeny of this subgroup. Granted, it’s a tricky tree, but there’s a strong consensus in the research community as to the correct tree. Here’s the phylogeny presented by David et al.:

The species names with boxes around them indicate those for which a complete genome sequence is currently available. What I want you to pay particular attention to is the branching order that gives rise to D. melanogaster, D. yakuba, and D. erecta. In this tree, D. melanogaster and D. yakuba are grouped together to the exclusion of D. erecta. This had been the accepted phylogeny for much of the time we’ve known about these species; however, it no longer is. In the figure legend, the authors cite two papers from 2004, which were published after the phylogeny of these species had already been revised. You’d think experts on this species subgroup would know better.

The tree on the right is what is currently thought to be the correct tree. It is based on two studies that examined sequences from multiple genes (one by Ko et al. and another by Pollard et al.). Notice that D. yakuba and D. erecta are clustered together to the exclusion of D. melanogaster and its close relatives. This tree has been so hard to resolve because the two branching events that gave rise to these three lineages happened within a very short time interval relative to the lengths of the three external branches. A short internal branch, like the one separating the D. yakuba/D. erecta clade from the D. melanogaster lineage, leaves little time for changes that support the true grouping to accumulate, and the signal that does accumulate can be obscured by ancestral polymorphisms that had not yet sorted.

Specifically, Pollard et al. found that incomplete lineage sorting causes different genetic markers to support different phylogenetic relationships. This happens when polymorphisms that were segregating in the ancestral population persist through a speciation event. When the D. yakuba/D. erecta lineage split from the D. melanogaster lineage, some ancestral polymorphisms were retained in both lineages. Then, after D. yakuba and D. erecta split, different alleles could fix along each of those lineages and along the D. melanogaster lineage. If the same allele fixes along the D. melanogaster and D. yakuba lineages, but a different one fixes along the D. erecta lineage, that site will suggest a phylogeny in which D. melanogaster and D. yakuba are the closest relatives.
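The textbook three-species coalescent model (a standard population-genetics result, not something taken from Pollard et al.) makes this concrete. If the internal branch lasts t coalescent units (generations divided by 2N, assuming a diploid ancestral population of constant size N), the lineages sampled from D. yakuba and D. erecta fail to coalesce along it with probability e^(-t). When that happens, all three lineages enter the ancestral population together and each of the three possible gene trees is equally likely. The sketch below computes the resulting probabilities; the function name and topology labels are hypothetical.

```python
import math


def gene_tree_probabilities(t: float) -> dict[str, float]:
    """Probability of each rooted gene-tree topology for three species.

    Assumes a standard neutral coalescent with no gene flow; t is the
    internal branch length in coalescent units (a hypothetical input).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    # Chance that the yak and ere lineages have NOT coalesced by the
    # time they reach the population ancestral to all three species.
    p_no_coalescence = math.exp(-t)
    # In that case each of the three pairings is equally likely to coalesce first.
    discordant = p_no_coalescence / 3
    return {
        "((yak,ere),mel)": 1 - 2 * discordant,  # matches the species tree
        "((mel,yak),ere)": discordant,          # the grouping in David et al.'s tree
        "((mel,ere),yak)": discordant,
    }


if __name__ == "__main__":
    for topology, p in gene_tree_probabilities(0.1).items():
        print(f"{topology}: {p:.3f}")
```

With a short internal branch such as t = 0.1, the true grouping is still the single most probable gene tree, yet fewer than half of loci are expected to show it, which is exactly the kind of conflict that kept this tree unresolved for so long.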

The best way to get around this is to sample multiple loci. (One should always sample multiple loci when making inferences about evolutionary relationships, historical demography, or any other aspect of an organism’s natural history.) Ko et al. looked at four large genes and found the greatest support for grouping D. yakuba and D. erecta. Pollard et al. looked at the entire genomes of the three species and found the same. However, it’s important to note that the genome does not vote unanimously for this relationship. Under the standard coalescent model, incomplete lineage sorting produces the two alternative groupings equally often and never more often than the true one, so for three species the grouping supported by the most loci is expected to match the species tree. Because a majority of the genome supports grouping D. yakuba with D. erecta, it is the most likely evolutionary relationship among the three lineages.
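A small simulation illustrates why sampling many loci works. The sketch below assumes the same three-species coalescent model, treats loci as unlinked and independent, and assumes each gene tree is inferred without error (real sequence data never allows that); the names and the value of t are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical topology labels; the species tree groups D. yakuba with D. erecta.
SPECIES_TREE = "((yak,ere),mel)"
ALTERNATIVES = ["((mel,yak),ere)", "((mel,ere),yak)"]


def simulate_gene_tree(t: float, rng: random.Random) -> str:
    """Draw one gene-tree topology under a three-species coalescent model.

    t is the internal branch length in coalescent units (an assumed value).
    """
    # The yak and ere lineages coalesce within the internal branch
    # with probability 1 - exp(-t); the gene tree then matches the species tree.
    if rng.random() < 1 - math.exp(-t):
        return SPECIES_TREE
    # Otherwise all three lineages reach the ancestral population,
    # and any of the three pairs is equally likely to coalesce first.
    return rng.choice([SPECIES_TREE] + ALTERNATIVES)


def tally_loci(t: float, n_loci: int, seed: int = 1) -> Counter:
    """Count how many independent loci support each topology."""
    rng = random.Random(seed)
    return Counter(simulate_gene_tree(t, rng) for _ in range(n_loci))


if __name__ == "__main__":
    counts = tally_loci(t=0.1, n_loci=10_000)
    for topology, n in counts.most_common():
        print(topology, n)
```

Any single locus has a good chance of supporting a wrong grouping, but with thousands of loci the most common topology is the species tree. That mirrors the non-unanimous majority vote described above.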

David JR, Lemeunier F, Tsacas L, Yassin A. 2007. The Historical Discovery of the Nine Species in the Drosophila melanogaster Species Subgroup. Genetics 177: 1969-1973. doi:10.1534/genetics.104.84756