Projects

Below are brief descriptions of some of the projects my lab has pursued over the years. If you are a prospective student, this should give you a feel for the kind of work we do.

GrAToL (Assembling the Green Algal Tree of Life)

Together with Dr. Louise Lewis’ lab (also in the UConn EEB Department) and labs at four other institutions (Karol lab at the New York Botanical Garden, McCourt lab at Drexel University, Delwiche lab at the University of Maryland, and the Lopez-Bautista lab at the University of Alabama), our lab received a 5-year AToL (Assembling the Tree of Life) grant from the National Science Foundation (NSF DEB-1036448) in 2010. The primary goal of this project is to estimate a framework phylogeny for the green algae (Phylum Chlorophyta). This group is important because one of its lineages gave rise to the embryophytes, the group that includes all the other plants you know (fungi and red and brown seaweeds are excluded, as they have not been considered plants for a long time). Green algae are also of growing interest as a source of biofuel, so better knowledge of their phylogenetic relationships is critical for much applied research.

Model Selection

In collaboration with Ming-Hui Chen and Lynn Kuo in the UConn Statistics Department, my lab has worked on the problem of estimating the marginal likelihood, a quantity used to compare models. The marginal likelihood measures the average fit of a model to a data set; the model with the best average fit is deemed best. Estimating the marginal likelihood accurately is not easy, however. The method we developed is called the stepping-stone method.

The figure on the left shows a simple example involving only two sequences and a model with 2 parameters: the edge length (ν) and the transition/transversion rate ratio (κ). The stepping-stone method uses a series of probability distributions ranging from the posterior distribution to a reference distribution (here the prior distribution). The prior distribution in a Bayesian analysis represents our belief in the values of the model parameters without reference to the sequences we’ve observed, while the posterior represents our belief after the sequences have been taken into account. The marginal likelihood is related to both. It is a weighted average of the likelihood (the probability of the data given the model and its parameter values), with the prior distribution providing the weights. It is also the constant used to normalize the posterior distribution (i.e. to scale it so that it is a proper probability distribution). The stepping-stone method uses samples from both prior and posterior as well as several distributions that lie between to provide an accurate estimate of the marginal likelihood. See the Xie et al. (2011) and Fan et al. (2011) papers for a more complete explanation.
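The idea of stepping from the prior toward the posterior can be illustrated with a minimal sketch. It is not the two-sequence example from the figure, and it is not the code used in the papers. It assumes a toy model chosen for illustration only: data drawn from Normal(mu, 1) with a Normal(0, tau^2) prior on mu. For this model each intermediate distribution (likelihood raised to a power beta between 0 and 1, times the prior) can be sampled exactly, and the true marginal likelihood is known, so the estimate can be checked. All function names, the spacing of the beta values, and the sample sizes are assumptions of this sketch.

```python
import math
import random


def log_likelihood(mu, data):
    """Log-likelihood of the data under Normal(mu, 1)."""
    sq = sum((y - mu) ** 2 for y in data)
    return -0.5 * len(data) * math.log(2.0 * math.pi) - 0.5 * sq


def sample_power_posterior(beta, data, tau, n_samples, rng):
    """Draw from the density proportional to likelihood**beta * prior.

    In this conjugate toy model that density is itself normal, so exact
    draws are possible; a phylogenetic analysis would need MCMC here.
    """
    precision = beta * len(data) + 1.0 / tau ** 2
    mean = beta * sum(data) / precision
    sd = math.sqrt(1.0 / precision)
    return [rng.gauss(mean, sd) for _ in range(n_samples)]


def stepping_stone_log_marginal(data, tau, n_stones=32, n_samples=2000, seed=1):
    """Estimate the log marginal likelihood as a sum of log ratios.

    beta = 0 is the prior and beta = 1 is the posterior. Each ratio
    Z(beta_k) / Z(beta_{k-1}) is estimated by averaging
    likelihood**(beta_k - beta_{k-1}) over draws taken at beta_{k-1}.
    """
    rng = random.Random(seed)
    # Assumed spacing: betas bunched near the prior end of the path.
    betas = [(k / n_stones) ** 3 for k in range(n_stones + 1)]
    log_ml = 0.0
    for k in range(1, n_stones + 1):
        step = betas[k] - betas[k - 1]
        draws = sample_power_posterior(betas[k - 1], data, tau, n_samples, rng)
        log_terms = [step * log_likelihood(mu, data) for mu in draws]
        # Log of the mean of exp(log_terms), computed stably.
        biggest = max(log_terms)
        total = sum(math.exp(t - biggest) for t in log_terms)
        log_ml += biggest + math.log(total / n_samples)
    return log_ml


def exact_log_marginal(data, tau):
    """Closed-form log marginal likelihood for this toy model."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    quad = ss - tau ** 2 * s ** 2 / (1.0 + n * tau ** 2)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n * tau ** 2)
            - 0.5 * quad)


if __name__ == "__main__":
    toy_data = [0.8, 1.3, -0.2, 2.1, 0.5, 1.7, 0.9, 1.1]  # made-up values
    print("stepping-stone:", stepping_stone_log_marginal(toy_data, tau=10.0))
    print("exact:         ", exact_log_marginal(toy_data, tau=10.0))
```

Running the script prints the stepping-stone estimate next to the exact value. In a real analysis no exact value is available, and the draws at each step come from MCMC rather than from a known distribution.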

Bayesian Star Tree Paradox

If sequence data are simulated using a 4-taxon star tree (such as the one shown on the left) and evaluated with standard software tools for Bayesian phylogenetic inference, one of the 3 possible fully resolved trees is often supported very strongly. This is paradoxical in that most people expect the three possible resolutions to be equally supported in this case, but such an outcome is only seen when the sequence length is tiny (e.g. 1 site). It appears that uncertainty in this case is manifested in the inability to predict, from dataset to dataset, which of the 3 possible fully resolved tree topologies will be favored. This behavior is troubling, and several researchers have pointed out possible examples of it. Many more potential examples can be found in the literature by looking for high posterior probabilities but low bootstrap support, combined with tiny internal edges.

We argue that the central problem here is the non-identifiability of the tree topology, and propose a solution using reversible-jump MCMC (rjMCMC). Our rjMCMC sampler visits not only fully resolved tree topologies but also topologies containing hard polytomies. This effectively places a point mass of prior probability on polytomies, providing an alternative in situations in which a fully resolved topology is not a reasonable option. The analysis can be made as conservative as desired by modifying the prior distribution assumed for topologies, but in our (albeit limited) experience it does not appear easy to destroy support for real edges by using a prior that strongly supports polytomous topologies.

Algal Phylodiversity

A major thrust in the laboratory of Louise Lewis is the diversity and systematics of green algae (Phylum Chlorophyta) living in the soils of North American deserts. These unicellular green algae are capable of tolerating the harsh conditions posed by desert soil environments, and represent an important (yet not well understood) component of desert microbiotic crust communities. The 18S rDNA sequences of a number of green algal isolates have been determined, and these data suggest that several lineages of green algae have diversified within deserts. One might be tempted to think that the green algal cells isolated from desert soils are simply the result of spores dispersed into deserts from distant aquatic sources. This study shows that the 18S sequences of these desert isolates are more divergent from their nearest aquatic relatives than would be predicted if they were merely incidental visitors. We characterize the molecular phylodiversity of desert green algae and demonstrate with a Bayesian analysis of 150 green algal 18S sequences that all freshwater classes of green algae have yielded desert lineages. The numerous transitions from aquatic to desert existence apparent from the phylogeny argue that it is no longer accurate to portray land plants as resulting from a single origin. The highly celebrated origin leading to the embryophytes is but one of many transitions to terrestriality.