Sunday, June 29, 2008

Software Review: BEST version 2.0

As Glor hinted in a recent post, "species tree" analyses are pushing the phylogenetics world towards a paradigm shift. One of the methods currently available to researchers is Liang Liu's computer program BEST (Bayesian Estimation of Species Trees). Version 2.0 is available at the BEST website as both Windows and Mac OS X executables. BEST estimates the posterior distribution of species trees from multilocus, multiple-allele DNA sequence data while attempting to account for deep coalescence of alleles, one of the mechanisms that can result in a mismatch between gene trees and the species tree. Details of the method are provided in a paper by Liu and Pearl published in Systematic Biology (subscription required for PDF download).
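To give a sense of how often deep coalescence can produce a mismatch, here is a minimal sketch for the simplest case of three species with one allele each. It relies on the standard coalescent result that the gene tree disagrees with the species tree with probability (2/3)·exp(-T), where T is the length of the internal branch in coalescent units (generations divided by 2N for a diploid population). The function name and the example values are illustrative assumptions and are not part of BEST.

```python
import math


def discordance_probability(generations, effective_size):
    """Probability that a three-taxon gene tree differs from the species tree.

    Assumes a diploid population of constant effective size along the
    internal branch, so the branch length in coalescent units is
    generations / (2 * effective_size).
    """
    t = generations / (2.0 * effective_size)
    return (2.0 / 3.0) * math.exp(-t)


# Hypothetical values: internal branches of different duration, N = 100,000
for gens in (10_000, 200_000, 1_000_000):
    p = discordance_probability(gens, 100_000)
    print(f"{gens:>9} generations: P(mismatch) = {p:.3f}")
```

Short internal branches and large populations both push the mismatch probability towards its maximum of 2/3, which is the situation where a species tree method should matter most.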

The BEST website provides example files for analyses in which a single allele is sampled from each species, as well as for analyses in which multiple alleles are sampled from each species. Both analyses assume that species are reciprocally monophyletic. Given that BEST is a modification of MrBayes, the data formats are very similar, except that BEST includes priors for theta and mu. Also, the tree topologies, branch lengths, and mu are unlinked across the sampled loci. If a haploid locus (mitochondrial or chloroplast DNA) is sampled, BEST allows the user to define the ploidy of the locus (the default setting is diploid).

If a multiple locus dataset is run in BEST, a sham file is needed to summarize the trees after the burnin (the familiar "sumt" command from MrBayes). Liang Liu has posted an example of this type of file on the BEST website.

The trees are summarized using a burnin value that discards all trees and parameter values sampled prior to convergence. As in MrBayes, summarizing the trees produces a consensus tree file, where the consensus percentages for clades are interpreted as the Bayesian posterior probability. Progress of the BEST run and assessment of convergence can be monitored using the computer program Tracer.
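The logic of that summary step can be sketched in a few lines of Python. This is not how BEST or MrBayes implements it; it is a toy illustration in which each sampled tree is reduced to a set of clades, the burnin samples are dropped, and the support for a clade is the fraction of remaining trees that contain it. The function name, the tree representation, and the species names are all hypothetical.

```python
from collections import Counter


def clade_support(sampled_trees, burnin):
    """Return the fraction of post-burnin trees that contain each clade.

    Each tree is a set of clades and each clade is a frozenset of taxon
    names (a simplification; real tree files would need to be parsed).
    """
    kept = sampled_trees[burnin:]
    if not kept:
        raise ValueError("burnin discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


# Toy sample of five rooted trees for hypothetical species A-D
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
abc = frozenset({"A", "B", "C"})
trees = [{ac, abc}, {ab, cd}, {ab, cd}, {ab, abc}, {ab, cd}]

# Discard the first sample as burnin, then summarize the rest
support = clade_support(trees, burnin=1)
for clade, p in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), f"{p:.2f}")
```

Choosing the burnin too small leaves pre-convergence trees in the summary, which is why checking the run in Tracer first is worthwhile.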

My laboratory group has been experimenting with BEST for the past few months, and we are generating some interesting and exciting results. The prior on theta appears to be the one issue/nuisance that we have run across in our explorations using BEST. A fairly wide prior is given in the example files. We are beginning to run BEST with narrower, more realistic priors for theta. So far the results are promising.
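For readers who want a feel for what "wide" versus "narrow" means here, the sketch below compares two priors with the same mean but different spread. It assumes the theta prior is an inverse-gamma distribution with shape alpha and scale beta; that form is an assumption made for illustration, and the parameter values are hypothetical rather than taken from the BEST example files.

```python
import math


def inverse_gamma_summary(alpha, beta):
    """Return (mean, standard deviation) of an inverse-gamma(alpha, beta).

    The mean exists only for alpha > 1 and the variance only for alpha > 2,
    so this helper requires alpha > 2.
    """
    if alpha <= 2:
        raise ValueError("alpha must exceed 2 for a finite variance")
    mean = beta / (alpha - 1)
    variance = beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
    return mean, math.sqrt(variance)


# Hypothetical priors: same mean for theta, very different spread
for label, alpha, beta in (("wide", 3, 0.03), ("narrow", 21, 0.3)):
    mean, sd = inverse_gamma_summary(alpha, beta)
    print(f"{label:>6}: mean = {mean:.4f}, sd = {sd:.4f}")
```

Under this assumption, increasing alpha while scaling beta to hold the mean fixed is one way to tighten the prior around a value of theta that seems realistic for the organisms in question.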

Overall, I have found BEST straightforward to implement with my multilocus phylogenetic data. Familiarity with MrBayes will certainly help new users of BEST. Also, Liang Liu has been very helpful and encouraging to users, and has incorporated suggestions into the example files on the BEST website. My entire lab group is excited to be exploring the frontier of phylogenetics, with the hope that we are making the most reasonable inferences regarding species relationships that are afforded by our hard-earned data.

7 comments:

I've been interested in the new dimension BEST adds to trees (branch width) since I first read these papers. I am curious to know if, despite values that range widely depending on the prior used, theta remains proportional across analyses. Also, are you noticing any cool patterns related to this value? Like autocorrelation, fluctuations corresponding with presumed instances of dispersal or ecological shifts, etc.?

Dan, not sure. You are welcome to have at the parameter files. Let me know and I can get them up on my lab server. Frankly, I think that the time to explore these questions is now, and given your interest in computational phylogenetics you can make a nice contribution.

Although this shouldn't have an impact on the content of this string, I wanted to clear up a potential case of identity confusion. My graduate student - Dan Scantlebury - posted this as "dans"; Dan Warren at UC Davis has been posting as "Dan". Both are interested in computational phylogenetics, but perhaps we need to give these guys some nicknames...

We have used BEST in my lab for some time now. If you violate any of the assumptions (and there are many...e.g., no horizontal gene transfer) or certain coalescent requirements, don't be surprised if you find little resolution and support in your trees relative to partitioned BI or ML analyses. See the paper by Belfiore et al. (2008) on Geomyidae in Syst. Biol.

Moreover, violations of these assumptions will produce harmonic mean -lnL's that are much lower than the more resolved, standard partitioned BI analyses.

If you meet the assumptions of BEST species trees obtained from joint posterior probabilities of gene trees, then you are golden!

In my previous comment, I wasn't saying that the standard BI or ML trees are actually better...they may simply have more resolution (although this may be artificial). In contrast, the lack of resolution in BEST may also be artificial.

Well, theta is a huge problem and I am sure the defaults are too wide. We are exploring this now. Also, it is not entirely clear at which phylogeographic/phylogenetic level the assumptions of BEST will be violated. For instance, will assessing species relationships for organisms >10 mya be out of bounds for this? Edwards thinks not...right?
