Pages

Tuesday, March 15, 2011

Matrix representation parsimony (MRP) supertree estimation

Are any of my readers aware of an existing function to do Matrix Representation Parsimony (MRP) supertree estimation (Baum 1992; Ragan 1992; also see Sanderson et al. 1998; Bininda-Emonds et al. 2007) in R? I'm not, but I was thinking it would be a fun - and easy - thing to program. Consequently I have just posted a function for this, called mrp.supertree(), on my R-phylogenetics page (direct link to code here). This function could also be used to get the MRP consensus tree (that is, if all the input phylogenies have the same set of taxa; e.g., see Felsenstein 2004 pp. 527-528).

I programmed this by using two handy functions: the function prop.part() in {ape} (which returns all the non-trivial bipartitions for a given tree) and pratchet() in {phangorn} (Schliep 2010), which estimates the MP tree using the parsimony ratchet method of Nixon (1999).

The way MRP supertree estimation works is by first constructing N (for N trees) ni x mi binary matrices (for ni species and mi non-trivial bipartitions in the ith tree). We score the tree matrices by putting a 1 in the j,kth position if the jth taxon is found in the kth bipartition, for each bipartition, and a 0 otherwise. We then accumulate these matrices into a supermatrix which contains union(speciesi) rows and sum(mi) columns. We put a "?" into the supermatrix if a species is missing from the tree. Finally, our MRP supertree is the MP tree estimated for this binary supermatrix.

To see how well this function does, let's first generate a random reference tree with 50 species:

> true.tree

Now let's randomly prune 40-60% of the species from this tree ten times:

A few notes on this implementation. Firstly, I set pratchet() to return all equally parsimonious trees (if several are found, mrp.supertree() will return a "multiPhylo" object) - but this is not the same as a guarantee to find all most parsimonious trees. This could only be guaranteed with an exhaustive search. Secondly, pratchet() does not always find the MP tree - so you might run mrp.supertree() multiple times if the parsimony score is greater than the theoretically minimal score.

16 comments:

This is actually useful for me! I was looking for a way to build a MRP. I'm not very familiarized with R, but I try to load a newick file with all the trees and it gives me a error. How are you passing the trees to the function?

The function takes an object of the class "multiPhylo" (that is, a list of trees stored in memory). If you are starting with a simple text file with Newick trees (named, for instance, "treefile.tre") in a list, e.g.:

Hi. I'm not sure what the problem is, to be honest. If you email me your datafile (liam.revell@umb.edu) or, assuming they are small trees, post your Newick tree strings here, I can try to help more. - Liam

Hi Dima. Hard to diagnose from the error message, which is not familiar. (Looks like a segmentation fault, which is usually due to memory allocation problems, but I cannot remember seeing this in R.) Feel free to send me the datafiles and I will try and run. - Liam

Hi Richard. I don't as my research is not on supertree estimation. (I programmed this function for a phylogenetics class I taught last year.) If you can't find what you're looking for online, maybe you should email your query to the R-SIG-phylo email list, as there are lots of experts who subscribe. Good luck! Liam

You may want to skip phytools and try phangorn. phangorn has an equivalent (but undoubtedly better) MRP function. I advise you try that & contact the package author (or post query to R-sig-phylo) if you have trouble. - Liam

About this blog

This web-log chronicles the development of new tools for phylogenetic analyses in the phytools R package. Unless you a reading a very recent page of the blog, I recommend that you install the latest CRAN version of phytools (or latest beta release) before attempting to replicate any of the analyses of this site. That is because the linked functions may be archived, and very likely have been replaced by newer versions.