Hi Liam,I've been applying evol.rate.mcmc to some data and am trying to get my head around the analysis. From what I understand it finds a single rate shift in rates of phenotypic evolution within a tree. Is there any way to test the 'significance' of any identified shifts (i.e. especially for more subtle identified shifts)? I hope that question makes sense, thanks for any advice.regards

Ha! Here's a real comment for you: I am having a hard time using the phyl.pca function in phytools. No matter that I am passing the mode "cov", I get an error.> pca<-phyl.pca(C, Y, method="BM", mode="corr")"Error in colSums(invC %*% X) : error in evaluating the argument 'x' in selecting a method for function 'colSums': Error in invC %*% X : requires numeric/complex matrix/vector arguments. Any ideas?

Dear Liam, Your blog is very interesting. As you work with Simmap, I would have a very basic question: I want to do stochastic character mapping on 10000 Bayesian trees in Simmap and display the results on the consensus tree. What output files do I have to use - is it the mutational maps, which can be saved optionally? Do I have to summarize all individual maps to display them on the consensus? I had no problem to make ASR in Simmap, but after checking the manual I still don't understand how to get the output for stochastic character mapping and how to display the results. Do you have any hint for me?Thanks,David

I have nothing to do with the program SIMMAP which is maintained by Jonathan Bollback. The phytools function make.simmap can do basic stochastic mapping. Other functions (read.simmap, write.simmap) in the phytools package read & write trees in the SIMMAP format.

In the future I hope to add some of the functionality you are talking about to the phytools package, but I am not there yet.

I am a PhD student of Vladimir Minin's at UW, and we have noticed some potentially undesirable behavior with the optim.phylo.ls function.

What we observed is that there appears to be some stochasticity when using the "fixed=TRUE" option; that is, it would sometimes return different answers when run repeatedly with the same input tree and sequence alignment.

The culprit appears to be the multi2di function within optim.phylo.ls. The multi2di function purports to resolve polytomies by explicitly denoting them as having branches of length 0, as opposed to a lack of a branch whatsoever. However, it is self-admittedly a stochastic procedure, in terms of exactly where it adds branches -- it seems that it should not matter in principle since it is merely adding branches of length 0. However, there appear to be some undesired consequences when each of these resolved trees are used in optim.phylo.ls

As a demonstration, consider the following multifurcating tree:http://i16.photobucket.com/albums/b37/peterchi/T1.png

Now, suppose we want to use this as the fixed topology on which to estimate the least squares branch lengths for a given sequence alignment. Let's say that we have saved the matrix of pairwise distances (according to JC69) as dist2.stan. Then, we would use the following R code:

T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)

However, if we run this repeatedly, we will observe that we obtain different answers between iterations:> T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)best Q score of 0.0040827015 found after 0 nearest neighber interchange(s).> T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)best Q score of 3.27601e-05 found after 0 nearest neighber interchange(s).> T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)best Q score of 0.0040827015 found after 0 nearest neighber interchange(s).> T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)best Q score of 0.0033840255 found after 0 nearest neighber interchange(s).> T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)best Q score of 3.27601e-05 found after 0 nearest neighber interchange(s).

As mentioned above, the cause of these discrepancies appears to be the multi2di function. Observe what happens when we repeatedly plot the multi2di(T1) result; that is:

Yes, I can help you with this. I think that, by convention in "phylo" objects, nodes closer to the root node on a given path from the root have smaller node numbers then do nodes on the same path that are farther from the root. That is to say, node 13 is not necessarily closer to the root than node 19; but, if for a set of four taxa, the pairwise MRCAs (excluding duplicates) are: 16, 15, & 14, then we know that MRCA of all 4 is 14.

Just in case this is not guaranteed to be the case, I have written a function that uses ape::mrca & phytools::nodeHeights to compare the pairwise MRCAs of a list of species to the heights of the MRCAs above the root, and then picks the one with the lowest height. I will post it shortly.

My guess is that in your during the MCMC one of: 1) the log-value of the likelihood for the current state of the chain; 2) the log-value of the likelihood for the proposed state; or 3) either of the log-priors; are numerically indistinguishable from -Inf or Inf. One thing you could do to check this is rescale your data by multiplying by 10 or 1/10. In addition, if you send me your dataset I would happy to try and diagnose the issue - and it would also be helpful in improving phytools.

Hi Liam,Suggestion for you. How about adding % variance explained for the PC axes as part of the output of phyl.pca? Not hard for someone to do separately, but it would be useful output to add. I'm digging phytools.

Dear Liam,I am trying your phyl.cca function but I keep getting the following error:Error in ccs > 0 : invalid comparison with complex valuesis this a format problem with the data matrices or something else is going on?thanks for your reply and for developing this awesome 'phytools'cheersArley

Hi, Liam,Like the above individual I'm also trying to use the phyl.cca function, but have been encountering three main issues:1. Although I have the latest versions of R and phytools, R cannot find the function. It's definitely spelled right...2. When I source the R code instead, the lambda estimate statement doesn't seem to register (the exact error: unused argument(s) (fixed = F)). I'm guessing this is because the code posted 9/7/2011 doesn't include updates to use lambda.3. If I take out the lambda part, I get a "subscript out of bounds error," which I know must have something to do with matrices. I've tested the format with is.matrix and the rownames are identified (furthermore, I've successfully used phyl.pca, which has similar format requirements).Any suggestions? I'd be happy to send my code and files if needed.Thanks for taking a look at this. I enjoyed the comparative methods symposium at Evolution in Ottawa! Mercedes Burns

I have actually fixed the problem I had with phyl.ccaI just had to re-order the rownames in the data matrices to match the order of the taxa in the tree, and it worked like a charm!For the record, I'm using version 0.1-8.arley

The first thing I would do is install the latest version of phytools. You should first detach phytools or (better yet), start a new R session. You can then update the package using update.packages() or install.packages("phytools"). Try to run it again and if you are not able, please let me know.

Thanks for your response, Liam! Issues 1 and 2 are resolved, but I'm still getting the subscript out of bounds error. I'll try Arley's solution of reordering the rownames, but yes, let us know if there's a fix!Mercedes

Hmm, the reordering rows trick didn't work for me. I get a subscript out of bounds error when I run: data<-phyl.cca(tree,X,Y,fixed=F)X is a matrix with 5 columns, Y is a matrix of 2 columns. The rows are identical in number and name. Any thoughts? Thanks again.Mercedes

Using brownieREML(), I am testing whether the evolutionary rate of a continuous trait (y) is contingent on a discrete multistate trait x <-c("a","b","c").

The evolutionary history of the multistate trait was reconstructed with the function make.simmap(x, model=c("SYM")). However make.simmap assumes that transitions are symmetric (model="SYM"), such that transitions from "a" to "b" are as likely than from "b" to "a". However i don't have a biological justification to assume this a priori. So, I wonder if there is a statistical justification for this assumption or should i try other models (e.g. model=c("ARD"). Unfortunately, results change when i do the latter.

Many thanks for you time and help, and congratulations for this helpful Blog.

If your results seem sensitive to the mapping method, you might also consider running stochastic mapping in the stand-alone program SIMMAP. It can do things not presently implemented in phytools, including sampling across model parameter values. You can export your data & tree using the phytools function export.as.xml (see ?export.as.xml for details) which will create an input file for SIMMAP.

Dear Liam, I am wondering, is it possible to give different colours to the taxa after pruning by using the drop.tip. I just want to show the locations of the used species in the whole tree. All the taxa in tree retains but with different colours? (e.g. used species blue or red and pruned ones grey)

Dear Liam Your fancyTree command is really cool and just what I needed. I am very new to this field and please excuse me for my silly question. I want to use this command but I have no idea how I can select the species that I want to prune. I have a super tree with thousands of species and there is my data set with few hundreds species. Now, I am not able to figured it out how to tell which species to retain and which to prune.. Could you help me in this regard.

Hi. If just want to, say, prune your tree to contain only the species represented in your dataset you can just do (assuming your data are in matrix X, and the row names contain species names):tips<-setdiff(tree$tip.label,rownames(X))ptree<-drop.tip(tree,tips)Or, if you want to visualize it using fancyTree:tips<-setdiff(tree$tip.label,rownames(X))ptree<-fancyTree(tree,type="droptip",tip=tips)- Liam

Dear Liam, thanks for the reply. I tried the code but the last line of the code, ptree<-fancyTree(tree,type="droptip",tip=tips)is not working. it gives the following messageError in plot.window(...) : need finite 'xlim' valuesI don't know why is not working.

Is there any possibility that the vector, tips, is either empty or contains all the tips in the tree (or, say, all but one). You may also want to update to the latest version of phytools to make sure that you have the newest version of the function.

Hey Liam, Do you have, or do you know of, a function that will do a phylogenetic MANOVA in which the canonical axes are evaluated while taking covariance due to phylogeny into account? I'm looking for something analogous to phyl.pca. I am aware that GEIGER has a function (phy.manova) to do a standard MANOVA and compare a test-statistic to a null distribution based on simulation under an empirically determined single rate matrix (ala Garland et al. 1993), but this approach does not provide phylogenetically informed canonical axes. I would like to know what variables are evolving differently in two groups of taxa. Also, neither group is monophyletic.

Unfortunately, I don't have a good answer. nlme::gls does not allow multiple response variables. Also, so far as I know, this has not been done with phylogenetic data - which makes me suspicious that simply extending generalized regression/ANOVA to multiple continuous dependent variables might be more complicated than it sounds. You could try the SIG-phylo list....

I would like to run a pgls analysis where I account for variation in the estimation of my species means. In other words, my response variable is a mean value calculated for each species in my phylogenetic tree. However, the number of samples used to calculate these means differs among the species in my tree. Is there any way to include a weighting factor that accounts for this variation (something like the square root of the sample size) in a pgls context? Do you have any suggestions for how to code this in R?

Have you tried the phytools function pgls.Ives? For this function, you need sampling variances on your estimates of the means for each species - which in this case would be the within-species variance divided by the sample size for each species.

If you do not have within-species variances, just the mean & sample size, please let me know as I'm pretty sure we could fit the Ives et al. (2007) model, but assuming that within-species variance was fixed across species and then estimating it. Obviously, it's better if you do know within species variances.

Note that with pgls.Ives you have two options for input: you can give the function the species means, variances of the means in x & y, and sampling covariance between x & y; or you can just feed the function with the raw data for all individuals - as long as the ordering is the same for x & y.

Unfortunately, so far I've only implemented the bivariate phylogenetic regression in this function.

Many thanks for your reply. I can calculate the within species variance, but have the problem that for some species I only have one observation (n per species ranges from 1 to 32, and I have 14 species in my analysis). I can try and implement the solutions that you suggest in an earlier blog (http://blog.phytools.org/2012/06/blombergs-k-phylogenetic-signal-with.html), but what do you consider to be a suitably homogeneous intraspecific variance to make this option a possibility? In my case it ranges from less than 1 to nearly 40.

Hi Nick. The within-species variances for a trait vary 40-fold? A couple of quick questions leap to mind:

(1) This is the variance, not the sampling variance, correct? The relationship is sampling variance = variance divided by sample size. phylosig() takes the sampling variances as input, but when we only have one (or very few) observation(s) per species, I suggest using the mean within-species variance as the sampling variance for those species (or the mean/pooled variance divided by the species-specific sample size). When we have only 1 sample, of course, the within-species variance and the sampling variance of the mean are the same.

(2) Are the data on a linear or log scale? Usually log-transformation will homogenize the variance. If your within-species variances (not sampling variances, remember) differ 40-fold, then it is probably because the trait varies widely among species as well and there is a relationship between the variance and the mean. Log-transformation will usually get rid of this.

I look forward to hearing more details about this and I will post the solution to the blog if it is likely to be useful for or informative to others.

Dear Liam,Just a small question. To you it might be a very simple question but I am not able to sorted it out and also I may not ask the question correctly but still asking it. I am trying to show the distribution of contineous trait data over the phylogenetics tree in colours. something like, 'tiplabels(pch=14, col=mycol(Trait data), cex=0.8)'but this gives me colours only at the tips as it should but if I want to colour the entire branch, how can I do it. Something like the background of your webpage with a lot of coloured trees.Thans

Dear Liam, Thanks for the reply. Yes, I am looking for this kind of function. I tried this function but it is giving me an error, "Error in while (x > trans[i]) { : missing value where TRUE/FALSE needed"Though "tiplabels" function is working fine. I must add , before using this function, I am also using the "drop.tip", function as well.Thanks

Dear Liam, thank you very much for this help. Ok, first I will try myself to sort it out if I am not able to plot tree then will surely send my data and tree to you.Thank you very much once again. In both cases, I will let you know.

Have you written any functions, or are you planning to, to simulate not just trait means, but trait variances for species? E..g, add trait variation as a output value for each tip along with the mean in fastBM.

Dear Liam,I have used make.simmap to obtain 1000 stochastic maps for a larger clade. It contains two sister-clades. My main interest is to compare transitions for these two sister -clades. describe.simmap gives average values for the entire clade, while, countSimmap provides number of transitions for 1000 trees. Is there a way to obatin these values individually for sub-clades within phytools? or Do i have to run the analyses separately for each sister-clade to obtain these values?

I contacted you by mail, but the blog may be a nice channel for ideas exchange as well, I believe*.

I'm writting some code to actually simulate geographic dynamics using this CTMC framework. I implemented my code using geiger, but now that I've delved into phytools documentation and also this blog, I'm quite tempted to re-do my stuff to use phytools capabilities. I like the way you can do it all (simulate, fit and estimate) in a very clean and fast way. Since I'm planning a serious simulation study, this seems rather fit.

Do you think it is possible to incorporate an asymmetric Q, provided the restrictions for detailed balance (upon exponentiation) hold?

Cheers and thanks for providing,

Luiz

*I'm sorry to have posted this outside the general comments section, the first time.

I'm wondering, do you know of a way of quantifying the amount of variation in reconstructed trait values in ancestral vs. younger nodes (in R), i.e. answering the question: "When in evolutionary history did change occur in said trait?"?

I've reconstructed values and all, I just do not want to have to calculate edge values by hand for a 200spp phylogeny :)

I am running the phy.cca to command of phytools for my dataset ( which I have sent to you by e-mail. Name: Nachiket Shankar ; Subject: phl.cca) I constantly get this error - and am unable to figure out the problem!

I just ran some of the code you sent me. It looks like you are trying to do CCA with only one x & one y? This is not the purpose of of CCA, which is designed for more than one x & y (essentially a multivariate correlation, see description here). You may just want a regular (phylogenetic) correlation, although it is a little difficult for me to recommend that without more information about your data & problem.

Is there a way to 'slice' a mutational map into successive time frames (or eras) and tally character state changes within each slice as a unit? I am curious to know if there is a method that allows one to estimate the amount of character change over units (bins) of time.

I am trying to test the effect that constraining the root state has on the number of transitions by setting pi = c(0,1) or even pi = c(.001, .999) but it still results in the root being of state 1 not state 2. Is it possible to force the root to be a particular state?

Just a question about computation of single rate BM in Brownie.lite.Why did you use optimtimization of the likelihood instead of computing directly the MLE? There is cases where the results could differ?

Hi Liam,I've been using densitymap function to summarize mutational maps made in SIMMAP. I have been unable to obtain the posterior probability values for each branch of the density tree as shown in the plotted tree. Is it possible to extract these values when running densitymap?

this brings me to my second question. I have a continuous trait mapped for the same tree. For each branch of the tree this trait takes a value between 0 and 260 (only integer numbers). Is there a way to estimate the correlation between this trait and the probability values from densitymap? Note that this will be a correlation between values in the internal branches, not with tip values. Is this analogous to a correlation between ancestral states?

brownie.lite is throwing me an error and I cannot track down why. When I enter" H.l.bm <- brownie.lite(Nym.pruned$phy, Hosts, maxit=1000)"it runs for 10 sec and then:Error in 1:m : argument of length 0

Nym.pruned$phy is a an object of class "phylo" with a mapped discrete character and Hosts is a vector with names corresponding to the species names in Nym.pruned$phy? If that the case, then let me know and feel free to send me your data (or .RData file) so I can investigate.

Hi Liam,thanks for both the great blog and phytools :o)I was wondering if I might ask you for your help. I am trying to test for phylogenetic signal in my data using Pagel´s lambda (phylosig function). I have the latest version of both R and phytools. I import the phylogeny via read.nexus command. I even sorted the species data so that they are in the same order as phylogeny. However, I end up with this error message every single time:

"some species in x are missing from tree, dropping missing taxa from x"[1] "some species in tree are missing from x , dropping missing taxa from the tree"Error in vcv.phylo(tree) : the tree has no branch lengthsIn addition: Warning message:In drop.tip(tree, setdiff(tree$tip.label, names(x))) : drop all tips of the tree: returning NULL

the same data and phylogeny work just fine in other analyses - PIC, calculation of K through Picante multiPhylosignal. Do you have any idea what might be the reason? I assume it is some silly mistake I am making, but have hard time figuring it out.Thanks in advance for any reply,Katerina

Hi,I'm having the same issue with my phylogeny tree, and I was wondering if you figured out what was wrong and how to fix it. My tree does have branch length, but I keep getting that same error when running phylosig.Thanks.Onja

I'm using the evol.rate.mcmc, minSplit and posterior.evolrate functions to calculate and plot splits in trait evolution rate on a phylogeny. I have no problem with the evol.rate.mcmc and the minSplit functions, but when I use the posterior.evolrate function I the following error pops up:

I am very new to this field and when I am trying to get phylogenetic distances or with other phylogenetic commands I get a similar error as mentioned above by Marina. Input;- t1<-read.tree(text=aa)- sample<-read.table(file="densityplotspecies.txt")- pd("sample","t1",include.root=TRUE)(with aa as a phylomatic newick output)

I tryed a lot of different things but I keep ending up with this error:Error in tree$edge.length : $ operator is invalid for atomic vectors

I'm trying to find a function that returns the node number of the most inclusive clade containing some tips, but not others. For example, for the following tree (((1,2),((3,4),((5,6),((7,8),(9,10)))) I want the most inclusive clade containing tips 7 and 9, but not 3. The answer would be the clade containing 5,6,7, and 8,9,10. Thanks in advance for any help you can provide. Joey

I am using PGLS to examine scaling among morphological characters. I wonder if you have advice on the most appropriate way to test for significant differences between the observed PGLS slope and isometry?

Is it advisable to constrain the slope to isometry and then compare the observed and constrained slopes using a t-test?

Hi Liam, I am looking for a way to do a phylogenetically independent PCA that accounts for intraspecific variation within a species. I thought I could indicate members of the same species as a polytomy with branch lengths=0. As a test, I created a tree with one branch that has length=0. I then ran phyl.pca and it crashed R. Any suggestions? There must be a better way to do it. -Will

Yes, this will not work because the phylogenetic covariance matrix among species in this case will be singular.

One tactic is to use species means to get the rotation, and then rotate the individual data using the eigenvectors from the among-species PCA. This allows you to obtain PCA scores from the phylogenetic PCA for all individuals in your analysis, but it does not take into account intraspecific variation in performing the phylogenetic PCA. - Liam

I have been trying to use ancThresh for a while. Once the previous issues were solved, now I get this,> mcmc<-ancThresh(tree,X,ngen=1000,control= list(piecol=cols,tipcol="estimated"),label.offset=0.01)**** NOTE: no sequence provided, column order in xMCMC starting....gen 1000Error in burnin:nrow(mcmc) : argument of length 0Could you please provide me with some help?

This is probably because you need to increase ngen above the default (or decrease the sample interval). The default setting for the number of generations of MCMC to run, the sample interval, and the default burn-in are mutually incompatible! Try increasing ngen=100000 and see if that works.

findPath is an internal function in Rphylip, not part of the namespace. To see its guts, you can type:

Rphylip:::findPath

at the R command prompt.

If Rphylip is not working in Linux it probably because it cannot find where you have put the PHYLIP executables. This is not too surprising as findPath only searches in places where Windows & Mac OS users would normally put these files. For any R session you can use the Rphylip function setPath to set the path - and then you will not have to set it again. For more information about how to use setPath type:

I am trying to reconstruct the ancestral state of a discrete genic characteristic across many loci. For each locus I am randomly sampling 100 trees from the posterior, performing 10 stochastic maps for each sample, and then outputting using the describe.simmap function. In my gene trees I have strong prior knowledge that one particular species should be the outgroup. Is there a way to assign an outgroup to the outputted tree using describe.simmap?

While I have your attention, is there a clever way to summarize ancestral state reconstructions across multiple loci? I am having trouble conceptualizing a method to so because the character I am reconstructing is one of the gene and not the species on a whole, and the phylogenys that I am reconstructing on will most likely have slightly different topologies. One summary statistic I can think of is to plot out the distribution of transitions from one state to another. But, that is all I got.

Any help, or pointing in the right direction would be greatly appreciated!

Hi, I'm trying to run Pagel's correlation method. However I'm having some trouble. I followed all code posted here (http://blog.phytools.org/2014/12/r-function-for-pagels-1994-correlation.html) and everything goes nicely. But when I use my own data:

I am trying to run the "aov.phylo" function in geiger and am getting the following error: Error in aov.phylo(matrix ~ grp, tree, nsim = 50, test = "Wilks") : 'formula' must be of the form 'dat~group', where 'group' is a named factor vector and 'dat' is a data matrix or named vector.

Hi Liam, thank you very much for sharing all these useful information with the scientific community and for the time and support you devote and provide.

I have a couple of questions about the way in which the package caper handle the "relevel" and "anova" function in R. These work fine in other packages such as ape but when i try a PGLS analyses in caper i get some issue. The "relevel" function runs fine but the output table comes out with the original values in terms of intercept of reference. When i ask for an anova table instead i get this:

Hi Camila. It is not, but if you have estimated ancestral states from a different function, it is possible to use contMap to plot them. There is an explanation of how to do this here: http://blog.phytools.org/2014/08/droptip-method-for-object-of-class.html. If this is what you want to do, let me know and I may write another post about it or built a function to make it easier. - Liam

I'm trying to apply regressions, ANOVA and ANCOVA using the pGLS. When I am using discrete predictor on O-U model (corMartins), the following problem appears for most of the associations I tried to fit:

There are several functions available for simulating rate shifts in trees (e.g., SimTree, TESS), but these implement tree-wide shifts, which are somewhat unrealistic, rather than shifts descending from a particular node. Is it possible to implement a more realistic rate shifted tree? Do you have any ideas on how it might be done?

This wouldn't be too hard to do; however I'm surprised that it is not out there. Perhaps you should post your query to the R-sig-phylo email list as if it has already been implemented in software, someone on that list will surely know.

I am have issues with some code that can potentially be a bug - some of the colors are printing in the wrong order. I am attempting to plot colors by states on a pie graph for node labels. I used the two codes below. For the second code, if I list the names in alphabetical order of trait, it plots correctly.

On a separate note, I was wondering if there's anyway to use densityMap for more than one state? I have 9 discrete states and a tree with 1600+ tips. The plot with pie charts plotted comes out a bit messy since there are so many nodes. When I plot the tree as a fan it appears a little better, but still a bit cluttered.

Thanks very much for the amazing blog! It has made my brief foray into phylogenetics a lot more bearable!

I have a question about testing significance of the slope for pgls.Ives. I know this has come up before and the suggestion was to constrain the slope to 0 and run a likelihood ratio test. This all seems fine so far. As far as I can tell, you can specify the upper and lower bounds of the slope to be zero in the optim function. However, this then fails due to non-finite difference values, I think in the likelihood function. Is there a way round this? You can set the slope to be close to zero, but not actually zero. All of this could be completely wrong however, so any help would be much appreciated!

Hi Liam, I am simulating correlated phylogenetic data by using fastBM and then applying the correlation structure by multiplying the data matrix by the cholesky decomposition of my correlation matrix (M) to generate a new data matrix (newdata). To test that the results make sense, I calculate the phylogenetically independent vcv matrix using the method that you describe in your 2009 paper on phylogenetic size correction and pca. Once I have the vcv matrix, I convert it to a correlation matrix (B) and multiply newdata by inverse of the cholesky decomposition of B. This produces a data set very close to the original data simulated under fastBM so long as the number of characters is less than the number of taxa. If it's more, I get an error along the lines of "the leading minor of order 101 is not positive definite." The reason that I've determined is that the number of positive eigenvalues of the phylogenetically independent vcv is equal to the number of taxa minus one. Have you ever encountered this problem before? Is there a good solution?Best regards, -Will Gelnaw

I'm trying to test if my trait of study has been canalized over time. I don't just want to know if the breadth of the trait has been reduced, but I also need to know if the current trait values falls within the ancestral values.So far, my best idea has been to separately reconstruct max and min values for the trait and use that to estimate ancestral breadth. It might be a little tricky though because its a multivariate trait. I've been thinking that I would have to separately reconstruct the max and min for each PC axis.If this method is flawed or there's already a way to test this, can you let me know?Thank you for all the help you've already given me through this blog. You are a hero of grad students!

Thanks for providing such great resources. I have a couple of questions that I cannot seem to find answers for (and hope I have not missed anything obvious).

Is it possible to outline branches in plotSimmap? I would like to use black, grey and white for simplicity, but only if I can see the white branches...

Also, I notice that the colour at the nodes (vertical branches on a square phylogeny) are the colour of just one of the tipwards branches (the bottom branch). Is it possible for the nodes to be coloured equally by the two branches?

I used PGLS to analize my data to test for relationships between to continuous variables. I do find that there is a relationship between those variables. However, when I see the data I think a quantile regression would be a much better approximation. Do you know if there is a way to correct for the effects of phylogenetic relationships when performing a quantile regression?

Thanks for your answer and, also, thanks for keeping this blog. It has been really helpful in the past.

I'd like to plot/analyse a phyl.pca object with existing packages like e.g. with ggbiplot and thus am in the situation where I'd like to convert it into a prcomp or princomp object with sdevxrotationscalecenter attributes filled. I'm calling phyl.pca as follows: pc.pca<-phyl.pca(tree,df,method="BM",mode="corr")

I compute/fill the necessary attributes as:sdev=sqrt(diag(pc.pca$Eval))x=pc.pca$Srotation=pc.pca$Lscale=FALSEcenter=FALSE

But somehow this doesn't seem to be correct. The sd seems to be wrong.Also I'd like to know how to obtain scale and center values...

I am looking for a function to calculate the support of nodes in consensus trees (e.g. Bremer support) but I found none. Do you know if any package includes that? Ape, which has the function consensus() does not.

Hi Borja. You can get bootstrap values for a set of trees from which we have computed a consensus tree using the function prop.part from the ape package.With regard to Bremer support - there may be some way to compute these values using the phangorn package. I would email Klaus, the author of that package, or R-sig-phylo.Borja, it was great to meet you in Tucumán. Hopefully our paths cross again in the future!All the best, Liam

I used the function phyl.pca with method="lambda" to perform a pPCA. The optimal lambda is less than 1 (0.747, precisely). How can I test if this is significantly different than 1 for the same dataset and tree?

Hey Liam, thank you so much for solving other people's problems. I got some of your tips by Google and now I can calculate K values & its p-value after three weeks dealing with data format (rooted & dichotomous) and so on. Keep doing good things. Cheers.

Hi,I am trying to generate a list of apomorphic discrete character changes for all nodes in a tree in R. This is the kind of thing that nearly every piece of phylogenetics software outside of R can do, but I need to extract data from such a list for a large number of trees in a bayesian sample. Any suggestions?JS

It may sounds like a very simple question but I need to simulate birth-death trees by means of the following code. It does not always return wanted number of trees so I was wondering how can I change the code to make it conditioned on the number of returned trees with n extant species.

While counting the number of clades in a set of "rooted" bootstrap trees using prop.clades(), I see, apparently erroneous, results. I have posted the question on stackoverflow (the link is http://stackoverflow.com/questions/33811950/how-to-properly-use-prop-part-and-prop-clades-functions-in-ape-for-bootstrap).

Can you please quickly look at the post to see if I am calling the wrong method or its just that my interpretation is wrong.Thanks in advance,IkramKarolinska Institute, Stockholm

The point is that the taxa names of the phylogeny and the dataset dont match, but I cant understand why. Even if in the error the taxa names of the dataset appear to be number, in the real data set they are not. I have checked the taxa names of the phylogeny and the data set and they do match. So I dont understand why instead of the real taxa names that I have in my dataset, the function uses the numbers of the rows of the dataset (these are not a column). Am I clear?

I have encountered an issue in the behavior of tiplabels when added to a densityMap plotted facing leftwards. When I try to add pie-style tip labels, I get the following error:> tiplabels(pie=to.matrix(Short[HeightDMap$tree$tip.label],sort(unique(Short))),col=cols2)Error in symbols(xpos, ypos, circles = radius, inches = FALSE, add = TRUE, : invalid symbol parameter

The command I am using to plot pie tip states works perfectly when the density map is plotted to the right. Using thermo instead of pie with an identical argument works fine facing left as well. I tried it with the original ML tree too, and pie worked fine facing left as well. I haven't been able to find any mention of this specific scenario here or elsewhere. Any suggestions?

Margaret

p.s. I'm still learning to use phytools, but so far it's impressively powerful!

I have been using your function 'treeSlice' to section a tree at different time intervals (5mya, 10mya, etc). However, the subtrees are collapsed onto the crown, so that a subtree cut at 5mya will have an age of 2my, because the first 3my is a single branch. Do you know of a way to either prevent this from happening (so the subtrees are not collapsed onto the crown) or, after the fact, append a branch at the root (to "restore" the stem)?

It works well, but there are some mismatches between the real characters and the tips states on tree.

from the picture, some mismatches could be find:For our BRP as "1" character, but it labeled as "?" in tree,For MH_I, MH_C were as "1" character, but label as "0";For Svulgaris, Slatifolia were as "0", but labeled as "1";For Sturkerstanica as "?". but it was labeled as "1"

I have 22 characters, I try the other characters, they also show this mismatches.Do you know what's the problem for those? any suggestion on this?

After reconstruction of ancestral characters by rerootingMethod and make.simmap, I used nodelabels to plot pie charts of reconstructed characters onto internal nodes. However, I found the default size of the piecharts are too big for my tree. May I ask if there's any way to adjust the size of piechart in phytools?

I have produced LTT plots for a number of clades, but the phylogenies they are based on are not scaled to real time (instead they are ultrametric trees scaled to substitutions/site).

This makes the relative time x axis on the LTT plots not very useful to me. If you knew of any way to manually callibrate real time scales directly to the LTT plots, I would be very interested. I have root divergence dates from the literature that could be used in this, but I'm not sure how to add these to the plots.

I did reconstructions of ancestral discretely-valued states under ARD and SYM models (ARD was suggested the best model according to comparison by using geiger, and SYM the second). Likelihood values for the ancestral states at a few internal nodes were negative, and thus failed the plotting of nodal piecharts. Reconstruction under ER generated no negative likelihood values.

No, but my guess is that they are zero and are showing up as very slightly less than zero due to numerical precision (limitations in how precisely a computer can store a number). You could try rounding the object to (say) 8 digits, i.e.:

fit<-ace(....)fit$lik.anc<-round(fit$lik.anc,8)

## or:

fit<-rerootingMethod(....)fit$marginal.anc<-round(fit$marginal.anc,8)

If this doesn't have the effect of zeroing the negative values, then please send me your saved workspace & I will see if I can figure it out. - Liam

I have a quick and very basic question: is there any function in phytools to fit models of continuous traits evolution (something analogous to fitContinous in geiger) that admits missing data in the traits matrix?

#add a tip to an internal branch above the node shared by t2 and t3x<-match("t2",tree$tip.label)y<-match("t3",tree$tip.label)a<-Ancestors(tree,x)b<-Ancestors(tree,y)z<-a%in%bnode<-a[min(which(z))]position<-0.5 * tree$edge.length[which(tree$edge[,2]==node)]new.tree<-bind.tip(tree,"t6",where=node, position=position)

I've seen comments about phangorn::midpoint messing up ladderize, but it looked like the reorder function would fix it, which doesn't seem to be the case here. Any info would be great Liam, Thanks!-Travis

I am trying to do phylogeography using make.simmap to estimate ancestral states with nsim=100. I am concerned about that I have biased sampling for some of the groups and how can this affect the results. However, I see the object $counts created after running make.simmap with nsim contains a first column (N) with a different number per simulation. I would like to know what is this function really doing during the simulations, is it sampling randomly tips from the tree each iteration independently of their prior group or is it sampling taking into account the number of tips in each group (so, balancing number of tips from each group)?Many thanks for your help,Best,

I am trying to do a simple pPCA with a molecular phylogeny I made for 13 species. I have morphometric data for 6 morphologies on each specimen (n=2310). I have 2 questions:Q1) in my X matrix for the phyl.pca() function am I just using the average measurement for each morphometric?Q2) I keep getting the error: Error in invC%*% X : requires numeric/complex matrix/vector arguments.

When I convert X from a data frame to a matrix it changes to characters from numeric and I'm wondering if that has something to do with this problem?

Thanks for a super helpful blog! One question- I've noticed a few papers, when performing an ancestral state reconstruction, saying they only set nodes "with significant support (i.e., a single state that was at least 7.39 times more likely than the next most likely state; Pagel 1999)."

I need to add a sister taxa to a nexus tree I have in R and have pruned.

I followed your tutorial on adding sister taxa (http://blog.phytools.org/2015/08/adding-tip-to-sister-taxon-on-tree.html) but before I can do it, I need to convert the tree to a phylo object.When I try to do that, R gives me this message:Error in UseMethod("as.phylo") : no applicable method for 'as.phylo' applied to an object of class "character"Any idea why?

Thanks very much for a fantastic blog! It has really helped me with my research.I have a quick question regarding the (potential) double use of phylogenetic correction. We have phenotypic traits (morphological), and functional traits (bite force). We have used a phylogenetic PCA to obtain phylogenetically corrected PC scores of the morphological traits. We would like to regress the first PC axis scores against the bite force. Would it be statistically correct to use the phyl.PC scores and the bite force values in a PGLS? We are concerned that phylogeny would be corrected for twice by doing the stats in this way. Should we use traditional PC scores and bite force values, and then do a PGLS?

I have been doing some simple tip additions using the add.species.to.genus function looped across a multiphylo using a list of taxa with where="random". However, every 15 to 20 trees, an error is thrown that a tree is not ultrametric. Inspection of the problem trees shows that occasionally, a tip will be added with a branch length longer than the remaining length of the tree. This of course make the tree no longer ultrametric and causes the error to be thrown when my code tries to add the next tip. I am wondering if you have any thoughts on why this might be occurring, or how I might try to solve it. Is there a minimum branch length that must be added using this function?

Thanks for the great resources that you provide. However, I am running into a problem with the midpoint.root() function for my tree. I get the error message > Error in phy$edge[, 2] : incorrect number of dimensions

I was able to narrow done the source of the error to the splitTree function that is called by the reroot function which is called by the midpoint.root function but I didn't get any further.

My Tree is generated by FastTree and made ultra metric with PATHd8 (although the same error occurs before that step).

Hey again. I dogged a bit deeper and the problem seems to be that splitTree() calls extract.clade() and in extract.clade, anc is found to be node 1117 and the smallest of 7 next.anc is node 1. Therefore *keep* evaluates to *start* + 0:0 and hence just *start* of length == 1. The the edge is updated to phy$edge <- phy$edge[keep, ] which evaluates to an one-dimensional numeric vector of length == 2 wherefore TIPS <- phy$edge[, 2] throws the error above.

Yet, although I understand what happens. I don't really understand why it happens and what I could do to fix it. Any help would be greatly appreciated!

Thank you for a fantastic toolset. In some tree-building programs (like RAxML), samples with identical seq. data are removed from analysis. I'm wondering then if it's possible to label tips with multiple sample names. I'm interested in using your co-phylogeny tools to examine consistency between different trees based on plasmids from the same bacterial sample, however, I want to make sure the same sample names are included in each of the trees, if that makes sense.

I am really battling with phytools as i am very new to the program. I need to get the mean values at each of the nodes in my tree. I have run ace and succeeded but i am battling to get the means for the nodes. Please can you help.

Hello Liam, I am using phytools to compare rates of speciation between species with different phenotypic traits using the Threshold Model with discrete charecters which are coded for in binary in my matrix. I am getting the problem of "subscript out of bounds" and I am not sure how to fix it. My tree is in nexus format, with 106 tips. The tip labels have been changed to numbers (I guessed from 1 - 106 according to their order in my input file?). My matrix is 3 columns, first represents the species number, the second is if they represent one trait and the second is if they represent the other. This matrix is 106 columns too. A species has to be one of the traits but cannot be both. I wondered if you are able to suggest a reason why the subscript is out of bounds? Many thanks, Josie

Hi Liam, Thanks for the great resource! I have a question that isn’t necessarily related to phytools, but phylogenetics in R in general. I’m wondering if you have some insight and potentially a phytools-based solution.

I’ve been having some trouble working in R with unrooted trees (with bootstrap values) inferred with RAxML like this one:

I believe this has to do with the fact that Ape (and other R packages and tree viewers) treat support values as pertaining to nodes on the tree rather than internal edges. This problem is described in: L. Czech and A. Stamatakis, “Do Phylogenetic Tree Viewers correctly display Support Values?,” bioRxiv, p. 35360, 2015.

I found a work-around that I think will work for me, which is to plot the rooted tree but display the node labels from the unrooted tree (see below).

plot(rooted_tr)nodelabels(tr$node.label)

I'm just wondering if you have any ideas for avoiding this quirk when rooting trees in R.

Hi Lima,Thanks for creating such a great package and blog. I'm having a problem removing tip.labels from a countMap plot. I have been using contMap with plotTree.wBars but I've decided I no longer need the bars and would just like the contMap but without labeled tips, which I cant seem to do? im' ploting a 450 tip tree as a "fan". I have made a work around buy still using plotTree.wBars and making x = 0, and border="white", which is fine but I thought there might be a more elegant way?Best regardsMatt Larcombe

I have tried to build a SIMMAP null model. So, I have shuffled the tips (discrete states) from original tree and re-ran a SIMMAP (make.simmap function) to store the ages of state transitions . When I run make.simmap using original tree, everything works well. However, when I shuffle the tips, some estimations last forever. Are there some combinantions unfeaseble to reach a stable solution? I guess sometinhg happens with the estimated Q matrix. I would appreciate a lot any advisement.

Is there a blog post / function capable of combining the functionality of contmap and dottree? I'm looking to plot a continuous character with a color ramp on the tree with a discrete character dot next to the tips of the tree. If I there is a post on this I have missed it in my search and would appreciate being pointed in the right direction if it is out there.

I really enjoy the plotBranchbyTrait function. I'm plotting substitution rates and the color scale really makes the long branches pop.

Everything was going great for a few weeks, and now with the same trees I used earlier I'm no longer able to get the scale bar to add. Oddly, I can get it to add with your demo example, so I guess it's something with my trees/data, although the same script worked fine a few weeks back. It plots my tree with lovely colors, but never asks about the scale bar.

is there a function in Phytools similar to "plotRateThroughTime" or "getRateThroughTimeMatrix" of the package "BAMMtools"? These methods allow to plot evolutionary rates through time, e.g., http://lukejharmon.github.io/ilhabela/instruction/2015/07/02/diversification-analysis-bamm-rpanda/ (see the "Rate through time plots"). Your rates (or any other continuous data) are associated with the edges of your tree and plotted against time.

Unfortunately, the BAMMTools methods only work with bammdata objects (I'm also not sure, whether they can handle non-ultrametric trees). I was just wondering, whether phytools offered a similar option for more generic data (tree + vector or your data)?

After all, using "plotBranchbyTrait" of the "phytools" package already enables us to plot continuous values (as colours) onto the edges of our phylogeny. The generated object must therefore already contain the association between edges and our continuous data. Is there an easy way to plot our data against time (or any other branch length unit) as in the examples mentioned.See Cooney et al. (2017, fig. 2c) for another example of these plots:http://www.nature.com/nature/journal/v542/n7641/full/nature21074.html

Hi Andres. phangorn can analyze any type of discrete character through the data type "USER", not just molecular data. I'm not sure how Wagner parsimony is defined for continuous traits. It is possible this is equivalent to what was once called 'linear parsimony.' Linear parsimony, if I remember correctly, involved identifying the set of ancestral states with the minimum sum of absolute values of changes. There is a related method called 'squared-change parsimony.' This method involves minimizing the sum of squared changes; however if we take edge lengths of the tree into account we have 'weighted squared change parsimony' - which turns out to give us the same solution as the ML estimates under a Brownian assumption. I'm not aware of any similar justification for linear parsimony, nor am I aware of an implementation of linear parsimony in R - although it would not be hard. - Liam

I was wondering how I can exclude the past few million years of branches from my time tree. I have a phylogeny that began diversifying around 40MYA and I wanted to see how the gamma statistic value changes when I exclude recent speciation events. I tried searching for this, but may be I am not searching by the right key word.Regards,Aparna

I have encountered an issue when I wanted to plot a phylogenetic tree with branched colored by values for a quantitative trait and I'm hoping you can help me out with.The problem is that the values of trait is assigned randomly to the species so the tree does not represent the actual data. Here is my code:library(phytools); library(picante)TREE.R20120829<-"(((((((((((((((((((((((Viola_riviniana,Viola_americana,Viola_pedatifida)Viola,(((((((((((((((((((((Trifolium_repens)trifolium))))))))irlc)))))))))papilionoideae))))fabaceae))fabales)))fabids))rosids),(((((((((((((((((((((Verbena_stricta)Verbena)verbenaceae)))),((Veronica_montana)Veronica)scrophulariaceae))))))lamiales)))lamiids,((((((((((((((((((((((((((((((((((Artemisia_dracunculus,Artemisia_frigida,Artemisia_giraldii,Artemisia_sacrorum,Artemisia_scoparia,Artemisia_tridentata)artemisia))))))))))))))))))))))))))asteraceae)))))asterales))campanulids))ericales_to_asterales)asterids)))))core_eudicots)trochodendrales_to_asterales)sabiales_to_asterales)eudicots)ceratophyllales_and_eudicots,(((((((((((((((((((((((((Zea_mays)zea)))))))pacc,(((((((((((Triticum_aestivum)triticum,(Bromus_briziformis,Bromus_inermis,Bromus_tectorum)bromus))))))))))bep))),(Festuca_arundinacea,Festuca_idahoensis,Festuca_ovina,Festuca_rubra)Festuca,(Agropyron_cristatum,Agropyron_desertorum)Agropyron)poaceae))))))poales)commelinids))))))monocots)poales_to_asterales)magnoliales_to_asterales)austrobaileyales_to_asterales)nymphaeales_to_asterales)angiosperms,((((((Pinus_massoniana,Pinus_sylvestris,Pinus_taeda)Pinus)pinaceae)pinales)))gymnosperms)seedplants)euphyllophyte;"

About this blog

This web-log chronicles the development of new tools for phylogenetic analyses in the phytools R package. Unless you a reading a very recent page of the blog, I recommend that you install the latest CRAN version of phytools (or latest beta release) before attempting to replicate any of the analyses of this site. That is because the linked functions may be archived, and very likely have been replaced by newer versions.