Pages

Monday, September 17, 2012

Update to phyl.pca so that "mode" (correlation vs. covariance) can be specified flexibly

A user recently complained the following about phytools function phyl.pca for phylogenetic principal components:

One little suggestion for protecting users from themselves might to throw up an error if the user misspells the mode parameters [Ed. note: which specifies whether to use the covariance or correlation matrix during analysis]. Currently, if you type: mode="cor"
It will work, and just run on the default covariance matrix. But "cor" is a misspelling of "corr", which will yield the desired analysis. It might be a common error, since cov and cor are both three letters. I just did this, but noticed the output was identical.

Good suggestion. I have now added a few lines of code to phyl.pca to fix this problem. Basically, the function now accepts any unambiguous shorting of covariance and correlation instead of just the three and four letter strings "cov" and "corr" respectively. In addition, if the match is imperfect, the function will return a message.

The updated function code is here; it is also in the latest non-CRAN version of phytools (v0.1-94, here) and will be in the next CRAN phytools release.

The way I achieved this is really simple. Basically, I used the base function strsplit to split the input mode into characters; then I matched the specified mode characters to strsplit("correlation",split="") or strsplit("covariance",split=""). Finally, if the match was imperfect (for instance, mode="corelation"), the function defaults to using the covariance matrix, but also spits an error.

Can't figure out what is wrong with the data. Do you have any idea? It does run the demo, so I assume my input files are somehow wrong. I have a tree in nexus format, and a trait file input in csv format (no missing data), with names linked to the traits (data <- traits[,2:7]rownames(data) <- traits$name).Hope you can help?

Does your try have terminal branch lengths that are zero? That's one thing that will cause singularity of vcv.phylo(tree), which means the phylogenetic PCA cannot be computed. If not that, maybe send me data & tree and I would be happy to investigate further.

Are there zero-length terminal edges in the tree? This makes vcv(tree) singular & can cause this error. The other possibility is that the covariance matrix among traits is singular, which happens if there are more columns (variables) than rows (species) in your input data matrix; or if some columns are perfect linear combinations of other variables.

In at least the first case, one workaround suggested to me Eric Stone at NCSU, is to use the pseudoinverse:

require(corpcor)solve<-function(x) pseudoinverse(x)pca<-phyl.pca(...)

For this to work, though, you need to load phyl.pca & all internally called functions from source. This can be done using:

Hi Liam,Unfortunately I keep on getting the same type of error. I don't have any zero length edges in the tree or more columns than rows in the data matrix. The data are partial warps from a geometric morphometric analysis. If I use that matrix and do a normal PCA I get identical results as from relative warps. I have no idea what may be the problem with the phyl.pca. Would you be willing to have a look at my data, please. Maybe I'm missing something.

This comes over a year later, but i had similar issues with phyl.pca,and found that, like liam suggested, this issue can arise when some variables are perfect linear combinations of other variables - this may be true of some climatic variables.

Trawling through some other forums I found a solution that by adding a random small number (0.00001 - 0.00009) to your data will remove the perfect linear relationship between variables without changing the underlying data such that the results will be altered significantly.

The other option is removing these variables altogether if you know (or can find out) which ones they are.

About this blog

This web-log chronicles the development of new tools for phylogenetic analyses in the phytools R package. Unless you a reading a very recent page of the blog, I recommend that you install the latest CRAN version of phytools (or latest beta release) before attempting to replicate any of the analyses of this site. That is because the linked functions may be archived, and very likely have been replaced by newer versions.