
Parsimony

The parsimony command implements the parsimony method (aka P-test), which was previously implemented in TreeClimber and is also available in MacClade and on the UniFrac website. The parsimony method is a generic test of whether two or more communities have the same structure. The significance of the test statistic indicates only the probability of obtaining a score this low by chance if the communities had the same structure. The value does not indicate a level of similarity. The files that we discuss in this tutorial can be obtained by downloading the AbRecovery.zip file and decompressing it.
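To make the test statistic concrete: the parsimony score of a tree is the minimum number of changes in group membership needed to explain how the groups are spread across the tree's leaves. The sketch below computes it with Fitch's algorithm on a rooted binary tree written as nested Python tuples, where each leaf has already been replaced by its group name (the mapping that the group file supplies). The tuple representation and the function names are assumptions for illustration, not mothur's internals.

```python
def fitch(node):
    """Return (possible ancestral groups, minimum changes) for a subtree.

    A leaf is a group-name string; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return {node}, 0
    left, right = node
    left_set, left_changes = fitch(left)
    right_set, right_changes = fitch(right)
    shared = left_set & right_set
    if shared:
        # The children can agree on an ancestral group: no extra change needed.
        return shared, left_changes + right_changes
    # The children disagree: one change in group membership is unavoidable here.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree):
    """Minimum number of group changes on the tree (lower = more clustered)."""
    return fitch(tree)[1]


# Two groups cleanly separated by the tree: a single change suffices.
print(parsimony_score((("A", "A"), ("B", "B"))))  # 1
# The same groups interleaved across the tree: more changes are required.
print(parsimony_score((("A", "B"), ("A", "B"))))  # 2
```

A low score means group members tend to sit together on the tree; whether it is low enough to matter is decided by comparing it with scores from random trees.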


Default settings

By default, the parsimony() command will carry out the parsimony test on each tree in the tree file and will perform a global test. The global test determines whether any of the groups within the group file have a significantly different structure than the other groups. Execute the command with default settings:

This means that the tree had a score of 49 and that the significance of the score (i.e. p-value) was less than 1 in 1,000. These data are also in the abrecovery.paup.nj.psummary file. Looking at the file abrecovery.paup.nj.parsimony you will see a table with the score of your tree and the distribution information for the 1,000 random-joining trees that were constructed:

As the output to the screen indicated, this file tells you that you had one tree with a score of 49 and that none of the 1,000 random trees had a score of 49 or less. Alternatively, if your tree had a score of 110, this table would tell you that 44 of the 1,000 random trees (i.e. a frequency of 0.044) had a score of exactly 110 and that 146 of the 1,000 random trees (i.e. P=0.146) had a score of 110 or smaller.
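The p-value is the fraction of random trees whose score is as low as, or lower than, the user tree's score. The sketch below builds the same kind of cumulative table from a list of random-tree scores and computes that fraction. The score list is made up for illustration (it is not real mothur output): it is constructed so that 44 trees score exactly 110 and 146 score 110 or less, matching the hypothetical numbers in the paragraph above. The function names are hypothetical.

```python
from collections import Counter


def cumulative_table(random_scores):
    """Rows of (score, frequency, cumulative frequency), lowest score first."""
    counts = Counter(random_scores)
    total = len(random_scores)
    running = 0
    rows = []
    for score in sorted(counts):
        running += counts[score]
        rows.append((score, counts[score] / total, running / total))
    return rows


def parsimony_p_value(user_score, random_scores):
    """Fraction of random trees scoring less than or equal to the user tree."""
    hits = sum(1 for s in random_scores if s <= user_score)
    return hits / len(random_scores)


def format_p_value(p, iters):
    """Report p == 0 as an upper bound, e.g. '<0.001' for 1,000 random trees."""
    return f"<{1 / iters:g}" if p == 0 else f"{p:g}"


# Made-up distribution of 1,000 random-tree scores (illustration only).
random_scores = [100] * 102 + [110] * 44 + [120] * 854

for row in cumulative_table(random_scores):
    print(row)
print(format_p_value(parsimony_p_value(110, random_scores), 1000))  # 0.146
print(format_p_value(parsimony_p_value(49, random_scores), 1000))   # <0.001
```

Reporting zero hits as "<0.001" rather than "0" reflects that 1,000 random trees cannot resolve p-values smaller than 1/1,000.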

If instead of loading abrecovery.paup.nj you had loaded abrecovery.paup.bnj and run parsimony():

Each line in the output represents one of the 1,000 bootstrap replicates that are in abrecovery.paup.bnj and this output is provided in the file abrecovery.paup.bnj.psummary. The file abrecovery.paup.bnj.parsimony would look like:

The difference between this output and that of abrecovery.paup.nj is that in this case you have supplied 1,000 user-generated trees via bootstrapping. This table tells you that you provided 186 trees that had a score of 51 and that 605 of your 1,000 bootstrap replicates had a score less than or equal to 51. All of the trees had a score less than or equal to 58 and thus, they all had a p-value < 0.001.

groups

Having demonstrated that the community structure of at least one of the three groups in the abrecovery.groups file was significantly different from the other two, you would now like to do pairwise comparisons. Note: You should not do pairwise comparisons if there is not a significant difference at the global level. A conservative way to assess the significance of your pairwise p-values is to divide the overall significance threshold (typically 0.05) by the number of comparisons that you will carry out. To do all of the possible pairwise comparisons you will set the groups option:

All of this tells you that the three groups harbor significantly different community structures from each other since the p-values are all less than 0.01667 (i.e. 0.05/3).
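Dividing the threshold by the number of comparisons is the Bonferroni correction. The sketch below enumerates every unordered pair of groups and computes the corrected threshold. The group names and the pairwise p-values are made up for illustration (they are not the abrecovery results), and the function names are hypothetical.

```python
from itertools import combinations


def bonferroni_pairs(groups, alpha=0.05):
    """All unordered group pairs and the per-comparison significance threshold."""
    pairs = list(combinations(groups, 2))
    return pairs, alpha / len(pairs)


def significant_pairs(p_values, threshold):
    """Keep the pairs whose p-value falls below the corrected threshold."""
    return [pair for pair, p in p_values.items() if p < threshold]


pairs, threshold = bonferroni_pairs(["A", "B", "C"])
print(pairs)                # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(round(threshold, 5))  # 0.01667

# Hypothetical pairwise p-values, for illustration only.
example = {("A", "B"): 0.001, ("A", "C"): 0.02, ("B", "C"): 0.001}
print(significant_pairs(example, threshold))  # [('A', 'B'), ('B', 'C')]
```

With n groups there are n(n-1)/2 pairs, so the threshold tightens quickly as groups are added: 0.05/3 for three groups, but 0.05/10 for five.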

iters

If you run the parsimony() command multiple times, you will notice that while the score for your user tree doesn't change, its significance may change somewhat. This is because the testing procedure is based on a randomization process that becomes more accurate as you increase the number of randomizations. By default, parsimony() will do 1,000 randomizations. You can change the number of iterations with the iters option as follows:
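Why more iterations help: a p-value estimated from a finite number of random trees is a sample proportion, so it carries binomial sampling error of roughly sqrt(p(1 - p)/iters), and the smallest nonzero p-value it can resolve is 1/iters. This is a general property of randomization tests, sketched here under that assumption rather than taken from mothur's code; the function name is hypothetical.

```python
import math


def monte_carlo_se(p, iters):
    """Binomial standard error of a p-value estimated from `iters` random trees."""
    return math.sqrt(p * (1 - p) / iters)


for iters in (1_000, 10_000, 100_000):
    print(iters,
          f"smallest nonzero p = {1 / iters:g}",
          f"SE at p=0.05 = {monte_carlo_se(0.05, iters):.4f}")
```

At p = 0.05 the standard error is about 0.0069 with 1,000 iterations and about 0.0022 with 10,000; each tenfold increase in iterations shrinks it by a factor of sqrt(10), roughly 3.2.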

random

If you just want to construct a distribution of scores for some number of random trees, use the random option. To do this, type something like the following, where the value given to random is the root file name for the output:

mothur > parsimony(random=random.parsimony)

You will then be guided through a series of interactive questions...

Please enter the number of groups you would like to analyze: 2
Please enter the number of sequences in group 1: 200
Please enter the number of sequences in group 2: 200

Here I built and scored 1,000 trees for two groups that each had 200 sequences in them. If we open the random.parsimony file we will see the distribution:
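The idea behind the random option can be sketched as follows, assuming that a random-joining tree is built by repeatedly picking two nodes uniformly at random and joining them until a single root remains (mothur's exact procedure may differ). Leaves carry hypothetical group labels G1, G2, ..., each tree is scored with Fitch parsimony, and the function names are illustrative.

```python
import random
from collections import Counter


def fitch(node):
    """(possible ancestral groups, minimum changes) for a nested-tuple tree."""
    if isinstance(node, str):
        return {node}, 0
    (ls, lc), (rs, rc) = fitch(node[0]), fitch(node[1])
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)


def random_joining_tree(labels, rng):
    """Join two randomly chosen nodes at a time until a single root is left."""
    pool = list(labels)
    while len(pool) > 1:
        i, j = rng.sample(range(len(pool)), 2)
        merged = (pool[i], pool[j])
        # Swap-remove the higher index first so the lower index stays valid.
        for k in sorted((i, j), reverse=True):
            pool[k] = pool[-1]
            pool.pop()
        pool.append(merged)
    return pool[0]


def random_score_distribution(group_sizes, iters=1000, seed=None):
    """Counter mapping parsimony score -> number of random trees with that score."""
    rng = random.Random(seed)
    labels = [f"G{g}" for g, size in enumerate(group_sizes, start=1)
              for _ in range(size)]
    return Counter(fitch(random_joining_tree(labels, rng))[1]
                   for _ in range(iters))


# Mirrors the interactive example: two groups of 200 sequences, 1,000 trees.
dist = random_score_distribution([200, 200], iters=1000, seed=42)
for score in sorted(dist):
    print(score, dist[score])
```

For two groups every score lies between 1 (the groups are perfectly separated) and the size of the smaller group (every minority leaf needs its own change), which bounds where the random distribution can fall.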

processors

The processors parameter allows you to specify the number of processors to use. The default is 1.

Fine points

Missing names in tree or group file

If you are missing a name from your tree or group file, mothur will warn you and return to the mothur prompt. Be sure that you don't have spaces in your sequence or group names.
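Before running parsimony(), a quick cross-check of the two files can catch these problems. The sketch below assumes the group file's two-column layout (sequence name, a tab, then group name) and takes the tree's leaf names as an already-extracted list, since parsing the tree file itself is outside its scope. The function names and sample inputs are hypothetical.

```python
def read_group_file(text):
    """Map sequence name -> group from tab-separated 'name<TAB>group' lines."""
    groups = {}
    for line in text.splitlines():
        if line.strip():
            name, group = line.split("\t", 1)
            groups[name] = group.strip()
    return groups


def check_names(tree_names, group_text):
    """Report names present in only one file, and names containing spaces."""
    groups = read_group_file(group_text)
    tree_set, group_set = set(tree_names), set(groups)
    all_names = tree_set | group_set | set(groups.values())
    return {
        "missing_from_group_file": sorted(tree_set - group_set),
        "missing_from_tree": sorted(group_set - tree_set),
        "contains_spaces": sorted(n for n in all_names if " " in n),
    }


# Hypothetical inputs: leaf names from a tree plus the text of a group file.
report = check_names(["seq1", "seq2", "seq 3"], "seq1\tA\nseq2\tB\nseq4\tA\n")
print(report)
```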

Differences in implementation

A minor difference between the mothur/TreeClimber and UniFrac implementations concerns how the significance is assessed. We test the significance by generating a large number (e.g. 1,000) of random-joining trees and scoring each tree to generate the distribution. The UniFrac web site's implementation uses the input tree topology, randomizes the labels on the leaves of the tree a large number of times, and scores each tree to generate the distribution. The difference in p-values is next to nothing; however, the random-joining trees were in the original description of the method by Maddison & Slatkin (1990).
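For comparison, the label-randomization approach described above for the UniFrac web site can be sketched as follows: keep the user tree's topology, shuffle the group labels among its leaves, and rescore. The nested-tuple tree format, the Fitch scoring, and the function names are assumptions for illustration; this is neither mothur's nor UniFrac's code.

```python
import random


def fitch(node):
    """(possible ancestral groups, minimum changes) for a nested-tuple tree."""
    if isinstance(node, str):
        return {node}, 0
    (ls, lc), (rs, rc) = fitch(node[0]), fitch(node[1])
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)


def leaf_labels(node):
    """Group labels in left-to-right leaf order."""
    if isinstance(node, str):
        return [node]
    return leaf_labels(node[0]) + leaf_labels(node[1])


def relabel(node, labels):
    """Same topology, with leaves filled from the iterator `labels` left to right."""
    if isinstance(node, str):
        return next(labels)
    return (relabel(node[0], labels), relabel(node[1], labels))


def label_permutation_test(tree, iters=1000, seed=None):
    """Observed score and the fraction of label shuffles scoring <= it."""
    rng = random.Random(seed)
    observed = fitch(tree)[1]
    labels = leaf_labels(tree)
    hits = 0
    for _ in range(iters):
        rng.shuffle(labels)
        if fitch(relabel(tree, iter(labels)))[1] <= observed:
            hits += 1
    return observed, hits / iters


# Toy tree in which groups A and B occupy separate clades.
tree = ((("A", "A"), ("A", "A")), (("B", "B"), ("B", "B")))
print(label_permutation_test(tree, iters=2000, seed=0))
```

For this toy tree, only 2 of the 70 distinct arrangements of four A and four B labels reach the observed score of 1, so the estimated p-value should land near 0.03. The random-joining approach instead varies the topology while keeping the group sizes fixed.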

Revisions

1.28.0 Added count parameter

1.28.0 Bug Fix - name file info was not included in the creation of the random trees, which affected significance values.