Rarefaction.single

The rarefaction.single command generates intra-sample rarefaction curves by resampling without replacement. Rarefaction curves provide a way of comparing the richness observed in different samples. Roughly speaking, a rarefaction curve gives the number of OTUs that you would expect to observe, on average, if you had sampled fewer individuals. Although a formula exists to generate a rarefaction curve (see the example calculation), mothur uses a randomization procedure. A rarefaction curve can also help you to assess your sampling intensity. If the curve becomes parallel to the x-axis, you can be reasonably confident that you have done a good job of sampling and can trust the observed level of richness. Otherwise, you need to keep sampling. Rarefaction is actually a better measure of diversity than it is of richness. For this tutorial you should download and decompress AmazonData.zip.
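For reference, the formula mentioned above can be sketched in a few lines of Python. This is only an illustration of the standard expected-richness calculation for sampling without replacement; it is not mothur code, and the function name `expected_richness` and the OTU counts are made up for the example.

```python
from math import comb

def expected_richness(abundances, n):
    """Expected number of OTUs in a random subsample of n individuals
    drawn without replacement (hypothetical helper, not part of mothur)."""
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    denom = comb(total, n)
    # An OTU with N_i individuals is missed entirely with probability
    # C(N - N_i, n) / C(N, n); one minus that is the chance it is observed.
    return sum(1 - comb(total - count, n) / denom for count in abundances)

# Hypothetical community: four OTUs with 5, 3, 1 and 1 individuals
otu_counts = [5, 3, 1, 1]
for n in (1, 5, 10):
    print(n, round(expected_richness(otu_counts, n), 3))
```

The binomial coefficients are what make this approach expensive for large data sets, since each one is built from factorials of the sample sizes.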

mothur implements its rarefaction calculations via randomization. There is a formula for calculating the values, but because it involves a number of factorial calculations, it takes a lot of time and memory to evaluate. By default, the rarefaction.single() command uses 1,000 randomizations to generate the rarefaction curve data for the observed number of OTUs (i.e. sobs). To run rarefaction.single() enter:

The left column indicates the label for each line in the data set and the right column indicates the row number in the data set. Execution of rarefaction.single() will generate the 98_sq_phylip_amazon.fn.rarefaction file, which will look something like:

The first column indicates the level of sampling intensity; by default this information is provided every 100 individuals. The subsequent columns are triplets. The first column of the triplet is the average number of OTUs that were observed at that sampling intensity across all of the iterations, which number 1,000 by default. The second and third columns are the bounds of the 95% confidence interval for that average. In other words, the observed richness was between those two numbers in 950 of the 1,000 iterations. Naturally, the average and the confidence interval values are the same when one individual or all of the individuals have been sampled.
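The structure of those triplets can be illustrated with a small Python sketch of a randomization-based rarefaction. It is not mothur's implementation: the function name `rarefy`, the example community, and the choice of a simple percentile interval (the middle 95% of the sorted values) are assumptions made for illustration.

```python
import random

def rarefy(abundances, iters=1000, freq=100, seed=0):
    """Return (depth, mean, lower, upper) rows for the observed number of OTUs.
    Hypothetical helper; the interval is a plain percentile interval."""
    rng = random.Random(seed)
    # One list entry per individual, labelled by the index of its OTU
    individuals = [otu for otu, count in enumerate(abundances) for _ in range(count)]
    total = len(individuals)
    # Report at 1 individual, at every multiple of freq, and at the full sample
    depths = sorted({1, total} | set(range(freq, total + 1, freq)))
    observed = {d: [] for d in depths}
    for _ in range(iters):
        rng.shuffle(individuals)  # a random ordering is sampling without replacement
        seen = set()
        for i, otu in enumerate(individuals, start=1):
            seen.add(otu)
            if i in observed:
                observed[i].append(len(seen))
    rows = []
    for d in depths:
        values = sorted(observed[d])
        mean = sum(values) / iters
        lower = values[int(0.025 * iters)]
        upper = values[max(int(0.975 * iters) - 1, 0)]
        rows.append((d, mean, lower, upper))
    return rows

# Hypothetical community of 100 individuals in four OTUs
for row in rarefy([50, 30, 15, 5], iters=200, freq=25, seed=1):
    print(row)
```

In the first and last rows the mean and both bounds coincide, because every iteration sees exactly one OTU after one individual and every OTU after all individuals.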

Options

calc

In the paper by Hughes et al. [1] it was suggested that people may want to rarefy various non-parametric richness estimators. In our experience this generates strange and misleading results, for reasons that are not clear. Regardless, it is possible to rarefy any of the estimators, and the output file will end in the name of the estimator with an "r_" prefix:

The above command will generate files ending in "rarefaction", "r_chao", and "r_ace". They will have the same format as described above. It is important to note that the 95% confidence interval in these files describes how much the estimate varied across the randomizations; it is not the 95% confidence interval that the estimator itself calculates.
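To make that distinction concrete, here is a minimal Python sketch that rarefies an estimator at a single sampling depth. It assumes the bias-corrected form of Chao1 and a percentile interval; both are assumptions made for illustration and may not match mothur's calculators exactly. The names `chao1` and `rarefied_chao1` are hypothetical.

```python
import random
from collections import Counter

def chao1(sample):
    """Bias-corrected Chao1 estimate for a list of OTU labels (assumed form)."""
    counts = Counter(sample)          # OTU label -> individuals in the subsample
    freq = Counter(counts.values())   # abundance -> number of OTUs with that abundance
    f1, f2 = freq[1], freq[2]         # singletons and doubletons
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

def rarefied_chao1(abundances, depth, iters=1000, seed=0):
    """Mean and percentile bounds of Chao1 over random subsamples of size depth."""
    rng = random.Random(seed)
    individuals = [otu for otu, count in enumerate(abundances) for _ in range(count)]
    # random.sample draws without replacement
    values = sorted(chao1(rng.sample(individuals, depth)) for _ in range(iters))
    mean = sum(values) / iters
    lower = values[int(0.025 * iters)]
    upper = values[max(int(0.975 * iters) - 1, 0)]
    return mean, lower, upper

# Hypothetical community; the bounds reflect variation among subsamples only
print(rarefied_chao1([20, 10, 5, 2, 1, 1, 1], depth=20, iters=500, seed=1))
```

The bounds returned here come purely from the spread of the estimate among random subsamples, so they collapse to a single value once every individual is included.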

abund

By default the ACE estimator uses 10 as the cutoff between OTUs that are rare and abundant. So if an OTU has more than 10 individuals in it, then it is considered abundant. This is really just an empirical decision and we are merely following the lead of Anne Chao and others who implement 10 in their software. If you would like to use a different cutoff, you can use the abund option:

Looking at the file, 98_lt_phylip_amazon.fn.r_ace, you'll see that when the distance is 0.10, the final ACE estimate value is 101.1 (95% CI=75.5-158.8) compared to 161.4 (95% CI=120.3-228.4) when abund was 10. You will not see a difference when the maximum abundance is below the threshold.
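The effect of the cutoff can be seen in a small Python sketch of the ACE calculation. It follows the commonly published form of the estimator; the function name `ace`, the example counts, and the handling of the zero-coverage case are assumptions for illustration, not mothur's code.

```python
def ace(abundances, abund=10):
    """ACE richness estimate; OTUs with more than `abund` individuals are abundant.
    Hypothetical helper written from the published formula."""
    rare = [c for c in abundances if 0 < c <= abund]
    s_abund = sum(1 for c in abundances if c > abund)
    s_rare = len(rare)
    n_rare = sum(rare)                      # individuals in rare OTUs
    f1 = sum(1 for c in rare if c == 1)     # singletons
    if n_rare == 0:
        return float(s_abund)
    if f1 == n_rare:
        # Every rare OTU is a singleton, so the coverage estimate is zero
        raise ValueError("ACE is undefined when all rare OTUs are singletons")
    coverage = 1 - f1 / n_rare
    # Squared coefficient of variation of the rare OTU abundances
    gamma2 = max(
        (s_rare / coverage) * sum(c * (c - 1) for c in rare) / (n_rare * (n_rare - 1)) - 1,
        0.0,
    )
    return s_abund + s_rare / coverage + (f1 / coverage) * gamma2

# Hypothetical OTU counts: the OTU with 7 individuals changes class with the cutoff
counts = [12, 11, 7, 3, 2, 1, 1, 1]
print(ace(counts, abund=10), ace(counts, abund=5))

# No OTU exceeds either cutoff here, so both settings give the same estimate
small = [3, 2, 1, 1, 1]
print(ace(small, abund=10) == ace(small, abund=5))
```

Lowering the cutoff moves OTUs from the rare class to the abundant class, which changes both the coverage estimate and the variation term.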

iters

To improve the accuracy of the calculations you can change the number of randomizations that are performed using the iters option; the default value is 1,000. Running 10,000 randomizations should take 10 times as long as the default:

label

There may only be a couple of lines in your OTU data that you are interested in generating rarefaction curves for. There are two options. You could (i) manually delete the lines you aren't interested in from your rabund, sabund, or list file, or (ii) use the label option. To use the label option with the rarefaction.single() command you need to know the labels you are interested in. If you want the rarefaction curve data for the lines labeled unique, 0.03, 0.05 and 0.10 you would enter:

freq

For larger datasets you might not be interested in obtaining data for every number of sequences sampled. For instance, if you have 100,000 sequences, you may only want to output the data every 1,000 sequences. Alternatively, if you only have 100 sequences, you may want to output all of the data. The default setting is to output data every 100 sequences. By altering the freq option you can set the frequency at which the data are output:

processors

If you have a Windows computer, move on; this feature doesn't apply to you. If you're one of the cool kids, you get to use the processors option, which reduces the processing time by using multiple processors. You can use as many processors as your computer has with the following option: