
NMDS Example

July 20, 2016


Here I will present a real-life example of how to go from your OTU table to NMDS plots and hypothesis testing in R:

We recently concluded a project that sought to determine whether snail food sources (leaf-surface microbes) were significantly different between current snail sites and proposed enclosure locations. The issue for the Oahu Army Natural Resources Program was that these endangered snails are being eaten by predators (rats and invasive predatory snails). The program has developed an effective way to build structures that keep predators out, but some of the places where the endangered snails currently live are on steep, remote mountain ridges where construction is impossible. The idea was to build safe enclosures nearby on level ground and move the snails into them, but many factors determine whether they will survive the move. One of these factors is diet: if the microbial communities on leaf surfaces were significantly different between current and proposed sites, the move might have negative consequences for snail fitness.

I noticed that some of my samples don’t have any OTUs in them (zeros all the way down) so I want to remove those. We can check for, and remove, samples matching any criteria we like from both the OTU table and the metadata table at the same time.

There are more parsimonious ways of doing this, but this approach is easy to adapt to remove samples with richness below any threshold you like, which is why I bothered with decostand().
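A minimal sketch of this filtering step, assuming an OTU table with OTUs as rows and samples as columns (the layout before the transpose step) and a metadata data frame whose rows are in the same order as the OTU-table columns. The original data aren’t available here, so vegan’s built-in `dune` data set stands in for them, one sample is emptied on purpose, and `min_richness` is a hypothetical threshold you can raise.

```r
library(vegan)

# Stand-in data (assumption): vegan's dune vegetation data, transposed so that
# taxa are rows and samples are columns, like a raw OTU table
data(dune)
data(dune.env)
otus <- as.data.frame(t(dune))
metadata <- dune.env                 # one row per sample, same order as the columns of otus

otus[, 3] <- 0                       # empty one sample on purpose so there is something to remove

# decostand(method = "pa") converts counts to presence/absence (1/0),
# so the column sums are the observed richness of each sample
richness <- colSums(decostand(otus, method = "pa"))

min_richness <- 1                    # hypothetical threshold: keep samples with at least this many OTUs
keep <- richness >= min_richness

# Subset both tables with the same logical vector so they stay aligned
otus <- otus[, keep, drop = FALSE]
metadata <- metadata[keep, , drop = FALSE]
```

Raising `min_richness` drops low-richness samples with no other changes to the code.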

Now that we have gotten rid of useless samples, we need to transpose the OTU table so that samples = rows and OTUs = columns, because this is how the vegan package likes it.

t_otus <- as.data.frame(t(otus))

Next, we will rarefy our data set down to the sequencing depth of the sample with the fewest reads, so that we can compare samples evenly regardless of differences in sampling depth. There has been some recent debate about the efficacy of rarefying community data (read a very thought-provoking paper here), and there is growing consensus that rarefying data down to a “lowest common denominator” is not statistically valid and can lead to a loss of statistical power. That said, it is still common practice, and for the purposes of this example we will rarefy rather than normalize the data in other ways so that we can focus on the NMDS process. Note that rarefaction (estimating richness as a function of sampling effort, e.g. with rarefaction curves) and rarefying (randomly subsampling every sample to the same depth) are two completely different things!
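A minimal sketch of rarefying with vegan’s `rrarefy()`, assuming `t_otus` is the transposed table (samples as rows) from the previous step; `dune` stands in for it here, and the seed value is arbitrary.

```r
library(vegan)

# Stand-in data (assumption): dune already has samples as rows and taxa as columns,
# i.e. the layout of t_otus after the transpose step
data(dune)
t_otus <- dune

set.seed(42)                          # rrarefy() subsamples at random; fix the seed for reproducibility

depths <- rowSums(t_otus)             # total count (sampling depth) of each sample
min_depth <- min(depths)              # depth of the shallowest sample

# Randomly subsample every sample, without replacement, down to min_depth counts
otus_rare <- rrarefy(t_otus, sample = min_depth)

rowSums(otus_rare)                    # every sample now has exactly min_depth counts
```

The naming in vegan mirrors the distinction above: `rarefy()` returns expected richness at a given subsample size (rarefaction), while `rrarefy()` returns a randomly subsampled community table (rarefying).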

We determined the total read count of the shallowest sample and then rarefied the data to that depth (randomly selecting only 3,679 hits from each sample). Next, we want to transform our rarefied OTU table (square root) and determine the best method for calculating a distance matrix from it.
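One way to choose a dissimilarity index is vegan’s `rankindex()`, which reports the rank correlation between each candidate index and separation along a numeric environmental gradient; a higher value suggests the index tracks that gradient better. This is a sketch under assumptions: `dune` stands in for the rarefied table, and `dune.env$A1` (soil A1 horizon thickness) stands in for whatever numeric variable you would take from your own metadata.

```r
library(vegan)

# Stand-in data (assumption): dune plays the role of the rarefied table (samples x taxa)
data(dune)
data(dune.env)
otus_rare <- dune

# Square-root transform down-weights the most abundant taxa
otus_sqrt <- sqrt(otus_rare)

# Spearman correlation between each candidate dissimilarity and distances along
# a numeric gradient (dune.env$A1 used as a stand-in gradient)
index_ranks <- rankindex(dune.env$A1, otus_sqrt,
                         indices = c("euc", "man", "gow", "bra", "kul"),
                         method = "spearman")
index_ranks

# Build the distance matrix with the best-scoring index
# (vegdist() accepts these abbreviated index names via partial matching)
best <- names(which.max(index_ranks))
otus_dist <- vegdist(otus_sqrt, method = best)
```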

It’s time to plot this and take a look at where our samples fall in “ordination space.” We will use ggplot2 because it’s lovely. It takes a bit of effort to get used to, but it’s an excellent package for plotting and comes with a ton of functionality. (Here is a nice intro tutorial for playing with ggplot.)
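A minimal sketch of running the NMDS with `metaMDS()` and plotting the site scores with ggplot2, assuming square-root-transformed data and Bray-Curtis distances. `dune` stands in for the OTU table and `dune.env$Management` for the Location column; the seed and `trymax` are arbitrary. Note that `stat_ellipse()` draws a 95% ellipse around each group’s points (a data ellipse), which is not the same as a confidence interval for the group centroid; vegan’s `ordiellipse(kind = "se", conf = 0.95)` draws the latter in base graphics.

```r
library(vegan)
library(ggplot2)

# Stand-in data (assumption): square-rooted dune as the community table and
# dune.env$Management standing in for the Location column
data(dune)
data(dune.env)
otus_sqrt <- sqrt(dune)
metadata <- dune.env

set.seed(1)
# Two-dimensional NMDS on Bray-Curtis distances; the data are already transformed,
# so vegan's automatic transformation is switched off
nmds <- metaMDS(otus_sqrt, distance = "bray", k = 2, trymax = 50,
                autotransform = FALSE, trace = FALSE)
nmds$stress                           # lower stress = better fit of the 2-D configuration

# Sample coordinates in ordination space, joined to the grouping variable
site_scores <- as.data.frame(scores(nmds, display = "sites"))
site_scores$Location <- metadata$Management

p <- ggplot(site_scores, aes(x = NMDS1, y = NMDS2, colour = Location)) +
  geom_point(size = 3) +
  stat_ellipse(level = 0.95) +        # 95% data ellipse around each group's points
  coord_equal() +
  theme_bw()
p
```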

Here we see that our ellipses (representing 95% CI around the centroid) overlap a lot. To me, this looks like the epiphyte communities aren’t very different from each other: boring news for scientists looking for a story, but great news for the snails that might have to move to a new location. Still, we need to run some statistical tests to make sure. There are two main options for this sort of project: adonis() and anosim(). The adonis() function is vegan’s implementation of permutational multivariate analysis of variance (PERMANOVA), and anosim() is a related permutation test, an analysis of similarities (ANOSIM).

Here is some more detail about the two methods (taken from the QIIME help page), though you should read more about them in the R help files:

Adonis is a nonparametric statistical method that takes a distance matrix and a category to determine sample grouping from. It computes an R² value (effect size) which shows the percentage of variation explained by the supplied category, as well as a p-value to determine the statistical significance. Adonis creates a set by first identifying the relevant centroids of the data and then calculating the squared deviations from these points. After that, significance tests are performed using F-tests based on sequential sums of squares from permutations of the raw data.

—–

ANOSIM is a method that tests whether two or more groups of samples are significantly different (similar to adonis, above). You can specify a category in the metadata to separate samples into groups and then test whether there are significant differences between those groups. Since ANOSIM is nonparametric, statistical significance is determined through permutations. Note: ANOSIM only works with a categorical variable that is used to do the grouping. Mantel is recommended for continuous variables.
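For reference, these are the standard one-factor forms of the two test statistics described above, with N samples split into a groups (n_g samples in group g) and d_ij the distance between samples i and j. For PERMANOVA (adonis), the sums of squares are computed directly from the distances:

$$
SS_T = \frac{1}{N}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N} d_{ij}^{2}, \qquad
SS_W = \sum_{g=1}^{a}\frac{1}{n_g}\sum_{\substack{i<j \\ i,j \in g}} d_{ij}^{2}, \qquad
SS_A = SS_T - SS_W
$$

$$
F = \frac{SS_A/(a-1)}{SS_W/(N-a)}, \qquad R^2 = \frac{SS_A}{SS_T}
$$

For ANOSIM, all M distances are ranked together, and \bar{r}_B and \bar{r}_W are the mean ranks of the between-group and within-group distances; R runs from −1 to 1, with values near 0 meaning no separation between groups:

$$
R = \frac{\bar{r}_B - \bar{r}_W}{M/2}, \qquad M = \frac{N(N-1)}{2}
$$

In both cases the p-value comes from P permutations of the group labels:

$$
p = \frac{1 + \#\{\text{permuted statistics} \ge \text{observed statistic}\}}{1 + P}
$$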

To do this with our data in R:
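A minimal sketch, assuming `otus_dist` is a vegan `dist` object and `metadata` has a factor column `Location` in the same sample order. Because the original data aren’t shown, Bray-Curtis distances on square-rooted `dune` are used, with `dune.env$Management` copied into a `Location` column; the seed is arbitrary.

```r
library(vegan)

# Stand-in data (assumption): Bray-Curtis distances on square-rooted dune,
# with dune.env$Management in place of the Location column
data(dune)
data(dune.env)
otus_dist <- vegdist(sqrt(dune), method = "bray")
metadata <- dune.env
metadata$Location <- metadata$Management

set.seed(123)                          # permutation p-values vary slightly between runs
anosim_location <- anosim(otus_dist, metadata$Location, permutations = 999)

summary(anosim_location)   # R statistic, permutation significance, quantiles of permuted R
plot(anosim_location)      # boxplots of between- and within-group dissimilarity ranks
```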

This runs anosim() on our distance matrix with Location as the grouping factor. anosim() results have built-in summary() and plot() methods that provide a lot of helpful information for interpreting them; the key numbers are the test statistic R and its permutation p-value (significance).

This essentially says that our communities are statistically different from each other, contrary to what our plot seemed to show. Let’s take a look at PERMANOVA, using adonis(), which is generally considered to be more robust than anosim().

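# adonis() is deprecated in recent vegan releases; adonis2() is its replacement
adonis_location <- adonis2(otus_dist ~ Location, data = metadata, permutations = 999)
adonis_location # prints an ANOVA-style table (R2, F, Pr(>F)); there are no separate summary() or plot() methods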

Similar to anosim! Still, the real question we were asking with these data wasn’t about all seven sites together; we had some paired sites in mind. For example, the snails at the site called “Skeet Pass” are under consideration to be moved to the proposed site at “Kaala Bog.” So the real question is whether each current site is different from its proposed site(s), not whether any of the communities are different. For a standard ANOVA we could simply run a post-hoc test (e.g., a Tukey test) to determine which groups differ from each other, but anosim and adonis do not currently have any valid post-hoc methods. Instead, we can split our data into paired sites and run anosim on each subset, which should tell us whether there are differences between the sites we care about.
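A minimal sketch of the paired-site approach, assuming a full distance matrix and a site factor in the same sample order. `dune` again stands in for the data, its Management levels stand in for sites, and the pairs in `site_pairs` are hypothetical “current vs. proposed” combinations. Each subset is cut out of the full distance matrix, so the dissimilarities are identical to those used in the overall test. The Holm adjustment at the end is an optional addition, since several tests are run.

```r
library(vegan)

# Stand-in data (assumption): dune with Management levels playing the role of sites
data(dune)
data(dune.env)
otus_dist <- vegdist(sqrt(dune), method = "bray")
location <- dune.env$Management

site_pairs <- list(c("BF", "HF"), c("SF", "NM"))   # hypothetical current/proposed pairs

set.seed(2016)
pair_results <- lapply(site_pairs, function(pair) {
  keep <- location %in% pair
  # Keep only the rows/columns of the distance matrix for these two sites
  sub_dist <- as.dist(as.matrix(otus_dist)[keep, keep])
  # droplevels() removes the unused site levels from the grouping factor
  fit <- anosim(sub_dist, droplevels(location[keep]), permutations = 999)
  data.frame(pair = paste(pair, collapse = " vs "),
             R = unname(fit$statistic),
             p = fit$signif)
})
pair_table <- do.call(rbind, pair_results)

# Optional: adjust for running several tests
pair_table$p_holm <- p.adjust(pair_table$p, method = "holm")
pair_table
```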

So there seem to be no significant differences between current and proposed site combinations, even though our first analysis showed that the sites were different from each other in general. This makes sense intuitively, since each current site and its proposed site(s) were close together spatially, while the three current snail sites were farther apart from one another. The bottom line is that as long as the snail populations aren’t moved too far from where they are now, the microbial community (food source) won’t be too different for them.

I hope this was a helpful example of how to get started doing (and plotting) NMDS and using the anosim() and adonis() functions to determine whether community compositions are statistically different.