R for Ecologists: Permutation Analysis – t-tests

You’ve carefully designed your experiment, you’ve meticulously collected your data, and you have a hypothesis to test. Unfortunately, your data are typical of ecological data: small sample sizes, messy, and non-normal. Your ideal test, the t-test, won’t work because the data are non-normal and the sample size is too small to invoke the central limit theorem. All hope is not lost!

Permutational analyses are a fantastic way to analyze data from designed experiments where experimental units have been randomly assigned to treatments (see Anderson 2001, Canadian Journal of Fisheries and Aquatic Sciences, for a thorough discussion of permutational analyses). In fact, permutational analysis is great for these carefully designed experiments for two main reasons: 1) it frees us from the stringent assumptions of normality and equal variances (update: not necessarily true; see comments), so there are few distributional assumptions; and 2) we can analyze any derived metric we like. Note that the assumption of independent observations still applies. There is just no getting around pseudoreplication.

First, let’s simulate some data. We’ll start with normal data so we can see the equivalence of permutational analyses and parametric tests when assumptions are met.

Let’s imagine you’re interested in phenotypic plasticity of Gambusia and you raise Gambusia under two predation regimes: predator-rich and predator-poor. You suspect the Gambusia will have faster growth rates in predator-rich environments (they want to mature faster and have more, smaller offspring when predators are present). We’ll simulate growth-rate data for 15 mosquitofish from each of the two populations, 30 fish in total: Predator-Present (PP) and Predator-Absent (PA).
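The simulation might look something like the sketch below. The seed, means, and standard deviation are assumptions chosen purely for illustration (they are not the values behind the numbers quoted in this post), so your t-statistic and p-values will differ.

```r
# A minimal sketch of the simulation; all numeric values are assumed.
set.seed(42)                        # arbitrary seed, only for reproducibility
n <- 15                             # fish per population

PA <- rnorm(n, mean = 10, sd = 2)   # Predator-Absent growth rates (hypothetical units)
PP <- rnorm(n, mean = 13, sd = 2)   # Predator-Present growth rates, assumed faster

# Quick numerical look at each group; hist(PA) and hist(PP) would show the shapes
summary(PA)
summary(PP)
```

With only 15 draws per group, a histogram of either sample can easily look lopsided even though the generating distribution is normal.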

Even though the data from one group or the other may look non-normal, we know the underlying distribution is normal because we simulated from a normal distribution. We can use a basic t-test to determine if there are differences between the two populations:

t.test(PA, PP)

You get a very significant p-value and a t-statistic of -5.95.

Now we’ll analyze it with a permutational test. Inherent in our experimental design was the random assignment of individuals to PP or PA treatments. We can view our observed data as just one of many possible arrangements of the data. We can shuffle the observations around and ask, “If the observations are randomly assigned to treatments, what is the probability of observing a difference at least as extreme as the one in our particular arrangement of the data?” (Note that this is still the frequentist version of hypothesis testing: what is the probability of our data if the null hypothesis of randomness is true?)

We first need to know just how many permutations we’re capable of so we don’t wind up repeating combinations. We have 30 observations, split evenly into two groups. There are 30-choose-15 = 155117520 ways to permute the data uniquely. I think we’re safe. (In R, this is choose(30, 15) ).
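A permutation test along these lines can be sketched as follows. This is one possible implementation, not necessarily the original code: the simulated values, the number of shuffles (`n_perm`), and the choice of a two-sided comparison on the difference in means are all assumptions made here for illustration.

```r
# Recreate some simulated data so the sketch runs on its own (values assumed)
set.seed(42)
PA <- rnorm(15, mean = 10, sd = 2)
PP <- rnorm(15, mean = 13, sd = 2)

obs_diff <- mean(PP) - mean(PA)   # observed difference in group means

all_obs <- c(PA, PP)              # pool all 30 observations
n_perm  <- 9999                   # number of random shuffles (assumed)

# Shuffle the pooled data, split it back into two groups of 15,
# and record the difference in means for each shuffle
perm_diff <- replicate(n_perm, {
  shuffled <- sample(all_obs)
  mean(shuffled[16:30]) - mean(shuffled[1:15])
})

# Proportion of shuffled differences at least as extreme as the observed one
p_value <- mean(abs(perm_diff) >= abs(obs_diff))
```

Using `abs()` on both sides makes this a two-sided test, which matches the default of `t.test()`. Dropping `abs()` would give a one-sided test instead.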

The last line is a little trick where you get R to assign every permuted difference at least as extreme as our observed difference a value of 1, and everything else gets assigned 0. Then you add up the 1s and divide by the number of permutations to get the proportion of permuted differences at least as extreme as our observed statistic. This is the same as taking the mean of a vector of 1s and 0s. For example, if I have two 1s and two 0s, then (1+1+0+0)/4 = 0.5, which is the same as mean(c(1, 1, 0, 0)). (The c() matters: mean() treats only its first argument as data, so mean(1, 1, 0, 0) does not average the four numbers.)
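The trick is easy to check on a toy vector. The numbers below are made up purely for the demonstration.

```r
# Mean of a vector of 1s and 0s is the proportion of 1s
x <- c(1, 1, 0, 0)
mean(x)                               # 0.5

# A comparison returns TRUE/FALSE, which R treats as 1/0 in arithmetic
flags <- c(3.1, 0.2, 2.7, 0.4) > 1    # TRUE FALSE TRUE FALSE
mean(flags)                           # 0.5

# The long way round gives the same answer
sum(flags) / length(flags)            # 0.5
```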

The p-value is incredibly low, similar to what we saw with the t-test.

The results of the t-test and the permutational analysis again agree, although this will not always be the case. (My personal experience is that it is usually the case, so parametric tests are probably more robust to departures from assumptions than we’ve been led to believe.)

This code describes the permutational version of a t-test. Doing an ANOVA requires a few extra lines of code but is conceptually similar.
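As a rough idea of what the ANOVA version could look like, here is a sketch that shuffles the response and recomputes the F statistic each time. The three treatments, their means, the helper `f_stat()`, and the number of shuffles are all hypothetical and only serve the illustration.

```r
# Hypothetical data: three treatments, 10 fish each (values assumed)
set.seed(1)
growth    <- c(rnorm(10, 10, 2), rnorm(10, 12, 2), rnorm(10, 14, 2))
treatment <- factor(rep(c("A", "B", "C"), each = 10))

# Helper (hypothetical name): F statistic from a one-way ANOVA
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]

obs_f  <- f_stat(growth, treatment)   # observed F
n_perm <- 999                         # number of random shuffles (assumed)

# Shuffle the response across treatments and recompute F each time
perm_f <- replicate(n_perm, f_stat(sample(growth), treatment))

# Proportion of shuffled F statistics at least as large as the observed one
p_value <- mean(perm_f >= obs_f)
```

Because F is always non-negative and large values indicate group differences, only the upper tail is needed here, so there is no `abs()` step.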