Example 9.12: simpler ways to carry out permutation tests

In a previous entry, as well as section 2.4.3 of the book, we describe how to carry out a 2 group permutation test in SAS as well as with the coin package in R. We demonstrate with comparing the ages of the female and male subjects in the HELP study.

In this entry, we revisit the permutation test using other functions.

R

We describe a simpler interface to carry out and visualize permutation tests using the functions from the mosaic package.

We conclude that there is minimal evidence to contradict the null hypothesis that the two groups have the same ages back in their respective populations.

Now we demonstrate another way to run this test in a more general form, using the mosaic package’s do() function combined with the * operator to repeatedly carry out fitting a linear model with a parameter for female which will calculate our test statistic (difference in means between females and males) repeatedly after shuffling the group indicators. The shuffle() function permutes the group labels, and then the summary statistic is calculated.

Permuting data is a very awkward thing to do in data steps. But it turns out to be easy in proc iml (the built-in SAS matrix language). Here we read in the key variables from the data set (use and read). Then we generate the permutations (ranperm). However, this generates row for each permuted data set, while we need a column for each, so we transpose the matrix (t) before saving it. Then we save the resulting data in a SAS data set with the female variable. Note that we permuted the ages only, as opposed to the R example– it doesn’t matter which is permuted, of course. Much of the proc iml code used here can be found in section 1.9 of the book– however, note that curly braces are required in the read statement, as shown below.

With the permuted data in hand, we use proc ttest (section 2.4.1) with the ODS system to generate and save the differences. Note that the default variable names from proc iml are fairly nondescript. With the 1000 permuted statistics in hand, we can generate a histogram of the statistics and a p-value with proc univariate.