Repeated-Measures ANOVA with Two Trials plus a Between-Subjects Effect

model: numeric <= (subject) group trial group*trial
example: jumphgt <= (athlete) group time group*time

Let's take the experiment on the previous
page, where we attempted to increase jump height with some sort of experimental
treatment. As before, we measure jump height pre and post the treatment in a group
of subjects, the experimental group (expt in the figure). But now we also
have a second group who get a different treatment, and the aim of the experiment
is to compare the change in jump height in the two groups. If that different treatment
is nothing at all, or a sham treatment (a placebo), the second group is
called a control group--hence the name for this sort of experiment, a controlled
trial.

Let's analyze it the easy way first. For each subject, subtract the pre height
from the post height to get a change score. Now compare the change scores in
the two groups using an unpaired t test. Use the unequal-variances version of
the t test, because the standard deviation (square root of the variance) of
the change scores in the experimental group is likely to be larger than that
in the control group, owing to individual responses to the treatment. The spreadsheet
for controlled trials can do it all for you. If you have three groups (e.g.
two experimental groups and one control group), use a new spreadsheet for each
pairwise comparison of groups. You can also use a one-way ANOVA on the change
scores, but beware: ANOVA assumes equal variances (standard deviations) of the
change scores in all the groups. See the slide
show on repeated measures for an explanation of these subtleties.
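The easy way described above can be sketched in a few lines of Python with SciPy; the jump heights (in cm) below are invented for illustration:

```python
# A minimal sketch of the change-score analysis with an unequal-variances
# (Welch) t test. The jump heights (cm) below are invented.
import numpy as np
from scipy import stats

expt_pre = np.array([41.0, 38.5, 44.2, 40.1, 39.8, 42.3])
expt_post = np.array([44.5, 40.0, 47.1, 43.0, 41.2, 46.0])
cont_pre = np.array([40.2, 39.0, 43.1, 41.5, 38.9, 42.0])
cont_post = np.array([40.8, 39.2, 43.5, 41.3, 39.5, 42.4])

expt_change = expt_post - expt_pre  # change scores, experimental group
cont_change = cont_post - cont_pre  # change scores, control group

# equal_var=False gives the unequal-variances (Welch) version, because
# individual responses to the treatment tend to inflate the SD of the
# change scores in the experimental group.
t, p = stats.ttest_ind(expt_change, cont_change, equal_var=False)
effect = expt_change.mean() - cont_change.mean()
print(f"treatment effect = {effect:.2f} cm, t = {t:.2f}, p = {p:.4f}")
```

The treatment effect here is simply the difference between the two groups' mean change scores, which is the same estimate the interaction contrast in the ANOVA gives you.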

Now for the model, which is the hard way. We have to do it,
though, because you need to understand the model for later
complexities with repeated measures. Let's start with the simple
model from the previous page:

jumphgt <= (athlete) time

This model represents the obvious fact that jump height is
affected by time (it depends on whether it's the pre-test or the post-test)
and the identity of the athlete (it depends on how good a jumper s/he is).
But we now have two groups of subjects (control and expt), so we have
to add a term to show that athletes in one group could jump
differently from those in the other:

jumphgt <= (athlete) time group

Technically the model is now a three-way ANOVA, but no-one ever
calls it that. OK, what tells us whether the experimental group did
better in the post test, relative to the control group? The group
effect? No, this term represents the overall difference
between the groups, counting pre and post tests. We're missing a
term, of course: the interaction time*group. This term is the first
thing you look at to see how your treatment worked. So the full
model is:

jumphgt <= (athlete) time group time*group

By the way, the order of time and group in the model is
irrelevant, and time*group is the same as group*time.
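As a sketch (not the author's own analysis), this model can be fitted as a mixed model with a random intercept for each athlete, for example with Python's statsmodels; all data below are invented:

```python
# A minimal sketch: fitting jumphgt <= (athlete) time group time*group as a
# mixed model with a random intercept per athlete. All data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for grp, gain in [("cont", 0.3), ("expt", 2.5)]:  # true pre-to-post gains
    for i in range(20):
        athlete = f"{grp}{i}"
        base = 40 + rng.normal(0, 3)           # between-athlete differences
        pre = base + rng.normal(0, 1)
        post = base + gain + rng.normal(0, 1)  # treatment adds 'gain'
        rows += [(athlete, grp, "pre", pre), (athlete, grp, "post", post)]
df = pd.DataFrame(rows, columns=["athlete", "group", "time", "jumphgt"])

# The time:group interaction is the first thing to look at: it estimates
# (post.expt - pre.expt) - (post.cont - pre.cont). Note that with the
# default coding here, "post" is the reference level of time, so the
# printed interaction coefficient comes out with the opposite sign.
fit = smf.mixedlm("jumphgt ~ time * group", df, groups=df["athlete"]).fit()
print(fit.summary())
```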

The data for this model seem simple enough (pre and post means and
SDs for two groups), but interpreting the
substantiveness/significance of each term in the model can be
confusing. So here are examples illustrating the eight possible
combinations of insubstantial and substantial effects for the
different terms in the model. Don't go past this section until you
understand all eight parts of this diagram:

The last two examples on the lower right are the ones we usually
want in a study: no difference between the control and experimental
groups in the pretest, and a nice big divergence on post-test. The
fact that main effects are substantial in these two examples is
irrelevant. The other two examples with a substantial interaction
also illustrate treatments that worked, but the outcomes are not
ideal, because in both cases the groups are different in the pretest.
A worry, because it means that one or both of the groups can't be
representative of the population, at least as far as jump height is
concerned. And non-representative samples mean non-generalizable
findings!

Finally, how do we calculate the magnitude of the experimental effect? Easy.
The post score minus the pre score for the experimental group is the main thing,
but we have to subtract off any change in the control group. To do it as an
estimate or contrast in the repeated-measures ANOVA, combine the four levels
of time*group in the following way: (post·expt - pre·expt) - (post·cont
- pre·cont).

Special Case: Simple Crossovers

In a simple crossover
design, half the subjects get a control treatment followed by an experimental
treatment, while the other half get the treatments the other way around. People
usually analyze the data as a simple paired
t test, which means they effectively subtract the control response from
the experimental response for each subject, without regard for the order of
treatment. In a minute I'll show you a better way, using the above ANOVA model,
and I'll generalize it to multiple crossovers. First, more about simple crossovers.

Why split the subjects into two groups and cross the treatments over? Because
if all subjects get the control and experimental treatments in the same order,
you won't know whether any change you see is truly an effect of the treatment,
or just an effect of being tested a second time--a practice or learning
effect. When you split the subjects, the group that gets the control first
has the practice effect added to the experimental treatment, whereas the group
that gets the experimental first has the practice effect added to the control
treatment. So when you average the difference scores, the practice effect disappears
and you are left with the treatment effect, provided the two groups have the
same number of subjects.

Fine, but there's a problem. When there is a practice effect, you get
two clusters of difference scores. For example, if the practice effect is about
the same size as the treatment effect, one set of difference scores will be
around zero, and the other will tend to be twice as large as the treatment effect.
The average is still equal to the treatment effect, but the effect appears to
be more variable between subjects. The result is a bigger (worse) confidence
interval for the treatment effect, or a bigger (worse) p value, or less power
to detect the treatment effect.
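The two clusters of difference scores are easy to see in a small simulation; all numbers below are invented, with the practice effect set equal to the treatment effect:

```python
# A minimal sketch of the two clusters of difference scores you get in a
# crossover when the practice effect is about as big as the treatment
# effect. All numbers are invented.
import numpy as np

treatment, practice = 2.0, 2.0
rng = np.random.default_rng(0)
n = 1000  # subjects per group (large, to make the point cleanly)

# Control-first group: expt is their 2nd trial, so practice adds to expt - cont.
d_contfirst = treatment + practice + rng.normal(0, 0.5, n)
# Expt-first group: practice adds to their later control score instead,
# so it subtracts from expt - cont.
d_exptfirst = treatment - practice + rng.normal(0, 0.5, n)

d_all = np.concatenate([d_contfirst, d_exptfirst])
print("mean difference:", round(d_all.mean(), 2))   # still ~ the treatment effect
print("SD of differences:", round(d_all.std(), 2))  # inflated by the practice effect
```

The mean of all the difference scores still recovers the treatment effect, but their SD is dominated by the practice effect, which is what widens the confidence interval.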

Another potential problem is carry-over. For the group that gets the
experimental treatment first, it's important that any effect of the treatment
disappears by the time that group gets the control treatment--otherwise the
difference between control and experimental treatments for that group will be
reduced. The result will be an apparently smaller treatment effect overall,
and an apparent practice effect. For example, if the treatment effect carries
over completely, the analysis will produce a treatment effect that is half its
true value, and an apparent practice effect of the same magnitude. So you
can't do a training study as a crossover, unless you are confident that
the adaptations produced by the experimental training program decay away before
subjects get the control program.
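The arithmetic of complete carry-over is worth spelling out (invented numbers):

```python
# A minimal sketch of the arithmetic for complete carry-over (invented numbers).
treatment = 4.0  # true effect of the experimental treatment

# Control-first group: their control trial is clean, their expt trial shows
# the full effect, so their difference score is the full treatment effect.
d_contfirst = treatment - 0.0
# Expt-first group: the effect carries over completely into their later
# control trial, so their difference score is zero.
d_exptfirst = treatment - treatment

apparent_effect = (d_contfirst + d_exptfirst) / 2    # half the true value
apparent_practice = (d_contfirst - d_exptfirst) / 2  # spurious "practice" effect
print(apparent_effect, apparent_practice)
```

Averaging the two groups' difference scores halves the treatment effect, and the missing half shows up as an apparent practice effect of the same magnitude.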

You might be able to get over the problem of carry-over by
increasing the time between the two treatments. But the longer the
time, the less reliable the dependent variable is likely to be, which
means a wider confidence interval for the difference between the
treatments.

One way around the problem of practice and carry-over effects is to throw out
the crossover altogether. Replace it with a properly controlled study, in which
you split the subjects into two groups, give both groups a pre-test, then administer
the control treatment to one group and the experimental treatment to the other,
and finally do a post-test on both groups. Any practice effect should be the
same for both groups, so it disappears when you calculate the change in the
experimental group minus the change in the control group.

So why bother with a crossover at all? For a very good reason: you
get the same confidence interval for the treatment effect with
one quarter the number of
subjects as in a fully controlled design, provided there are no
practice and carry-over effects. For such a big saving in time and
expense, always consider a crossover before a fully controlled study.
Minimize any carry-over effect by allowing adequate time between the
treatments. And don't worry about the practice effect, because ANOVA
takes care of it. Here's how:

model: numeric <= (subject) treat group treat*group
example: jumphgt <= (athlete) treat group treat*group

The
figure shows data for an example of a simple crossover, in which an experimental
treatment increased jump height relative to a control treatment. I've separated
the data for the two groups (control treatment first, experimental treatment
first) to illustrate a practice effect, which adds to the difference between
experimental and control treatments for the group that had the control treatment
first, but reduces the difference for the other group. The data also illustrate
that randomization of athletes to the two groups resulted in one group (expt
first) being somewhat better jumpers overall.

The model has the same form as the model at the top of this page,
but the time effect is now replaced with treat, which has two levels
(cont and expt). The other main effect, group, now represents which
group each subject was assigned to (contfirst, exptfirst). The
interaction term treat*group has four levels (cont·contfirst,
cont·exptfirst, expt·contfirst, and expt·exptfirst).

The difference between the two levels of the treatment
effect (expt - cont) tells you the thing you're most interested in: how well
the treatment worked relative to control. The difference between the two levels
of the group effect (exptfirst - contfirst) tells you how different your two
groups of subjects were, so it's a measure of how well you randomized your subjects
to the two groups. The interaction gives you the size of the practice effect,
and I'll leave you to figure out that the appropriate contrast is 0.5*(expt·contfirst
- expt·exptfirst - cont·contfirst + cont·exptfirst). If that's
too challenging, here's another way to get the practice effect. First, make
another repeated-measures variable called trial in your data set. Trial is almost
the same as treat, but trial has values of the dependent variable corresponding
to the first and second trial, whereas treat has values corresponding to control
and experimental treatments. Now do the ANOVA with group, trial, and group*trial
in the model. The practice effect comes straight from trial in this model.
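You can check the contrast, and see the treat-to-trial recoding, with invented cell means (baseline 40, treatment effect 2, practice effect 1):

```python
# A minimal sketch checking the practice-effect contrast on invented cell
# means, and showing the treat -> trial recoding described above.
import pandas as pd

# Hypothetical cell means for the four levels of treat*group, built from a
# baseline of 40, a treatment effect of 2 and a practice effect of 1:
means = {
    ("cont", "contfirst"): 40.0,  # control treatment, 1st trial
    ("expt", "contfirst"): 43.0,  # expt, 2nd trial: treatment + practice
    ("expt", "exptfirst"): 42.0,  # expt, 1st trial: treatment only
    ("cont", "exptfirst"): 41.0,  # control, 2nd trial: practice only
}

practice = 0.5 * (means[("expt", "contfirst")] - means[("expt", "exptfirst")]
                  - means[("cont", "contfirst")] + means[("cont", "exptfirst")])
print(practice)  # recovers the practice effect of 1

# Recoding treat to trial: for contfirst subjects, cont is trial 1 and expt
# is trial 2; for exptfirst subjects it's the other way around.
df = pd.DataFrame([(t, g, m) for (t, g), m in means.items()],
                  columns=["treat", "group", "jumphgt"])
df["trial"] = ["first" if (t == "cont") == (g == "contfirst") else "second"
               for t, g in zip(df.treat, df.group)]
print(df)
```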

Get your stats program to give you confidence intervals for all these contrasts,
please, not just the p values! And if you plot your data for publication, show
the two groups as I have done in the above example.

A bonus for this method of analyzing crossovers is no absolute
requirement for an equal number of subjects in each group. It's still
best to have equal numbers, but if you get dropouts in one group, the
resulting treatment effect is not biased by any practice effect. It
would be biased if you used a paired t test to analyze the data.

Users of the Statistical Analysis System (SAS) have the option of modeling
the data in a slightly more intuitive way. Instead of having a group
effect in the model, use a variable called trial, which has values
first and second (or 1 and 2). This
variable indicates whether a given observation represents each
subject's first or second trial. Here's the model:
</gr-replace>

model:
numeric <= (subject) treat trial treat*trial

It looks similar to the previous model, but trial is actually a
second within-subject factor, which we
haven't dealt with yet. It turns out that traditional methods of
repeated-measures ANOVA can't handle this model, because each subject
has values for only two of the four combinations of treat and trial.
But the new mixed procedure in SAS handles it brilliantly. Just use
the treat term to get the estimate of the difference between the
experimental and control treatments, and use the trial term to get
the practice effect. An appropriate combination of the levels of
treat*trial gives the difference between the means of the two groups
of subjects with treatment and practice effects partialed out, if you
want to check how evenly the subjects were randomized to the two
treatment sequences.
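As a sketch of the same idea outside SAS, a general mixed model (here Python's statsmodels, with invented data) also copes with each subject having values for only two of the four treat-by-trial cells:

```python
# A minimal sketch: the crossover model numeric <= (subject) treat trial
# treat*trial fitted as a general mixed model. Each athlete has only two of
# the four treat-by-trial cells; a mixed model copes with that, whereas
# traditional repeated-measures ANOVA does not. All data are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
treatment, practice = 2.0, 1.0  # true effects built into the data
rows = []
for grp, order in [("contfirst", ["cont", "expt"]),
                   ("exptfirst", ["expt", "cont"])]:
    for i in range(10):
        athlete = f"{grp}{i}"
        base = 40 + rng.normal(0, 3)  # between-athlete differences
        for trial_no, treat in enumerate(order, start=1):
            y = (base
                 + (treatment if treat == "expt" else 0)
                 + (practice if trial_no == 2 else 0)
                 + rng.normal(0, 0.5))
            rows.append((athlete, treat, f"trial{trial_no}", y))
df = pd.DataFrame(rows, columns=["athlete", "treat", "trial", "jumphgt"])

fit = smf.mixedlm("jumphgt ~ treat * trial", df, groups=df["athlete"]).fit()
print(fit.summary())
# treat[T.expt] estimates the treatment effect, trial[T.trial2] the practice
# effect, and the treat:trial interaction reflects the difference between
# the two groups of subjects (it is a purely between-subject contrast here).
```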