Sunday, January 1, 2017

F**k replication. F**k controls.

Just kidding – high replication and proper controls are the sine
qua non of experimental science, right? Or are they, given that high
replication and perfect controls are sometimes impossible or trade off with
other aspects of inference? The point of this post is that low replication and an
absence of perfect controls can sometimes indicate GOOD science – because the
experiments are conducted in a context where realism is prioritized.

Replication and controls are concepts that are optimized for
laboratory science, where both aspects of experimental design are quite
achievable with relatively low effort – or, at least, low risk. The basic idea
is to have some sort of specific treatment (or treatments) that is (are)
manipulated in a number of replicates but not others (the controls), with all else
being held constant. The difference between the shared response for the
treatment replicates and the shared response (or lack thereof) for the control
replicates is taken as the causal effect of the specific focal manipulation.

However, depending on the question being asked, laboratory
experiments are not very useful because they are extracted from the natural
world, which is – after all – the context we are attempting to make inferences
about. Indeed, I would argue that pretty much any question about ecology and
evolution cannot be adequately (or at least sufficiently) addressed in laboratory
experiments because laboratory settings are too simple and too controlled to be
relevant to the real world.

1. Most laboratory experiments are designed to test for the
effect of a particular treatment while controlling for (eliminating) variation
in potential confounding and correlated factors. But why would we care about
the effect of some treatment abstracted from all other factors that might influence
its effects in the real world? Surely what we actually care about is the effect
of a particular causal factor specifically within the context of all other
uncontrolled – and potentially correlated and confounding – variation in the
real world.

2. Most laboratory experiments use highly artificial
populations that are not at all representative of real populations in nature –
and which should therefore evolve in unrepresentative ways and have
unrepresentative ecological effects (even beyond the unrealistic laboratory “environment”).
For example, many experimental evolution studies start with a single clone,
such that all subsequent evolution must occur through new mutations – but when is
standing genetic variation ever absent in nature? As another example, many
laboratory studies use – quite understandably – laboratory-adapted populations;
yet such populations are clearly not representative of natural populations.

In short, laboratory experiments can tell us quite a bit
about laboratory environments and laboratory populations. So, if that is how an
investigator wants to focus inferences, then everything is fine – and replicates
and controls are just what one wants. I would argue, however, that what we actually
care about in nearly all instances is real populations in real environments.
For these more important inferences, laboratory experiments are manifestly
unsuitable (or at least insufficient) – for all of the reasons described above.
Charitably, one might say that laboratory experiments are “proof of concept.”
Uncharitably, one might say they tend to be “elegantly irrelevant.”

After tweeting a teaser about this upcoming post, I received a number of paper suggestions. I like this set.

To make the inferences we actually care about – real populations
in real environments – we need experiments WITH real populations in real
environments. Such experiments are the only way to draw robust and reliable and
relevant inferences. Here then is the rub: in field experiments, high replication
and/or precise controls can be infeasible or impossible. Here are some examples
from my own work:

1. In the mid 2000s, I trotted a paper around the big weeklies
about how a bimodal (in beak size) population of Darwin’s finches had lost
their bimodality in conjunction with increasing human activities at the main
town on Santa Cruz Island, Galapagos. Here we had, in essence, an experiment
where a bimodal population of finches was subject to increasing human
influences. Reviewers at the weeklies complained that we didn’t have any replicates
of the “experiment.” (We did have a control – a bimodal population in the
absence of human influences.) It was true! We did not have any replicates
simply because no other situation is known where a bimodal population of Darwin’s
finches came into contact with an expanding human population. Based on this
criticism of no replication – despite the fact that replication was both
impossible and irrelevant – our paper was punted from the weeklies. Fortunately, it
did end up in a nice venue (PRSB) – and has since proved quite influential.

Bimodality present prior to the 1970s has since been lost at a site with increasing human influence (AB: the "experiment") but not at a site with low human influence (EG: the "control"). This figure is from my book.

2. More recently, we have been conducting experimental
evolution studies in nature with guppies. In a number of these studies, we have
performed REPLICATE experimental introductions in nature: in one case working
with David Reznick and collaborators to introduce guppies from one
high-predation (HP) source population into several low-predation (LP) environments
that previously lacked guppies. Although several of these studies have been
published, we have received – and continue to receive – what seem to me to be
misguided criticisms. First, we don’t have a true control, which is suggested
to be introducing HP guppies into some guppy-free HP environment. However, few
such environments exist and, when such introductions are attempted (Reznick,
pers. comm.), the guppies invariably go extinct. So, in essence, this HP-to-HP control
is impossible. Second, our studies have focused on only two to four of the
replicate introductions, which has been criticized because N=2 (or N=4) is too low
to make general conclusions about the drivers of evolutionary change. Although
it is certainly true that N=10 would be wonderful, it is simply not possible in
nature owing to the limited availability of suitable introduction sites. Moreover, N=2
(N=1 even) is quite sufficient to infer how those specific populations are
evolving, and, for N>1, whether they are evolving similarly or differently.

Real, yes, but not unlimited.

3. Low numbers of replicate experiments have also been
criticized because too many other factors vary idiosyncratically among our experimental
sites (they are real, after all) to allow general conclusions. The implication
is that we should not be doing such experiments in nature because we can’t
control for other covarying and potentially confounding factors – and because
the large numbers of replicates necessary to statistically account for those
other factors are not possible. I first would argue that the other covarying
and confounding factors are REAL, and we should not be controlling them but
rather embracing their ability to produce realism. Hence, if two replicates
show different responses to the same experimental manipulation, those different
responses are REAL and show that the specific manipulation is NOT generating a
common response when layered onto the real complexities of nature. Certainly,
removing those other factors might yield a common response to the manipulation
but that response would be fake – in essence, artificially increasing an effect
size by reducing the REAL error variance.
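This effect-size point can be sketched with a toy simulation (all numbers here are hypothetical, chosen only for illustration): the raw treatment effect is identical in the "field" and "lab" settings, but stripping away the real among-site variation shrinks the error variance and thereby inflates the standardized effect size.

```python
# A minimal sketch of the argument above: the same true treatment effect
# looks much larger once the REAL, uncontrolled among-site variation is
# eliminated, because the standardized effect size divides the mean
# difference by the error standard deviation. All numbers are hypothetical.
import random
import statistics

random.seed(1)

TREATMENT_EFFECT = 1.0   # true shift caused by the manipulation
SITE_SD_FIELD = 3.0      # idiosyncratic among-site variation in nature
SITE_SD_LAB = 0.3        # the same variation nearly eliminated in the lab


def cohens_d(treated, control):
    """Standardized effect size: mean difference over pooled SD."""
    pooled_sd = ((statistics.stdev(treated) ** 2
                  + statistics.stdev(control) ** 2) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


def simulate(site_sd, n=1000):
    """Same true effect in both venues; only the error variance differs."""
    control = [random.gauss(0.0, site_sd) for _ in range(n)]
    treated = [random.gauss(TREATMENT_EFFECT, site_sd) for _ in range(n)]
    return cohens_d(treated, control)


d_field = simulate(SITE_SD_FIELD)
d_lab = simulate(SITE_SD_LAB)

# Identical raw effect, but the lab's standardized effect size is several
# times larger simply because the real error variance was controlled away.
print(f"field d = {d_field:.2f}, lab d = {d_lab:.2f}")
```

Nothing about the biology changes between the two simulated venues; only the decision to "control" the covarying variation does, which is exactly why the larger lab effect size is, in the sense above, fake.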

@EcoEvoEvoEco lab exp evolution not irrelevant to isolate causes. But agree that the world is all idiosyncratic. Every valley is unique.

For the experiments that matter, replication and controls trade off with
realism – and realism is much more important. A single N=2
uncontrolled field experiment is worth many N=100 lab experiments. A
single N=1 controlled field experiment is worth many different controlled lab
experiments. Authors (and reviewers and editors) should prioritize accordingly.

1. It is certainly true that limited replication and imperfect
controls mean that some inferences are limited. Hence, it is important to summarize
what can and cannot be inferred under such conditions. I will outline some of
these issues in the context of experimental evolution.

2. Even without replication and controls, inferences are not
compromised about evolution in the specific population under study. That is, if
evolution is documented in a particular population, then evolution did occur in
that population in that way in that experiment. Period.

3. With replication (let’s say N=2 experiments), inferences are
not compromised about similarities and differences in evolution in the two
experiments. That is, if evolution is similar in two experiments, it is
similar. Period. If evolution is different in two experiments, it is different.
Period.

4. What is more difficult is making inferences about specific
causality: that is, was the planned manipulation the specific cause of the
evolution observed, or was a particular confounding factor the specific cause
of the difference between two replicates? Despite these limitations, an
investigator can still make several inferences. Most importantly, if evolution occurs
differently in two replicates subject to the same manipulation (predation or
parasitism or whatever), then that manipulation does NOT have a universal
overriding effect on evolutionary trajectories in nature. Indeed, experiment-specific
outcomes are a common finding in our studies: despite a massive shared shift in
a particular set of environmental conditions, replicate populations can
sometimes respond in quite different ways. This outcome shows that context is
very important and, thereby, highlights the insufficiency of laboratory studies
that reduce or eliminate context-dependence and, critically, its idiosyncratic
variation among populations. Ways to improve causal inferences in such cases are
to use “virtual controls,” which amount to clear a priori expectations about ecological and evolutionary effects of
a given manipulation, and/or "historical replicates," which can come from other
experimental manipulations done by other authors in other studies. Of course,
such alternative methods are still attended by caveats that need to be made
clear.

I argue that ecological and evolutionary inferences require experiments
with actual populations in nature, which should be prioritized at all levels of
the scientific process even if replication is low and controls are imperfect.
Of course, I am not arguing for sloppy science – such experiments should still
be designed and implemented in the best possible manner. Yet only experiments
of this sort can tell us how the real world works. F**k replication and f**k
controls if they get in the way of the search for truth.

Additional points:

1. I am not the first frustrated author to make these types of
arguments. Perhaps the most famous defense of unreplicated field experiments was that by Stephen Carpenter in the context of whole-lake manipulations. Carpenter also argued that mesocosms were not very helpful for understanding large-scale phenomena.

2. Laboratory experiments are obviously useful for some things,
especially physiological studies that ask, for example, how do temperature and
food influence metabolism in animals and how do light and nutrients influence
plant growth. Even here, however, those influences are likely context
dependent and could very well differ in the complex natural world. Similarly,
laboratory studies are useful for asking questions such as “If I start with a
particular genetic background and impose a particular selective condition under
a particular set of otherwise controlled conditions, how will evolution
proceed?” Yet those studies must recognize that the results are going to be
irrelevant outside of that particular genetic background and that particular selective
condition under that particular set of controlled conditions.

3. Skelly and Kiesecker (2001 – Oikos) have an interesting paper
where they compare and contrast effect sizes and sample sizes in different “venues”
(lab, mesocosms, enclosures in nature) testing for effects of competition on
tadpole growth. They report that the different venues yielded quite different experimental
outcomes, supporting my points above that lab experiments don’t tell us much
about nature. They also report that replication did not decrease from the lab
to the more realistic venues – but the sorts of experiments reviewed are not
the same sort of real-population real-environment experiments described above,
where trade-offs are inevitable.

From Skelly and Kiesecker (2001 - Oikos).

4. Speaking of mesocosms (e.g., cattle tanks or bags in lakes),
perhaps they are the optimal compromise between the lab and nature, allowing
for lots of replication and for controls in realistic settings. Perhaps.
Perhaps not. It will all depend on the specific organisms, treatments,
environments, and inferences. The video below is an introduction to the cool new mesocosm array at McGill.

5. Some field experimental evolution studies can have nice
replication, such as the islands used for Anolis lizard experiments. However,
unless we want all inferences to come from these few systems, we need to also
work in other contexts, where replication and controls are harder (or
impossible).

6. Some investigators might read this blog and think “What the
hell, Hendry just rejected me because I lacked appropriate controls in my field
experiment?” Indeed, I do sometimes criticize field studies for the lack of a
control (or replication) but that is because the inferences attempted by the
authors do not match the inferences possible from the study design. For
instance, inferring a particular causal effect often requires replication and
controls – as noted above.

2 comments:

Interesting and important post, Andrew. As an experimental biologist who generally works in the field I'm often asked by lab biologists questions along the line of 'Why can't you do that in the lab?'. Given that lab experiments are obviously useful under some circumstances, it would be good to have a 'decision tree' schematic to outline the conditions under which a lab/mesocosm/field experiment is ideal.

There are a number of cases where a field experiment would likely be 'overkill'. And as you say there are many more cases where the abstractions of the lab are far too removed from nature. There's a real cost of field studies though, because data collection is slower and the possibility of failure is higher.

When you say ... "the results are going to be irrelevant outside of that particular genetic background and that particular selective condition under that particular set of controlled conditions.", somebody could say the same about field experiments. For example, if I do an experiment about the evolution of defenses in plants, your critique could be applied to my chosen genetic background (i.e., white clover), particular selective condition (i.e., herbivores in Toronto), and controlled conditions (i.e., plants in a 1x1 m array). While my study was a powerful test of my specific hypothesis, is it really more generally relevant just because it was conducted outside? Or is it only relevant for clover, in Toronto, in 1x1 m arrays?

I plan to keep working outside, but I think we ought to recognize situations where field experiments are unnecessarily complicated and results from the lab may hold up generally.

Oksanen, L., 2001. Logic of experiments in ecology: is pseudoreplication a pseudoissue? Oikos 94, 27–38.

I've always been fond of Alternative #3 here - in some cases, for experiments of realistic scale when replicating a treatment is prohibitive, replicate the heck out of your controls to establish a reference distribution.

and

Ruesink, J., 2000. Intertidal mesograzers in field microcosms: linking laboratory feeding rates to community dynamics. J. Exp. Mar. Biol. Ecol. 248, 163–176.

A really nice study that shows the difference in grazing rates/effect sizes with the same species in the lab versus field.