Thursday, January 3, 2013

Breeding a Behemoth

In the current discussion about the methodological crisis in psychology, there are calls to (1) include exact self-replications in papers, (2)
provide chronological rather than plot-driven accounts of your research, and
(3) publish null results. This makes a lot of sense, but what happens when you
heed these calls? You might end up with a 14-experiment Behemoth!

That’s at least what’s happened to us with a paper we’re
writing right now. It is on the memory effects of direct versus indirect speech.
I’ll talk about the contents in a later post. Here I’m talking about the sheer size of the beast. How did we manage to breed our Behemoth?

We started out with a hypothesis (direct speech is
“livelier” than indirect speech and should therefore lead to better memory
representations) that seemed highly plausible given the literature. We found the exact opposite of
what we predicted—a convincing effect, even by Bayesian standards. We tried to replicate it. Same pattern. So now we had 2 experiments but no cigar.

We thought the problem might be with the memory probe that
we used. So we used a different one. Again we found the opposite of what you
would predict. This finding replicated as well. So that’s 4 experiments and
still no cigar.

Then we thought that it had to do with the placement of our
memory probe. So we placed it at a later time. No, that didn’t do the trick.
Still the opposite of what we had originally predicted. We replicated this.
So that’s 6 experiments (are you counting along?).

Then one of us thought it had to do with the fact that we
used visual probes, where auditory ones might be more appropriate given certain
findings in the literature. So we used auditory probes. First we used pure
tones. No dice. A resounding null effect, which replicated. That’s 8. Enough, right?

No, because then we thought it was actually stupid to use tones, and so
we used words instead (the words “left”
and “right”). Again null effects, which
replicated. So that’s 10.

Then we thought it was stupid after all to use words like “left” and “right” that were unrelated to the target sentences, so we used words from the sentences instead. Again the results knocked our prediction right out of the park: two big fat null effects. So that’s 12. (If this gets boring to
you, imagine how we felt.)

We thought we couldn’t finish on this note. It would be like
a band closing a concert with a drum solo (or, for Spinal Tap fans, with two dancing dwarves and a miniature Stonehenge). We were looking for a grand
finale. Our final hypothesis was that direct speech leads to better memory for
the exact wording of a sentence than indirect speech. Drum roll... Yes! A convincing effect.
And it even replicated! And that’s 14.

So now we’re writing this Behemoth of a paper and then we’re
going to try to find a home for it [update: the paper was published in 2013]. I actually think it will be a highly
informative and interesting paper. Our null results are meaningful because our
experiments were high-powered and we used Bayesian statistics, which enabled us to quantify the strength of evidence for the null. And our other results
are partly counterintuitive and partly as expected; and taken together, they
present a coherent picture. Moreover, we have great confidence in the results
because of their high power and reproducibility.

But I’m still wondering what the paper would have looked
like under the old regime.

It might have
looked like this. We wouldn’t have run the exact replications, so that
leaves us with 7 experiments. We would have started off the paper with one of
the last two experiments (of course, we would actually have run only one of them). A nice
effect that is consistent with the literature. Then we would have reported one
of the first two experiments (again, the one we ran). Hey, interesting, a
counterintuitive effect! Then we would have reported the third experiment,
which would perhaps be presented as a conceptual replication of the first
experiment. We probably would have also included one of the experiments with the longer time interval.

And that’s it! We wouldn’t have reported the null
effects. Instead of 14 experiments, we'd be down to a healthy 4. Instead of the Behemoth, we’d have a sleek foal, which would probably
be a lot easier to sell and would make us look like considerably more competent breeders.

So that’s maybe what the new psychology will look like: a
collection of large beasts lumbering around in the field instead of a herd of
happily prancing foals. But at least the beasts have their feet planted firmly
on the ground.

Update

Daniel Lakens (@lakens) suggests via Twitter that the paper is actually not 14 experiments but 7 because the replications can be summarized in a table. According to him, the paper is a warhorse. I agree that the metaphor is more apt.