Many Neuroscience Studies May Be Based on Bad Statistics

Image: Wellcome Library, London

The fields of psychology and cognitive neuroscience have had some rough sledding in recent years. The bumps have come from high-profile fraudsters, concerns about findings that can’t be replicated, and criticism from within the scientific ranks about shoddy statistics. A new study adds to these woes, suggesting that a wide range of neuroscience studies lack the statistical power to back up their findings.

This problem isn’t just academic. The authors argue that there are real-world consequences, from wasting the lives of lab animals and squandering public funding on unreliable studies, to potentially stopping clinical trials with human patients prematurely (or not stopping them soon enough).

“This paper should help by revealing exactly how bad things have gotten,” said Hal Pashler, a psychologist at the University of California, San Diego. Pashler was not involved with the new study, but he and colleagues have previously raised concerns about statistical problems with fMRI brain scan studies in human subjects.

The aim of the new study wasn’t to rake neuroscientists over the coals, but to get them talking about how to change the culture and the incentives that promote statistically unreliable studies, says co-author Marcus Munafò, a psychologist at the University of Bristol, United Kingdom. “We’re really trying to be constructive about this.”

Statistical power is essentially the probability that a study will detect an effect of a given size if the effect is really there. For a given significance threshold, it depends on two things: the sample size (the number of people in a study, for example) and the effect size (such as a difference in brain volume between healthy people and Alzheimer’s patients). The more people in the study and the bigger the size of the effect, the higher the statistical power.

Low statistical power is bad news. Underpowered studies are more likely to miss genuine effects, and among the statistically significant results they do produce, a higher proportion are likely to be false positives — that is, effects that reach statistical significance even though they are not real.

Many researchers consider a statistical power of 80 percent to be a desirable goal in designing a study. At that level, if an effect of a particular size were genuine, the study would detect it 80 percent of the time.

But roughly half of the neuroscience studies Munafò and colleagues included in their analysis had a statistical power below 20 percent. Those studies would fail to detect a genuine effect at least 80 percent of the time.
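The relationship between effect size, sample size, and power can be sketched with a back-of-the-envelope calculation. The snippet below is illustrative only — it uses the standard normal approximation for a two-sided, two-sample test, not the methods of Munafò and colleagues — but it shows how a small sample combined with a small effect yields power in the single digits, close to the 8 percent median the authors report for neuroimaging studies:

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_ppf(p):
    """Inverse of normal_cdf, computed by bisection (good enough for a sketch)."""
    lo, hi = -10.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: standardized difference in means (Cohen's d)
    n_per_group: number of subjects in each of the two groups
    alpha: significance threshold
    """
    z_crit = normal_ppf(1.0 - alpha / 2.0)
    # Probability the test statistic clears the critical value,
    # given the true effect (ignoring the negligible far tail).
    return normal_cdf(effect_size * math.sqrt(n_per_group / 2.0) - z_crit)

# A medium effect (d = 0.5) with 64 subjects per group: ~80% power.
print(round(power_two_sample(0.5, 64), 2))
# The same effect with only 20 per group: ~35% power.
print(round(power_two_sample(0.5, 20), 2))
# A small effect (d = 0.2) with 20 per group: under 10% power.
print(round(power_two_sample(0.2, 20), 2))
```

The last case illustrates the article's point: with a small effect and a small sample, a study will detect a genuine effect less than one time in ten, yet such designs were the norm in the analyzed literature.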

The raw material for the study was 49 meta-analyses, or studies that analyze data from other studies — 730 individual neuroscience studies in this case — published in 2011. The team concludes that most of the reported findings may not be reliable.

For human neuroimaging studies, the median statistical power was just 8 percent, meaning that half the studies were below this mark and half were above. In two different types of animal studies typically used to study memory, median statistical power was 18 percent and 31 percent, respectively, the team reported last week in Nature Reviews Neuroscience, which has made the paper open access for one week, starting today.

“It was already clear that fMRI studies were almost always very underpowered, but this paper shows that just about everything except a set of studies described as ‘neurological’ are also underpowered,” Pashler said.

It’s not the first time researchers have raised concerns about underpowered studies in neuroscience, says Russ Poldrack, a cognitive neuroscientist at the University of Texas, Austin, who was not involved with the study. “But it’s a more formal evaluation of just how badly underpowered these studies are,” he said. “Unfortunately, there’s still a good number of people who have their heads in the sand about these issues and want to pretend they’re not problems.”

Poldrack agrees that the work raises ethical concerns. “In animal research if you’re doing underpowered studies, that could be viewed as unethical because it’s more likely that you’ll find nothing, which might suggest that the animals were sacrificed needlessly,” Poldrack said. There are ethical considerations in human studies too. “fMRI is very low risk, but if you’re doing underpowered studies you’re not treating people with the respect they deserve as research subjects.”

So why are underpowered studies so prevalent?

One factor is cost. Many researchers are squeezed for funding, and running smaller studies is one way to stretch a research grant.

Another factor is the pressure on scientists to publish often, preferably in high-profile journals, to advance their careers and win funding from the government. “In many cases we’re more incentivized to be productive than to be right,” Munafò says.

He believes neuroscientists can take a cue from researchers in genetics and other fields who’ve combated problems with underpowered studies by creating ways for scientists to pool their data. The OpenfMRI project led by Poldrack is one example of an effort to do this in neuroscience.

Giving scientists an incentive and making it easier to replicate each other’s findings — generally considered a distinctly unglamorous pursuit — is another approach to increasing the collective statistical power of a body of research, Munafò and colleagues suggest. Two efforts to do this in psychology, the Open Science Framework and the related Reproducibility Project, were launched recently by Munafò’s co-author Brian Nosek of the University of Virginia.

In Poldrack’s view, the most important remedy is convincing scientists to do bigger studies, which will almost certainly mean fewer studies. “What it comes down to is, is it worth doing these things right?”