A Bug in fMRI Software Could Invalidate 15 Years of Brain Research

Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates, by neuroscientists Anders Eklund, Tom Nichols, and Hans Knutsson, published in the Proceedings of the National Academy of Sciences (2016), reveals a major flaw in the software neuroscientists use in functional magnetic resonance imaging (fMRI) studies. The software used to analyze fMRI brain activations associated with tasks, activities, and thoughts was found to deliver inflated false-positive rates.

As Dr. Anders Eklund and his co-authors reported: “We found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.” They added: “It is not feasible to redo 40,000 fMRI studies.”

They noted that “Despite the popularity of fMRI as a tool for studying brain function, the statistical methods used have rarely been validated using real data.”

As science reporter Bec Crew explained in Science Alert: “Because functional magnetic resonance imaging (fMRI) is one of the best tools we have to measure brain activity, and if it’s flawed, it means all those conclusions about what our brains look like during things like exercise, gaming, love, and drug addiction are wrong.”

“The fact is that when scientists are interpreting data from an fMRI machine, they’re not looking at the actual brain. As Richard Chirgwin reports for The Register, what they’re looking at is an image of the brain divided into tiny ‘voxels’, then interpreted by a computer program.

“When you see a claim that ‘Scientists know when you’re about to move an arm: these images prove it,’ they’re interpreting what they’re told by the statistical software.”

“To test how good this software actually is, Eklund and his team gathered resting-state fMRI data from 499 healthy people sourced from databases around the world, split them up into groups of 20, and measured them against each other to get 3 million random comparisons.

“They tested the three most popular software packages for fMRI analysis – SPM, FSL, and AFNI – and while they shouldn’t have found much difference across the groups, the software resulted in false-positive rates of up to 70 percent.”

“The bad news here is that one of the bugs the team identified has been in the system for the past 15 years, which explains why so many papers could now be affected.

“The bug was corrected in May 2015, at the time the researchers started writing up their paper, but the fact that it remained undetected for over a decade shows just how easy it was for something like this to happen, because researchers just haven’t had reliable methods for validating fMRI results.” (Read more: “A Bug in fMRI Software Could Invalidate 15 Years of Brain Research,” by Bec Crew, Science Alert, July 2016)
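The validation procedure the quoted passages describe can be illustrated with a toy simulation (a sketch only: it uses made-up Gaussian data rather than real resting-state scans, and the group size of 20 is the only parameter taken from the description above). Comparing groups drawn from the same null distribution should yield “significant” differences at roughly the nominal 5 percent rate; any systematic excess over that rate is the signature of a broken statistical method.

```python
# Toy null-comparison simulation (not the authors' actual pipeline):
# repeatedly compare two groups drawn from the SAME distribution and
# count how often a test falsely reports a "difference" at alpha = 0.05.
import math
import random

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

random.seed(0)
n_comparisons, group_size = 2000, 20
t_crit = 2.02  # approximate two-sided 5% critical value for ~38 df

false_positives = 0
for _ in range(n_comparisons):
    group_a = [random.gauss(0, 1) for _ in range(group_size)]
    group_b = [random.gauss(0, 1) for _ in range(group_size)]
    if abs(welch_t(group_a, group_b)) > t_crit:
        false_positives += 1

rate = false_positives / n_comparisons
print(f"Empirical false-positive rate: {rate:.3f}")  # should hover near 0.05
```

A correctly calibrated method lands near 5 percent in this kind of experiment; Eklund’s team found that the cluster-level inferences in the standard packages could exceed that nominal rate many times over on real resting-state data.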

This is a devastating finding. At the very least, it raises serious doubts about the validity of 15 years of brain research that relied on brain activation mapping studies. It also raises doubts about the validity of fMRI in the field of lie detection, where it has been used extensively.

The finding highlights the importance of scientific data sharing, a subject of heated dispute. On one side are powerful commercial medical interests whose objective is protecting profits, a camp led by the editor-in-chief of The New England Journal of Medicine. On the other side of the debate are a growing number of honest medical scientists for whom scientific integrity and the advancement of medicine, rather than corporate interests, are paramount. As Dr. Eklund noted:

“If we don’t have access to the data, we cannot say if studies are wrong,” said Anders Eklund, an associate professor at Linkoping University in Sweden and a co-author of the study that found the fM.R.I. software bug. “Finding errors is how scientific fields evolve. This is how science gets done.”

The New York Times reported on the fallout on August 27, 2016; the article is reproduced below.

Neuroscientists have a long record of touting and embracing new technologies before those technologies have been adequately tested for shortcomings, risks, or pitfalls. They have almost invariably overstated the scientific significance and clinical value of the tools they rushed to utilize, while poorly understanding how those tools actually performed. And when the data were put to an objective test by independent scientists, the claimed findings could not be verified, as was demonstrated in 2015 when the results of 100 psychological studies could not be replicated. (Read “Psychology’s Replication Crisis Can’t Be Wished Away,” by Ed Yong, The Atlantic, March 2016)

“An uncritical use of new imaging technology may open the door to a new kind of old fashioned phrenology, i.e., looking at specific areas only and neglecting the interconnectivity of a neuronal network.” (Bao & Pöppel, 2012, p. 2144)

Furthermore, when software performs millions of calculations and inferences over tiny parts of the brain, there are numerous opportunities for error, and even for absurd conclusions. This was made glaringly apparent in 2009, when a graduate student conducted an fM.R.I. scan of a dead salmon and found neural activity in its brain when it was shown photographs of humans interacting in social situations. Repeat: the fMRI brain scan of a DEAD salmon showed “neural brain activity” when the salmon was shown photos of humans interacting.

In 2012, an article in The New Yorker focused on fMRIs as a scientific tool for detecting lies. Indeed, fMRIs have a commercial market as high-tech “truth verifiers” that are claimed to work better than polygraphs. What really drives the business is that they are regarded as scientific because, unlike polygraphs, fMRIs are administered by licensed medical professionals.

Dr. Steven Hyman, former director of the National Institute of Mental Health, later provost of Harvard, and currently director of the Stanley Center for Psychiatric Research at the Broad Institute of MIT and Harvard, sees through the bogus science that relies on the artifice of imagery in lieu of scientific methods for arriving at genuine scientific evidence. He is quoted stating:

“The published data on the use of fMRI for lie detection uses highly artificial tests, which are not even convincing models of lying, in very structured laboratory settings. There are no convincing data that they could be used accurately to screen a person in the real world.” But, in the end, that might not matter. “Pseudo-colored pictures of a person’s brain lighting up are undoubtedly more persuasive than a pattern of squiggles produced by a polygraph,” he said. “That could be a big problem if the goal is to get to the truth.”

In 2005, we commented on an article in The New York Times which suggested that imaging technology had been oversold as a psychiatric diagnostic tool:

“Contrary to ALL claims by organized psychiatry and its leading researchers, brain scans DO NOT prove that mental disorders are biological in nature, nor have brain scans been able to detect differences between the brains of normal individuals and those who have a mental illness. The psychiatric establishment, with help from its benefactors in the drug industry, disseminated false claims about scientific advances. Psychiatry’s leadership and lobbyists made false claims about PET scans when they lobbied Congress for money.” (Read more at AHRP: “Psychiatry’s Claims Re: Brain Imaging Have Far Outpaced Science”)

The New York Times article follows:

We’ve all seen them, those colorful images that show how our brains “light up” when we’re in love, playing a video game, craving chocolate, etc. Created using functional magnetic resonance imaging, or fM.R.I., these pictures are the basis of tens of thousands of scientific papers, the backdrop to TED talks and supporting evidence in best-selling books that tell us how to maintain healthy relationships, make decisions, market products and lose weight.

But a study published last month in the Proceedings of the National Academy of Sciences uncovered flaws in the software researchers rely on to analyze fM.R.I. data. The glitch can cause false positives — suggesting brain activity where there is none — up to 70 percent of the time.

This cued a chorus of “I told you so!” from critics who have long said fM.R.I. is nothing more than high-tech phrenology. Brain-imaging researchers protested that the software problems were not as bad nor as widespread as the study suggested. The dust-up has caused considerable angst in the fM.R.I. community, about not only the reliability of their pretty pictures but also how limited funding and the pressure to publish splashy results might have allowed such a mistake to go unnoticed for so long. The remedial measures some in the field are now proposing could be a model for the wider scientific community, which, despite breathtaking technological advances, often produces findings that don’t hold up over time.

“We have entered an era where the kinds of data and the analyses that people run have gotten incredibly complicated,” said Martin Sereno, the chairman of the cognitive neuroimaging department at the University of California, San Diego. “So you have researchers using sophisticated software programs that they probably don’t understand but are generally accepted and everyone uses.”

Developed in the 1990s, fM.R.I. creates images based on the differential effects a strong magnetic field has on brain tissue. The scans occur at a rate of about one per second, and software divides each scan into around 200,000 voxels — cube-shaped pixels — each containing about a million brain cells. The software then infers neural activity within voxels or clusters of voxels, based on detected blood flow (the areas that “light up”). Comparisons are made between voxels of a resting brain and voxels of a brain that is doing something like, say, looking at a picture of Hillary Clinton, to try to deduce what the subject might be thinking or feeling depending on which area of the brain is activated.
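In code, the per-voxel comparison the article describes looks roughly like the following sketch. Every number in it (the voxel count, the signal values, the change threshold, and which voxels are “truly active”) is invented for illustration; real analysis packages fit a statistical model per voxel rather than taking a raw difference.

```python
# Toy per-voxel comparison between a "resting" and a "task" scan.
import random

random.seed(1)
n_voxels = 1_000  # real scans are divided into roughly 200,000 voxels

# Simulated blood-flow signal per voxel, at rest and during a task.
rest = [random.gauss(100, 5) for _ in range(n_voxels)]
task = [random.gauss(100, 5) for _ in range(n_voxels)]

# Pretend voxels 40-49 genuinely "light up" during the task.
for v in range(40, 50):
    task[v] += 25

# Naive rule: flag any voxel whose signal rose by more than a cutoff.
threshold = 15  # illustrative cutoff, not any real package's default
active = [v for v in range(n_voxels) if task[v] - rest[v] > threshold]
print(f"{len(active)} voxels flagged; first few: {active[:8]}")
```

Depending on the random draw, the flagged list will typically include most of the truly active voxels, but also scattered voxels elsewhere that crossed the cutoff by noise alone; that chance excess is the multiple-comparisons problem the article turns to next.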

But when you divide the brain into bitty bits and make millions of calculations according to a bunch of inferences, there are abundant opportunities for error, particularly when you are relying on software to do much of the work. This was made glaringly apparent back in 2009, when a graduate student conducted an fM.R.I. scan of a dead salmon and found neural activity in its brain when it was shown photographs of humans in social situations. Again, it was a salmon. And it was dead.

This is not to say all fM.R.I. research is hooey. But it does indicate that methods matter even when using whiz-bang technology. In the case of the dead salmon, what was needed was to statistically correct for false positives that arise when you make so many comparisons between voxels.

Think of the brain as a jam-packed sports arena and the voxels as all the fans. If you ask everyone in the stadium a bunch of questions, you might, by chance, see a pattern emerge, such as a cluster of people standing in line for the bathroom who love pistachio ice cream and skipped a grade in school. You need to statistically account for the possibility of coincidence before drawing any conclusions about ice cream, intellect and bladder control, just as you would for areas in the brain that light up or don’t light up in response to stimuli.

The authors of the paper on the software glitch found that a vast majority of published papers in the field do not make this “multiple comparison” correction. But when they do, they said, the most widely used fM.R.I. data analysis software often doesn’t do it adequately.
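What a multiple-comparison correction accomplishes can be sketched with a simple Bonferroni bound, used here as a stand-in for the more sophisticated cluster-level corrections the paper actually evaluated (the voxel count and alpha level are illustrative assumptions): with 10,000 truly inactive “voxels,” an uncorrected 5 percent threshold flags roughly 500 of them by chance alone, while the corrected threshold flags essentially none.

```python
# Null simulation: every voxel is pure noise, so any flagged voxel
# is by construction a false positive.
import random
from statistics import NormalDist

random.seed(2)
n_voxels, alpha = 10_000, 0.05
z_scores = [random.gauss(0, 1) for _ in range(n_voxels)]

# Two-sided critical values: per-voxel (uncorrected) vs. Bonferroni.
uncorrected_crit = NormalDist().inv_cdf(1 - alpha / 2)              # about 1.96
bonferroni_crit = NormalDist().inv_cdf(1 - alpha / (2 * n_voxels))  # about 4.56

uncorrected_hits = sum(abs(z) > uncorrected_crit for z in z_scores)
corrected_hits = sum(abs(z) > bonferroni_crit for z in z_scores)
print(f"uncorrected false positives: {uncorrected_hits}")  # roughly 5% of 10,000
print(f"Bonferroni-corrected:        {corrected_hits}")    # near zero
```

Bonferroni is deliberately crude; the controversy in the paper concerns subtler cluster-extent corrections whose assumptions turned out not to hold on real data, but the sketch shows why skipping the correction entirely guarantees spurious “activations.”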

Other statistical problems in analyzing fM.R.I. data have been pointed out. But these kinds of finger-wagging methodological critiques aren’t easily published, much less funded. And on the rare occasions they do make it into journals, they don’t grab headlines as much as studies that show you what your brain looks like when you believe in God.

“There is an immense amount of flexibility in how anybody is going to analyze data,” said Russell Poldrack, who leads a cognitive neuroscience lab at Stanford University and is a co-author of the “Handbook of functional MRI Data Analysis.” And, he continued, “some choices make a bigger difference than others in the integrity of your results.”

To try to create some consistency and enhance credibility, he and other leaders in the field recently published a lengthy report titled “Best Practices in Data Analysis and Sharing in Neuroimaging Using MRI.” They said their intent was to increase transparency through comprehensive sharing of data, research methods and final results so that other investigators could “reproduce findings with the same data, better interrogate the methodology used and, ultimately, make best use of research funding by allowing reuse of data.”

The shocker is that transparency and reproducibility aren’t already required, given that we’re talking about often publicly funded, peer-reviewed, published research. And it’s much the same in other scientific disciplines.

Indeed, a study published last year in the journal Science found that researchers could replicate only 39 percent of 100 studies appearing in three high-ranking psychology journals. Research similarly hasn’t held up in genetics, nutrition, physics and oncology. The fM.R.I. errors added fuel to what many are calling a reproducibility crisis.

“People feel they are giving up a competitive advantage” if they share data and detail their analyses, said Jean-Baptiste Poline, senior research scientist at the University of California, Berkeley’s Brain Imaging Center. “Even if their work is funded by the government, they see it as their data. This is the wrong attitude because it should be for the benefit of society and the research community.”

There is also resistance because, of course, nobody likes to be proved wrong. Witness the blowback against those who ventured to point out irregularities in psychology research, dismissed by some as the “replication police” and “shameless little bullies.”

Nevertheless, the fM.R.I. community seems determined to be an exemplar. The next issue of the journal NeuroImage: Clinical will lead with an editorial announcing that it will no longer publish research that has not been corrected for multiple comparisons, and there is a push for other journals to do the same, as well as to require authors to make publicly available their data sets and analyses. Data-sharing platforms such as OpenfMRI and Neurovault have already been established to make fM.R.I. data and statistical methods more widely accessible. In fact, it was data sharing that revealed the fM.R.I. software glitch.

Data repositories have been established in other branches of science, too. Many are supported by the National Institutes of Health, which now requires researchers who receive $500,000 or more in federal funding (as well as those doing large-scale genomic research regardless of funding level) to have a data-sharing plan, although there remain some loopholes as well as limits on who can access the data.

“If we don’t have access to the data, we cannot say if studies are wrong,” said Anders Eklund, an associate professor at Linkoping University in Sweden and a co-author of the study that found the fM.R.I. software bug. “Finding errors is how scientific fields evolve. This is how science gets done.”