Thursday, 16 January 2014

Tough love for fMRI: questions and possible solutions

Let me get this out of the way at the beginning so I
don’t come across as a total curmudgeon. I think fMRI is great. My lab uses
it. We have grants that include it. We publish papers about it. We combine it
with TMS, and we’ve worked on methods to make that combination better. It’s the
most spatially precise technique for localizing neural function in healthy
humans. The physics (and sheer ingenuity) that makes fMRI possible is
astonishing.

There is a lot we can do. We got ourselves into this
mess. Only we can get ourselves out. But it will require concerted effort and
determination from researchers and the positioning of key incentives by
journals and funders.

The tl;dr version of my proposed solutions: work in
larger research teams to tackle bigger questions, raise the profile of a
priori statistical power, pre-register study protocols and offer
journal-based pre-registration formats, stop judging the merit of science by
the journal brand, and mandate sharing of data and materials.

Problem 1: Expense. The technique
is expensive compared to other methods. In the UK it costs about £500 per hour
of scanner time, sometimes even more.

Solution in brief: Work in
larger research teams to divide the cost.

Solution in detail: It’s hard to
make the technique cheaper. The real solution is to
think big. What do other sciences do when working with expensive techniques?
They group together and tackle big questions. Cognitive neuroscience is
littered with petty fiefdoms doing one small study after another – making
small, noisy advances. The IMAGEN fMRI consortium is a beautiful example
of how things could be if we worked together.

Problem 2: Low statistical power. Evidence from structural brain imaging implies that most fMRI studies have insufficient sample sizes.

Solution in brief: Again, work
in larger teams, combining data across centres to furnish large sample sizes. We need to get serious about
statistical power, taking some of the energy that goes into methods development
and channeling it into developing a priori power analysis techniques.

Solution in detail: Anyone who
uses null hypothesis significance testing (NHST) needs to care about
statistical power. Yet if we take psychology and cognitive neuroscience as a
whole, how many studies motivate their sample size according to a priori
power analysis? Very few, and you could count the number of basic fMRI studies that do
this on the head of a pin. There seem to be two reasons why fMRI researchers
don’t care about power. The first is cultural: to get published, the most
important thing is for authors to push a corrected p value below .05.
With enough data mining, statistical significance is guaranteed (regardless of truth), so why would a career-minded scientist bother
about power? The second is technical: there are so many moving parts to an fMRI
experiment, and so many little differences in the way different scanners operate, that power analysis itself is very challenging. But
think about it this way: if these problems make power analysis difficult then
they necessarily make the interpretation of p values just as difficult.
Yet the fMRI community happily embraces this double standard because it is p<.05,
not power, that gets you published.
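For concreteness, here is what a minimal a priori power calculation looks like. This is a sketch, not an fMRI-specific model: it uses the standard normal approximation to a two-sided, two-sample t-test, and the effect size (d = 0.5) and target power (80%) are illustrative assumptions, not recommendations.

```python
import math
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test
    (normal approximation; accurate for moderate-to-large n)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Noncentrality of the test statistic under true effect size d
    ncp = d * math.sqrt(n_per_group / 2)
    return nd.cdf(ncp - z_crit)

def required_n(d, target_power=0.80, alpha=0.05):
    """Smallest per-group n reaching the target power."""
    n = 2
    while approx_power(d, n, alpha) < target_power:
        n += 1
    return n

# A 'medium' effect (d = 0.5) needs roughly 63 subjects per group
# for 80% power under this approximation (the exact t-based answer
# is 64); many fMRI studies scan 15-20 subjects in total.
print(required_n(0.5))   # 63
```

Even this crude back-of-envelope version makes the point: typical fMRI sample sizes fall far short of what conventional effect sizes demand.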

Problem 3: Researcher ‘degrees of freedom’.
Even the simplest fMRI experiment will involve dozens of analytic options, each of which could be considered legal and justifiable. These researcher degrees of freedom provide an ambiguous decision space in which analysts can try different approaches and see what “works” best in
producing results that are attractive, statistically significant, or fit
with prior expectations. Typically only the outcome that "worked" is then published. Exploiting these degrees of freedom also enables
researchers to present “hypotheses” derived from the data as though they were a
priori, a questionable practice known as HARKing. It’s ironic that the fMRI community
has put so much effort into developing methods that correct for multiple
comparisons while completely ignoring the inflation of Type I error caused by
undisclosed analytic flexibility. It’s the same problem in different form.
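The scale of this inflation is easy to demonstrate with a toy simulation. This is not an fMRI analysis; it simply models each "justifiable" pipeline variant as an independent test on pure noise (an idealisation, since real pipeline variants are correlated) and asks how often at least one variant comes out "significant".

```python
import random
from statistics import NormalDist

def flexible_fpr(n_variants, n_experiments=20000, alpha=0.05, seed=42):
    """Fraction of null 'experiments' yielding at least one p < alpha
    when the analyst may report the best of n_variants analyses."""
    rng = random.Random(seed)
    nd = NormalDist()
    false_positives = 0
    for _ in range(n_experiments):
        # Two-sided p values for n_variants tests on pure noise
        best_p = min(
            2 * (1 - nd.cdf(abs(rng.gauss(0, 1))))
            for _ in range(n_variants)
        )
        if best_p < alpha:
            false_positives += 1
    return false_positives / n_experiments

print(flexible_fpr(1))    # ~0.05: the nominal error rate
print(flexible_fpr(10))   # ~0.40: ten 'legal' pipelines, best one reported
```

With ten variants the false positive rate approaches 1 - 0.95**10 ≈ 0.40, an inflation of the same order as the uncorrected voxelwise comparisons the field already takes seriously.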

Solution in brief: Pre-registration
of research protocols so that readers can distinguish hypothesis testing from
hypothesis generation, and thus confirmation from exploration.

Solution in detail: By
pre-specifying our hypotheses and analysis protocol we protect the outcome of
experiments from our own bias. It’s a delusion to pretend that we aren’t
biased, that each of us is somehow a paragon of objectivity and integrity. That
is self-serving nonsense. To incentivize pre-registration, all journals should
offer pre-registered article formats, such as Registered Reports at Cortex. This includes prominent journals like Nature and Science, which have a vital role to play
in driving better science. At a minimum, fMRI researchers should be encouraged to pre-register their designs on the Open Science Framework. It’s not hard to do.
Here’s an fMRI
pre-registration from our group.

Arguments for pre-registration should not be seen as arguments against exploration in
science – instead they are a call for researchers to care more about the
distinction between hypothesis testing (confirmation) and hypothesis generation
(exploration). And to those critics who object to pre-registration, please
don’t try to tell me that fMRI is necessarily “exploratory” and “observational”
and that “science needs to be free, dude” while in the same breath submitting
papers that state hypotheses or present p values. You can't have it both ways.

Problem 4: Pressure to publish. In
our increasingly chickens-go-in-pies-come-out culture of academia,
“productivity” is crucial. What exactly that means or why it should be
important in science isn’t clear – far less proven. Peter Higgs made one of the
most important discoveries in physics yet would have been marked as unproductive and sacked in the current system.
As long as we value the quantity of science that academics produce we will
necessarily devalue quality. It’s a see-saw. This problem is compounded in fMRI
because of the problems above: it’s expensive, the studies are underpowered,
and researchers face enormous pressure to convert experiments into positive,
publishable results. This can only encourage questionable practices and fraud.

Problem 5: Lack of data sharing.
fMRI research is shrouded in secrecy. Data sharing is unusual, and the rare
cases where it does happen are often made useless by researchers carelessly
dumping raw data without any guidance notes or consideration of readers.
Sharing of data is critical to safeguard research integrity – failure to share
makes it easier to get away with fraud.

Solution in brief: Share and we
all benefit. Any journal that publishes fMRI should mandate the sharing of raw
data, processed data, analysis scripts, and guidance notes. Every grant agency
that funds fMRI studies should do likewise.

Solution in detail: Public data
sharing has manifold benefits. It discourages and helps unmask fraud, it
encourages researchers to take greater care in their analyses and conclusions,
and it allows for fine-grained meta-analysis. So why isn’t it already standard
practice? One reason is that we’re simply too lazy. We write sloppy analysis
scripts that we’d be embarrassed for our friends to see (let alone strangers);
we don’t keep good records of the analyses we’ve done (why bother when the goal
is p<.05?); we whine about the extra work involved in making our
analyses transparent and repeatable by others. Well, diddums, and fuck us – we
need to do better.

Another objection is the fear that others will “steal”
our data, publishing it without authorization and benefiting from our hard
work. This is disingenuous and tinged by dickishness. Is your data really a
matter of national security? Oh, sorry, did I forget how important you are? My
bad.

It pays to remember that data can be cited in exactly
the same way papers can – once in the public domain others can cite your data
and you can cite theirs. Funnily enough, we already have a system in science
for using the work of others while still giving them credit. Yet the vigor with which some people object to data sharing for
fear of having their soul stolen would have you think that the concept of
“citation” is a radical idea.

To help motivate data sharing, journals should mandate
sharing of raw data, and crucially, processed data and analysis scripts,
together with basic guidance notes on how to repeat analyses. It’s not enough
just to share the raw MR images – the Journal of Cognitive Neuroscience tried
that some years ago and it fell flat. Giving someone the raw data alone is like
handing them a few lumps of marble and expecting them to recreate Michelangelo’s
David.

---

What happens when you add all of these problems
together? Bad practice. It begins with questionable research practices such as p-hacking
and HARKing. It ends in fraud, not necessarily by moustache-twirling villains,
but by desperate young scientists who give up on truth. Journals and funding
agencies add to the problem by failing to create the incentives for best practice.

Let me finish by saying that I feel enormously sorry for anyone whose lab has been
struck by fraud. It's the ultimate betrayal of trust and loss of purpose. If it
ever happens to my lab, I will know that yes the fraudster is of course
responsible for their actions and is accountable. But I will also know that the
fMRI research environment is a damp unlit bathroom, and fraud is just an
aggressive form of mould.

23 comments:

You forgot what is likely the biggest problem in fMRI - the data. The data and the processing of the data - AKA the methods. For example, subject motion is a huge problem that will not be solved by any of your proposed solutions. Start with the data first.

In problem 3 you stated that "Even the simplest fMRI experiment will involve dozens of analytic options, each which could be considered legal and justifiable." The problem is that researchers in this field are not aware that your statement is incorrect. Most of the preprocessing steps are not "justifiable" in that many are known to be wrong in principle and many are simply unverified.

Thanks that's a good point. To be clear, I didn't say they were justifiable - I said they could be considered to be justifiable. Big difference. And clearly they are considered to be justifiable by many scientists (and peer reviewers) or we wouldn't have a problem.


Great post. But I'm not sure I understand the solution to problem 4. We currently have a faculty search with 200 applicants. How can we efficiently create a shortlist of applicants without using heuristics? Read the papers? In practice, no. Most of us went into science because we love our research, mentoring and teaching, not because we want to spend all our evenings and weekends evaluating others. I agree on the problem; I'm just not sure I see an easy solution. I'd love to hear other ideas.

I'm not arguing against the use of metrics per se. The fact is that metrics are essential for judging some aspects of science (and scientists), particularly by non-specialists. But we need to recognise the limitations of metrics and choose the best possible ones. Journal level metrics are terrible indicators. There is no correlation, for instance, between journal impact factor (IF) and the citation rates of individual articles - but there is a correlation between IF and retraction rates due to fraud.

In terms of shortlisting down from 200 applicants, then for research potential I would focus on article level metrics, h-index, and the m-value (the rate of increase in h-index). I might also ask candidates at the initial application stage to write a short section on how often, and in what contexts, their work has been independently replicated by other research groups.

These aren't perfect indicators by any stretch. There really is no substitute for having a specialist read the work, but article-level metrics are much better than assessing candidates based on how often they publish in prestigious high-IF journals that, more than anything, are slaves to publication bias.

Just look here: http://en.wikipedia.org/wiki/List_of_scientific_publications_by_Albert_Einstein - a good chunk of his papers are quite short. Not saying that they are lacking in content, but physics is quite different from neuroscience.

Really nice post. I tweeted it, along with a question: do you think some of the same arguments can be made about my method of choice, EEG/ERPs? I think so, but I wondered if you have given this any thought?

Thanks Michael. I think all of these arguments (except #1) hold for EEG. You might be interested in this comment by Matt Craddock: http://blogs.discovermagazine.com/neuroskeptic/2012/06/30/false-positive-neuroscience/#comment-795790144.

This post was on the tendency to do multi-way ANOVA in ERP studies without correcting for the number of effects and interactions: bit.ly/1aBb8Gz. I've seen as many as 5- or 6-way ANOVAs in that field, which is really setting yourself up for finding spurious effects. The processing pipeline flexibility also applies in ERP: it's accepted practice by many to select your filter, time window, electrode for analysis, method for identifying peaks, etc. after scrutinising the data. Referencing method can also dramatically alter the results. It gets worse still if people start analysing frequency bands, where results can depend heavily on things like the method of wavelet analysis, and there are lots of ways of defining frequency bands. This paper says a bit about this kind of issue in the context of MMN: 1.usa.gov/LpQHpK. A lot of people in the ERP field really don't recognise the problem: I've been asked by reviewers (and editors), for instance, to analyse a different electrode because 'it looks like something is going on there', when instead I've based my selection on prior literature.
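The arithmetic behind that concern is worth spelling out. A k-way ANOVA tests 2**k - 1 main effects and interactions, so even under the optimistic simplifying assumption of independent tests, the familywise error rate explodes:

```python
def anova_family(k, alpha=0.05):
    """Number of effects tested in a k-way ANOVA and the resulting
    familywise false positive rate, assuming (optimistically) that
    the tests are independent."""
    n_tests = 2 ** k - 1          # all main effects and interactions
    fwer = 1 - (1 - alpha) ** n_tests
    return n_tests, fwer

for k in (2, 4, 6):
    print(k, anova_family(k))
# A 6-way ANOVA involves 63 tests, so the chance of at least one
# spurious 'significant' effect is roughly 96% even when nothing
# real is going on.
```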

The first sentence in "Problem 2" does not make sense. You say that 'Evidence from structural brain imaging implies that most fMRI studies have insufficient sample sizes....' You then cite a paper that included a large number of neuroscience studies (many of them structural imaging studies). Structural and functional brain data are not modeled or analyzed in a similar way. The techniques are much different. I have no doubt that you could make an argument that fMRI studies are underpowered (the argument has been made before), but your current point is a little disingenuous.

The biggest problem is the error in the data. The next biggest problem is the error generated by processing of the data. Statistical problems are tertiary.

Fixing the data will mean more expensive scanners and peripheral hardware. We have to stop buying our hardware right off the shelf from diagnostic imaging manufacturers. Building custom hardware to meet the specifications necessary for sufficient sensitivity is the norm in science.

Also, fixing the data may mean restricting the subjects of investigation to a much smaller subset: those who can tolerate head restraint sufficient to reduce motion-generated error. What counts as sufficient? That needs much investigation.

"if we worked together". Yes, and no. First, fewer and bigger teams and huge projects can also mean that we are putting all our eggs in one basket. The US has tons of megateams, for example, but if you look at the data, the UK is doing much better than the US in terms of money-spent/productivity ratio. In fact, the UK is doing better than anyone else. So arguing that we have to save money by having fewer, bigger teams is misleading. What is needed is more funding.

Second, we have greed and a credit allocation problem. The big fish want to be even bigger. Since they are big, they feel they can impose their wishes on everyone else. Therefore, there is no incentive for smaller, creative teams to join a mega-team that would push to get control and credit for the entire project and its subsequent funding stream.

Third, what governments, and most people, don't get is that science as a whole is the best algorithm we have to gather new knowledge about the world. What societies should fund is the algorithm itself, not a select few of the little units (i.e., scientists) that implement it. All units are necessary, as a whole, to implement knowledge discovery. Think of the algorithm implemented by a colony of ants when foraging: it works as a whole, even though most explorer ants discover nothing at all. It's not their fault; in doing their share, they contribute to the implementation of the algorithm. Societies need to understand that most experiments do not work out, but the system works as a whole. Some scientists will be lucky and run into something important. Most won't. And it's nobody's fault, and it should not determine promotions or redundancies, especially not on a short time scale. Without a system that understands this basic fact, anything else is just a band-aid.

Thanks, great comment. There is much to agree with here. The main downside I see with small studies is low power, which limits the ability to answer any questions at all. So there is a trade off between, on the one hand, preserving a tapestry of creativity and innovation by supporting lots of small groups, and on the other hand answering anything at all. Answering questions is what big studies do best. But of course, whether those questions are the right questions - and whether large groups stifle innovation - is another issue entirely. I'm not sure they do, but I agree it is a question worth asking.

Problem 2 -- fMRI reliability isn't that poor; the way people look at reliability is! In a paper last year (http://www.sciencedirect.com/science/article/pii/S1053811912010890) we show that you can get decent reliability (and yes, I think it is worth looking at 'raw' data, beta, T, and thresholded maps rather than only one, with ICC as the last useful measure). Anyhow, your link points to something of interest: as in our paper, some paradigms have low reliability and some don't, and the causes can be different.

Sorry, I'm about two months late to come across this post. Web-wise, I'm still in the late 90's.

I agree with most of your points, although I share the reservations about fewer larger-scale studies that other commenters pointed out. However, I was a bit surprised to see that a search for the word "theory" on this page returned zero hits. I think the statistical power and multiple comparisons issues will only ever get worse, and boosting the N won't really help. Consider that in a small number of years, a standard single-subject dataset may include 1+ million voxels sampled at 2 Hz, and standard data analyses will include mass-univariate, connectivity, space-based analyses like MVPA, and probably some others. Thus, the "effective power" will get much smaller even if the number of subjects increases. I write "effective" power because, to my knowledge, power analyses are done using a single statistic (e.g., t-test), but let's be realistic here: If you (don't worry, not the accusatory "you," the general "you") want a finding that doesn't show up in a mass-univariate analysis, you'll also try connectivity (perhaps PPI, perhaps DCM), MVPA, etc. Thus, I would argue that boosting the N won't really help address the issue of the unreliability, low power, gooey interpretability, and large multiple-comparisons problem.

Instead, I believe the problem is theoretical. Cognitive neuroscience is largely (though not entirely) deprived of useful and precise theories. A "soft" theoretical prediction that brain area A should be involved in cognitive process X is easy to confirm and difficult to reject. The level of neuroscience in cognitive neuroscience has not increased much in the past two decades, despite some amazing discoveries in neuroscience. FMRI data will become richer and more complex, and the literature will receive more and more sophisticated data analyses. If theories are not improved to be more precise and more neurobiologically grounded, issues of low power and multiple comparisons will only increase.

Another latecomer to this discussion. Interesting points raised but not all of them are unique to neuroimaging (more on that below).

Cost: The idea that fMRI is expensive is often raised, but if you've ever opened a bioscience supplier catalogue you'd realize that in relative terms fMRI is not all that costly (certainly nowhere near the costs of particle physics, for instance), and indeed the bulk of most grant funding goes on staff, not scanning. There is, however, a problem with the traditional way of charging for scanner time: most centres set prices high enough to recover their costs on the assumption that studies use only a small number of subjects, and in doing so they actively discourage researchers from including large enough samples. A much better model would be to charge a fixed price for a given project and, within that, allow unlimited (within reason) scanning toward that project. Such an approach would go some way toward addressing the lack of power. After all, a scanner costs no more to run than to keep on standby, and many MRI scanners are chronically underused (for example at night). Unfortunately I don't know of any scanning centre, in the UK or elsewhere, that uses this model.

Looking at the rest of the points, none of them really have anything to do specifically with fMRI. For example, there is no doubt that pressures to publish provide massive incentives toward fraud but as far as I can tell it is no more widespread in the fMRI community than elsewhere (but your point about the pressures to publish in high-impact journals is absolutely right and is definitely a major problem).

In fact, most of the critique above relates to who does the research and the way they do it, not the technique itself. And much of it can be summarised as simply "poor science" or "science poorly done". Pre-registration to me sounds like admitting that neuroimaging scientists are unable to do honest science unless they are shamed into it, and it sets a slightly disturbing precedent (i.e. if you choose not to pre-register your study, then that must mean you are a fraud). Surely it would be better to train people to do statistics properly?

But I think Mike Cohen made a very good point: the lack of neuroscience underpinning much fMRI work is at the core of the problem. On that note, I was a bit surprised that there was no mention of what in my mind is by far the biggest issue with the method: the signal measured is only indirectly related to neural activity, represents a population average, and is very difficult to link with more direct measurements. These are issues that no amount of data sharing or pre-registration will address. There is an urgent need for more research in this field, but unfortunately the bulk of fMRI researchers seem only too happy to ignore these issues.

But even if one had a good and reliable way of linking fMRI data to neural activity, applying cognitive science models to the study of brain function is only going to be as useful as the extent to which those models actually map onto neural processing mechanisms. This implies a need to recognise that many of those models are likely to be fundamentally incorrect (other than as purely descriptive ones). Indeed, it is probably not much of an exaggeration to say that the majority of the poor fMRI studies that people focus on are just those that blindly set out to test some favourite cognitive science model. Conversely, the best imaging tends to be that which is tightly linked to neuroscience. Fundamentally, this is a problem with psychology itself, which needs to embrace neuroscience rather than turning its back on it (the neuron 'envy' that Ramachandran talks about); psychologists need to learn and understand neuroscience (and conversely, neuroscientists need to get over their knee-jerk disdain for neuroimaging as a method and accept that it has some small virtues).

About Me

I'm a psychologist and neuroscientist at the School of Psychology, Cardiff University. I created this blog after taking part in a debate about science journalism at the Royal Institution in March 2012.
The aim of my blog is to give you some insights from the trenches of science. I'll talk about a range of science-related issues and may even give up a trade secret or two.
Stay tuned!
You can follow me on Twitter: @chrisdc77