The authors are Princetonians Michael T. Todd and colleagues, and the method in question is multivariate pattern analysis (MVPA). I’ve written about this before and there’s a blog dedicated to it. MVPA searches for relatively subtle patterns of brain activity, most commonly in fMRI data.

For example, a conventional fMRI study might compare activity when someone’s looking at a picture, compared to a blank screen, and would find increases of activity in the visual cortex. But MVPA might take two different pictures, and see if there’s a pattern of activity that’s unique to one picture over the other – even if overall activity in the visual cortex is the same.

Neuroscientists have fallen in love with MVPA (and related methods) over the past 5 years, mainly I think because it’s promised to let us ‘read’ the brain: to not just see where in the brain things happen, but to glimpse what information is being represented.

In the new paper, Todd et al. make a very simple point: all MVPA really shows is that there are places where, in most people’s brains, activity differs when they’re doing one thing as opposed to another. But there are infinitely many reasons why that might be the case, many of them rather trivial.

The authors give the example of two very similar tasks, A and B. We’ll say these are imagining apples and imagining bananas. You scan some people doing A and B. You run a standard fMRI analysis, and find that nowhere in the brain shows a difference in activity, on average, between the two (as expected – they are similar).

So you do an MVPA analysis, and now you find an area where the pattern of activity predicts task. Hooray! You’ve found the part of the brain that encodes the nature of fruit, or something else exciting.

Not so fast. The problem is that some people might (say) have eaten an apple for breakfast that morning and find imagining an apple easier, while others had a banana and find apples harder. In that case, a part of the brain that lights up whenever you’re concentrating hard on something (and there are a lot of those areas) would light up differently depending on the task, and would therefore discriminate between the two tasks within a given individual.

Here’s a hypothetical signal from a brain area that lights up when you’re working hard: [figure]

Generally, the problem is that MVPA will detect a signal from any area of the brain where activity differs according to how each individual performs the task, but this signal might be rather uninteresting.

What’s scary is that this problem is insidious: it could easily go undetected, because if some people find A hard, and others find B hard, the two groups would cancel out in the average. It would be very easy to look at the average and conclude that difficulty was just not a factor in your experiment. But MVPA, by its very nature, looks at individual-specific patterns and doesn’t let them cancel out.
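The cancellation argument can be shown with a toy simulation (entirely hypothetical numbers, not from the paper): a single “effort” voxel responds to subjective difficulty only, half the simulated subjects find task A harder, and half find B harder. The group-average difference between tasks comes out near zero, yet a within-subject decoder performs far above chance:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_trials = 20, 100   # trials per task, per subject

group_diffs, accuracies = [], []
for s in range(n_subjects):
    a_harder = (s % 2 == 0)            # half the subjects find A harder
    mu_a = 1.0 if a_harder else 0.0    # the "effort" voxel tracks difficulty only
    mu_b = 0.0 if a_harder else 1.0
    a = rng.normal(mu_a, 0.5, n_trials)
    b = rng.normal(mu_b, 0.5, n_trials)
    group_diffs.append(a.mean() - b.mean())

    # Within-subject decoder: fit a threshold on a training half,
    # then score it on the held-out half.
    half = n_trials // 2
    thr = (a[:half].mean() + b[:half].mean()) / 2
    a_is_high = a[:half].mean() > thr          # which side counts as "task A"
    hits = ((a[half:] > thr) == a_is_high).mean() + \
           ((b[half:] > thr) != a_is_high).mean()
    accuracies.append(hits / 2)

print(f"group-average A minus B: {np.mean(group_diffs):+.2f}")     # near zero
print(f"mean within-subject decoding accuracy: {np.mean(accuracies):.2f}")  # well above chance (0.5)
```

The difficulty signal is real and decodable within each subject, even though task A and task B elicit identical activity on average across the group.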

It’s one of those things that’s so obvious in retrospect. And it’s not just difficulty. If some people find some of your stimuli (let’s say) more boring, fun, unpleasant, attractive, or annoying than others, that could show up too.

Todd et al show this to be more than theoretical. In an fMRI dataset involving two tasks equally difficult on average, they show that MVPA detects several areas where activity patterns are correlated with task, but this disappears if you control for each subject’s reaction times, a proxy for how difficult they found it individually. Such a correction is rarely performed.
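The kind of reaction-time control described above can be sketched as follows (a minimal illustration with made-up data, not the paper’s actual pipeline): regress trial-by-trial reaction times out of the voxel signal and keep only the residuals, so anything linearly explained by individual difficulty is removed before pattern analysis.

```python
import numpy as np

def regress_out(signal, rt):
    """Return the part of `signal` not linearly explained by `rt`."""
    X = np.column_stack([np.ones_like(rt), rt])   # intercept + RT regressor
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return signal - X @ beta

rng = np.random.default_rng(1)
rt = rng.normal(0.8, 0.2, 200)                    # trial reaction times (seconds)
signal = 2.0 * rt + rng.normal(0, 0.1, 200)       # voxel signal that tracks difficulty

residual = regress_out(signal, rt)
print(np.corrcoef(signal, rt)[0, 1])    # strong correlation before the control
print(np.corrcoef(residual, rt)[0, 1])  # essentially zero afterwards
```

A classifier run on the residuals can no longer exploit anything that reaction time linearly accounts for, which is why the MVPA effects in the paper vanished under this correction.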

The authors conclude by saying that this is not a problem with MVPA, as such. The mathematics are sound and the method does ‘work’, but the trouble is what the results mean. As they write:

Our concern is limited to cases in which a brain region revealed by MVPA is interpreted as representing a particular cognitive variable or type of information e.g., “Brain region A represents information X”. In contrast, claims of the following form are not problematic: “Brain region A can predict behavior Y”

Maybe it’s a good thing that MVPA is so sensitive to subtle cognitive processes that it forces us (or at least these authors) to consider these not-so-subtle confounding factors more carefully.

petrossa

Yet another nail in the coffin of fMRI. Soon to be listed under Junkscience. There will need to be some serious retractions taking place.

JonFrum

If every paper that was later proven wrong had to be retracted, retractions would eventually equal new publications.

petrossa

If the underlying method of all these papers is incorrect, retraction must follow. It’s not that the papers were proven wrong; it’s that the method is incorrect. Not retracting them would let many false conclusions stand, with possibly far-reaching health consequences. This is not someone just making a mistake in a calculation.

jetzel

I think the way forward will involve more sensitivity testing for fMRI analysis (MVPA or mass-univariate): do the results stay (mostly) the same if we make small, reasonable changes to the analysis?

From your post, “Because if some people find A hard, and others B hard, they would cancel out in the average.” Yes, but only if the number of people finding A hard is more or less the same as the number of people finding B hard, or if there are fewer people finding A hard, but they find it more hard than the people that find B hard, etc.

We are so often analyzing for very, very subtle effects in very noisy data that no method is universally suitable or even “safe”.

Lots of people are aware of this problem; it’s just that the primary solution (used in other fields) is largely untenable here. All you need is to vastly increase the number of trials for each participant and include enough conditions to allow the algorithm to correctly differentiate between your features of interest. If Google wants to train a pattern analyzer to detect specific images of cats, that’s a lot easier, because you can just show it a million photos of cats and other animals and objects. For human subjects this doesn’t work so well.

Ultimately, the true test is to show good performance on many instances of a real-world task performed by single individuals. Neuroscience tends not to be very good at this, given the emphasis on novelty rather than replication or viability – although check out Tor Wager’s new paper in NEJM… Who knows, maybe clinicians will put that idea to the test.

Mike Todd

I am the first author of this paper, and I just wanted to clarify that it is my understanding that collecting more data will not solve this problem. That is, if Variable 1 is truly correlated (e.g., confounded) with Variable 2, then collecting more data will still lead to spurious results. As far as collecting more “rich” data, again, I think the confound issue would still obtain. Consider extending the “task A vs task B” example that we used in the paper to a “rich data” scenario. Here, I think that you would say that laboratory control of “task” should be relaxed to produce a more natural, rich manipulation, so that over many such naturalistic manipulations and a lot of data, a common “task” representation could be faithfully identified by an MVPA method. But in the example, the difficulty confound would remain (and in fact, would likely become even more prominent as the common factor underlying naturalistic task manipulations)! So I’m not sure that either flavor of the “just get more data” proposal (i.e., collect more laboratory style data per user, or collect rich, naturalistic datasets) would actually resolve the confound problem.

Mark Stokes

I agree that binary differences are always hard to interpret. I prefer using a cross-generalisation approach to relate differences in one context to differences in another. In the example, it would make sense to train a classifier on actual perception of the apples and bananas. If this classifier can then predict imagery, you can assume that there is a systematic relationship between the representations (we did something like this here: http://www.ncbi.nlm.nih.gov/pubmed/19193903). You still don’t know exactly what the difference means, but you are getting closer than just showing a difference between two conditions (which you already know must differ in brain activity, insofar as there is a difference in mind). The key is to understand the differences, and that must be done by relating patterns of differences, and similarities, across as many contexts as possible to identify the (un)common elements.

Neuroskeptic

Thanks for the comment. I agree that if you can show a generalization of the patterns associated with the same stimulus across tasks (or contexts in general) then that would be a big help.

In your paper though it was visual stimuli and visual imagery which, MVPA critics might say, are especially ‘easy’ because of retinotopic mapping. It’s a big step from that to intentions or plans…

Mark Stokes

Actually, in our follow-up paper (Stokes et al., 2011, NeuroImage) we ruled out the role of retinotopic organisation.

But yes, I totally agree, still plenty of big steps remain. The main thing, for any analysis approach, is to narrow down the possibility space for observed differences, and similarities.

Rajeev Raizada

The potential task-difficulty confound is definitely something worth worrying about when you are simply looking for some neural difference between two stimuli A and B.

However, a task-difficulty confound would be much less likely to produce a systematic structure of neural similarity relations between several different stimuli. For example, different types of visual objects (faces, chairs, shoes, cats, etc.) elicit fMRI patterns with systematic sets of similarity relations between them. These similarity relations turn out to be so systematic that they are preserved across different people and can be used to carry out across-subject neural decoding. http://www.ncbi.nlm.nih.gov/pubmed/22220728

There are many other examples of multivoxel fMRI patterns showing systematic structure, beyond just some same/different neural response. Tom Mitchell’s work is a really nice example: the multivoxel neural patterns elicited by 60 words are systematically related to the semantic properties of those words. http://www.ncbi.nlm.nih.gov/pubmed/18511683

Here again, there’s really no way I can see that a stimulus-difficulty confound could account for that. If the difficulty of mentally representing each word were somehow systematically related to the semantics of each word, across all 60 words, then it wouldn’t really be “difficulty” any more. It would be something to do with word-meaning.

In summary, the authors are absolutely right in saying that the reason why the brain treats stimulus A and B as different might not be the reason that your experiment is designed to probe. That’s why people worry about confounds in the first place. This new paper is a useful reminder to stay on the lookout, and it highlights one way in which confounds might sneak in. But this isn’t some all-encompassing knock-down against multivoxel pattern-based fMRI analyses. And if such analyses look for structure in neural representations, over and above just same/different neural discrimination, then this particular confound probably isn’t going to arise. There are all kinds of other ways that such studies might go awry (bugs in analysis code, bad experimental design, mistakes in stats, etc.), so we’ve got to stay on the lookout for those problems too. But that’s true for any experiment.

Neuroskeptic

Thanks for the comment; the authors (if I recall) agree with you that analyses such as the semantic one are less vulnerable to the problem, compared to straightforward A vs B studies.

However, I worry that even a sophisticated analysis could fall prey to simple confounds. For example, consider a case where, as well as difficulty, the stimuli differed in a couple of other ways – positive vs negative valence, boring vs exciting. Then all of your stimuli would inhabit points in a three-dimensional space (unique to each subject), and MVPA might be able to pick that position up, but it wouldn’t be especially interesting.

Rajeev Raizada

Thanks, interesting reply. I absolutely agree with you that there could be a multi-dimensional space of confounds. That’s why it’s essential to relate a neural similarity space to something external to it, which can act as an independent cross-check.

In the first paper that I mentioned, the cross-check is that different subjects have matching similarity spaces. Confounds would be unlikely to have the same structure across subjects (and if they did, then they would probably be containing meaningful neural information, and hence not be confounds at all).

Similarly, in the Tom Mitchell paper the word-elicited neural patterns are related to an independent cross-check, namely the co-occurrence statistics of words in a Google text corpus. Any set of “confounds” that also corresponded to text-corpus statistics must themselves contain linguistic information.

So, I agree with you that confounds could arise even outside of a two-stimulus same/different discrimination. But relating neural similarity structure to an independent measure is a good guard against that.
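[As a concrete illustration of this cross-check idea – a hypothetical sketch, not code from either cited paper – one can simulate several subjects who share the same underlying stimulus structure but express it through different voxel layouts, and then compare similarity matrices rather than raw patterns:]

```python
import numpy as np

rng = np.random.default_rng(2)
n_stim, n_latent, n_vox = 8, 5, 200
shared = rng.normal(size=(n_stim, n_latent))   # stimulus structure common to all subjects

def upper_triangle(m):
    """Off-diagonal entries of a symmetric similarity matrix, as a vector."""
    return m[np.triu_indices_from(m, k=1)]

rdm_vectors = []
for _ in range(3):  # three simulated subjects
    mixing = rng.normal(size=(n_latent, n_vox))           # subject-specific voxel layout
    patterns = shared @ mixing + rng.normal(0, 0.3, (n_stim, n_vox))
    rdm_vectors.append(upper_triangle(np.corrcoef(patterns)))

# Raw voxel patterns don't align across subjects, but the
# stimulus-by-stimulus similarity structure does:
between_subject = np.corrcoef(rdm_vectors)
print(between_subject.round(2))
```

Here the between-subject correlations of the similarity structure are high even though no two subjects share a voxel layout, which is the signature a subject-idiosyncratic confound would struggle to produce.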


About Neuroskeptic

Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. His blog offers a look at the latest developments in neuroscience, psychiatry and psychology through a critical lens.