They are beloved by prestigious journals and the popular press, but many recent social neuroscience studies are profoundly flawed, according to a devastating critique – Voodoo Correlations in Social Neuroscience – in press at Perspectives on Psychological Science (PDF).

The studies in question have tended to claim astonishingly high correlations between localised areas of brain activity and specific psychological measures. For example, in 2003, Naomi Eisenberger at the University of California, Los Angeles, and her colleagues published a paper purporting to show that levels of self-reported rejection correlated at r = .88 (1.0 would be a perfect correlation) with levels of activity in the anterior cingulate cortex.

According to Hal Pashler and his band of methodological whistle-blowers, if Eisenberger’s study and others like it were accurate, this “would be a milestone in understanding of brain-behaviour linkages, full of promise for potential diagnostic and therapeutic spin-offs.” Unfortunately, Pashler’s group argue that the findings from many of these recent studies are virtually meaningless.

The suspicions of Pashler and his colleagues – Ed Vul (lead author), Christine Harris and Piotr Winkielman – were aroused when they realised that many of the correlations cited in social neuroscience were impossibly high, given the respective reliabilities of measures of brain activity and of psychological factors such as rejection. To investigate further they conducted a literature search and surveyed the authors of 54 studies claiming significant brain-behaviour correlations. The search wasn’t exhaustive but was thought to be representative, with a slight bias towards higher-impact journals.

Pashler and his team found that 54 per cent of the studies had used a seriously biased method of analysis, a problem that probably also undermines the findings of fMRI studies in other fields of psychology. These researchers had first identified the small volumes of the brain image (called voxels) whose activity varied according to the experimental condition of interest (e.g. being rejected or not), and had then focused on just those voxels whose correlation with the psychological measure of interest (e.g. feeling rejected) exceeded a given threshold. Finally, they had arrived at their published brain-behaviour correlation figures by taking the average correlation from among just this select group of voxels, or in some cases from just one “peak voxel”. Pashler’s team contend that by following this procedure, it would have been nearly impossible for the studies not to find a significant brain-behaviour correlation.
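The inflation is easy to demonstrate in simulation. The sketch below is a toy illustration, not a reanalysis of any actual study – the subject count, voxel count and threshold are invented. It correlates a behavioural score with thousands of voxels of pure noise, then averages only the correlations that clear a threshold, the select-then-average step Pashler’s team describe:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_vox = 16, 5000

# Pure noise: there is no true brain-behaviour relationship at all.
behaviour = rng.standard_normal(n_subj)
voxels = rng.standard_normal((n_subj, n_vox))

# Pearson correlation of each voxel's "activity" with the behavioural score.
z_b = (behaviour - behaviour.mean()) / behaviour.std()
z_v = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
r = z_v.T @ z_b / n_subj

# The biased step: keep only voxels whose correlation clears a threshold,
# then report the average correlation of that hand-picked set.
threshold = 0.5  # roughly p < .05 (two-tailed) for 16 subjects
selected = np.abs(r) > threshold

print("voxels selected:", int(selected.sum()))
print("mean |r| over all voxels:     ", round(float(np.abs(r).mean()), 2))
print("mean |r| over selected voxels:", round(float(np.abs(r[selected]).mean()), 2))
```

Even though no true relationship exists, the average over the selected voxels must exceed the selection threshold by construction, which is why the procedure can hardly fail to yield an impressive-looking correlation.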

By analogy with a purely behavioural experiment, imagine the author of a new psychometric measure claiming that his new test correlated with a target psychological construct, when actually he had arrived at his significant correlation only after he had first identified and analysed just those items that showed the correlation with the target construct. Indeed, Pashler and his collaborators speculated that the editors and reviewers of mainstream psychology journals would routinely pick up on the kind of flaws seen in imaging-based social neuroscience, but that the novelty and complexity of this new field meant such mistakes have slipped through the net.

‘…[I]n half of the studies we surveyed, the reported correlation coefficients mean almost nothing, because they are systematically inflated by the biased analysis,’ Pashler’s team wrote. Perhaps unsurprisingly, among the papers they surveyed, it was the papers that used this flawed approach that tended to have published the highest correlation figures. ‘…[W]e suspect that while in many cases the reported relationships probably reflect some underlying relationship (albeit a much weaker relationship than the numbers in the articles implied), it is quite possible that a considerable number of relationships reported in this literature are entirely illusory.’

On a more positive note, Pashler’s team say there are ways to analyse social neuroscience data without bias, and that it should be possible for the authors of many of the criticised studies to re-analyse their data. For example, one approach is to identify anatomical regions of interest in advance, before testing whether their activity levels correlate with a target psychological factor. An alternative approach is to use independent sets of data to perform the different steps of the analysis: for example, using one run in the scanner to identify those voxels that correlate with a psychological measure, and then using a second, independent run to assess how highly that subset of voxels correlates with the chosen measure. “We urge investigators whose results have been questioned here to perform such analyses and to correct the record by publishing follow-up errata that provide valid numbers,” Pashler’s team said.
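The second remedy – select on one run, estimate on another – can be sketched with simulated data (again a hedged toy example; the numbers are invented, not drawn from any study). With pure noise, the cross-validated estimate collapses towards zero instead of inheriting the selection threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
n_subj, n_vox = 16, 5000

def voxel_corrs(voxels, behaviour):
    """Pearson correlation of each voxel (column) with the behavioural score."""
    zb = (behaviour - behaviour.mean()) / behaviour.std()
    zv = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
    return zv.T @ zb / len(behaviour)

behaviour = rng.standard_normal(n_subj)
run1 = rng.standard_normal((n_subj, n_vox))  # noise: no real effect anywhere
run2 = rng.standard_normal((n_subj, n_vox))  # an independent second run

# Step 1: select voxels using run 1 only.
selected = np.abs(voxel_corrs(run1, behaviour)) > 0.5

# Step 2: estimate the correlation for those voxels on the independent run 2.
unbiased = voxel_corrs(run2, behaviour)[selected]
print("voxels selected on run 1:", int(selected.sum()))
print("cross-validated mean r:  ", round(float(unbiased.mean()), 2))
```

Because the selection and the estimate come from independent data, the second-run figure is an honest estimate of the true effect – here, approximately nothing.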

Matthew Lieberman, a co-author on Eisenberger’s social rejection study, told us that he and his colleagues have drafted a robust reply to these methodological accusations, which will be published in Perspectives on Psychological Science alongside the Pashler paper (now available online; PDF). In particular he stressed that concerns over multiple comparisons in fMRI research are not new, are not specific to social neuroscience, and that the methodological approach of the Pashler group, done correctly, would lead to similar results to those already published. “There are numerous errors in their handling of the data that they reanalyzed,” he argued. “While trying to recreate their [most damning] Figure 5, we went through and pulled all the correlations from all the papers. We found around 50 correlations that were clearly in the papers Pashler’s team reviewed but were not included in their analyses. Almost all of these overlooked correlations tend to work against their hypotheses.”

17 thoughts on “Do you do voodoo?”

I am eager to see the Lieberman reply. With respect to his comments as reported in this blog:

“In particular he stressed that concerns over multiple comparisons in fMRI research are not new”

The Vul et al. criticism was not an issue of multiple comparisons, but of authors employing a statistical workflow that amounts to cherry-picking results.

“are not specific to social neuroscience”

Vul et al. acknowledge that their focus on social neuroscience is simply a product of their acquaintance with the field, not a hypothesis that social neuroscience would be particularly prone to the errors they reveal.

“and that the methodological approach of the Pashler group, done correctly, would lead to similar results to those already published.”

Vul et al. acknowledge that some studies, when re-analyzed properly, may obtain similar results as in the original reports. Then again, some studies may obtain substantially different results when the proper analyses are employed. The point is that until such reanalysis is done, we cannot know which of the published results to trust.

There is an epistemological crisis in the sciences, especially the social and medical sciences. It is simply not possible to rely, even tentatively, on the results of studies published in even the premier journals. The fact is that scientists and researchers are experts in their subject matter, not in the methodology of statistics-based research and analysis, a field in which they might have taken a course or two back in grad school.

Peer review is utterly inadequate to the task of ensuring the methodological soundness of published research. In my view every serious academic journal must put on its payroll a professional statistician trained in the methodologies of social or biological research. It is long past time for us to stop treating this complex and subtle subject as something for real scientists to pick up in their spare time.

After reading the Vul et al. paper, I was initially convinced by their argument and thought for sure the authors were on to something. However, after reading some of the papers they criticize, I find that I disagree with many of the arguments put forth by Vul and colleagues. Notably, they claim that many papers used a two-step inferential procedure to essentially cherry-pick their results. Upon closer reading, this did not appear to be the case, at least in the sample of papers I looked over. What I found was that many papers conducted a whole-brain correlation of activation with some behavioral/personality measure, and then simply reported the magnitude of the correlation or extracted the data for visualization in a scatterplot. That is clearly NOT a second inferential step; it is simply a descriptive step at that point, to help visualize a correlation that was ALREADY determined to be significant. Assuming they used appropriate corrections and had an a priori anatomical hypothesis, plotting the relationship at the correlated voxel(s) does not constitute a second inferential test; it simply reflects effective visual presentation of the data. So, after digging a little, I am much less convinced by the Vul argument.

Secondly, targeting the papers simply because they report high correlations doesn’t make complete sense either. The criticized papers I read all used fairly stringent thresholds for significance, so it only makes sense that all the voxels that survived the alpha threshold would be highly correlated: the p-value of a correlation depends heavily on sample size, and most fMRI studies are notoriously small because of the cost, so the only correlations that can reach statistical significance will be quite large. It is impossible for a small correlation to be significant in a sample of 10 to 20 subjects. Even in a sample of 15 subjects, a correlation would have to exceed r = .5 to reach even the most inappropriately liberal uncorrected threshold of significance. Just because the correlations are large doesn’t make them untrue. Furthermore, the argument that the correlations are “impossibly” high is not as strong as it initially appears, because Vul et al. made a LOT of assumptions about the reliabilities of the tests used. Some of the psychometric tests used in the criticized studies I read have shown high reliability coefficients, some reported to exceed .90. Some studies have also reported reliabilities of fMRI that exceed .90, so it is plausible that at least SOME of these studies could conceivably find correlations approaching .90.

Finally, the somewhat arrogant tone the Vul paper takes began to grate on me a little as I read it. Perhaps they are right in their arguments, but if half of the studies they surveyed were using the particular methods they criticize, then clearly the correct methods have not been as obvious as they suggest. I am not yet fully convinced that these studies should be so quickly given the derogatory labels of “voodoo” and “worthless” that Vul and colleagues so easily throw around. Using such labels does little to help scientific discourse and reduces the enterprise to what appears to outsiders to be playground name-calling. Scientific scrutiny is vital, but such mudslinging is petty.
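Both of this commenter’s quantitative claims are easy to check with standard textbook formulas (the sketch below uses the usual two-tailed .05 critical t-value of 2.160 for df = 13; the reliability figures are illustrative, not drawn from any particular study):

```python
import math

# Smallest correlation that reaches two-tailed significance:
# r_crit = t_crit / sqrt(t_crit^2 + df), where df = n - 2.
def critical_r(t_crit, df):
    return t_crit / math.sqrt(t_crit ** 2 + df)

# For n = 15 subjects (df = 13), t_crit = 2.160 at alpha = .05 (two-tailed).
print(round(critical_r(2.160, 13), 3))  # ~0.514, matching the r > .5 claim

# Classical attenuation ceiling: the largest observable correlation between
# two noisy measures is sqrt(reliability_x * reliability_y).
def max_observable_r(rel_x, rel_y):
    return math.sqrt(rel_x * rel_y)

print(round(max_observable_r(0.70, 0.80), 2))  # ~0.75 with modest reliabilities
print(round(max_observable_r(0.90, 0.90), 2))  # 0.9 if both measures are excellent
```

The ceiling line is the classical attenuation formula: with reliabilities of .90 apiece, a correlation near .90 is at least arithmetically possible, whereas with more modest reliabilities the ceiling drops to around .75.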

Burn the witches! Burn the witches! I have to agree with the previous post about the unprofessional, tabloid-headline approach of the Vul paper and the flaws in their statistical arguments, but I have a slightly different perspective to add. The whole debate on the blogosphere is taking on the feeling of a witch-hunt, which is starting to concern me. I know of a couple of colleagues who have now expressed fear about submitting their methodologically sound research for publication because they too use correlational methods and are concerned that they might be unjustifiably attacked as well. Similarly, this week I was in the process of reviewing what appeared to be a very good manuscript for a journal when, unexpectedly, the authors chose to withdraw the manuscript over similar apprehensions. This paranoia is ridiculous and I fear it might be having a chilling effect on truly creative research – the kind of research that has led to some of the most fascinating advances in our understanding of brain function. The past 15 years of functional imaging research on social-cognitive-emotional phenomena, including some of the accused “voodoo” papers, has led to extraordinary advances in our understanding of the brain and consciousness. Of course, continued refinement of our research and statistical analysis methods is an integral part of the scientific process, and measured discussion about improvements in analysis approaches is welcome. However, the sensationalized and somewhat aggressive finger-pointing approach of the “Voodoo” paper has the potential to harm true scientific progress. Many of the posts on blogs supporting the Vul paper have a Lord of the Flies lynch-mob feel and suggest that there is actually very limited understanding of the true statistical issues, many of which Vul and colleagues unfortunately seem to have misconstrued.

Instead, what the public (and less versed neuroscientists) seem to be taking from all this is that “all neuroimaging is worthless”. The lay public and, unfortunately, many journal reviewers and funding agencies are likely to have little understanding of the very subtle statistical arguments that Vul et al. make, the limitations of those arguments, and the quite legitimate counterpoints that have recently been expressed by some of the attacked papers. Instead, I fear that reviewers and funding agencies will only walk away with the tabloid-headline message that “all fMRI is crap”. Scientific discourse needs to be measured and thoughtful, not sensational, inflammatory, and finger-wagging. In a climate where research funding is already scarce, it seems foolish for us to stand in the public square pointing a finger and yelling “Voodoo! Burn the witches!”

Interesting debate, but the whole non-independence error that the Vul paper uses as the backbone of its argument doesn’t sit well with me. Maybe I’m off base, but if the regional hypotheses were specified a priori and based on some theory, then running a correlation and seeing if there are clusters of activation (above some pre-set alpha criterion) in the predicted region seems scientifically sound. If I understand the criticism, the Vul paper slams the red-list studies for taking the next logical step of plotting the activity in a scatterplot or reporting the r-value. As for me, that is what I would want to see, so I’m not sure how the authors find fault with simple description of the effects. I think that Vul may be incorrect in claiming that this somehow biases the statistics. The inferential step came before the extraction and plotting – that is what occurred at the voxelwise analysis.

I agree with Sandra. Vul et al. seem to believe that the procedure of completing a test and then plotting the result is an instance of the “non-independence” error (from the in-press chapter available on Vul’s website):

“The most common, most simple, and most innocuous instance of non-independence occurs when researchers simply plot (rather than test) the signal change in a set of voxels that were selected based on that same signal change.”

This is in fact not a problem and surely not “voodoo”.

While the intent of the Vul et al. paper was laudable and their call for tighter analysis methods is appreciated, they have made a number of errors in their analysis and (regrettably) targeted many papers that probably should not be on the “red list”. Vul et al. should have been more careful in determining whether a non-independence error actually occurred in the studies they targeted before making such sensational accusations. From what I’ve seen, the studies they targeted did not analyze their data in exactly the way they suggest. Only supermarket tabloids are supposed to get away with that level of sensationalized allegation without good evidence.

I have some rather extensive thoughts on this paper here: http://neuroskeptic.blogspot.com/2009/02/voodoo-correlations-in-fmri-whose.html

In particular, I’d like to draw attention to the fact that Vul et al.’s main argument does not set out to establish that reported correlations are entirely spurious, but merely that the reported magnitude of these correlations is too high. In fact, the argument relies upon the fact that there do exist statistically significant correlations.

However, Vul et al. also make a second, entirely separate argument, on page 18, which accuses an unspecified number of papers of falling prey to a different error (stemming from a failure to read a table) which would make their results entirely bogus.

@Neuroskeptic: I don’t think your point that “Vul et al.’s main argument does not set out to establish that reported correlations are entirely spurious” is accurate. The very title of the paper has the words “voodoo correlations” in it. Since Robert Park’s book, Voodoo Science: The Road from Foolishness to Fraud, the word ‘voodoo’ used in a scientific context has the strong connotation of being invalid and not merely inflated. Also, they repeatedly claim that the correlations are “quite meaningless and could have been obtained with completely random data,” “rather disastrous” (p. 16), and “sure to be inflated to the point of being completely untrustworthy” (p. 20).

O.K., I’ll be the one to say it!!! Matthew Lieberman and Naomi Eisenberger purposely committed research misconduct, solely out of pure greed! What is going on here is that Matthew Lieberman and Naomi Eisenberger (from UCLA) chose to cheat their way up the ladder of social cognitive psychology! They performed illegal and unethical experimentations on human subjects (which gave them a short-cut) in getting their answers to their clinical research questions! Then they made up an experimental procedure that was ethical …and that would convince us that their research study was valid and correct! This field is extremely popular! They wanted to be the first ones to get their hands on all of that government grant money, as well as the private investors’ money that is available for this specific field of research! Matthew Lieberman (of UCLA) has been quoted saying that he has found a profitable way of doing his research!

For anyone interested, there was a public debate on Voodoo Correlations last fall at the Society of Experimental Social Psychology between Piotr Winkielman (one of the authors on the Voodoo paper) and myself (Matt Lieberman). The debate has been posted online.

Dr. Matthew Lieberman and Dr. Naomi Eisenberger are guilty of horrible, unethical research misconduct! They conduct environmentally manipulated, unethical research experiments on unconsenting and unknowing individuals (in real time) while these individuals are living out their daily lives!

They have people who will go out and deliberately manipulate a person’s (real-time) social environment in an extremely negative, unpleasant and intolerable way that causes that individual to be faced with problems that will make them suffer from long-term social and emotional distress!

They purposely manipulate an individual’s daily social environment in a negative way, so that they will begin to suffer extreme amounts of emotional and visceral pain on a daily basis, all due to the problems that Dr. Matthew Lieberman and Dr. Naomi Eisenberger’s group has deliberately caused for them!

Their victims begin to suffer long episodes of rejection, isolation, ostracism, loss, and abandonment, and cannot find any social support. They are not told why or who has done this to them! Their lives are deliberately destroyed! They begin to suffer extreme amounts of pain, all so Dr. Matthew Lieberman and his wife, UCLA’s Dr. Naomi Eisenberger, can get a more original view of individuals suffering from social distress, the focus of their research experiments!

Dr. Matthew Lieberman and Dr. Naomi Eisenberger’s research projects need to be investigated and shut down, and the two of them held accountable for the lives and health of the individuals they have tortured and destroyed, all for their own financial greed, job security, and to manipulate a way to get their research published in journals!