Annals of Interesting Peer Review Decisions

by Henry on December 7, 2011

Tom Bartlett describes the efforts of two psychologists to publish replication results for an article that had purported to show that people could use ESP to predict whether they would be shown erotic pictures in the future. The replication found no observable effect but (according to the authors’ account) had a difficult time finding a publisher.

bq. Here’s the story: we sent the paper to the journal that Bem published his paper in, and they said ‘no, we don’t ever accept straight replication attempts’. We then tried another couple of journals, who said the same thing. We then sent it to the _British Journal of Psychology,_ who sent it out for review. For whatever reason (and they have apologised, to their credit), it was quite badly delayed in their review process, and they took many months to get back to us.

bq. When they did get back to us, there were two reviews, one very positive, urging publication, and one quite negative. This latter review didn’t find any problems in our methodology or writeup itself, but suggested that, since the three of us (Richard Wiseman, Chris French and I) are all skeptical of ESP, we might have unconsciously influenced the results using our own psychic powers. … Anyway, the BJP editor agreed with the second reviewer, and said that he’d only accept our paper if we ran a fourth experiment where we got a believer to run all the participants, to control for these experimenter effects. We thought that was a bit silly, and said that to the editor, but he didn’t change his mind. We don’t think doing another replication with a believer at the helm is the right thing to do … [the] experimental paradigms were designed so that most of the work is done by a computer and the experimenter has very little to do (this was explicitly because of his concerns about possible experimenter effects).

Although the Bartlett piece doesn’t make this suggestion, I can’t help wondering whether the reviewer was one of the authors of the original piece. I’ve had a couple of interesting interactions with editors over the years myself, but nothing that even comes close to matching this. I suppose you could argue that if you think psychic powers are a plausible subject of investigation, you have to account for the possibility of compromising psychic effects, but the potential for abuse (e.g. through claims about ever-more speculative ways in which the experiment could be compromised) is obvious.


I once had an article on a sex work-related topic sent off for review to a known anti-sex work campaigner. Their review recommended rejection because we didn’t write the article from an anti-sex work perspective. Data, analysis, public health theory, all irrelevant. We used the term “sex worker” instead of “prostituted woman” so …

On the other hand, my partner was once tasked with setting up a peer-reviewed journal with a very wide remit, and she had to find reviewers from scratch. Inevitably, if she didn’t get very clear recommendations from the editorial board (most of whom were asleep) or from the authors themselves, there was a risk she would select exactly the wrong reviewer. She was clever enough to avoid this mistake, but I think in journals with broad remits and unhelpful boards, this will happen. Maybe that’s what happened here…? Or maybe psychology is just full of crackpots.

Which RPG was it in which you could have an “Atheist” skill that worked to counter the various cleric or magical skills? Simply having a high-level atheist in the same room as someone who could normally reliably heal someone with deity-granted powers could cause their powers to fail. So there is precedent for this idea in geek culture. Maybe the reviewer figured that psychology is more like an RPG than a science anyways.

I think there is backstory here – Wiseman in particular has on several occasions objected to parapsychology studies because of the involvement of people who were strong believers, and somebody might be getting their own back.

In general, though, accounts of the peer review process from somebody who has had a paper rejected are about as reliable as a farmer’s accounts of a legal dispute over the location of a fence. The sentence “the reviewer had no real problems with our methodology or analysis, but …” has been used to introduce some hooleys.

A referee for a political philosophy journal recommended rejection of a paper of mine for its “conservative political implications” (these existing, imho, only in the referee’s imagination). To be fair, the paper was correctly rejected for other and better reasons, contained in another report. But I was mildly shocked that an anonymous reviewer was prepared explicitly to list “conservatism” as a reason for rejection.

This is an illustration of the futility of experimentation in the absence of theory. Because there’s no theory about what exactly ESP is and how it works, there’s no experimental result that can’t be explained in some ad-hoc manner.

Seems to me the real problem illustrated here isn’t the reviewer’s response, it’s the widespread unwillingness to publish replication studies. How is science supposed to work if there’s no outlet for something as fundamental as that? Shouldn’t all journals have a replication corner or something like that?

“Shouldn’t all journals have a replication corner or something like that?”
To chime in with the last two, nah – that’s not sexy. Science (at least with regards to publication) is no longer about confirmation; it’s more about New and Groundbreaking Results. I can’t help but feel that the (to me) surprising number of retractions over the past 5 years and more is due to this. Greater competition for shrinking amounts of funding and all that.

in public health I think replication is challenged by ethics. If a well designed study finds a treatment works then it’s really hard to justify replicating the experiment ethically, because you’re expected to give the proven treatment to the control group as well.

If parapsychology researchers want to force their random results into the mainstream they should try public health. Keep plugging away until one of your p values proves that your psychic powers are a boon to public health, and no one will ever be able to refute you – on ethical grounds.

sg states: “… replication is challenged by ethics. If a well designed study finds a treatment works then it’s really hard to justify replicating the experiment ethically, because you’re expected to give the proven treatment to the control group as well.”

Unfortunately, the results of what appear to be well-designed medical studies often cannot be replicated. Initial studies generate enthusiasm, which wanes as further studies contradict these optimistic results. So, it is best to be skeptical until more supporting data are available.

The problem is that the results of medical research are influenced by the study protocol, patient population, hidden biases, etc., etc.

“This latter review didn’t find any problems in our methodology or writeup itself, but suggested that, since the three of us (Richard Wiseman, Chris French and I) are all skeptical of ESP, we might have unconsciously influenced the results using our own psychic powers. … Anyway, the BJP editor agreed with the second reviewer, and said that he’d only accept our paper if we ran a fourth experiment where we got a believer to run all the participants, to control for these experimenter effects.”

…and when that fourth experiment is added, the reviewer will of course complain that the authors’ unconscious psychic powers (UPP) influenced that too: in selecting the believer, the UPP had a spillover effect on the believer, which in turn distorted the results. Clearly, a believer is needed to select the believer. (Begin regress!)

To avoid that, the reviewer and the journal editor have the justificatory task of providing reason to believe that UPP can only work in one interpersonal step, not two. Or three, or…

“If parapsychology researchers want to force their random results into the mainstream they should try public health. Keep plugging away until one of your p values proves that your psychic powers are a boon to public health, and no one will ever be able to refute you – on ethical grounds.”

Why pick on Public Health? If there were an International Tribunal for Crimes Against Student’s t-Test, most medical journal articles would get convicted.

Poor Student. All he was trying to do was improve batches of Guinness, and he (or she, as he was anonymous) ended up responsible for the infinite number of “Eating Strawberry Ice-Cream Reduces Epileptic Seizures” headlines from credulous journalists, caused by crappy med-journal articles written by MDs torturing data into Minimum Publishable Units.

Replication studies would seem to be an obvious place for experimentation with new approaches to peer review and online publication outside the established, largely commercialized structure. This work is essential yet carries no commercial value and very little academic prestige; on the other hand, a near-guarantee of some form of publication may have a certain attraction. Scholars write book reviews, after all.

It’s not just that failed replications get rejected; in some cases they are suppressed or discarded by the authors or funders. What is required is a registry of empirical research with which all experiments are to be registered, with a complete spec of the methods to be used, hypotheses to be tested, calculations to be run (etc), before the experiment is run. As results are obtained or calculations performed (etc), results must be lodged with the registry.

The registry could then keep records of all replications, too. In an electronic age, it ought to be possible to keep copies of all data and documentation along with the experiment. Administratively speaking, while not trivial the project would certainly be relatively simple.

Experiments not so registered, or for which not all of the pre-specified information was lodged, would be regarded as methodologically compromised. Likewise researchers who fail to supply data for registered experiments. Last-minute changes would be visible and their consequences for the integrity of the experiment could be assessed.

This would impose no great burden on researchers and could improve the integrity of scientific work no end, since it would reduce the scope for bias, conscious or unconscious, and in particular such practices as cherry-picking statistical tests and various other Texan practices. I imagine a single repository for scientific data would be very useful in all sorts of other ways too.
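The registry record the comment describes can be sketched as a simple data structure (a hypothetical illustration only; the field names and the `RegistryEntry` class are my own, not part of any existing registry):

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RegistryEntry:
    """One pre-registered experiment: the full protocol is locked in
    before any data are collected, and results are lodged as obtained."""
    experiment_id: str
    hypotheses: list            # hypotheses to be tested, fixed up front
    methods: str                # complete spec of the methods to be used
    planned_analyses: list      # calculations/tests committed to in advance
    registered_on: date
    results: dict = field(default_factory=dict)     # lodged as obtained
    amendments: list = field(default_factory=list)  # last-minute changes, visible

# Example entry (illustrative values):
entry = RegistryEntry(
    experiment_id="esp-rep-001",
    hypotheses=["no precognition effect"],
    methods="computer-administered replication of the original paradigm",
    planned_analyses=["two-sample t-test, alpha = 0.05"],
    registered_on=date(2011, 12, 7),
)
```

The key design point is that `results` and `amendments` start empty: anything added after registration is visibly an addition, which is exactly what makes unregistered changes detectable.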

Replicating previous experiments might be useful as a training exercise for third-year undergrads, MSc students or something – presumably in most cases exactly re-running an experiment will be much more straightforward than designing, developing and performing one for the first time.

One of the journals in monetary economics (the Journal of Monetary Economics or the Journal of Money, Credit and Banking) requires authors to submit their data (etc.) when the paper is submitted. I know people who have their PhD students do replications as part of their coursework. I also recall (no data) that this has led to at least a couple of retractions.

1) Why on earth would anyone care about ESP enough to work on it? You could also waste your time on phlogiston, angels on the head of a pin, aether, etc. … but shouldn’t you have better things to do?
2) This argument is about quality, not philosophy of science: if you are doing quality science, your work is solid (though I’m an experimental molecular biologist, so I can replicate things before I send the MS off). I assume most readers here know the statistics on citation rates; allowing a large heaping of salt for Garfield’s ego, the average citation rate is, what, <1? What does that tell you?
3) Journals don’t publish non-replication studies because it would bring up point 2 above, and suggest that the journal’s papers are poor.


I think the just solution would be organizing a boycott of Journal of Personality and Social Psychology until they agreed to publish the non-replication paper. The moral being, journals that publish garbage should pay for the clean up.

The interesting implication of the BJP comments is that “psychic powers” or the belief of the experimenter could influence the results. Given that the experimenters followed standard social psychology procedure (e.g. the computer doing most of the work), doesn’t it follow that most of social psychology could therefore be up for critique, on the grounds that most experimenters do have a hypothesis in mind before running a series of experiments? I mean, isn’t that why it’s called hypothesis testing?

ezra abrams @29 2) this argument is about quality, not philosophy of science: if you are doing quality science, your work is solid

I’m not sure exactly what this is supposed to mean, but it looks wrong. Is it: if your science is of good quality, there is no need to replicate it? Or even: good science shines through so that everyone can see there is no need to replicate it? At the very least this obviously can’t be the case wrt statistical findings.

QB @33 – I’d be interested too, but don’t have access.

In general, isn’t the proposal, in outline anyway, one that every scientist would publicly agree with (even if they may privately fear or resent the impact on their own practice)? If so, can’t we get some kind of campaign group going? Canvass for general support and interest, develop the proposal, work out issues of aegis, funding etc…

How should we promote publication of data that can be replicated and/or reproduced? The articles in the special section on Data Replication & Reproducibility propose a number of possibilities that target funders, journals, and the research culture itself. What do you think?

Ideally, scientists would fully disclose their own raw data and methods and also spend time replicating others’ work. What would best ensure this good behavior?

Recognition and rewards from institutions 7.45%

Funding earmarked for replication studies 20.42%

More publication by journals of data that confirm or refute previous work 43.52%

Interestingly, I saw Daryl Bem give a talk recently on experimenter effects in precognition research. The reviewer’s comments may not be very far off. When it comes to replication of these findings, whether the experimenter is a “believer” or not tracks very closely with whether the precognition effects were replicated or not.

In the end, I went into the talk ready for some real drama and Bem’s talk, actually, came off quite reasonable. He presented the findings he had in a very “matter of fact” way and then strongly encouraged replication. He makes all of his experimental materials available online. I’m surprised that the null results were received so poorly (given the response of the field to the original JPSP paper), but I’m not surprised that you got the feedback you did re: experimenter effects.

Additionally, now that I have read over some of the other comments: is it that surprising that psych journals have a habit of not publishing null results, given the possibility of Type II errors in the field? Alpha is always set at .05 or lower, but beta is always left free to vary. It’s hard to take null results seriously when most studies are severely underpowered.

I haven’t read your paper, but I wonder whether you did a power analysis to make a strong argument for “proving” the null hypothesis. Did you try looking at the effect size of your results and arguing that it would take X thousand more participants to make it significant? This wouldn’t get around the “experimenter effects” argument, but it would make your null effects more believable.
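The point about beta being left free to vary can be made concrete with a back-of-the-envelope sample-size calculation. This is a sketch using the standard normal approximation for a two-sided, two-sample test at the conventional alpha = .05 and power = .80; the helper `n_per_group` is my own illustration, not anything from the papers under discussion:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a
    standardized effect size d (Cohen's d) in a two-sided
    two-sample comparison, via the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "small" effect (d = 0.2) already needs ~393 participants per group,
# and halving the effect size roughly quadruples the requirement:
print(n_per_group(0.2))  # 393
print(n_per_group(0.1))  # 1570
```

Since the effect sizes claimed in the precognition literature are small, a typical study with a few dozen participants per condition has power far below .80, which is exactly why an isolated null (or positive) result says so little on its own.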

As an aside, I have found it very interesting to look at the recent precognition debate in psychology through the lens of how the field of physics has changed its view on the possibility of time travel in recent years. Not so long ago, physicists who thought time travel was possible were considered crackpots, but now it seems to be a generally accepted idea that it may be possible (says the non-physicist). I think it provides an interesting take on the recent precog findings in psychology. Perhaps precog research ought to still be relegated to the fringes of psychology, but maybe it shouldn’t be written off completely just yet.

Timothy Scriven @24 – any more info about the steps taken in health sciences?

QB @41 – thanks. I notice they seem to have messed the science up a bit by combining (at least) two distinct though connected desired behaviours: 1. ‘disclose their own raw data and methods’ and 2. ‘spend time replicating others’ work’.

Also not really clear why respondents have to pick only one option rather than rank the options or score them for effectiveness.