Three myths about scientific peer review

by Michael Nielsen on January 8, 2009

What’s the future of scientific peer review? The way science is communicated is currently changing rapidly, leading to speculation that the peer review system itself might change. For example, the wildly successful physics preprint arXiv is only very lightly moderated, which has led many people to wonder if the peer review process might perhaps die out, or otherwise change beyond recognition.

I’m currently finishing up a post on the future of peer review, which I’ll post in the near future. Before I get to that, though, I want to debunk three widely-believed myths about peer review, myths which can derail sensible discussion of the future of peer review.

A brief terminological note before I get to the myths: the term “peer review” can mean many different things in science. In this post, I restrict my focus to the anonymous peer review system scientific journals use to decide whether to accept or reject scientific papers.

Myth number 1: Scientists have always used peer review

The myth that scientists adopted peer review broadly and early in the history of science is surprisingly widely believed, despite being false. It’s true that peer review has been used for a long time – a process recognizably similar to the modern system was in use as early as 1731, in the Royal Society of Edinburgh’s Medical Essays and Observations (ref). But in most scientific journals, peer review wasn’t routine until the middle of the twentieth century, a fact documented in historical papers by Burnham, Kronick, and Spier.

Let me give a few examples to illustrate the point.

As a first example, we’ll start with the career of Albert Einstein, who wasn’t just an outstanding scientist, but was also a prolific scientist, publishing more than 300 journal articles between 1901 and 1955. Many of Einstein’s most ground-breaking papers appeared in his “miracle year” of 1905, when he introduced new ways of understanding space, time, energy, momentum, light, and the structure of matter. Not bad for someone unable to secure an academic position, and working as a patent clerk in the Swiss patent office.

How many of Einstein’s 300 plus papers were peer reviewed? According to the physicist and historian of science Daniel Kennefick, it may well be that only a single paper of Einstein’s was ever subject to peer review. That was a paper about gravitational waves, jointly authored with Nathan Rosen, and submitted to the journal Physical Review in 1936. The Physical Review had at that time recently introduced a peer review system. It wasn’t always used, but when the editor wanted a second opinion on a submission, he would send it out for review. The Einstein-Rosen paper was sent out for review, and came back with a (correct, as it turned out) negative report. Einstein’s indignant reply to the editor is amusing to modern scientific sensibilities, and suggests someone quite unfamiliar with peer review:

Dear Sir,

We (Mr. Rosen and I) had sent you our manuscript for publication and had not authorized you to show it to specialists before it is printed. I see no reason to address the in any case erroneous comments of your anonymous expert. On the basis of this incident I prefer to publish the paper elsewhere.

Respectfully,

P.S. Mr. Rosen, who has left for the Soviet Union, has authorized me to represent him in this matter.

As a second example, consider the use of peer review at the journal Nature. The prestige associated with publishing in Nature is, of course, considerable, and so competition to get published there is tough. According to Nature’s website, only 8 percent of submissions are accepted, and the rest are rejected. Given this, you might suppose that Nature has routinely used peer review for a long time as a way of filtering submissions. In fact, a formal peer review system wasn’t introduced by Nature until 1967. Prior to that, some papers were refereed, but some weren’t, and many of Nature’s most famous papers were not refereed. Instead, it was up to editorial judgement to determine which papers should be published, and which papers should be rejected.

This was a common practice in the days before peer review became widespread: decisions about what to publish and what to reject were usually made by journal editors, often acting largely on their own. These decisions were often made rapidly, with papers appearing days or weeks after submission, after a cursory review by the editor. Rejection rates at most journals were low, with only obviously inappropriate or unsound material being rejected; indeed, for some Society journals, Society members even asserted a “right” to publication, which occasionally caused friction with unhappy editors (ref).

What caused the change to the modern system of near-ubiquitous peer review? There were three main factors. The first was the increasing specialization of science (ref). As science became more specialized in the early 20th century, editors gradually found it harder to make informed decisions about what was worth publishing, even by the relatively relaxed standards common in many journals at the time.

The second factor in the move to peer review was the enormous increase in the number of scientific papers being published (ref). In the 1800s and early 1900s, journals often had too few submissions. Journal editors would actively round up submissions to make sure their journals remained active. The role of many editorial boards was to make sure enough papers were being submitted; if the journal came up short, members of the editorial board would be asked to submit papers themselves. As late as 1938, the editor-in-chief of the prestigious journal Science relied on personal solicitations for most articles (ref).

The twentieth century saw a massive increase in the number of scientists, a much easier process for writing papers, due to technologies such as typewriters, photocopiers, and computers, and a gradually increasing emphasis on publication in decisions about jobs, tenure, grants and prizes. These factors greatly increased the number of papers being written, and added pressure for filtering mechanisms, such as peer review.

The third factor in the move to peer review (ref) was the introduction of technologies for copying papers. It’s just plain editorially difficult to implement peer review if you can’t easily make copies of papers. The first step along this road was the introduction of typewriters and carbon paper in the 1890s, followed by the commercial introduction of photocopiers in 1959. Both technologies made peer review much easier to implement.

Nowadays, of course, the single biggest factor preserving the peer review system is probably social inertia: in most fields of science, a journal that’s not peer-reviewed isn’t regarded as serious, and so new journals invariably promote the fact that they are peer reviewed. But it wasn’t always that way.

Myth number 2: peer review is reliable

Update: Bill Hooker has pointed out that I’m using a very strong sense of “reliable” in this section, holding peer review to the standard that it nearly always picks up errors, is a very accurate gauge of quality, and rarely suppresses innovation. If you adopt a more relaxed notion of reliability, as many but not all scientists and members of the general public do, then I’d certainly back off describing this as a myth. As an approximate filter that eliminates or improves many papers, peer review may indeed function well.

Every scientist has a story (or ten) about how they were poorly treated by peer review – the important paper that was unfairly rejected, or the silly editor who ignored their sage advice as a referee. Despite this, many strongly presume that the system works “pretty well”, overall.

There’s not much systematic evidence for that presumption. In 2002 Jefferson et al (ref) surveyed published studies of biomedical peer review. After an extensive search, they found just 19 studies which made some attempt to eliminate obvious confounding factors. Of those, just two addressed the impact of peer review on quality, and just one addressed the impact of peer review on validity; most of the rest of the studies were concerned with questions like the effect of double-blind reviewing. Furthermore, for the three studies that addressed quality and validity, Jefferson et al concluded that there were other problems with the studies which meant the results were of limited general interest; as they put it, “Editorial peer review, although widely used, is largely untested and its effects are uncertain”.

In short, at least in biomedicine, there’s not much we know for sure about the reliability of peer review. My searches of the literature suggest that we don’t know much more in other areas of science. If anything, biomedicine seems to be unusually well served, in large part because several biomedical journals (perhaps most notably the Journal of the American Medical Association) have over the last 20 years put a lot of effort into building a community of people studying the effects of peer review; Jefferson et al’s study is one of the outcomes from that effort.

In the absence of compelling systematic studies, is there anything we can say about the reliability of peer review?

The question of reliability should, in my opinion, really be broken up into three questions. First, does peer review help verify the validity of scientific studies; second, does peer review help us filter scientific studies, making the higher quality ones easier to find, because they get into the “best” journals, i.e., the ones with the most stringent peer review; third, to what extent does peer review suppress innovation?

As regards validity and quality, you don’t have to look far to find striking examples suggesting that peer review is at best partially reliable as a check of validity and a filter of quality.

Consider the story of the German physicist Jan Hendrik Schoen. In 2000 and 2001 Schoen made an amazing series of breakthroughs in organic superconductivity, publishing his 2001 work at a rate of one paper every 8 days, many in prestigious journals such as Nature, Science, and the Physical Review. Eventually, it all seemed a bit too good to be true, and other researchers in his community began to ask questions. His work was investigated, and much of it found to be fraudulent. Nature retracted seven papers by Schoen; Science retracted eight papers; and the Physical Review retracted six. What’s truly breathtaking about this case is the scale of it: it’s not that a few referees failed to pick up on the fraud, but rather that the refereeing system at several of the top journals systematically failed to detect the fraud. Furthermore, what ultimately brought Schoen down was not the anonymous peer review system used by journals, but rather investigation by his broader community of peers.

You might object to using this as an example on the grounds that the Schoen case involved deliberate scientific fraud, and the refereeing system isn’t intended to catch fraud so much as it is to catch mistakes. I think that’s a pretty weak objection – it can be a thin line between honest mistakes and deliberate fraud – but it’s not entirely without merit. As a second example, consider an experiment conducted by the editors of the British Medical Journal (ref). They inserted eight deliberate errors into a paper already accepted for publication, and sent the paper to 420 potential reviewers. 221 responded, catching on average only two of the errors. None of the reviewers caught more than five of the errors, and 16 percent caught no errors at all.
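To put the British Medical Journal figures in proportion, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted above; the function and field names are my own, chosen for illustration, and the count of reviewers who caught nothing is an approximation derived from the quoted 16 percent rather than a figure from the study.

```python
def summarize_review_experiment(invited, responded, errors_inserted,
                                mean_errors_caught, share_catching_none):
    """Return simple proportions for an error-detection experiment.

    All names here are hypothetical; the inputs are the figures
    quoted in the text, not data taken from the study itself.
    """
    return {
        # Fraction of invited reviewers who sent back a report.
        "response_rate": responded / invited,
        # Fraction of the inserted errors a typical respondent caught.
        "mean_detection_rate": mean_errors_caught / errors_inserted,
        # Approximate head count behind the quoted percentage.
        "approx_reviewers_catching_none": round(share_catching_none * responded),
    }


bmj = summarize_review_experiment(
    invited=420,
    responded=221,
    errors_inserted=8,
    mean_errors_caught=2,
    share_catching_none=0.16,
)

for name, value in bmj.items():
    print(f"{name}: {value:.2f}")
```

On these figures, a little over half of the invited reviewers responded, the typical respondent caught a quarter of the inserted errors, and roughly 35 respondents caught none.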

None of these examples is conclusive. But they do suggest that the refereeing system is far from perfect as a means of checking validity or filtering the quality of scientific papers.

What about the suppression of innovation? Every scientist knows of major discoveries that ran into trouble with peer review. David Horrobin has a remarkable paper (ref) where he documents some of the discoveries almost suppressed by peer review; as he points out, he can’t list the discoveries that were in fact suppressed by peer review, because we don’t know what those were. His list makes horrifying reading. Here are just a few instances that I find striking, drawn in part from his list. Note that I’m restricting myself to suppression of papers by peer review; I believe peer review of grants and job applications probably has a much greater effect in suppressing innovation.

George Zweig’s paper announcing the discovery of quarks, one of the fundamental building blocks of matter, was rejected by Physical Review Letters. It was eventually issued as a CERN report.

Berson and Yalow’s work on radioimmunoassay, which led to a Nobel Prize, was rejected by both Science and the Journal of Clinical Investigation. It was eventually published in the Journal of Clinical Investigation.

Krebs’ work on the citric acid cycle, which led to a Nobel Prize, was rejected by Nature. It was published in Enzymologia.

Wiesner’s paper introducing quantum cryptography was initially rejected, finally appearing well over a decade after it was written.

To sum up: there is very little reliable evidence about the effect of peer review available from systematic studies; peer review is at best an imperfect filter for validity and quality; and peer review sometimes has a chilling effect, suppressing important scientific discoveries.

At this point I expect most readers will have concluded that I don’t much like the current peer review system. Actually, that’s not true, a point that will become evident in my post about the future of peer review. There’s a great deal that’s good about the current peer review system, and that’s worth preserving. However, I do believe that many people, both scientists and non-scientists, have a falsely exalted view of how well the current peer review system functions. What I’m trying to do in this post is to establish a more realistic view, and that means understanding some of the faults of the current system.

Myth number 3: peer review is the way we determine what’s right and wrong in science

By now, it should be clear that the peer review system must play only a partial role in determining what scientists think of as established science. There’s no sign, for example, that the lack of peer review in the 19th and early 20th century meant that scientists then were more confused than now about what results should be regarded as well established, and what should not. Nor does it appear that the unreliability of the peer review process leaves us in any great danger of collectively coming to believe, over the long run, things that are false.

In practice, of course, nearly all scientists understand that peer review is only part of a much more complex process by which we evaluate and refine scientific knowledge, gradually coming to (provisionally) accept some findings as well established, and discarding the rest. So, in that sense, this third myth isn’t one that’s widely believed within the scientific community. But in many scientists’ shorthand accounts of how science progresses, peer review is given a falsely exaggerated role, and this is reflected in the understanding many people in the general public have of how science works. Many times I’ve had non-scientists mention to me that a paper has been “peer-reviewed!”, as though that somehow establishes that it is correct, or high quality. I’ve encountered this, for example, in some very good journalists, and it’s a concern, for peer review is only a small part of a much more complex and much more reliable system by which we determine what scientific discoveries are worth taking further, and what should be discarded.

Further reading

I’m writing a book about “The Future of Science”; this post is part of a series where I try out ideas from the book in an open forum. A summary of many of the themes in the book is available in this essay. If you’d like to be notified when the book is available, please send a blank email to the.future.of.science@gmail.com with the subject “subscribe book”. I’ll email you to let you know in advance of publication. I will not use your email address for any other purpose! You can subscribe to my blog here.
