Can Peer Review Be Better Focused?

Paul Ginsparg
Cornell University

Abstract:
If we were to start from scratch today to design a quality-controlled archive
and distribution system for scientific and technical information, it could take
a very different form from what has evolved in the past decade from
pre-existing print infrastructure. Recent technological advances could provide
not only more efficient means of accessing and navigating the information,
but also more cost-effective means of authentication and quality control.
I discuss relevant experiences of the past decade from open electronic
distribution of research materials in physics and related disciplines,
and describe their implications for proposals to improve the implementation of
peer review.

There has been much recent discussion of free access to the on-line scholarly
literature. It is argued that this material becomes that much more valuable
when freely accessible [1], and moreover that it is in public
policy interests to make the results of publicly funded research freely
available as a public good [2]. It is also suggested that this could ultimately
lead to a more cost-efficient scholarly publication system. The
response of the publishing community has been that their editorial processes
provide an essential service to the research community, that these are
labor-intensive and hence costly, and that even if delayed, free access could
impair their ability to support these operations. (Or, in the case of
commercial publishers, reduce revenues to below the profit level necessary to
satisfy their shareholders or investors.) Informal surveys (e.g., [3]) of
medium- to large-scale publishing operations suggest a wide range in revenues
per article published, from the order of $1000/article to more than
$10,000/article. The smaller numbers typically come from non-profit operations
that provide a roughly equivalent level of service, and hence are more likely
representative of the actual costs associated with peer-reviewed publication. Even
some of these latter operations are more costly than might ultimately be
necessary, due to the continued need to support legacy print distribution, but
the savings from eliminating print and going to an all-electronic in-house
work-flow are estimated for a large non-profit publisher to be at most on the
order of 30% [4]. The majority of the expenses are for the non-automatable
editorial oversight and production staff:
labor expenses that are not only unaffected by the new technology but that also
increase faster than the overall inflation rate in developed countries.

A given journal could conceivably reduce its costs by choosing to consider
fewer articles, but this would not reduce costs in the system as a whole,
presuming the same articles would be considered elsewhere.
If a journal instead considers the same number of articles, but
publishes fewer by reducing its acceptance rate, this results not only in an
increased cost per published article for that journal,
but also in an increased cost for the system as a whole, since the rejected
articles resubmitted elsewhere will typically generate editorial costs at other
journals. Moreover, in this case there is yet an additional hidden cost to the
research community, in the form of redundant time spent by referees, time
typically neither compensated nor accounted for.
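
To make the arithmetic concrete, here is a minimal sketch in Python, under the
simplifying assumptions of a fixed editorial cost per submission considered and
of every rejected article being resubmitted to a journal with the same
acceptance rate; the dollar figure is purely illustrative:

    # Illustrative only: assumes a fixed editorial cost per submission
    # considered, regardless of whether the article is ultimately accepted.
    COST_PER_PASS = 500.0  # dollars per editorial pass (hypothetical figure)

    def expected_passes(acceptance_rate: float) -> float:
        """Expected number of editorial passes per article, if every rejection
        is resubmitted to another journal with the same acceptance rate
        (a simple geometric model)."""
        return 1.0 / acceptance_rate

    for rate in (0.8, 0.5, 0.25):
        passes = expected_passes(rate)
        print(f"acceptance rate {rate:.0%}: {passes:.1f} editorial passes, "
              f"system cost ${passes * COST_PER_PASS:,.0f} per published article")

Lowering the acceptance rate raises both figures, and each additional pass also
consumes the uncompensated referee time noted above.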

One proposal to continue funding the current peer-review editorial system is to
move entirely from the subscription model to an "author-subsidy" model, in
which authors or their institutions pay for the material, either when submitted
or when accepted for publication, and the material is then made freely
available to readers. While such a system may prove workable in the long run,
it is difficult to impress upon authors the near-term advantages of moving in
that direction. From the institutional standpoint, it would also mean that
institutions that produce a disproportionate amount of quality research would
pay a greater percentage of the costs. Some could consider this unfair,
though in the long-term a fully reformed and less expensive scholarly
publication system should nonetheless offer real savings to those
institutions, since they already carry the highest costs in the subscription
model. Another short-term difficulty with implementing such a system is the
global nature of the research enterprise, in which special dispensation might
be needed to accommodate researchers in developing countries, operating on
lower funding scales. Correcting this problem could entail some form of progressive
charging scheme and a proportionate increase in the charges to authors in
developed countries, increasing the psychological barrier to moving towards an
author-subsidy system. (The other resolution to the problem of unequal
resources -- moving editorial operations to developing countries to take
advantage of reduced labor costs -- is probably not feasible, though it is
conceivable that some of the production could be handled remotely.) A system
in which editorial costs are truly compensated equitably would also involve a
charge for manuscripts that are rejected (sometimes these require even more
editorial time than those accepted), but implementing that is also logistically
problematic.

The question is: if we were not burdened with the legacy print
system and its associated methodology, what system would we design today for
our scholarly communications infrastructure? Do the technological advances of
the past decade suggest a new
methodology that provides greater utility to the research enterprise at the
same or lower cost?

My own experience as a reader, author, and referee in physics suggests that
current peer review methodology in this field strives to fulfill roles for two
different timescales: to provide a guide to expert readers (those well-versed
in the discipline) in the short-term,
and to provide a certification imprimatur for the long-term. But as I'll argue
further below, the attempt to perform both functions in one step
necessarily falls short on both timescales: too slow for the former,
and not stringent enough for the latter. The considerations that follow here
apply primarily to those many fields of research publication in which the
author, reader, and referee communities essentially coincide. A slightly
different discussion would apply for journal publication in which the
reader community greatly outnumbers the author community, or vice versa.

Before considering modifications to the current peer review system, it's
important to clarify its current role in providing
publicity, prestige, and readership to authors. Outsiders to the system are
sometimes surprised to learn that peer-reviewed journals do not certify
correctness of research results. Their somewhat weaker evaluation is
that an article is a) not obviously wrong or incomplete, and b) potentially
of interest to readers in the field.
The peer review process is also not designed to detect fraud, plagiarism,
or a number of associated problems -- those are all left to posterity to
correct. In many fields, journal publication dates are also used to stake
intellectual property rights (indeed their original defining function [5]).
But since the journals are not truly certifying
correctness, alternate forms of public distribution
that provide a trustworthy datestamp can equally serve this role.

When faculty members are polled formally or informally regarding peer review,
the response is frequently along the lines of "Yes, of course, we need it
precisely as currently constituted, because it provides a quality control
system for the literature, signalling important contributions, and hence is
necessary for deciding job and grant allocations."
But this conclusion relies on two very strong implicit assumptions:
a) that the necessary signal results directly from the peer review process
itself, and b) that the signal in question could only result from
this process. The question is not whether we still need to facilitate
some form of quality control on the literature; it is instead whether,
given the emergence of new technology and dissemination methods in the past
decade, the current implementation of peer review is still the most effective
and efficient means of providing the desired signal.

Appearance in the peer-reviewed journal literature certainly does not
provide sufficient signal: otherwise there would be no need to
supplement the publication record with detailed letters of recommendation and
other measures of importance and influence. On the other hand, the detailed
letters and citation analyses would be sufficient for the above
purposes, even if applied to a literature that had not undergone
that systematic first editorial pass through a peer review system.
This exposes one of the hidden assumptions in the above: namely that
peer-reviewed publication is a prerequisite to entry into a system
that supports archival availability and other functions such as citation
analysis. That is no longer necessarily the case.
(Another historical argument for journal publication is
that funding agencies require publication as a tangible result of research
progress, but once again there are now alternate distribution mechanisms to
make the results available, with other potential supplemental means of
measuring impact.)

There is much concern about tampering with a system that has evolved over much
of the past century, during which time it has served a variety of essential
purposes. But the cat is already out of the bag: alternate electronic archive
and distribution systems are already in operation, and others are under
development. Moreover, library acquisition budgets are unable to keep pace
even with the price increases from the non-profit sector. It is therefore both
critical and timely to consider whether modifications of existing methodology
can lead to a more functional or less costly system for research communication.

It is also useful to bear in mind that much of the current entrenched
methodology is largely a post World War II construct, including both the
large-scale entry of commercial publishers and the widespread use of peer review
for mass-production quality control. It is estimated that STM (Scientific,
Technical, and Medical) primary publishing generates well over $8 billion/year
in revenues, for somewhere on the order of 1.5-2 million articles
published per year. If non-profit operations had the capacity to handle the
entirety, and if they could continue to operate in the $500-$1500 revenue per
published article range, then with no other change in methodology there might
be an immediate 75% savings in the system, releasing well over $5 billion
globally. (While it is not likely that institutions supporting the current
scholarly communications system would suddenly opt to reduce their overhead
rates, at least their rate of increase might be slowed for a while, as the
surplus is absorbed to support other necessary functions.) The commercial
publishers stepped in to fulfill an essential role during the post World War II
period, precisely because the non-profits did not have the requisite capacity
to handle the dramatic increase in STM publishing with then-available
technology. An altered methodology based on the electronic communications
networks that evolved through the 1990's could prove more scalable to
larger capacity. In this case, the technology of the 21st century
would allow the traditional players from a century ago, namely the professional
societies and institutional libraries, to return to their dominant role in
support of the research enterprise.
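
The rough arithmetic behind this savings estimate can be made explicit; the
following back-of-the-envelope calculation simply uses mid-range values of the
figures quoted above:

    # Back-of-the-envelope check of the savings estimate quoted above.
    stm_revenues = 8.0e9        # well over $8 billion/year in STM publishing
    articles_per_year = 1.75e6  # mid-range of the 1.5-2 million quoted
    nonprofit_cost = 1000.0     # mid-range of the $500-$1500/article quoted

    current_avg = stm_revenues / articles_per_year        # ~ $4600/article
    nonprofit_total = nonprofit_cost * articles_per_year  # ~ $1.75 billion
    savings = stm_revenues - nonprofit_total              # ~ $6.25 billion

    print(f"current average revenue: ${current_avg:,.0f}/article")
    print(f"all-non-profit total:    ${nonprofit_total / 1e9:.2f} billion/year")
    print(f"released globally:       ${savings / 1e9:.2f} billion/year "
          f"({savings / stm_revenues:.0%} of current revenues)")

These mid-range inputs reproduce the roughly 75% savings, well over $5 billion,
cited above.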

The arXiv [6] is an automated distribution system for research articles,
without the editorial operations associated with peer review.
As a pure dissemination system, i.e., without peer review, it operates at a
cost a factor of 100 to 1000 lower than that of a conventionally peer-reviewed
system [3]. This is the real lesson of the move to electronic formats and
distribution: not that everything should somehow be free, but that with
many of the production tasks automatable or off-loadable to the authors, the
editorial costs will then dominate the costs of an unreviewed distribution
system by orders of magnitude.
This is the subtle difference from the paper system, in which
the expenses directly associated with print production and distribution were
roughly the same order of magnitude as the editorial costs. When the two were
comparable in cost, it wasn't as essential to ask whether the production and
dissemination system should be decoupled from the intellectual authentication
system. Now that the former may be feasible at a cost of less than
1% of the latter, the unavoidable question is whether the utility provided by
the latter, in its naive extrapolation to electronic form,
continues to justify the associated time and expense. Since many communities
rely in an essential way on the structuring of the literature provided by the
editorial process, a first related question is whether some hybrid methodology
might provide all of the benefits of the current system, but for
a cost somewhere in between the greater than $1000/article cost
of current editorial methodology and the less than $10/article cost of a pure
distribution system. A second question is whether a hybrid methodology might
also be better optimized for the differing needs, on differing timescales, of
expert readers on the one hand and neophytes on the other.

The arXiv was initiated in 1991, before any physics journals were
on-line. Its original intent was not to supplant journals, but to provide equal
and uniform global access to prepublication materials (originally it was only
to have had a three-month retention time). Due to the multi-year period
from '91 until established journals came on-line en masse, the
arXiv de facto took on a much larger role, by providing the sole on-line
platform for near-term (5-10 yr) "archival" access. Electronic offerings have
of course become commonplace since the early 1990's:
many publishers now put new material on-line in e-first mode, and the
searchability, internal reference linking, and viewable formats they provide
are at least as good as those of the automated arXiv. These conventional
publishers are also set up to provide superior services wherever manual
oversight, at additional cost, can improve on the author's product: e.g.,
correcting bibliographic errors and standardizing the front- and back-matter
for automated harvesting. (Some of these costs may ultimately decline or
disappear, however, with a more standardized "next-generation" document format,
and improved authoring tools to produce it -- developments from which automated
distribution systems will benefit equally.)

We can now consider the current roles of the arXiv and of the on-line physics
journals and assess their overlap. Primarily, the arXiv provides instant
pre-review dissemination, aggregated on a field-wide basis, a breadth far
beyond the capacity of any one journal. The journals augment this with some
measure of authentication of authors (they are who they claim to be), and a
certain amount of quality control of the research content. This latter, as
mentioned, provides at least the minimum certification of "not obviously
incorrect, not obviously uninteresting"; and in many cases provides more than
that, e.g., those journals known to have higher selectivity convey an
additional measure of short-term prestige. Both the arXiv and the journals
provide access to past materials; and one could argue that arXiv benefits in
this regard from the post facto certification functions provided by the
journals. It is occasionally argued that organized journals may be able to
provide a greater degree of long-term archival stability, both in aggregate and
for individual items, though looking a century or more into the future,
this is difficult to project one way or the other.

With conventional overlapping journals having made so much on-line progress,
does there remain a continued role for the arXiv, or is it on the verge of
obsolescence?
Informal polls of physicists suggest that it remains unthinkable
to discontinue the resource, that it would simply have to be reinvented because
it plays some essential role not fulfilled by any other. Hard statistics
substantiate this: over 20 million full text downloads during
calendar year '02, on average the full text of each submission downloaded over
300 times in the 7 years from '96-'02, and some downloaded tens of
thousands of times. The usage is significantly higher than that of comparable on-line
journals in the field, and, most importantly, the access numbers have
accelerated upwards as the conventional journals have come on-line over the
past seven years. This is not to suggest, however, that physicist users are in
favor of rapid discontinuation of the conventional journal system either.

What then is so essential about the arXiv to its users? The immediate
answer is "Well, it's obvious. It gives instant communication,
without having to wait a few months for the peer review process."
Does that mean that one should then remove items after some fixed time
period? The answer is still "No, it remains incredibly useful as a
comprehensive archival aggregator," i.e., a place where for certain fields
instead of reading any particular journal, or set of journals, one can browse
or search and be certain that the relevant article is there, and if it's not
there it's because it doesn't exist.
(This latter archival usage is more problematic for the
refereed journals, since the free availability could undercut the
subscription-based financial models -- presuming the author-provided version
is functionally indistinguishable from the later journal version).

It has been remarked [7] that physicists use the arXiv site and do not appear
concerned that the papers on it are not refereed. The vast majority of
submissions are nonetheless submitted in parallel to conventional journals (at
no "cost" to the author), and those that aren't are most frequently items such
as theses or contributions to conference proceedings that nonetheless have
undergone some effective form of review. Moreover, the site has never been a
random UseNet newsgroup-like free-for-all. From the outset, a variety of
heuristic screening mechanisms have been in place to ensure insofar as possible
that submissions are at least of refereeable quality. That means they
satisfy the minimal criterion that they would not be peremptorily rejected by
any competent journal editor as nutty, offensive, or otherwise manifestly
inappropriate, and would instead at least in principle be
suitable for review (i.e., without the risk of alienating or wasting the time
of a referee, that essential unaccounted resource). These mechanisms
are an important -- if not essential -- component of why readers find the site
so useful: though the most recently submitted articles have not yet necessarily
undergone formal review, the vast majority of the articles can, would, or do
eventually satisfy editorial requirements somewhere. Virtually all are in that
grey area of decidability, and virtually none are entirely useless to active
physicists. That is probably why expert arXiv readers are eager and willing to
navigate the raw deposited material, and greatly value the accelerated
availability over the filtering and refinement provided by the journal
editorial processes (even as little as a few months later).

The idea of using prior electronic distribution to augment the referee
process goes back at least to [8]. Proposals along the lines of decoupling
peer review from arXiv distribution can be found in [9], and the notion of
"overlay" journals is further discussed in [6],[10].
A review of various "decoupling" and "author subsidy" models proposed in the
mid to late 1990's, taking advantage of new technology to implement
improvements in research communication, can be found in [11].
(Note, in particular, the "eprint moderator model" [12], intended to reduce
costs by reducing the amount of material distributed in a commercial manner.)
Recent experience in physics and related disciplines continues to reinforce the
desirability of experimentation within this model space, with the expectation
that similar implementations will prove feasible in other disciplines.

According to the observations above, the role of refereeing may be over-applied
at present, insofar as it puts all submissions above the minimal criterion
through the same uniform filter.
The observed behavior of expert readers indicates that they don't
value that extra level of filtering above their preference for instant
availability of material "of refereeable quality." Non-expert readers typically
don't need the availability on the timescale of a few months, but do eventually
need a much higher level of selective filtering than is provided on the short
timescale. Expert readers as well could benefit on a longer timescale (say a
year or longer) from more stringent selection criteria, for the simple reason
that the literature of the past decade is always much larger than the
"instantaneous" literature. More stringent criteria on the longer timescale
would also aid significantly in the job and grant evaluation functions, for
which signal on the year-or-more timescale remains sufficiently timely. More
stringent evaluation could then play a far greater role, relative to external
letters and citation analyses, than peer-reviewed publication currently does.

Can these considerations be translated into either a more functional or more
cost-effective peer review system? As already discussed, editorial costs
cannot be reduced by adopting a lower acceptance rate on some longer timescale
if the same number of submissions is considered as at present, under the
current methodology.
Instead, the simplest proposal is a two-tier system, in which on a first pass
only some cursory examination or other pro forma certification is given for
acceptance into a standard tier. This could be minimally labor-intensive,
perhaps relying primarily on an automated check of author institutional
affiliation, prior publication record, research grant status, or other related
background; and involve human labor primarily to adjudicate incomplete or
ambiguous results of an automated pass.
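
A minimal sketch of what such an automated first pass might look like follows;
the particular signals, weighting, and routing are hypothetical illustrations,
not a description of any deployed system:

    from dataclasses import dataclass

    @dataclass
    class Submission:
        # Hypothetical automated signals for a standard-tier check.
        affiliation_recognized: bool  # known institutional affiliation
        prior_publications: int       # prior publication record in the field
        active_grant: bool            # research grant status

    def standard_tier_check(sub: Submission) -> str:
        """Cursory, pro forma certification for the standard tier: accept when
        enough automated signals agree; otherwise refer to a human editor to
        adjudicate. Nothing is rejected without human involvement."""
        signals = sum([sub.affiliation_recognized,
                       sub.prior_publications > 0,
                       sub.active_grant])
        return "accept" if signals >= 2 else "refer to human"

On this model, human labor is concentrated on the ambiguous residue of the
automated pass, as described above.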

Then at some later point (which could vary from article to article, perhaps
with no time limit), a much smaller set of articles would be selected for the
full peer review process. The initial selection criteria for this smaller set
could be any of a variety of impact measures, to be determined, and based
explicitly on their prior widespread and systematic availability and
citability: e.g., reader nomination or rating, citation impact, usage
statistics, editorial selection, and the like. The instructions to expert reviewers
would be similar to those now, based on quality, originality, and significance
of research, degree of pedagogy (for review articles), and so on. The
objective would be greater efficiency, by focusing the comprehensive process not
only on a smaller subset, but also on one with a higher likely acceptance rate.
These are the articles most likely to be archivally useful, and hence merit the
enhanced editorial treatment for upgrade into the upper tier, including, for
example, text clarifications and other improvements. This would also reduce
the inefficient expenditure of community intellectual resources on articles
that may not prove as useful in the long-term. Upper tier enhancements could
include anything from a thorough blind refereeing to open professional
annotation and comment. The upper tier could also combine commentary on many
related papers at once. The point is that it's possible to provide more signal
of various sorts to users on a smaller subset of articles, without worry about
fairness issues of limited dissemination for the rest, and this can be done at
lower overall cost than the current system, both in time spent by editors and
in elective time spent by referees.
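
How such impact measures might be combined to select the smaller set for full
review can likewise be sketched; the metric names, weights, and threshold below
are purely illustrative assumptions, standing in for whatever measures are
eventually determined:

    # Illustrative combination of impact measures for upper-tier selection;
    # the weights and threshold are assumptions, not a calibrated model.
    WEIGHTS = {"citations": 0.5, "downloads": 0.3,
               "reader_nominations": 0.15, "editorial_flag": 0.05}
    THRESHOLD = 0.5

    def impact_score(metrics: dict) -> float:
        """Weighted sum of impact measures, each normalized to [0, 1]."""
        return sum(w * metrics.get(name, 0.0) for name, w in WEIGHTS.items())

    def nominate_for_full_review(standard_tier: list) -> list:
        """Select the smaller subset, with a higher likely acceptance rate,
        to receive the full peer review and upper-tier treatment."""
        return [article for article in standard_tier
                if impact_score(article["metrics"]) >= THRESHOLD]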

The standard tier would provide a rapid distribution system only marginally
less elite than much of the current publication system, and enormously useful
to readers and authors. Articles needn't be removed from the standard tier, and
could persist indefinitely in useful form (just as currently in the arXiv),
available via search interfaces and for archival citation -- in particular,
they would remain no less useful than had they received some cursory glance
from a referee. Rapid availability would also be useful for fields in which
the time to publication is perceived to be too long. The standard tier
availability could also be used to collect confidential commentary from
interested readers so that eventual referees would have access to a wealth of
currently inaccessible information held by the community, and help to avoid
duplication of effort. In addition, articles that garner little attention at
first, or are rejected due to overly restrictive policies, only to be properly
appreciated many years later, would not be lost in the short-term, and could
receive better long-term treatment in this sort of system. Various gradations,
e.g., appearance in conference proceedings, would also automatically appear in
the standard tier and provide additional short-term signal occasionally useful
to non-expert readers.

The precise criteria for entry into the standard tier would depend on its
architecture.
Adaptable criteria could apply if it is some federation of institutional and
disciplinary repositories. The institutional repositories could rely on
some form of internal endorsement, while the disciplinary aggregates could
rely either on affiliation or on prior established credentials ("career review"
[13] as opposed to "peer review"). Alternate entry paths for new participants,
such as referrals from prior credentialed participants or direct appeal for
cursory editorial evaluation (not full-fledged peer review), would also be
possible. The essential idea is to facilitate communication within the
recognized research community, without excessive noise from the exterior [9].
While multiple logically independent (though potentially overlapping
[3]) upper tiers could naturally evolve, only a single globally held standard
tier is strictly necessary, with of course any necessary redundancy for full
archival stability. Suitable licensing procedures or copyright retention [2]
to facilitate such a system are consistent with the spirit of copyright law,
"To promote the Progress of Science and useful Arts" (for a recent discussion,
see [14]).

At the second stage, it might also be feasible and appropriate for the referees
and/or editor to attach some associated text explaining the context of the
article and the reasons for its importance. Expert opinion could be used not
only to guide readers to the important papers on the subject, but also to
guide readers through them. This would be a generalization of review volumes,
potentially more timely and more comprehensive. It could include both suggested
linked paths through the literature to aid in understanding an article, and
links to subsequent major works and trends to which an
article later contributed. This latter citation tree could be frozen at the
time of the refereeing of the article, or could be maintained retroactively for
the benefit of future readers. Such an overlay guide to the "primary"
literature could ultimately be the most important publication function provided
by professional societies. It might also provide the basis of the future
financial model for the second stage review process, possibly a combination of
subscription (electronic, or even print if desired) and upper tier author
subsidy. It could subsidize part of the cost of the less selective "peer
reviewable" designations in the first stage for those lacking institutional
credentials, perhaps together with a first stage "editorial fee" far smaller
than for the later full editorial process.

As just one partial existence proof for elements of this system, consider
Mathematical Reviews, published by the American Mathematical
Society. It provides a comprehensive set of reviews of the entire mathematical
literature and is an invaluable resource to mathematicians. It currently
considers on the order of 100,000 articles per year, and chooses to review
approximately 55,000 of those, at a rough overall effective editorial cost of
under $140 per review [15]. The expenses include a nominal payment to
reviewers, and also curation and maintenance of historical bibliographic data
for the discipline. (The mathematician Greg Kuperberg has also commented that
"Math Reviews and Zentralblatt are inherently more useful forms of peer review"
[16], though he observes ironically that their publishers do not share this
conviction.)
Mathematical Reviews uses as its information feed a canonical set of
conventional mathematics journals. In the future, such an operation could
conceivably use some equally canonicalized cross-section taken from a standard
tier of federated institutional and disciplinary repositories, containing
material certified to be "of peer reviewable quality." While not all upper-tier
systems need to aspire to such disciplinary comprehensiveness, this does provide
an indication that they can operate usefully at an order of magnitude lower
cost than conventionally peer reviewed journals.
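
The implied scale of such an operation is easily checked against the figures
just quoted:

    # Rough check of the Mathematical Reviews figures quoted above.
    reviewed_per_year = 55_000  # approximate reviews published per year
    cost_per_review = 140.0     # rough overall effective editorial cost

    annual_cost = reviewed_per_year * cost_per_review  # about $7.7 million
    conventional_low_end = 1000.0  # low end of conventional per-article cost
    print(f"annual editorial cost: ${annual_cost / 1e6:.1f} million")
    print(f"per-item cost ratio vs conventional journals: "
          f"{conventional_low_end / cost_per_review:.0f}x lower")

Even against the low end of conventional per-article costs, the per-item cost
is roughly an order of magnitude lower.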

The modifications described here are intended as a starting point for
discussion of how recent technological advances could be used to improve
the implementation of peer review. They are not intended to be revolutionary,
but sometimes a small adjustment, with seemingly limited conceptual content,
can have an enormous effect. In addition, these modifications could be
undertaken incrementally, with the upper tier created as an overlay on the
current publication base, working in parallel with the current system. Nothing
would be jeopardized, and any new system could undergo a detailed efficacy
assessment that many current implementations of peer review have either evaded
or failed.

Acknowledgements: I thank David Mermin, Jean-Claude Guédon,
Greg Kuperberg, Andrew Odlyzko, and Paul Houle for comments. This text evolved
from discussions originally with an American Physical Society publications
oversight subcommittee on peer review, on which I served in early 2002 along
with Beverly Berger, Mark Riley, and Katepalli Sreenivasan.

[4] This estimate is for the American Physical Society, which publishes over
14,000 articles per year, and derives from figures discussed with its
publications oversight committee.
The percentage estimated for other publishing operations will vary,
especially when editorial time and overhead are differentially accounted.
In the discussion that follows, however, it matters only that there will be
no windfall savings to publishers from going all-electronic, while
employing the same overall labor-intensive methodology.