The Scholarly Journal in Transition and the PubMed Central Proposal

Submitted by editor on 23 September 1999 - 12:00am

Michael Day discusses the scholarly journal in transition and the PubMed Central proposal.

In my opinion, there is no real question that completely paperless systems will emerge in science and in other fields. The only real question is "when will it happen?" We can reasonably expect, I feel, that a rather fully developed electronic information system ... will exist by the year 2000, although it could conceivably come earlier. F. Wilfrid Lancaster (1978) [1]

Predicting the future is very dangerous. Wilfrid Lancaster's 21-year-old comment may seem unduly optimistic when it mentions the arrival of 'completely paperless' systems, but the Internet and the World Wide Web would appear (almost) to be his 'fully developed electronic information system'. The Internet is increasingly being used for scholarly and scientific communication, and it is often suggested that its use for the dissemination of research findings may mean the end of the traditional printed scholarly journal.

An important debate on these issues was triggered when, in March 1999, Harold Varmus, Nobel Prize winner and director of the US National Institutes of Health (NIH), began to circulate a proposal for an online service that would give free access to published material in the biomedical sciences. A more detailed proposal was published in April and appeared on the NIH Web site on the 5 May, with an addendum published on the 20 June [2]. Following extensive discussion and debate, a service to be known as 'PubMed Central' was announced on the 30 August [3].

This paper will take a brief look at the development and role of printed journals and at how developments in electronic communication may affect them. It will then look in more detail at the NIH proposal and some of the objections it received. The paper will end with some discussion of the role of libraries in this new environment.

Journals and scientific communication

Historical background

It must first be noted that printed journals have played a distinguished role in scientific communication for over three hundred and thirty years. The first scientific periodicals (the Journal des sçavans and the Philosophical Transactions of the Royal Society) appeared in 1665 and provided a practical way for scientists of the day to communicate with each other. Throughout the eighteenth and early nineteenth centuries, the nature of journals slowly changed, resulting in a relative decline in the importance of learned society proceedings, and the successful creation of more specialised journals, reflecting the fragmentation of knowledge into more specialised disciplines [4]. By the end of the nineteenth century, features like the peer-review of submitted papers had begun to become standard in many disciplines. Scholarly and scientific papers published in peer-reviewed journals remain what Maurice Line calls "the established medium of record and dissemination" [5].

The role of printed journals

Despite this, there is a growing feeling that in the Internet age the printed journal may not be able to survive as a primary means of scholarly communication for much longer. The Internet offers, for example, an extremely convenient and efficient way to disseminate scholarly papers to their audience.

Research projects back in the 1980s first proved the technical feasibility of electronic journals [6]. However, they failed to 'take off' at the time because computer networks were immature, user interfaces were poor and developers only had a vague idea about the different functions that the printed journal carries out. Too much time was spent considering the integration of new interactive or multimedia features rather than considering the real reasons why authors publish in existing journals, and why libraries and other consumers are prepared to spend large amounts of money on them. For example, Cliff McKnight has recently commented that unless readers (and by extension authors) can do "at least the same things - and preferably more - with electronic journals as they do with paper, what incentive is there for them to change?" [7]. We need, therefore, to remind ourselves why the printed journal has been such a successful part of scholarly communication. It is, in part, because the peer-reviewed journal has fulfilled a number of different requirements. The following list is based on those identified by Fytton Rowland in an Ariadne article published in 1997 [8].

Dissemination - publication of a paper in a peer-reviewed journal allows an author to disseminate important research findings to the wider research community and beyond. It is important to recognise, however, that the content of many published papers will previously have been discussed informally, reported on at conferences and distributed as pre-prints.

Quality control - editorial processes (when consistently applied) can help to ensure a high written standard of papers, but the core quality control process is the peer-review of all submitted papers.

Establishing priority - one of the most important functions of the printed journal, especially in the STM disciplines, is to be able to establish priority over a particular discovery or advance. Jack Meadows considers this to be the 'basic motivation' of authors in many cases, and much more important than being read or cited by their peers [9]. Frank Close, in his book on the 1989 cold fusion controversy, agrees [10]:

Usually in science there is a great pressure to be first, to win the race and gain the honour of discovery. That honour requires acceptance by the community of science, which in turn needs refereed publication of all the details necessary for the successful replication of the discovery by other scientists. Only then will the claimed discovery be agreed upon and the credits come your way. All research is geared towards eventual publication; gaining funding to support your research is arguably the only venture whose urgency approaches that of getting the results onto paper and staking a claim to priority.

The recognition of authors - in addition to being able to establish priority, authors also value publication in refereed journals as a means of raising their profile and (hopefully) of gaining further research contracts or promotion.

The creation of a public domain archive - once published, journal papers are (by definition) in the public domain and research libraries can collectively act as a distributed archive, preserving the knowledge embodied in them for current and future scholars.

The peer-reviewed printed journal currently fulfils most of these needs rather well. Printed journals are particularly good for helping establish priority and for creating an archive available for long term use. Any electronic communication system developed to replace the printed journal will have to take account of all of these requirements.

The end of printed journals?

There are still those who argue, like Lancaster, that the distribution of the vast majority of scholarly and scientific information will soon go completely electronic. The Internet has had a great impact on scholarly communication, and the decline of the traditional printed journal is being predicted once more [11]. In 1995, for example, the mathematician Andrew Odlyzko argued that most printed journals would be likely to disappear within the next ten to twenty years. He predicted that scholarly publishing would soon move to electronic delivery mechanisms for two reasons - the "economic push of having to cope with increasing costs of the present system and the attractive pull of the new features that electronic publishing offers" [12]. Many journal publishers are aware of these factors and have begun to set up Web-based services that give access to electronic versions of existing printed journals, often using the Portable Document Format (PDF), which maintains the typography and layout of the printed version. For some more radical proponents of electronic communication, this does not go far enough. They ask why the status quo in paper journals should simply duplicate itself in the new medium [13]. They see self-publishing (sometimes called self-archiving) through the Internet as a means of returning the responsibility of ownership and distribution of scholarship to its creators [14].

The most prominent proponent of this idea is Stevan Harnad, director of the Cognitive Sciences Centre at the University of Southampton. One of Harnad's key assumptions is that when scholars and scientists publish in peer-reviewed journals they are not primarily interested in monetary reward - which in any case would probably not be forthcoming - but in having their work read, used, built-upon and cited [15]. In the 'Gutenberg era', authors had to perpetuate what Harnad calls a 'Faustian bargain', made between authors and commercial publishers, whereby authors trade the copyright of works in exchange for having them published [16]. He argues that this type of bargain made sense when publishing remained an exclusive and expensive domain, but has no relevance in the electronic era, when scholars can publish their own papers at little or no personal cost. In addition to the benefits of improved accessibility, an increased speed of publication and possible financial savings, Harnad suggests that network publication would enable authors to interact better with their peers: for example, published articles could be open to immediate comment and response, i.e. what has been characterised by the term 'scholarly skywriting' [17]. In order to facilitate the post-Gutenberg era, Harnad, Odlyzko and others have formulated what they refer to as a 'subversive proposal' to bring down the 'paper house of cards' [18]. They suggest that all authors of non-trade works should make available the texts of all current papers on the Internet and that readers would rapidly form the habit of accessing the free electronic version of a paper rather than a more expensive paper version published much later [19].

The most frequently cited model of the 'subversive proposal' in action is the 'e-print archive' set up by Paul Ginsparg at the Los Alamos National Laboratory (LANL) [20, 21]. Ginsparg's original service, which went online in August 1991, gave electronic access to pre-prints in the domain of high-energy physics. It very quickly became the primary means of scholarly communication in its subject area and has since expanded to cover the whole of physics, mathematics and computer science. A physicist was quoted in 1994 as saying that the archive had completely changed the way people in the field exchanged information: "the only time I look at the published journals is for articles that predate the Los Alamos physics databases" [22].

The original NIH proposal

It was partly the success of Ginsparg's physics e-print server that inspired the original NIH proposal. Varmus also shares Harnad's assumption that the publishing that scientists do is quite different from trade publishing [23]:

... what scientists are about is generating results, typically paid for by public or private funders; the scientists' objective is to get results of the research seen by as many people as possible. We have no interest in making money by publishing our results. We want to get them out where everyone sees them. It's good for our careers, it's good for the development of our science in our community, and it's good for achieving our ultimate goal of learning more about biological systems and achieving progress against disease.

The initial NIH draft was entitled E-biomed: a proposal for electronic publications in the biomedical sciences, and suggested that the NIH, through the National Center for Biotechnology Information (NCBI), should "facilitate a community-based effort to establish an electronic publishing site". Varmus wrote:

In the plan we envision, E-biomed would transmit and maintain, in both permanent on-line and downloaded archives, reports in the many fields that constitute biomedical research, including clinical research, cell and molecular biology, medically-related behavioral research, bioengineering, and other disciplines allied with biology and medicine.

A core part of the proposal was that submission to E-biomed could take two different paths, one with a formal peer-review process, the other with minimal review - a bit like the Los Alamos e-print archives.

With peer review - authors would submit reports electronically into the E-biomed server, requesting peer-review by the editorial board of a particular co-operating journal. If the paper were accepted for publication, it would immediately be made available through the E-biomed repository and afterwards from the publishers. If the paper were not accepted, however, the authors could either re-submit it to another editorial board or accept its publication in some other form.

Without peer review - authors would submit reports directly into the E-biomed 'general repository'. Instead of a formal peer-review, each report would only require approval by 'two individuals with appropriate credentials'. These credentials would be established by the E-biomed Governing Board and would "be broad enough to include several thousands of scientists, but stringent enough to provide protection of the database from extraneous or outrageous material".

Access to the archived reports in the server would be available immediately to any user with Internet access and would be searchable by a single search engine. It was also hoped that the proposed service would contribute to the development of new ways of presenting the results of research, help speed up the dissemination of information and help to reduce the current high cost of scientific journals. In the initial proposal, it was suggested that authors would retain copyright to reports deposited in the repository. The proposal was also careful to stress that the NIH would only provide financial, technical and administrative support for E-biomed, and that it would neither 'own nor operate' the repository [24].

The NIH proposal was intended to stimulate wider discourse about the effective use of electronic communication methods for the dissemination of the results of biomedical research. It immediately generated a large amount of comment, some of it supportive [25, 26], some of it very critical.

Reactions to the proposal

Criticisms of the proposal concentrated on four main issues:

The importance of peer-review.

There was a particular concern about the quality of papers that would be submitted to and published in the non peer-reviewed 'general repository' part of E-biomed. One US-based biochemist was quoted as saying that the repository would "inevitably become a massive repository of taxpayer-supported junk" [27]. Arnold S. Relman, a former editor of the New England Journal of Medicine, wrote an important editorial in that journal describing the NIH proposal as a "potential threat to the evaluation and orderly dissemination of new clinical studies". Acknowledging the journal's self-interest, the editorial argued that there were basic differences in the publication needs of clinical medicine and other scientific fields and suggested that the NIH proposal might threaten the public interest [28].

The best way to protect the public interest is through the existing system of carefully monitored peer review, revision, and editorial commentary in journals, and by timing public disclosure to coincide with the date of journal publication. Mistakes, inaccuracies, and misinterpretations in clinical research pose a far greater risk to health and the public welfare than do errors in basic-science research.

In an e-mail response to Relman, Harnad considers these objections to E-biomed to be unjustified. He points out that papers submitted to the refereed sector of E-biomed will still be peer-reviewed and that there is no reason why simultaneous editorial commentary could not also be arranged, if it is required. The difficulty with timing public disclosure to coincide with the date of journal publication is described as 'nonsense' and "the unfortunate retardation of a bygone papyrocentric era" [29].

If it is not merely a reflexive bid to safeguard journal primacy and revenue ... then it is merely an expression of a superstitious adherence to completely irrelevant and obsolete features of the print-on-paper era for journal publication.

However, some of the most prestigious general medical journals have had long-standing rules forbidding the prior release of any information before publication. The New England Journal of Medicine, for example, has a policy known as the Ingelfinger Rule (named after the editor who promulgated it) that will not permit the publication of a manuscript whose substance has already been submitted (or reported) elsewhere. The rule is intended to prevent the journal being 'scooped' by its rivals and uphold the conventions of the biomedical research community by promoting the orderly dissemination of information. Franz Ingelfinger commented that the average researcher is not too pleased if a competitor bypasses these conventions "to gain publicity, and perhaps priority, by presenting unpublished results at press conferences or by interviews with reporters" [30]. The New England Journal of Medicine exempts from the rule, however, all presentations made at scientific meetings and all published abstracts, although it discourages the prior dissemination of any more detailed information than was originally presented [31].

This discouragement of prior-publication persists in the Internet age, especially with regard to non-refereed pre-prints. Back in 1995, a New England Journal of Medicine editorial stressed that any study is incomplete unless it has been peer-reviewed, revised accordingly and published, and that this applies equally to electronic pre-prints [32].

Thus, posting a manuscript, including its figures and tables, on a host computer to which anyone on the Internet can gain access will constitute prior publication.

This position has been subject to some criticism. Ronald LaPorte and Bernard Hibbitts claim that it interferes with scientists' rights to do what they like with their work, even before the copyright of the article has been transferred to the publisher [33]. It certainly may impede the development of a successful electronic pre-print culture in the biomedical sciences. A recent BMJ editorial, however, took a more moderate stance than its Boston-based counterpart - suggesting that electronic pre-prints might be analogous to conference presentations or abstracts. Tony Delamothe suggested that the existing exceptions applying to these forms of dissemination might be extended to apply to pre-prints [34].

Varmus addressed the quality issue in his addendum of the 20 June, which acknowledged that the non-reviewed component of E-biomed might tempt some to disseminate "information of marginal value or accuracy", but that "few scientists would knowingly put such information into the public domain, because it would soon diminish their reputations".

The role of the NIH

Despite Varmus's assurance that the NIH would only act as a facilitator of a community-based effort, some criticism of the proposal centred on what was seen as the extension of the power and influence of the US Federal Government. The editor of Science, Floyd E. Bloom, asked whether "a monopolistic archive under government control by the major research funder [would] enhance scientific progress better than the existing journal hierarchy" [35]. Others thought that the NIH's proposed role was "intrusive" [36]. Michele Hogan, executive director of the American Association of Immunologists, commented that the NIH "could become the sole supplier of scientific content, and would have to assume responsibility for the publication peer-review exercise", thus the community would "lose peer-review independent of the government" [37]. Varmus responded to this criticism by noting that the proposed system would not be owned by the NIH and that it would only work if the international scientific community was "broadly represented in its operation and governance". Harnad has further noted that there will be no monopoly [38].

Multiple journals - indeed the entire hierarchy that currently exists - will continue to exist for authors and readers. Nor will it be government controlled. (As always, quality will be controlled by peer reviewers, who, like the authors, do their work for free! ...)

Competition with existing journals

Much criticism of the NIH proposal centred on its potential to compete with and undermine existing journals. A report in Nature noted that several observers had said that it "might create an unhealthy monopoly, erode the diversity of existing journals, and reduce competition between journals for the best papers" [39]. Varmus responded by saying that he would encourage journals, and especially those "with strong reputations for rigorous reviewing and careful editing, to become part of the system".

Finance

Some of the most important criticisms of the E-biomed proposal concerned its vagueness on financial matters [40]. Suggestions that the repository would undermine the viability of scientific societies who currently publish journals (and depend on their income) met with the response that they should look for other ways of generating funds [41].

... societies should not be seen as slowing a revolution in publishing that could make all journals more accessible.

As for the questions of how much the repository would cost and who would pay for it, the proposal was still vague. If access to the submitted reports were to remain free at the point of use, payment would have to be through some other mechanism, possibly from fees levied on authors. Varmus suggested that:

One straightforward strategy would be the imposition of fees for authors - perhaps a small fee at the time of submission and a larger one at the time of acceptance.

PubMed Central

At the end of August, a revised proposal for a service now to be known as 'PubMed Central' emerged from the NIH. PubMed Central will be integrated into the existing PubMed biomedical literature database but will form just one component of an expanded electronic repository for the life sciences. The repository is due to 'go live' in January 2000 [42].

The PubMed Central announcement took into account some of its critics' objections and recognised the legitimacy of publishers' concerns. While the original draft proposal suggested that reports would be added to the repository immediately upon their acceptance, the PubMed Central announcement stated that the submission of content "can occur at any time after acceptance for publication, at the discretion of the participants". It was also less radical on copyright, effectively side-stepping this difficult issue by saying that copyright would "reside with the submitting groups (i.e. the publishers, societies, or editorial boards) or the authors themselves, as determined by the participants".

The role of libraries

Initiatives like PubMed Central are very attractive to libraries if they result in reduced spending on journal subscription and processing costs and if they aid the 'free' dissemination of information. However, they also provide new challenges. Like publishers, libraries are merely intermediaries in the scholarly and scientific communication process. If scholars and scientists are able to devise successful electronic communication systems that effectively bypass the library, surely there will be very little left for them to do?

However, this is being overly pessimistic. For one thing, the increased use of electronic technologies for dissemination does not mean that printed information will immediately disappear. Odlyzko points out that printed collections will still need to be maintained at least until their eventual digitisation, and that this process is likely to take a long time [43]. In any case, while scholarly journals may indeed go electronic, other forms of printed information are likely to survive for a long time to come. The historian Robert Darnton, for example, expresses the opinion that electronic technology will act as a supplement to, and not a substitute for, Gutenberg's invention [44].

Libraries will probably also have a role in acting as gatekeepers for a range of electronic information services, including journals. For example, libraries (or library consortia) have had to get used to negotiating electronic content licences with publishers rather than just purchasing hard-copy [45]. This type of activity is likely to become even more important.

Resource discovery and metadata

One area where libraries and other information professionals will have an interest is in ensuring that the wide variety of scholarly publications available online can be discovered and accessed by users. PubMed Central will only be one part of a much wider information landscape, even within the field of biomedicine. While Varmus is confident that all reports filed in PubMed Central would be searchable by a single search engine, it might be desirable to create new services that combined searches of PubMed Central with library catalogues and Internet information gateways. In any case, a PubMed Central search engine would need to give access to an extremely large and constantly growing resource. The American Society for Investigative Pathology has suggested that "effort and attention be directed to novel approaches to create search engines that will assist researchers in navigating the material" [46].

Long-term preservation

Another potential role for research libraries, albeit a more problematic one, would be to help ensure the long-term preservation of the papers stored in services like PubMed Central and the Los Alamos e-print archives [47]. We have seen that with printed journals, libraries collectively act as a distributed archive, preserving the knowledge embodied in them. Varmus has said that the danger of losing vast quantities of published data from PubMed Central would be remote, as the service would be "mirrored", and would be backed-up on tape, CD-ROMs or 'long-lived paper' - at more than one site. At first sight this appears secure, but without a stronger institutional commitment to preserving the material submitted to PubMed Central - the NIH, after all, is only supposed to be contributing technical assistance and financial support - this may not be enough. At the very least, repositories like PubMed Central and e-print servers should adhere to best practice in ensuring the long-term integrity of their services.

Conclusions

The NIH proposal for a PubMed Central service has certainly focussed the biomedical community's mind on the potential of electronic networks for the dissemination of information. The debate that followed the publication of the draft proposal, however, revealed serious divisions within the community and important concerns about its implications for the quality of published papers. The debate also demonstrates how cautious one should be when equating the practices of physicists (with regard to their use of the Los Alamos e-print archives) and other disciplines. As Edward Valauskas has noted, "comparing scholarly communication in the fast-paced world of high-energy physics to the mere academic deliberations of humanists, social scientists, and non-physics scientists is dangerous" [48]. It will be interesting to wait and see what effect PubMed Central will have upon scholarly communication in biomedicine.

5. Maurice B. Line, 'The publication and availability of scientific and technical papers: an analysis of requirements and the suitability of different means of meeting them.' Journal of Documentation, 48 (2), June 1992, pp. 201-219.

18. Ann Okerson and James O'Donnell (eds.), Scholarly journals at the crossroads: a subversive proposal for electronic publishing. Washington, D.C.: Association of Research Libraries, 1995.<URL:http://www.arl.org/scomm/subversive/index.html>

33. Ronald E. LaPorte and Bernard Hibbitts, 'Rights, wrongs, and journals in the age of cyberspace: "We all want to change the world"' British Medical Journal, 313, 21-28 December 1996, pp. 1609-1611.<URL:http://www.bmj.com/cgi/content/full/313/7072/1609>