Prepublication: Everybody’s Doing It?

Imagine for a moment this conversation between a senior graduate student and his dissertation adviser:

“Everybody’s doing it. Physicists and computer scientists do it all the time. And even Carol Greider has done it, and she’s a Nobel laureate.”

“Yes,” his adviser says, looking up from her work, “she is a Nobel laureate; she can take that risk. But I don’t have tenure, and I am still working on my first NIH grant. You don’t have a degree yet. None of these things—your PhD, the grant renewal, my promotion—come without publications in a peer-reviewed journal, and most peer-reviewed journals in our field, at least the ones that count for grant renewals and promotion, don’t allow publication of previously released data.”

“But why let the publishers decide what is good science—why not let the scientific community decide and crowdsource the review?”

“I agree, but I also want a future. We write the paper and submit it. So do your homework: let’s find a journal with a short turnaround time, open review, and a reputation for publishing good science.”

Open Data and the Biological Sciences

The debate over prepublication in biology is raging. Prepublication is standard practice in physics, computer science, math, and economics as a way to make results publicly available quickly for scientific commentary, and it doesn’t seem to interfere with career advancement or grant renewals. Is there a good reason the same practice isn’t followed in the biological sciences?

For the last decade or so, data-sharing websites and life scientists who are active participants in the online community have been driving the conversation about open access and open science. The website figshare was launched in 2012 to allow users to upload data in any format for viewing in a web browser. Each piece of data is citable, and users retain ownership. The first publisher client to join figshare was Faculty of 1000, and figshare also provides infrastructure for PLOS and other major journals. figshare is committed to the open science movement and serves all fields, not just the life sciences. Other data-sharing services include GitHub (software) and Zenodo.

A preprint site for the biological sciences called bioRxiv (pronounced “bioarchive”) launched in 2013, hosted on a Cold Spring Harbor server. Since that time it has published 3,100 preprints, but the preprint movement and the site recently gained momentum from a February 2016 meeting of ASAPbio (Accelerating Science and Publication in Biology) (1,2), which was attended by researchers, publishers, and funders to address barriers to the use of preprints in biology. It is bioRxiv that has received the support of Nobel laureates in biology who have published preprints of their data (3) and that recently grabbed headlines in the popular press (3).

To prepublish or not: That is the question

The arguments for quickly making data publicly available are wide ranging. First, scientists argue that publicly funded science should be publicly available as soon as possible, not hidden behind a journal paywall (3,4). In cases of a public health crisis, some researchers even propose that data should be available in real time. For instance, David O’Connor at the University of Wisconsin-Madison is making his Zika research data available in real time. If other researchers who are actively designing experiments can design better experiments or ask better questions because his data sets are available, he says, then solutions to major health crises might be achieved more quickly (4). However, others, like Andrew Miller, a journal publisher at Elsevier, are concerned about public access to unreviewed, unvetted public health data (3). What’s to prevent media outlets or people with an agenda from picking up one piece of unvetted data, out of context, and making wild, unsubstantiated claims? The same question can be asked of data that are difficult to interpret. How will public policy makers react when they see the process of science unfolding on a website, with scientists debating the finer points of a data set to reach the larger truth of a complicated system (think climate and herd immunity)? (This author asks: How is this different from what happens today?)

Would crowd-sourcing science brain power be a benefit of prepublication?

Another argument for open prepublishing is that crowdsourcing the brain power of the scientific community improves the science that eventually does get published. Advocates of the traditional peer-review process say that peer review accomplishes that, but scientists argue that peer review has its own issues. For one, it doesn’t catch everything. There are notable examples of poor science that have been published in peer-reviewed journals: Andrew Wakefield’s work on vaccines, based on fraudulent data sets and unethically obtained samples, comes immediately to mind (5). Certainly the arsenic bacteria report would have benefited from feedback from the larger community before publication (6,7).

Some scientists are concerned that making their hypotheses, data, and results available as prepublications will lead to their being scooped or losing intellectual property rights. And many scientists cite the policies of certain journals as a barrier to peer-reviewed publication. Even here at Promega we discourage customers from sharing data in our online publications if they think they will later need those data for a peer-reviewed publication.

Further, many young researchers feel that promotion and funding are tied not just to peer-reviewed publication, but to peer-reviewed publication in journals of a certain “ilk.” Many of those so-called “top tier” journals forbid prior publication of data before work is submitted to them (3,4). If we are going to encourage open science, then judging the actual science published, rather than the place it was published, will need to become the cultural norm.

If you build it, will they come?

Maybe not. While researching this article, one thing I noticed was that many prepublication advocates say “everybody’s doing it,” but they never mention chemists. So I did a very unscientific search for “chemistry prepublication.” I didn’t get much. In 2000, Elsevier launched a prepublication server for chemistry (8). There is one article assessing the effectiveness of this preprint service for chemistry, but it is behind a paywall and had not arrived by the time this blog was published. The Elsevier preprint service for chemistry is now defunct:

“Despite their wide readership, the Chemistry, Maths and Computer Science research communities did not contribute articles or online comments to the Preprint service in sufficient numbers to justify further development. Consequently on the 24th of May, 2004 the three Elsevier Preprint Servers–Chemistry, Math and Computer Science–stopped accepting new submissions to their sites. The current site is now a freely available and permanent web archive for those research articles already submitted to the Preprint Servers” (9)

So apparently people were eager to read preprints, but not to contribute them.

What about the overall time to publication?

New technologies have improved efficiency and transparency in publishing.

Still, the publication process for any manuscript can be a painful one, and researchers have complained of wait times of over a year between submission of a manuscript and publication (10). In work for Nature, Daniel Himmelstein looked at submission and publication dates in the PubMed database to see the trends in publishing times for scientific manuscripts. The news is actually good. Technology and the internet era seem to have ushered in some efficiency and transparency: publication delays have been cut approximately in half since the early 2000s (10). These gains, however, seem mostly to be at the acceptance-to-publication end of the process; the “journal shopping” to get an article reviewed and the peer-review process itself are still lengthy. Some journals are addressing the peer-review process, though. Journals in the PeerJ family have an open peer-review policy in which reviewers’ names and comments are posted alongside the articles. According to Himmelstein, these journals have a relatively short 74-day median review time for manuscripts (10). The eLife journals promise to make editorial decisions within days, and they counsel their reviewers not to ask for new experiments unless those experiments can be performed within two months.

What do you think?

There is pressure on publishers and societies to make the publication process more friendly to scientists. Could a process like prepublication be added to the front of the review process, where the scientific community at large could comment, perhaps improving the quality of submitted manuscripts and the speed of publication?

It will be interesting to see if bioRxiv continues to see its submissions grow and if traditional peer-reviewed journals embrace the prepublication process as a means to shorten the peer review process and increase the quality of the science they publish.

So, readers: What do you think? Would you be willing to prepublish your data sets on an open science server like bioRxiv or figshare? Let us know in the comments.

If you want to follow more discussion on this topic, the Twitter hashtag #asapbio provides snippets of the latest conversations.

Michele Arduengo

Michele earned her B.A. in biology at Wesleyan College in Macon, GA, and her PhD through the BCDB Program at Emory University in Atlanta, GA. Michele is the social media manager at Promega and managing editor of the Promega Connections blog. She enjoys getting lost in a good book, trumpet playing, knitting, and snowshoeing.