Let’s Make Open Access Work

This is a blog post that will please no one. That is not the intention; I am not writing it to pick fights. But the topic is open access (OA), and on this topic, fights inevitably erupt; it is scholarly communications’ equivalent of the Culture Wars. For my part, I stand with Voltaire: The perfect is the enemy of the good. Already in the background I can hear advocates of perfection beginning to sharpen their swords.

So, without reference to the many arguments on all sides of the matter, How can we make OA work?

For starters, we need a software platform. Initially this need not be complex, but it must scale to handle large amounts of material, and that material cannot be restricted to text or PDF files. The platform might be a huge repository. Or it may be several repositories, all of which are indexed by various means, including the most important one today, Google Web Search. Qualified researchers may upload their work (which we still call “papers”) to this repository. We have models for such repositories now at many universities, but the scale suggests another model, the huge “cloud computing” services of companies such as Google, Scribd, and Amazon, among others. It is the miracle of Moore’s Law that makes these data centers possible at little or no cost to the user. Consider for a moment that Google now permits a user to upload any document for online storage to the Google Docs application — for free. That’s the kind of platform we require.

Second, unless we are willing to have the repository filled with junk, spam, and reckless outpourings, we need some way to filter papers that are uploaded. In traditional publishing, the answer to this is simple: have editors, including peer reviewers, assess papers prior to publication. But this is costly and time-consuming; indeed, one of the reasons arXiv came into existence was to speed up the process. Whatever the many merits of peer review, such review prior to publication may slow down the dissemination of ideas, and speed should be one of the goals of any OA service.

Thus, how to determine what can be posted and what cannot? The answer is to formalize some of the policies that are now in place at many major universities, policies that I call “provostial publishing.” Unlike traditional publishing, where editors review each paper for publication, provostial publishing is a means to determine which authors can post to the repository. The requirement: the author must be affiliated with or sponsored by an established institution. Thus both Harvard and MIT have mandated that faculty deposit their papers into an OA repository. No one is reviewing those papers beforehand; it’s enough that the authors have achieved a position on the faculty. Whereas editors select papers, provosts select authors.

Provostial publishing is a means to assert a baseline level of quality control for what would otherwise be open to massive abuse and “data dumping.” We want OA to be open, but we don’t want it to be foolish. So, for example, I have experience in the world of publishing and digital media and may be permitted to deposit papers in a repository in those areas. (This, by the way, is precisely how Scholarly Kitchen operates.) But suppose I were to hanker, say, to present my grand theory of cognitive science. I have no credentials in the field, no doctorate, no research record. My paradigm-busting paper on cognitive science would not have the blessing of a provost or other sponsor and would thus not be entitled to a place on the repository’s servers. Similarly, a cognitive scientist with no experience in publishing, who has not gone through the years of apprenticeship, would not have access to deposit documents in a repository dedicated to publishing matters. This is not a free speech issue. The Web abounds in venues, but we needn’t open all services to anyone who comes along.
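The deposit rule described above can be sketched in a few lines. This is only an illustration, assuming a hypothetical registry that maps each author to the fields in which a provost or other sponsor has vouched for them. The names `SPONSORED_FIELDS` and `may_deposit`, and the sample authors, are invented for the example and do not describe any existing repository's software.

```python
# Hypothetical registry: author -> fields in which a sponsor has vouched for them.
# The entries are invented for illustration only.
SPONSORED_FIELDS = {
    "publishing_consultant": {"publishing", "digital media"},
    "cognitive_scientist": {"cognitive science"},
}


def may_deposit(author: str, field: str) -> bool:
    """Return True only if the author is sponsored in the paper's field."""
    return field in SPONSORED_FIELDS.get(author, set())


print(may_deposit("publishing_consultant", "publishing"))         # True
print(may_deposit("publishing_consultant", "cognitive science"))  # False
print(may_deposit("unknown_author", "publishing"))                # False
```

The point of the sketch is that the check looks at the author and the field, never at the paper itself.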

A large collection of papers openly available to anyone to read creates its own set of problems: Which papers are worth paying attention to? After all, not all of the provost’s picks get it right 100% of the time. Our OA service needs a form of post-publication peer review. And here we are quite fortunate, as the current crop of Web 2.0 services presents plentiful models for online commentary. Professor Jones, who is a member of the faculty of Ultimate U., deposits her paper, “Specific Qualities of the Generic,” in an OA repository. Indexed by Google and other services, the paper quickly comes to the attention of people working in the field, who post their critiques alongside Jones’s document. The software enables comments and comments on comments; the paper lives with its commentary all around it — in one virtual place, for the convenience of all researchers.

It will be noted that some papers will attract a great deal of commentary and other papers none at all. This is as it should be. Papers with important information will be cited repeatedly, which in turn will give them higher search engine ranking and bring other readers to them (a process known as “the law of increasing returns”).

Our OA service has thus put an end to one of the most inefficient aspects of traditional publishing: whether a paper is good or bad, it costs the same amount of money to put it through the editorial review process. Post-publication peer review aligns effort and cost with the quality of the material.

This leads us to the economics of our new OA service. By switching from pre-publication peer review to post-publication peer review — and placing a big bet on the utility of search engines — we have shifted the large, ongoing costs of managing a publishing operation to a one-time investment in the software platform, which enables the deposit and review of papers. Some current OA services charge large fees to authors, but this is because they are clinging to the editorial model of traditional publishing. The combination of Provostial Publishing, cloud repositories, and post-publication peer review drives the cost of scholarly communications down and down. Recall that you can upload anything to Google Docs. If the platform is properly designed, the marginal cost of adding new papers and commentary approximates zero.

To finance this service (putting aside the start-up costs), an author would pay to have his or her paper deposited in the OA repository. We don’t know how much to charge, but we know the formula: the number of papers multiplied by the deposit fee per paper must exceed the ongoing operating costs. If those operating costs are, say, $1 million a year (it is simply astounding to see how little it costs to operate cloud-computing services once the underlying platform is built) and we anticipate that 1,000 papers will be deposited each year, the minimum cost to each author would be $1,000. If we forecast 10 million papers, an author would be charged $0.10. It may be that we will find that the administrative cost to collect such small sums hardly makes it worth the effort. Perhaps upon being granted an advanced degree, a prospective author would simply write a check for $50 for lifetime deposit fees, subject always to the sponsorship of a provost or provost’s proxy.
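The break-even arithmetic above is simple enough to check directly. The sketch below uses only the figures given in the text ($1 million a year in operating costs, and 1,000 or 10 million papers a year); the helper name `break_even_fee` is hypothetical.

```python
def break_even_fee(annual_operating_cost: float, papers_per_year: int) -> float:
    """Smallest per-paper deposit fee at which fees cover operating costs."""
    if papers_per_year <= 0:
        raise ValueError("papers_per_year must be positive")
    return annual_operating_cost / papers_per_year


# Figures taken from the paragraph above.
print(break_even_fee(1_000_000, 1_000))       # 1000.0
print(break_even_fee(1_000_000, 10_000_000))  # 0.1
```

Any fee above the returned value would leave a surplus; the sketch ignores start-up costs and the cost of collecting the fees, as the text does.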

One policy that I would strongly urge upon any OA service is to invite commercial exploitation, provided that such commerce does not require that any of the research material go behind a pay wall. This recommendation runs counter to the common practice of stipulating that there will be no commercial use of the material. Capital, however, makes things happen. We cannot tell in advance what new services entrepreneurs will come up with, but we are likely to find exciting new add-on capabilities for the OA content. It is one thing to insist that all research content be OA, quite another to banish financial incentives altogether. The scholarly community would benefit from harnessing the profit motive to the aims of researchers.

What’s not to like about this model, which I contend could become economically sustainable in a short time? Or I could say, What’s to like? This model incorporates many of the innovations of services such as arXiv, PLOS One, DSpace, and BMC, but it does not deliver everything that we expect when we turn to an established journal. The key to this model is to substitute information technology for human-mediated editorial activity and the investments in branding that go with it. Perhaps the trade-offs are too great. On the other hand, this kind of development may be inevitable, at least in part, for some disciplines. Once established, such services may go through a process of continual improvement. If they don’t satisfy the needs of the research community, they will disappear.

Joseph Esposito

Joe Esposito is a management consultant for the publishing and digital services industries. Joe focuses on organizational strategy and new business development. He is active in both the for-profit and not-for-profit areas.

Discussion

36 Thoughts on "Let’s Make Open Access Work"

This is fun Joe (I design organizations for a living). There are several obvious problems worth discussing. For example, I don’t think your post publication peer review mechanism works, because journal acceptance and/or citation play a fundamentally different role from Web commentary.

Journal acceptance certifies the importance of the paper. Citations are part of explanation: first of the problem, where one’s primary precursors are cited, including the founders of the problem field, and then of the methods and data, where one cites the sources one is building upon. Thus citations are endorsements of the importance of prior work, which is why they are used to measure importance.

Commentary tends to play the opposite role. In many cases comments are objections (like this one). Moreover, comments take on a life of their own, spawning debate that may have little to do with the original article. Thus a lot of commentary may indicate a weak thesis, or a controversial one. In either case they are no substitute for peer review.

Commentary is useful and it should be added to every publication, but it is no substitute for peer review. We have added it to our repository of DOE research reports:

I have mixed feelings about a degree as qualifier for publication. . .this may work well in some fields, but it’s a really bad idea in others. In paleontology, there is a significant minority of interesting, quality papers written and published by individuals without a Ph.D., for instance. I agree that there should be a way to avoid legitimizing “crank” work, but at the same time we don’t want to bar the doors to those who do quality work but don’t have the magic ticket for entry to the academic sideshow.

I agree with you on this – and I discussed such cases in the threads on the NCERC posts. Some fields at least would have to also have additional ways of approving quality researchers, perhaps Museums or Companies can vouch, or a committee can take a look (similar to a hiring committee)?

I definitely agree here. What makes a paper good or not depends on what’s in the paper, not on some proxy like institutional affiliation or whose lab you come from. This is the main thing scientists complain about now, that they’re judged based on the impact factor of the journal they published in, not on the true importance of their work.

What’s more, it’s technically possible now to look at research output in fine grained detail and see what really has been influential and what hasn’t. There’s not as much need to filter on the way into the archive if someone browsing the archive can intelligently filter what they see, and this way we make inclusiveness into an advantage. It’s a way past the age-old tradeoffs between quality and exclusivity.

(This is an example of how quickly the thread of commentary can shift away from the original article.) Many of what you call GW denialists (we prefer skeptics) have Ph.D.’s, including me. Perhaps you are not familiar with the field? The hypothesis of anthropogenic climate change is not only unproven, it is increasingly controversial.

Most of the ideas mentioned in the original article have already been tried in various repositories, either of the institutional or discipline-specific variety. It might be worth looking at some of these existing repositories, then, to see why they’ve made the choices they did, and how well they seem to work.

For instance, both ArXiv.org and our own institutional repository at Penn have the notion of subject-specific “provostial selection”. In both cases, it’s field-specific. At ArXiv.org, you need to be endorsed by someone else already endorsed in a field to deposit papers into that field’s collection. At Penn, you need to be affiliated with a particular research unit to put papers into that research unit’s collection.

As Joe alluded to in some of his examples, this helps provide an initial filter against spam and nuttiness, though it’s still a much lower bar than peer review. As pointed out upthread, the simple acquisition of a PhD is both too restrictive and too broad. For instance, David Wojick and I both have PhDs, but neither of us has one in climate sciences, so that credential shouldn’t count for anything in giving either of us credibility on those subjects.

In the specific areas where we do have credentials and well-reviewed experience, it’s a different story, but there are enough examples of geniuses in one field making fools of themselves in other fields to make a general-purpose provostial endorsement problematic. (I took my undergrad degree at the same place that Serge Lang finished out his career, so I became familiar with the phenomenon early.)

But we are not talking about something as simple as Penn or a single discipline. Joe is talking about a central facility for all scholarly publication. If you want to certify, and track, all the world’s scholars at the disciplinary and institutional levels the paperwork is staggering. Why not just read the papers and see if they are any good, which is how we do it now?

For example, since my Ph.D. field is the logic of science I might publish on any discipline, and have published on several. I had a paper in Scientometrics that went from nanotechnology to bird flu. I have studied the logic of the climate change debate for 18 years. Am I not qualified to write about it?

Only accepting papers referred by someone whose paper has already been accepted is certainly simpler, but it hardly seems like open access. It sounds more like a closed club. The point is that there is probably no good alternative to reading the paper.

I’m no student of scholarly publishing, Joe, but wouldn’t the model you propose start well enough with papers rejected by journals or which are too timely to wait for the journal review process? A prominent scholar named Michael Gazzaniga told me 10 years ago that 72% of submissions to scholarly journals are rejected. Of the 28% published, the top 3% or 5% are clearly superior; the next 20+% accepted are not sharply distinguishable from the top 20+% that are rejected! Gazzaniga’s theory then (10 years ago) was that this represented a lot of good lost scholarship and he was looking for a solution, similar to what you’re proposing, to let it see the light of day.

Whether those percentages are right or not, the principle certainly is. No review mechanism is perfect. Limitations to the number of words in a journal are created by physical and commercial realities, not scholarly realities. Some sort of solution like what you propose would seem, in time, to be inevitable.

I seem to recall studies showing that virtually all papers eventually got published somewhere – the irony is that the worst (or least appropriately targeted by the author in terms of journal(s) to which submitted, at any rate) consume the most time and effort by reviewers, editors and publishers!

My second organizational mechanism problem is with your “provostial publishing” requirement that “the author must be affiliated with or sponsored by an established institution.”

This might be doable if you want to limit authors to, say, employees of accredited institutions of higher learning in OECD countries, otherwise this kind of certification is an unworkable nightmare. For example, a lot of research is done by commercial entities. I myself am a self employed subcontractor to a federal contractor. Who gets certified? And of course we want to encourage research by a myriad of small entities, profit or non, as well as individuals, all of whom will have to be certified.

Plus this kind of institutional certification is laborious and expensive on both sides. Peer review is free but certification looks like a back room expense of potentially great magnitude, perhaps billions of dollars a year, not millions. The present system just looks at the papers, which is very simple. Factoring in the employer will make things much more difficult.

Moreover, employer or institutional certification does not keep people from publishing on topics about which they know nothing, so long as their institution is certified. What keeps Harvard poets from filling up the quantum mechanics category? Conversely, people with experience will be excluded if they lack the proper affiliation.

In short, you are trying to redesign science as an organizational system, from the top down. Your proposal is utopian, to say the least.

(2) Unrefereed preprints, publicly posted, are not publications in the scholarly sense. They are just unpublished drafts.

(3) Peer review is meant to serve both to (1) improve and (2) tag the quality of research, providing an *advance* quality filter to guide the user in what is worth the investment of the scarce time to read and the risk of trying to build upon.

(4) Post hoc, self-appointed public commentary on publicly posted, unrefereed preprints does not provide this filtration function for the potential user (or provides it only partially, uncertainly, and too late).

(5) Many fields and authors do not and will not publicly post unrefereed drafts (sometimes just to safeguard their scholarly reputations, sometimes to protect public health from unrefereed claims).

(6) Authors who want public feedback on their unrefereed drafts can already solicit it today (though success is not guaranteed — nor always deserved) by posting their preprints online publicly in their institutional repository, or a central repository, or even their own blogs or websites.

(7) Authors don’t have to pay a penny to post their unrefereed preprints today, nor do they have to pay for any feedback, peer or otherwise, that they elicit.

(8) None of this is what open access is about.

(9) Open access is about authors providing free online access to their refereed journal articles (postprints, not preprints).

(10) This too costs the author nothing today (as long as institutional subscriptions are paying for the peer review).

(11) So what is needed today is not something else to pay for, but open access to what is already being paid for (by institutional subscriptions).

(12) Open access can be and is being provided by author self-archiving of refereed postprints, and by mandates from authors’ universities and funders to self-archive their refereed postprints (Green OA).

(13) If and when mandated self-archiving makes subscriptions unsustainable as the way of paying for peer review, institutions can pay for their authors’ peer review fees out of their windfall savings from their institutional subscription cancellations (Gold OA).

(14) Till then, universal Green OA is enough.

(15) None of this has anything to do with peer review, for which peer commentary is a supplement, not a substitute.

[Note that in the following URLs, . is replaced by :, / is replaced by -, and lnth and ptth are backwards, to avoid the SSP automatic spam filter that is triggered by too many URLs! Apologies. SH]

I strongly agree with Stevan that peer commentary is not peer review. It’s also worth noting that one function of peer review (other than filtering) is to help authors revise/improve manuscripts, which would not be catered for in Joe’s model.

Moreover, I would be hard pressed to find an academic who felt achieving a faculty position in a given subject should be sufficient qualification to publish at will. Just look at the controversy ‘communication’ of papers to PNAS by Academy members continues to generate.

It may be that publishing needs reinvention, but we should be wary of dispensing with certain features of the current model without thinking very very carefully.

I like this idea, but a problem I can see is that post publication review like any form of crowdsourcing only works when the crowd is significantly larger than the body of work.

If we did this tomorrow, even including a provostial model (and borrowing Mike Shatzkin’s figure of a 72% rejection rate), surely the number of scholars willing to contribute to post-pub review would be significantly smaller than the number of papers submitted?

I think there are ways of pulling in all the commentary of and about papers and finding participation. There’s already a large community at http://scientificblogging.org, so why can’t PLoS or someone go out and find commentary wherever it happens, instead of expecting people to comment on-site?

Crotty et al. often bemoan the level of participation, but I think it’s a technical issue with a technical solution.

It’s well understood that online commentary (about anything) needs community management and building the community requires careful cultivation of both roles, audience and content creators. Contributing post-pub reviews is only one side of things, cultivation of the audience is needed too, by smart and useful exposure of the post-pub reviews where it can do the most good.

If you think the lack of participation is solely a technical problem, then you’re extremely out of touch with both the scientific community, and with humans as a whole.

Not everyone is interested in reading blogs. Even fewer are interested in leaving comments on blogs. Even fewer are interested in writing blogs. Yesterday this blog had around 2,000 pageviews but strangely, there were not 2,000 comments left. By your reasoning, this is a technical issue, with a technical solution, right? Why weren’t there 2,000 new blog entries written about yesterday’s articles? Technical issues?

There are well over 5 million working scientists in the US alone. Yet there are at best a few thousand science blogs. Are we willing to cede the entire balance of determining value in science to the small group of outliers who are interested in blogging?

Beyond this, there are much-debated issues around the reticence of most scientists to publicly criticize the work of their peers. There are fears of discrimination if the criticized scientist ends up on a grant review committee, a hiring committee, a meeting organization committee, etc.

Also, there are many, many valid and useful scientific papers published that don’t inspire comment.

As a working scientist, how many hours of your day are you willing to commit to leaving comments on the papers of others? How much time are you willing to take away from doing actual experiments?

So claiming a lack of online commentary on scientific papers is simply a technical matter is absurdly naive.

But that’s a major undertaking, and one which would need to be paid for. Once you hire an entire editorial staff to chase post-publication reviewers, you’re eliminating the financial benefits of the lowered overhead that a post-publication reviewed OA system would engender.

Also note that as an editor who arranges peer review, I often have to go through between 5 and 10 scientists in order to get 1 to agree to review a paper. Which further speaks to the idea that getting people to review and rate papers goes well beyond being just a “technical” problem.

I liked this article and arXiv’s ‘provostial posting’ seems a good way forward. The essence of scholarly publishing is that each author involved has a reputation to lose or gain. There should be a recognised way for authors not affiliated to any institution to acquire (and lose) posting rights.

The archive should be organised so that it is easy to refer to the past body of work of any scholar and determine whether they are a serious contributor to the field or not. Also, there should be a “hide/show all future posts from this individual” button, not to be lightly used, but important for attentional economy. Awareness of this tool should keep authors temperate and on-topic, especially if it is used by others as a negative metric in future funding decisions.

There’s little mysterious about what scholarly journals currently do: they take in papers, distribute them amongst their databased referees, gather their comments, and publish revised versions reflecting the referees’ concerns, and, in passing, impart a certain stamp of quality, the fact of publication being an instant badge that reads “survived peer review process at journal X.”

If there is a mystery that journals hold close to their chests, it is the quality of refereeing. Both author and referee are judged by the editors, but (usually) it is only the author that is published. (Reviewers’ comments may mutate into published commentary pieces.)
Online publishing can easily–indeed more efficiently and transparently–model this process.

Just as authors will need to triangulate their reputations with others’ to achieve access to the archive, so they will be responsible for triangulating the initial reputation of each “paper” by seeking serious comment and criticism to append to their initial effort.

If they then wish to change the paper to reflect this early, serious criticism, a transparent version trail should be available for public consumption, but the latest version should be that which is most prominently available.

Technical solutions (registers of checksums etc) are available to ensure the integrity of file versions.
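As one illustration of the kind of technical solution meant here, the sketch below records a SHA-256 digest for each version of a paper and later checks a file against that record. The register structure, the function names, and the sample paper identifier are assumptions made for the example; only Python's standard `hashlib` module is used.

```python
import hashlib

# Hypothetical register: (paper id, version number) -> recorded SHA-256 digest.
register = {}


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def record_version(paper_id: str, version: int, data: bytes) -> None:
    """Store the digest of a newly deposited version."""
    register[(paper_id, version)] = checksum(data)


def is_intact(paper_id: str, version: int, data: bytes) -> bool:
    """Check that a file matches the digest recorded for that version."""
    return register.get((paper_id, version)) == checksum(data)


record_version("paper-1", 1, b"original text of the paper")
print(is_intact("paper-1", 1, b"original text of the paper"))  # True
print(is_intact("paper-1", 1, b"quietly altered text"))        # False
```

A register like this only detects changes; keeping the register itself trustworthy is a separate problem that the sketch does not address.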

I really doubt that this or any OA platform could really be successful or as productive as our current system without some sort of editorial oversight. That doesn’t necessarily mean though that each discipline would need a separate expert editor or that the platform would need to be divided into journal-like sections based on disciplines. I think there are ways to utilize the technological advantages that this kind of broad OA platform could offer (namely timeliness, broader dissemination, and the opportunity for interdisciplinary collaboration) by adopting some of the mechanisms that successful online communities like Reddit, Digg, or Metafilter currently use for moderation, such as allowing the three Fs: flags, favorites, and forums for both the articles and the commentary. Favorites would allow the cream to rise to the top much faster than one might expect search engine rankings could do, and flags make an editor’s/moderator’s job significantly easier by crowdsourcing the task of identifying what needs to be edited out. In fact, Metafilter takes the idea of moderation even farther by also offering a forum to discuss the editorial decisions. The folks who run the site listen and modify policy based on the reasoning and welfare of the larger community. This sort of system doesn’t then require disciplinary expertise for each of the articles; instead the most valuable traits of the editors/moderators are their sense of fairness and their community building skills. If we’re imagining a utopian architecture for dissemination, we need to also consider a key component for any utopian endeavor: the role of the community, and specifically how that community is governed and how it is allowed to express itself.

Maybe this isn’t a perfect replacement for most of our current system of scholarly communication. But as imagined, it still seems an intriguing and potentially highly productive platform that I would love to see built and nurtured.

Joe has made a good start because his model raises the various rule issues that need to be resolved. Both the US Congress and the Federal science agencies are grappling with these issues. This group has something to offer as well. As I said at the beginning, it is a fun game, but it is not simple. The trick is to isolate and resolve dozens of individual issues.

The important thing, in my view, is not to design the specifics of the system beyond the broad outline I sketched in the original post. The important thing is to architect a serial set of opportunities for enhancement. This is why I emphasized the need to open the service for commercial exploitation. Need a new feature? Let an entrepreneur invest in it–and one will. If you don’t compromise the underlying access to the articles, what’s not to like?

But as I pointed out in my original posts, your broad outline does not work. You need to get at least specific enough to know that the basic system can be built at a reasonable cost. If it can’t then the issue of enhancements is irrelevant.

We are obviously talking past each other, unless your friend has already certified all the world’s scholars, and got them to post their writings, which I doubt. The issues I have raised are not technological, so you must not understand what I have said.

There is an assumption built into the crowdsourcing model that one person’s view is worth the same as another’s.

I don’t think in science that that is true – that’s not to say that all opinions from lay person to expert aren’t valid, but that we would need a way of distinguishing between the assessment criteria used.

One thing I find missing in discussions like this is the bit about discovery. The assumption is that Google (or equivalents) will simply find the articles and present them to readers and that, combined with word of mouth (or its e-equivalents), will be enough to ensure that content is found and downloaded. I’ve been looking at download data from some institutions that do ‘provost’ OA publishing and the numbers aren’t great, in fact they are poor. For example, one institution that publishes about the same volume of new work every year as the one I work for got around 750k downloads in 2011 and this figure was 50% down on the previous year! Another similar institution told me they have a big problem – no-one seems to be reading their research because traffic levels are low. By comparison, we delivered 10 million downloads in 2011, 30% up on 2010, and we’re on track to repeat this growth in 2012. Whereas these two peer institutions have entirely passive ‘Google will find it’ approaches to publishing, we have a ‘push’ multi-channel approach combined with a global sales and marketing network that actively promotes our content to audiences. If the objective of OA is to maximise audience reach and downloads, then it seems to me that institutions and authors need to do a lot more than post and hope.
[In case you’re wondering, all three institutions offer all their content free to read online – in our case we also offer a subscription platform which provides additional value-added services to subscribers. The income from the latter covers all publishing costs, including the free to read online service.]


The mission of the Society for Scholarly Publishing (SSP) is to advance scholarly publishing and communication, and the professional development of its members through education, collaboration, and networking. SSP established The Scholarly Kitchen blog in February 2008 to keep SSP members and interested parties aware of new developments in publishing.

The Scholarly Kitchen is a moderated and independent blog. Opinions on The Scholarly Kitchen are those of the authors. They are not necessarily those held by the Society for Scholarly Publishing nor by their respective employers.