Friday, January 20, 2017

The NIH Public Access Policy: A triumph of green open access?

There has always been a contradiction at the heart of the open access
movement. Let me explain.

The Budapest Open Access Initiative (BOAI) defined open
access as being the:

“free availability
[of research papers] on the public internet, permitting any users to read,
download, copy, distribute, print, search, or link to the full texts of these
articles, crawl them for indexing, pass them as data to software, or use them
for any other lawful purpose, without financial, legal, or technical barriers
other than those inseparable from gaining access to the internet itself. The
only constraint on reproduction and distribution, and the only role for
copyright in this domain, should be to give authors control over the integrity
of their work and the right to be properly acknowledged and cited.”

BOAI then proceeded to outline two strategies for achieving open access:
(I) self-archiving; (II) a new generation of open-access journals. These two
strategies later became known, respectively, as green OA and gold OA.

At the time of the BOAI meeting the Creative Commons licences had not
been released. When they were, OA advocates began to insist that to meet the
BOAI definition, research papers had to have a CC BY licence attached, thereby
signalling to the world that anyone was free to share, adapt and reuse the work
for any purpose, even commercially.

For OA purists, therefore, a research paper can only be described as
open access if it has a CC BY licence attached.

The problem here, of course, is that the vast majority of papers
deposited in repositories cannot be made available on a CC BY basis, because green
OA assumes authors continue to publish in subscription journals and then
self-archive a copy of their work in an open repository.

Since publishing in a subscription journal requires assigning copyright
(or exclusive publishing rights) to a publisher, and few (if any) subscription
publishers will allow papers that are earning them subscription revenues to be
made available with a CC BY licence attached, we can see the contradiction built
into the open access movement. Quite simply, green OA cannot meet the
definition of open access prescribed by BOAI.

To see how this works in practice, let’s consider the National
Institutes of Health (NIH) Public Access Policy. This is described on
Wikipedia as an “open access mandate”, and by Nature as a green OA policy, since it requires that all papers published
as a result of NIH funding have to be made freely available in the NIH repository
PubMed Central (PMC) within 12
months of publication. In fact, the NIH policy is viewed as the premier green OA
policy.

But how many of the papers being deposited in PMC in order to comply with
the Policy have a CC BY licence attached and so are, strictly speaking, open
access?

There are currently 4.2 million articles in PMC. Of these around 1.5
million consist of pre-2000 historical content being deposited as part of the
NIH’s scanning
projects. Some of these papers are still under copyright, some are in the
public domain, and some are available CC BY-NC. However, since this is
historical material pre-dating both the open access movement and the NIH Policy
let’s put it aside.

That leaves us with around 2.7 million papers in PMC that have been published
since 2000. Today around 24% of these papers have a CC BY licence attached. In
other words, some 76% of the post-2000 papers in PMC are not open access as
defined by BOAI.
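
For readers who want to check the arithmetic, here is a rough sketch in Python using only the rounded figures quoted above. The variable names are mine, and the final line rests on an assumption the figures do not confirm: that none of the pre-2000 scanned papers carries a CC BY licence.

```python
# Rounded figures quoted in the post; all are approximations.
total_articles = 4_200_000     # everything currently in PMC
pre_2000_articles = 1_500_000  # scanned historical content, set aside

# Papers published since 2000.
post_2000_articles = total_articles - pre_2000_articles

cc_by_share = 0.24             # share of post-2000 papers with CC BY
cc_by_articles = round(post_2000_articles * cc_by_share)
non_cc_by_articles = post_2000_articles - cc_by_articles

# ASSUMPTION: no pre-2000 paper is CC BY, so the CC BY share of the
# whole archive is simply the post-2000 CC BY count over the total.
cc_by_share_of_all = cc_by_articles / total_articles

print(f"Post-2000 papers:       {post_2000_articles:,}")
print(f"CC BY (approx.):        {cc_by_articles:,}")
print(f"Not CC BY (approx.):    {non_cc_by_articles:,}")
print(f"CC BY share of all PMC: {cc_by_share_of_all:.1%}")
```

Because the inputs are rounded, the outputs are indicative only. The point is simply that the 24% and 76% figures apply to the post-2000 subset rather than to the archive as a whole.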

The good news is that the percentage with a CC BY licence is growing, and the table below (kindly
put together for me by PMC) shows this growth. In 2008, just 8% of the papers in
PMC had a CC BY licence attached. Since then the percentage has grown to 12% in
2010, 14% in 2012, 19% in 2014 and, as noted, it stands at 24% today.

So, although the majority of papers in PMC today are not strictly
speaking open access, the percentage that are is growing over time. Is this a
triumph of green OA? Let’s consider.

There are two submission routes to PMC. Where there is an agreement between
NIH and a publisher, research papers can be input directly into PMC by that
publisher. Authors, and publishers with no PMC agreement, have to use the NIH Manuscript Submission System (NIHMS,
overview here).

The table above shows that the number of “author manuscripts” that came via the NIHMS route represents just 19% of the content in PMC. And since some publishers do
not have an agreement with PMC, the number that will have been self-archived
by authors will be that much lower. So the overwhelming majority of papers being
uploaded to PMC are being uploaded not by authors, but by publishers, and it seems safe to assume that those papers with a CC BY licence attached (currently 24% of the total) will have been
published as gold OA rather than under the subscription model.

We could also note that just 0.06% of the papers in PMC today that were deposited
via the NIHMS have a CC BY licence attached, and we can
assume that these were submitted by gold publishers that do not have an agreement
allowing for direct deposit, rather than by authors.

In short, it would seem that the growth in CC BY papers in PMC is a
function of the growth of gold OA, not green OA. As such, we might want to conclude that the success of PMC is a
triumph of gold OA rather than of green OA.

Does this matter? The answer will probably depend on one’s views of the
merits of article-processing charges, which I think it safe to assume most of
the papers in PMC with a CC BY licence will have incurred.

Either way, the fact that today 76% of the post-2000 content in PMC – the world’s premier
open repository – still cannot meet the BOAI definition of open access suggests
that the OA movement still has a way to go.

2 comments:

OA advocates are a plurality, not a monolith. “They” do not agree that only CC-BY = OA.

There are two "shades" of OA:

"Gratis OA" = free access
"Libre OA" = CC-BY

The right measure of proportion OA for PMC (or any repository) is the percent that is Gratis or Libre OA, not just the percent that is CC-BY. (It also matters when it is deposited: immediately or a year or more after publication.)

The PMC figures are insufficient. Percent OA in PMC does not even represent percent OA in biomedicine, in the US or globally, let alone in all fields. And PMC, as Richard notes, is largely publisher-deposited, which means it's For-Fee Fool's Gold OA rather than author-deposited For-Free Green OA.

That the percentage OA is growing globally with time is inevitable, as the old researchers are retiring with time, and the young researchers have more sense.

The goal, however, is OA, not "living up to the BOAI definition."

And the growth rate is still absurdly slow, compared to what it could and ought to be (and have been).

In my opinion, the success of open access depends in great measure on the involvement and collaboration of authors. Firstly, when they choose where to publish: if they don't check the journal's self-archiving policy before submitting a manuscript, it is unlikely that they can fulfill open-access policies later. Secondly, when they submit one of the versions of their article (pre-print or post-print) to a repository, when the publisher's version is not permitted. Over 75% of articles could be archived in a repository using one of the author's versions, if they submitted them (http://hdl.handle.net/10668/2215). So, promoting open access among researchers and authors, and making them aware of their options and responsibility before submitting an article to a scientific journal, could substantially increase the success of Open Access, especially Libre OA.