Wednesday, October 05, 2016

The
introduction I wrote for the recent Q&A with Clifford Lynch has attracted some
commentary from the institutional repository (IR) and open access
(OA) communities. I
thank those who took the time to respond. After reading the comments the
following questions occurred to me.

Judging
by the Mark Twain quote with which COAR’s Kathleen Shearer
headed her response (“The reports of
our death have been greatly exaggerated”), and judging by CORE’s Nancy Pontika insisting
in her comment that we should
not give up on the IR (“It is my strong belief that we don’t need to abandon
repositories”), people might conclude that I had said the IR is dead.

Indeed,
by the time Shearer’s comments were republished on the OpenAIRE blog (under the title
“COAR counters reports of repositories’ demise”) the wording had strengthened –
Shearer was now saying that I had made a number of “somewhat questionable
assertions, in particular that institutional repositories (IRs) have failed.”

That
is not exactly what I said, although I did quote a blog post by Eric Van de
Velde (here) in which he
declared the IR obsolete. As he put it, “Its flawed foundation cannot be
repaired. The IR must be phased out and replaced with viable alternatives.”

What
I said (and about this Clifford Lynch
seemed to agree, as do a growing number of others) is that it is time for the
research community to take stock, and rethink what it hopes to achieve with the
IR.

It
is however correct to say I argued that green OA has “failed as a strategy”. And
I do believe this. I gave some of the reasons why I do in my introduction, the
most obvious of which is that green OA advocates assumed that once IRs were
created they would quickly be filled by researchers self-archiving their work.
Yet seventeen years after the Santa Fe meeting, and 22 years
after Stevan Harnad began his long campaign to persuade
researchers to self-archive, it is clear there remains little or no appetite
for doing so, even though researchers are more than happy to post their papers
on commercial sites like Academia.edu and ResearchGate.

However,
I then went on to say that I saw two possible future scenarios for the IR. The
first would see the research community “finally come together, agree on the
appropriate role and purpose of the IR, and then implement a strategic plan
that will see repositories filled with the target content (whatever it is
deemed to be).”

The
second scenario I envisaged was that the IR would be “captured by commercial
publishers, much as open access itself is being captured by means of
pay-to-publish gold OA.”

Neither
of these scenarios assumes the IR will die, although they do envisage somewhat
different futures for it. That said, what they could share in common is a
propensity for the link between the IR and open access to weaken. Already we
are seeing a growing number of papers in IRs being hidden behind login walls –
either as a result of publisher embargoes or because many institutions have
come to view the IR less as a way of making research freely available, more as a
primary source of raw material for researcher evaluation and/or other internal
processes. As IRs merge with Research Information Management (RIM) tools and Current
Research Information Systems (CRIS) this darkening of
the content in IRs could intensify.

What
makes this darkening likely is that the internal processes that IRs are
starting to be used for generally only require the deposit of the metadata
(bibliographic details) of papers, not the full-text. As such, the underlying documents
may not just be inaccessible, but entirely absent.

This
outcome seems even more likely in my second scenario. Here the IR is (so far as
research articles are concerned) downgraded to the task of linking users to
content hosted on publishers’ sites. Again, to fulfil such a role the IR need host
only metadata.

2. So what is the role of an institutional repository?
What should be deposited in it, and for what purpose?

As
I pointed out in my introduction, there is today no consensus on the role and
purpose of the IR. Some see it as a platform for green OA, some view it as a
journal publication platform, some as a metadata repository, some as a digital
archive, some as a research data repository (I could go on).

It
is worth noting here a comment posted on my blog
by David Lowe. The reason why the IR will persist, he said, “is not related to
OA publishing as such, but instead to ETDs.” Presumably this means that Lowe
expects the primary role of the IR to become that of facilitating ETD
workflows.

It
turns out that ETDs are frequently locked behind login walls, as Joachim Schöpfel and Hélène Prost pointed
out in a
2014 paper called Back to Grey: Disclosure and Concealment of
Electronic Theses and Dissertations. “Our paper,” they wrote, “describes a
new and unexpected effect of the development of digital libraries and open
access, as a paradoxical practice of hiding information from the scientific
community and society, while partly sharing it with a restricted population
(campus).”

And
they concluded that the Internet “is not synonymous with openness, and the
creation of institutional repositories and ETD workflows does not make all
items more accessible and available. Sometimes, the new infrastructure even
appears to increase barriers.”

In
short, the roles that IRs are expected to play are now manifold and
sometimes they are in conflict with one another. One consequence of this is
that the link between the repository and open access could become more and more
tenuous. Indeed, it is not beyond the bounds of possibility that the link could
break altogether.

3. To what extent can we say that the IR movement – and the
OAI-PMH standard on which it was based – has proved successful, both in terms
of interoperability and deposit levels?

As
I said in my introduction, thousands of IRs have been created since 1999. That
is undoubtedly an achievement. On the other hand, many of these repositories remain
half empty, and for the reasons stated above we could see them increasingly being populated
with metadata alone.

Both
Shearer and Pontika agree that more could have been achieved with the IR. With
regard to OAI-PMH, Pontika says that while it has its disadvantages, “it has
served the field well for quite some time now.”

But
what does serving the field well mean in this context? Let’s recall that the
main reason for holding the Santa Fe meeting, and for developing OAI-PMH, was
to make IRs interoperable. And yet interoperability remains more
aspiration than reality today. Perhaps for this reason most research papers are
now located by means of commercial search engines and Google Scholar, not
OAI-PMH harvesters – a point Shearer conceded when I
interviewed her in 2014.

Of
course, if running an IR becomes less about providing open access and more
about enabling internal processes, or linking to papers hosted elsewhere, interoperability
begins to seem unnecessary.

4. Do IR advocates now accept that there is a need to re-think
the institutional repository, and is the IR movement about to experience a
great leap forward as a result?

Most
IR advocates do appear to agree that it is time to review the current status of
the institutional repository, and to rethink its role and purpose. And it is
the Confederation of Open Access Repositories (COAR) that is leading on this.

Shearer,
who is the executive director of COAR (and so presumably responsible for the working
group), explains in her response that the group has set itself the task of
identifying “the core functionalities for the next generation of repositories,
as well as the architectures and technologies required to implement them.”

As
a result, Shearer says, the IR community is “now well positioned to offer a
viable alternative for an open and community led scholarly communication
system.”

So
all is well? Not everyone thinks so. As an anonymous commenter pointed
out
on my blog: “All this is not really offering a new way and more like reacting
to the flow. Maybe that has to do with the kind of people working on it, the IR
crowd is usually coming from the library field and their job is not to be
inventive but to archive and keep stuff save.”

Archiving
and keeping stuff safe are very worthy missions, but it is to for-profit publishers that
people tend to turn when they are looking for inventive solutions, and we can see that legacy publishers are
now keen to move into the IR space. This suggests that if the goal is to create a community-led
scholarly communications system, COAR’s initiative could turn out to be a case
of shutting the stable door after the horse has bolted.

5. What is the most important task when seeking to
engineer radical change in scholarly communication: articulating a vision,
providing enabling technology, or getting community buy-in?

“Ultimately,
what we are promoting is a conceptual model, not a technology,” says Shearer.
“Technologies will and must change over time, including repository
technologies. We are calling for the scholarly community to take back control
of the knowledge production process via a distributed network based at
scholarly institutions around the world.”

Shearer
adds that the following vision underlies COAR’s work:

“To position distributed repositories as
the foundation of a globally networked infrastructure for scholarly
communication that is collectively managed by the scholarly community. The
resulting global repository network should have the potential to help transform
the scholarly communication system by emphasizing the benefits of collective,
open and distributed management, open content, uniform behaviors, real-time
dissemination, and collective innovation.”

As
such, I take it that COAR is seeking to facilitate the first scenario I
outlined. But were not the above objectives those of the attendees of the 1999
Santa Fe meeting? Yet seventeen years later we are still waiting for them to be
realised. Why might it be different this time around, especially now that legacy
publishers are entering the market for IR services, and some universities seem minded to outsource the hosting of research papers to commercial
organisations, rather than work with colleagues in the research community to create an interoperable network of distributed repositories?

What
has also become apparent over the past 17 years is that open movements and initiatives
focused on radical reform of scholarly communication tend to be long on
impassioned calls, petitions and visions, short on collective action.

As
NYU librarian April Hathcock put it when reporting on
a Force11 Scholarly
Commons Working Group
she attended recently: “As several of my fellow librarian colleagues pointed
out at the meeting, we tend to participate in conversations like this all the
time and always with very similar results. The principles are fine, but to me,
they’re nothing new or radical. They’re the same things we’ve been talking about
for ages.”

Without
doubt, articulating a vision is a good and necessary thing to do. But it can
only take you so far. You also need enabling technology. And here we have
learned that there is “many a slip ’twixt the cup and the lip.” OAI-PMH has not
delivered on its promise, as even Herbert Van de Sompel, one of the architects
of the protocol, appears to have concluded. (Although this tweet suggests that he too
does not agree with the way I characterised the current state of the IR
movement).

Shearer
is of course right to say that technologies have to change over time. However, choosing
the wrong one can derail, or significantly slow down, the objective you are working towards.

But
even if you have articulated a clear and desirable vision, and you have put the
right technology in place, in the generally chaotic and anarchic world of
scholarly communication you can only hope to achieve your objectives if you get
community buy-in. That is what the IR and self-archiving movements have surely
demonstrated.

6. To what extent are commercial organisations colonising
the IR landscape?

In
my introduction I said that commercial publishers are now actively seeking to colonise
and control the repository (a strategy supported by their parallel activities aimed
at co-opting gold open access). As such, I said, the challenge the IR community
faces is now much greater than in 1999.

In
her response, Shearer says that I mischaracterise the situation. “[T]here are
numerous examples of not-for-profit aggregators including BASE, CORE, SemanticScholar, CiteSeerX, OpenAIRE, LA Referencia and SHARE (I could go on),” she said. “These
services index and provide access to a large set of articles, while also, in
some cases, keeping a copy of the content.”

In
fact, I did discuss non-profit services like BASE and OpenAIRE, as well as PubMed
Central, HAL and SciELO. In doing so I pointed out that a high percentage of the
large set of articles that Shearer refers to are not actually full-text
documents, but metadata records. And of the full-text documents that are deposited, many are locked behind login walls. In the
case of BASE, therefore, only around 60% of the records it
indexes provide access to the full-text.

In
addition, many consist of non-peer-reviewed and non-target content such as blog
posts. That’s fine, but this is not the target content that OA advocates say they want to see made open access. Indeed, in some cases
a record may consist of no more than a link to a link (e.g. see the first item
listed here).

So
the claims that these services make about indexing and providing access to a large
set of articles need to be taken with a pinch of salt.

It
is also important to note that publishers are at a significant advantage here,
since they host and control access to the full-text of everything they publish.
Moreover, they can provide access to the version of record (VoR) of articles.
This is invariably the version that researchers want to read.

It
also means that publishers can offer access both to OA papers and to paywalled
papers, all through the same interface. And since they have the necessary funds
to perfect the technology, publishers can offer more and better functionality,
and a more user-friendly interface. For this reason, I suggested, they will
soon be (and indeed some already are)
charging for services that index open content, as I assume Elsevier plans to do
with the DataSearch service it is
developing. This seems to me to be a new form of enclosure of the commons.

Shearer
also took me to task for attaching too much significance to the partnership between Elsevier
and the University of Florida – in which the University has agreed to outsource
access to papers indexed in its repository to Elsevier. I suggested that by signing
up to deals like this, universities will allow commercial publishers to increasingly
control and marginalise IRs. This is an exaggeration, says Shearer: “[O]ne repository
does not make a trend.”

I
agree that one swallow does not a summer make. However, summer does eventually arrive,
and I anticipate that the agreement with the University of Florida will prove the
first swallow of a hot summer. Other swallows will surely follow.

Consider,
for instance, that the University of Florida has also signed a Letter of
Agreement
with CHORUS in a pilot initiative intended to scale up the Elsevier project “to
a multilateral, industry effort.”

And
just last week it was announced that Qatar University
Library
has signed a deal with Elsevier that apes the one signed by the University of
Florida. I think we can see a trend in the making here.

As
things stand, therefore, it is not clear to me how initiatives like COAR and SHARE can hope to match the collective power of
legacy publishers working through CHORUS.

Let’s
recall that OA advocates long argued that legacy publishers would never be able
to replicate in an OA environment the dominance they have long enjoyed in the
subscription world. As a result, it was said, as open access commodifies the services
they provide, publishers will experience a downward pressure on prices. In
response, they will either have to downsize their operations, or get out of
the publishing business altogether. Today we can see that legacy publishers are
not only prospering in the OA environment, but getting ever richer as their
profits rise – all at the expense of the taxpayer.

But
let me be clear: while I fear that legacy publishers are going to co-opt both
OA and IRs, I would much prefer they did not. Far better that the research
community – with the help of non-profit concerns – succeeded in developing COAR’s
“viable alternative for an open and community led scholarly communication
system.”

So
I applaud COAR’s initiative and absolutely sign up to its vision. My fear is
that, as things stand, that vision is unlikely to be realised. For it to happen
I believe more dramatic changes would be needed than the OA and IR movements appear
to assume, or are working towards.

7. Will the IR movement, as with all such attempts by the
research community to take back control of scholarly communication, inevitably
fall victim to a collective action dilemma?

Let
me here quote Van de Sompel, one of the key architects of OAI-PMH. Van de
Sompel, I would add, has subsequently worked on OAI-ORE (which Lynch mentions in the
Q&A) and on ResourceSync (which Shearer
mentions in her critique).

In
a retrospective on repository
interoperability efforts published last year Van de Sompel concluded, “Over the
years, we have learned that no one is ‘King of Scholarly Communication’ and
that no progress regarding interoperability can be accomplished without active
involvement and buy-in from the stakeholder communities. However, it is a
significant challenge to determine what exactly the stakeholder communities
are, and who can act as their representatives, when the target environment is
as broad as all nodes involved in web-based scholarship. To put this
differently, it is hard to know how to exactly start an effort to work towards
increased interoperability.”

The
larger problem here, of course, is the difficulty inherent in trying to get
the research community to co-operate.

This
is the problem that afflicts all attempts by the research community to, in
Shearer’s words, “take back control of the knowledge production process.” What inevitably
happens is that they bump up against what John Wenzler, Dean of Libraries at California
State University, has described as a “collective
action dilemma”.

But
what is the solution? Wenzler suggests the research community should focus on trying
to control the costs of scholarly communication. Possible ways of doing this he
says could include requiring pricing transparency and lobbying for government
intervention and regulation. (“[T]he government can try to limit a natural
monopoly’s ability to exploit its customers by regulating its prices instead.”)

He
concedes however: “Currently, the dominant political ideology in Western
capitalist countries, especially in the United States, is hostile to regulation,
and it would be difficult to convince politicians to impose prices on an
industry that hasn’t been regulated in the past.”

He
adds: “Moreover, even if some kind of International Publishing Committee were
created to establish price rates, there is a chance that regulators would be
captured by publisher interests.”

It
is worth recalling that while OA advocates have successfully persuaded many
governments to introduce open access/public access policies, this has not put
control of the knowledge production process back into the hands of the research
community, or reduced prices. Quite the reverse: it is (ironically) increasing the power and dominance
of legacy publishers.

In
short, as things stand if you want to make a lot of money from the taxpayer you
could do no better than become a scholarly publisher!

I
don’t like being the eternal pessimist. I am convinced there must be a way of
achieving the objectives of the open access and IR movements, and I believe it
would be a good thing for that to happen. Before it can, however, these
movements really need to acknowledge the degree to which their objectives are being
undermined and waylaid by publishers. And rather than just repeating the same old
mantras, and recycling the same visions, they need to come up with new and more
compelling strategies for achieving their objectives. I don’t claim to know
what the answer is, but I do know that time is not on the side of the research
community here.