Thursday, September 22, 2016

Seventeen years ago, 25 people gathered in Santa Fe, New Mexico,
to discuss ways in which the growing number of e-print servers and digital
repositories could be made interoperable.

As scholarly archives and repositories had begun to proliferate, a number
of issues had arisen. There was a concern, for instance, that archives would
needlessly replicate each other’s content, and that users would have to learn
multiple interfaces in order to use them.

Photo courtesy Susan van Hengstum

It was therefore felt there was a
need to develop tools and protocols that would allow repositories to copy
content from each other, and to work in concert on a distributed basis.

Key to the approach of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was the notion that data providers – the individual archives – would be given easy-to-implement mechanisms for making information about what they held externally available. This external availability would then enable third-party service providers to build higher levels of functionality by harvesting that metadata.

The repository model that the organisers of the Santa Fe meeting had very much
in mind was the physics preprint server arXiv. This had been created in 1991 by
physicist Paul Ginsparg, who was one of the attendees of the New Mexico meeting. As a
result, the early focus of the initiative was on increasing the speed with
which research papers were shared, and it was therefore assumed that the
emphasis would be on archiving papers that had yet to be published (i.e.
preprints).

However, amongst the Santa Fe attendees were a number of open
access advocates. They saw OAI-PMH as a way of aggregating content hosted in
local – rather than central – archives. And they envisaged that the archived
content would be papers that had already been published, rather than preprints.
These local archives later came to be known as institutional repositories, or IRs.

In other words, the OA advocates present were committed to the
concept of author self-archiving (aka green open access). The
objective for them was to encourage universities to create their own repositories and then instruct their researchers
to deposit in them copies of all the papers they published in subscription journals.

As these repositories would be on the open internet, outside any paywall, the
papers would be freely available to all. And the expectation was that OAI-PMH
would allow the content from all these local repositories to be aggregated into
a single searchable virtual archive of (eventually) all published research.

Given these different perspectives, there was
inevitably some tension around the OAI from the beginning. And as
the open access movement took off, and IRs proliferated, a number of other
groups emerged, each with their own ideas about what the role and target
content of institutional repositories should be. The resulting confusion
continues to plague the IR landscape.

Moreover, today we can see that the
interoperability promised by OAI-PMH has not really materialised, few
third-party service providers have emerged, and content duplication has not
been avoided. And to the exasperation of green OA advocates, author
self-archiving has remained a minority sport, with researchers reluctant to
take on the task of depositing their papers in their institutional repository. Given this, some believe the IR now faces an existential threat.

In light of the challenging, volatile, but
inherently interesting situation that IRs now find themselves in, I decided
recently to contact a few of the Santa Fe attendees and put some questions to
them. My first two approaches were unsuccessful, but I struck third-time lucky
when Clifford Lynch, director of the Washington-based Coalition for
Networked Information (CNI), agreed to answer my questions.

I am publishing the resultant Q&A today. This
can be accessed in the PDF file here.

As is my custom, I have prefaced the interview with
a long introduction. However, those who only wish to read the Q&A need
simply click on the link at the head of the file and go directly to it.