Scholarly content and the cliff edge: the place of subject ‘repositories’

The famous (and famously reclusive) author J.D. Salinger died on 27 January this year, two days after the anniversary of the birth of Robert Burns – a day which is celebrated across Scotland and in many parts of the world. Salinger and Burns are of course connected, since the title of Salinger’s most famous novel, The Catcher in the Rye, is based on a mishearing of the Burns song Comin’ Through the Rye by the protagonist, 17-year-old Holden Caulfield:

… I’m standing on the edge of some crazy cliff. What I have to do, I have to catch everybody if they start to go over the cliff—I mean if they’re running and they don’t look where they’re going I have to come out from somewhere and catch them. That’s all I’d do all day. I’d just be the catcher in the rye and all.

Salinger, J.D., The Catcher in the Rye, Chapter 22

The idea of being a ‘catcher’ struck me when I attended a conference held at the British Library last week, Subject Repositories: European Collaboration in the International Context. Neil Jacobs of JISC mentioned Glasgow University Library’s policy of seeking to ‘catch’ researchers close to the end of funded projects to ask if they would like help with their outputs. Certainly, it is easy to argue for libraries to be the ‘catchers in the rye’ when it comes to digital scholarly works and outputs – and the obvious place to deposit these materials is the institutional repository.

However, we were gathered at the BL to hear about subject repositories – including EconomistsOnline, which was being launched during the event. And we heard about several very successful subject repositories in a number of very good presentations. The event left me reflecting on a number of things. For example, some subject repositories are success stories almost against all odds. Services like arXiv and RePEc have captured their respective corners of academia so effectively that they go on existing and attracting content even without much resource (almost none in the case of RePEc), and their proven value is such that people probably would pay to maintain them (as arXiv is now proposing for its heaviest users). This makes them the inverse of many institutional repositories, which can’t attract content almost irrespective of the amount of resource invested.

The event inevitably had an economics flavour, and one of the most interesting presentations was by Christian Zimmerman, an economics professor at the University of Connecticut, who does some of the management of RePEc. RePEc has established a unique role for itself in a field in which several subject repositories co-exist. Harvesting the Open Access corpus means that papers are re-aggregated on a regular basis (as Roy mentions in his post below), and we heard about the re-use of RePEc data in EconPapers, IDEAS, Socionet, NEP, EconomistsOnline, Inomics, Q-Sensei, Google Scholar, Econlit, and WorldCat (via OAIster). Another reason for RePEc’s success is that it maintains its own rankings – now with sufficient validity among economists that they are often taken into account when interviews are held for faculty positions. Christian Zimmerman put it this way: ‘Librarians can beat the bushes all they want. This works!’

Clifford Lynch of CNI reflected upon the respective roles of institutional and subject repositories. Institutional repositories are necessary catchers. But subject repositories will be experimental laboratories for how we do peer review, attention-allocation, ranking, the building of collaboration environments and other innovative things as research domains make increasing use of the network level. He queried the binary argument that institutional repositories are a surer bet because subject repositories – floating freely with only the nebulous support of domain communities – may come crashing down if resource is suddenly withdrawn. And once again, there is a paradox in play here, with some of the most successful repositories being sustained on a very small resource base, but nonetheless likely to be guaranteed by their very success. The best subject repositories, because of scale and credibility, can develop sticky value-added services in ways that institutional repositories will find much more difficult. When you begin to look into RePEc for example, you see – beyond the ‘badly designed website’ which Christian Zimmerman seemed quite proud of (almost as a badge of credibility) – an intricate ecology of partnerships and providers, with records being fed into the main service from a large number of institutionally-based archives – of working papers and other series – around the world. Other partners provide statistical services, and there is also a variety of expressions of the same data, as the economics community is encouraged to take the base data and provide it in bespoke ways to fit different needs.

All of which leads me to wonder about the way libraries should be organised to play their role as scholarship’s catchers most effectively. We talk often about ‘new roles for librarians’, considering their retraining needs as data experts or informaticians, for example. But isn’t there a need also for libraries to organise as a community and step up to new roles in a reconfigured scholarship support environment? Certainly, all academic libraries should be collecting digital research outputs at the institutional level – and they should be doing this for the sake of scholarship first and foremost, rather than simply to meet research assessment requirements or tackle publishers over commercial journal prices. These institutional repositories should be archives in the sense that they catch digital outputs produced on campus. They may also be value-added providers to their own campuses, e.g. for research assessment. But services like arXiv, PubMed Central and RePEc are disciplinary venues. They may or may not be literal repositories (RePEc, for example, is not). What is important about them is that they are dynamic places for the sharing of scholarship – the places where our digital networked world allows the invisible college to become visible – necessary because the published literature system, the record of scholarship, does not have the functionality that scholars need. And for our subject librarian colleagues, they are also potential homes for digital collections. The scope for subject librarians working with scholars to support collaborative research is a fascinating one, touched upon by Clifford Lynch.

Libraries in the aggregate have just about worked out what their role is in respect of the record of scholarship, but they have not as yet done so in respect of these new domain venues. Yet, however well our academic colleagues have done in setting up successful services for their communities, there are still risks whose management they would concede to their librarians and archivists (and the leadership roles of Cornell in respect of arXiv and the British Library and National Library of Medicine in respect of PubMed Central are instructive). The danger is not so much that a whole service disappears, but rather that someone’s wonderful outputs will dart out of the rye and towards the edge of the cliff. How does the library world collectively ensure that these vital services are sustained, that the necessary routes to institutional repositories exist and are maintained, that preservation is allocated somewhere, that no more time and money than is absolutely necessary is spent on quality control of metadata, and that reaggregation is increasingly rational and efficient? What is the role of the ARL, RLUK and equivalent communities internationally? Of national libraries? Where, indeed, will OAIster fit in as OCLC develops its role as an international aggregator of Open Access repository content? In what forum can we get the players together to ensure we do this most cost-effectively for the scholars who increasingly depend on these services?