Rethinking collections - Libraries and librarians in an open age: A theoretical view

Open access, one of the most important of the potentials unleashed by the combination of the electronic medium and the World Wide Web, is already much more substantial in extent than most of us realize. More than 10 percent of the world’s scholarly peer-reviewed journals are fully open access; this does not take into account the many journals that offer hybrid open choice or free back access, or that allow authors to self-archive their works. Scientific Commons includes more than 16 million publications, nearly twice as much content as Science Direct. Meanwhile, even as we continue to focus on the scholarly peer-reviewed journal article, other potentials of the new technology are beginning to appear, such as open data and scholarly blogging. This paper examines the library collection of the near and medium future, suggests that libraries and librarians are in a key position to lead in the transition to an open age, and provides specific suggestions to aid in the transition.

How much open access is there already?

More than 10 percent of the worlds scholarly, peerreviewed journals are fully open access. As of August 2007, the Directory of Open Access Journals (DOAJ) lists over 2,800 open access journals [1]. Estimates of the number of peerreviewed journals in the world vary, ranging from about 20,00025,000 titles [2]. Regardless of the estimate employed, the open access journals listed in DOAJ exceed 10 percent of the worlds scholarly, peerreviewed journals. DOAJ is also growing rapidly, consistently adding, on average, more than one new title per calendar day [3].

This does not tell us the total percentage of open access articles. There could be differences in size between the average open access and toll-access journal. The total percentage of open access at the article level is difficult to estimate. One intriguing comparison: according to the Elsevier Science Direct Web site [4] as of July 2007, there were 8.4 million items in Science Direct, which Elsevier claims is over 25 percent of the world’s STM literature. On the same day, the open access initiative Scientific Commons [5] listed 15.4 million items, close to twice as much as Science Direct.

This raises the question: is it possible that we are approaching the point where more than half the world’s scientific literature is already open access? This possibility seems astonishing and unlikely, even to this open access advocate and perennial optimist. However, the number is sufficiently intriguing to warrant further investigation; a quick glance at the contents of Scientific Commons would appear to indicate a very scholarly set of contents, with a high percentage peer-reviewed and open access.
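A rough calculation shows why the question arises. The Python sketch below takes Elsevier's claim at face value and assumes, purely for illustration, that every Scientific Commons item is a unique piece of STM literature. Neither assumption is established here, so the result is an indication rather than a measurement.

```python
SCIENCE_DIRECT_ITEMS = 8_400_000       # Science Direct, July 2007
SCIENCE_DIRECT_SHARE = 0.25            # Elsevier's claim: "over 25 percent" of STM literature
SCIENTIFIC_COMMONS_ITEMS = 15_400_000  # Scientific Commons, same day

# If 8.4 million items are at least 25 percent of the total, the total
# can be no larger than 8.4 million / 0.25.
implied_world_total = SCIENCE_DIRECT_ITEMS / SCIENCE_DIRECT_SHARE

# Assumption (not from the source): all Scientific Commons items are
# unique STM items, so they can be compared against the same total.
implied_oa_share = SCIENTIFIC_COMMONS_ITEMS / implied_world_total

print(f"Implied STM literature: at most {implied_world_total / 1e6:.1f} million items")
print(f"Implied open access share: about {implied_oa_share:.0%}")
```

On these assumptions the open access share comes out at roughly 46 percent, close enough to one half to justify a closer look, but far from a firm conclusion.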

Table 2: Number of articles/items in Science Direct (toll-access) and Scientific Commons (open access) as of July 2007

There are more than 800 open access archives (institutional or disciplinary repositories) [6]. A quick glance at a few archives will serve to illustrate just how many open access resources there are at present. The largest archive, with more than one million articles, is PubMedCentral [7], managed by the National Center for Biotechnology Information in the U.S. National Library of Medicine.

Many research funding agencies, particularly agencies supporting medical research, have recently either adopted open access policies, or are considering adopting or strengthening their open access policies. Suber (2006) discusses one particularly active month (October 2006):

Weve never had a month like October 2006. Depending on how you count, more OA mandates came into being in October 2006 than in all previous months combined. I count six adopted mandates, two proposed mandates, two adopted nearmandates, and one adopted mandate limited to data. That comes to eleven actions in five countries (U.K., Austria, Canada, the U.S., and China). [8]

When these open access policies come into effect, the growth rate of PubMedCentral is likely to accelerate.

arXiv.org is the worlds oldest and second largest open access archive, with over 400,000 fulltext articles as of July 2007. The usage of arXiv is notable. Participation in selfarchiving of preprints approaches 100 percent in some subdisciplines, such as high energy physics. Download statistics exceed half a million items per day, over 45 million fulltext downloads per year [9].

Something interesting to consider: as might be expected, the physics community fully participates in peer review. However, by the time an article is published, it has already been read and often commented on. This pre-peer-review reading may be improving and simplifying the peer review process; for example, one author reports that posting to arXiv resulted in comments from experts. After considering these and revising the article accordingly, little further change was needed during the peer review process. arXiv is managed by Cornell University Library.

An important, and likely growing, content area for repositories is open data. One example of a repository illustrating the already substantial contents in this area is Pangaea [10], the Publishing Network for Geoscientific & Environmental Data, with over half a million records. As described on the Pangaea Web site, Pangaea is a public digital library for science aimed at archiving, publishing and distributing georeferenced data with special emphasis on environmental, marine and geological basic research. Data can be retrieved by the PANGAEA search engine or through links on Web pages.

The CERN Documents Server [11], managed by the CERN Library, is an excellent example of an institutional repository, with over 800,000 bibliographic records, including more than 360,000 full-text documents. CERN also hosts a different kind of collection, likely to become much more prevalent in the near future: webcasts of the Open Archives Initiative (OAI) series of conferences.

Clearly, open access resources are already very substantial, and libraries and librarians are playing key roles in most major open access initiatives.

The changing scholarly information landscape

Open access is one of the key elements in transitioning to the world of collections of the future, but it is not the only one. The formats and types of information that libraries are, or should be, collecting are growing in number and variety. Information itself is changing, too.

Questions about whether the journal, or even the article, will continue in the electronic era are not new. Odlyzko, in 1995, predicted the demise of the traditional scholarly journal within 10 to 20 years [12]. The journal, with its bundling of articles in issues within a given size and in a particular format, makes sense for a print environment. This bundling represented efficiencies in printing and postage costs, and for the reader with relatively few reading options, some regularity in supply of scholarly reading materials.

In the electronic environment, while there may be reason to continue bundling articles into journals, primarily for branding purposes (known quality), it does not necessarily make sense to continue bundling into volumes and issues, particularly not issues of equal size, particularly not in an open environment where subscription sales are not dependent on delivering a certain quantity of content. Not bundling into issues of equal size means that publication of articles need not be delayed, nor articles solicited, in order to fill issues.

Even though the peerreviewed article, preferably in a highimpact journal, continues to be the focus for tenure and promotion, the tremendous potential of the new media is already drawing scholars to experiments with new forms of scholarly communication.

One example is the scholarly blog. Peter Suber is an academic renowned internationally for his work in the area of open access. Many consider Suber’s Open Access News blog [13] to be the most authoritative, comprehensive source of information on this topic. Suber’s analysis of open access initiatives, whether on his blog or in the SPARC Open Access Newsletter, represents scholarly communication at its best, in my opinion.

If Peter were to abandon his blog, or even reduce the time spent on his blog, in order to prepare articles for the traditional peer review process, what would happen? Would we have more knowledge about open access, or even more authoritative information about open access? In my opinion, if Peter Suber’s focus were the peer-reviewed article rather than blogging, we would be waiting longer for less knowledge about open access. It is hard to keep up with Open Access News; however, without this blog, it would be even harder to keep up with open access initiatives around the world. Fewer people would have the knowledge to write with authority in this area. Growth of knowledge about open access would likely slow down, and, most likely, so would the open access movement.

My own experience as a serious scholarly blogger in the areas of open access, scholarly communications, and information policy supports this view. While I actively participate in the peer-reviewed literature system as author, reviewer, and editor, and appreciate the value of this form of quality control, in these rapidly evolving areas timeliness of presentation of ideas, research, and analysis is more important. Many of my most important contributions to the debates surrounding open access, for example, are posted to the Imaginary Journal of Poetic Economics [14], or to a listserv. These contributions may or may not be included in peer-reviewed literature at a later date.

If libraries focus solely on collecting peer-reviewed or formally published literature and not blogs and listservs, some of my best writings, and some of the ideas contained there and not expressed elsewhere, are likely to be lost. One example is my Transitioning to Open Access Series [15]. I am very aware that many people are working on figuring out how to transition to an open access scholarly publishing system. As soon as I have an idea that might be helpful, it is written up and posted to the Imaginary Journal of Poetic Economics (IJPE). Some of these posts are read by others, and have influence. They may or may not be included in a formal publication at some point in the future. If libraries do not take the lead in preserving information like this, it is at risk of being lost. Northwestern University Libraries has taken an interesting step towards collecting this blog, in that they have created a catalogue record for IJPE [16].

arXiv began adding blog trackbacks in 2005. Paul Ginsparg wrote:

The underlying idea is to replicate in some online form the common experience of going to a meeting or conference, and receiving from a friend/expert some informal recent research thoughts and an instant overview of a subject area. [17]

Collections of the future

Digital heritage collections

Digital heritage collections are a focus area for many libraries. These collections represent an opportunity for libraries to greatly expand access to special and archival collections, as well as to play a different role in providing access to historical information, shifting from primarily collecting secondary sources to identifying, preserving, and making accessible primary materials in digital form. One example of leadership in this area is the AlouetteCanada Open Digitization Initiative, which

Imagines a Country where every citizen has the opportunity to access its online cumulative digital heritage, and which is able to harness the will and energy of every library, archive, gallery, museum, and historical society or institute of record to create a comprehensive collection of digital resources for the benefit of its citizens ... . [18]

Library collaborations are not new. What is new about this form of collaboration is its multisectoral nature, the partnerships between different types of libraries and other organizations, and its scope. Library collections work of the future may, increasingly, be a collaborative venture.

Figure 1: The item of the future.

The discrete item  the book, the journal article  is becoming less and less relevant in todays interconnected world. The collection of the future may be a collection of collections of interrelated and/or interlinked items. For example, with the electronic medium, it is physically easy for authors to create collections of their own works, whether in an electronic portfolio, their personal Web page, or through open access repositories. It seems obvious that this feature will be highly desired by authors, and libraries and IR repository software developers would be well advised to design systems with this feature in mind.

Open access means that an individual work could easily become a compilation of both the work itself, and works cited; the actual works, that is, not just citations or links. This has benefits for the author and future readers, in providing a direct and immediate link (and additional preservation as well) to cited works.

This has benefits for students and teachers, too. A future research paper might be handed in as a collection of works: the student’s own paper and the full text of works cited. Perhaps this would help to combat plagiarism? It is much easier to plagiarize when the teacher does not have ready access to the cited works.

For the library, what this means is that collections work will gradually need to shift from a focus on discrete items, to a focus on comprehensive collections and links both within and outside of collections.

Key issues for collections in the future

One of the key issues for library collections in the future is preservation of electronic collections in many formats, as well as preservation of links and linked items. New selection criteria will be needed, based on priorities for ensuring access and preservation rather than purchase for use.

Ensuring ongoing access is an important, and perhaps overlooked, element in selection of electronic resources for collecting and preservation. A collection of resources housed in another country may be readily accessible over the Web today; however, it would be prudent to assume that circumstances such as natural disasters, wars, or economic sanctions could result in loss of access if we rely exclusively on access provision by a resource provider in another country. The only way to ensure ongoing access to the world’s literature is to archive it locally.

Our collections need to support new kinds of search tools to facilitate new approaches to research, based on artificial intelligence and data mining. This means that our collections need to be open to use, as defined in the Budapest Open Access Initiative:

By ‘open access to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the Internet itself. [19]

Controlled vocabulary and subjectspecific collections, which, among other things, enhance searchability, are likely to be of continuing importance, as illustrated by the popularity of subject archives such as PubMedCentral, arXiv, rePEC, and ELIS. Controlled vocabulary is an emerging issue in metadata for institutional repositories as well.

Given the wealth of information readily available, the basic reference question is likely to shift from “can you help me find some information about ...?” to questions like “how do I evaluate the information I have found?” or “how do I make sure I don’t miss anything important?”

Consider, for example, the graduate student with an obligation to conduct comprehensive research. The larger the research literature grows (and it is growing fast!), the more work it is to conduct a comprehensive literature search on any given topic. The options for graduate students of the future are to continue to seek smaller and smaller areas to research, or to find new means of research that combine some higher level searching, perhaps with the aid of artificial intelligence. As more and more information is produced, information literacy will increase in importance.

Transitioning to open access

Open access is not the only potential of new technology; however, it is absolutely key to unleashing the full potential of new media. We need open access to enable new forms of research, and new types of collections. Open access is optimum for scholarship and, ultimately, more affordable (it costs money to keep readers out, and even more to overcome barriers to let selected readers in, through authentication). Open access is no longer a new, unproven idea, but rather a global movement with very substantive resources already in place, and rapidly growing.

One simple way to begin to transition is to reconsider, and if necessary, revise, the library’s vision or collections policy statement. A simple vision statement, reading something along the lines of, “The purpose of our library is to support the scholarly communications needs of faculty and students,” may be sufficient to set the direction for a successful transition to open access.

Once we see that open access is the optimum for scholarship in many senses, and that our purpose is to support scholarly communications needs, it is easy to understand that it makes sense to prioritize economic support for the future, rather than for the past. Instead of wondering where we will find extra dollars for open access initiatives when we have so many commitments to purchasing subscriptions, it becomes easy to see that we may not be able to afford the subscriptions until we have made the commitments we need to make for an open access future.

Transitioning journals

Many libraries are providing free or low-cost hosting and basic technical support for open access publishing by faculty members. Simon Fraser University Library is a partner in the Public Knowledge Project, which develops the popular free, open source journal publishing software Open Journal Systems [20] and hosts more than 70 journals. Athabasca University hosts a number of journals, and is planning to become an all-open-access university press. At the First International PKP Scholarly Publishing Conference, we heard about the National Library of Australia’s Open Publish program, support for open access biomedical journals at the Indian Medlars Centre, Open Journal Systems (OJS) at the University of Alberta Libraries, and Newfound Press, the digital imprint of the University of Tennessee Libraries, among others [21].

One way for libraries to support new publishing initiatives is to encourage and support, or develop, publishing cooperatives as described by Raym Crow [22]. The idea of the cooperative, based on disciplinary affiliation, is to share the expenses of development and expertise, allowing smaller publishers to enjoy efficiencies of scale equivalent to those of larger publishers (e.g., sharing of development costs, risks, expertise), while still fully retaining their independence.

Libraries can assist in the transitional process through licensing activities, for example by negotiating rights for their own authors to retain copyright through an Author’s Addendum [23]. Hybrid publishers are beginning to see noticeable revenue from open choice programs. For example, Oxford University Press recently announced price decreases for some of their hybrid journals, and the American Physiological Society indicated an uptake of up to 18 percent for some of their open choice journals [24]. It just makes sense for libraries to begin seeking combined subscriptions/open choice licenses. Libraries should also consider asking for quality and quantity protection clauses when negotiating purchase of journals not in active transition to open access, particularly when negotiating long-term offers. As more and more research funders require open access, and as researchers’ awareness of the benefits of open access increases, one wonders what a strictly toll-access publisher will be publishing a few years down the road. Also, support for journals in transition to open access can be prioritized; for example, if it is necessary to cancel journal subscriptions, all else being equal, keep the journal that does allow self-archiving over one that does not.

Economics: Cost per article

If the average subscription cost is the key to understanding the economics of serials in the realm of subscriptions, in open access the key to cost effectiveness is average cost per article. It is important to recognize that most OA journals do not charge article processing fees. In fact, publication charges are more likely with subscription journals than with OA journals [25].

For those journals that do charge article processing fees, supporters can include funding agencies, departments, libraries, and other possibilities such as special university budgets or subsidy (government, membership fees).

Article processing fees (APF) & library coordination

Coordination of article processing fee collection and payment, rather than one-off payments, will come to be seen as a needed efficiency for research producers and open access publishers, in my opinion. Why should libraries be involved? One reason is our knowledge about publishing, quality, and open access. Libraries are in the best position to facilitate the transition from subscriptions to open access. For example, it is when libraries collect and pay article processing fees, regardless of who the payer is, that libraries are in the best position to understand the revenue publishers are receiving from this source, and the discounts from subscriptions they should expect. Leading in this arena is an excellent means for libraries to position themselves for an open access future.

As for providing the funds, there is much to be said for faculty or departmental involvement. Faculty awareness of the costs of publishing may be the most effective way to ensure market incentives to moderate costs. For example, suppose a faculty member has access to a discretionary fund from a research grant, perhaps $3,000 for research dissemination, which could be used to pay article processing fees, attend a conference to present a paper, or produce a video to illustrate the findings. That faculty member has a far stronger incentive to seriously consider high-quality but lower-cost publishing solutions than would be the case if the cost of publishing were borne entirely by either the research funder or the library.

Advantages of paying from the library budget include maximizing the likelihood of faculty acceptance, and facilitating a gradual transition from subscriptions to open access.

A combined costsharing approach might be optimal, with capped article processing fees and/or a sliding scale based on the amount of the APF, could have the advantage of maximizing faculty acceptance, while still ensuring market forces to moderate costs, by ensuring faculty are aware and involved in decisionmaking and funding whenever costs become excessive.

Here is one example of a sliding scale approach to APFs:

APF up to $750: library pays 100%
APF up to $1,500: library pays 60%
APF up to $2,000: library pays 50%

This particular model assumes that the library’s contribution is capped at $1,000.
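The sliding scale can be expressed as a small function. The Python sketch below is one reading of the model, not a definitive one: it assumes that the percentage applies to the whole fee rather than to the portion within each tier, and that fees above $2,000 are treated like the top tier, with the $1,000 cap limiting the library's share. The function name and the tier table are illustrative.

```python
# Tiers from the example above: (upper APF limit in dollars, percent paid by library).
TIERS = [(750, 100), (1500, 60), (2000, 50)]
LIBRARY_CAP = 1000  # maximum library contribution per article, in dollars


def split_apf(apf):
    """Return (library_pays, author_pays) in dollars for one article processing fee."""
    if apf < 0:
        raise ValueError("APF must not be negative")
    # Assumption: fees above the last tier limit use the last tier's percentage.
    percent = TIERS[-1][1]
    for limit, tier_percent in TIERS:
        if apf <= limit:
            percent = tier_percent
            break
    # Assumption: the percentage applies to the whole fee, then the cap applies.
    library_pays = min(apf * percent / 100, LIBRARY_CAP)
    return library_pays, apf - library_pays


for fee in (500, 1200, 2000, 3000):
    library, author = split_apf(fee)
    print(f"APF ${fee}: library pays ${library:.2f}, author pays ${author:.2f}")
```

Under this reading, a $1,200 fee is split into $720 from the library and $480 from the faculty member or department, while any fee of $2,000 or more costs the library exactly $1,000. A whole-fee percentage also means the library's share drops sharply just above each tier boundary, which a library adopting such a model might want to smooth out.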

Reorganizing for change

The sooner libraries begin to reorganize for change, the better. One suggestion is to engage staff by helping them to envision exciting new roles for themselves in an open access future. Reference/liaison roles are already being expanded to include scholarly communications; anecdotal reports suggest that this enhances the quality of the relationship and communications between librarians and faculty, involving libraries more closely in a key priority for faculty members: their own research. Collections and electronic resources staff will be on the front lines in the transition, managing the economics and technology of change. Copyright officers could become authors’ rights consultants. Interlibrary loans staff, with their experience with careful verification and handling of information and invoicing on an item-by-item basis, have the qualifications needed by staff coordinating article processing fee payment or working with institutional repositories.

Conclusions

The time is ripe to rethink collections. The universe of information has already changed significantly in the Internet age, with open access journals and archives already playing a key role in scholarly communication. More change is to be expected as we continue to explore the full potential of new media. The library collection of the future may include whole collections of digital documents, files, data, and links, with less emphasis on individual items.

Libraries can play a key role in the future. Change can begin with something as simple as revising the library’s vision statement. Libraries should support transition towards open access, employing suitable cautions, such as ensuring that market forces will be in play to moderate the average per-article costs, to ensure a cost-effective scholarly communications system, and ensuring that true open access is supported, not just free access. Libraries can play a vital role in supporting the publishing efforts of their faculty, for example by hosting and providing basic technical support for journals. It is not too soon to begin reorganizing for change. Current library staff have skills that will be needed in institutional repositories and the new world of collections; the best approach is to engage staff as soon as possible, to help them envision themselves in an open access future, so that they can help us all to figure out how to get there.

Notes

1. The Directory of Open Access Journals (DOAJ) is a librarian-vetted list of currently published, fully open access journals (no embargo period), in any language. As of 17 August 2007, there are 2,802 journals listed in the DOAJ, and 59 titles were added in the last 30 days; see http://www.doaj.org.

2. The precise number of peerreviewed scholarly journals published around the world is difficult to establish. Current best estimates range from 20,00025,000 peerreviewed journals in the world. For example, Crow cites 20,000 active, peerreviewed journals (Crow, p. 4). According to Uhlrichs Periodicals Directory (http://www.ulrichsweb.com), on 27 July 2007, there were 23,488 active, academic/scholarly, refereed journals in the world.

6. The author prefers the term “open access archives,” reflecting what the author perceives should be the purpose of the archives: true open access and archiving in the sense of preservation. This is much clearer terminology than “institutional repository,” which tends to permit the inclusion of the concept of information available only within an institution, and does not specify that the information should be preserved. On 17 August 2007, OpenDOAR, an authoritative list of academic open access repositories, lists 927 repositories at http://www.opendoar.org/.

9. arXiv.org download statistics on the main server at Cornell University Library alone are in the range of half a million per day; for example, daily usage statistics on 27 July 2007, up to 19:15 EST, were 412,475. The figure of 45 million full-text downloads per year is from Paul Ginsparg’s “Next-Generation Implications of Open Access,” CTWatch Quarterly, volume 3, number 3 (August 2007), http://www.ctwatch.org/quarterly/print.php?p=80.