The Evolution of an Institutional E-prints Archive at the University of Glasgow

Submitted by editor on 8 July 2002 - 12:00am

William Nixon with some practical advice based on the Glasgow experience.

This article outlines the aims of the e-prints archive at the University of Glasgow and recounts our initial experiences in setting up an institutional e-prints archive using the eprints.org software. It follows on from the recent article by Stephen Pinfield, John MacColl and Mike Gardner in the last issue of Ariadne [1].

The Open Archives Initiative [2] and the arguments for e-prints services [3] need little introduction here and have been ably covered by previous articles in Ariadne and elsewhere. The focus here is on the implementation of an eprints.org archive in an HEI and the various decisions taken as the archive evolved from beta test site to a publicly available service.

E-prints archive at the University of Glasgow

Information Services at the University of Glasgow began working with the eprints.org software in early 2001, when we installed the beta release, and then moved to version 1.0 in April of that year. A Demonstrator version was released in November 2001 and became our live service in April 2002. We have not yet upgraded to version 2.0x of the eprints.org software.

These installations provided us with a range of experience, not just of the mechanics of installing the service but also as a sounding board for beginning to explore the range of cultural and technical issues which need to be addressed to ensure that our institutional service is relevant and is used.

We have begun with some 30 papers, pre-prints and reports in the archive. We are lining up a considerable collection of additional content from published papers and technical reports to working papers and theses which will be added to our e-prints service. We have found that many departments are interested in making working papers and technical reports available. An excellent example of such a service is the recently launched eScholarship repository [4] by the California Digital Library at the University of California. Its initial focus is on working papers in the Humanities and Social Sciences at the University of California. This service provides us with an excellent showcase to demonstrate the international interest in this area and the potential for institutional archives to effectively disclose scholarship.

Aim of the e-prints archive at Glasgow

The aim of the e-prints archive at the University of Glasgow is to provide an effective [further] means of ensuring the disclosure of, and access to, the scholarly work and research of this institution. The term "scholarly work" is intended to include peer-reviewed journal articles as well as theses, chapters, conference papers and grey literature such as project reports.

Our core range of goals includes:

Contribute to the liberation of scholarship through the development of Open Archives services locally and nationally

Disclose the scholarship of the University

Published (peer reviewed) papers

Pre-prints, working papers and technical reports

Grey literature

Theses

Getting the mix right

Our experience at Glasgow has shown that there are two key elements required for the development and launch of successful Open Archives services:

The support, endorsement and most critically, the content produced by our academic colleagues and partners

The resources [staff, equipment, expertise] to ensure that it is developed, marketed and launched properly

Implementation

The implementation of our eprints.org archive has been a joint piece of work between the Computing Services department and the University Library. The service is hosted on a Computing Services server and was installed by Computing Services staff. The Library has been responsible for its "look and feel", administration and promotion. This split seems to have been very successful and the technical expertise which Computing Services could provide was essential in getting the archive up and running. This technical expertise has included Unix (Solaris), Perl, MySQL and Apache.

The eprints.org v.2.0 documentation notes:

"Setting up an archive is not a trivial task. The biggest problem is actually deciding what you need. I would suggest setting up a demonstration version very close to the default configuration and having the manager/committee who [will] decide on what is needed have a quick play with it and comment what should be added or removed." [5]

This was very much our experience with version 1.0x and the reason we went through a number of revisions of our archive from the early beta release to the version which is now live. The pilot service was made available to a variety of Information Services staff for comment. They were given accounts to deposit papers and provided us with our initial feedback.

The installation of the simple default version of eprints.org is straightforward but many sites want much more than this and the recent volume of traffic on the eprints-tech [6] list is testament to this.

[Figure 1: University of Glasgow ePrints Service]

Version 2.0 of the eprints.org software was released in March 2002 and has been substantially rewritten, so much so that there is no upgrade path to version 2.0 and eprints.org recommend a clean install. With our live service in place we are now looking at the implications of migrating our archive to eprints.org v.2.0 and making the transition as seamless as possible for our users. The user records and content will be imported into a new eprints.org archive which will be switched over to the same web address.

Metadata decisions

The eprints.org software is OAI compliant and will produce the necessary Dublin Core metadata for harvesting by service providers. It is, however, up to individual implementers to make decisions about document types, formats and subject schema.
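To make the idea of harvestable Dublin Core metadata concrete, the following is a minimal sketch of how a service provider might read the fields of one record. The sample record and all of its values are invented for illustration, and the namespaces are those of OAI-PMH 2.0; an archive running an earlier version of the protocol would expose different ones.

```python
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: every value below is invented for illustration.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example working paper</dc:title>
  <dc:creator>Author, A.</dc:creator>
  <dc:subject>QA Mathematics</dc:subject>
  <dc:type>Preprint</dc:type>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>
"""


def dublin_core_fields(xml_text):
    """Return a dict mapping each Dublin Core element name to its values."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        # ElementTree tags look like '{namespace}title'; split off the name.
        namespace, _, name = element.tag[1:].partition("}")
        if namespace == DC_NAMESPACE:
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


record = dublin_core_fields(SAMPLE_RECORD)
print(record["title"], record["type"], record["format"])
```

The document type, format and subject values are exactly the points at which the local decisions described in this section become visible to a harvester.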

Library of Congress Subject Headings

The most fundamental change we made to our archive was to create a subject listing based on the Library of Congress Classification Outline [7]. The pilot service subject listing was based solely on our faculties and departments and, while we felt that this would be very useful for local use, we wanted to use an established subject scheme which would provide some degree of consistency for future cross-searching. We already use Library of Congress Subject Headings (LCSH) in our library catalogue and have classification tools such as Classification Plus available, so LCSH seemed the logical choice.

The broad multi-disciplinary nature of our archive limited us, on the whole, to the main classes from A General Works to Z Bibliography. Library Science. Information Resources and the main subclasses beneath them. This was done to ensure that the listing did not become too complex or overlong. It was necessary in some instances, though, to go a bit deeper to ensure that various disciplines such as Computing Science, which is classified under Mathematics, were represented. We reviewed the University's list of departments and also added the necessary subject headings from LCSH, for example RC1200 Sports Medicine and RC0254 Oncology, both of which are part of RC Internal Medicine. It is intended that these additional sub-headings will make it easier for individuals to select an appropriate subject when they come to self-archive their paper. They will also have the opportunity to suggest additional subjects which they would like to see added.
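The shape of such a listing, with main classes, subclasses and a few deeper local additions, can be sketched as a simple parent/child table. This is an illustration only and not the eprints.org subject file format; the codes for Oncology and Sports Medicine are those given above, while the code shown for Computing Science is an assumption.

```python
# Each entry maps a subject code to (label, parent code or None).
SUBJECTS = {
    "Q": ("Science", None),
    "QA": ("Mathematics", "Q"),
    "QA76": ("Computing Science", "QA"),  # assumed code, for illustration
    "R": ("Medicine", None),
    "RC": ("Internal Medicine", "R"),
    "RC0254": ("Oncology", "RC"),
    "RC1200": ("Sports Medicine", "RC"),
}


def subject_path(code):
    """Return the headings from the main class down to the given code."""
    path = []
    while code is not None:
        label, parent = SUBJECTS[code]
        path.append(f"{code} {label}")
        code = parent
    return list(reversed(path))


def children(code):
    """Return the codes listed directly beneath the given code."""
    return sorted(c for c, (_, parent) in SUBJECTS.items() if parent == code)


print(" > ".join(subject_path("RC1200")))
print(children("RC"))
```

Keeping the parent of each heading explicit makes it straightforward to add a suggested subject later without restructuring the rest of the list.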

[Figure 2: Library of Congress Subject Headings]

Faculties and Departments

After the implementation of the LCSH listings it became apparent that it would still be very useful to provide a listing of papers deposited by Faculty and Department. The clustering of papers by department provides a ready reckoner to the number of papers an individual department/faculty deposited.

eprints.org v.1.0 can only support one subject listing (which we are using for LCSH) so we created a parallel web page of papers arranged by faculty. This had links to pre-configured searches which would list the papers deposited by faculty. All of the papers which are deposited into our archive must be associated with a faculty and we provide a drop down box of these for the user to select from. This was made a mandatory field and a paper will not be accepted for deposit unless a faculty is selected.
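The two mechanisms described here, a mandatory faculty field and a page of pre-configured searches, can be sketched as follows. The faculty names, the search address and the `faculty` query parameter are all hypothetical and do not reflect the actual eprints.org search interface.

```python
from urllib.parse import urlencode

# Hypothetical values, for illustration only.
FACULTIES = ["Arts", "Medicine", "Science"]
SEARCH_URL = "https://eprints.example.org/perl/search"


def validate_deposit(metadata):
    """Return a list of problems; an empty list means the deposit is acceptable."""
    errors = []
    faculty = metadata.get("faculty", "")
    if not faculty:
        errors.append("A faculty must be selected.")
    elif faculty not in FACULTIES:
        errors.append(f"Unknown faculty: {faculty}")
    return errors


def faculty_search_link(faculty):
    """Build the pre-configured search link used on the faculty listing page."""
    return SEARCH_URL + "?" + urlencode({"faculty": faculty})


print(validate_deposit({"title": "A paper with no faculty"}))
for name in FACULTIES:
    print(name, faculty_search_link(name))
```

Because every accepted paper carries a faculty value, each link on the parallel page returns a complete list for that faculty.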

[Figure 3: ePrints by Faculty]

eprints.org v.2.0 provides support for multiple listings so that this index will be much easier to manage.

Look and Feel

Pinfield notes that "It will be interesting to see how institutional archives begin to appear. Institutional design policies (which often promote the use of graphics and Java-script) are likely to have an influence on institutional e-print server presentation and lead to a departure from the arXiv ethos." [8]

Institutional design policies have had an impact on the "look and feel" of our archive and we have tried to ensure that it has the same consistent look and feel as our existing resources such as the catalogue. The ePrints home page reflects this and mirrors the layout and options available from the Library's Information gateway MERLIN [9]. We embedded the Title / Author / Keyword search box in the ePrints home page and, with a small piece of JavaScript, had the cursor automatically appear there to enable users to start searching immediately. The intention here is to ensure that the "look and feel" is so familiar to our users that it becomes transparent and enables them to focus on searching for the content and not re-learning the interface.

We are also exploring value-added services such as linking records for material in our catalogue (Innopac) to the full text version available in our ePrints archive. This will enable us to provide full text links, in particular to our theses, which are all listed in our catalogue.

[Figure 4: Link to ePrints from the library catalogue]

Additional information fields

The decision to offer a mix of different e-prints spanning published (peer reviewed) material to project reports and theses presented us with the issue of how best to allow users to easily identify the kind of content which was on display and its status e.g. Published, In Press or Unpublished. The archive can be searched by publication status or document type but this information was not displayed in the record screen. The flexibility of the eprints.org software enabled us to add these fields to ensure that they are displayed.

[Figure 5: Record display with status and ePrints type fields]

These are all elements which will be ported to our eprints.org v.2.0 archive.

OAI Registration and Compliance

A key component in making our service publicly available was to ensure that it was OAI compliant and registered as an OAI compliant data provider. We registered in November 2001 as "glasgow" and this has enabled us to subsequently register with individual service providers such as ARC using our OAI registered name.

This has enabled us to demonstrate the real, cross-archive searching potential of our archive to members of the University and to provide a wider research context beyond Glasgow.
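For readers unfamiliar with how a service provider talks to a registered data provider, the requests are ordinary web addresses carrying a `verb` argument. The sketch below builds two such requests without sending them; the base address is hypothetical, while `Identify`, `ListRecords` and the `oai_dc` metadata prefix are defined by the OAI protocol.

```python
from urllib.parse import urlencode

# Hypothetical base address of a data provider's OAI interface.
BASE_URL = "https://eprints.example.org/perl/oai"


def oai_request(verb, **arguments):
    """Build the address for an OAI request; nothing is sent over the network."""
    return BASE_URL + "?" + urlencode({"verb": verb, **arguments})


# Ask the archive to describe itself.
identify_url = oai_request("Identify")
# Ask for all records in unqualified Dublin Core.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```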

Self-Registration and self-archiving

At the time of writing our ePrints archive has some 30 registered users, some of whom have self-archived material in our archive and some of whom have set up e-mail alerts. A key component in moving from the Demonstrator version to the "live" version was that users should be able to self-register. We accept that in an institution as diverse as Glasgow, as in other HEIs, not all of our users will wish to self-archive their material, for a variety of reasons from time constraints to technical ability, but we felt that it was vital to provide this as an option.

Our early adopters (in disciplines such as Music, Life Sciences and Computing Science) have had little difficulty in uploading their material into the archive in a range of formats, and they have been very positive about the experience. Enabling self-registration and self-archiving has given us an insight into the administrative implications of managing our archive.

In the pilot phase, content was added only by the staff responsible for the archive and the focus was on some initial content. Now that self-archiving is available it is necessary to check daily whether content has been deposited into the submission buffer (content can only be made publicly available by the e-prints administrator). At present there is no e-mail notification when material has been deposited for review.

The ePrints Administrator will review the submitted paper and approve or reject it. They will also enhance the record as appropriate with additional subject headings, keywords or further information.
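The review workflow just described amounts to a small state machine: a deposit waits in the buffer until the administrator approves or rejects it. The sketch below models that flow only; the class and method names are invented for illustration and are not part of the eprints.org software.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    title: str
    depositor: str
    status: str = "buffer"  # one of "buffer", "public" or "rejected"
    keywords: list = field(default_factory=list)


class Archive:
    def __init__(self):
        self.submissions = []

    def deposit(self, title, depositor):
        """A self-archived paper always lands in the submission buffer first."""
        submission = Submission(title, depositor)
        self.submissions.append(submission)
        return submission

    def pending(self):
        """What the administrator sees when checking the buffer each day."""
        return [s for s in self.submissions if s.status == "buffer"]

    def review(self, submission, approve, extra_keywords=()):
        """Enhance the record, then make it public or reject it."""
        if submission.status != "buffer":
            raise ValueError("Only buffered submissions can be reviewed.")
        submission.keywords.extend(extra_keywords)
        submission.status = "public" if approve else "rejected"

    def public(self):
        return [s for s in self.submissions if s.status == "public"]


archive = Archive()
paper = archive.deposit("An example working paper", "a.author")
print(len(archive.pending()))
archive.review(paper, approve=True, extra_keywords=["e-prints"])
print([s.title for s in archive.public()])
```

Since there is no e-mail notification, the `pending` check is the step that has to be run by hand each day.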

In eprints.org v.1.0 anyone can register with the archive and then submit content. We had one instance of papers being submitted by a non-University of Glasgow member and we had to respectfully decline them. eprints.org v.2.0 will allow us to manage this much better since it will provide us with more control over who can submit papers to the archive.

We have also had some very keen colleagues suggest publisher-produced PDF files, which we cannot accept until we have had approval from the publisher. These are examples of some of the cultural and organisational issues which must be addressed as the service becomes more widely used.

Feedback from staff using the service to deposit material has been very positive and has driven the addition of new document formats such as XML, as well as ways to handle the deposit of multiple PDF files which need to be linked together. This was done by adding an intermediate HTML file to act as an index.

[Figure 6: Multiple document formats]

[Figure 7: Multiple PDF files with an HTML frontpage]
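An intermediate index page of the kind shown in Figure 7 is simple to generate. The following is a minimal sketch, assuming the PDF files sit alongside the index page and have plain file names; the function name and the sample file names are invented for illustration.

```python
from html import escape


def build_index_page(title, pdf_files):
    """Return an HTML front page linking to each (file name, label) pair."""
    items = "\n".join(
        f'      <li><a href="{escape(name, quote=True)}">{escape(label)}</a></li>'
        for name, label in pdf_files
    )
    return (
        "<html>\n"
        f"  <head><title>{escape(title)}</title></head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


page = build_index_page(
    "An example report",
    [("part1.pdf", "Part 1: Text"), ("part2.pdf", "Part 2: Figures & Tables")],
)
print(page)
```

The index page is then deposited as the main document, with the PDF files uploaded alongside it.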

The role of the Library

At the University of Glasgow, the Library is the standard bearer for the advent and implementation of e-prints archives and Open Archives services. To ensure the successful implementation of this service the Library has a number of distinct roles beyond its technical provision and maintenance. These include:

Encouraging members of the University to deposit material into the ePrints archive. At Glasgow we have started an Advocacy campaign to demonstrate that this service has a broader context beyond Glasgow [10]. A recent event to raise awareness about the issues of Scholarly Communication provided us with an opportunity to launch our e-prints service and to raise its profile

Providing advice to members of the University about copyright and journal embargo policies for material which they would like to deposit in our archive, and as appropriate liaising directly with the Journal in question. This will become a pivotal role in the acceptance of our e-prints service since copyright is the number one question which members of the University ask about

Converting material to a suitable format such as HTML or PDF for import into the archive. It may also be necessary to ensure that HTML which is submitted is properly formatted and cross-browser compatible

Depositing material directly on behalf of members of the University who do not, or cannot, self-archive their material. In instances in which we have deposited papers on behalf of individuals, we have created a new account for them and used that to submit their content. This has allowed us to take advantage of the eprints.org feature of displaying the number of papers an individual has in the archive. Individuals can then use this to embed links on their own home or departmental web page which will list their deposited papers. When the paper is submitted the individual is notified and we provide them with their account details, which they can use to set up e-mail alerts or to submit further content themselves

Reviewing the metadata of content which has been self-archived to maintain the quality of the record and to add any additional subject headings and keywords as appropriate.

Beyond e-prints: FAIR and DAEDALUS

The JISC-funded FAIR (Focus on Access to Institutional Resources) Programme was announced in January 2002 "to fund a number of projects to support access to and sharing of institutional content within Higher Education (HE) and Further Education (FE) and to allow intelligence to be gathered about the technical, organisational and cultural challenges of these processes." [11]. FAIR was inspired by the vision of the Open Archives Initiative and will fund 14 projects with partnerships across 50 institutions [12].

The University of Glasgow has been awarded funding for a three-year FAIR Project entitled DAEDALUS (Data Providers for Academic E-content and the Disclosure of Assets for Learning, Understanding and Scholarship). DAEDALUS will build upon our experience in setting up this initial e-prints archive and will establish a network of Open Archives compliant University of Glasgow Data Providers. These providers will be used to unlock the scholarly output of the University and will include:

Published (peer reviewed) papers

Pre-prints, working papers and technical reports

e-Theses

Research Resource Finding Aids (Archives and Special Collections)

Institutional Management, Policy and Regulatory documents

In addition to registering these repositories with the increasing number of service providers available, we will also set up an institutional service provider which will enable members of the University and the wider academic community to search across these resources.

DAEDALUS will explore a range of OAI-compliant software packages in addition to eprints.org and examine the technical, cultural and organisational issues involved in their implementation.

Conclusion

Philip Hunter talked about an e-Prints Revolution in the last issue of Ariadne [13] and the technical revolution is here. The OAI have just released version 2.0 of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [14] and there is an increasing range of software choices beyond eprints.org for setting up institutional archives.

The challenge, ultimately, will not be the technical implementation of an e-prints service but rather the cultural change necessary for it to become embedded and commonplace in the activities of the institution. That change will be assisted, however, by national programmes such as FAIR and international declarations such as that of the Budapest Open Access Initiative [15].

At Glasgow the development of our e-prints service has been incremental, but we have made steady progress and have been encouraged by the enthusiasm for such an archive which our early adopters have shown. We will build on this initial service and, with DAEDALUS, will implement a range of new services and, more importantly, continue to nurture an e-prints / open access culture.