
Going beyond Citations: SERUM — a new Tool Provided by a Network of Libraries

Authors:

Juan Gorraiz,

Christian Gumpenberger

Abstract

Citation metrics are a well-established means to assess the impact of scholarly output. With the growing availability of e-journals, usage metrics have become an interesting alternative to citation metrics; they allow for viewing scholarly communication from the user's perspective. Usage metrics offer several advantages that have the potential of enhancing existing evaluation criteria for scholarly journals. This paper suggests an approach to providing global usage metrics which is supported by libraries. The goal is to provide an analytical tool called Standardized Electronic Resource Usage Metrics (SERUM) which is comparable to the Journal Citation Reports (JCR), but which makes use of download data instead of citation data. Global download data would be obtained from the publishers, assuming they are willing to contribute in order to benefit from newly established evaluation criteria for periodicals beyond the Journal Impact Factor (JIF) and consequential strengthening of their products. An international network of libraries with a sound disciplinary coverage will be established in order to obtain, manage and check the authenticity of the data delivered by the publishers. The network will act as a clearing centre operated by independent information specialists to guarantee data integrity as well as data curation according to a standardised format. Furthermore, the network's internationally distributed members should also track and manage local usage data, reflect local trends, and relate these to the global publishers' data. In addition, a regularly updated version with specific basic usage metrics and journal rankings will be offered.

Key Words

SERUM; usage metrics; scholarly journals

Introduction: Citation and Usage Metrics — An Overview

In 1927 the Gross brothers introduced a system in which the total number of citations (excluding journal self-citations)
is used to identify the most relevant journals in a research area (Gross and Gross, 1927). For the first time citations were used as a collection management instrument. Before that time, library acquisitions departments
relied mainly on usage data. But tracking physical usage was a cumbersome and difficult task. The data obtained in this way
were usually combined with information obtained from user surveys, although these surveys were always perceived as time-consuming
as well as annoying for the library users. Document demand patterns apparent from document delivery services were also used
for decision making in academic librarianship; the relationship between the most requested and the most cited journals was
studied, for example, by Schloegl and Gorraiz (2006).

The introduction of the Science Citation Index by Garfield in 1964 laid the foundation for a whole new discipline called
scientometrics. In 1972 Garfield published his article ‘Citation analysis as a tool in journal evaluation’ in Science, which attracted a great deal of attention from journal editors (Garfield, 1972). Three years later Garfield launched the first edition of the ‘Journal Citation Reports’, offering global journal rankings.
This tool, initially called ‘a bibliometric analysis of science journals in the ISI database’, introduced the following citation
metrics:

Total cites: as a measure for the citation quantity

Impact factor: as a measure for the average citation frequency

Immediacy index: as a measure for the citation speed

Cited/citing half-life: as a measure for the ageing characteristics of a subject field.

Meanwhile, citation metrics have become well established (especially in Science, Technology and Medicine) and their benefits
are clearly promoted by Thomson Reuters' JCR for the different target groups:

Librarians can manage and maintain journal collections and budget for subscriptions (…).

Publishers can monitor their competitors, identify new publishing opportunities, and make decisions regarding current publications.

Editors can assess the effectiveness of editorial policies and objectives and track the standing of their journals.

Authors can identify journals in which to publish, confirm the status of journals in which they have published, and identify journals
relevant to their research.

Information analysts (science policy makers) can track bibliometric trends, study the sociology of scholarly and technical publications, and analyse
citation patterns within and between disciplines.

The importance and necessity of analytical citation tools like JCR is supported by the launch of alternative products like
SCImago Journal & Country Rank and CWTS Journal Indicators, both attempting to compensate for the shortcomings of JCR or to include new features.

The growing availability of e-journals following the advent of the internet, together with their increasing acceptance, led to a rapid
change in user preferences, especially between 2001 and 2006, as reported in several studies (Kraemer, 2006; Schloegl and Gorraiz, 2010). As a result, usage metrics were reintroduced as an interesting alternative to citation metrics. Compared to the print
era, data collection is now much easier and faster, and usage metrics allow for viewing scholarly communication from the user's
perspective.

This paper suggests an approach to providing global usage metrics which is supported by libraries. The goal is to provide
an analytical tool called Standardized Electronic Resource Usage Metrics (SERUM) which is comparable to the Journal Citation
Reports (JCR), but which makes use of download data instead of citation data.

Usage versus Citations

Citation metrics have a few well-known disadvantages. The most important ones are:

SERUM (Standardized Electronic Resource Usage Metrics) is an initiative which was suggested in response to the lack of globally
available usage metrics for scholarly communication channels (primarily e-journals). Usage metrics definitely add a new dimension
to existing quality criteria for scholarly journals, which are currently — apart from citation metrics — restricted to peer
review, timeliness, editorial excellence, language and bibliographic information. As scholarly journals are still perceived
as the most important scholarly communication channel, SERUM's initial focus is on the journal level.

The idea of SERUM is to combine the following three major features:

providing access to consolidated global usage data of e-journals;

establishing an international network of libraries that relates local usage trends to global usage data provided by the publishing
houses;

offering services like ‘usable’ usage metrics and journal rankings.

In order to pave the way for SERUM it will be necessary to win over the publishers. Experience has shown that the big traditional
publishers are more hesitant to cooperate than open access publishers. But both categories of publishers have a need to gather
information about the usage of their products and track changes on the demand side of the market. Therefore publishers should
be highly interested in the project since:

they would benefit from newly established evaluation criteria for periodicals beyond the JIF, and

a well-used initiative like SERUM would mean consequential strengthening of their products.

How is SERUM to work? Each publisher is invited to deliver usage data from as many titles per subject field as possible. The
focus should of course be on top titles in terms of usage. From these titles, a network of libraries (see below) selects the
titles to be used in SERUM and categorises them using the most appropriate classification scheme.

The following data would be required (all at journal level) from the publishers:

total number of downloadable items per journal

number of downloadable items disaggregated (by document types) per journal

total download counts (full-text article requests — FTAs) for the current year (independent from the publication years of
the journal)

download counts (full-text article requests — FTAs) for the current year with explicit listing of downloads per publication
year of the journal for the last 5 years

distinction of downloads according to their origin (grouping by identical IP addresses without disclosing them) ↔ total view
and geographical grouping (according to geographical distribution of the SERUM library network)
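The required journal-level data listed above could be held in a simple record before any metrics are derived. The following Python sketch is only an illustration: the class `JournalUsageRecord`, its field names and the checks in `consistency_problems` are assumptions made here and not part of any agreed SERUM format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JournalUsageRecord:
    """Hypothetical record of the required data for one journal and one reporting year."""
    journal_id: str
    reporting_year: int
    total_items: int                       # total number of downloadable items
    items_by_doc_type: Dict[str, int]      # downloadable items per document type
    total_downloads: int                   # all full-text article requests in the reporting year
    downloads_by_pub_year: Dict[int, int]  # reporting-year downloads split by publication year
    downloads_by_region: Dict[str, int] = field(default_factory=dict)  # geographical grouping


def consistency_problems(rec: JournalUsageRecord) -> List[str]:
    """Return a list of plain-text problems; an empty list means no problem was found."""
    problems: List[str] = []
    # Document-type counts are expected to partition the downloadable items.
    if sum(rec.items_by_doc_type.values()) != rec.total_items:
        problems.append("document-type counts do not add up to total_items")
    # The per-publication-year listing covers only recent years, so it may be
    # smaller than the total but should never exceed it.
    if sum(rec.downloads_by_pub_year.values()) > rec.total_downloads:
        problems.append("downloads by publication year exceed total_downloads")
    # Regional counts are optional here; if present they should not exceed the total.
    if rec.downloads_by_region and sum(rec.downloads_by_region.values()) > rec.total_downloads:
        problems.append("downloads by region exceed total_downloads")
    return problems
```

Checks of this kind are one simple way in which a clearing centre might screen delivered data before accepting it; they do not replace the authenticity checks described for the library network.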

The following data could be provided optionally (all at journal level):

information about the distribution of downloads (how many articles (%) accumulate 75% of the total number of downloads, quantiles
or quartiles)

total download counts (full-text article requests — FTAs) of the retrospective 1–5 years

distinction in regard to merits of the ‘downloads’: academic vs. non-academic, etc.
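The optional distribution information can be derived from article-level download counts, if a publisher has them. The sketch below, in Python, computes the percentage of articles that together accumulate 75% of a journal's downloads and, as a related parameter, the percentage of articles never downloaded. The function names and the sample counts are invented for illustration.

```python
from typing import Sequence


def percent_articles_for_share(article_downloads: Sequence[int], share: float = 0.75) -> float:
    """Percentage of articles (most downloaded first) needed to reach `share` of all downloads."""
    total = sum(article_downloads)
    if total == 0:
        raise ValueError("no downloads recorded")
    cumulative = 0
    ranked = sorted(article_downloads, reverse=True)
    for count, downloads in enumerate(ranked, start=1):
        cumulative += downloads
        if cumulative >= share * total:
            return 100.0 * count / len(ranked)
    return 100.0  # only reachable through rounding effects


def percent_non_downloaded(article_downloads: Sequence[int]) -> float:
    """Percentage of articles with zero downloads in the reporting year."""
    if not article_downloads:
        raise ValueError("no articles given")
    zeros = sum(1 for downloads in article_downloads if downloads == 0)
    return 100.0 * zeros / len(article_downloads)


# Invented sample: ten articles of one journal, downloads in the reporting year.
sample = [500, 250, 100, 50, 40, 30, 20, 10, 0, 0]
print(percent_articles_for_share(sample))  # 20.0: two of ten articles reach 75%
print(percent_non_downloaded(sample))      # 20.0: two of ten articles were never downloaded
```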

The data obtained would be consolidated and managed by the network of libraries and made accessible via a cooperative website
offering further usage metrics and services based on these data (see below). This website will be primarily designed for journal
analysis, but can of course be extended to serials and e-books later on. The outcome is a new instrument for the evaluation
of electronic resources going beyond citations.

ad 2. Implementation of a Network of Academic Libraries

Single publishers are obviously not in the right position to deliver and manage overall global usage data obtained from multiple
publishing houses. Moreover, suggested metrics and services should be provided by an independent, non-biased institution,
as self-beneficial data manipulation by single publishers cannot be excluded. Therefore an international network should be
established consisting of academic libraries with a sound disciplinary coverage and a significant geographical representation.
This network would be responsible for fulfilling the following tasks:

implementation of the selection procedure for considered journal titles

categorisation of journals

management and regular update of global usage data either harvested from or delivered by publishers

SERUM intends to develop a novel approach to usage metrics. Because the majority of downloads are made in the current and
preceding year (Schloegl and Gorraiz, 2010), the classical immediacy index and the usage impact factor (UIF) seem to render less relevant results. Instead we suggest:

The journal usage factor (JUF) takes into account the reference year and the preceding two years. The JUF is defined as the number of downloads from
journal items published in the current year and the previous two years divided by the number of items published in these three
years. A three-year time window assures that a very significant number of downloads is covered in most cases.

The download immediacy (DI) will be measured as the percentage of articles downloaded in the current year in relation to the total amount of downloads.

The download half-life (DHL) is defined by the number of publication years from the current year that account for 50% of the downloads of the journal.
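A short Python sketch may make the three definitions above more concrete. It rests on assumptions that go beyond the text: DI is read as the share of the current year's downloads that go to items published in the current year, DHL is returned in whole publication years without interpolation, and all function and parameter names are chosen here for illustration only.

```python
from typing import Dict, Optional


def journal_usage_factor(downloads_by_pub_year: Dict[int, int],
                         items_by_pub_year: Dict[int, int],
                         current_year: int) -> float:
    """JUF: downloads to items of the current and two preceding years, per item published."""
    window = (current_year, current_year - 1, current_year - 2)
    downloads = sum(downloads_by_pub_year.get(year, 0) for year in window)
    items = sum(items_by_pub_year.get(year, 0) for year in window)
    if items == 0:
        raise ValueError("no items published in the three-year window")
    return downloads / items


def download_immediacy(downloads_by_pub_year: Dict[int, int],
                       total_downloads: int,
                       current_year: int) -> float:
    """DI (assumed reading): percentage of all downloads going to current-year items."""
    if total_downloads == 0:
        raise ValueError("no downloads recorded")
    return 100.0 * downloads_by_pub_year.get(current_year, 0) / total_downloads


def download_half_life(downloads_by_pub_year: Dict[int, int],
                       total_downloads: int,
                       current_year: int) -> Optional[int]:
    """DHL: publication years, counted back from the current year, that reach 50% of downloads."""
    if not downloads_by_pub_year:
        return None
    cumulative = 0
    oldest = min(downloads_by_pub_year)
    for n, year in enumerate(range(current_year, oldest - 1, -1), start=1):
        cumulative += downloads_by_pub_year.get(year, 0)
        if 2 * cumulative >= total_downloads:
            return n
    return None  # the listed publication years do not account for 50%


# Invented figures for one journal in the reporting year 2010.
downloads = {2010: 300, 2009: 400, 2008: 200, 2007: 100}
items = {2010: 50, 2009: 60, 2008: 40}
print(journal_usage_factor(downloads, items, 2010))  # 900 / 150 = 6.0
print(download_immediacy(downloads, 1000, 2010))     # 30.0
print(download_half_life(downloads, 1000, 2010))     # 2
```

Returning `None` when 50% is not reached reflects the fact that publishers would list downloads per publication year only for a limited number of years.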

Further characteristics of the anticipated metrics in SERUM include:

Only downloads (full-text article requests — FTAs) will be considered, no visits or hits.

SERUM will operate at journal level and not at article level. SERUM will be designed as an instrument to reflect the usage
characteristics of a journal as an entity. The four basic indicators are DT (total number of downloads), JUF, DI and DHL.

SERUM aims to disaggregate on the basis of document type (articles, review articles, proceedings papers, letters, notes and
others); in other words, calculating JUF, DI, DHL for each document type.

If possible, SERUM also aims to disaggregate on the basis of treatment or content (e.g., theoretical, methodological and experimental)
as well as in regard to the merits of the downloads (academic vs. non-academic).

SERUM's usage metrics will work at synchronic level (= downloads are tracked from one fixed year of documents issued in two
or more publication years) as well as at diachronic level (= downloads are tracked from two or more years of documents issued
in a fixed publication year). Also, all indicators will be calculated for the current and the previous two years, and the
results will be compared.

Since downloads at article level in a given journal are hypothesised to follow a skewed distribution, we are also interested
in other distribution parameters, like the percentage of ‘non-downloads’, quantiles or quartiles.

All indicators will be calculated as weighted indicators on the basis of unique URLs (the origin of downloads) in order to
detect data manipulation.

Time lines and graphs will be provided for the most relevant indicators.

Increase/decrease rates will be analysed.
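The difference between the synchronic and the diachronic view, and a simple increase/decrease rate, can be illustrated with a small Python sketch. It assumes, purely for illustration, that downloads are stored in a mapping keyed by (usage year, publication year); the function names and all figures are invented.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical layout: (usage_year, publication_year) -> download count.
DownloadMatrix = Dict[Tuple[int, int], int]


def synchronic_downloads(matrix: DownloadMatrix, usage_year: int,
                         pub_years: Iterable[int]) -> int:
    """Downloads made in one fixed year to documents from several publication years."""
    return sum(matrix.get((usage_year, pub_year), 0) for pub_year in pub_years)


def diachronic_downloads(matrix: DownloadMatrix, pub_year: int,
                         usage_years: Iterable[int]) -> int:
    """Downloads made over several years to documents from one fixed publication year."""
    return sum(matrix.get((usage_year, pub_year), 0) for usage_year in usage_years)


def percent_change(previous: float, current: float) -> float:
    """Increase (positive) or decrease (negative) rate in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return 100.0 * (current - previous) / previous


# Invented figures for one journal.
matrix = {
    (2008, 2008): 100,
    (2009, 2008): 150, (2009, 2009): 120,
    (2010, 2008): 80, (2010, 2009): 200, (2010, 2010): 90,
}
years = [2008, 2009, 2010]
print(synchronic_downloads(matrix, 2010, years))  # 80 + 200 + 90 = 370
print(diachronic_downloads(matrix, 2008, years))  # 100 + 150 + 80 = 330
print(percent_change(200, 250))                   # 25.0
```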

Once SERUM has been operational for a period of time, the metrics will be corrected in order to reflect usage practices and
traditions in the different fields and disciplines. After establishing metrics for e-journals, corresponding metrics will
be provided for e-books as the next logical step.

SERUM in the Context of Initiatives to Date

With the advent of the internet and the increased usage of e-journals, users of scholarly information migrated to the digital
environment. As a consequence, ‘publishers have inherited all the knowledge of the user the librarian once had’ (Jamali et al., 2005). Libraries now depend on the usage data provided by the publishers.

Gathering local usage statistics is a cumbersome activity due to the technical requirements involved. Nevertheless the idea
of comparing locally collected usage data to vendor-provided statistics was born seven years ago (Duy and Vaughan, 2003) and picked up by Coombs (2005). SERUM aims to revive this idea and build on the experience gained from previous studies as well as benefit from ongoing
harmonisation and improved provision of vendor statistics as a result of COUNTER and SUSHI. Like MESUR, SERUM also aims to obtain usage data from the publishers in order to derive usage metrics from these. However, unlike
MESUR, SERUM aims neither to create a complex semantic model of the scholarly communication process nor to
produce science maps (Bollen et al., 2009) on the basis of the established reference data set. The maps and ranking services available from ‘MESUR: science maps and rankings from large-scale usage data’ are highly complex and only insightful for a minority of experts.

strengthening the role of academic libraries by implementing the network with the suggested scope of duties

introducing new aspects in the evaluation of journals for all mentioned target groups (librarians, publishers, editors, authors,
information analysts)

separating usage from citations

focusing on simplicity and usability.

Suggested Approach for SERUM Pilot

Before all else, it is crucial to win over a few publishers. Large traditional publishers have already been approached on
the subject and have either not responded or have refused their cooperation. As a consequence it was decided to address open
access publishers instead to get the process started. Two large and one small OA publishers have responded positively and
are willing to contribute to the SERUM project. The next step will be to share and discuss the data requirements and the feasibility
of delivery.

Also, the idea of SERUM will be promoted amongst academic libraries (e.g. at conferences like LIBER) in order to identify
potential partners for the network of libraries. Apart from the University Library of the Vienna University at least two further
partners would be required for the pilot. Depending on the international reaction the requirements specification will be elaborated
and a project plan designed.

Conclusions

Quite a few metrics initiatives have focused on usage data. However, so far none of them combined all the major features of
SERUM, i.e., access to consolidated global usage data of e-journals; implementing an international network of libraries that
relates local usage trends to global usage data provided by the publishing houses; offering services like ‘usable’ usage metrics
and journal rankings.

Download statistics are not only relevant and necessary at the article level. The main scholarly communication channels themselves
need to be measured, analysed and evaluated as entities (focus on journals, but also on monographs, etc.). Obviously, the
prestige and eminence of journals is of major interest to many stakeholders in the process (librarians, publishers, editors,
authors and information analysts).

Citations alone show only a part of the whole picture and are insufficient especially in the social sciences and in the arts
& humanities. This is demonstrated by ERIH (European Reference Index for the Humanities), a unique peer-review project in this discipline that aims to compensate for
the unsatisfactory coverage of European humanities research in existing citation indices (AHCI, SSCI). However, this peer-review
project is subject to continual revision and development, and support is highly expensive. Downloads would offer a more feasible
alternative, but usable download metrics and services for a broad audience are not yet available (MESUR has a different scope).

At this point in time SERUM is only a ‘blue sky project’. Nevertheless its benefits are obvious and easy to summarise:

Moed, H.F. (2005): ‘Statistical relationships between downloads and citations at the level of individual documents within
a single journal’, Journal of the American Society for Information Science and Technology, 56(10), 1088–1097.