Editor's Summary

The experience of Altmetric LLP, an altmetric tool developer, reveals common issues that demand attention when designing alternative metrics for response to scholarly writings. Identifying what can and should be measured for different user groups is fundamental. A default is to count all relevant mentions in a set of online sources, permitting drill down for more qualitative information. Data source selection varies by need, ranging from government documents to social media comment sites. Since the topic of discussion can be elusive, a tracking method must point backward to original articles or data. Text mining helps for text documents, but audio and video are less workable. Multiple versions of a single article and subsections of books and datasets add ambiguity and redundancy. Valid interpretation depends on context and the relevance and timeliness of data and sources, requiring continual reassessment.

Keywords

altmetrics
selection
relevance
redundancy

Bulletin, April/May 2013

Five Challenges in Altmetrics: A Toolmaker’s Perspective

by Jean Liu and Euan Adie

Driven by the development of new tools for measuring scholarly attention, altmetrics constitute a burgeoning new area of information science. It is an exciting time to be involved in the field since there are so many opportunities to contribute in innovative ways.

We develop altmetrics tools and related services at Altmetric LLP, a small London-based start-up founded in 2011 [1]. Like all developers of new altmetrics tools, we frequently encounter challenges in defining what should be measured, accurately collecting attention from disparate sources and making sense of the huge amount of compiled data. We outline five of these challenges in this piece, illustrating them with examples from our experience.

It is worth noting that the altmetrics community as a whole comes together regularly to discuss these and other issues, with two open workshops held in 2012 and more planned for the future.

1. What can and should we measure?
The term altmetrics is often used loosely to refer to all non-traditional measures of re-use, engagement and impact, though emphasis is usually placed on the last of these. However, impact is a multi-faceted concept [2], and different audiences have their own views of what kind of impact matters and the context in which it should be presented: researchers may care about whether they are influencing their peers, funders may care about re-use or public engagement and universities may wish to compare their performance with competing institutions. Accordingly, altmetrics data and methodologies are inevitably used in a variety of different ways to suit a variety of different purposes. This situation is arguably how it should be, with each interested community deciding for itself the kinds of impact or engagement it wants to track and then cherry-picking from the available tools and data.

At Altmetric, we currently only offer one new off-the-shelf metric: we try to sum up the online attention surrounding a journal article by automatically counting all the relevant mentions from a set of online sources (covering mainstream news outlets, social media and more). We then use these counts, along with the relative influence of each source, to create an aggregate metric, called the Altmetric score [3].
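
The idea of combining per-source counts with the relative influence of each source can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual Altmetric scoring method: the weights in SOURCE_WEIGHTS, the function name attention_score and the example URLs are all hypothetical.

```python
from collections import Counter

# Hypothetical per-source weights; the real Altmetric weighting is not
# described in this article.
SOURCE_WEIGHTS = {"news": 8.0, "blog": 5.0, "twitter": 1.0, "facebook": 0.25}


def attention_score(mentions, weights=SOURCE_WEIGHTS):
    """Return (weighted score, per-source counts) for a list of mentions.

    Each mention is a dict with at least a "source" key. Sources without a
    configured weight contribute nothing to the score but are still counted,
    so the underlying data stays available for drill-down.
    """
    counts = Counter(m["source"] for m in mentions)
    score = sum(weights.get(source, 0.0) * n for source, n in counts.items())
    return score, dict(counts)


mentions = [
    {"source": "news", "url": "https://example.org/story"},
    {"source": "twitter", "url": "https://example.org/tweet/1"},
    {"source": "twitter", "url": "https://example.org/tweet/2"},
]
score, breakdown = attention_score(mentions)
print(score, breakdown)
```

Returning the per-source breakdown alongside the single number keeps the aggregate traceable to the mentions that produced it.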

The Altmetric score is only one possible, subjective measure of online attention, and ultimately any such measure is only as good as the data upon which it is based. We encourage users to drill down into the underlying data wherever possible, and to this end, we keep a clear audit trail for any activity that has contributed to the score. All of the relevant tweets, posts and other types of mention may be viewed directly. Users are free to perform their own quantitative analyses of the data or even create new metrics and tools that are suitable to measure the specific kinds of impact that they are interested in.

Having metrics in the name suggests that altmetrics is a purely quantitative affair, but this perception isn’t necessarily accurate. Arguably, the current crop of tools is best used during qualitative assessment. By looking at the underlying data, one may take relevant material into account when assessing a piece of work.

2. What sources of data should be used?
Where the underlying data of altmetrics should come from is another key challenge. Typically, different data sources are required to measure different types of impact. For example, to measure impact on policy, you may need to look at government documents. Or to look at how work has influenced practitioners, you may need to monitor the online communities in which they congregate. To see how successful public outreach has been, you may want to look at Twitter and Facebook.

Each of the currently available altmetrics tools (discussed elsewhere in this issue) measures a different, though overlapping, set of sources. This diversity is partly attributable to practical considerations, as each data source has different licensing terms, collection issues and risks associated with it. It is also partly because deciding the usefulness of any one data source remains a fairly subjective process at this point.

Further complexity is added by the fact that online attention from one data source can often be measured in many different ways. For example, quantification of the mentions of scholarly articles on Facebook could take into account either all or just public wall posts, and these posts might be further parsed into the number of wall posts with an article mention or the number of “likes” and comments on that wall post. Each number emphasizes something different and thus paints a slightly different picture of engagement with an article on Facebook.

To make it easy to mix and match data from different altmetrics tools, common standards are required; however, so far, developing these standards has taken a back seat to developing the actual tools themselves.

3. How can we identify what research outputs are being discussed?
Once data sources have been identified, an altmetrics tool must be able to map the constituent attention to specific research outputs. Current tools, ours included, typically track attention through links to articles or artifacts such as datasets and presentation slides, resolving these links to unique identifiers like a DOI, PubMed ID or Handle.
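
As a rough illustration of resolving a link to an identifier, the sketch below pulls a DOI out of a URL with a regular expression. The pattern and the function name extract_doi are assumptions made for this example; real DOIs can contain characters that this simplified pattern excludes, and production tools also need to follow redirects and handle publisher-specific URL schemes.

```python
import re
from urllib.parse import unquote

# Simplified DOI pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant prefix, a slash, then a suffix that stops at whitespace, quotes,
# angle brackets, "?" or "#".
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>?#]+")


def extract_doi(url):
    """Return a lower-cased DOI found in the URL, or None if there is none."""
    match = DOI_PATTERN.search(unquote(url))
    if match is None:
        return None
    # Strip punctuation that commonly trails a link pasted into running text.
    return match.group(0).rstrip(".,;)").lower()


print(extract_doi("https://doi.org/10.1000/ABC.123"))
print(extract_doi("https://example.org/post/42"))
```

Lower-casing is safe here because DOIs are case-insensitive, which also makes them convenient keys for grouping mentions.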

A pressing day-to-day issue stems from this reliance on links. Although most tweeters, science bloggers and digitally native media outlets diligently include direct links to the journal articles they discuss, traditional news outlets have no such standard practice. As a result, a large number of science, health and technology news reports fail to include links to the research that they mention. At Altmetric, we have circumvented this particular issue by developing a text-mining mechanism that analyzes the content of news articles. This “news tracker” retrieves relevant keywords like journal titles and author names from the text, performs a search in literature databases and then matches journal articles probabilistically with their associated news coverage.
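
The general shape of such a matcher can be sketched as follows. This is not the news tracker itself: it is a minimal heuristic, assuming that candidate articles have already been retrieved from a literature database, and it uses a simple evidence score rather than a true probability. The names match_score and best_match, the equal weighting of journal and author evidence, and the 0.6 threshold are all hypothetical.

```python
def match_score(news_text, article):
    """Score in [0, 1]: half for a journal title hit, half for author hits."""
    text = news_text.lower()
    journal_hit = 1.0 if article["journal"].lower() in text else 0.0
    authors = article["authors"]
    author_hits = sum(1 for name in authors if name.lower() in text)
    author_fraction = author_hits / len(authors) if authors else 0.0
    return 0.5 * journal_hit + 0.5 * author_fraction


def best_match(news_text, candidates, threshold=0.6):
    """Return the highest-scoring candidate, or None if below the threshold."""
    scored = [(match_score(news_text, c), c) for c in candidates]
    score, article = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return article if score >= threshold else None


candidates = [
    {"id": "a1", "journal": "Journal of Examples", "authors": ["Smith", "Jones"]},
    {"id": "a2", "journal": "Other Review", "authors": ["Lee"]},
]
story = "A study in the Journal of Examples by Smith found a surprising effect."
print(best_match(story, candidates))
```

The threshold trades missed coverage against false matches; a tool that keeps an audit trail can also store the score so that doubtful matches can be reviewed.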

Text-mining technology might have solved some of our own product’s issues around accurately tracking the news, but identifying research output mentions within online multimedia sources has proven to be more challenging. In podcasts and videos, direct links to research outputs are only very occasionally included in an item’s metadata. References to research tend to be made verbally, and altmetrics tools lack the capabilities and resources for analyzing audio and video content to determine what has been mentioned.

There are also understandable concerns that altmetrics may be gamed or artificially increased, either deliberately by authors engaging in excessive self-promotion or inadvertently by spammers. Right now such gaming of the system is rare and simple to spot algorithmically; in the case of Twitter spam, where hundreds of fake accounts will suddenly engage in meaningless, random retweets, all of the accounts are quite new, follow each other and have never mentioned a scholarly article before.
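
The three signals described for Twitter spam (new accounts, mutual following, no history of scholarly mentions) translate naturally into a rule-based check. The sketch below is one possible reading, not a description of any deployed filter; the Account fields, the function name looks_like_spam_burst and both thresholds are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    age_days: int
    prior_scholarly_mentions: int
    follows: set = field(default_factory=set)


def looks_like_spam_burst(accounts, max_age_days=30, min_mutual_fraction=0.5):
    """Flag a group of accounts that mentioned the same article.

    True only if every account is new, none has mentioned a scholarly
    article before, and at least min_mutual_fraction of the account pairs
    follow each other.
    """
    by_name = {a.name: a for a in accounts}
    if len(by_name) < 2:
        return False
    all_new = all(a.age_days <= max_age_days for a in by_name.values())
    no_history = all(a.prior_scholarly_mentions == 0 for a in by_name.values())
    names = list(by_name)
    pairs = mutual = 0
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            pairs += 1
            if y in by_name[x].follows and x in by_name[y].follows:
                mutual += 1
    return all_new and no_history and mutual / pairs >= min_mutual_fraction


ring = [
    Account("a", 3, 0, {"b", "c"}),
    Account("b", 5, 0, {"a", "c"}),
    Account("c", 2, 0, {"a", "b"}),
]
print(looks_like_spam_burst(ring))
```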

In the future, more sophisticated methods of detection will certainly be required. Here, advice from experienced groups like SSRN (Social Science Research Network) and COUNTER (Counting ONline Usage of NeTworked Electronic Resources), both of whom regularly deal with these issues as they relate to download statistics, may be invaluable.

4. You say tomato, I say tomahto
Along with the issue of missing links to papers, an opposing problem exists: sometimes different versions of the same article will appear online on multiple sites and with different identifiers. For example, the PubMed Central version of an article may have only a PubMed Central ID and the original article on the publisher’s website only a DOI, with no simple way of reconciling the two.

This scatter dilutes the altmetrics for the article, as it is split among different versions, but end-users rarely care for the distinction. It is therefore necessary for altmetrics tools to maintain mappings between different sets of identifiers or to try to automatically match bibliographic metadata to known articles in literature databases. We do both of these things at Altmetric, although items sometimes still slip through the cracks.
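
Maintaining a mapping between identifier sets amounts to choosing one canonical identifier per article and folding counts for all known aliases into it. A minimal sketch, assuming a hand-built alias table (the identifiers and the names ID_ALIASES, canonical_id and merge_mentions are made up for illustration):

```python
from collections import defaultdict

# Hypothetical alias table: each known alternate identifier maps to the
# identifier chosen as canonical for that article.
ID_ALIASES = {"pmcid:PMC0000001": "doi:10.1000/example.1"}


def canonical_id(identifier, aliases=ID_ALIASES):
    """Return the canonical identifier, or the input if no alias is known."""
    return aliases.get(identifier, identifier)


def merge_mentions(mentions_by_id, aliases=ID_ALIASES):
    """Sum mention counts across all identifiers of the same article."""
    merged = defaultdict(int)
    for identifier, count in mentions_by_id.items():
        merged[canonical_id(identifier, aliases)] += count
    return dict(merged)


raw_counts = {
    "pmcid:PMC0000001": 4,
    "doi:10.1000/example.1": 6,
    "doi:10.1000/other": 1,
}
print(merge_mentions(raw_counts))
```

Identifiers with no known alias pass through unchanged, which is exactly how items "slip through the cracks" when a mapping is missing.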

A slightly more complicated case is that of datasets, book chapters or other items that are related to a parent book or article. Should attention paid to a dataset be reflected in the altmetrics of the journal article describing it? What if the article is cited extensively, but was written by somebody who was not the dataset’s creator? This kind of scenario already occurs, with research data deposited in sites like figshare.com and Dryad getting their own DOIs. As such, this complication calls for flexibility from altmetrics tools.

5. How do we interpret the data?
The number of scholars who regularly discuss research using social media and/or blogs has been increasing [4], which in turn means that the number of article mentions seen by Altmetric has also been on the rise. Since launching in July 2011, we’ve collected attention and Twitter demographic information for well over one million unique articles. Already, we have an abundance of data, which will be invaluable for determining trends in the use of particular communication channels over time. As technologies progress and the landscapes of scholarly communication and publishing change, developers of altmetrics tools need to be mindful of how relevant the collected metrics are. It is potentially dangerous to create and encourage adoption of metrics based on sources that might unpredictably cease to be relevant in the future. In other words, what is considered “significant” attention according to a specific measure today (for example, number of times pinned on Pinterest) may become much less meaningful in a few years. How should we account for this?

One approach is to always put any such metric into context. There are many potential ways to do this: we benchmark the Altmetric score (see above) based on other articles within the same journal and from the same time period, as well as across the whole database. As an example, the most popular article in the Altmetric database received an incredible amount of online attention relative to other items appearing in the same journal (Canadian Medical Association Journal). Accordingly, the article-level metrics page included a context statement, indicating that the article’s Altmetric score “is one of the highest ever scores in this journal (ranked #1 of 940).” ImpactStory, too, benchmarks the numbers it presents, by displaying percentiles calculated from large representative samples of articles in Web of Science.
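
Both kinds of context statement, a rank within a journal and a percentile within a reference sample, reduce to simple comparisons against a group of scores. The following sketch shows one way to compute them; the function names and the sample scores are hypothetical, and the tie-handling choices are assumptions rather than the conventions of any particular tool.

```python
def rank_in_group(score, group_scores):
    """1-based rank of score within the group; tied scores share the best rank."""
    return 1 + sum(1 for s in group_scores if s > score)


def percentile_in_group(score, group_scores):
    """Percentage of scores in the group that are less than or equal to score."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100.0 * at_or_below / len(group_scores)


# Hypothetical scores for articles in one journal over one time period.
journal_scores = [3, 12, 45, 7, 1]
print(f"ranked #{rank_in_group(45, journal_scores)} of {len(journal_scores)}")
print(percentile_in_group(7, journal_scores))
```

The same two functions work for any grouping, so the comparison set (journal, time period, whole database) becomes a parameter rather than a fixed design decision.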

With no gold standard of attention to refer to, optimizing thresholds (what’s a “good” level of engagement?) and benchmarks is a big challenge for making sense of altmetrics. Establishing context by comparing article-level attention within journals or against a set of other articles is a good way to start tackling the issue of potentially changing metrics, but context could be further enriched by making comparisons across articles by the same research group or even across articles of the entire discipline. Arguably, the latter comparison would be most useful. A particle physics article that is popular among a small audience of specialists could have a lower Altmetric score than an average molecular genetics article that is being discussed by geneticists and members of the general public (a broad audience), and so putting the attention in perspective would be valuable.

Academics from some disciplines may prefer to use certain communication channels over others; for instance, we see more chemists than expected actively participating in academically and professionally oriented discussions on LinkedIn. Moreover, certain disciplines, notably medicine, receive a disproportionately high volume of attention in the mainstream media, and thus, online discussions of these subject areas might include numerous non-specialist participants. Ideally, various discipline-specific norms or trends could be compiled into indicators of the typical level of attention for a particular field. Readers can then interpret the quantitative altmetrics scores in light of this typical level of attention.
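
One simple candidate for such an indicator is the median score of articles in a field, with each article's score then expressed relative to that baseline. The sketch below assumes this median-ratio approach purely for illustration; the function names and the sample data are hypothetical, and other baselines (means, percentiles, time-windowed samples) would be equally defensible.

```python
from statistics import median


def field_baselines(scores_by_field):
    """Map each field to the median score of its articles (the 'typical' level)."""
    return {f: median(s) for f, s in scores_by_field.items() if s}


def relative_attention(score, field_name, baselines):
    """Score divided by the field's typical level, or None if no usable baseline."""
    baseline = baselines.get(field_name)
    if not baseline:
        return None
    return score / baseline


# Hypothetical samples: genetics articles attract far more raw attention.
samples = {
    "particle physics": [1, 2, 3],
    "molecular genetics": [10, 20, 30],
}
baselines = field_baselines(samples)
print(relative_attention(6, "particle physics", baselines))
print(relative_attention(30, "molecular genetics", baselines))
```

In this made-up data, a physics article with a raw score of 6 stands at three times its field's typical level, while a genetics article with a raw score of 30 stands at only one and a half times its own.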

However context ends up being defined, detailed records of an article’s performance (in relation to others within a similar grouping) will remain informative, even if certain metrics disappear in the future. The challenge, therefore, is to create robust, informative standards of context that can withstand minor changes in technology and online scholarly communication. Much more research on the usage of particular publishing platforms and social media networks is needed in order to construct and refine typical threshold levels of attention according to specific groupings.

Concluding Thoughts
When developing altmetrics tools, a number of important considerations must be weighed with regard to defining metrics, improving measurement capabilities and providing contextual details for present and future data interpretation. Altmetrics toolmakers need to be flexible enough to accommodate the needs of different communities, while still guiding people towards best practice. The use of altmetrics, however they might end up being defined and measured, gives scholars the power to showcase new and unconventional forms of research impact that have previously gone unrecognized.

An important hurdle not mentioned above is that some remain highly skeptical of the utility of altmetrics. As research further validates altmetrics as useful measurements of impact, the availability of off-the-shelf tools will also drive wider adoption of altmetrics. Consequently, increased community participation will help to inform new top-down solutions for key tool-development problems.