Measuring Up: Impact Factors Do Not Reflect Article Citation Rates

This special blog post is co-authored by PLOS Executive Editor Véronique Kiermer, Université de Montréal Associate Professor of Information Science Vincent Larivière and PLOS Advocacy Director Catriona MacCallum. It accompanies the posting on BioRxiv of a research paper on citation distributions.

Journal-level metrics, the Journal Impact Factor (JIF) being chief among them, do not appropriately reflect the impact or influence of individual articles—a truism perennially repeated by bibliometricians, journal editors and research administrators alike. Yet, many researchers and research assessment panels continue to rely on this erroneous proxy of research – and researcher – quality to inform funding, hiring and promotion decisions.

In strong support for the shedding of this misguided habit, seven journal representatives and two independent researchers – including the three authors of this post – came together to add voice to the rising opposition to journal-level metrics as a measure of an individual’s scientific worth. The result is a collaborative article from Université de Montréal, Imperial College London, PLOS, eLife, EMBO Journal, The Royal Society, Nature and Science, posted on BioRxiv this week. Using a diverse selection of our own journals, we provide data illustrating why no article can be judged on the basis of the Impact Factor of the journal in which it is published.

The article presents frequency plots – citation distributions – of 11 journals (including PLOS Biology, PLOS Genetics and PLOS ONE) that range in their Impact Factor from less than three to more than 30; the analysis covers the same period as the 2015 Impact Factor calculation. Despite the differences in Impact Factors, the similarities between distributions are striking: all distributions are heavily skewed, with most articles clustered at low citation counts (a majority of articles have fewer citations than indicated by the JIF), and all span several orders of magnitude. The most important observation, however, is the substantial overlap between the journal distributions. Essentially, two articles published in journals with widely divergent Impact Factors may very well have the same number of citations.
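To make the point about skew concrete, here is a minimal sketch in Python. It is not the method used in the BioRxiv article: the citation counts are invented for illustration, and the "JIF-like" value is simplified to the plain mean of citations per article, which ignores details of the official calculation. The function name `summarize_citations` is likewise hypothetical.

```python
from statistics import mean, median


def summarize_citations(citations):
    """Return the mean (a simplified JIF-like average), the median,
    and the share of articles cited less often than the mean."""
    avg = mean(citations)
    below = sum(1 for c in citations if c < avg)
    return {
        "mean": avg,
        "median": median(citations),
        "share_below_mean": below / len(citations),
    }


# Hypothetical citation counts for ten articles in one journal (not real data).
example_counts = [0, 1, 1, 2, 2, 3, 4, 5, 12, 70]
summary = summarize_citations(example_counts)
print(summary)
```

In this invented sample, a single article with 70 citations pulls the average up to 10, while the median article has 2.5 citations and eight of the ten articles sit below the average. A journal-level mean says very little about any one of them.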

Share and share alike

By publishing this data, we hope to strengthen a call for action originally voiced by Stephen Curry, one of the authors, and to encourage other journals to follow suit. In the spirit of this call we present below the plots for all seven PLOS journals [see Fig. 1]. Needless to say, there are no surprises. Despite widely different volumes, all distributions show a marked skew to the left (low citations) with a long tail expanding to the right (high citations)—a pattern obscured by use of the JIF.

Fig. 1: Citation Distributions of the PLOS Journals. Citations are to ‘citable documents’ (as classified by Thomson Reuters), which include standard research articles and reviews; distributions contain citations accumulated in 2015 to citable documents published in 2013 and 2014. Data was extracted using the “Purchased Database Method” detailed in the V. Larivière et al. BioRxiv article. To facilitate direct comparison, distributions are plotted with the same range of citations (0-100) in each plot; articles with more than 100 citations are shown as a single bar at the right of each plot. Copyright held by Thomson Reuters prohibits publication of the raw data but aggregated data behind the graphs is available on Figshare.
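For readers who want to build a comparable plot from their own data, the binning described in the caption can be sketched as follows. This is an illustrative assumption about how such a histogram might be prepared, not the code behind Fig. 1; the function name `bin_citations`, the `cap` parameter and the sample counts are all hypothetical.

```python
from collections import Counter


def bin_citations(citations, cap=100):
    """Count articles at each citation value from 0 to cap,
    and pool every article with more than cap citations into one overflow bin."""
    # Values above the cap are all mapped to cap + 1 so they share one bin.
    counts = Counter(min(c, cap + 1) for c in citations)
    bins = [counts.get(k, 0) for k in range(cap + 1)]
    overflow = counts.get(cap + 1, 0)
    return bins, overflow


# Hypothetical per-article citation counts (not real data).
sample = [0, 0, 3, 100, 101, 250]
bins, overflow = bin_citations(sample)
print(bins[0], bins[3], bins[100], overflow)
```

Here `bins` holds one count per citation value from 0 to 100, and `overflow` is the height of the single bar at the right of the plot. Using the same cap for every journal is what makes the plots directly comparable.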

We do not deny there are differences among journals, which reflect the different article types, editorial criteria, scope and volume of each publication. These effects are notable, for instance, when considering PLOS ONE, where articles are selected on the basis of being technically sound and robustly reported rather than on perceived impact or general interest. Such criteria enable the publication of small studies or those with negative, null or inconclusive results, which might not garner many citations but are crucial in mitigating publication bias. The journal scope comprises disciplines with different citation habits and niche areas of research as well as social sciences, where citation rates are typically lower. Since the volume of publication is also not artificially limited, these factors together explain the relatively higher number of articles with few or zero citations (a similar explanation can account for the distribution of citations in Scientific Reports). This does not mean that PLOS ONE (or Scientific Reports) does not publish many highly-cited articles. On the contrary, a 2013 study indicated that PLOS ONE publishes its fair share of the top cited papers in the literature (relative to the number of papers it publishes). In the 2015 distributions, the volume of papers making up the high-citation tail of the PLOS ONE distribution is again substantial.

The sway of influence

What motivates our initiative to raise awareness is that despite calls to the contrary, the JIF remains a prevalent tool in evaluating scientists. Often it comes down to convenience, lack of time and appropriate alternatives, but it is also a question of culture. The misuse of the Impact Factor has become institutionalized in the research assessment methods of many universities and national evaluation panels, leading to a perverse incentive system.

For researchers, the career advancement and reputational reward of ‘aiming high’ when choosing a journal is too great to ignore, even when the consequences are to work one’s way down the Impact Factor ladder one step at a time, rejection after rejection. This sequential submission pattern not only puts an enormous burden on journal editors and reviewers, it also causes unnecessary and unacceptable delays in making results available to the wider scientific community and the public. Worse are the stories of researchers who feel compelled to alter their experimental or analytical approach to make the manuscript more attractive to one journal or another. The profound consequences are manifest in other ways—a strong disincentive to pursue risky and lengthy research programs, to publish negative results or to pursue multidisciplinary research. They also provide a potent motive to flood fields that are already over-crowded and entrench a hypercompetitive system that increasingly disadvantages graduate students and early career researchers.

Action in process

There is no escaping the fact that a paper can only be properly evaluated by reading it. However, there are tools to help filter the scientific literature for the reach and impact that an article might have, and not just within the scholarly research community. Several platforms offer article-level metrics, including PLOS’ own ALM service, which provides citations and other indicators of readership and social attention. The open source software Lagotto powering PLOS ALMs underpins Crossref’s Event Tracker to capture a range of usage activity linked to any digital object identifier, including datasets.

No single metric, however, can accurately reflect the diverse impact of different research outputs (as clearly laid out in the Metric Tide report and the Leiden Manifesto for research metrics). Ultimately, the scientific community needs a better means of capturing and communicating the assessment of validity, reliability, significance and quality that takes place over time, when experts engage deeply with and build upon the results of their peers.

We also need more granular and robust ways of describing and assigning credit to the myriad different contributions of individual researchers to articles, data, software, research projects, peer review and mentoring students. Towards this aim, PLOS and other publishers are starting to require that authors register for an ORCID ID, and are introducing the CRediT taxonomy to recognize the individual contributions of authors to an article. In the EU, Science Europe has just issued a report on how to evaluate multidisciplinary research that includes a recommendation for funders to evaluate applicants on a range of outputs, rather than just on publication record.

These are all welcome steps but ultimately, the culture will only change when the institutions responsible for overseeing the assessment of researchers and those who constitute the evaluation panels take active steps to change how they assess scientists.

Meanwhile, the message from journal editors and publishers that show their citation distributions is clear: we select and publish diverse articles that attract a wide range of citations, and no article can be adequately judged by the single value of the Impact Factor of the journal in which it is published.

Image Credit: Gerd Altman, Pixabay.com

Véronique Kiermer is the Executive Editor at PLOS, where she works closely with the editorial teams of the seven PLOS journals. Before joining PLOS in 2015, she held several senior editorial positions with the Nature journals, worked in the biotech industry and trained as a molecular biologist. Véronique obtained her PhD in molecular biology from the Université Libre de Bruxelles, Belgium.

Interesting research. In fact, finding relevant papers for a research project is also affected by these kinds of issues, namely the ad hoc process of literature review. We have seen that researchers are usually biased towards authors they know, or authors they believe will lend relevance, instead of finding the literature that is actually relevant.

In recent research, we have tried to address this problem by using machine learning to recommend literature for review according to systematic methods and bibliometric factors that might imply relevance.