Do our measures of academic success hurt science?

A Ph.D. student wants to submit his research to a journal that requires sharing the raw data for each paper with readers. His supervisors, however, hope to extract more articles from the dataset before making it public. The researcher is forced to postpone the publication of his findings, withholding potentially valuable knowledge from peers and clinicians and keeping useful data from other researchers.

Many scholars can share similar stories of how career incentives clash with academia’s mission to increase knowledge and further scientific progress. The professional advancement system at universities seems to be caught in a bibliometric trap where scientific success is predominantly defined in terms of numbers – the number of grant dollars and publications, the ranking of journals, the quantity of citations – rather than impact.

As researchers make smart career decisions within this bibliometric “publish or perish” system, their choices unwittingly hamper the quality and impact of research as a whole. Science becomes “the art of the soluble” as researchers and funders alike avoid complex, real-world problems and focus instead on small-scale projects leading to incremental science (and more publications).[1][6]

If we want to address the reproducibility crisis and other current concerns about the reliability and value of academic research, we have to change the incentive structures within academia that reward certain types of research over others. We must incentivize activities that promote reproducible, high-quality, high-impact research.

The Bibliometric Trap

The use of numerical indicators to evaluate academic success is actually a fairly recent development. Only in the past thirty years has scientific quality become defined primarily by the number of peer-reviewed publications, the journal impact factor[7] (whose uptake started in the 1990s), international measures like the Shanghai Ranking[8] of universities (first published in 2003), and personal citation scores like the h-index[9] (introduced in 2005).

Numbers offer a convenient, seemingly objective way to evaluate success and compare outcomes. They fulfill a need for control and accountability. However, the impact factor of a journal doesn’t necessarily reveal anything about the quality of an article or what it contributes to the broader quest for scientific truth. Long publication lists mean little when many papers are never cited or read, least of all by practitioners outside academia. More importantly, recent replication efforts show that many studies cannot be reproduced, suggesting that publication in a prestigious journal is no guarantee of the validity of findings.[2][10]

The bibliometric approach to evaluating research leads to risk avoidance and a focus on short-term outcomes. Researchers slice their results into the smallest publishable units, dripping out findings over time so they can accrue more publications. They run and rerun analyses until they uncover a statistically significant finding – a process known as p-hacking[11] – regardless of their original research question. They focus their research on a quest for tantalizing new discoveries, avoiding the important but less glamorous work[12] of validating the findings of other scientists or sharing failures (i.e. negative findings) that others can learn from.

By and large, these researchers don’t intend to slow scientific progress. They’re simply responding to a career advancement system that rewards certain types of work over others. Yet these incentive structures often prevent science from making good on its promise of societal impact. For instance, the bibliometric framework disincentivizes many of the kinds of studies and replications needed to advance drug discovery, a slow process in which promising compounds are tested and retested numerous times.[3][13]

The bibliometric approach also devalues teaching. Since an academic career is typically built on quantifiable research output, many researchers “buy themselves out” of teaching responsibilities or spend less time and effort improving their teaching craft. This shortchanges the next generation of scientists, depriving them of the preparation they need to make transformational discoveries.

The Influence of Institutions

In our bibliometric world, researchers are often caught between what they should do and what they are rewarded for – and it is institutions that set these rules. For instance, the main national research funder in the Netherlands, the Netherlands Organisation for Scientific Research (NWO), recently made 3 million euros available for replication studies[14]. The organization also requires that all research papers resulting from its funding be published in open access journals.

However, when researchers apply for a prestigious individual grant from NWO, the impact factors of the journals they have published in play a major role in their evaluation. This evaluation system is at odds with the organization’s promotion of replication studies and open access journals, which tend to have lower impact factors.

A similar disconnect exists between journal guidelines and the policies of research institutions. Many journals now require researchers to make the data underlying their published results publicly available, a policy that promotes transparency and reproducibility. Universities and research institutes, however, rarely reward or facilitate these open data efforts. If data transparency is not backed by formal rewards and technical support, researchers may perceive open data as a sideshow on the fringe of “real” scientific work, rather than a key part of the scientific process.

Reconsidering How We Evaluate Research

As the academic community has struggled to come to terms with the “reproducibility crisis[15],” a vigorous international debate has arisen about the role of career incentives in affecting research quality. The San Francisco Declaration on Research Assessment[16] (DORA), which calls for better methods for evaluating research output, was drafted by journals, funders, and researchers in 2012 and has since been signed by more than 800 institutions and 12,000 individuals. It recommends that individual researchers not be judged by bibliometric measures, but instead by “a broad range of impact measures including qualitative indicators of research impact, such as influence on policy and practice.”

Another milestone occurred in 2015 when an international group of experts published the Leiden Manifesto for Research Metrics[17]. The manifesto declares that “the abuse of research metrics has become too widespread to ignore,” and outlines ten principles to guide better research evaluation. These best practices include evaluating performance in relation to a scholar’s or institution’s mission and judging individual researchers based on a holistic, qualitative evaluation of their portfolio of work.[4][18]

In the Netherlands, our organization Science in Transition[19] has been advocating for a new way of evaluating research since 2013. Our efforts prompted heated debate, leading Dutch research organizations to drop “quantity” (of publications and funding) as a distinct category in the nationwide protocol for evaluating universities. In addition, the Dutch association of universities signed on to DORA at Science in Transition’s second conference.

Restructuring the Tenure System

These efforts to reimagine how we define research success are an important first step, but it will be up to universities and research institutions to make the concrete policy changes necessary to truly transform the system. At our institution, University Medical Center Utrecht in the Netherlands, we are working to put into practice the ideas and critiques formulated by Science in Transition and other groups.

We are actively countering the undesirable incentives in our system for evaluating researchers. All applicants for academic promotions at UMC Utrecht now submit a portfolio covering multiple domains: science, teaching, clinical activities, leadership, impact, and innovation. Candidates are required to present themselves in an inclusive way, with narratives about the impact and goals of their research.

The new evaluation system provides the review committee with a broad view of someone’s work and the opportunity to promote or hire a scholar who may not have the perfect publication profile, but excels in areas that are harder to quantify in bibliometric terms. Our hope is that this system will incentivize researchers to focus on work that advances science, regardless of how it looks when the numbers are tallied up.

We have also changed how we evaluate the university’s research programs. Such institutional evaluations are an important feature of the Dutch academic system, yet are often dominated by bibliometric measures. We’ve shifted to evaluating programs based on their wider clinical and societal impact. Research programs are asked to explain how they arrived at their main research questions, how their research fits with existing knowledge, and how their findings can advance clinical applications. We also ask both international peers and societal stakeholders to evaluate their research.

Research programs must document how patient organizations, companies, government agencies, and other stakeholders benefit from their work and are involved in structuring their research questions. In addition, programs are asked to show how their methods and data systems promote high-quality, reproducible research. Researchers should have data management plans in place and make datasets available for external use. UMC Utrecht supports these efforts by providing a template for data management plans and dedicated servers for storing and sharing data.

Our efforts at UMC Utrecht are grounded in the belief that the full power of biomedical research should be geared toward fulfilling its ultimate mission: improving healthcare. We try to align incentives for our researchers and our institution with this mission, so that citation counts and impact factors don’t get in the way of our goal. Our hope is that others in the field will realize that scientific papers are not a goal in and of themselves, but are stepping stones on the road to impact.

Rinze Benedictus is a staff advisor at the University Medical Center Utrecht and a Ph.D. researcher at Leiden University in the Netherlands.

Frank Miedema, Ph.D., is professor of immunology and dean and vice chairman of the Board of the University Medical Center Utrecht, Netherlands, and is one of the founders of Science in Transition[20], an effort to reform the scientific system to focus on creating value for society.

This article is part of a series[21] on how scholars are addressing the “reproducibility crisis” by making research more transparent and rigorous. The series was produced by Footnote[22], an online media company that amplifies the impact of academic research and ideas by showcasing them to a broader audience, and Stephanie Wykstra, Ph.D., a freelance writer and consultant whose work focuses on data sharing and reproducibility. It was supported by the Laura and John Arnold Foundation.

[1][23] In his 1969 book The Art of the Soluble, Peter Medawar explored how science is driven by the problems scientists choose to investigate. While Medawar intended the “art of the soluble” as a more neutral description of how researchers select problems they think they can solve, we see it as a political issue, where incentives push the scientific effort to more easily soluble problems.

[2][24] A wake-up call about reproducibility came in 2012, when researchers from the biotechnology firm Amgen published an article[25] in Nature stating that they were only able to replicate the results from 6 out of 53 “landmark” papers, a replication rate of 11%. These papers were published in prominent journals with high impact factors in the fields of hematology and oncology. A year prior, researchers from pharmaceutical company Bayer encountered similar problems[26]. They could validate results in just a quarter of 67 papers with promising findings published in prestigious journals in cardiovascular research, oncology, and women’s health.

[3][27] As a 2016 journal article[28] put it, “These 'valleys of impact' are not correlated with societal impact. On the contrary, lower-impact studies that focus on validation, reproducibility, and implementation are as essential as drug target discovery for translation. Yet studies in these areas receive fewer citations, and the academics who publish them receive fewer accolades.”