Researcher evaluation measures do not add up

Determining a researcher's influence on the scientific community is difficult. That is why measures such as the h-index, which sums up a researcher's significance in a single number, have been developed. These measures are highly popular with universities, politicians, and research foundations because they make it easy to compare researchers' performance. Unfortunately, the measures are also misleading and biased, new research from the University of Copenhagen shows.

When a researcher applies for funding for a research project or for a position at a university, a single number can spell success or failure. One such number is the so-called h-index, which is based on a researcher's most cited papers and the number of citations those papers have received in publications by other researchers: a researcher has an h-index of h if h of their papers have each been cited at least h times. The higher the h-index, the more impact a researcher supposedly has on the scientific community. The problem is that the numbers generated by h-index calculations are difficult to make sense of:

– If researcher A publishes 3 papers that are each cited 60 times by other researchers, her h-index will be 3. But researcher B, who publishes 10 articles cited 11 times each in the same period, has an h-index of 10. How do we determine which researcher has more impact? Does the lower h-index conceal the better researcher? The index does not give us meaningful answers to these questions, and it is particularly ambiguous if the two researchers work in different disciplines, such as physics and philosophy, whose publishing traditions differ widely, says postdoc Lorna Wildgaard from the University of Copenhagen's Royal School of Library and Information Science.
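To make the arithmetic in this example concrete, here is a minimal Python sketch of the standard h-index definition: the largest number h such that h of a researcher's papers have each been cited at least h times. The function name `h_index` and the citation lists are illustrative only and encode the two hypothetical researchers from the example. Real bibliographic databases differ in which publications and citations they count, so their figures may not match a calculation like this one.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        # The paper at position `rank` must have at least `rank` citations
        # for the first `rank` papers to qualify.
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical researchers from the example in the text.
researcher_a = [60] * 3   # 3 papers, each cited 60 times
researcher_b = [11] * 10  # 10 papers, each cited 11 times

print(h_index(researcher_a), sum(researcher_a))  # 3 180
print(h_index(researcher_b), sum(researcher_b))  # 10 110
```

Researcher A has 180 citations in total against researcher B's 110, yet B's h-index is more than three times higher. A single number hides exactly this trade-off between a few highly cited papers and many moderately cited ones.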

Lorna Wildgaard recently defended her PhD thesis, 'Measure Up!', about the various measures of individual researcher performance, the so-called bibliometric indicators, which universities and researchers use to compare academic performance. Most of the 114 bibliometric indicators she analysed are ill-suited to measuring what they are supposed to measure.

– One of the more serious problems with the h-index and other indicators is that they purport to estimate a mean value from the very limited set of data that a researcher's papers make up, and a statistician would tell you that this is simply not possible. Another problem is that most indicators were never designed to do what universities use them for. They were originally conceived as mathematical models that other researchers were meant to develop further, not as complete bibliometric indicators ready to be used for researcher evaluation. Even so, universities all over the world use them to evaluate and measure researcher performance, and despite its many flaws, the h-index has even been incorporated into bibliographic databases.

Favours senior researchers within specific fields

The bibliometric indicators that were originally developed specifically for researcher evaluation often target a particular scientific field and cannot be applied directly to other fields. The physicist Jorge Hirsch, for example, conceived the h-index in 2005 to evaluate the scientific impact of Nobel laureates and other high-profile physicists with many prizes, papers, and citations.

– Today the h-index is nevertheless used in many different fields and by many young scientists, but the resulting scores will inevitably be skewed when they use the h-index the way physicists do, says Lorna Wildgaard, stressing that one always has to consider the specific scientific context in which the research was conducted. Even though the h-index has since been improved, it is still based on Hirsch's original model because that model is simple.

Counting papers and citations is too one-sided

Lorna Wildgaard does not dismiss the usefulness of bibliometric indicators altogether, but she argues that they should not stand alone, because on their own they tend to become meaningless numbers that are difficult to interpret.

– I would like to see researchers and institutions add altmetrics to their list of indicators. Altmetrics estimate influence by analysing activity on the Internet and social media. You could, for example, count the number of paper downloads from a researcher's website, track the ways in which researchers share data with other researchers and laypeople, or look at whether they communicate their research through podcasts and wikis. Measures such as these could be a good indication of whether research has an impact beyond the scientific community. But if we restrict ourselves to counting papers and citations, we get a rather limited view of impact and influence.

###

About the thesis

In her thesis 'Measure up!: The extent author-level bibliometric indicators are appropriate measures of individual researcher performance', Lorna Wildgaard analysed the appropriateness of 114 different bibliometric indicators in four scientific fields: astronomy, environmental science, philosophy, and public health.

The thesis is part of the research project ACUMEN, which explores all the ways in which universities and researchers measure academic work.