"I am but mad north-north-west: when the wind is southerly I know a hawk from a handsaw." --Hamlet, Act II, scene ii.

Monday, 4 June 2007

Metrication

On those rare occasions when I actually have a paper to submit for publication, I tend to consider the most appropriate journal for the article I've written. I weigh up factors such as the likely readership of the paper, the readership of the journal, the journal's reputation for rapid peer review and editorial processes, and so on. I have colleagues who say that I ought to take things like the impact factor into account, because publishing in journals with high impact factors is important for my career. The question is, why?

The impact factor is calculated by Thomson Scientific from its proprietary publication database and published on the ISI Web of Knowledge; the database is proprietary, but my employer subscribes to it. The impact factor works by counting the citations made in a given year to articles a journal published over the previous two years, and dividing by the number of articles the journal published over those two years. So the 2006 figures, which will be published at the end of this month, are derived by counting the citations made in 2006 to articles the journal in question published in 2004 and 2005, and dividing by the number of articles it published in 2004 and 2005. The impact factor applies to the journal as a whole, not to the individual papers published in it.
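To make the arithmetic concrete, here is a minimal sketch of the two-year calculation in Python. The function and the example numbers are entirely invented for illustration; the real figures come from Thomson Scientific's proprietary citation database.

```python
def impact_factor(citations_this_year, articles_published):
    """Two-year impact factor for a journal.

    citations_this_year: citations received in year Y to articles the
        journal published in years Y-1 and Y-2.
    articles_published: number of articles the journal published in
        years Y-1 and Y-2.
    """
    return citations_this_year / articles_published

# Hypothetical journal: 300 citations in 2006 to its 2004-2005 articles,
# of which there were 120 in total.
print(impact_factor(300, 120))  # 2.5
```

Note that the numerator is a property of the whole journal, which is exactly why the figure says nothing reliable about any single paper in it.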

Why do I hate the impact factor so? There are several reasons. Firstly, it is statistically dubious. Journals with high impact factors get most of their citations from a relatively small number of highly cited papers: David Colquhoun writes, for example, that in Nature in 1999 the most cited 16% of papers accounted for 50% of citations (Nature 423, p. 479). In other words, the citation rate of an individual paper is essentially uncorrelated with the impact factor of the journal it appears in. This is the most obvious statistical drawback; several other sources of bias are described by Seglen (1997; British Medical Journal 314, p. 497). Eugene Garfield, who invented the impact factor, also agrees that it should not be used to evaluate individuals (Garfield 1998; Der Unfallchirurg 48, p. 413).

Another 'metric' that has been suggested for evaluating the contribution of scientists is the h-index: an author has an h-index of h if they have published h papers that have each been cited at least h times. It has been pointed out that Einstein, had he died in early 1906, would have had an h of only 4 or 5, despite the revolutionary nature of the work he had published before that date. This makes me slightly happier about my own h-index of 0, being early in my career and having published two papers that have not yet been around long enough to be cited.
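The definition above translates directly into a few lines of code. This is a sketch with made-up citation counts, not anything resembling how citation databases actually compute it:

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers each cited >= h times."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    # Walk down the sorted counts; the rank is the candidate h.
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical author with five papers cited 10, 8, 5, 4 and 3 times:
# four papers have at least 4 citations each, but not five with 5.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

The example also shows the metric's insensitivity to outliers: replacing the 10-citation paper with a 1000-citation one leaves h unchanged, which is part of why a single revolutionary body of work can score so low.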

Really though, statistical arguments about biases in various metrics miss the central point, which is that it is impossible to evaluate the scientific worth of an article without reading and understanding it. This can only be done well by people who have expertise in the field of study. In other words, it can only be done well by peer review.

In the UK, academic researchers are evaluated through the Research Assessment Exercise (RAE), which has traditionally been based on peer review. In the upcoming RAE in 2008, there will be a 'shadow' metrics-based exercise running alongside the traditional peer-review based process. In RAEs after 2008, metrics will be used as the main measure of the scientific worth of individual researchers. There have been many criticisms of the traditional RAE. David Colquhoun has written "All of us who do research (rather than talk about it) know the disastrous effects that the Research Assessment Exercise has had on research in the United Kingdom: short-termism, intellectual shallowness, guest authorships and even dishonesty" (Nature 446, p. 373). The situation is hardly going to be improved by relying on a metrics-based approach, as authors inevitably play the system in order to inflate their rankings and progress their careers.

What is perhaps most disappointing is that in general scientists themselves seem to have failed to critically examine metrics such as the impact factor. There is enough material out there in the public domain (the sources cited here are only a sample) for anyone to understand the problems, if they're interested in finding out.

2 comments:

My experience with geoscience journals is that 9 out of 10 papers are, at best, irrelevant data-spinning. A small percentage are outright misleading or fabricated. I wonder if this ratio applies to medical journals as well? I would hope clinical research is held to a higher standard!


Welcome to Hawk/Handsaw...

This is the blog of Paul Wilson. You can find science- and pseudoscience-related things here, as well as occasional posts on my current research. There is also some stuff about what I do in my spare time (mostly cycling and complaining about politics).