"I am but mad north-north-west: when the wind is southerly I know a hawk from a handsaw." --Hamlet, Act II, scene ii.

Friday, 9 January 2009

Does the REF add up to good science?

The RAE (Research Assessment Exercise) results from the 2008 exercise were published back in December. You might have noticed this from the number of university websites that could be found frantically spinning the results. My very own University of Manchester, for example, is claiming that Manchester has broken into the “golden triangle” of UK research, that is, Oxford, Cambridge and institutions based in London. It seems that depending on the measure you pick, we’re anywhere between third and sixth place in the UK. Clearly these are excellent results, but whether we’re really up there with the Oxfords, Cambridges, Imperials and UCLs of the world I’m not sure.

In any case, that was the last ever RAE. It has been a fairly cumbersome process, involving expert peer review of the research contribution of research institutions, that has been a real burden on the academics who have had to administer it. I’m sure there are few who will mourn its passing. Now the world of English academia is waiting, like so many rats in an experimental maze, to find out what will replace the RAE. The replacement will be a thing called the Research Excellence Framework, or REF, and at this stage exactly what it will involve is fairly sketchy. However, it will be based on the use of bibliometrics (statistical indicators that are usually based on how much published work is cited in other publications) and “light-touch peer review”.

What kind of bibliometric indicators are we talking about? Last year HEFCE (the Higher Education Funding Council for England, the body that evaluates research and decides who gets scarce research funding) published a “Scoping study on the use of bibliometric analysis to measure the quality of research in UK higher education institutions” produced by the Centre for Science and Technology Studies at the University of Leiden, Netherlands. I’ve spent a fair amount of time reading through this, and in some ways I was encouraged. It’s clear that some thought has gone into creating bibliometric indicators that are as sensible as possible: I was dreading a crude approach based around impact factors, which have already done so much damage to the pursuit of good science. The authors of the “scoping study” came up with an “internationally standardised impact indicator”: I will abbreviate this as ISII for concision. The ISII takes the average number of citations for publications for the academic unit you are interested in (this might be a research group, an academic department or an entire university), and divides it by a weighted, field-specific international reference level. The reference level is calculated by taking the average number of citations for all publications in a specific field: if the publication falls under more than one field (as many will in practice), the reference level can be calculated as a weighted average of the number of citations generated by publications in all the fields in question. So, if the ISII for your research group comes out as 1, you’re average, if above 1, better than the average, and if below 1, worse than the average. The authors of the scoping study say that they regard the ISII as being “the most appropriate research performance indicator”, and suggest that a value of >1.5 indicates a scientifically strong institution. They also suggest a threshold of 3.0 to identify research excellence. 
It seems that HEFCE is expecting to adopt the ISII as the main research performance indicator, according to their FAQs, where they say “We propose to measure the number of citations received by each paper in a defined period, relative to worldwide norms. The number of citations received by a paper will be 'normalised' for the particular field in which it was published, for the year in which it was published, and for the type of output”. However, they are still deciding what thresholds they will use to decide which institutions are producing high-quality research.
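
The arithmetic behind an indicator like the ISII can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoping study's actual procedure: the function names, the field averages, the field weights and the two example papers are all invented, and the sketch assumes that the unit's average citation count is divided by the average of its papers' reference levels.

```python
from statistics import mean

# Hypothetical world-average citations per paper in each field (invented numbers).
FIELD_MEANS = {"geology": 4.0, "geophysics": 8.0}


def reference_level(field_weights, field_means=FIELD_MEANS):
    """Weighted average of the field citation averages for one paper.

    field_weights maps each field the paper falls under to its weight.
    """
    total_weight = sum(field_weights.values())
    weighted_sum = sum(w * field_means[f] for f, w in field_weights.items())
    return weighted_sum / total_weight


def isii(papers, field_means=FIELD_MEANS):
    """Average citations per paper divided by the average reference level."""
    avg_citations = mean(p["citations"] for p in papers)
    avg_reference = mean(reference_level(p["fields"], field_means) for p in papers)
    return avg_citations / avg_reference


# Two made-up papers: one purely in geology, one split evenly across two fields.
papers = [
    {"citations": 12, "fields": {"geology": 1.0}},
    {"citations": 6, "fields": {"geology": 0.5, "geophysics": 0.5}},
]

score = isii(papers)
print(score)
```

With these invented numbers the two papers average 9 citations against an average reference level of 5, so the score comes out at 1.8: above the 1.5 that the scoping study suggests for a scientifically strong institution, but well short of the 3.0 suggested for research excellence.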

All well and good. If you insist that bibliometric indicators are necessary, this is probably as good a way as any of generating those data. However, there are some problems here, as well as philosophical difficulties with the entire approach.

Firstly, what is it we are trying to measure? In theory, what HEFCE wants to do is evaluate research quality. But the ISII does not directly measure research quality. Like any indicator based on citation rates, it is measuring the “impact” of the research: how many other researchers published papers that cited the research. It ought to be clear that while this should reflect quality to some degree, there are significant confounding factors. For example, research that is done in a highly active topic is likely to be cited more than research in which fewer groups are working. This does not mean that work in less active topics is of intrinsically lower quality, or even that it is less useful.

Secondly, there is an assumption that the be-all and end-all of scientific research is publication in peer-reviewed journals that are indexed in the Web of Science citation database published by Thomson Scientific. This is a proprietary database that lists articles in the journals that it indexes, and also tracks citations. Criteria for journals to be included are not in the public domain (although the scoping report suggests these are picked based on their citation impact, p. 43). A number of journals that I would not consider to be scientifically reputable are included. For example, under the heading of Integrative and Complementary Medicine, the 2007 Journal Citation Reports (a database that compiles bibliometric statistics for journals in the citation database) lists 12 journals, including Evidence Based Complementary and Alternative Medicine (impact factor 2.535!) and the Journal of Alternative and Complementary Medicine (impact factor 1.526). This reinforces the point made above: it would be possible to publish outright quackery in either of these journals, have it cited by other quacks in the quackery that they publish, and get a respectable rating on the ISII. The ISII can’t tell you that this is a vortex of nonsense: it only sees that other authors have cited the work. It is also true that not all journals are included in the citation index: for example, in my own field the Bulletin of Canadian Petroleum Geology fails to make the cut, although it has always published good quality research. Although the authors of the scoping report make clear that it is possible to expand bibliometrics beyond the citation database, this will take much more effort and it seems that HEFCE will not take this route. So we will be relying on a proprietary and opaque database to make decisions on future research funding.
A further point is that it is not clear how open access publications will be incorporated in the citation index: in principle there is no reason that this can’t happen, but can we be sure it will?

Thirdly, there is the assumption that research output can only be evaluated in terms of published articles in peer-reviewed journals. I’m not sure that this accurately reflects the actual research output of many scientists. For example, most of us put a lot of effort into presentations at scientific conferences, chapters in books, or government reports that will never make it into a citation database. This has become a problem for things like, in my own field, the special publications of the Geological Society of London. These are volumes that collect recent research on specific topics, and they generally contain excellent research. But they aren’t included in citation databases and they have no impact factor. This has led to a lack of interest in publishing results in these special publications, because they don’t tick the right boxes in terms of publication metrics. This is surely a bad thing. A similar problem occurs with things like government open-file reports. These are not, in general, pieces of world-class, cutting edge research. But that does not mean that they are useless or that they have no value. For example, good regional geological work can allow mineral exploration to be better targeted, benefiting the local economy. Yet that kind of work is ignored in a framework that only considers journal articles: HEFCE says only that “We accept that citation impact provides only a limited reflection of the quality of applied research, or its value to users. We invite proposals for additional indicators that could capture this”. To me, research quality and value cannot be measured by bibliometric indicators. It can only be evaluated by reading the research, understanding its context within the totality of pre-existing research, and understanding how it contributes to new understanding. That is, it can only be evaluated through peer review.

Which brings me to my fourth point: there are some questions about the role of peer review within the REF. HEFCE says that “the scoping study recommends that experts with subject knowledge should be involved in interpreting the data. It does not recommend that primary peer review (reading papers) is needed in order to produce robust indicators that are suitable for the purposes of the REF”. However, I’m not convinced that this accurately summarises what is written in the scoping report, which says “In the application of indicators, no matter how advanced, it remains of the utmost importance to know the limitations of the method and to guard against misuse, exaggerated expectations of non-expert users, and undesired manipulations by scientists themselves…Therefore, as a general principle we state that optimal research evaluation is realised through a combination of metrics and peer review. Metrics, particularly advanced analysis, provides the tools to keep the peer review process objective and transparent. Metrics and peer review both have their strengths and limits. The challenge is to combine the two methodologies in such a way that the strengths of one compensates for the limitations of the other”.

Finally, there is a hint of a conflict of interest in the preparation of the scoping report by the Centre for Science and Technology Studies: according to their website, the centre is involved in selling "products" based on its research and development in the area of bibliometric indicators. Their report in favour of bibliometric indicators might allow them to drum up significant business from HEFCE.

At present, the proposals for the REF are at a fairly early stage, but the use of bibliometric indicators seems to be entrenched, and there will be a pilot exercise on bibliometric indicators this year. However, this is based on “expert advice” that consists of a single report from an organisation that makes money by creating bibliometric indicators. While academia in general might welcome the proposals on the grounds that they will be less burdensome than the RAE and give everyone more time to do research, I don’t think many academics will be kidding themselves that the bibliometric indicators involved actually tell us much about research quality and usefulness.

2 comments:

Nature Network and the blogs on it have covered this whole "how useful are bibliometric indices" issue quite extensively, Paul - see here and here for some links, and there are doubtless a lot more.

I think one of the (not so trumpeted) reasons why there is a move to accept REF is that RAE, for all the sound and fury, rarely causes dramatic changes in overall University finances, and really does not ever "buck" the generally (tacitly) agreed University pecking order in the UK terribly much. A really cynical view would be that, post the grading exercise we have all just endured, the Govt fixes the "gearing" (how much £££ each grade is worth) and the "cut-off" (grade below which you earn nothing) - these are the things that don't get announced for several more months - so that the Univs end up with allocations which reflect where they were already seen to stand in the general scheme of things.

If you accept this viewpoint, then a system which will largely preserve the status quo (as one based on bibliometrics and research funding stats is likely to do) can be viewed as likely to produce the same result as now but with less running around in circles.

Of course, it will still put the Univs and their staff under a clear "selection pressure" to do well in the indices that are being counted. An interesting question is whether that "selection pressure" will be different from that currently pertaining under the RAE.

Nature had an editorial on metrics in their first issue of the year. Their line is that there is an "indispensable and central role for expert peer-review in the evaluation of research".

There is also a Nature article on the allocation of research funding: there is some concern that good research is actually spread much more widely around the UK universities than the 25 or so institutions that will end up getting most of the money.

Welcome to Hawk/Handsaw...

This is the blog of Paul Wilson. You can find science- and pseudoscience-related things here, as well as occasional posts on my current research. There is also some stuff about what I do in my spare time (mostly cycling and complaining about politics).