Wednesday, 2 March 2016

On the need for clarity of purpose in the REF and TEF

The UK’s Research Excellence Framework (REF) has come in for a lot of criticism. It is now
under review by a panel chaired by Nicholas Stern, with a call
for evidence that closes later this month. At the same time, we have a Green
Paper setting out plans for a Teaching
Excellence Framework (TEF). This is motivated in part by the view that the
attention given to research and teaching has got out of balance. REF has
provided universities with strong incentives to put resources into research,
and teaching has consequently been neglected, goes the argument (though see here). So what do we
need to even things up? A TEF.

The problem for both REF and TEF is that, at the end of the
day, they aim for a single scale on which universities can be rank ordered so
we can compare quality. But everyone agrees that the things we are measuring,
research and teaching excellence, are complex and multifactorial.

There are basically two ways forward. Option A is to use
some kind of proxy measure, recognising its limitations but taking the view
that it is good enough for purpose. Option B involves trying to measure the
complex multifactorial construct in all its richness.

There are a number of factors that influence choice of
approach. Because everyone recognises that things are complex, Option A is
unlikely to be acceptable to the academic community. Simple measures are often
easy to game. On the other hand, the complex multifactorial measures of Option
B can be debated endlessly, often involve elements of subjective judgement, are
not immune to gaming, can be extremely expensive to administer, and can be hard
to integrate into a single ranking.

As James Wilsdon has noted with regard to the REF, before deciding which system of
measurement to use, we have to have a clear idea of what we are trying to
achieve. As far as the REF goes, its
purpose has changed and mutated over the years. It started out with a pretty
simple goal: to find a formula to determine allocation of quality-related (QR)
funding from central government to universities. However, as Wilsdon notes, it
has subsequently been used for four additional purposes: to demonstrate
accountability, to provide a measure of reputation, to influence research culture,
and as a tool within universities for managing academics. He notes that: “If
all we want from the REF is a QR allocation tool, then we can certainly do that
in an algorithmic, metric-based way” (i.e. Option A). But he argues the REF needs
to fulfil the other functions too, and, as was amply demonstrated in his report
The Metric Tide, for
those other purposes, a simple metrics-based system is inadequate.

I agree with much of what Wilsdon says, but I think we could
save ourselves a lot of trouble by reverting to the original purpose of the
REF, i.e. treat it purely as a mechanism for allocating funding. As I have
argued previously, if that is all you want to do, then you don’t even need to
bother with metrics of the kind discussed in his report. A simple measure
of the number of active researchers present in a department gives a
remarkably high correlation with the amount of QR funding received – and this
works well for most subjects in arts and humanities as well as sciences.

But what about gaming? When I proposed this idea a couple of
years ago, people said, wouldn’t universities just designate the departmental
cleaner as an active researcher, or take on more research staff? I don’t see
these problems as insuperable. It would be important to specify stringent criteria
for research staff to meet: these would include terms of employment (casual
staff would be excluded), as well as evidence of research activity. If one counted only those staff who had been employed at the
institution for some minimum period, such as 3-4 years, this should prevent institutions catapulting in overseas researchers on Mickey
Mouse contracts, or taking on short-term staff to give a temporary blip in
researcher numbers.
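
To make the counting rule concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a worked-out scheme: it assumes QR money would be shared in simple proportion to eligible headcount (a simplification that is not specified above), it takes 3 years as the minimum employment period, and the names `StaffMember`, `counts_as_researcher` and `allocate_qr`, along with the staff records and budget, are invented for the example.

```python
from dataclasses import dataclass

# Assumption: the lower end of the "3-4 years" minimum period suggested above.
MIN_YEARS_EMPLOYED = 3


@dataclass
class StaffMember:
    institution: str
    years_employed: float
    casual_contract: bool
    research_active: bool  # hypothetical stand-in for "evidence of research activity"


def counts_as_researcher(person: StaffMember) -> bool:
    """Apply the three criteria: not casual, research-active, employed long enough."""
    return (
        not person.casual_contract
        and person.research_active
        and person.years_employed >= MIN_YEARS_EMPLOYED
    )


def allocate_qr(staff: list[StaffMember], total_budget: float) -> dict[str, float]:
    """Share total_budget across institutions in proportion to eligible headcount."""
    counts: dict[str, int] = {}
    for person in staff:
        counts.setdefault(person.institution, 0)
        if counts_as_researcher(person):
            counts[person.institution] += 1
    eligible_total = sum(counts.values())
    if eligible_total == 0:
        return {institution: 0.0 for institution in counts}
    return {
        institution: total_budget * n / eligible_total
        for institution, n in counts.items()
    }


# Made-up records, purely for illustration.
staff = [
    StaffMember("A", 5, casual_contract=False, research_active=True),
    StaffMember("A", 1, casual_contract=False, research_active=True),   # recent hire
    StaffMember("A", 10, casual_contract=True, research_active=True),   # casual staff
    StaffMember("B", 4, casual_contract=False, research_active=True),
    StaffMember("B", 6, casual_contract=False, research_active=True),
    StaffMember("B", 8, casual_contract=False, research_active=False),  # not research-active
]
shares = allocate_qr(staff, total_budget=300.0)
```

In this toy case institution A has three staff on its books but only one who meets all the criteria, so recruiting someone shortly before the evaluation, or listing casual staff, does nothing to increase its share.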

A more serious objection to my proposal is that there is no
explicit measure of research quality – an institution could take on a large number
of weak researchers and look as good as a competitor with an equal number of
excellent researchers. But would this happen? Remember, researchers would need
to be on the institutional payroll for a period of 3-4 years prior to the
evaluation, so the institution would need to commit to the expense of employing
them. This would not be worthwhile if staff then failed to meet the criteria set
for research-active staff. Academics who did not count as active researchers would end up being a net cost to the institution.

I’m not saying that it would be easy to fine-tune such a
system to avoid gaming or unintended consequences, just that it could be done, and I suspect would be much less difficult than devising an entirely separate system for evaluating research quality.

My case falls apart if, like Wilsdon (and many other people who
have been involved in REF), you think REF should fulfil additional purposes.
Then, because no one measure is suitable for all purposes, you need something
much more complicated. But I do agree with Wilsdon that, if that’s what you
want, you need to be clear about it – and about the need for a diverse set of
measures appropriate to different goals.

What about TEF? Well, when you dig beneath the surface, you
find that the parallels between REF and TEF are purely superficial. The purpose
of TEF is not to allocate funding – there is no funding to allocate. The stated
purposes are as complex and multifactorial as the notion of teaching excellence
itself: to help students select courses, to increase access of
under-represented groups to higher education, to provide a basis for allowing
universities to raise fees, and to provide criteria for ‘new entrants’ (i.e.
private institutions) that wish to enter the higher education market. According
to a recent BIS
Select Committee report, it’s also intended to provide incentives: “to ensure that higher education institutions
meet student expectations and improve on their leading international position.”
Quite what it means to improve on a leading international position is not specified.

In attempting to develop a measure that will cover all these
functions, those promoting the TEF have tied themselves in knots, as
illustrated by this wonderfully circular statement from the same Select Committee report:

In the absence of any
agreed definition or recognised measures of teaching quality, the Government is
proposing to use measures, or metrics, as proxies for teaching quality.
Therefore the challenge is to identify those metrics which most reliably and
accurately measure teaching quality, as opposed to other factors that
contribute to the results achieved by students.

This is worrying. The only positive thing one can say is
that there are signs that government may be starting to recognise some of the
problems. The Select Committee report cautions against rushing into a TEF,
and notes reservations both about the measures proposed and the proposed link
between TEF and fee-raising powers. The report concludes by encouraging academics
to work with BIS to develop appropriate metrics for TEF – the impression is
that government is aware that, if it gets it wrong, universities may just decide
not to play ball. One of the members of the Select Committee, Amanda
Milling, wrote in the Times Higher that “the higher education sector has a responsibility to engage with TEF to
make it work.”

But do we? I would argue that the responsibility lies with the Minister, to make a proper case for the TEF.

As the Select Committee report points out: “It is important to note the high quality of
teaching generally available in our higher education system at present… The
debate around teaching excellence should therefore be viewed within the context
of enhancing an already excellent system or, as the Minister for Universities
and Science put it, ‘to continue to make a great sector greater still’”.
These weasel words mean that if universities resist TEF, they can be accused of
complacency. But where’s the evidence that TEF will ‘make a great sector
greater still’? A considerable amount of time and money will be sucked up by
this exercise, which has multiple confused aims and has potential to tie up a
great sector in pointless bureaucracy and waffle. The whole idea is seriously
misconceived and has been rushed through without adequate justification or
cost-benefit analysis.

We are now being told that TEF will be introduced by degrees, with measures being developed over time, but I am not reassured. If the government wants academics on side, it needs to demonstrate more coherent arguments, with clear specification of the goals of the TEF, and evidence of validity of the measures it proposes to achieve those goals. And most of all, it needs to show us that more good than harm will result from this exercise.