Recently I had a lot of conversations about evidence. First, one of the periodic retreats of Oxfam senior managers reviewed our work on livelihoods, humanitarian partnership and gender rights. The talk combined some quantitative work (for example the findings of our new ‘effectiveness reviews’), case studies, and the accumulated wisdom of our big cheeses. But the tacit hierarchy of these different kinds of knowledge worried me – anything with a number attached had a privileged position, however partial the number or questionable the process for arriving at it. In contrast, decades of experience were not even credited as ‘evidence’, but often written off as ‘opinion’. It felt like we were in danger of discounting our richest source of insight – gut feeling.

In this state of discomfort, I went off for lunch with Lant Pritchett (he seems to have forgiven me for my screw-up of a couple of years ago). He’s a brilliant and original thinker and speaker on any number of development issues, but I was most struck by the vehemence of his critique of the RCT randomistas and the quest for experimental certainty. Don’t get me (or him) wrong, he thinks the results agenda is crucial in ‘moving from an input orientation to a performance orientation’ and set out his views as long ago as 2002 in a paper called ‘It pays to be ignorant’, but he sees the current emphasis on RCTs as an example of the failings of ‘thin accountability’ compared to the thick version.

In a forthcoming paper (which I will definitely link to when it’s published), Lant defines thick accountability as ‘an “account” in the sense of a justificatory narrative of my actions, the story of my actions I tell to those whose opinion of me is important (including myself, but including family and kinsmen, friends, co-workers, co-religionists, people I respect and desire admiration from) that explains why my actions are in accord with, and deserving of, a positive view of myself. In contrast, thin accountability is “accounting”, which is that small part of the account about which objective facts can be established.’ He sketched out the inevitable 2×2 matrix for me.

Top left: thin accountability, low performance (e.g. fragile states)

Top right: thin accountability, high performance (e.g. post office and road-building)

Bottom left: thick accountability, low performance (e.g. families and other non-performance oriented institutions)

Bottom right: thick accountability, high performance (e.g. just about any complex institutional ecosystem)

The challenge in most development work is to move from top left to bottom right. There are occasions when thin accountability/high performance works – typically routine functions like delivering mail or building roads. But anything involving the messiness of people and institutions requires thick accountability, involving deep bonds of trust and reciprocal relationships that are likely to be defined by a setting’s unique history and geography – what he calls ‘folk practices, from which formal organizations can (re)emerge’.

He argues that the randomistas just don’t get this. His critique of RCT culture ranged pretty wide:

The politics of RCTs: ‘RCTs are a tool to cut funding, not to increase learning.’ ‘Randomization is a weapon of the weak’ – a sign of how politically vulnerable the argument for aid has become since the end of the Cold War. ‘Henry Kissinger wouldn’t have demanded an RCT before approving aid to some country.’ And I can’t see the military running RCTs to assess the value for money of new weaponry before asking for more cash (mind you, if they did, that might at least save some money on Trident….).

The lack of interest in theory: ‘the randomistas are going back to alchemy – atheoretic experimentation’.

RCTs test at most a few project variants using ‘project vs non-project’, whereas interventions are typically multiple, overlapping and synergistic (i.e. the whole cannot be reduced to a sum of parts).

No-one evaluates the evaluators. At the very least, given how much RCTs cost, you need to know that the findings are useful elsewhere (so-called ‘external validity’). But once you have multiple RCTs on the same issue (and their spread is starting to produce such comparable studies), you find very little external validity – the results of an RCT in one country and time are not replicated elsewhere (with the possible exception of deworming in schools, but even that iconic RCT story is contested). This is the big contrast with real science, where replicability is a key condition of validity.

In another forthcoming paper, he argues instead for ‘structured experiential learning’, which involves rigorous and intelligent conversation, rather than the illusory certainty of numbers. Get people in a room, agree what the problem is, agree to try out some experiments to solve the problem, and set up rapid feedback to identify failure and/or build on success. In another recent paper, he calls this ‘Problem Driven Iterative Adaptation (PDIA)’. It sounds very similar to the conclusions of the Africa Power and Politics Programme, which I reviewed recently. In yet another paper (he’s horribly prolific), he also draws a neat distinction between experiments and experimentation:

‘Perhaps surprisingly, the experimentation and experiments approaches are not at all the same. I argue that experiments, while a terrific method for generating PhD dissertations and published papers, will have impact on development and development practice only insofar as they are embedded in an experimentation approach (which they are often not).’

The feeling I got from these conversations was of two tribes encamped and preparing for battle. That line from Henry V comes to mind: ‘from camp to camp, through the foul womb of Boston night, the hum of either army stilly sounds.’ On one side are the ‘best fit’ institutionalists and complexity people, with their focus on path dependence, evolution and trial and error. On the other are the ‘universal law’ experimentalists, offering the illusory certainty of numbers, and (crucially) comfort to the political paymasters seeking to prove to sceptical publics that aid works. It’s hard to see how they can both be right, or happily coexist for long. Time for a wonkwar on this blog, I think…..