Evaluating uncertainty

How do you assess an intervention where the outcomes are long-term and hard to predict, where there is little ‘hard’ or definitive ‘objective’ data on which to judge progress in the meantime, and where the relationship between effort and results is non-linear and the programme’s influence is at best indirect and small (but hopefully important nonetheless)? In short, how do you evaluate uncertainty?

A lot has been, and is being, written about applying problem-driven iterative adaptation (PDIA) approaches to programmes, particularly the sort I characterise above. I won’t talk here about the (significant) implications for the design of monitoring and evaluation activities more generally. Instead, I want to share some thoughts on a particular method that strikes me as highly relevant in such settings: influence maps.

The context is the use of influence maps in some recent work with IOD PARC – a great team comprising Dr Stuart Astill, Enrique Wedgewood-Young and Dr Sheelagh O’Reilly. We’ve been evaluating a DFID initiative called the South Asia Water Governance Programme (SAWGP), which supports knowledge sharing and deliberative dialogue to promote transboundary cooperation between the seven countries covering the Indus, Ganges and Brahmaputra river basins. Without going into the details, you will have a sense of the evaluation challenges.

Influence maps are an interesting and insightful tool for compiling and analysing actors’ beliefs about what’s happening and what’s important, using Bayesian techniques of inference (hence the more general term “Bayesian belief networks”).

Bayesian networks have been around for some time, in medicine, engineering, environmental studies and, of course, IT. I first got involved with them some 16 years ago, working with Bob Burn, a statistician from the University of Reading. We were exploring approaches for evaluating long-term research into natural resource management. But they also appear to be an idea that has somewhat come of age, judging by recent articles in the journal “Evaluation” (e.g. this, and this).

But to my knowledge, they have not yet been widely applied as a mainstream monitoring and evaluation tool in international development and, for that reason, DFID deserves credit for allowing IOD PARC to apply the technique as a central element in the SAWGP evaluation – genuinely innovative.

I’ll try to outline the general approach, and why influence maps are potentially so useful in contexts characterised by high degrees of uncertainty. For those interested in learning more, I have written a slightly longer paper that provides more explanation – available on the IOD PARC website.

The first step is to elaborate and ‘map’ the programme theory of change. In the most uncertain settings, there may be little in the way of formal, explicit ‘theory’, and the job is to unearth and structure (in a causal diagram) the actors’ mental models of how change should occur. This is standard practice for evaluators familiar with theory-based approaches. However, one tip: ignore the programme (initially at least) and instead focus on the important factors that matter in the operating environment and the causal relationships between them. Once you have those down, you can then add where the programme intervenes.

The process of drawing out and elaborating key changes to the status quo that need to happen, ordering them causally and thinking about the role of the programme is an incredibly valuable exercise in itself – in building a shared understanding but also in clarifying differences in view.

So, now you have a ‘map’ that comprises a series of boxes (or nodes) setting out the key changes required, ordered and linked by arrows to convey the (purported) causal relationships. So far, so familiar.
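For readers who like to see things concretely, here is a minimal sketch of how such a map might be written down and checked. The node names are entirely hypothetical (they are not the SAWGP map), and the use of Python’s standard graphlib module is my own choice for illustration. Each node simply lists the nodes whose arrows point into it.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical influence map: node -> list of nodes with arrows pointing into it.
PARENTS = {
    "dialogue_events": [],        # where the programme intervenes (assumed)
    "political_climate": [],      # context, outside programme control (assumed)
    "trust_among_officials": ["dialogue_events", "political_climate"],
    "shared_knowledge_base": ["dialogue_events"],
    "cooperation": ["trust_among_officials", "shared_knowledge_base"],
}

def causal_order(parents):
    """Return the nodes ordered so that every cause precedes its effects.

    Raises ValueError if the arrows form a loop, which a Bayesian
    network cannot represent.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"map contains a loop: {exc.args[1]}") from exc

print(causal_order(PARENTS))
```

Starting with the context nodes and adding the programme node afterwards mirrors the tip above: the structure of the operating environment comes first, and the intervention is then attached to it.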

The next step is to define the conditional probabilities for each of the nodes identified in the map. This process uses a Bayesian way of thinking about probabilities – essentially, what are the chances of this node happening if other things have or haven’t happened (see box). When outlined on paper, the approach can seem clunky and artificial, but in practice it is quite intuitive to do (and fun!) once people get their heads around a Bayesian way of thinking.
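As a rough illustration of what gets elicited, the sketch below shows a conditional probability table for a single node with two parents, and how it combines with beliefs about the parents to give an overall chance of the node occurring. All names and numbers are invented for the example, and it assumes the two parent nodes are independent of each other.

```python
from itertools import product

# Hypothetical beliefs about the chance that each parent node occurs.
P_TRUST = 0.4
P_KNOWLEDGE = 0.7

# Hypothetical elicited answers to: "what are the chances of cooperation
# if trust has / hasn't occurred and knowledge has / hasn't occurred?"
CPT_COOPERATION = {
    (True, True): 0.80,
    (True, False): 0.50,
    (False, True): 0.30,
    (False, False): 0.05,
}

def marginal(cpt, p_trust, p_knowledge):
    """Overall chance of the node, weighting each row of the table by
    the chance of that combination of parent outcomes (parents assumed
    independent)."""
    total = 0.0
    for trust, knowledge in product([True, False], repeat=2):
        weight = (p_trust if trust else 1 - p_trust) * (
            p_knowledge if knowledge else 1 - p_knowledge
        )
        total += weight * cpt[(trust, knowledge)]
    return total

p_cooperation = marginal(CPT_COOPERATION, P_TRUST, P_KNOWLEDGE)
print(f"Chance of cooperation: {p_cooperation:.3f}")
```

With these made-up numbers the four rows contribute 0.224, 0.060, 0.126 and 0.009, giving an overall chance of 0.419.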

Once you have compiled the model, you can manipulate it easily to analyse in surprising richness what actors’ beliefs reveal about causality and what’s important for success. You can explore “what if” questions, by examining different outcomes for different nodes, and trace the effect on overall chances of success. You can look at what the model says about the influence of individual nodes (including the programme’s interventions) and simulate ‘ex post’ outcomes to see what the most likely combination of explanatory factors is. You can compare the influence of different factors with the allocation of programme resources/effort, to identify avenues to explore from a value for money perspective…
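To give a flavour of the “what if” and ‘ex post’ analysis described above, here is a small sketch on a hypothetical four-node model. The structure, names and probabilities are all invented, and the inference is done by brute-force enumeration of every combination of outcomes, which only works for tiny models (dedicated Bayesian network software does this far more efficiently).

```python
from itertools import product

# Hypothetical model: node -> (parents, P(node occurs | parent outcomes)).
MODEL = {
    "dialogue": ((), {(): 0.9}),   # programme intervention (assumed)
    "climate": ((), {(): 0.5}),    # favourable political climate (assumed)
    "trust": (("dialogue", "climate"), {
        (True, True): 0.8, (True, False): 0.4,
        (False, True): 0.3, (False, False): 0.1,
    }),
    "cooperation": (("trust",), {(True,): 0.6, (False,): 0.1}),
}
NODES = list(MODEL)

def joint(state):
    """Probability of one complete combination of node outcomes."""
    p = 1.0
    for node, (parents, table) in MODEL.items():
        p_occurs = table[tuple(state[q] for q in parents)]
        p *= p_occurs if state[node] else 1 - p_occurs
    return p

def probability(target, evidence=None):
    """P(target occurs | evidence), summing over all combinations
    of outcomes that are consistent with the evidence."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(state)
        denominator += p
        if state[target]:
            numerator += p
    return numerator / denominator

baseline = probability("cooperation")
without_programme = probability("cooperation", {"dialogue": False})
climate_given_success = probability("climate", {"cooperation": True})
print(f"{baseline:.3f} {without_programme:.3f} {climate_given_success:.3f}")
```

With these invented beliefs, the baseline chance of cooperation is 0.38, falling to 0.20 if the dialogue events do not happen (“what if”). Looking backwards from an observed success (‘ex post’), the chance that the political climate was favourable rises from the prior of 0.5 to 0.625.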

The options are extensive but, of course, all the results will typically reflect the subjective beliefs of the actors involved. Even if you conduct the exercise with more than one person and get broadly similar responses, it doesn’t make the model “right” in conventional, objective terms – all actors may be subject to the same levels of ignorance, group-think may be rife… But in the case of programmes where decision-making is fundamentally based on managers’ subjective interpretation and judgement, is understanding and systematically analysing those beliefs of any less interest to an evaluator? And is it really sensible to dismiss the results obtained as somehow ‘not valid’ because of their subjective underpinnings?

The outline above assumes a one-time exercise. Even a single snapshot can provide significant insight. However, it gets even more interesting if the exercise is repeated. We have been lucky to do this in the SAWGP evaluation – with the first exercise in 2016, a repeat exercise in 2017 and a third and final exercise scheduled for later this year. (A big thanks is due here to the participating SAWGP implementing partners for their generous contribution of time and assistance in the work.)

The addition of a longitudinal dimension to the analysis, examining what has changed, why, and how that affects actors’ beliefs about the future, opens the door to even more rigorous assessment, allowing the evaluator to link specific, observed changes in the context to changes in actors’ probabilistic assessments.
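A simple sketch of the kind of comparison this allows is given below, using a deliberately tiny two-node chain (trust leading to cooperation). The years match the first two rounds mentioned above, but the probabilities are invented purely for illustration and are not results from the SAWGP evaluation.

```python
# Hypothetical beliefs elicited in two rounds for a chain: trust -> cooperation.
ROUNDS = {
    2016: {"p_trust": 0.40, "p_coop_if_trust": 0.60, "p_coop_if_no_trust": 0.10},
    2017: {"p_trust": 0.55, "p_coop_if_trust": 0.60, "p_coop_if_no_trust": 0.15},
}

def p_cooperation(beliefs):
    """Overall chance of cooperation implied by one round of beliefs."""
    p_trust = beliefs["p_trust"]
    return (p_trust * beliefs["p_coop_if_trust"]
            + (1 - p_trust) * beliefs["p_coop_if_no_trust"])

def compare(before, after):
    """Return the change in the chance of cooperation and the elicited
    beliefs that shifted between the two rounds."""
    shifts = {
        key: round(after[key] - before[key], 10)
        for key in before
        if after[key] != before[key]
    }
    return p_cooperation(after) - p_cooperation(before), shifts

change, shifts = compare(ROUNDS[2016], ROUNDS[2017])
print(f"Change in chance of cooperation: {change:+.4f}")
print("Beliefs that shifted:", shifts)
```

The list of shifted beliefs is the prompt for the qualitative follow-up: what did actors observe in the context that led them to revise those particular judgements?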

Of course, as with any ‘new’ approach, there are what we might politely term “areas for further research”. These include: ways of eliciting conditional probabilities that minimise bias; ways (and the value) of estimating distributions around central estimates; how to aggregate/combine multiple actors’ views; how to define, and ensure consistent interpretation of, what ‘occurrence’ and ‘non-occurrence’ of a node mean; and understanding the sensitivity of results to model design, in particular the level of detail.
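On the aggregation question, one simple option is a weighted average of the actors’ estimates, sometimes called a linear opinion pool. The sketch below is only that: an illustration of one possible approach, with invented actors and numbers, and not a claim about how views were or should be combined in the SAWGP work.

```python
# Hypothetical estimates from three actors for the same table entry,
# e.g. the chance of cooperation if trust has been established.
ELICITED = {"actor_a": 0.6, "actor_b": 0.4, "actor_c": 0.8}

def linear_pool(estimates, weights=None):
    """Weighted average of probability estimates.

    With no weights given, every actor counts equally.
    """
    if weights is None:
        weights = {name: 1.0 for name in estimates}
    total_weight = sum(weights[name] for name in estimates)
    return sum(weights[name] * p for name, p in estimates.items()) / total_weight

def spread(estimates):
    """Gap between the most and least optimistic actor: a crude flag
    for entries where views diverge and merit discussion."""
    return max(estimates.values()) - min(estimates.values())

pooled = linear_pool(ELICITED)
print(f"Pooled estimate: {pooled:.2f}, spread: {spread(ELICITED):.2f}")
```

Averaging hides disagreement, which is why the sketch reports the spread alongside the pooled figure: large gaps between actors are often more informative than the average itself.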

And given that influence maps are fundamentally one-directional models (the arrows cannot circle back on themselves), they cannot accommodate multidirectional relationships or feedback loops, and as such are not the ‘answer’ to evaluation complexity. But for me they are a valuable tool. I aim to continue exploring and developing the approach in my work and will keep sharing as my learning proceeds.

If you are interested, do get in touch – either with questions or your own experiences/perspectives. I strongly believe the technique has potential for wider application in more mainstream settings – as a tool to augment conventional contribution analysis. With this in mind, I hope to write a piece soon explaining the approach in a bit more detail. Watch this space…