monitoring

i have been thinking a lot about ‘theories of change’ this week (just did some presenting on them here!). actually, i have been thinking more about ‘conceptual models,’ which was the term by which i was first introduced to the general idea (via vic strecher in conceptual models 101) and the term i still prefer because it implies more uncertainty and greater scope for tinkering than does ‘theory.’ (i accept that ‘theory of change’ has been branded and that i have to live with it, but i don’t have to like it. when people start calling them “tocks,” it’ll be a really, really bad day. i can deal with the acronym “ToCs,” but please, world, don’t pronounce it “tocks” or switch to writing “tox” or something else dreadful.)

regardless of the term, the approach of thinking seriously about how behavioral, social and economic change will happen is really important — and often overlooked during the planning stages of both projects/programs/policies and evaluations. (too often, the intricacies of how change actually happened (or didn’t) are left to academic speculation in the discussion section of an evaluation paper — and certainly not informed by talking systematically to those people who were intended to benefit from the program.)

i think there is growing recognition that building a theory of change is something that should happen, at least in part, backwards (among other places where this is discussed is in ‘evidence-based policy’ with the idea of a ‘pre-mortem’ and ‘thinking step-by-step and thinking backwards’). that is, you start with the end goal (usually some variant of ‘peace,’ ‘satisfaction,’ ‘wellbeing,’ ‘capabilities,’* etc) in mind and work backwards as to how you are going to get there. actually, it’s a bit more like the transcontinental railroad, where you start from both ends (where you are and where you want to get) and build backwards and forwards until the ideas meet in the middle and you have a sense of what needs to be done and what assumptions underlie one step translating to the next.

in teaching us about not only conceptual models but grant writing, vic used the analogy of an island. the island was where you wanted to get — the state of the world as things would be once your intervention was rolled out, fully operational and lasting change effected. it wasn’t enough to just say that people would have more money or would be healthier. you had to describe how the state of the world would look, feel, and operate. how would someone’s day look in the new state of the world? what would be different about the way they undertook their daily activities, or indeed what their daily activities would be? then, once you had the new state of the world/island in mind, you could make sense of where you were currently (through one of those ex ante ‘needs assessment’ things i so rarely hear about in planning development projects or building theories of change) and what needed to be done to build a bridge from where you are to the island.

some of this work in understanding where people are and where ‘they,’ and therefore ‘we,’ want to get is meant to be generated through the nebulous terms “stakeholder engagement” and “formative work.” i think we discuss much less how formative engagement and stakeholder work (probably not a great sign of specificity that all the words can be mixed up so easily) actually translates into a robust theory of change. in this regard, i have learnt quite a bit from product and engineering books like the inmates are running the asylum. these are books about product and service design and the ‘user experience’ — far-out concepts we probably (almost certainly) don’t spend enough time thinking about in ‘development’ and something that would probably really benefit our theories of change in being detailed and ‘best-fitted’ to a particular situation… not to mention, you know, benefit the beneficiaries.

one of the tools i like best is what is, effectively, imaginary prospective users — in cooper’s terminology, ‘personas.’ here’s the idea, as i see it translating to development and theories of change. we know stakeholders are important, but they cannot (realistically or effectively) all be in the same room, at the same table, at the same time. nor can they all be called up each time we make a small tweak in program design or the underlying assumptions. and it is likely the intended beneficiaries who are hardest to call up and most likely not to be at the table. but we can use personas to bring them to the table, so that what happened in ‘the field’ most certainly does not stay there.

let’s say that for a given project and evaluation, widowed women are a key sub-group of interest.

forget widowed women.

start thinking about “mary.”

mary is a widowed woman.

her husband had been a carpenter and died of c cause. she lives in x place while her n children live in z other places and provide her with s amount of support. mary can be a composite of widowed women you did meet in the field during deep, household-level needs assessment and formative in-depth interviews with intended beneficiaries. that’s how you might have a picture of mary and know that she lives in h type of house, with e regular access to electricity, and has g goats and l other livestock. it’s how you know she’s illiterate and has a mobile phone onto which she never adds credit. it’s how you know what time she wakes up, what her morning chores are, who she talks to, when and whether she has time to go to the market, how she gets her information, what aspects of her environment will enable change and which will hinder it, and so on.
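if it helps the design team to make the persona concrete, mary’s composite attributes can even be written down as a structured record that everyone can check proposed features against. the sketch below is python and entirely hypothetical — every field name and value is an illustrative stand-in for what real formative interviews would fill in:

```python
from dataclasses import dataclass, field

# hypothetical persona record; fields and values are illustrative,
# composited from (imagined) formative interviews in the field
@dataclass
class Persona:
    name: str
    literate: bool
    has_phone: bool
    keeps_phone_credit: bool
    electricity_access: str            # e.g. "none", "intermittent"
    livestock: dict = field(default_factory=dict)
    daily_schedule: list = field(default_factory=list)

mary = Persona(
    name="mary",
    literate=False,                    # from the in-depth interviews
    has_phone=True,
    keeps_phone_credit=False,          # never adds credit
    electricity_access="intermittent",
    livestock={"goats": 2},            # "g goats" -- count is invented
    daily_schedule=["fetch water", "tend goats", "market (tuesdays)"],
)

# a design question the team can now ask on mary's behalf at the table
def can_receive_sms_reminders(p: Persona) -> bool:
    return p.has_phone and p.literate and p.keeps_phone_credit

print(can_receive_sms_reminders(mary))  # False: mary is illiterate
```

the point is not the code but the discipline: once mary’s constraints are written down explicitly, any proposed program feature can be checked against them before anyone goes back to the field.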

so, all potential beneficiaries can’t be at the table but personas of key subgroups and heterogeneities of interest can be. if everyone in the room for the design (intervention and evaluation) process is introduced to the personas, then they can speak up for mary. she still gets a voice and the ability to ask, ‘what’s in all this for me?’ will she be able to deal with an extra goat if she gets one as part of a livestock program? does she have the means of transport to collect cash as part of a transfer program? is her neighborhood safe for walking so she can follow up on the health information you provide? is mary going to give a hoot about the sanitation information you provide her?

mary’s obstacles need to be dealt with in your program design, and the places where mary might have trouble engaging with the program need to be put into your theory of change and monitored as part of your M&E (& e) plan. will mary help you think of everything? no, of course not — she’s good but she’s not that good. but it’ll probably be nearer to something that can actually work (and don’t forget that street-level workers, other implementers and high-level stakeholders should have personas too!).

please invite mary to the table when you’re designing your intervention and constructing your theory of change. it doesn’t replace the need for actual monitoring and actually asking for beneficiary, implementer and stakeholder feedback.

but have mary describe to you how her life will be different (better!) with your program in place, how the actual structure of her day and decision-making have changed now that she’s on the aforementioned goal island. you’ll be a little closer to making it so.

this post is massively indebted to danielle giuseffi, who introduced me to some of the books above and with whom i have discussed building models more than anyone else! still one of my favorite business-partners-in-waiting, d-funk, and i still like our behavioral bridge.

*yes, i know that ‘capabilities’ were initially from amartya sen and that i should have linked to this. but for planning approaches, i find the 10 laid out by nussbaum more accessible.

i have been saying for some time that my next moves will be into monitoring and vital registration (more specifically, a “poor richard” start-up to help countries measure the certainties of life: (birth), death, and taxes). if village pastors could get it done with ink and scroll in the 16th c across northern Europe, why aren’t we progressing with technology?! surely this is a potentially solid application of the capacity of mobile phones as data collection and transmission devices?

i stumbled onto a slightly different idea today: building backwards from well-financed evaluation set-ups for specific projects to more generalized monitoring systems. this would be in contrast to the more typical approaches of skipping monitoring altogether, or of first building monitoring systems (including of comparison groups) and then, once monitoring is adequately done, following at some point with an (impact) evaluation.

why don’t more evaluations have mandates to leave behind data collection and monitoring systems ‘of lasting value,’ following on from an impact or other extensive, academic (or outsider)-led evaluation? in this way, we might also build from evaluation to learning to monitoring. several (impact) evaluation organisations are being asked to help set up m&e systems for organizations and, in some cases, governments. moreover, many donors talk about mandates for evaluators to leave behind built-up capacity for research as part of the conditions of their grant. but maybe it is time to start talking about mandates to leave behind m&e (and MeE) systems — infrastructure, plans, etc.

a potentially instructive lesson (in principle if not always in practice) comes from ‘diagonal’ health interventions, in which funded vertical health programs (e.g. disease-specific programs, such as an HIV-treatment initiative) are required to also engage in overall health systems strengthening (e.g.).

still a nascent idea, but one i think is worth more people than just me thinking about: how organisations that have developed (rightly or not) reputations for collecting and entering high-quality data for impact evaluations could build monitoring systems backwards, as part of what is left behind after an experiment.

without too much detail, i’ll just note that i spent more time in the hospital in undergrad than i would have preferred. oftentimes, i, being highly unintelligent, would wait until things got really bad and then finally decide one night that it was time to visit the ER – uncomfortable but not non-functional or incoherent. on at least one occasion – and, because she’s wonderful, i suspect more – alannah (aka mal-bug, malice, malinnus) took me there and would do her homework, sometimes reading out loud to me to keep me entertained and distracted. in one such instance, she was studying some communications theories, one of which was called or nicknamed the onion theory of two-way communication. the basic gist is that revealing information in a conversation should be a reciprocal unpeeling: i share something, shedding a layer of social divide, then you do, and we both feel reasonably comfortable.

it didn’t take too long to connect that this was the opposite of how my interaction with the doctor was about to go. the doctor would, at best, reveal her name and i would be told to undress in order to be examined, poked and prodded. onion theory, massively violated.

i mention all this because i have just been reading about assorted electronic data collection techniques, namely here, via here. first, i have learned a new word: ‘paradata.’ this seems useful. these are monitoring and administrative data that go beyond how many interviews have been completed; rather, they focus on the process of collecting data. they can include the time it takes to administer the questionnaire, how long it takes a surveyor to locate a respondent, and details about the survey environment and the interaction itself (i’d be particularly interested in hearing how anyone actually utilizes this last piece of data in analyzing the survey data itself. e.g. would you give less analytic weight to an interview marked ‘distracted,’ ‘uncooperative’ or ‘blatantly lying’?).
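on that parenthetical question about analytic weight, one hypothetical mechanism would be to treat paradata flags as multiplicative penalties on an interview’s weight. the flags and the penalty values below are invented for illustration — this is a sketch of the idea, not a standard weighting scheme:

```python
# hypothetical penalty per paradata flag (values are invented):
# an unflagged interview keeps a base weight of 1.0
FLAG_WEIGHTS = {
    "distracted": 0.7,
    "uncooperative": 0.5,
    "blatantly lying": 0.0,   # effectively dropped from analysis
}

def analytic_weight(flags):
    """multiply the base weight of 1.0 by the penalty for each flag."""
    w = 1.0
    for f in flags:
        w *= FLAG_WEIGHTS.get(f, 1.0)  # unknown flags carry no penalty
    return w

interviews = [
    {"id": 1, "flags": []},
    {"id": 2, "flags": ["distracted"]},
    {"id": 3, "flags": ["distracted", "uncooperative"]},
]

for iv in interviews:
    print(iv["id"], analytic_weight(iv["flags"]))
```

the weights could then feed a weighted estimator, or simply flag interviews for re-contact; either way, the paradata stops sitting unused in an administrative file.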

the proposed process of monitoring and adjustment bears a striking resemblance to other discussions (e.g. pritchett, samji and hammer) about the importance of collecting and using monitoring data to make mid-course corrections in research and project implementation. it does feel like there is a certain thematic convergence underway towards giving monitoring data its due. in the case of surveying, it feels like there is a slight shift towards the qualitative paradigm, where concurrent data collection, entry and analysis and iterative adjustment are the norm. not a big shift, but a bit.

but on the actual computer bit, i am less keen. a survey interview is a conversation. a structured conversation, yes. potentially an awkward conversation and almost certainly one that violates the onion theory of communication. but even doctors – some of the ultimate violators – complain about the distance created between themselves and a patient by having a computer between them during an examination (interview), as is now often required to track details for pay-for-performance schemes (e.g.). so, while i appreciate and support the innovations of responsive survey design and recognize the benefits of speed and aggregation over collecting the same data manually, i do wish we could also move towards a mechanism that doesn’t have the surveyor behind a screen (certainly a tablet would seem preferable to a laptop). could entering data rely on voice more than keying in answers to achieve the same result? are there other alternatives to at least maintain some semblance of a conversation? are there other possibilities to both allow the flexibility of updating a questionnaire or survey design while also re-humanizing ‘questionnaire administration’ as a conversation?

i spent the beginning of the week in brighton at the ‘big push forward’ conference on the politics of evidence (#evpolitics), which mixed the need for venting and catharsis (about the “results agenda” and “results-based management” and “impact evaluation”) with some productive conversation, though with no immediate concreteness on how the evidence from the conference would itself be used.

in the meantime, i offer below some of my take-aways from the conference – based on some great back-and-forths with some great folks (thanks!).

for me, the two most useful catchphrases were trying to get to “relevant rigor” (being relevantly rigorous and rigorously relevant) and to pay attention to both “glossy policy and dusty implementation.” lots of other turns-of-phrase and key terms were offered, not all of them – to my mind – terribly useful.

there was general agreement that evidence could be political in multiple dimensions. these included:

what questions are asked (and in skepticism of whose ideas they are directed), by whom, of whom, with whom in mind (who needs to be convinced), for whom – and why

the way questions are asked and how evidence is collected

how evidence is used and shared – by whom, where and why

how impact is attributed – to interventions or to organizations (and whether this fuels competitiveness for funds and recognition)

whether the originators of the idea (those who already ‘knew’ something was working in some way deemed insufficiently rigorous) or the folks who analyze evidence receive credit for the idea

questions and design. in terms of what evidence is collected, a big part of the ‘push back’ relates to what questions are asked and whether they help governments and organizations improve their practice. this requires getting input from many stakeholders on what questions are important to ask. in addition, it requires planning for how the evidence will be used, including what will be done if results are (a) null, (b) mixed, confused or inconclusive, or (c) negative. more generally, this requires recognizing that policy-makers aren’t making decisions about ‘average’ situations but rather decisions for specific situations. as such, impact evaluations and systematic reviews need to help them figure out what evidence applies to their situation. the sooner expectations are dispelled that an impact evaluation or a systematic review will provide a clear answer on what should be done next, the better.

my sense, which was certainly not consensus, is that to be useful and to avoid being blocked by egos, impact questions need to shift away from “does X work?” to “does X work better than Y?” and/or “how can X be made to work better?” this also highlights the importance of monitoring and of feeding information back into learning and decision-making (i.e.).

two more points on results for learning and decision-making. first, faced with the assertion that ‘impact evaluation doesn’t reveal *why* something works,’ it is unsatisfactory to say something along the lines of ‘we look for heterogeneous treatment effects.’ it absolutely also requires asking front-line workers and program recipients why they think something is or is not working — not as the final word on the matter but as a very important source of information. second, as has been pointed out in many places (e.g.), designing a good impact evaluation requires explication of a clear “Theory of Change” (still not my favorite term but apparently one that is here to stay). further, it is important to recognize that articulating a ToC (or LogFrame or use of any similar tool) should never be one person’s all-nighter for a funding proposal. rather, the tool is useful as a way of collectively building consensus around mission and why & how a certain idea is meant to work. as such, time and money need to be allocated for a ToC to be developed.

collection. as for the actual collection of data, there was a reasonable amount of conversation about whether a method is extractive or empowering, though probably not enough on how to shift towards empowerment, or on the fact that extractive/empowering are not synonymous with quant/qual. an issue that received less attention than it should have is that data collection needs to align with an understanding of how long a program should take to work (and funding cycles should be realigned accordingly).

use. again, the conversation on the use of evidence was not as robust as i had hoped. however, it was pointed out early on (by duncangreen) that organizations that have been commissioning systematic reviews in fact have no plan to use that evidence systematically. moreover, there was a reasonable amount of skepticism around whether such evidence would actually be used to make decisions to allocate resources to specific organizations or projects (for example, to kill or radically alter ineffective programs). rather, there is a sense that much impact evaluation is actually policy-based evidence-making, used to justify decisions already taken. alternatively, though, there was concern that the more such evidence is used to make specific funding decisions, the more organizations would be incentivized to make ‘sausage’ numbers that serve no one. thus, the learning, feedback and improvement aspects of data need emphasis.

empowerment in the use of data (as opposed to its collection) was not as much a part of the conversation as i would have hoped, though certainly people raised issues of how monitoring and evaluation data were fed back to and used by front-line workers, implementers, and ‘recipients.’ a few people stressed the importance of near-automated feedback mechanisms from monitoring data to generate ‘dashboards’ or other means of accessible data display, including alternatives to written reports.
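as a toy illustration of that near-automated feedback idea, monitoring records can be rolled up into a plain-text ‘dashboard’ a field office could read without waiting for a written report. everything below — the record fields, the sites, the numbers — is made up for the sketch:

```python
from collections import Counter

# hypothetical monitoring records as they might arrive from field sites
records = [
    {"site": "north", "visits_completed": 12, "stockouts": 1},
    {"site": "north", "visits_completed": 9, "stockouts": 0},
    {"site": "south", "visits_completed": 4, "stockouts": 3},
]

def dashboard(rows):
    """aggregate per-site totals into one readable line per site."""
    totals = Counter()
    for r in rows:
        totals[(r["site"], "visits")] += r["visits_completed"]
        totals[(r["site"], "stockouts")] += r["stockouts"]
    lines = []
    for site in sorted({r["site"] for r in rows}):
        lines.append(f"{site}: {totals[(site, 'visits')]} visits, "
                     f"{totals[(site, 'stockouts')]} stockouts")
    return "\n".join(lines)

print(dashboard(records))
```

in practice this would run on a schedule against whatever the monitoring system collects; the design choice that matters is that the output is immediately legible to the people doing the work, not only to an evaluator.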

a big concern on the use of evidence was ownership and transparency of data (and results), including how this leads to the duplication/multiplication of data collection. surprisingly, with regard to transparency of data and analysis, no one mentioned the recent reinhart & rogoff mess, nor anything about mechanisms for improving data accessibility (e.g.).

finally, there was a sense that data collected needs to be useful – that the pendulum has swung too far from a dearth of data about development programs and processes to an unused glut, such that the collection of evidence feels like ‘feeding the beast.’ again, this loops back to planning how data will be broadly used and useful before it is collected.