The program costs of impact evaluation

I was at a workshop last week where I was moderating a discussion of the practicalities of doing impact evaluations in conflict and post-conflict settings. One of the program-implementation folks made clear that working with the impact evaluation was a strain; as she put it, this "was pulling our field staff through a keyhole." That got me thinking about the costs that we, as impact evaluators, can impose on a program.

Now, before I get to the list, I should say that we know impact evaluation brings a lot of benefits. What also occurred to me as I worked through the list was that some of these costs can benefit the program, even if they are a slight distortion of resources. And, as a third consideration, it's important to keep in mind that different methods incur different costs. Matching techniques, for example, seem to skip a chunk of these costs, but they probably make up for it in the extra surveys you will have to do to get a sample you can match from. But let's get to the list:

1. A big potential cost is excess recruitment. This really binds when there is a set of program eligibility criteria. For a number of methods, we will need to collect data on these criteria for both the treatment and the control group before the program starts. Collecting the data on the control group is where the cost comes in, and the more work it takes to collect these data, the bigger this cost will be. One of the examples discussed at this workshop was a psychological therapy program. Participants are screened for eligibility (which obviously makes sense), but doing this screening for both the treatment and the control group was not trivial. Now, if you are doing a randomized phase-in, the bonus is that these people will all be pre-identified when phase 2 of treatment comes around. Other than that, though, this is a cost that comes with the evaluation.

2. Holding back the enthusiasm of program staff. In one program that I worked with, the field staff were so eager to get going that they started running out to communities before treatment and control were assigned. These guys were particularly gung-ho, but the general point is that when you are doing an RCT with assignment based on data that are still being gathered, there is likely to be some program delay while the data are entered so that the assignment to treatment and control can be made. The good news is that newer computer-assisted data collection tools skip the data entry step, so they should help reduce or eliminate this particular cost. But if your assignment data come from excess recruitment carried out through the project's own infrastructure, they are less likely to be collected electronically.

3. Collecting more data through the project's monitoring system. I've worked on evaluations where the fact that we were doing an impact evaluation added some variables to the monitoring indicators the project was tracking. These can range from more frequent or detailed measures of the people who enroll in the program to detailed attendance records (name, ID number, and present/absent for each session). Given that these focus on steps in the causal chain that underpins the program, this strikes me as sometimes being a less obvious "cost": knowing these things might have been useful for the program even in the absence of an impact evaluation. And the extra supervision, support, and scrutiny that the evaluation team brings to the monitoring system may, in some cases, improve the overall quality of the system.

4. Multiple arms. The logistical work to make sure that multiple treatment arms are rolled out to the right beneficiaries clearly imposes a cost on the program. And this cost can end up biting at a critical time: as a program is trying to get activities off the ground, the extra work of making sure that these people get variant A, those people get variant B, and the people over there get variant C may seriously strain the logistical capacity that is needed to make sure anything works at all.

5. Stretching out the program with sample size. Take the case of a program that was going to target everyone in a defined neighborhood. Now the evaluator rolls in and everyone agrees on the need for a control group. So the plan becomes to treat half of the people in the original neighborhood and half of those in the next one over. Clearly, logistical costs have gone up. Another variant of this comes from randomized phase-in. In one program I worked with, we were evaluating the program in two large districts a significant distance apart. The program staff treated half of the sub-districts in one catchment area, then half in the other. Then they had to go back to the first district to treat the control group there, and then on to the second district to do the same. Clearly, it would have been cheaper for them to cover each contiguous sub-district as they went. But in this case, we discussed the costs and agreed that the extra effort was worth it for the lessons we would get from the evaluation.

6. Distorting the program effort towards the component being evaluated. This issue also came up in the workshop, where one of the program management folks felt that a disproportionate amount of program supervision and other attention was going to the one (relatively small) component that was the subject of the evaluation, while other, larger interventions were getting less attention than they would have in the absence of the evaluation. This can come from the incentives program staff have to get good results or, as I have seen in practice, from a fascination with the evaluation process and the chance to learn. Either way, it represents a potential distortion (more than a straight cost) for the program and raises the question of what you are actually evaluating (I'll skip the Heisenberg references on this one).

These are some of the costs that are likely to arise for the programs we work with. Ideally, they will come up in discussions about the design of an evaluation, which helps avoid more difficult discussions later on. Further thoughts on other costs, as well as on mitigation measures, are most welcome. Finally, one other thought to ruminate on: given that participating in an impact evaluation is likely to impose some costs on a program, this is one driver of selection bias in what gets impact-evaluated…

Comments

My first thought: rolling out to different groups in conflict and post-conflict settings, even if explicitly randomized, raises the risk of widening gaps between the "have" and "have-not" groups and therefore exacerbating the conflict. Clear communication and public (participatory) randomization might help mitigate these risks.

Do you have any good examples of evaluations where the issues related to impact evaluation and conflict were carefully thought through and addressed?

Vivek,
I think the principles you lay out make sense, and in the evaluations we do in these contexts, these are things we strive for. In addition, one of the points that came up in the workshop (where the particular context was therapy for survivors of violence) was that the control group didn't get nothing: they got a different type of therapy (and it was clear ex ante which one dominated). My guess is that this is a common course of action in post-conflict settings where resources are not constrained.
However, I would like to hear more about this, so if anyone has examples of how they handled it, please do email me and I can write a later post that does a better job of answering Vivek's question.

One of the arguments Nick York made in a previous blog post in favor of impact evaluation was that it is a global public good. This might raise the question of the ethics of imposing costs on the poorest post-conflict environments for the sake of a global public good that will benefit other countries, including less needy communities and countries.

In a similar vein, the ethical issues of imposing participation costs or withholding treatment from the control groups would seem to be more compelling in a post-conflict environment, where needs may be more immediate.

And on a practical level, it is harder to administer an impact evaluation where the administration is weak or absent. I think a lot of the pioneering work in impact evaluation was done in middle-income countries.

Do you think there is a case for steering clear of fragile and post-conflict environments (particularly if the evaluation process itself might exacerbate conflict)?

I am guessing this is not always the right answer, because with increasing shares of resources going into difficult environments, we need the evidence to know what is and isn't working in those very environments.

All this is, or may be, true, and these points should certainly be taken into consideration. But it is even truer that interventions are all too frequently launched with no good evidence about their potential impact. Money, goodwill, and effort are wasted so frequently in development work. In post-conflict settings (actually, in all settings), no intervention should be implemented without a reasonably good impact evaluation, except in those instances where hard evidence has already been accumulated. Given that the cases in which no good evidence is available far exceed the cases where that evidence exists, impact evaluation should be a feature of all development work instead of an afterthought. In other words, we should be discussing whether NOT to conduct an impact evaluation, instead of whether one should be conducted.