"Inequality at birth is neither just nor unjust. What's just and unjust is the way institutions deal with it" - John Rawls

"I always had a certain dislike for general principles and abstract prescriptions. I think it's necessary to have an "empirical lantern" or a "visit with the patient" before being able to understand what is wrong with him. It is crucial to understand the peculiarity, the specificity, and also the unusual aspects of the case" - Albert O. Hirschman


The fifth caveat of evaluations - Average effects can be misleading

There was a remote village without access to good educational facilities. A small for-profit company had developed a technology tool that could teach children without the need for a teacher. It set up learning centers in a couple of villages. Some interested parents enrolled their children in these centers, and over time the students became very good at Maths, as seen both in the observations of the center staff and in their school grades.

Now, the company wanted a rigorous evaluation of its program. An external evaluation agency conducted a Randomized Controlled Trial in the same location and reported no significant impact of the program on the students' test scores.

What do we make of this evidence?

One possibility is that the students who attended the center initially were early adopters: intrinsically motivated and willing to invest effort. The argument goes that such students would have learnt with the help of any product, so the initial anecdotal evidence says more about the motivation of the students than about the product.

This line of argument misses an important factor. The intrinsically motivated students had the same nature and the same motivation even before the program was launched in that area. Why weren't they learning and scoring well then?

The point is that even motivated students need access to resources to channel their motivation. The program acted as one such resource for motivated students who had been deprived of learning opportunities, which resulted in their higher test scores. Whether other learning products would raise scores as effectively for the same group of students is a separate debate.

The correct interpretation should be: the program helps motivated students who were earlier deprived of learning opportunities, but doesn't seem to have an effect on others. Just because the program didn't help all types of students, as reflected in the average effect, doesn't mean we should abandon it. If motivation turns out to be the hindrance, that should be worked on, rather than stopping the program.

RCTs tell us the overall impact, not the impact on a particular child. When the non-motivated students outnumber the motivated ones, the average dilutes the effect. Hence, it is sometimes important to segment the effects. The segmentation is obvious in this case but needn't be so in all cases.
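This dilution can be illustrated with a small simulation. All numbers below are made up for illustration: suppose 20% of students are motivated and the program raises only their scores, by about 10 points.

```python
import random

random.seed(0)

# Hypothetical setup (not real data): the program helps only the
# motivated 20% of students, raising their scores by ~10 points.
N = 1000
students = []
for _ in range(N):
    motivated = random.random() < 0.2   # 20% are motivated
    treated = random.random() < 0.5     # random assignment, as in an RCT
    lift = 10 if (motivated and treated) else 0
    score = random.gauss(50, 5) + lift
    students.append((motivated, treated, score))

def avg(xs):
    return sum(xs) / len(xs)

def treatment_effect(group):
    """Difference in mean scores between treated and control."""
    t = [s for m, tr, s in group if tr]
    c = [s for m, tr, s in group if not tr]
    return avg(t) - avg(c)

overall = treatment_effect(students)
motivated_only = treatment_effect([s for s in students if s[0]])
others = treatment_effect([s for s in students if not s[0]])

print(f"overall ATE:        {overall:.1f}")         # diluted toward ~2
print(f"motivated subgroup: {motivated_only:.1f}")  # close to the true 10
print(f"other students:     {others:.1f}")          # close to 0
```

The average treatment effect sits near 2 (the true 10-point effect times the 20% share of motivated students), so an evaluation powered to detect only larger effects would report "no impact", even though the subgroup effect is large.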

Instead of just asking 'Does this program work?', it would be better to also ask, 'On what type of students does the program work?'. The average effects reported by RCTs can be misleading.

Update:
Michael Clemens and Justin Sandefur wrote this amazing article on the worm wars, explaining the results of the replication study of the famous deworming paper by Michael Kremer and Ted Miguel. The example discussed there also demonstrates that average effects can be misleading.

I am reproducing the relevant text and images from the CGDEV article below.

"The school that got treated (deworming) is the black house in the center. Each circle around the black house is some other school that didn't get treated. The number on each of those other schools is the spillover effect from treatment at the school in the center. For example, the number could be the percentage increase in school attendance at each untreated school due to spillover effects from the treated school.

Looking at the map, in this schematic example, it's obvious that there is a spillover effect from treatment. You don't need any statistics to tell you that. Schools near the treated school have big increases in attendance; schools far away don't. It's obviously very unlikely that this pattern is just coincidence.

In the schematic picture above, using the made-up numbers there, the average spillover effect inside the green circle is 1.6. Suppose that, due to statistical noise, we can only detect an effect above 1; so this short-range effect is easy to detect.

The average effect in the 3km to 6km range is only 0.25. That's below our detectable threshold of 1, so we can't distinguish it from zero. Furthermore, in this example, the average spillover effect at all 76 schools inside 6km is just 0.6 — a statistical goose egg.

How would you report a correction to this mistake? There are two ways you could do it, ways that would give opposite impressions of the true spillover effects.

You could simply state that when you correct the error, the average spillover effect on all 76 schools in the correct 6km radius is 0.6, which is indistinguishable from zero. That's an accurate statement in isolation. This is essentially all that is done in the tables of the published version of the replication paper. On that basis you could conclude, as that paper does, that "there was little evidence of an indirect [spillover] effect on school attendance among children in schools close to intervention schools." Strictly on its own terms, that is correct. That's the average value in all the circles in that picture.

But wait a minute. Look back at our schematic picture. It's obvious that there is a spillover effect. So something's incomplete and unsatisfying about that portrayal. First of all, the average spillover inside the 3km green circle is 1.6, which in this example we can distinguish from zero. So it's certainly not right to say there is "little evidence" of a spillover effect "close to" the treatment schools.

So how could you report this correction differently, in a way that shows the obvious spillover effect? Using the same hypothetical data from the figure above, you could show this:

This picture shows, again for our schematic example, the average cumulative spillover effect out to various distances from the treated school: all the schools out to 1km away, all the schools out to 2km, all the schools out to 3km, and so on.

Here, there's a big spillover effect nearby the treated school. That effect peters out as you expand the radius. In this example, it gets undetectable (falls below 1) once you consider all the schools within 5km, because the overall average starts to include so many faraway, unaffected schools."
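The widening-radius dilution can be sketched numerically. The distance bands and effect sizes below are made up in the spirit of the schematic (they are not Clemens and Sandefur's actual data), chosen only so that 76 schools fall within the outer radius:

```python
# Hypothetical spillover effects by distance band from the treated school.
# Tuples: (outer distance in km, average spillover in that band, schools in band)
bands = [
    (1, 3.0, 5),
    (2, 2.0, 10),
    (3, 0.9, 15),
    (4, 0.4, 20),
    (5, 0.13, 26),
]

DETECTION_THRESHOLD = 1  # effects below this are indistinguishable from zero

total_effect = 0.0
total_schools = 0
cumulatives = []
for dist, band_effect, n in bands:
    total_effect += band_effect * n
    total_schools += n
    cum = total_effect / total_schools  # average over ALL schools within dist
    cumulatives.append(cum)
    flag = "" if cum >= DETECTION_THRESHOLD else "  <- looks like zero"
    print(f"all schools within {dist} km: average spillover = {cum:.2f}{flag}")
```

The cumulative average falls monotonically as the radius grows: the nearby schools with large effects get swamped by the many distant schools with almost none, so the overall average drops below the detection threshold even though the short-range effect is real.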

Learning: As explained earlier, the average effects of RCTs alone can be misleading, and one should be mindful of this. This highlights the importance of estimating heterogeneous treatment effects, which helps uncover the underlying phenomenon.