Evaluating the Millennium Villages Project

I’m a postdoc working with Andy and Jeff Sachs on the evaluation of the Millennium Villages Project, a ten-year economic development project operating in ten sub-Saharan African countries. Our evaluation protocol was recently accepted by The Lancet (full text here, and the accompanying technical paper here). We welcome your thoughts!

I remember the last round of papers from the MVP evaluations. I was never fully convinced by the use of DHS-based data for the comparison group in the Lancet paper, which seems to be the basis of part of this evaluation too. That said, I understand it may be the best way forward, but the survey is trickier than it appears, so I thought I’d throw out some of my concerns with using the DHS in this manner. I’m sure you’ve noted most of these, but in case any of these thoughts are helpful, here are some DHS-related comments.

General points:

1- The DHS does not claim to be in any way representative at the PSU/cluster level; this is something they emphasize repeatedly on the User Forum. I’m sure you have some nifty smoothing you can do, but any cluster-level estimate is going to be very, very noisy. For some of the anthropometry and mortality data, additional concerns (such as age-profile effects and birthdate mis-reporting) make the raw cluster-level estimates almost completely useless.
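
To put rough numbers on the noise, here is a minimal simulation (Python, with made-up values: a 35% true stunting rate everywhere and 30 children measured per cluster) showing how far raw cluster-level estimates scatter from the truth by sampling variation alone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up illustration: every cluster has the same true stunting rate
# (35%), and each DHS cluster measures about 30 children.
true_rate, kids_per_cluster, n_clusters = 0.35, 30, 1000

# Observed cluster-level stunting estimates under sampling noise alone.
stunted = rng.binomial(kids_per_cluster, true_rate, size=n_clusters)
cluster_estimates = stunted / kids_per_cluster

# The binomial standard error sqrt(p*(1-p)/n) is about 0.087 here, so
# raw cluster estimates routinely miss the truth by 10+ points.
print(cluster_estimates.std())
print((abs(cluster_estimates - true_rate) > 0.10).mean())
```

Any smoothing has to overcome noise of roughly this magnitude before cluster-level comparisons become meaningful, and this sketch ignores the additional age-profile and mis-reporting problems mentioned above.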

2- There is also the issue of timing raised last time around by Bump, Clemens, Demombynes, and Haddad: these DHS surveys are taken in different years and at different times of the year, and both factors are likely to matter. In particular, for things like sanitation and water access, you don’t want to compare an MVP site in 2014 with a DHS cluster from 2011, since access to water and sanitation is changing very quickly in some places (the Bump et al. critique concerned a different outcome). Additionally, measures like child weight truly do vary over the seasons (height not so much), so comparing spring and fall surveys may not make sense.

Some specific comments on outcomes:

1 – Outcome 1.8 (anthropometrics): the age distribution of children in a cluster strongly determines the mean stunting/wasting rate. With, say, 30-50 kids per cluster, if you happen to catch a bunch of newborns you will see a very low stunting rate; if you catch mostly kids over 2, the estimate will be much higher. An “average stunting rate” makes some sense when comparing groups of kids with the same age distribution, but really only an “age-specific stunting rate” makes full sense for children in developing countries. Maybe an “over-2” stunting rate is a reasonable compromise, since the loss of height relative to the reference population occurs mostly before age 2 (this holds mostly for standardized weight measures too).
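
The age-mix point can be made concrete with a toy calculation (Python; the age-specific stunting curve below is invented, rising until 24 months and then flat, as described above):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented age-specific stunting probabilities: risk rises steeply
# until ~24 months of age and then plateaus.
def stunting_prob(age_months):
    return 0.10 + 0.35 * np.minimum(age_months, 24) / 24

# Two clusters with identical age-specific risk but different age mixes.
young_cluster = rng.uniform(0, 12, size=40)   # mostly infants
old_cluster = rng.uniform(24, 59, size=40)    # all children over age 2

# The crude "average stunting rate" differs sharply between the two
# clusters even though the underlying age-specific rates are identical.
for ages in (young_cluster, old_cluster):
    print(round(stunting_prob(ages).mean(), 2))
```

Under these made-up numbers, the all-over-2 cluster shows a crude rate more than twice that of the infant-heavy cluster, purely because of who happened to be sampled.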

My totally personal (and possibly no one else’s) preference here would be to look at how the intervention bends the outcome-age profile for children. Is there an immediate improvement? A slowly accumulating effect? A decrease in the rate or duration of loss relative to the healthy reference population? This would show precisely how the intervention is affecting child development in a more complete manner, and it would also rule out results that are temporary (fading with age) or spurious (due to differing age distributions of children). I’d be curious to see how you might do that.
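
One way to operationalize the profile-bending idea, sketched on simulated data (the height-for-age model and the “halved rate of decline” treatment effect below are invented for illustration): estimate mean HAZ by age bin within each arm and compare the two curves.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data only: ages in months, a random treatment indicator,
# and height-for-age z-scores (HAZ) that decline until ~24 months and
# then flatten, matching the usual growth-faltering pattern.
n = 4000
age = rng.uniform(0, 59, size=n)
treated = rng.integers(0, 2, size=n)
decline = np.minimum(age, 24) / 24
# Hypothetical truth: the intervention halves the rate of decline.
haz = -2.0 * decline + 1.0 * decline * treated + rng.normal(0, 1, size=n)

# The "outcome-age profile": mean HAZ by age bin within each arm.
edges = [0, 6, 12, 24, 36, 48, 60]
for t in (0, 1):
    profile = [haz[(age >= lo) & (age < hi) & (treated == t)].mean()
               for lo, hi in zip(edges[:-1], edges[1:])]
    print(t, [round(m, 2) for m in profile])
```

Comparing the two printed curves shows the arms starting out similar at young ages and diverging as the faltering period unfolds, which is exactly the kind of shape evidence a single average rate hides.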

2 – Outcomes 2.1 and 3.1: I don’t see how you get these from the DHS. There is no schooling data on children between the ages of 5 and 15, right? How can you get contemporary primary school enrollment rates?

3 – Outcome 4 (mortality): if the MVs are just 10 years old now, that means you can only estimate the impact on the U5 mortality rate from the first 5 years of the program, yes? And then only if you have DHS surveys taken right now and limit yourself to children born exactly 6 years ago. There are also the age-reporting problems that Bruno Schoumaker has noted, and other problems with parents remembering when children died (G-d, this job is depressing sometimes). In general, I think sticking to U1 mortality in the 5 (or even 3) years preceding the survey makes sense. But if you have to use DHS surveys from 3 years ago and compare them to the same time period in the MVs, you will likely under-estimate the full treatment effect, since the program was still gearing up.

We use only pre-treatment DHS data and only to supplement the matching phase (which is otherwise based on geographic characteristics). We do not use DHS to measure the outcomes, for some of the reasons you’ve mentioned. Our project is collecting its own end-line (2015) survey data in both the treatment and control areas.

I believe this is a variance problem, not a bias problem (i.e., a precision problem, not a representativeness problem), but either way I agree with you that it is a problem! Figure 2 on p. 15 of our technical paper shows intervals of uncertainty around the DHS indices for exactly this reason. Creating indices rather than using the raw DHS variables also helps reduce the variance somewhat, since we are taking averages.
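
The variance-reduction-by-averaging point is just the 1/k scaling of the variance of a mean, assuming the indicators' measurement noise is roughly independent (correlated noise shrinks less). A quick simulated check:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy check: averaging k noisy indicators into an index shrinks the
# variance of the cluster-level estimate by roughly a factor of k,
# assuming independent noise across indicators.
k, n_sims = 5, 20000
single = rng.normal(0, 1, size=n_sims)                    # one indicator
index = rng.normal(0, 1, size=(n_sims, k)).mean(axis=1)   # k-item index

print(single.var())   # close to 1
print(index.var())    # close to 1/k = 0.2
```

In practice DHS indicators within a domain are positively correlated, so the realized reduction sits somewhere between no shrinkage and the full 1/k.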

We only use DHS data for the matching, and for that the DHS round is the same for the treatment and control areas within each country (e.g., the Ghana 2003 DHS was used in both the treatment and control areas to conduct the matching). Your concern about seasonality is of course very valid: even the 2003 Ghana DHS could have been rolled out such that the treatment and control areas were surveyed at slightly different times. However, the end-line surveys (the ones measuring outcome data) are being conducted simultaneously in the treatment and control areas.

You make a great point about the age distributions and stunting. I think doing an age-distribution adjustment would probably be wise, and we can certainly look into that!

Your comment about mortality is also important, since mortality estimates use birth-history data from the previous few years, in effect taking a kind of average over the years 2010-2015. The methods still use information from children born in 2014, since those children contribute to the hazard rates for ages 0-12 months.
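
For readers unfamiliar with the DHS-style calculation being referred to: mortality rates are built from age-segment death probabilities pooled over the reference period, which is why recently born children still contribute to the youngest segments. A minimal sketch with invented segment probabilities:

```python
# Hypothetical death probabilities for the age segments used in
# DHS-style synthetic-cohort mortality estimation (not real data).
segment_q = {"0 months": 0.030, "1-2 months": 0.010,
             "3-5 months": 0.008, "6-11 months": 0.012}

# Under-1 mortality is one minus the product of segment survival
# probabilities. A child born in 2014 contributes exposure only to the
# youngest segments, but still enters the pooled estimate.
survival = 1.0
for q in segment_q.values():
    survival *= 1 - q
u1_mortality = 1 - survival
print(round(u1_mortality, 4))  # 0.0588 with these made-up inputs
```

Extending the product over the 12-23 and 24-59 month segments yields the under-5 rate in the same way, which is how a 2015 survey can produce a U5 estimate that averages over the 2010-2015 birth cohorts.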

1. Link up Section 2.1 on the selection of the MVs with Section 6.2 on matching. If you know the selection mechanism exactly, then the problem is not unconfoundedness so much as a possible lack of overlap (because of deterministic assignment). In practice, the discussion in Section 2.1 makes it seem the selection criteria were not explicit. Still, you want to make sure to match on whatever explicit criteria were used.

1. You are correct in noting that the three guidelines in Section 2.1 are not explicit selection criteria. However, as you say, it is still important to use these criteria in the matching, to the extent possible.

(a) “located in areas of severe chronic malnutrition” – in our matching, we use the enhanced vegetation index (a geographic variable) and a DHS health index that includes measures of malnutrition (e.g. length-for-age z-scores for children).

(c) “and recommended by expert committees, including government officials” – this one is tricky. We can’t go back in time and ask the government officials from 2004 which other areas they would have recommended. We did dig through old emails, and for two or three countries we found shortlists of areas that the government had recommended. Unfortunately, where shortlists did exist, the other candidate areas were in different districts and agroecological zones from the treatment areas. A case can be made that those would have been good matched controls regardless (if one really believes government recommendation is the most important confounder to adjust for), but we decided instead to focus the matching on geographic and DHS (health, education, wealth) characteristics.

2. I had never seen the SPIRIT checklist before! It looks like a good guideline (though our study isn’t strictly a clinical trial, so it would have to be slightly adjusted). Thanks for linking!