GiveDirectly Three-Year Impacts, Explained by the Authors

[Update: 11:00 AM on 4/23/2018. Upon the request of the guest bloggers, this post has been updated to include GiveDirectly's updated blog post, published on their website on 4/20/2018, within the text of their post rather than within Özler’s response that follows.]

First, some background: our study was a cluster-level randomized trial. We had a treatment group (receiving cash transfers), a spillover group (neighbors of the treatment group), and a pure control group (living in entirely separate villages). In our study of the short-term impacts of cash transfers, the primary comparison is between the treatment and spillover groups -- we explain why below. In both the short- and long-term studies we show all possible comparisons (treatment to spillover, treatment to pure control, and spillover to pure control) for the interested reader.

To clear up several points in Berk Özler’s discussion of our long-term study:

1. The reason for a partial baseline

Berk raises the concern that our pure control group did not complete a baseline and was selected at a different time than the other participants. The lack of a baseline is not a problem for unbiased inference: the treatment effect is identified without it. We were also not sure we had enough funds to do the additional 1,000 surveys; given that the cost of randomizing at the village level is practically zero, we decided to see if we could afford just the endline.

The second concern is valid: we did not identify individuals who would be part of the pure control group at baseline (only the villages). They were selected later using the same criterion. This creates the possibility that people fulfilling this criterion at different times might not be comparable.

2. Why the partial baseline does not strongly bias the results

In the short-term study, we show that this possible bias from applying the selection criterion late is not very important. We do this by finding out, based on new data collection, how many people we missed in the pure control villages because we applied the eligibility criterion at a different time. That number is five households for the entire sample, which is small relative to our sample of about 1,500 households. To be conservative, we additionally correct for the omission of these five households through bounding techniques. Not surprisingly, the results when applying the bounding techniques are not very different from the uncorrected results.
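
The bounding logic can be sketched as follows. This is a generic worst-case bound in the spirit of Horowitz and Manski, with made-up numbers; the paper's exact procedure may differ. The idea is to impute the few missed households at the extremes of the observed outcome range and recompute the effect:

```python
# Toy sketch of a worst-case bound for a few missed control households.
# All numbers are hypothetical, not taken from the paper.
observed_control = [10.0, 12.0, 9.0, 11.0, 13.0]  # surveyed pure-control outcomes
treatment_mean = 14.0                              # treatment-group mean
n_missed = 1                                       # households missed by late screening

lo, hi = min(observed_control), max(observed_control)

def effect_with_imputation(value):
    """Treatment effect if every missed household had this outcome value."""
    padded = observed_control + [value] * n_missed
    return treatment_mean - sum(padded) / len(padded)

lower_bound = effect_with_imputation(hi)  # missed households imputed at the max
upper_bound = effect_with_imputation(lo)  # missed households imputed at the min
print(round(lower_bound, 3), round(upper_bound, 3))  # 2.667 3.333
```

With only five missed households out of roughly 1,500, the imputed extremes barely move the mean, which is why the bounded results stay close to the uncorrected ones.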

3. The within vs. across-village debate

Thus, the across-village comparisons in our first paper are not in strong doubt. Because spillovers are mostly small, this also means that the within-village treatment effects can be considered reasonable estimates of the treatment effect, with the benefit of higher statistical power.

The interpretation that we tried to "get away" with the within-village comparison in the first study to "prime" the reader for a particular interpretation of the long-term study is unreasonable. Not only was the short-term study long complete before the long-term study was written, but as Berk acknowledges, we presented both within- and across-village comparisons in the first study. The greater emphasis on the within-village results for reasons of statistical power was our best scientific judgment, and we still believe it is correct -- Berk also appears to have no concerns with this. The possible interpretation that we were interested in misleading readers is not justified.

4. The Haushofer, Reisinger, & Shapiro paper

In 2015 we wrote another paper using the data from our short-term study. In that study, we used a different approach, comparing villages in which a larger share of households received transfers to villages in which few households received transfers. We used this different methodology because we were interested in how changes in village-level inequality might impact residents, and in the analysis discovered some evidence for negative spillovers (for specific forms of psychological well-being). Berk rightly notes that the spillover estimates are different from those in the short-term impact paper, which is a consequence of this different methodology. We find some signs of negative spillovers, which were reinforced in our long-term study.

We agree with Berk’s interpretation of the results. By this we mean that a) the long-term study shows some evidence of negative spillovers, even if it is not conclusive, and b) GiveDirectly’s blog post is selective and does not provide a balanced interpretation of the results presented in our paper. To those points of agreement, we would add:

1. Attrition casts some doubt on the spillover results

In the long-term study, we faced differential attrition: in the pure control group, we were less successful in re-surveying households than in the treatment and spillover groups. When adjusting for this differential attrition, the spillover results are less robust. It is this concern, not the differential timing of the selection of pure control households, that casts doubt on our long-term spillover results. We therefore take the possibility of negative spillovers seriously, but hesitate to draw strong conclusions.
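
A standard way to bracket a group difference under differential attrition is Lee (2009)-style trimming. The sketch below is a generic illustration with made-up numbers, not the specification used in the paper: the group surveyed at the higher rate is trimmed until response rates match, from its top tail for the lower bound and from its bottom tail for the upper bound.

```python
def lee_bounds(responders_a, rate_a, responders_b, rate_b):
    """Bound mean(a) - mean(b) when group a has the higher response rate.

    Generic Lee (2009)-style trimming sketch with toy inputs; not the
    paper's specification.
    """
    assert rate_a >= rate_b > 0
    excess = 1.0 - rate_b / rate_a           # share of group a to trim away
    k = round(excess * len(responders_a))    # number of observations to drop
    s = sorted(responders_a)
    trimmed_for_low = s[:len(s) - k] if k else s   # drop top tail -> lower bound
    trimmed_for_high = s[k:] if k else s           # drop bottom tail -> upper bound
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(trimmed_for_low) - mean(responders_b),
            mean(trimmed_for_high) - mean(responders_b))

# Toy example: group a fully surveyed, group b re-surveyed at 80%.
low, high = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1.0, [4.0, 5.0, 6.0], 0.8)
print(low, high)  # -0.5 1.5
```

The bounds widen as the response rates diverge, which is why heavier differential attrition makes point estimates harder to defend.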

2. Our study does not exist in a vacuum

It is essential that our results be considered alongside the (at least) 165 other robust studies of cash transfers. Policy decisions should not be based on the particular design or econometric specification of one study, or a blog post by a single charity.

Taking a step back, what do our results imply?

Even if one accepts the least rosy interpretation, the results do not say that cash transfers aren’t “good.” Too much solid evidence shows positive impacts on consumption, nutrition, happiness, etc., and no robust increase in alcohol or tobacco consumption or a decline in labor force participation in low-income contexts.

Our suggestive findings of spillovers raise important practical and ethical questions about operating with limited resources: do spillovers really “stop” at the village level? Is it OK to increase inequality by helping some and not others? What if you hurt others, but help some even more? Though abstract, these are the sorts of questions every aid organization must explicitly or implicitly address, and our results underscore their importance and perhaps provide some insight.

Perhaps most importantly, the suggestion that cash transfers have limited long-term impacts invites a debate on the role of cash transfers: are they a “development” intervention, aiming to boost people out of poverty with a single cash injection? Or an effective tool that must be used continuously to right the indignities of global inequality?

Berk Özler responds:

First of all, I thank the authors for reaching out to Development Impact to post a response. Not only did they explain various points about their work and give their interpretation of the evidence, but they were also generous in voluntarily revising their original submission in response to a few comments from me. The end result is what I consider to be a good bookend to the debate that ensued over the past few weeks: as you can see, our posts have a lot of agreement – on methods and interpretation…

I would like to clarify only one thing and that is regarding the following paragraph that may be interpreted to be about (bad) intentions:

“The interpretation that we tried to "get away" with the within-village comparison in the first study to "prime" the reader for a particular interpretation of the long-term study is unreasonable. Not only was the short-term study long complete before the long-term study was written, but as Berk acknowledges, we presented both within- and across-village comparisons in the first study. The greater emphasis on the within-village results for reasons of statistical power was our best scientific judgment, and we still believe it is correct -- Berk also appears to have no concerns with this. The possible interpretation that we were interested in misleading readers is not justified.”

I did say that they “got away” with estimating ITT using within-village comparisons. I also said that the result of this was that future readers were “primed” to draw inference using the same definition. But, I did not say that Drs. Haushofer and Shapiro “did” the former to cause the latter. As they state above, that would have required them to time travel to 2018, see the three-year results, go back to 2015 and decide which estimand to use so that the results could be interpreted in the best light in the future. That would not only be a ridiculous suggestion, but it would be assigning malicious intent to the authors. In this blog, we think it best for the discussions to stick to the facts in evidence and not speculate about people’s unknown motivations for writing what they wrote. I am optimistic that few, if any, of our readers got the impression of a suggestion that the authors were trying to mislead readers, but if anyone has then they should dispense with it now.

We have a minor disagreement on the chosen method in HS (2016). The authors acknowledge that they have a cluster-RCT. It is not in dispute that the standard way to define ITT in a cluster-RCT is across villages. That’s why the within-village controls are called the “spillover group.” The authors are correct that within-village comparisons have at least as much statistical power as across-village ones, but at the cost of potential bias due to interference across individuals within villages. In this sense, it is similar to the OLS vs. IV tradeoff. This tradeoff is actually apparent in Appendix Table 38, where we can see the statistical significance (and even sign) of two indices change when switching from within-village to across-village estimates – not because of a drastic power loss, but rather due to sizeable changes in the coefficient estimates, which indicates some spillovers. The authors rule out spillover effects of greater than 0.22 SD in their short-term effects paper and deem that to be small, but reasonable and informed people could disagree. I recently had an editor ask me to cite the fact that we were not powered to detect 0.10 SD effects on our primary outcome as a “weakness of the study” in the study limitations section at the end of the paper.
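
The estimand difference is easy to see with the group means from the chart discussed later in this debate (treatment 235, pure control 217, spillover group 188): under a negative spillover, the within-village comparison adds the spillover to the ITT effect.

```python
# Group means from the chart discussed in this debate: treatment,
# pure control (separate villages), and within-village spillover group.
treatment = 235.0
pure_control = 217.0
spillover_group = 188.0

# Across villages: treatment vs. pure control recovers the ITT effect.
itt_across = treatment - pure_control        # 18.0
# Within villages: treatment vs. spillover group mixes the ITT effect
# with the (negative) spillover, inflating the apparent effect.
effect_within = treatment - spillover_group  # 47.0
spillover = spillover_group - pure_control   # -29.0

assert effect_within == itt_across - spillover
print(itt_across, effect_within, spillover)  # 18.0 47.0 -29.0
```

With no spillovers the two estimands coincide; the larger the (negative) spillover, the further the within-village estimate drifts from the ITT.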

But, the authors are absolutely right that I do not think this is a big deal for the HS (2016) paper. The within- and across-village findings that matter are quite similar to each other; the choice ends up being innocuous. That I personally would have based the main discussion on across-village estimates and retired the within-village ones into the appendix does NOT mean that the authors’ choice is incorrect: It’s their best scientific judgment and they’re standing by it.

This choice, however, did not end up being as innocuous for the interpretation by some of the longer-term effects presented in HS (2018) – even though the authors could not have easily foreseen this back in 2016. Had it not been for this choice, there would have been no need to include the following two sentences in the abstract of HS (2018):

“Using a randomized controlled trial, we find that transfer recipients have higher levels of asset holdings, consumption, food security and psychological well-being relative to non-recipients in the same village. The effects are similar in magnitude to those observed in a previous study nine months after the beginning of the program.”

The within-village comparison yields a completely different estimand in the presence of spillovers. And, later in the abstract, the authors state: “We do find some spillover effects.” So, presenting the within- and across-village estimates as legitimate alternatives to each other is what can cause a selective reading of the program impacts, as happened in the original GiveDirectly blog post from February. These two estimands are not legitimate alternatives to each other in the three-year impacts paper: when they are not equal to each other, only one of them is the ITT effect, but the place to adjudicate that is not the abstract.

So, when I say the audience has been "primed" to think of the within-village estimates as the program impact, this is not an accusation against the authors. They could not have foreseen the future, and they cannot control what other people write or say about their work. But, a benign methodological choice in 2016 led to an abstract in 2018 that presents comparisons of within-village estimates as ITT over time, which may have led some to claim – incorrectly – that the effects in the longer run are large and sustained. Perhaps the abstracts of future versions of HS (2018) can be edited to make the main takeaways clearer.

Comments

Thanks to the authors for coming on here and clearing up some of the confusion.

Huge thanks to Berk for his illuminating and educational series of posts on this topic.

1. Berk on this -

"These two estimands are not legitimate alternatives to each other in the three-year impacts paper: when they are not equal to each other, only one of them is the ITT effect, but the place to adjudicate that is not the abstract."

The one that IS the ITT effect is the across-village, correct?

2. I think "yes." And thus, Berk, if we take your summary (shared on Twitter) at face value -

a. There is no stat. sig. treatment effect (other than on assets), and
b. There may be negative spillovers (with a wide confidence interval)

How do you reconcile that with -

"Even if one accepts the least rosy interpretation, the results do not say that cash transfers aren’t “good.” Too much solid evidence shows positive impacts on consumption, nutrition, happiness, etc."

i.e. What is your most updated view on the efficacy of unconditional cash transfers? Meaning - the results from HS18 do not say that cash transfers AREN'T "good," per se. But they also don't say they ARE "good." And so I'm curious about what your take is there.

Thanks. On (1), yes. On (2), my statement is just about HS(18), while the quote from HS' guest blog post is more generally about cash transfers. So, they don't really need to be reconciled per se.

There is solid evidence, true, on consumption, mental health, nutrition, but it is all contemporaneous with transfers (or measured soon afterwards), and may not last after transfers stop (Sandefur covered this bit well in his CGD blog post last week). To me, that's not really news - the protection role is proven; and some people think that's all you should expect from cash transfers. Some, however, were hoping for more: one-off programs should get you over a hump, to a much better equilibrium...

My feeling on the latter is that we have a recent, still small, body of evidence that suggests that UCTs to general populations (rather than selected ones, such as microenterprise owners, etc.) may speed convergence to a slightly better steady-state equilibrium, but perhaps will not get them over a poverty trap by removing credit constraints. There may be exceptions (target population being suitable, adding intensive training to cash a la TUP programs, etc.), but we don't have the evidence that they have sustained, long-term effects that are large.

Final question - to anyone, really, but Berk, I would welcome your thoughts. Of course if Jeremy or Johannes want to jump in - please do!

Berk, I was interested in your response to the chart in the CGD blog post that looked at 235 vs 217 vs 188, and its accompanying interpretation, which I think you disagree with. But ... how DO we make sense of the negative spillovers - both a) technically and b) substantively?

Technically, let me just make sure I have it right.

Treatment - 235
Within-village Control (Spillover Group) - 188
Across-village Control (should be used to measure ITT) - 217

Let's cross out the "across-village control."

Now left with -

Treatment - 235
Within-village Control (Spillover Group) - 188

Why do we have to assume that the negative spillovers WITHIN the within-village sample apply more broadly?

If we don't have to assume that -

a. what is the best PLAUSIBLE guess as to what happened to those households that (may have) experienced spillovers?

b. is there available data on what happened to the other households WITHIN the village, who were neither the a) treatment group, b) spillover group?

Why do we have to assume that the negative spillovers WITHIN the within-village sample apply more broadly?

BO: We don't have to assume that. We simply don't know.

If we don't have to assume that -

a. what is the best PLAUSIBLE guess as to what happened to those households that (may have) experienced spillovers?

BO: I don't know. But, for one example, you could imagine that poor people are engaged in similar small enterprises in these villages and removing the credit constraint to some allows them to enter the market (and perhaps even grow a bit), but at the cost of losses to existing enterprises.

b. is there available data on what happened to the other households WITHIN the village, who were neither the a) treatment group, b) spillover group?

BO: Not in this study. Some studies do this, some don't. I'll have a post on this tomorrow...

I want to start by thanking Berk, Jeremy, and Johannes for hosting this incredibly interesting analysis over the last few posts. I can say that it's spurring a lot of really interesting debate at IDinsight and among the rest of the cozy SF development community.

I've been spending some time poring through the papers myself, and I was hoping that someone here could answer what I think is a more straightforward technical question on measurement of spillovers. In H+S '16 Table 3 (which is also Table 37 in the appendix), Column 1 reports spillover effects for the entire sample. Table 38, Column 4 in the appendix also reports spillover effects for the entire sample, so I would have expected the estimates to be exactly the same. And indeed they are for most outcomes, making it seem like they are running an identical specification.

However, there is one critical difference: the estimate of spillovers on assets. For this, the point estimate in Table 3/37 is 1.00, while in Table 38 it is 104.56. Yes, that is a massive difference, going from a precisely-estimated zero to a big positive spillover. Quite possibly I'm missing something, but does anyone else poring over these tables understand this?

Then, fast-forward to H+S '18, Table 7, column 5. Here the point estimate for spillovers on assets is 1.34. So I'm not really sure how to interpret this compared to H+S '16. Does this mean that whatever is happening with asset spillovers, it's staying pretty constant (as suggested by Table 3 in H+S '16), or am I to believe that there were initially positive spillovers that have disappeared 3 years later?

Thanks. It is a puzzle, isn't it? I first thought it was thatched vs. ALL, but that's not it. It cannot be baseline controls, because pure controls don't have them. So, while I have not figured it out yet, I can offer a tidbit from the paper. "In addition, the magnitudes and significance levels obtained in this within-village analysis are broadly similar to those found when comparing treatment to pure control households (Online Appendix Table 38), with three exceptions. First, the treatment effect on assets is larger when estimated across villages than within villages." So, it looks like they prefer the 104.56 to the zero...

I'll let you know if I figure it out. I'll also forward your question to Johannes to see what his answer is...