Estimated effect of early childhood intervention downgraded from 42% to 25%

Last year I came across an article, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeerch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor, that claimed that early childhood stimulation raised adult earnings by 42%. At the time, I wrote,

Overall I have no reason to doubt the direction of the effect—psychosocial stimulation should be good, right?—but I’m skeptical of the 42% claim, for the usual reasons of the statistical significance filter. . . . There’s nothing wrong with speculation but at some point you’re chasing noise and picking winners, which leads to overestimates of magnitudes of effects.

So those are my thoughts. My goal here is not to “debunk” but to understand and quantify.
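The "statistical significance filter" can be made concrete with a small simulation. This is a minimal sketch under assumed numbers, not anything from the Gertler et al. analysis: the true effect (0.10) and standard error (0.10) are hypothetical, estimates are assumed to be unbiased normal draws, and "significant" means |estimate / SE| > 1.96. Averaging only the estimates that pass the filter gives the exaggeration ratio (sometimes called a type M error).

```python
import random
import statistics


def exaggeration_ratio(true_effect, se, n_sims=100_000, z_crit=1.96, seed=1):
    """Average |estimate| among 'significant' results, divided by the true effect.

    Estimates are simulated as unbiased draws from Normal(true_effect, se);
    only those with |estimate / se| > z_crit pass the significance filter.
    Assumes true_effect > 0.
    """
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate / se) > z_crit:  # the statistical significance filter
            survivors.append(abs(estimate))
    return statistics.mean(survivors) / true_effect


# Hypothetical: a true effect of 10% measured with a standard error of 10%.
ratio = exaggeration_ratio(true_effect=0.10, se=0.10)
print(f"significant estimates average {ratio:.1f}x the true effect")
```

With a true effect equal to one standard error, only about one estimate in six passes the filter, and the survivors average roughly two and a half times the true effect. Shrinking the standard error to a tenth of the effect brings the ratio back close to 1, which is why the filter matters most for noisy studies.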

The paper (which at the time was in preprint form) seemed potentially important, and I sent the following email to the first author:

Dear Dr. Gertler:

I read with interest your recent paper on the Jamaica experiment, and I had some thoughts; see here:
http://www.symposium-magazine.com/childhood-intervention-and-earnings/

And, for further background, see here:
http://andrewgelman.com/2013/11/05/how-much-do-we-trust-this-claim-that-early-childhood-stimulation-raised-earnings-by-42/

I suspect that, because of selection issues, your estimate of a 42% effect is an overestimate. Do you have any thoughts on this? In any case, I hope these comments will be helpful in your future work in this area.

Yours,
…

I received no response, which is fair enough: I’m sure the guy is busy and he certainly has no obligation to respond to emails. But, just in case, I passed him the message two more times, once through a friend who works in his department (and my friend confirmed that Gertler did receive the message) and once in response to a message that someone else had sent to Gertler, cc-ing me. I was frustrated to not hear anything back—after all, if someone says your estimate is too high, that’s a big deal, no?—but at least I was happy that the message got through.

From the press release:

In the Friday (May 30) edition of the journal Science, researchers find that early childhood development programs are particularly important for disadvantaged children in Jamaica and can greatly impact an individual’s ability to earn more money as an adult. . . . Results from the Jamaica study show substantially greater effects on earnings than similar programs in wealthier countries. “We now have tangible proof of the potential benefits of early childhood stimulation and the importance of parenting in a developing country. . . .” Gertler said.

Annoyingly enough, the press release did not link to the published article—what’s with that, anyway???—but I did a google search and found it. From the abstract:

A substantial literature shows that U.S. early childhood interventions have important long-term economic benefits. However, there is little evidence on this question for developing countries. . . . the intervention increased earnings by 25%.

Cool! They listened to me! Or maybe it was just a coincidence. But, in any case, the estimated effect went down from 42% to 25%. We hear a lot about the “decline effect,” but in this case we’re seeing a decline from the pre-publication to the publication version of the same paper. That’s good news.

I’m not actually sure what Gertler et al. did to make the estimate fall from 42% to 25%. Part of it seems to be that they’re only looking at a subset of the data (“full-time jobs”) that happens to show a lower point estimate. But, compared to their earlier analysis, all the numbers seem to be lower, so they must have changed the analysis in some other way. I don’t see any discussion of the earlier 42% number in the article but maybe this comes up in the supplementary material.

Some other things

From the press release: “This study adds to the body of evidence, including Head Start and the Perry Preschool programs carried out from 1962-1967 in the U.S., demonstrating long-term economic gains from investments in early childhood development.” But, as I wrote in an earlier post on the topic, there is some skepticism about those earlier claims. Murray writes:

The most famous evidence on behalf of early childhood intervention comes from the programs that Heckman describes, Perry Preschool and the Abecedarian Project. The samples were small. Perry Preschool had just 58 children in the treatment group and 65 in the control group, while Abecedarian had 57 children in the treatment group and 54 in the control group. In both cases the people who ran the program were also deeply involved in collecting and coding the evaluation data, and they were passionate advocates of early childhood intervention.

Murray continues with a description of an attempted replication, a larger study of 1000 children that reported minimal success, and concludes:

To me, the experience of early childhood intervention programs follows the familiar, discouraging pattern . . . small-scale experimental efforts staffed by highly motivated people show effects. When they are subject to well-designed large-scale replications, those promising signs attenuate and often evaporate altogether.

Heckman replies here. I’m not convinced by his reply on this particular issue, but of course he’s an expert in education research so his article is worth reading in any case.
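The small samples quoted above can be turned into a rough sense of precision. This is a back-of-the-envelope sketch, assuming a continuous outcome standardized to unit standard deviation in both arms and using the normal approximation: the standard error of a treatment-control difference is sqrt(1/n_t + 1/n_c), and the smallest effect detectable with about 80% power at a two-sided 5% level is roughly 2.8 standard errors. The function names are illustrative, and none of this reproduces the original evaluations.

```python
import math


def se_of_difference(n_treat, n_control):
    """SE of a difference in means, in SD units, assuming unit variance in both arms."""
    return math.sqrt(1 / n_treat + 1 / n_control)


def minimum_detectable_effect(n_treat, n_control, z_alpha=1.96, z_power=0.84):
    """Smallest true effect (SD units) detectable with ~80% power, two-sided 5% test."""
    return (z_alpha + z_power) * se_of_difference(n_treat, n_control)


# Treatment and control group sizes as quoted above.
studies = {"Perry Preschool": (58, 65), "Abecedarian": (57, 54)}
for name, (n_t, n_c) in studies.items():
    print(f"{name}: SE = {se_of_difference(n_t, n_c):.2f} SD, "
          f"MDE = {minimum_detectable_effect(n_t, n_c):.2f} SD")
```

Under those assumptions, both studies could reliably detect only effects of about half a standard deviation or more, which is large for an intervention, so any statistically significant estimate from samples this size is vulnerable to the statistical significance filter.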

The other thing that’s bugging me is the following juxtaposition:

From the published article: “A substantial literature shows that U.S. early childhood interventions have important long-term economic benefits.”

From the press release: “Results from the Jamaica study show substantially greater effects on earnings than similar programs in wealthier countries. Gertler said this suggests that early childhood interventions can create a substantial impact on a child’s future economic success in poor countries.”

I don’t get it. On one hand they say they already knew that early childhood interventions have big effects in the U.S. On the other hand they say their new result shows “substantially greater effects on earnings.” I can believe that their point estimate of 25% is substantially higher than the point estimates from other studies. Or maybe the other studies showed big economic benefits but not big gains in earnings? In any case, I can only assume that there’s a lot of uncertainty in this estimated difference.
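One way to see why such a difference is uncertain: for two independent estimates, the standard error of their difference is the square root of the sum of their squared standard errors, so it is larger than either one alone. A minimal sketch, assuming independence between studies; the inputs are hypothetical (the 0.25 echoes the Jamaica point estimate, but both standard errors and the 0.10 comparison estimate are made up for illustration).

```python
import math


def compare_estimates(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates, its SE, and the z-score."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # assumes the estimates are independent
    return diff, se_diff, diff / se_diff


# Hypothetical inputs for illustration only.
diff, se_diff, z = compare_estimates(est_a=0.25, se_a=0.10, est_b=0.10, se_b=0.05)
print(f"difference = {diff:.2f}, SE = {se_diff:.2f}, z = {z:.1f}")
```

With these made-up numbers, a gap of 15 percentage points comes with a standard error of about 11 points and a z-score of about 1.3, so the comparison on its own would not establish that one effect is larger than the other.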

Applying the Edlin factor?

My economist friend Aaron Edlin asked if I had a rule for routinely scaling down published estimates. I hemmed and hawed and wouldn’t give a general number. But I did write that I think an Edlin factor of 1/5 to 1/2 is probably my best guess for that Jamaica intervention example. In this case, Gertler et al. already went most of the way. The question is: had they originally reported 25% rather than 42%, what would I have said? I’m not sure.
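The arithmetic here is simple enough to spell out. Treating the Edlin factor as a multiplicative discount on a published estimate (the function name below is just an illustrative helper), 1/5 to 1/2 applied to 42% gives a range of 8.4% to 21%, while the revision from 42% to 25% corresponds to a factor of about 0.6.

```python
def apply_edlin_factor(estimate, factor):
    """Discount a published effect estimate by a multiplicative factor."""
    return estimate * factor


preprint_estimate = 0.42   # 42% earnings effect in the preprint
published_estimate = 0.25  # 25% earnings effect in the published article

low = apply_edlin_factor(preprint_estimate, 1 / 5)
high = apply_edlin_factor(preprint_estimate, 1 / 2)
implied_factor = published_estimate / preprint_estimate

print(f"42% discounted by 1/5 to 1/2: {low:.1%} to {high:.1%}")
print(f"factor implied by the 42% -> 25% revision: {implied_factor:.2f}")
```

So the published 25% reflects a bit less shrinkage than even the mild end of the 1/5 to 1/2 range.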

Well, I haven’t been to Jamaica, but from a brief check of the stats the poverty rates look fairly bad. So it’s certainly not implausible that outcome heritability is much lower over there and that family environment accounts for a much greater percentage of the variation in adult outcomes than is the case in the US or Europe. If these programs are going to work anywhere, I’d imagine Jamaica would be the place.

The Abecedarian Project is a strange one. As I understand it, it was originally supposed to investigate whether educational and social intervention could prevent mild to moderate learning disability. The problem is that both the control group (mean IQ 89) and the intervention group (mean IQ 93) had IQs well within the normal range, especially for the African-American population that the project was aimed at. So over time, it seems to have switched focus toward seeing what happened to fairly normal kids if you gave them extra stimulation anyway. Apparently the final test score difference between the groups was exactly the same as that at 6 months of age (though measuring intellectual development at that age is a very dubious business). This has led some to suggest it’s just an example of unhappy randomisation, helped out by small sample size (see Bacharach & Baumeister, http://libres.uncg.edu/ir/asu/f/Bacharach_V_2000_Early_Generic_Educational_Intervention.pdf).

Regardless, I think it’s fairly clear that results from Abecedarian should be taken with a huge pinch of salt. Anyone who places too much weight on its findings is probably selling something.

It’s also not true in general that large-scale early childhood interventions fail to work. The Chicago Child-Parent Center program is large scale, as are the various other state and local pre-K programs that have been studied, such as those in Tulsa, Boston, and North Carolina.

In general, school teaching can be thought of as a very unglamorous form of show biz, which involves stand-up performers (teachers) trying to make powerful connections with their audiences (students). We are not surprised that some entertainers are better than other entertainers, nor are we surprised that some entertainers connect best with certain audiences, nor that entertainers go in and out of fashion in terms of influencing audiences.

In other words, if you think of entertainment as social science experiments, they have poor replicability. But that doesn’t mean that the original Elvis (the initial experiment) didn’t have a big impact on fans, just because all the Elvis impersonators (the replication experiments) seem pretty ho-hum these days. Similarly, just because it’s hard to replicate the results of successful educational and early childhood interventions doesn’t mean the original effects weren’t real.

The analysis is pretty complex, but I’m guessing that the change in handling of the missing migrants (the treatment group was more likely to migrate, and among migrants the controls were more likely to be lost to follow-up) may have led to the change. I would not be surprised if the migrants had salaries far higher than non-migrants, based on what I have heard from the Jamaican students I have; even working as off-the-books nannies in New York they are better off than at home. I’m also not surprised that so many of the treatment group at age 22 are both working full time and in college. It makes me wonder if the earlier analysis handled salaries of full-time students differently than the final analysis did (22% of treatment still in school versus 4% of control). Twenty-two is still too young to characterize these data as being about adult earnings if over a fifth of subjects in one group are still in school. The payoffs from college attendance for even a small portion of either group are still in the future, but likely to be meaningful (and not just monetarily, as we saw in Changing the Odds and Passing the Torch). It will be extremely interesting to see where they are in five years.
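The mechanism described here, differential loss to follow-up among high-earning migrants, can be illustrated with a toy calculation. Every number is hypothetical: migrants earn twice what non-migrants earn (arbitrary units), 30% of the treatment group and 15% of the control group migrate, non-migrants are always observed, and migrants are observed with probability 0.9 in the treatment group but only 0.5 in the control group. The sketch compares the effect computed over everyone with the effect computed only over those observed (complete cases).

```python
def arm_mean(migration_rate, followup_migrants, complete_case=False,
             earn_stay=1.0, earn_migrant=2.0):
    """Mean earnings in one study arm (all parameters hypothetical).

    Non-migrants are always observed. Migrants are observed with probability
    followup_migrants. If complete_case is True, the mean is taken only over
    observed people; otherwise over everyone.
    """
    w_stay = 1.0 - migration_rate
    w_migrant = migration_rate * (followup_migrants if complete_case else 1.0)
    return (w_stay * earn_stay + w_migrant * earn_migrant) / (w_stay + w_migrant)


treatment = {"migration_rate": 0.30, "followup_migrants": 0.9}
control = {"migration_rate": 0.15, "followup_migrants": 0.5}

effect_everyone = arm_mean(**treatment) / arm_mean(**control) - 1
effect_observed = (arm_mean(**treatment, complete_case=True)
                   / arm_mean(**control, complete_case=True) - 1)
print(f"effect over everyone:      {effect_everyone:.1%}")
print(f"effect over observed only: {effect_observed:.1%}")
```

Under these made-up parameters the complete-case comparison inflates the earnings effect from about 13% to about 18%, because the control group’s missing migrants are disproportionately high earners. How the real analysis imputed missing migrants could move the estimate in either direction; this only shows that the choice matters.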

I really wish they had discussed the distribution of the 9 deaths across the groups. Among other things, Jamaica has a very high homicide rate: not so high that you’d expect a homicide among 127 subjects, but still, they were probably at higher risk than others of the same age. From what I can tell, subjects who were lost to follow-up and subjects who had died were treated the same for the purposes of imputation. Maybe that is something that changed, too.

From the Abstract:
“Site selection bias” occurs when the probability that partners adopt or evaluate a program is correlated with treatment effects. I test for site selection bias in the context of the Opower energy conservation programs, using 111 randomized control trials (RCTs) involving 8.6 million households across the United States. Predictions based on rich microdata from the first ten replications substantially overstate efficacy in the next 101 sites. There is evidence of two positive selection mechanisms. … While it may be optimal to initially target an intervention toward the most responsive populations, these results show how analysts can be systematically biased when extrapolating experimental results, even after many replications
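Site selection bias can be simulated in a few lines. This is a toy sketch, not the method used in the quoted paper: the 111 sites and the split into 10 early and 101 later sites come from the abstract, but the distribution of true effects (normal with mean 1 and SD 0.5) and the noise in the adoption order are assumptions. Sites are ranked by their true effect plus noise, so sites that benefit more tend to adopt first.

```python
import random
import statistics


def simulate_site_selection(n_sites=111, n_early=10, selection_noise=0.5, seed=2):
    """Toy model of site selection bias (assumed parameters, not from the paper).

    Each site's true effect is drawn from Normal(1.0, 0.5). Sites adopt the
    program in order of (true effect + noise), so sites that benefit more
    tend to go first. Returns the mean true effect among the first n_early
    sites and among the remaining sites.
    """
    rng = random.Random(seed)
    effects = [rng.gauss(1.0, 0.5) for _ in range(n_sites)]
    # Noisy adoption signal: higher-effect sites are more likely to adopt early.
    signal = [e + rng.gauss(0.0, selection_noise) for e in effects]
    order = sorted(range(n_sites), key=lambda i: signal[i], reverse=True)
    early = [effects[i] for i in order[:n_early]]
    later = [effects[i] for i in order[n_early:]]
    return statistics.mean(early), statistics.mean(later)


early_mean, later_mean = simulate_site_selection()
print(f"mean effect, first 10 sites: {early_mean:.2f}")
print(f"mean effect, next 101 sites: {later_mean:.2f}")
```

Because adoption order is correlated with the effect, the average over the first 10 sites overstates the average over the remaining 101, even though every site’s effect is known exactly in this toy model.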


Very interesting. A similar phenomenon: when running certain experiments to show that a treatment leads to “activation” (e.g., phosphorylation) of a protein or signaling pathway, researchers do it under serum-starved conditions, which lower the activity of the cell in general.

That’s very interesting; basically you’re providing an actual explanation of why scale-up often doesn’t work. In a way it reminds me of the “ready to change” factor in the evaluation of addiction treatment. What’s interesting is the idea that you might be able to model the settings in which a given intervention is likely to have the biggest impact. It doesn’t necessarily mean “nothing works,” and it’s not some mysterious, consistent internal false positive. At the same time, you’ll often see interventions criticized for not being effective for the hardest cases, such as school interventions that work with parents who are motivated or have some college, or with kids who are already relatively successful in school. On the one hand, there’s nothing wrong with having a program that works for them, especially when they are in awful schools where, despite their advantages, they will still likely end up not graduating from high school or graduating without college skills. On the other hand, the kind of program discussed in the article is likely to be most helpful only in very high-need cases, with diminishing returns as the situation improves even mildly. These children had stunted growth and were ages 0-2 in a country with very high poverty and violence. They were already behind their peers. They are probably among the children most likely to benefit from almost any intervention. 22% still in school versus 4%: that’s totally astonishing to me.

Andrew, did you read the supporting materials? The 25% in the abstract and in the results table of the shorter paper is for all jobs, but the text reports a “substantially larger” impact on full-time, non-temporary work, which is shown in the original paper and in the supporting materials for this paper. In that scenario, the estimate in the supporting materials is still above 40%. Of course, the college attenders more often have work-study, part-time, or student-type temporary jobs.
