Birthdays and heat waves

I mentioned the birthdays example in a talk the other day, and Hal Varian pointed me to some research by David Lam and Jeffrey Miron, papers from the 1990s with titles like Seasonality of Births in Human Populations, The Effect of Temperature on Human Fertility, and Modeling Seasonality in Fecundability, Conceptions, and Births.

Aki and I have treated the birthdays problem as purely a problem in statistical modeling and computation and have not looked at all at work of demographers in this area. So it was good to learn of this work.

Wilde et al. report that babies born 9 months after hot weather have better educational and health outcomes as adults, and they attribute this to a selection among fetuses, by which the higher temperature conditions make fetal development more difficult so that the weaker fetuses die and it is the stronger, healthier ones that survive. As is typically the case, I’m suspicious of this sort of bank-shot explanation.

Wilde et al. talk about the causal effect of temperature but I’m guessing it can all be explained by selection effects of parents, that different sorts of people get pregnant at different times of the year, with no causal effect of temperature at all. Yes they run some regressions controlling for family characteristics but I get the impression that the purpose of those regressions was just to confirm that their primary findings were OK: As sometimes happens in this sort of robustness analysis, they weren’t looking to find anything there, and then they successfully didn’t find anything. Not what I’d call convincing. The whole thing just seems like massive overreach to me. Also seems odd for them to talk about temperature “shocks”: It’s hardly a shock that it gets warm in the summer and cold in the winter.

I’m not saying that temperature at conception can’t have any effect on fetal health; I just don’t find the particular argument in this paper at all convincing. It’s the learning-through-regression paradigm out of control.

P.S. It’s April, and it just happens that the next available day on the blog is in August. What better time to post something on the effects of heat waves?

P.P.S. See here for further discussion by Joshua Wilde, the first author of the paper I write about above.

If there is anything here, and it may well be total baloney, I don’t think it’s the actual temperature of the gametes or the fetus that’s the issue; it’s probably more the stress hormones and other responses to high temperature that the mother experiences (changes in respiration rate, changes in blood flow, changes in activity level, whatever). I suppose with males there’s some issue of temperature affecting the quality of sperm. Males have external gonads in large part because of temperature control.

There’s also definitely the issue of exposure to sunlight: vitamin D is produced in the skin through a reaction that requires UV light exposure. How would you tease this out? Without vitamin D measurements you couldn’t.

> Wilde et al. talk about the causal effect of temperature but I’m guessing it can all be explained by selection effects of parents, that different sorts of people get pregnant at different times of the year, with no causal effect of temperature at all.

Wouldn’t the “different times of the year” effect be captured when they “[use] region-month fixed effects to control for permanent geographic or seasonal characteristics which may affect these outcomes directly, allowing [them] to identify the effects using only the random variation in temperature”?

No, in fact region-month fixed effects impose a discontinuous-in-time structure on a continuous process (the seasons), so they may be entirely discovering the residual error in approximating a continuous process by a series of step functions. See:

Imagine if you were calculating the importance of some signal in a spatial region, and it looked like that “e”, and you were trying to pick out the importance of a single pixel near the “boundary” (a single month 9 months before birth). Suppose the real reason it was “important” was just related to the noise that makes this image all blocky. If you did your calculation with the underlying smooth measurement, you would see that it’s the *difference* between the underlying smooth measurement and the blocky one which you are “detecting” as your “signal”.

It’s just an analogy so it’s not perfect, but it does give the general sense of the idea.
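To make the step-function point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: a 360-day year with twelve 30-day months, a cosine standing in for the seasonal process, and within-month means standing in for the fixed effects. It says nothing about the paper's actual data; it only shows how large the leftover is when a smooth curve is replaced by monthly steps, and where in the year it is largest.

```python
import math

DAYS_PER_MONTH = 30          # simplifying assumption: 12 equal months
DAYS = 12 * DAYS_PER_MONTH   # 360-day toy year


def seasonal(day):
    """Hypothetical smooth seasonal signal with a one-year period."""
    return math.cos(2 * math.pi * day / DAYS)


smooth = [seasonal(d) for d in range(DAYS)]

# Month "fixed effect": the mean of the smooth signal within each month.
month_means = []
for m in range(12):
    block = smooth[m * DAYS_PER_MONTH:(m + 1) * DAYS_PER_MONTH]
    month_means.append(sum(block) / len(block))

# Residual: what the step function fails to capture on each day.
residual = [smooth[d] - month_means[d // DAYS_PER_MONTH] for d in range(DAYS)]

max_abs_residual = max(abs(r) for r in residual)
worst_day = max(range(DAYS), key=lambda d: abs(residual[d]))

print(f"largest within-month residual: {max_abs_residual:.3f}")
print(f"it occurs on day {worst_day} (month {worst_day // DAYS_PER_MONTH + 1})")
```

The residual averages to zero inside each month by construction, so it only matters if the thing being regressed varies within the month too.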

I don’t see how your previous comment was about interactions between parental characteristics and time of year. I understood you argued that the dependence of the outcome on the temperature was just a seasonal effect. If the issue is just that “different sorts of people get pregnant at different times of the year”, and assuming the mixture doesn’t change from year to year, I don’t understand why the first order effect (on the average outcomes) wouldn’t be captured. Of course a more complicated model may be required if we don’t think that those fixed effects are constant. And later they also include additional explanatory variables (mother education, family wealth) together with the temperature, maybe they didn’t find anything because there was no big effect left to be found.

If there is an interaction between parental characteristics and time of year of births, then you can’t just take the coefficient of temperature in a regression and take it as a causal effect. The problem is that you’re comparing apples to oranges, as it were. As I wrote above, “I’m guessing it can all be explained by selection effects of parents, that different sorts of people get pregnant at different times of the year, with no causal effect of temperature at all.” Controlling for predictors of region x month won’t fix this. It’s a subtle point, though, and not well explained in textbooks, so I can see how people can be confused here. “Correlation does not imply causation” is not just a slogan; it’s a real thing!

Maybe we are talking about different things. I thought you were saying that “different sorts of people get pregnant at different times of the year, with no causal effect of temperature at all” and “it can all be explained” by this.

If “it can all be explained” by this, every August in Kinshasa the same kind of people will be getting pregnant, with no causal effect of temperature at all, and then the average outcomes of the people born in August in Kinshasa will be the same every year. And the variation around this average outcome shouldn’t be correlated with the temperature either. (While the average outcome of people born in August and people born in February will be different, not because of the difference in temperature but because different sorts of people get pregnant at different times of the year.)
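Here is a small simulation of that argument, as a sketch only. All the numbers are invented: later months are warmer on average and also have a different mix of parents, the outcome has no temperature term at all, and "temperature shock" is just Gaussian noise around the month's typical temperature. Under those assumptions the pooled slope picks up the seasonal selection, while the slope computed from within-month deviations should be close to zero.

```python
import random
import statistics

random.seed(0)

N_YEARS = 200  # many toy years so the estimates are stable

months, temps, outcomes = [], [], []
for year in range(N_YEARS):
    for month in range(12):
        typical_temp = 20 + 0.5 * month      # invented seasonal pattern
        parent_mix_effect = 0.3 * month      # invented selection effect
        temp = typical_temp + random.gauss(0, 1)          # year-specific shock
        outcome = parent_mix_effect + random.gauss(0, 1)  # no temperature term
        months.append(month)
        temps.append(temp)
        outcomes.append(outcome)


def slope(x, y):
    """One-variable least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_month(values):
    """Subtract each calendar month's own average (the month fixed effect)."""
    means = []
    for k in range(12):
        in_month = [v for v, m in zip(values, months) if m == k]
        means.append(statistics.fmean(in_month))
    return [v - means[m] for v, m in zip(values, months)]


pooled_slope = slope(temps, outcomes)
within_slope = slope(demean_by_month(temps), demean_by_month(outcomes))

print(f"pooled slope (confounded by season): {pooled_slope:.3f}")
print(f"within-month slope (shocks only):    {within_slope:.3f}")
```

This only covers the case where the parental mix in a given month is the same every year. If the mix itself responds to the year's temperature, the within-month slope no longer has to be zero.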

Otherwise I agree that correlation doesn’t imply causation (and I’m pretty sure that it’s not the future outcomes of children which are causing the current temperatures).

Teasing the two apart requires more than “there’s a statistically significant temperature coefficient” in the vicinity of conception.

Just consider the alternative hypothesis: “everything that happens in the vicinity of conception is important.” Then you’d expect integrate(foo(t)*Beta*normal_pdf(t,-9,1),t,-15,0) to “show an effect” for all functions foo: temperature, barometric pressure, calorie intake, stock market volatility, the price of chicken, duration of average phone call, consumption of alcohol, number of hours spent driving…

Basically anything that has any effect at all on the mother, averaged within a smooth window of a month around conception, would show up amplified in the baby’s health status. That’s why they tell you to start doing everything you can to improve your health several months before you conceive.
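For anyone who wants to play with the expression integrate(foo(t)*Beta*normal_pdf(t,-9,1),t,-15,0), here is a rough numerical version in Python. The reading of t as months relative to birth, the trapezoid rule, and the helper names `weighted_exposure` and `pulse` are all assumptions made for this sketch; the point is only that any exposure concentrated near t = -9 gets weighted heavily and anything far from it gets essentially no weight.

```python
import math


def normal_pdf(t, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def weighted_exposure(foo, beta, lo=-15.0, hi=0.0, steps=3000):
    """Trapezoid-rule approximation of
    integrate(foo(t) * beta * normal_pdf(t, -9, 1), t, lo, hi)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at endpoints
        total += weight * foo(t) * beta * normal_pdf(t, -9.0, 1.0)
    return total * h


def pulse(center, width=1.0):
    """Hypothetical exposure: 1 within width/2 months of center, else 0."""
    return lambda t: 1.0 if abs(t - center) <= width / 2 else 0.0


near_conception = weighted_exposure(pulse(-9.0), beta=1.0)
late_pregnancy = weighted_exposure(pulse(-3.0), beta=1.0)

print(f"month-long pulse centred at t = -9: {near_conception:.4f}")
print(f"month-long pulse centred at t = -3: {late_pregnancy:.2e}")
```

Swap in any other `foo` (calorie intake, the price of chicken) and the same window applies, which is the whole worry.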

Of course, what’s really going on is that there are probably two or three different kinds of “physiological stress responses” and these stress responses respond to lots of different stimuli. So you don’t wind up with an additive model. For example, if you choose “temperature, calorie intake, financial stress” you’ll find effects for all three, but if you add in “relationship stress, test taking stress, and exercise activity” you’ll find that they don’t show additional effects.

On the other hand, if you start with “relationship stress, test taking stress, and exercise activity” you’ll find that adding “temperature, calorie intake, and financial stress” adds nothing….

The underlying model is: “35,000 different things all combine to produce a small (maybe 3 or so) dimensional internal state that affects baby health”

which means, if you look one-at-a-time you can write 35,000 different academic papers!

Yes, I agree with both of you. Carlos is right that I was wrong in my simple reasoning regarding different sorts of people having babies at different times of year. And I agree with Daniel that the real-world story has gotta be so complicated that there are just too many ways for this sort of observational regression to go wrong.

Andrew – What are you doing? You can’t admit you were wrong! I mean, you were, but you can’t admit it! You’ll get drummed right off the Ted Talk circuit!

You know, like how untenured faculty can’t just hop on the internet and criticize papers written by people in their field. Even if people in Africa often don’t really know what year and month they were born in. Someone powerful might get mad if I did that! So I just shut up about it.

I’d want them to control for parental job type. Many academics have quite a bit of additional “recreational” time in the summer, while parents engaged in more outdoorsy or manual jobs may find themselves with more down-time (or at least with lower levels of exhaustion) during colder months.

How much better? Ok, so I think, let’s look up the original source linked to.

The most the abstract will deign to tell me is this: “…individuals conceived during heat waves have higher educational attainment and literacy, fewer disabilities and lower child mortality” Great! But still no sign of that pesky effect size.

OK, I will persevere and try the conclusion. Surely the effect size will be prominently advertised there. After all that’s the conclusion. No luck. The best I can get are vague generalizations: “temperature extremes at the time of conception are associated with better human capital outcomes later in life”.

But that elusive effect size is still not found.

Doesn’t it matter whether mortality of this “hot” cohort was measured to reduce by 0.0001% or 10%??!! Why do papers and all the reporting about papers hide this crucial detail deep within the bowels of these papers?

That is pretty much why I wrote earlier that I had no interest in bothering with this paper (yet I still apparently read these comments…). If you won’t even bother to show me a propaganda plot[1] so I can quickly see what your point is, it seems likely you never looked at plots of the data yourselves or you are hiding something. I.e., either you are incompetent or malicious; either way, why pay attention to you? Also, I read a lot of scientific literature from the early 1900s where they managed to include hand-drawn plots, so it is not a “people trained during a different era of technology” issue.

Well, they do mention the effect sizes in the main text. E.g., “For example, a one standard deviation increase in temperature nine months before birth increases years of schooling by 0.06 years, which corresponds to a 1.15 percent increase from the mean. Similarly, the probability of being literate increases by 0.97 percent, while the probability of reporting any disability falls by 4.6 percent.”
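A quick back-of-the-envelope check on the schooling numbers in that quote, written out in Python. It assumes the "1.15 percent increase from the mean" is relative to mean years of schooling in the sample, which is my reading of the sentence and not something stated separately. The quoted figures are rounded, so the implied mean is only approximate.

```python
# Figures quoted from the paper's main text (rounded as quoted).
effect_years = 0.06          # extra years of schooling per 1 SD of temperature
relative_increase = 0.0115   # the same effect as a fraction of the mean

# If effect = relative_increase * mean, then mean = effect / relative_increase.
implied_mean_years = effect_years / relative_increase

print(f"implied mean schooling: about {implied_mean_years:.1f} years")
```

So a one standard deviation temperature shock moves schooling by a few hundredths of a year against a baseline of roughly five years.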

Academics are a pretty small fraction of the population even in countries like the US, more so for countries in sub-Saharan Africa, so I don’t think that would be very relevant.

Also, the paper talks about “heat waves”, not summer vs winter, probably because these are tropical countries where the mean temperature doesn’t vary much from summer to winter (e.g., Uganda’s mean monthly temperature varies from about 21.5 C in July to 23.5 C in February and March).

Two things jumped out at me: 1) It would be more interesting if they did the same thing with cold by looking at groups living not in sub-Saharan Africa but in the Arctic, and 2) I have trouble with the findings because they say there was no effect of in utero heat waves. Why? Two reasons: a) the fetus is carried for 9 months and undergoes weather for that whole period, so why is there no effect then? And b) conception is interior to the body, and though the body may be stressed by heat (or cold), the internal temperature isn’t going to vary much, so I can’t see how one could claim an effect at conception unless there’s also an effect during pregnancy (and even then). I would think the conception effect might be the weakest in a mental model of effects of heat on the entire conception-through-birth process.

My first thought (and this is pure supposition) was that potential parents who can afford air conditioning would be more willing to participate in the heat generating activities leading to conception during a heat wave, while potential parents not able to afford air conditioning may choose to wait for things to cool off. Then the parents who could afford air conditioning would also be more likely later on to afford tutors, college tuition, health insurance, better nutrition, etc.

So some socioeconomics (along with job type as suggested by mark) would really need to be adjusted for.

I happened to be pointed to your post this week on our paper “Heat Waves at Conception and Later Life Outcomes” by a colleague who was concerned that you had misrepresented what we did in the paper. I read your post and I just wanted to clarify our methodology because it seems like your critiques were based on an inaccurate understanding of what we did.

First, including the region-month fixed effects does control for the average parental characteristics in each region in each month. If every August, individuals in Kigali who have babies in June have on average one extra year of education, and that pattern held every year, then this is a fixed characteristic of the region-month which does not vary. As a result, it would get sucked up into the region-month fixed effect.

Second, including the region-month fixed effects also means that the changes in temperature year to year actually are shocks. If, on average, it is 80 degrees in Kigali in a summer month, and 20 degrees in a winter month, then those average temperatures are fixed for every region-month and get sucked up into the fixed effect. We could have just included in our regression the deviation from the average temperature to get at the shock, but our methodology is equivalent. In fact, our methodology is better because including region-month fixed effects also controls for every other thing that is fixed across region-months / seasons – like the fact that different types of parents conceive at different times of the year. So you are right . . . it’s hardly a shock that it gets warm in the summer and cold in the winter. But that is not where our identification is coming from. It is literally coming from temperature deviations from the region-month averages – or shocks.
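The equivalence between "fixed effects" and "deviations from the cell average" is a standard least-squares result (Frisch-Waugh-Lovell), and it can be checked on toy data. The sketch below is not the paper's model or data: the six region-month cells, the slope of 0.5, and the noise levels are all invented, and the small linear solver is included only so the example runs on the Python standard library.

```python
import random

random.seed(1)

N_CELLS = 6        # hypothetical region-month cells
N_YEARS = 40
TRUE_SLOPE = 0.5   # invented temperature effect, for the toy data only

rows = []  # (cell, temperature, outcome)
for cell in range(N_CELLS):
    cell_temp = 20 + 2 * cell    # fixed average temperature of the cell
    cell_effect = 3 * cell       # fixed outcome shift of the cell
    for _ in range(N_YEARS):
        temp = cell_temp + random.gauss(0, 1)
        outcome = cell_effect + TRUE_SLOPE * temp + random.gauss(0, 1)
        rows.append((cell, temp, outcome))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# (a) Fixed-effects regression: temperature plus one dummy per cell.
p = N_CELLS + 1
xtx = [[0.0] * p for _ in range(p)]
xty = [0.0] * p
for cell, temp, outcome in rows:
    x_row = [temp] + [1.0 if cell == k else 0.0 for k in range(N_CELLS)]
    for i in range(p):
        xty[i] += x_row[i] * outcome
        for j in range(p):
            xtx[i][j] += x_row[i] * x_row[j]
slope_fixed_effects = solve(xtx, xty)[0]


# (b) Deviation regression: subtract each cell's mean, then one-variable OLS.
def cell_mean(cell, idx):
    vals = [r[idx] for r in rows if r[0] == cell]
    return sum(vals) / len(vals)


t_bar = [cell_mean(k, 1) for k in range(N_CELLS)]
y_bar = [cell_mean(k, 2) for k in range(N_CELLS)]
num = sum((t - t_bar[c]) * (y - y_bar[c]) for c, t, y in rows)
den = sum((t - t_bar[c]) ** 2 for c, t, y in rows)
slope_deviations = num / den

print(f"slope with cell dummies:      {slope_fixed_effects:.6f}")
print(f"slope on cell-mean deviations: {slope_deviations:.6f}")
```

The two slopes agree up to floating-point error. What the check cannot tell you is whether the year-to-year deviations are as good as random, which is the part the thread is actually arguing about.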

In the literature on the effects of temperature in economics (which is extremely large – see this recent review article for more details http://economics.mit.edu/files/9138) this is the standard methodology in the field. In fact you can’t get anything published if you DON’T include region-month fixed effects. See the canonical panel model in equation 3 of the linked review article.

Third, when we control for parental characteristics, we are not testing for whether different types of parents conceive at different parts of the year – that is already controlled for using the fixed effects. We are asking whether the fact that different types of parents do select into conception based on how hot it is outside (which we did find — see next paragraph) fully explains our estimates. We were looking to see if our statistically significant coefficients on temperature went away, and they didn’t. We weren’t looking to find nothing and didn’t find something as you allege – we were controlling for potential confounders and still found our main result.

Fourth, this paper was written using data from sub-Saharan Africa. Temperature varies very little season to season anyway. But more importantly, the health environment is vastly different there than in the US. Temperature is correlated with famine and malaria – very real reasons why temperature might cause in utero conditions to be very, very different during heat waves as opposed to in the developed world, where the only risks are hyperthermia. So this paper isn’t really about temperature per se – it is more about in utero health conditions more generally (including disease and nutrition), and the effect that increased heat shocks caused by climate change might have in the developing world.

Finally, the purpose of the paper was to show that this correlation between temperature and outcomes exists since it will inform the debate on climate change. Pure and simple. Explaining why, as you alluded to, is almost impossible since there are a million different things which might cause this correlation. Our strategy was to be agnostic about why this correlation existed, and to take several hypotheses (including the one you mentioned) and to provide what evidence could be provided on the matter – it was never intended to be a conclusive explanation. We don’t need a conclusive explanation to show that temperature matters.

You are right – we found that including parental characteristics didn’t control for selection into conception based on temperature shocks. But we did find that parents who conceived during heat waves were wealthier. We also found that the fall in sexual activity was greater for uneducated women – meaning that this parental selection WAS taking place and we documented it in the paper. We included this all in the paper because we didn’t care what explained the correlation – we just wanted to know what the data said about potential mechanisms that could be tested in some manner. At the end of the day, we didn’t find strong evidence that our main effects were being caused by parental selection (even though we did find evidence of parental selection), and we said so. There is a difference between not finding strong evidence and conclusively saying that something is not happening. We also state that the fetal loss story is our preferred (not definitive) explanation because we find more evidence for it.

I thought part of Andrew’s worry was that only controlling for region-months leaves behind any average region-by-time variation that is within months.

This would seem to be of a piece with his comments on the Case & Deaton life expectancy work, in that by adopting the standard econometric approach of lots of fixed effects for standard buckets (which often on the surface looks like avoiding bias at all costs, including variance), bias can be left behind.

I think that there is no within-month variation, because the temperatures are averaged up to the month of birth. I agree it could be a problem if they had daily- or weekly-level temperatures with month FE, but I don’t think that is what they did here. Maybe I’m wrong.