Does having sons make you more conservative? Maybe, maybe not. A problem with controlling for an intermediate outcome

Several studies have been performed in the last few years looking at the economic decisions of parents of sons, as compared to parents of daughters. For example, Tyler Cowen links to a report of a study by Andrew Oswald and Nattavudh Powdthavee that “provides evidence that daughters make people more left wing. Having sons, by contrast, makes them more right wing”:

Professor Oswald and Dr Powdthavee drew their data from the British Household Panel Survey, which has monitored 10,000 adults in 5,500 households each year since 1991 and is regarded as an accurate tracker of social and economic change. Among parents with two children who voted for the Left (Labour or Lib Dem), the mean number of daughters was higher than the mean number of sons. The same applied to parents with three or four children. Of those parents with three sons and no daughters, 67 per cent voted Left. In households with three daughters and no sons, the figure was 77 per cent.

I’ve seen some other studies recently with similar findings. A few years ago, a couple of economists found that having daughters, as compared to sons, was associated with the probability of divorce (I think it was), and recently a study by Ebonya Washington found that members of Congress with daughters (as compared to sons) were more likely to have liberal voting records on women’s issues.

Controlling for the number of children: an intermediate outcome

A common feature of all these studies is that they control for the total number of children. This can be seen in the quote above, for example: they compare different sorts of families with 2 kids, then make a separate comparison of different sorts of families with 3 kids.

At first sight, controlling for the total number of children seems reasonable. There is a difficulty, however, in that the total number of kids is an intermediate outcome, and controlling for it (whether by subsetting the data based on #kids or using #kids as a control variable in a regression model) can bias the estimate of the causal effect of having a son (or daughter).

To see this, suppose (hypothetically) that politically conservative parents are more likely to want sons, and that if they have two daughters, they are (hypothetically) more likely to try for a third kid. In comparison, liberals are more likely to stop at two daughters. In this case, if you look at data on two-child families with 2 daughters, the conservatives will be underrepresented, and the data could show a correlation of daughters with political liberalism, even if having the daughters has no effect at all!
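Here is a minimal simulation sketch of that hypothetical. Everything in it is an assumption made up for illustration: the 50/50 split between liberals and conservatives, the probabilities 0.2 and 0.6 of trying for a third kid after two daughters, and the function names. Politics is assigned before any children arrive and never changes, so the true effect of a child's sex is zero by construction.

```python
import random

def simulate_families(n_families=200_000, seed=0):
    """Simulate families in which children have NO effect on politics."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        liberal = rng.random() < 0.5              # fixed before any births
        kids = [rng.choice("GB") for _ in range(2)]
        # Assumed stopping rule: after two daughters, conservatives are
        # more likely than liberals to try for a third child.
        p_third = 0.2 if liberal else 0.6
        if kids == ["G", "G"] and rng.random() < p_third:
            kids.append(rng.choice("GB"))
        families.append((liberal, kids))
    return families

def share_liberal(families, keep):
    """Share of liberal parents among families whose kids satisfy keep()."""
    selected = [liberal for liberal, kids in families if keep(kids)]
    return sum(selected) / len(selected)

families = simulate_families()
# Comparing lists of length 2 also restricts to families with exactly 2 kids,
# which is the "control for total number of children" step.
two_daughters = share_liberal(families, lambda k: k == ["G", "G"])
two_sons = share_liberal(families, lambda k: k == ["B", "B"])
print(f"share liberal, exactly two daughters: {two_daughters:.3f}")
print(f"share liberal, exactly two sons:      {two_sons:.3f}")
```

Under these assumed numbers, the two-daughter families that stopped at two should come out roughly two-thirds liberal (0.1 / 0.15), against about half of the two-son families, even though no kid changed anybody's politics.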

A solution

A solution is to apply the standard conservative (in the statistical sense!) approach to causal inference, which is to regress the outcome on your treatment variable (sex of kid) while controlling only for things that happen before the kid is born. For example, one could compare parents whose first child is a girl to parents whose first child is a boy. One can also look at the second birth, comparing parents whose second child is a girl to those whose second child is a boy, controlling for the sex of the first child. And so on for third child, etc.
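As a rough sketch of what these birth-by-birth comparisons look like, here they are run on made-up data with a built-in stopping rule (conservatives more likely to try for a third kid after two daughters). The data, the probabilities, and the names simulate_families and girl_minus_boy are all hypothetical, and simple differences in shares stand in for the regressions. Politics never responds to children here, so every comparison should come out near zero.

```python
import random
from collections import defaultdict

def simulate_families(n_families=200_000, seed=1):
    """Hypothetical data: politics is fixed and never affected by children."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        liberal = rng.random() < 0.5
        kids = rng.choice("GB") + rng.choice("GB")
        # Assumed stopping rule: after two daughters, conservatives are
        # more likely than liberals to have a third child.
        if kids == "GG" and rng.random() < (0.2 if liberal else 0.6):
            kids += rng.choice("GB")
        families.append((liberal, kids))
    return families

def girl_minus_boy(families, birth):
    """For the given birth (0 = first child), return, for each pattern of
    earlier siblings, share liberal after a girl minus share after a boy."""
    counts = defaultdict(lambda: [0, 0])  # (earlier, sex) -> [liberals, total]
    for liberal, kids in families:
        if len(kids) > birth:             # family actually had this birth
            cell = counts[(kids[:birth], kids[birth])]
            cell[0] += liberal
            cell[1] += 1
    earlier_patterns = sorted({earlier for earlier, _ in counts})
    return {
        earlier: counts[(earlier, "G")][0] / counts[(earlier, "G")][1]
        - counts[(earlier, "B")][0] / counts[(earlier, "B")][1]
        for earlier in earlier_patterns
    }

families = simulate_families()
pyramid = {birth + 1: girl_minus_boy(families, birth) for birth in range(3)}
for birth, diffs in pyramid.items():
    print(f"birth {birth}:", {k or "(none)": round(v, 3) for k, v in diffs.items()})
```

Each comparison conditions only on what was known before that birth (the sexes of the older siblings), never on how many kids the family ended up with. With real survey data some sibling patterns may be sparse or missing, which this sketch does not handle.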

The modeling could get interesting here, since there is a sort of pyramid of coefficients: one for the first-kid model, two for the second-kid model (one for each sex of the first kid), and so forth. It might be reasonable to expect coefficients to gradually decline (I assume the effect of the first kid would be the biggest), and one could estimate that with some sort of hierarchical model.

Summary

I’m not saying that all these researchers are wrong; merely that, by controlling for an intermediate outcome, they’re subject to a potential bias. Also they could redo their analyses without much effort, I think, to fix the biases and address this concern. I hope they do so (and inform me of their results).

It’s an interesting example because we all know not to control for intermediate outcomes, but the total # of kids somehow doesn’t look like that, at first.

6 Comments

Andrew, excellent post … we experience a similar phenomenon when we model consumer choices … we see family size outcomes unless we employ what you refer to as intermediate effects … in doing so we see another set of attitudinal variables appear

Good point! But how do you post from the future (your blog says that the post is from Dec 27, which is tomorrow!)? Does your trackback system work…it berated me for pinging too quickly, when I sent only the one ping all day. Here's a manual link, noting that the striking, but unsurprising thing to me is that the report seems to show that people with more children vote Left more than the general electorate. This is, of course, even more poorly controlled than the study itself!

In answer to your first question: I like to have one entry per day, and I'd already posted one thing for today, so . . .

Also, we've never been able to get the trackback feature working.

Finally, your thoughts on party switchers are interesting too. It seems to me that many of these questions are answerable from the researchers' dataset, so I'm hoping that now that we've pointed out some of these issues, they will do more with it.

Just to throw another factor into the mix: I think there is evidence for at least some control of sex ratio in humans, so these results might be explainable by individuals that are more conservative biasing themselves towards male offspring. Something like socioeconomic status might cause the bias, but I've no idea if it would be strong enough.

Correlation is not the same as causation! (hmm, where have I heard that before?)

Thank you for your comments on our paper, and I think you are making a very good point here. I've managed to look at the effect of the first born (with no other child present in the household) and find that a newborn daughter, controlling for individual fixed effects, still tilts the parents a bit to the left of the centre, whilst the coefficient on the newborn son has the opposite sign. The standard error is a little higher than usual (although the coefficient size remains the same as in the full sample), but this is expected as the sample size is a lot smaller than before.

"Being born to an older mother, being the youngest child in a large family, being born in poverty, being born to a family with a low educational level are all risk factors for developmental delays."

This is slightly different from your other examples. But it seems that a variable identifying the "youngest child in a large family" would be difficult to use for prediction, and dubious for effect estimation. Perhaps parents of a child with developmental delays tend not to have more children. Any thoughts?