Bogus Complaints about the use of Discrete Variables

Orazio Attanasio and Valérie Lechene (A&L) have an excellent article in the latest Journal of Political Economy that exploits the randomized rollout of PROGRESA to test the collective model of household consumption. Their analysis rests on the fact that once we condition on total consumption, the only way PROGRESA should plausibly affect the shares of consumption allocated to specific goods is through increasing women’s bargaining power (PROGRESA transferred money directly to children’s mothers). They make a similar argument for another variable, the relative strength of the two spouses’ family networks – these two variables, which affect consumption shares of different goods only through bargaining power, are called “distribution factors”. The collective model of household consumption states that however resources are allocated within the household, there is no waste; this is equivalent to saying that there is a unique index, called the Pareto weight or the sharing rule, that governs how all the distribution factors affect the shares of spending allocated to different goods. All distribution factors, then, must enter the demand system proportionally, so we can effectively condition on one of them and explain away all the others. If demand shares still depend on a second distribution factor after we appropriately condition on the first, we can reject the collective model. They find that the collective model does not fail this test, while the simplistic unitary model is easily rejected (since PROGRESA changes consumption patterns even after conditioning on total expenditure).
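
To make the proportionality restriction concrete, here is a sketch in notation that is assumed for illustration and not taken from A&L: $w_i$ is the budget share of good $i$, $p$ is prices, $x$ is total expenditure, $z_1, \dots, z_K$ are the distribution factors, and $\mu$ is the Pareto weight. If the distribution factors matter only through $\mu$, the ratio of their effects has to be the same for every good.

```latex
% Assumed notation: budget shares depend on the distribution factors
% z_1, ..., z_K only through the single index \mu.
w_i = f_i\bigl(p, x, \mu(p, x, z_1, \dots, z_K)\bigr), \qquad i = 1, \dots, n

% Chain rule: the good-specific term \partial f_i / \partial \mu cancels,
% so the ratio on the right does not depend on i.
\frac{\partial w_i / \partial z_k}{\partial w_i / \partial z_1}
  = \frac{\partial \mu / \partial z_k}{\partial \mu / \partial z_1}
  \qquad \text{for every good } i \text{ and every } k = 2, \dots, K
```

The second line is where continuity enters: it is a statement about derivatives with respect to the distribution factors, which is only literally meaningful if at least one of them can vary continuously.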

A&L exemplifies what we want to accomplish through conducting field experiments in economics: they combine a deep understanding of the institutional and cultural context of the experiment with an equally thorough analysis of what various economic models tell us about what should happen. As a result, A&L aren’t just estimating a parameter consistently or measuring the impact of a program, they are advancing our knowledge of how consumers make decisions – and in an empirically credible fashion. It’s also extremely well-written; I can hardly do it justice via a brief summary.

The only arguable shortcoming of the paper is that they make much of the fact that one of the distribution factors they rely on, the relative strength of the two spouses’ family networks, is continuous. Continuity of at least one distribution factor is a formal requirement for the mathematics of their argument to go through. The problem with this claim is that it is false. The number of family members has only a finite number of points of support, thus leading to a finite number of potential values for the variable. The same even applies to their alternative measure, which uses the total consumption of each spouse’s family network. Money is “more” continuous than counts of people, sure – but it is not actually continuous. This doesn’t really undermine their argument, which is that you can’t use the PROGRESA treatment if a continuous variable is needed. PROGRESA treatment is, by definition, binary, and hence discrete. It definitely seems more valid to use something that is arguably a discretized proxy for an underlying continuous variable: although we observe only discrete ratios of numbers of people, that in principle could be measuring a variable that is actually continuous.

Unfortunately, stating that their alternative variable seems more valid is about as far as I think we can go. I’m not aware of any proof that having a “mostly continuous” variable is “good enough”, nor even that having things be “more continuous” is “better”.* This is a very general problem: most of economic theory, and most of the math underlying econometrics, technically requires the variables we are working with to be continuous. But all of the variables actually used in empirical economics are discrete: the minimum granularity of money is cents (or arguably mills); for time, we never measure anything below seconds.**

None of this means that the mathematical and statistical tools we use don’t work. On the contrary, they seem to work just fine even when things are obviously discrete. The canonical example of ignoring discreteness is the “linear probability model”, which has been rehabilitated in the eyes of economists (in particular Josh Angrist). We seem to have learned, as a discipline, that if the marginal effects computed by a probit are meaningfully different from those that come out of an LPM, the solution is to fix your specification rather than to trust that the error term is normally distributed. I’ve personally learned that pretending things are continuous is also fine in other contexts – for example, by learning how to implement a count model on some of my data only to find that its estimates of marginal effects were identical to OLS to the 4th decimal place.
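
The LPM-versus-probit comparison can be tried on simulated data. The sketch below is illustrative only: the data-generating process, the sample size, the coefficients, and helper names such as `fit_probit` and `probit_ame` are all assumptions made for the example, not anything from A&L or from real data. It is also close to a best case for agreement, because the probit is correctly specified and the regressor is normally distributed.

```python
import math
import random


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ols_slope(xs, ys):
    """Slope from a bivariate OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def fit_probit(xs, ys, iterations=25):
    """Fit P(y = 1 | x) = Phi(a + b * x) by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = a + b * x
            p = min(max(normal_cdf(z), 1e-10), 1.0 - 1e-10)
            d = normal_pdf(z)
            score = d * (y - p) / (p * (1.0 - p))  # score contribution
            w = d * d / (p * (1.0 - p))            # expected-information weight
            g0 += score
            g1 += score * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score vector.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def probit_ame(xs, a, b):
    """Average marginal effect of x implied by the fitted probit."""
    return b * sum(normal_pdf(a + b * x) for x in xs) / len(xs)


# Hypothetical data: a binary outcome generated from a probit in one regressor.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if rng.random() < normal_cdf(-0.3 + 0.6 * x) else 0 for x in xs]

lpm_slope = ols_slope(xs, ys)      # linear probability model
a_hat, b_hat = fit_probit(xs, ys)  # probit coefficients
ame = probit_ame(xs, a_hat, b_hat)

print(f"LPM slope:  {lpm_slope:.4f}")
print(f"Probit AME: {ame:.4f}")
```

In this setup the two numbers should land close together. A design with a skewed regressor or a misspecified index would be a tougher comparison, and that is the case where the advice above about fixing the specification applies.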

Pointing out discreteness as a statistical concern, or an issue with someone’s model, is usually just a cheap “gotcha”. Yes, it’s technically a problem. But it’s technically a problem with every economics paper that uses continuity – which is a lot of papers. As a discipline, economics seems to be strikingly inconsistent on whether we worry about continuity. We usually ignore it when working with discrete quantities like money or hours worked or years of education or test scores, and nobody complains. There’s no good reason to criticize the use of variables that are discrete at the level of whole numbers while not objecting equally to the use of variables that are discrete at the level of hundredths of a whole number.
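
A small simulation shows how little the granularity of a regressor can matter in one simple setting. Everything in it is an assumption made for illustration: the linear model, the noise levels, and a regressor whose spread is large relative to the rounding step. It is a sketch of the point, not a proof that coarser variables are harmless in general.

```python
import random


def ols_slope(xs, ys):
    """Slope from a bivariate OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: an underlying "continuous" regressor (up to float
# precision) and a linear outcome with noise.
rng = random.Random(1)
n = 5000
true_slope = 2.0
x_cont = [rng.gauss(50.0, 10.0) for _ in range(n)]
y = [1.0 + true_slope * x + rng.gauss(0.0, 5.0) for x in x_cont]

# The same regressor as it might be recorded at two granularities.
x_cents = [round(x, 2) for x in x_cont]      # discrete at hundredths
x_whole = [float(round(x)) for x in x_cont]  # discrete at whole numbers

slope_cont = ols_slope(x_cont, y)
slope_cents = ols_slope(x_cents, y)
slope_whole = ols_slope(x_whole, y)

print(f"continuous:    {slope_cont:.4f}")
print(f"hundredths:    {slope_cents:.4f}")
print(f"whole numbers: {slope_whole:.4f}")
```

Here the rounding step is small relative to the standard deviation of the regressor, so all three slopes should nearly coincide. Rounding to a step that is large relative to that spread would be a different story, but then the concern is coarseness, and it applies in degree to every granularity.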

* That doesn’t mean that there is no such proof. However, if such a proof exists, Attanasio and Lechene don’t cite it, and neither do other researchers who insist that relatively more-discrete variables are more problematic.
** Broadly informed readers might also note that according to the best of our knowledge, almost nothing is actually continuous, which doesn’t do much to limit our ability to use calculus to understand the physical world.