Angus Deaton, 2015 Nobel Winner: A Prize for Structural Analysis?

Angus Deaton, the Scottish-born, Cambridge-trained Princeton economist, best known for his careful work on measuring the changes in wellbeing of the world’s poor, has won the 2015 Nobel Prize in economics. His data collection is fairly easy to understand, so I will leave larger discussion of exactly what he has found to the general news media; Deaton’s book “The Great Escape” provides a very nice summary of what he has found as well, and I think a fair reading of his development preferences is that he much prefers the currently en vogue idea of just giving cash to the poor and letting them spend it as they wish.

Essentially, when one carefully measures consumption, health, or generic characteristics of wellbeing, there has been tremendous improvement indeed in the state of the world’s poor. National statistics do not measure these ideas well, because developing countries do not tend to track data at the level of the individual. Indeed, even in the United States, we have only recently begun work on localized measures of the price level and hence the poverty rate. Deaton claims, as in his 2010 AEA Presidential Address (previously discussed briefly on two occasions on AFT), that many of the measures of global inequality and poverty used by the press are fundamentally flawed, largely because of the weak theoretical justification for how they link prices across regions and countries. Careful non-aggregate measures of consumption, health, and wellbeing, like those generated by Deaton, Tony Atkinson, Alwyn Young, Thomas Piketty and Emmanuel Saez, are essential for understanding how human welfare has changed over time and space, and producing them is a deserving rationale for a Nobel.

The surprising thing about Deaton, however, is that despite his great data-collection work and his interest in development, he is famously hostile to the “randomista” trend which proposes that randomized control trials (RCT) or other suitable tools for internally valid causal inference are the best way of learning how to improve the lives of the world’s poor. This mode is most closely associated with the enormously influential J-PAL lab at MIT, and there is no field in economics where you are less likely to see traditional price theoretic ideas than modern studies of development. Deaton is very clear on his opinion: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft”… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.” I would argue that Deaton’s work is much closer to more traditional economic studies of development than to RCTs.

To understand this point of view, we need to go back to Deaton’s earliest work. Among Deaton’s most famous early papers was his well-known development of the Almost Ideal Demand System (AIDS) in 1980 with Muellbauer, a paper chosen as one of the 20 best published in the first 100 years of the AER. It has long been known that individual demand equations which come from utility maximization must satisfy certain properties. For example, a rational consumer’s demand for food should not depend on whether the consumer’s equivalent real salary is paid in American or Canadian dollars. These restrictions turn out to be useful in that if you want to know how demand for various products depends on changes in income, among many other questions, the restrictions of utility theory simplify estimation greatly by reducing the number of free parameters. The problem is in specifying a form for aggregate demand, such as how demand for cars depends on the incomes of all consumers and prices of other goods. It turns out that, in general, aggregate demand generated by utility-maximizing households does not satisfy the same restrictions as individual demand; you can’t simply assume that there is a “representative consumer” with some utility function and demand function equal to those of each individual agent. What form should we write for aggregate demand, and how congruent is that form with economic theory? Surely an important question if we want to estimate how a shift in taxes on some commodity, or a policy of giving some agricultural input to some farmers, is going to affect demand for output, its price, and hence welfare!

Let q(j)=D(p,c,e) say that the quantity of j consumed in aggregate is a function of the prices of all goods p and the total consumption (or average consumption) c, plus perhaps some random error e. This can be tough to estimate: if D(p,c,e)=Ap+e, where demand is just a linear function of relative prices, then we have a k-by-k matrix to estimate, where k is the number of goods. Worse, that demand function is also imposing an enormous restriction on what individual demand functions, and hence utility functions, look like, in a way that theory does not necessarily support. The AIDS of Deaton and Muellbauer combines two facts: Taylor expansions approximately linearize nonlinear functions, and individual demand can be aggregated even when heterogeneous across individuals if the restrictions of Muellbauer’s PIGLOG papers are satisfied. Together these yield a functional form for aggregate demand D which is consistent with aggregated individual rational behavior and which can sometimes be estimated via OLS. They use British data to argue that aggregate demand violates testable assumptions of the model and hence factors like credit constraints or price expectations are fundamental in explaining aggregate consumption.
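
For readers who want the functional form on the page, the system is usually written in budget shares rather than quantities. The block below is a sketch of the standard statement of the 1980 model, not something derived in this post. The notation is my own choice: w_j is the budget share of good j, P is a price index, the Greek letters are parameters, and c is kept as total expenditure to match the q(j)=D(p,c,e) notation above.

```latex
% AIDS budget-share equation for good j (k goods in total)
w_j = \alpha_j + \sum_{i=1}^{k} \gamma_{ji} \ln p_i + \beta_j \ln\!\left(\frac{c}{P}\right)

% Translog price index used to deflate total expenditure c
\ln P = \alpha_0 + \sum_{i=1}^{k} \alpha_i \ln p_i
        + \frac{1}{2} \sum_{i=1}^{k} \sum_{l=1}^{k} \gamma_{il} \ln p_i \ln p_l

% Restrictions implied by utility maximization
\sum_{j} \alpha_j = 1, \quad \sum_{j} \gamma_{ji} = 0, \quad \sum_{j} \beta_j = 0   % adding-up
\sum_{i} \gamma_{ji} = 0                                                             % homogeneity
\gamma_{ji} = \gamma_{ij}                                                            % symmetry

% Linear approximation: replace P with the Stone index, computable from the data
\ln P^{*} = \sum_{i=1}^{k} w_i \ln p_i
```

Once the Stone index is substituted for P, each share equation is linear in its parameters, which is what makes equation-by-equation OLS feasible. Homogeneity and symmetry then become restrictions you can test on the estimated coefficients rather than assumptions you must take on faith.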

This exercise brings up a number of first-order questions for a development economist. First, it shows clearly the problem with estimating aggregate demand as a purely linear function of prices and income, as if society were a single consumer. Second, it shows the importance of how we measure the overall price level in figuring out the effects of taxes and other policies. Third, it combines theory and data to convincingly suggest that models which estimate demand solely as a function of current prices and current income are necessarily going to give misleading results, even when demand is allowed to take on very general forms as in the AIDS model. A huge body of research since 1980 has investigated how we can better model demand in order to credibly evaluate demand-affecting policy. All of this is very different from how a certain strand of development economist today might investigate something like a subsidy. Rather than taking observational data, these economists might look for a random or quasirandom experiment where such a subsidy was introduced, and estimate the “effect” of that subsidy directly on some quantity of interest, without concern for how exactly that subsidy generated the effect.

To see the difference between randomization and more structural approaches like AIDS, consider the following example from Deaton. You are asked to evaluate whether China should invest more in building railway stations if they wish to reduce poverty. Many economists trained in a manner influenced by the randomization movement would say, well, we can’t just regress the existence of a railway on a measure of city-by-city poverty. The existence of a railway station depends on both things we can control for (the population of a given city) and things we can’t control for (subjective belief that a town is “growing” when the railway is plopped there). Let’s find something that is correlated with rail station building but uncorrelated with the random component of how rail station building affects poverty: for instance, a city may lie on a geographically-accepted path between two large cities. If certain assumptions hold, it turns out that a two-stage “instrumental variable” approach can use that “quasi-experiment” to generate the LATE, or local average treatment effect. This effect is the average benefit of a railway station on poverty reduction, at the local margin of cities which are just induced by the instrument to build a railway station. Similar techniques, like difference-in-difference and randomized control trials, under slightly different assumptions can generate credible LATEs. In development work today, it is very common to see a paper where large portions are devoted to showing that the assumptions (often untestable) of a given causal inference model are likely to hold in a given setting, then finally claiming that the treatment effect of X on Y is Z. That LATEs can be identified outside of purely randomized contexts is incredibly important and valuable, and the economists and statisticians who did the heavy statistical lifting on this so-called Rubin model will absolutely and justly win an Economics Nobel sometime soon.
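
To make the mechanics concrete, here is a small simulated sketch of the railway example. Everything in it is invented for illustration: the data-generating process, the effect sizes, and names like simulate_cities and wald_iv are my assumptions, not Deaton's numbers or anyone's actual study design. With a single binary instrument and no controls, two-stage least squares reduces to the Wald ratio computed below.

```python
import random

# Hypothetical effect sizes, chosen only for illustration.
COMPLIER_EFFECT = 2.0      # cities that build a station only if they lie on the path
ALWAYS_TAKER_EFFECT = 5.0  # cities that build a station regardless of the path


def simulate_cities(n=100_000, seed=0):
    """Return a list of (on_path, has_station, poverty_reduction) tuples."""
    rng = random.Random(seed)
    cities = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0  # instrument: lies between two big cities
        u = rng.gauss(0.0, 1.0)             # unobserved growth prospects
        if u > 1.0:                         # booming city: builds regardless
            d, effect = 1, ALWAYS_TAKER_EFFECT
        elif u < -1.0:                      # stagnant city: never builds
            d, effect = 0, 0.0
        else:                               # complier: builds only if on the path
            d, effect = z, COMPLIER_EFFECT
        y = u + effect * d + rng.gauss(0.0, 1.0)
        cities.append((z, d, y))
    return cities


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(cities):
    """Compare cities with and without stations, ignoring why they differ."""
    treated = mean(y for _, d, y in cities if d == 1)
    untreated = mean(y for _, d, y in cities if d == 0)
    return treated - untreated


def wald_iv(cities):
    """Reduced-form effect of the instrument divided by its first-stage effect."""
    reduced_form = (mean(y for z, _, y in cities if z == 1)
                    - mean(y for z, _, y in cities if z == 0))
    first_stage = (mean(d for z, d, _ in cities if z == 1)
                   - mean(d for z, d, _ in cities if z == 0))
    return reduced_form / first_stage


cities = simulate_cities()
naive = naive_difference(cities)
late = wald_iv(cities)
print(f"naive difference in means: {naive:.2f}")
print(f"IV (Wald) estimate:        {late:.2f}")
```

In this setup the IV ratio lands near the complier effect rather than the larger effect assumed for cities that would have built a station anyway, while the naive comparison is inflated by the unobserved growth prospects. That is the "local" in LATE: the estimate describes the cities moved by the instrument and nobody else.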

However, this use of instrumental variables would surely seem strange to the old Cowles Commission folks: Deaton is correct that “econometric analysis has changed its focus over the years, away from the analysis of models derived from theory towards much looser specifications that are statistical representations of program evaluation. With this shift, instrumental variables have moved from being solutions to a well-defined problem of inference to being devices that induce quasi-randomization.” The traditional use of instrumental variables was that after writing down a theoretically justified model of behavior or aggregates, certain parameters – not treatment effects, but parameters of a model – are not identified. For instance, price and quantity transacted are determined by the intersection of aggregate supply and aggregate demand. Knowing, say, that price and quantity are (a,b) today and (c,d) tomorrow does not let me figure out the shape of either the supply or demand curve. If price and quantity both rise, it may be that demand alone has increased, pushing the demand curve to the right, or that demand has increased while the supply curve has also shifted to the right a small amount, or many other outcomes. An instrument that shifts supply without changing demand, or vice versa, can be used to “identify” the supply and demand curves: an exogenous change in the price of oil will affect the price of gasoline without much of an effect on the demand curve, and hence we can examine price and quantity transacted before and after the oil supply shock to trace out the slope of the demand curve (a demand-side shifter would do the same for supply).
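
The same point can be seen in a toy simulation. This is a minimal sketch under assumed linear supply and demand curves; the parameter values and function names are hypothetical and chosen only to illustrate the logic. Regressing quantity on price mixes the two curves together, while a supply shifter that is excluded from demand recovers the demand slope.

```python
import random

# Hypothetical structural parameters, chosen only for illustration.
DEMAND_INTERCEPT = 10.0
DEMAND_SLOPE = -1.5     # quantity demanded falls by 1.5 per unit of price
SUPPLY_INTERCEPT = 2.0
SUPPLY_SLOPE = 1.0
OIL_COST_EFFECT = -1.0  # a higher oil cost shifts supply in; demand is unaffected


def simulate_market(n=50_000, seed=1):
    """Return lists of oil cost, equilibrium price and equilibrium quantity."""
    rng = random.Random(seed)
    oil, price, quantity = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)    # supply-side instrument
        u_d = rng.gauss(0.0, 1.0)  # unobserved demand shock
        u_s = rng.gauss(0.0, 1.0)  # unobserved supply shock
        # Set demand equal to supply and solve for the market-clearing price.
        p = (DEMAND_INTERCEPT - SUPPLY_INTERCEPT
             - OIL_COST_EFFECT * z + u_d - u_s) / (SUPPLY_SLOPE - DEMAND_SLOPE)
        q = DEMAND_INTERCEPT + DEMAND_SLOPE * p + u_d
        oil.append(z)
        price.append(p)
        quantity.append(q)
    return oil, price, quantity


def covariance(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


oil, price, quantity = simulate_market()

# OLS of quantity on price: contaminated by demand shocks that also move price.
ols_slope = covariance(price, quantity) / covariance(price, price)

# IV: use only the price variation driven by the oil cost shifter.
iv_slope = covariance(oil, quantity) / covariance(oil, price)

print(f"true demand slope: {DEMAND_SLOPE:.2f}")
print(f"OLS slope:         {ols_slope:.2f}")
print(f"IV slope:          {iv_slope:.2f}")
```

Note what the instrument delivers here: a parameter of a model that was written down first, which can then be used for counterfactuals such as the incidence of a tax. That is a different object from a treatment effect, even though the arithmetic is nearly identical.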

Note the difference between the supply and demand equation and the treatment effects use of instrumental variables. In the former case, we have a well-specified system of supply and demand, based on economic theory. Once the supply and demand curves are estimated, we can then perform all sorts of counterfactual and welfare analysis. In the latter case, we generate a treatment effect (really, a LATE), but we do not really know why we got the treatment effect we got. Are rail stations useful because they reduce price variance across cities, because they allow for increasing returns to scale in industry to be utilized, or some other reason? Once we know the “why”, we can ask questions like, is there a cheaper way to generate the same benefit? Is heterogeneity in the benefit important? Ought I expect the results from my quasiexperiment in place A and time B to still operate in place C and time D (a famous example being the drug Opren, which was very successful in RCTs but turned out to be particularly deadly when used widely by the elderly)? Worse, the whole idea of LATE is backwards. We traditionally choose a parameter of interest, which may or may not be a treatment effect, and then choose an estimation technique that can credibly estimate that parameter. Quasirandom techniques instead start by specifying the estimation technique and then hunt for a quasirandom setting, or randomize appropriately by “dosing” some subjects and not others, in order to fit the assumptions necessary to generate a LATE. It is often the case that even policymakers do not care principally about the LATE, but rather they care about some measure of welfare impact which rarely is immediately interpretable even if the LATE is credibly known!

Given these problems, why are random and quasirandom techniques so heavily endorsed by the dominant branch of development? Again, let’s turn to Deaton: “There has also been frustration with the World Bank’s apparent failure to learn from its own projects, and its inability to provide a convincing argument that its past activities have enhanced economic growth and poverty reduction. Past development practice is seen as a succession of fads, with one supposed magic bullet replacing another—from planning to infrastructure to human capital to structural adjustment to health and social capital to the environment and back to infrastructure—a process that seems not to be guided by progressive learning.” This is to say, the conditions necessary to estimate theoretical models are so stringent that development economists have been writing noncredible models, estimating them, generating some fad of programs that is used in development for a few years until it turns out not to be a silver bullet, then abandoning the fad for some new technique. Better, the randomistas argue, to forget about external validity for now, and instead just evaluate the LATEs on a program-by-program basis, iterating what types of programs we evaluate until we have a suitable list of interventions that we feel confident work. That is, development should operate like medicine.

We have something of an impasse here. Everyone agrees that on many questions theory is ambiguous in the absence of particular types of data, hence more and better data collection is important. Everyone agrees that many parameters of interest for policymaking require certain assumptions, some more justifiable than others. Deaton’s position is that the parameters of interest to economists by and large are not LATEs, and cannot be generated in a straightforward way from LATEs. Thus, following Nancy Cartwright’s delightful phrasing, if we are to “use” causes rather than just “hunt” for what they are, we have no choice but to specify the minimal economic model which is able to generate the parameters we care about from the data. Glen Weyl’s attempt to rehabilitate price theory and Raj Chetty’s sufficient statistics approach are both attempts to combine the credibility of random and quasirandom inference with the benefits of external validity and counterfactual analysis that model-based structural designs permit.

One way to read Deaton’s prize, then, is as an award for the idea that effective development requires theory if we even hope to compare welfare across space and time or to understand why policies like infrastructure improvements matter for welfare and hence whether their beneficial effects will remain when moved to a new context. It is a prize which argues against the idea that all theory does is propose hypotheses. For Deaton, going all the way back to his work with AIDS, theory serves three roles: proposing hypotheses, suggesting which data is worthwhile to collect, and permitting inference on the basis of that data. A secondary implication, very clear in Deaton’s writing, is that even though the “great escape” from poverty and want is real and continuing, that escape is almost entirely driven by effects which are unrelated to aid and which are uninfluenced by the type of small bore, partial equilibrium policies for which randomization is generally suitable. And, indeed, the best development economists very much understand this point. The problem is that the media, and less technically capable young economists, still hold the mistaken belief that they can infer everything they want to infer about “what works” solely using the “scientific” methods of random- and quasirandomization. For Deaton, results that are easy to understand and communicate, like the “dollar-a-day” poverty standard or an average treatment effect, are less virtuous than results which carefully situate numbers in the role most amenable to answering an exact policy question.

Let me leave you three side notes and some links to Deaton’s work. First, I can’t help but laugh at Deaton’s description of his early career in one of his famous “Notes from America”. Deaton, despite being a student of the 1984 Nobel laureate Richard Stone, graduated from Cambridge essentially unaware of how one ought to publish in the big “American” journals like Econometrica and the AER. Cambridge had gone from being the absolute center of economic thought to something of a disconnected backwater, and Deaton, despite writing a paper that would win a prize as one of the best papers in Econometrica published in the late 1970s, had essentially no understanding of the norms of publishing in such a journal! When the history of modern economics is written, the rise of a handful of European programs and their role in reintegrating economics on both sides of the Atlantic will be fundamental. Second, Deaton’s prize should be seen as something of a callback to the ’84 prize to Stone and ’77 prize to Meade, two of the least known Nobel laureates. I don’t think it is an exaggeration to say that the majority of new PhDs from even the very best programs will have no idea who those two men are, or what they did. But as Deaton mentions, Stone in particular was one of the early “structural modelers” in that he was interested in estimating the so-called “deep” or behavioral parameters of economic models in a way that is absolutely universal today, as well as being a pioneer in the creation and collection of novel economic statistics whose value was proposed on the basis of economic theory. Quite a modern research program! Third, of the 19 papers in the AER “Top 20 of all time” whose authors were alive during the era of the economics Nobel, 14 have had at least one author win the prize. Should this be a cause for hope for the living outliers, Anne Krueger, Harold Demsetz, Stephen Ross, John Harris, Michael Todaro and Dale Jorgenson?

For those interested in Deaton’s work beyond what this short essay covers, his methodological essay, quoted often in this post, is here. The Nobel Prize technical summary, always a great and well-written read, can be found here.

“Should this be a cause for hope for the living outliers, Anne Krueger, Harold Demsetz, Stephen Ross, John Harris, Michael Todaro and Dale Jorgenson?”
You forgot Sandy Grossman, one of the finest financial economists alive. He would also be the first non-academic to win the prize, I guess.

Absolutely he is a contender, but like Dixit, his “top 20” paper is coauthored with an existing Nobel winner. The list I gave is just economists with a “top 20” paper who are alive and whose coauthor, if any, has also not won.

Great post as always. But I found something confusing in your analysis of how the Deaton and Muellbauer work might serve as an alternative to randomista inference. Did you not indicate that by Deaton and Muellbauer’s own analysis the linearized AIDS model failed to account for aggregate outcomes (“aggregate demand violates testable assumptions of the model”)? That being the case, and given the impracticality of fitting the richer model (“this can be tough to estimate”), I’m not seeing what the model-based approach would even be here, regardless of whether you take it to be an alternative for a randomista approach or not.

I think the reason people like randomista inference is because it generates relatively well identified causal facts. Of course, this leaves lots of room for interpretation, and theory has an important role in such interpretation (e.g., in helping us to sort out what might matter among the many factors that define the localness of the estimate).

Are theorists better off using poorly identified estimates that fail to characterize their “localness” to ground their analysis? That makes no sense. (Remember characterizations of localness are things that randomistas themselves discovered and typically insist upon in discussing empirical results.) Perhaps, it is counterproductive to assume that causal identification and theoretical elaboration are at odds with each other?

Andrea basically took the words out of my mouth, but
1) No one is against well-identified causal relationships. The question is merely whether they are a gold standard, or whether they are worse than other forms of inference because they can only answer limited questions, because they are in practice misinterpreted, or because they do not connect literatures and advance science as cleanly as more theoretically informed work. I think Deaton argues, in multiple papers, for problems on all three grounds.
2) Theory and causal identification are absolutely not at odds with each other. Indeed, theory is useful precisely because it helps us move from LATEs to more interesting parameter estimates.
3) On AIDS, the point of that paper initially was that people were making claims about how aggregate demand operated, and in particular that there were symptoms of irrationality, which were not justified given the model they were using. In particular, there are very few aggregate demand functional forms which are consistent with utility maximization, feasible to estimate using OLS or similar techniques, and general (at least approximately). In the British data Deaton and Muellbauer examine, even permitting this general aggregate demand there are anomalies, and knowing this guides empirical investigation about what types of data to collect. In other contexts, where things like credit constraints aren’t first order, AIDS (and translog, and their extensions) are practical and more theoretically justified demand functions which you may actually want to use when, e.g., evaluating a new tax. Note in particular that AIDS informs, on the basis of theory, which parameters you may find interesting – it provides something more than a treatment effect to estimate, and these parameters permit inference on quantities like welfare.

Re moving from LATEs: In important ways, *estimators* for structural parameters don’t differ from those of reduced form parameters and therefore inherit similar, formal, “localness” properties. This can be a problem: if you are trying to tie together results from different estimates to do welfare analysis, the result may be incoherent, like putting an apple peel over an orange…. I suppose that one would hope that an analysis yields estimates of deep parameters that are invariant, in which case localness ceases to be an issue. This is a hope that can be hard to justify on the basis of evidence. I don’t think it is so straightforward to sidestep localness with structure.

One advantage of structural parameters is that they are usually identified by the entire data distribution rather than only, essentially, by conditional means as in standard regression (this is by no means always true).

But even if this wasn’t the case, you seem to be missing the point: welfare analysis is conditional on the model. It’s not straightforward but structure is really the only way. I’m not advocating that it’s superior, only that some questions can ONLY be answered by adding structure. Other questions are better answered without structure. Claims of superiority by either approach are misplaced. So is the pretension that without structure you get a model-free interpretation of your LATE coefficients: it is far-fetched at best, and in general wrong. The best explanation I’ve seen of this is in a paper in JEL 2000 by Rosenzweig and Wolpin.

Good point, Cyrus, and of course the assumption of invariance matters, but I agree we could use more econometric research on how we should handle “almost-invariant” parameters and whether that matters. Andrea is also right that certain claims don’t make sense without a model, such as claims about welfare. A given distribution of data implies different parameters in different models (obviously), and it is very rarely the case that we can learn those parameters trivially by estimating a LATE; the fact that the LATE implicitly drops the ability to do inference off the conditional mean has to imply that any parameter which depends on more than the conditional mean cannot be constructed using methods whose whole purpose is to credibly estimate that mean.

(Btw, I saw your AJPS with Peter recently and enjoyed quite a bit – I’d certainly never realized the weighting issue you point out.)

I appreciate your replies. This has been a useful conversation. It highlights especially the point that the LATE may be uninteresting not necessarily because of the “L” aspect (which, in many cases I think is unavoidable) but because the “ATE” component may be too blunt. A model can help to establish estimable parameters of interest that draw on more information. (Heckman, Smith, and Clements 1997 describe limits to trying to do this via a purely “statistical” approach, that is, one free of a substantive model.)

Cyrus, the “model-based approach” is: “it takes a model to beat a model”. You get answers that are conditional on the model and if the model is impractical to estimate… maybe you just have to wait until it becomes practical.

But Deaton’s point is that RCTs are not the holy grail of inference: the *interpretation* and extension/generalization of their results is not model-free as sometimes implied. You’re right that causal identification and theory do not have to be at odds with each other, but there is no theory-free empirical research.

It goes without saying, but obviously I agree completely. The most RCTish of all the RCT papers eventually makes some sort of claim beyond the mere existence of a LATE: something about generalizability, or welfare, or links to previous literature, or causes. We can either do these exercises in a handwaving way or we can do them in a way informed by 250 years of economics research.

I don’t think this is even that controversial: scholars at top institutions essentially all are willing to use whatever technique is appropriate for answering a given question. The problem, if I read Deaton correctly, is the people who think economics is simply RCT-style medicine, full stop, and who *can’t* write a model or do non-ATE types of inference even if they want to. There are many economists getting PhDs today for whom this is true. If you look at, e.g., Deaton’s work on “dollar a day” and PPP, you will find a guy who is very worried about the sloppy and theoretically unjustified use of empirical economic data.

Based on your conversation, it seems that one could make the case that all empirical economics (at least microeconomic topics) should be based on RCTs whether we want to estimate a local treatment effect or a structural parameter. Deaton’s criticisms of RCTs are directed at a line of work that doesn’t really exist. Increasingly, structural work and RCTs are put together, particularly when researchers aim at generalizing their conclusions. I saw his metaphor of RCTs based on the game “angry birds” and found it very weak. While his work is admirable, I fail to see how his opinion on the topic is useful or even original.

Great conversation. A lot of the criticism of RCTs pertains to most empirical work, but RCTs take the brunt of it because of the way they are promoted. Part of this promotion could come from how poorly evaluations were conducted by governments and NGOs in the past, by any standard. Researchers may promote RCTs in a way that, to quote Dean Acheson, is “clearer than the truth” to get organizations away from these poorly conducted studies. (I wouldn’t argue this is appropriate however!)