Monday, September 28, 2015

A bit of pushback against the empirical tide

There has naturally been a bit of pushback against empiricist triumphalism in econ. Here are a couple of blog posts that I think represent the pushback fairly well, and probably echo some of the things being said at seminars and the like.

First, Ryan Decker has a post about how the results of natural experiments give you only limited information about policy choices:

[T]he “credibility revolution”...which in my view has dramatically elevated the value and usefulness of the profession, typically produces results that are local to the data used. Often it's reasonable to assume that the "real world" is approximately linear locally, which is why this research agenda is so useful and successful. But...the usefulness of such results declines as the policies motivated by them get further from the specific dataset with which the results were derived. The only way around this is to make assumptions about the linearity of the “real world”[.] (emphasis mine)

Great point. For example, suppose one city hikes minimum wages from $10 to $11, and careful econometric analysis shows that the effect on employment was tiny. We can probably assume that going to $11.50 wouldn't be a lot worse. But how about $13? How about $15? By the time we try to push our luck all the way to $50, we're almost certainly going to be outside of the model's domain of applicability.
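A toy numerical sketch of this extrapolation problem, with an employment response function invented purely for illustration (nothing here is estimated from real data):

```python
# Hypothetical employment response to a minimum wage level w (in dollars).
# The functional form is invented for illustration: roughly flat near $10,
# with increasingly negative effects at high wage levels.
def employment(w):
    return 100.0 - 0.1 * (w - 10.0) - 0.05 * max(w - 12.0, 0.0) ** 2

# A natural experiment at the $10 -> $11 hike identifies only the local slope.
local_slope = employment(11.0) - employment(10.0)

# Extending that local result linearly...
def linear_forecast(w):
    return employment(10.0) + local_slope * (w - 10.0)

# ...tracks the "truth" near the data but diverges badly far from it.
err_at_11_50 = abs(linear_forecast(11.5) - employment(11.5))  # tiny
err_at_15 = abs(linear_forecast(15.0) - employment(15.0))     # noticeable
err_at_50 = abs(linear_forecast(50.0) - employment(50.0))     # enormous
```

The forecast is excellent at $11.50, shaky at $15, and wildly wrong at $50, which is exactly the "domain of applicability" point: the local estimate carries no information about how far the local linear approximation holds.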

I have not seen economists spend much time thinking about domains of applicability (what physicists usually call "scope conditions"). But it's an important topic to think about.

Ryan doesn't say it, but his post also shows one reason why natural experiments are still not as good as lab experiments. With lab experiments you can retest and retest a hypothesis over a wide set of different conditions. This allows you to effectively test whole theories. Of course, at some point your ability to build ever bigger particle colliders will fail, so you can never verify that you have The Final Theory of Everything. But you can get a really good sense of whether a theory is reliable for any practical application.

Not so in econ. You have to take natural experiments as they come. You can test hypotheses locally, but you usually can't test whole theories. There are exceptions, especially in micro, where for example you can test out auction theories over a huge range of auction situations. But in terms of policy-relevant theories, you're usually stuck with only a small epsilon-sized ball of knowledge, and no one tells you how large epsilon is.

This, I think, is why economists talk about "theory vs. data", whereas you almost never hear lab scientists frame it as a conflict. In econ policy-making or policy-recommending, you're often left with a choice of A) extending a local empirical result with a simple linear theory and hoping it holds, or B) buying into a complicated nonlinear theory that sounds plausible but which hasn't really been tested in the relevant domain. That choice is really what the "theory vs. data" argument is all about.

Anyway, the second blog post is Kevin Grier on Instrumental Variables. Grier basically says IV sucks and you shouldn't use it, because people can always easily question your identification assumptions:

First of all, no matter what you may have read or been taught, identification is always and everywhere an ASSUMPTION. You cannot prove your IV is valid...

I pretty much refuse to let my grad students go on the market with an IV in the job market paper. No way, no how. Even the 80 year old deadwoods in the back of the seminar room at your job talk know how to argue about the validity of your instruments. It's one of the easiest ways to lose control of your seminar.

We've had really good luck placing students who used Diff in diff (in diff), propensity score matching, synthetic control, and even regression discontinuity. All of these approaches have their own problems, but they are like little grains of sand compared to the boulder-sized issues in IV.

He's absolutely right about the seminar thing. Every IV seminar degenerates into hand-waving about whether the instrument is valid. And he doesn't even mention weak instruments, another big problem with IV that has been recognized for decades.
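The weak-instrument problem is easy to see in a small Monte Carlo sketch (all parameters invented): with a strong first stage the IV estimate centers on the true effect, while with a weak one it is pulled toward the confounded OLS answer.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 300

def median_iv_estimate(strength):
    """Median Wald/IV estimate over many simulated samples."""
    estimates = []
    for _ in range(reps):
        z = rng.normal(size=n)                  # instrument
        u = rng.normal(size=n)                  # unobserved confounder
        x = strength * z + u + rng.normal(size=n)
        y = 1.0 * x + u + rng.normal(size=n)    # true causal effect = 1
        estimates.append(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])
    return np.median(estimates)

strong = median_iv_estimate(1.0)   # centers near the true effect of 1
weak = median_iv_estimate(0.05)    # pulled toward the biased OLS answer
```

With the weak first stage, the IV sampling distribution is centered near what naive OLS would give you, so the "fix" quietly reintroduces the bias it was supposed to remove.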

Now, Kevin is being hyperbolic when he categorically rejects IV as a technique. If you find a great instrument, it's really no different than regression discontinuity. And when you find a really good instrument, even the "deadwoods" in the back of the room are going to recognize it.

As for IV's weakness on the job market, that's probably partly because it has been eclipsed by newer methods that haven't been around long enough to attract the same scrutiny. If and when people overuse those methods, it's highly probable that critics will start making a lot of noise about their limitations too. And as Ed Leamer reminds us, there will always be holes to poke.

Anyway, these posts both make good points, though Kevin's is a little over-the-top. Any research trend will have a pushback. In a later, more pompous/wanky post, I'll try to think about how this will affect the overall trend toward empiricism in econ... (Update: Here you go!)

24 comments:

I don't think Kevin is being hyperbolic and I think he is right to reject IV. There are arbitrarily many instruments you could have selected. Your mind rejects many of them subconsciously. Why? You don't know. The instruments you do select may be the result of your own brain fooling itself into selecting instruments that pass the tests given the data. And how do other researchers know if you ran the equivalent of batch file regressions on different sets of instruments until you found ones that rejected the nulls? The technique lends itself to intellectual dishonesty.

By random chance, some candidate instruments will falsely test as strong and as uncorrelated with the errors. If you run through enough instruments, you will necessarily turn up false positives. You may not even realize that this is what you did.
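A quick simulation shows the mechanism (sample size, number of candidates, and seed are all arbitrary): pure-noise candidate "instruments" clear a conventional first-stage relevance test about 5% of the time, so mining enough of them guarantees some will pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)   # the endogenous regressor (its details don't matter here)

# Try many candidate instruments that are, by construction, pure noise.
n_candidates = 500
passed = 0
for _ in range(n_candidates):
    z = rng.normal(size=n)                    # irrelevant by construction
    r = np.corrcoef(z, x)[0, 1]
    t = r * np.sqrt((n - 2) / (1 - r**2))     # first-stage t-statistic
    if abs(t) > 1.96:                         # "relevant" at the 5% level
        passed += 1

false_positive_rate = passed / n_candidates   # hovers around 0.05
```

Out of 500 noise candidates, roughly 25 look "relevant" by the usual test. A researcher who keeps trying instruments until one passes, then reports only that one, has effectively run this loop.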

You could say exactly the same thing about confounding variables, but the practice of blindly conditioning on possible confounders remains very popular. Every decision to categorize a variable as confounding, instrumental, or irrelevant amounts to assuming a causal model; there is nothing special about IVs in this respect. One has the choice of explicitly identifying one's causal assumptions, or of ignoring the issue and hoping nobody notices.

It is possible to find combinations of datasets and causal models that produce "Simpson's Paradox machines": by progressively conditioning on factor 1, then factors 1 and 2, then factors 1, 2, and 3, one can reverse the conclusion at each step. It is common for researchers to say "our findings are robust to conditioning on factors A, B, and C", and critical attention usually focuses on unnamed factors D, E, etc. But how many papers explicitly state that their findings hold up when conditioning on {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, and {A,B,C}?
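A minimal sketch of such a sign reversal (the two groups and all numbers are invented purely to manufacture the paradox):

```python
import numpy as np

rng = np.random.default_rng(1)

# Within each group y falls with x, but the high-x group also has a much
# higher baseline, so pooling the groups flips the sign of the slope.
x1 = rng.uniform(0, 1, 100)
y1 = 1.0 - x1 + rng.normal(0, 0.05, 100)
x2 = rng.uniform(2, 3, 100)
y2 = 5.0 - x2 + rng.normal(0, 0.05, 100)

def slope(x, y):
    """OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

pooled = slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))  # positive
within1, within2 = slope(x1, y1), slope(x2, y2)                     # both negative
```

Whether the pooled or the within-group slope is the "right" answer depends entirely on the assumed causal role of the grouping variable, which is the commenter's point: the choice of what to condition on is itself a causal model.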

A particularly egregious recent example from health research claimed that "high" exercise was worse for health than "moderate" exercise - after conditioning on BMI, blood pressure, blood lipids, and a bunch of other stuff that most people would assume is instrumental.

Ben, you think people really run through a whole bunch of candidate instruments? I think they're hard enough to find or think of that this probably only rarely happens (except in time series, when you're using lags as instruments...ugh).

I think that most of the time the instruments test as weak or as correlated with the error. They throw them out and try again. The trying again is a problem. The researcher has already seen the data and this will influence the choice. How many instruments are rejected subconsciously without running any tests? This is a problem. And how would anyone critiquing the results know if they had run through thousands of instruments?

Noah, you are right that this is part of effective rhetoric. Ed Prescott's black-and-white crusading style put you off. But to quote a wise man: "maybe that's just how you have to be if you want to get things done in the sciences!"

The locality argument is a major issue with all of econ, in my opinion. I came over from engineering and spent the first year or so of grad school just being like, are you guys crazy? As someone who writes code, the first thing I do when I see a formula is plug in zero and see if it breaks. In econ, it always breaks. Are capital and labor substitutes? Not at the limit.

But I've come to accept the "at the margin" argument for everything. :) My chief complaint at this point is that economists don't make that clear when talking to the public. For example, should we have fiscal stimulus? Well, at the limit can we just do more and more fiscal stimulus until GDP grows at 10% per year? Probably not... so just explain that. Otherwise people reject the whole model out of hand because it doesn't pass those kind of common-sense tests.

"I have not seen economists spend much time thinking about domains of applicability (what physicists usually call "scope conditions"). But it's an important topic to think about."

Noah, this is completely ridiculous. Ironically with respect to your second item, IV is where this has been done the best--Angrist and Imbens showed that IV identifies a LATE (*local* average treatment effect) for compliers and not an ATE for the population, which directly codifies these "scope conditions." I'm sure there are people on the internet who don't realize you're completely out of your domain--is it too much to ask for you to even slightly qualify ridiculous sweeping statements like in most of this entire post?
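Angrist and Imbens' LATE point can be illustrated with a toy simulation (population shares and effect sizes invented for illustration): when compliers' treatment effects differ from everyone else's, the IV/Wald estimate recovers the complier average, not the population average.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Compliers (30% of the population) have larger treatment effects than
# everyone else -- numbers chosen so that LATE and ATE differ sharply.
is_complier = rng.random(n) < 0.3
effect = np.where(is_complier, 2.0, 0.0) + rng.normal(0, 1, n)

z = rng.integers(0, 2, n)                     # randomized binary instrument
always_taker = (~is_complier) & (rng.random(n) < 0.5)
d = np.where(is_complier, z, np.where(always_taker, 1, 0))  # treatment taken

y = effect * d + rng.normal(0, 1, n)

# Wald/IV estimate: reduced form divided by first stage.
iv_est = ((y[z == 1].mean() - y[z == 0].mean())
          / (d[z == 1].mean() - d[z == 0].mean()))

late = effect[is_complier].mean()   # what IV identifies (about 2)
ate = effect.mean()                 # population-average effect (about 0.6)
```

The IV estimate lands on the complier-only average, far from the population average, which is exactly the sense in which the estimand is "local."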

Also, I don't think what you just said is a good example of thinking about scope conditions. Sure, we know the results are local. How local are they? How small is a small change? I haven't seen much effort put into answering those questions, and your comment fits with my own observations.

Is your problem with theory itself, or with theory that has not been properly tested? I see two types of theory. Type 1 is models that make qualitative but not quantitative predictions. I see those models as pretty limited. One exception is that qualitative theory can sometimes help with identification (for example, sign restrictions or Minnesota priors in VARs). Qualitative theory can also help evaluate an empirical exercise: if the data say demand slopes up, we should be skeptical of the results. The second type is models that make quantitative predictions. These are the most useful and interesting, but also the most abused. The problem, as I see it, is that we frequently write down models and then assume they are true in order to make empirical predictions or answer policy questions. That is a problem with our methodology, not with theory. We need a metric for evaluating a model before taking its predictions seriously. In the end, economics is mostly about identification. Natural experiments are hard to come by and valid instruments are tough to find. Properly used, theory is an adjunct to data and needs to be integrated with it.

To me, it seems that economists expect too much from empirical papers. Reinhart-Rogoff is a classic example: the profession literally took *one* paper and treated it as the truth, to the point where economists were torturing their data to get a similar result. Hard sciences are also limited by the conditions they set on any given experiment. In a way, instruments as economists use them and control variables as scientists use them are analogous, although scientists obviously have far greater control than we do. The problem is that we often fail to take the results of a body of literature and do meta-analytical work to see what conditions affect various theories, then recraft our theories to fit. Ultimately, any science is only as good as the questions its practitioners choose to ask, and at present most economists are trained not to ask certain questions, or to regard results that don't agree with theory as spurious.

OK, so empirical data of the kinds economists are generally able to collect can give one only limited information about the likely outcomes of the relevant alternative policy choices. But a priori theoretical reasoning unconstrained by empirical data gives one even less information about the likely outcomes of the relevant alternative policy choices. So I guess the conclusion is that the methods available to economics do not provide economists with a lot of information about the likely outcomes of the relevant alternative policy choices, and that economists therefore do not possess a lot of information about the likely outcomes of the relevant alternative policy choices.