’Metrics Monday: You Can’t Compare OLS with 2SLS

Suppose you are interested in the effect of a treatment variable D on some outcome Y, and you have some controls X. You can thus estimate the following equation by ordinary least squares (OLS):

(1) Y = a + bX + cD + e.

As is so often the case in the social sciences, the problem is that it is not true that E(D’e) = 0, i.e., D is endogenous to Y, and so estimating equation 1 by OLS means that the estimated coefficient (let’s call it c_{OLS}, for simplicity) is biased, meaning that its expected value will not equal the true value c of the coefficient.

Suppose further that you have an instrumental variable (IV) Z for the (endogenous) treatment variable D. Assume Z is a valid IV: it explains enough of the variation in D (i.e., it is not weak) and, perhaps more importantly, it meets the exclusion restriction in that it only affects Y through D. You can thus estimate the following two equations by two-stage least squares (2SLS):

(2) D = f + gX + hZ + u, and

(3) Y = a’ + b’X + c’D + e.

Let’s re-label the coefficient c’ and call it c_{2SLS} for simplicity.
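To make the two estimators concrete, here is a minimal simulation sketch in plain Python. Everything about the data-generating process is an assumption made for illustration: the coefficients, the unobserved confounder v, and the sample size are all hypothetical, the controls X are left out, and the treatment effect is held constant across observations. With a single instrument and no controls, 2SLS reduces to the ratio cov(Z, Y) / cov(Z, D), which is what the code computes.

```python
import random

def cov(xs, ys):
    """Covariance with a 1/n divisor (the divisor cancels in the ratios below)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data-generating process, chosen only for illustration.
rng = random.Random(0)
c_true = 2.0
Z, D, Y = [], [], []
for _ in range(20000):
    z = rng.gauss(0, 1)                    # instrument: affects Y only through D
    v = rng.gauss(0, 1)                    # unobserved confounder: drives both D and Y
    d = 0.8 * z + v + rng.gauss(0, 1)      # first stage (equation 2, without X)
    y = 1.0 + c_true * d + 1.5 * v + rng.gauss(0, 1)  # outcome; 1.5*v ends up in e
    Z.append(z)
    D.append(d)
    Y.append(y)

c_ols = cov(D, Y) / cov(D, D)    # slope from regressing Y on D
c_2sls = cov(Z, Y) / cov(Z, D)   # just-identified IV, equal to 2SLS with one instrument

print(f"true c = {c_true}, c_OLS = {c_ols:.2f}, c_2SLS = {c_2sls:.2f}")
```

Under these assumptions the OLS slope converges to c + 1.5/2.64, or roughly 2.57, because D is correlated with the confounder, while the IV ratio should land close to the true value of 2.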

One thing I still read in manuscripts or hear in seminars way too often is people comparing c_{OLS} and c_{2SLS} as though they estimate the same thing.

It usually goes something like this: Someone presents OLS and 2SLS results, and then they (or someone in the audience) will compare the OLS and 2SLS coefficients. If c_{OLS} > (<) c_{2SLS}, they will say something like “Ignoring endogeneity concerns leads to overstating (understating) the relationship between D and Y.”

The problem is that you can’t compare OLS and 2SLS coefficients. At least not that way.

To see why you can’t compare them as though they estimate the same thing, recall that in the best-case scenario, 2SLS estimates a local average treatment effect (LATE) whereas OLS estimates an average treatment effect (ATE) if E(D’e) = 0.

The LATE is the effect of D on Y for those observations that were induced to take up treatment D by the instrument Z.

Not for the entire sample. No: It’s the ATE for the subset of observations which were induced to take up the treatment because of the value of the IV (i.e., the subset of compliers).

It’s perhaps easier to see who the LATE does not apply to. The LATE does not apply to those observations who would have taken up the treatment no matter what the value of the instrument was (i.e., the always-takers), nor does it apply to those observations who would have never taken up the treatment no matter what the value of the instrument was (i.e., the never-takers).

Who’s a complier? Who’s an always-taker? Who’s a never-taker? Generally, you can’t tell. When you randomly assign an observation to treatment and that observation does not take the treatment, you can tell that it is a never-taker. Conversely, when you randomly assign an observation to control and that observation takes the treatment, you can tell that it is an always-taker. But outside of this random assignment (say, with observational data), it can be impossible to neatly define those three populations.
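The three populations are easier to see in a simulation, where (unlike with real data) each observation's type is known. The sketch below is hypothetical throughout: the type shares, the effect sizes in EFFECT, and the binary randomly assigned instrument are assumptions, and it rules out defiers. It computes the true ATE, the true average effect among compliers, and the IV (Wald) ratio, and it records which types show up in each observable (Z, D) cell.

```python
import random

rng = random.Random(1)
# Hypothetical treatment effects by type; compliers differ from everyone else.
EFFECT = {"always": 3.0, "never": 3.0, "complier": 1.0}

rows = []
for _ in range(50000):
    u = rng.random()
    kind = "always" if u < 0.2 else "never" if u < 0.5 else "complier"
    z = rng.randint(0, 1)                              # randomly assigned instrument
    d = {"always": 1, "never": 0, "complier": z}[kind]  # take-up rule for each type
    y = rng.gauss(0, 1) + EFFECT[kind] * d             # observed outcome
    rows.append((kind, z, d, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# True effects, computable only because the simulation reveals each type.
ate = mean(EFFECT[k] for k, _, _, _ in rows)
late = mean(EFFECT[k] for k, _, _, _ in rows if k == "complier")

# Wald/IV estimate: effect of Z on Y divided by effect of Z on D.
reduced_form = (mean(y for _, z, _, y in rows if z == 1)
                - mean(y for _, z, _, y in rows if z == 0))
first_stage = (mean(d for _, z, d, _ in rows if z == 1)
               - mean(d for _, z, d, _ in rows if z == 0))
wald = reduced_form / first_stage

# Which types appear in each observable (z, d) cell?
cells = {}
for kind, z, d, _ in rows:
    cells.setdefault((z, d), set()).add(kind)

print(f"ATE = {ate:.2f}, LATE = {late:.2f}, IV estimate = {wald:.2f}")
for cell in sorted(cells):
    print(cell, sorted(cells[cell]))
```

In this setup the IV ratio should track the complier effect rather than the ATE. The cell listing shows the identification problem: assigned-but-untreated observations are all never-takers and unassigned-but-treated observations are all always-takers, but the other two cells each mix compliers with one of the other types.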

So to compare c_{OLS} with c_{2SLS} is as misleading as comparing c_{OLS} estimated on your full sample with c_{OLS} estimated on an unknown sub-sample of that sample.

They will generally not be the same thing, and that is why external validity concerns are usually mentioned in the same breath as the LATE when someone estimates one: you may have the perfect random sample, but a LATE tends to only estimate the ATE for a subset of that sample, and that subset is generally unknown. To be valid, any comparison between OLS and 2SLS has to mention the caveat that the two estimands apply to different populations.

I thought this was entirely obvious, but I still see or hear the comparison being made way too often, and over the course of this series of posts, I have learned that things are only obvious once you know them.