Wednesday, May 26, 2010

Unintentionally Harmful Econometrics

Labor economists Joshua Angrist and Jörn-Steffen Pischke have a neat little book on econometrics: Mostly Harmless Econometrics. Alas, the link to anything written by Douglas Adams is a bit strained, but I appreciate their light-hearted approach. Basically, they outline the fundamental problem that vexes econometric research: omitted variables bias.

Say you want to estimate how schooling affects earnings. People who go to school longer have higher earnings. But they also have greater discipline, come from better socio-economic backgrounds, and have higher IQ, all of which might be the true cause. If you don't include these in the regression, you attribute too much benefit to schooling via these omitted variables (that are often difficult--or in the case of IQ, taboo--to measure). So the book goes over all sorts of ways to tease out the true relationship. Foremost in this approach is the Freakonomics method of natural experiments. This is when some arbitrary event creates different samples where the variable of interest (schooling) is different, but the other factors (IQ, wealth, discipline) are the same.
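The mechanics of omitted variables bias are easy to see in a toy simulation. The sketch below uses made-up numbers (a true 5% return to schooling, an 'ability' factor that raises both schooling and wages); none of it comes from the actual studies — it just shows how the short regression overstates the schooling coefficient when ability is left out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: 'ability' raises both
# schooling and log wages; the true return to schooling is 0.05.
ability = rng.normal(size=n)
schooling = 12 + 2 * ability + rng.normal(size=n)
log_wage = 1.0 + 0.05 * schooling + 0.10 * ability + rng.normal(scale=0.5, size=n)

# Short regression (ability omitted): slope = cov(s, w) / var(s).
# The omitted-variable formula predicts 0.05 + 0.10 * cov(a, s)/var(s) = 0.09.
biased = np.cov(schooling, log_wage)[0, 1] / np.var(schooling, ddof=1)

# Long regression including ability recovers the true coefficient.
X = np.column_stack([np.ones(n), schooling, ability])
beta = np.linalg.lstsq(X, log_wage, rcond=None)[0]

print(round(biased, 3))   # noticeably above the true 0.05
print(round(beta[1], 3))  # close to 0.05
```

The short regression credits schooling with ability's contribution because the two move together; the long regression separates them — which is exactly why unmeasurable confounders like IQ are such a headache.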

Basically, you compare group B, which some arbitrary process stopped from getting more schooling, to an otherwise indistinguishable group A that got more schooling. The signature example in the book is the Angrist and Krueger (1991) paper, which takes advantage of the fact that many states required kids to stay in school until age 16, and kids had to start school in September at age 6. Because school years end in June, kids born just before the school year starts would turn 16 just before another year began, while those born just after would be only 15 and have to at least start another year. This causes kids to get different amounts of education merely because of their birth date, which is independent of IQ, discipline, and socio-economic status. A 'natural experiment'.
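The instrumental-variables logic behind this can be sketched in a stylized simulation. This is not the actual Angrist-Krueger design or data — just a hypothetical 'born late' dummy that shifts schooling but, by construction, is unrelated to ability, showing how the simple Wald/IV ratio recovers the true return even though plain OLS is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Stylized version of the birth-timing idea: a 'born_late' dummy
# (the instrument) adds some schooling but is unrelated to ability.
ability = rng.normal(size=n)
born_late = rng.integers(0, 2, size=n)          # instrument z
schooling = 12 + 2 * ability + 0.3 * born_late + rng.normal(size=n)
log_wage = 1.0 + 0.05 * schooling + 0.10 * ability + rng.normal(scale=0.5, size=n)

# OLS is biased upward by omitted ability.
ols = np.cov(schooling, log_wage)[0, 1] / np.var(schooling, ddof=1)

# IV (Wald) estimator: cov(z, y) / cov(z, s) -- it uses only the
# variation in schooling induced by birth timing, which is
# uncorrelated with ability.
iv = np.cov(born_late, log_wage)[0, 1] / np.cov(born_late, schooling)[0, 1]

print(round(ols, 3))  # above the true 0.05
print(round(iv, 3))   # near 0.05
```

The first stage (birth timing predicts schooling) is what the saw-tooth picture in the paper documents; the ratio then scales the earnings difference by the schooling difference the instrument induced.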

The precise methods of finding 'identifying restrictions' are not so important. They're the mainstay of rigorous research, but ultimately, if your result shows up only via coefficient t-stats in 2-stage least squares, but not a graph, it isn't there. The earnings are what they are, what's the difference for the kids who went to school an extra year? Here, we see the problem with econometrics. The authors are very excited by the saw-tooth pattern that shows a significant birthday bump around December 31 for the Angrist and Krueger study, highlighting that staying in school an extra year provided a statistically significant bump in earnings.

But the effect is small, about a 3% effect on wages via the extra year for those kids born too late to skip their last grade. Sure, with enough observations it is significant, but not enough to change the world. It became an inspiration for a generation of Harvard economists like Levitt because the approach seemed to solve a problem using both cleverness and high-brow econometric techniques. Yet, it does not follow that staying in school an extra year at 16 implies going to college would also help. Very few effects are linear. I could take golf lessons and lower my score by 10 strokes in a few lessons; it would not, over a year, put me on the PGA Tour. The authors seem oblivious to this point.

One could say the authors have a long section on nonlinearity, and indeed they do. But the inordinate amount of attention they pay to this little 3% wage increase belies that knowledge. The net-net inference is clearly that the 3% wage increase has policy implications; otherwise they, and the labor economics community, would not cite it so much. In the end, they don't think nonlinearities are relevant to their result.

One could imagine them testifying before Congress that their research proves we should spend more money on college, but that presumes a lot of things. It assumes a linear extrapolation from their findings on 16-year-olds. It assumes spending more money on college aid actually increases schooling, as opposed to merely raising what schools charge. It assumes people will study subjects that actually build skills, as opposed to becoming film-studies majors who don't learn anything useful.

Where all this excitement with econometric technique and natural experiments over common sense leads is best exemplified by this paper on the impact of file-sharing on record sales by Oberholzer-Gee and Strumpf, published as the lead article in the February 2007 Journal of Political Economy (Levitt's journal) to great fanfare. The measured relationship between the instrument (German students on vacation) and the variable it instruments for, American downloading, is seen to have a large effect, presumably because Germans download and share a lot of music, and spend a lot of time sharing files when on vacation.

Anyway, most of Oberholzer-Gee and Strumpf's work emphasizes econometrics, not the issues Stan Liebowitz brings up, such as the proportion of German school kids who actually download music, their share of music downloads in Germany, and their share of US music sales. Vacation time correlates with traditional US seasonal patterns: music sales spike before Christmas, when German kids are in school, for reasons unrelated to file sharing (stockings need CDs). Basically, the empirical issue is one of institutional detail: knowing about the seasonality of music sales, school schedules, and the timing of music downloads. It has very little to do with the identifying restrictions that are emphasized in their academic piece.

In a sense, econometrics as a science allows shoddy empirical work to hide behind pretentious techniques that try to avoid these issues. Vector auto-regressions, or natural experiments, do not obviate the need for common sense and an understanding of the subject to which the tool is being applied. The manifest failure to predict stock returns, business cycles, and interest rates that became so apparent in the 1980s caused econometricians to search for small subjects where natural experiments exist, and then draw some grand extrapolation. People think about the question less than the method, and then assume that because the results are so clean, they will be profound. I think many econometricians wish it were so, so they could focus on what they really like, math, while still saying something interesting about important issues like schooling.

If the only way to tease out, say, the risk premium, is to use a method-of-moments estimation with 3 identifying restrictions based on some fundamental utility function, but the bottom line is you can't say whether Coke is a riskier stock than GM, you don't have a result. It's academic.

Right on!! Academics like math but need to work on some empirical topic to justify the money they're getting, so they start talking about xyz without knowing anything about it. The need to study the substance of the question you are researching has been emphasized by D. McCloskey before.

Still, I like the book; it reads easily (relatively), and I still try to persuade myself I have not spent three years in vain and that econometrics is somehow useful :-)

Somebody could read your blog and then testify before Congress that there is no harm in financial institutions taking excessive risks to earn higher returns, because your "research" shows that return is unrelated to risk. Do you want to be blamed for what I imagine someone might say, given your "research"?

And you ought to tell your readers that they worry about non-linearities in their book quite a lot.

I think you guys are reading too much calculated ambiguity into my 'one could imagine' phrase. It could mean: let me generate a logically possible but nonfalsifiable hypothetical, and state it as if it happened. That's a possible inference. However, it is also somewhat of a standard intro phrase, implying 'this happens a lot--enough to matter--but not all the time'.

Their discussion of nonlinearities I found incongruent with the massive amount of inference they, and the natural-experiment community, draw from their birthday-wage result. Sure, they know things aren't linear, but the bottom line is, if you ask them the economic value of schooling, they will point to their 3% blip at 16 as if it implies something really generalizable. They know their logic; they just have tendentious priorities (i.e., a clever and elegant result is #1, irrespective of other evidence outside their parochial experiment).

"think you guys are reading too much calculated ambiguity into my 'one could imagine' phrase."

No, we are identifying the fact that a large part of your article is a straw-man argument.

Really, the only thing I got from your article is the knowledge that someone named Eric Falkenstein doesn't know much about econometric methods, but likes to talk about them as if he does.

My favourite is this paragraph: "The precise methods of finding 'identifying restrictions' are not so important. They're the mainstay of rigorous research, but ultimately, if your result shows up only via coefficient t-stats in 2-stage least squares, but not a graph, it isn't there."

This says nothing other than "it's complicated so it is unreliable". Also, if there is a significant result, you can usually show it on a graph anyway (like Angrist and Krueger 1991). Basically, you said nothing valuable and just illustrated a general (but baseless) distrust of "high brow econometric methods". Very Palin-esque.

"The earnings are what they are, what's the difference for the kids who went to school an extra year?"

That is what they are trying to find out!

"Here, we see the problem with econometrics. The authors are very excited by the saw-tooth pattern that shows a significant birthday bump around December 31 for the Angrist and Krueger study, highlighting that staying in school an extra year provided a statistically significant bump in earnings."

Your comment about the sawtooth result reveals how little you understand IV. The sawtooth result says nothing about returns to schooling in and of itself. Rather, it shows that birthdays can be used to define control and treatment groups from which you can compare earnings. The sawtooth pattern shows that quarter of birth is related to the number of years of schooling, not to earnings directly.

Then there is this doozie:

"But the effect is small, about a 3% effect on wages via the extra year for those kids born too late to skip their last grade. Sure, with enough observations it is significant, but not enough to change the world."

So results are only important if the found effects are large? What result would you have liked to see exactly? If it was 10% would you have been happier?

"Yet, it does not follow that staying in school an extra year at 16 implies going to college would also help."

You're right, it does not follow. It also doesn't follow that the 3% bump means robins lay bluer eggs later in the month. So what? You are the one taking the result beyond the limits of proper interpretation. It says nothing about the result or their methods.

Finally your treatment of the O-G & S paper is laughable. The paper itself might be crap (I haven't read it) but the Liebowitz paper criticizes them for an improper implementation of the IV method. It is not an indictment of IV in general (which you seem to suggest).

Poor scholarship exists in every discipline, but your accusation that "econometrics as a science allows shoddy empirical work to hide behind pretentious techniques that try to avoid these issues" is totally baseless and is not supported by anything you have written in your post.

Also, Freakonomics is not an approach, it is the title of a pop-economics book written for laymen. You did seem to express some distrust about using data from natural experiments, but didn't really give a reason.

Finally, just because you can list off technical sounding things like method of moments and t-stat does not indicate that you know anything about them.

To paraphrase: You allowed shoddy logic and argumentation to hide behind pretentious technical terms and totally avoided any relevant issue.

Wow, that was long.

PS. Econometrics is best described as a "social science" (and is treated as such by most econometricians). It is not a "science".

A blog is not an academic journal, obviously my critique was broader than warranted at some level (there's some good econometrics!). But ultimately, name one important economic fact that was discovered via IV, 2SLS, 3SLS, SUR, GMM or some other complicated technique.

Your critique was not broader than warranted, it was ill-informed, incorrect, and poorly argued.

Challenging me to name one important fact is juvenile. Econometric research is generally done to support some kind of policy design. Government policies (the ones we spend billions of dollars on) are informed by this research. Many people happen to think that this is a particularly important part of economics.

Alas, the research doesn't lend itself well to factoids. I guess since such results will never make it onto a fact of the day calendar, the whole field is garbage.

That is not to say that such research is never publicly acknowledged as important. James Heckman won the Nobel Memorial prize for his work on empirical methods.

When I started commenting on this article I thought it was written by some cocky undergrad. Then I saw you were an author and was excited at the prospect of reading some kind of cogent defense of your blog.

But alas, I encountered a six-year-old who knows a couple of impressive words.

What a shame.

PS. GMM? Seriously? You realize OLS is a special case of GMM. To dismiss GMM is to dismiss 99% of all the regressions ever done.

As a highly educated man, you should be embarrassed about the low standard of this blog entry.

GMM was very popular when I went to grad school in the early 90s. It extended OLS in a very complicated way (containing OLS as a special case, as all complex refinements do), especially for, say, evaluating Hansen-Jagannathan bounds in macro and finance. The bottom line is it hasn't delivered any great new insights. (It's now easy, as many of these techniques are, coming prepackaged in software like EViews, so it's not difficult anymore, but still.)
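The "OLS is a special case of GMM" point both sides are gesturing at is easy to verify numerically. In the exactly identified case, GMM's sample moment conditions E[x(y - x'b)] = 0 are just the OLS normal equations, so the two estimators coincide. A minimal sketch with simulated data (hypothetical coefficients, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Simulated regression with three made-up coefficients.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# OLS the usual way (least squares).
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# GMM with moment conditions E[x * (y - x'b)] = 0: with as many
# moments as parameters the model is exactly identified, so setting
# the sample moments X'(y - Xb)/n to zero gives the same estimate.
b_gmm = np.linalg.solve(X.T @ X / n, X.T @ y / n)

print(np.allclose(b_ols, b_gmm))  # True
```

GMM earns its keep only in the over-identified case (more moments than parameters), where a weighting matrix is needed; here the machinery collapses back to OLS, which is the commenter's point.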

As an attorney would say to your petulant response to my question, "so your answer is, 'none'". That's the bottom line. The point of econometrics is not to get published--that's a means to an end--but to discover new, interesting, and important facts. The important fact here is, sophisticated econometrics is often counterproductive in discovering such insights, because people focus on their technique over the question, and then draw overbroad conclusions.

"The point of econometrics is not to get published--that's a means to an end--but to discover new, interesting, and important facts."

No. This is incorrect. The point of econometrics is to approximate unknowable values using available data. It is to further our understanding of cause and effect relationships. Some studies give more accurate but context specific results, some give less accurate, more generalizable results. None ever establish facts.

That you would demand a fact from a discipline dedicated to estimating things is hilarious.

"The important fact here is, sophisticated econometrics is often counterproductive in discovering such insights, because people focus on their technique over the question, and then draw overbroad conclusions." An attorney might say "that is not a fact".

There is good scholarship, appropriate applications of IV, RD and matching methods, and then there is poor scholarship, the inappropriate application of the aforementioned methods. To brand econometricians as blinded by the novelty of their approach is pretty funny considering you cite Liebowitz in your article (which demonstrates the exact opposite).

Really, when it comes to using poor methods to support overbroad conclusions, your article stands out as a shining example.

Finally, demonstrating a familiarity with econometric methods is not a substitute for logically consistent reasoning to support your argument.