Overfitting

One of the things I learned in math is that a polynomial of degree N can pass through any N+1 points, as long as no two of them share the same x-coordinate. A straight line goes through any two points, a parabola goes through any three points, and so forth. The practical upshot of this is that if your equation is complex enough, you can fit it to any data set.

That’s basically what happened to the geocentric model: it started out simple, with planets going around the Earth in circles. Except that some of the planets wobbled a bit. So they added more terms to the equations to account for the wobbles. Then there turned out to be more wobbles on top of the first wobbles, and more terms had to be added to the equations to take those into account, and so on until the theory collapsed under its own weight. There wasn’t any physical mechanism or cause behind the epicycles (as these wobbles were called). They were just mathematical artifacts. And so one could argue that when the theory had fewer epicycles and didn’t explain all of the data, it was not only simpler but also less wrong.
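The fact from the first paragraph is easy to check for yourself. Below is a minimal Python sketch using Lagrange interpolation, which is one standard way (not the only one) to build the degree-N polynomial through N+1 points. The four points are made up; the only assumption is that no two of them share an x-coordinate. Exact fractions are used so the check isn’t muddied by rounding.

```python
from fractions import Fraction


def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial of degree <= N that passes
    through the N+1 given (x_i, y_i) points. The x_i must be distinct."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        # Basis polynomial L_i(x): equals 1 at x_i and 0 at every other x_j.
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# Four arbitrary, hypothetical points -> a cubic (degree 3) fits all of them exactly.
points = [(0, 7), (1, -2), (3, 5), (4, 40)]
for x, y in points:
    print(x, y, lagrange_interpolate(points, x))
```

Swap in any other points with distinct x values and the polynomial still hits every one of them, which is exactly why “it fits the data” is such a weak recommendation on its own.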

Take another example (adapted from Russell Glasser, who got it from his CS instructor): let’s say you and I order a pizza, and it comes with olives. I hate olives and you love them, so we want to cut it up in such a way that we both get slices of the same size, but your slices have as many of the olives as possible, and mine have as few as possible. (And don’t tell me we could just order a half-olive pizza; I’m using this as another example.)

We could take a photo of the pizza and feed it into an algorithm that finds the position of each olive and comes up with the best way to slice the pizza fairly, but with as many olives as possible on your slices.

The problem is, this tells us nothing about how to slice the next such pizza that we order. Unless there’s some reason to think that the olives on the next pizza will be laid out in a similar way, we can’t tell the pizza parlor how to slice it up when we place our next order.

In contrast, imagine if we’d looked at the pizza and said, “Hm. Looks like the cook is sloppy, and just tossed a handful of olives on the left side, without bothering to spread them around.” Then we could ask the parlor to slice it into wedges, and we’d have good odds of winding up with three slices with extra olives and three with minimal olives. Or, if we’d found that the cook puts the olives in the middle and doesn’t spread them around, we could ask the parlor to slice the pizza into a grid; you take the middle pieces, and I’ll take the outside ones.

But our original super-optimal algorithm doesn’t allow us to do that: by trying to perfectly account for every single olive in that one pizza, it doesn’t help us at all in trying to predict the next pizza.

In The Signal and the Noise, Nate Silver calls this overfitting. It’s often tempting to overfit, because then you can say, “See! My theory of Economic Epicycles explains 29 of the last 30 recessions, as well as 85% of the changes in the Dow Jones Industrial Average!” But is this exciting new theory right? That is, does it help us figure out what the future holds: whether we’re looking at a slight economic dip, a recession, or a full-fledged depression?
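Here’s a toy version of that trap in Python, a sketch of my own rather than anything from Silver’s book. The data are hypothetical: the real trend is y = x, with a made-up ±1 wobble standing in for noise. One model is a plain least-squares line; the other is the polynomial that passes through every single training point (the epicycle approach). Both are then scored on how close they come to the real trend at new x values in between the old ones.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns a predictor function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_interpolant(xs, ys):
    """'Explain everything' model: the degree-(n-1) polynomial through every point,
    evaluated via Lagrange interpolation."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: true trend y = x, plus a deterministic +/-1 wobble.
train_x = list(range(8))
train_y = [x + (-1) ** x for x in train_x]

# The "next pizza": new x values between the old ones, scored against the true trend.
test_x = [x + 0.5 for x in range(7)]
test_y = list(test_x)

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

print("train MSE  line:", mse(line, train_x, train_y),
      " interpolant:", mse(wiggly, train_x, train_y))
print("test MSE   line:", mse(line, test_x, test_y),
      " interpolant:", mse(wiggly, test_x, test_y))
```

The wiggly polynomial gets a perfect score on the data it was built from and does far worse than the line on the new points, because it spent its extra terms chasing the wobble instead of the trend.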

We’ve probably all heard the one about how the Dow goes up and down along with skirt hems. Or that the performance of the Washington Redskins predicts the outcome of US presidential elections. Of course, there’s no reason to think that fashion designers control Wall Street, or that football players have special insight into politics. More importantly, these coincidences show that if you dig long enough, you can find some data set that matches the one you’re looking at. And in this interconnected, online, googlable world, it’s easier than ever to find some data set that matches what you want to see.

These two examples are easy to see through, because there’s obviously no causal relationship between hemlines and stock prices, or between football and politics. But we humans are good at telling convincing stories. What if I told you that pizza sales (with or without olives) can help predict recessions? After all, when people have less spending money, they eat out less, and pizza sales suffer.

I just made this up, both the pizza example and the explanation. So it’s bogus, unless by some million-to-one chance I stumbled on something right. But it’s a lot more plausible than the skirt or football examples, and thus we need to be more careful before believing it.
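To see how easy it is to find some data set that matches the one you’re looking at, here’s a small Python simulation, again my own illustration. Everything in it is random noise: a hypothetical 10-point target series and 1,000 candidate series that have nothing to do with it. Searching the candidates for the best match still turns up one that tracks the target surprisingly well, even though any correlation it shows is spurious by construction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# The series we want to "explain": 10 values of pure noise (hypothetical).
target = [rng.gauss(0, 1) for _ in range(10)]

# Dig through 1,000 unrelated candidate series, also pure noise.
candidates = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(1000)]

# Keep whichever candidate happens to correlate most strongly (either sign).
best = max(candidates, key=lambda c: abs(pearson(c, target)))
print("best |r| found by searching:", abs(pearson(best, target)))
```

The more candidates you search, the better the best spurious match tends to look, and nothing about the winner tells you it will keep matching next year.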

Update: John Armstrong pointed out that the first paragraph should say “N+1”, not “N”.

Update 2: As if on cue, Wonkette helps demonstrate the problems with trying to explain too much in this post about Glenn Beck somehow managing to tie together John Kerry’s presence or absence on a boat, his wife’s seizure, and Hillary Clinton’s answering or not answering questions about Benghazi. Probably NSFW because hey, Wonkette. But also full of Glenn Beck-ey crazy.

4 thoughts on “Overfitting”

Slight nitpick: your dimensions are off-by-one in the opening graf. A polynomial of degree $N$ can match $N+1$ points. A line is degree-one, a parabola is degree-two, and so on. It doesn’t change your overall point, though.

It occurred to me while I was writing this that one way to ensure that you get all the olives on the pizza and I get none of them is to cut the pizza horizontally: you get the top half with all the toppings, and I get the bottom half of the crust.
I’ll leave it to the reader to figure out why this is a suboptimal solution that, I felt, added little of value to the discussion.