Please Do Not Throw Out the Record Books

It is the week of Thanksgiving, which means that we are in the midst of Rivalry Game season for college football. Of course, the most important game was played last Saturday, but I am given to understand that several other well-known contests will happen this Thursday through Saturday.

As with most facets of sports, we use plenty of cliches to describe our rivalries. One of my favorites is that we have to “throw the record books out,” because “anything can happen” in a rivalry game. The idea here is that underdogs are more likely to outperform expectations in a rivalry game setting. But is it true?

This cliche relies on the idea that we should throw out the evidence of which team has been better over the course of the season because one team, presumably the underdog, will be fired up and play differently than it had before. We will ignore the fact that the favorite, too, could be motivated, because cliches do not like to stand up to rigorous investigation. The second half of the cliche, “anything can happen,” is definitionally true, but its triviality renders it uninformative. For instance, the statement “anything can happen, it’s a Tuesday,” is also true.

To test this lovely old cliche, I consulted Wikipedia’s list of college football rivalries. Wikipedia apparently thinks there are a lot more rivalries than I do, so I only included those that have been contested for a long time and are traditionally considered as such. It is fairly subjective, but like Potter Stewart, I think I know a rivalry when I see one. Clemson-South Carolina qualifies; Clemson-Boston College does not. My list ended up containing 48 annual games.

For an objective measure of expectations, I used the Predictive Power Rankings from my friends at TeamRankings for the last eight seasons. Subtracting one team’s Power Ranking from the other’s, plus or minus home field advantage, gives the expected point differential of the game. Initially, I had wanted to use point spreads, which tend to be the most accurate ex ante predictor, but oddsmakers or bettors may move lines to reflect a game’s rivalry status. I used TeamRankings’ predictions because I needed the best system that has no knowledge of which games are rivalries.
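To make the arithmetic concrete, here is a minimal Python sketch of that subtraction. The function name, the ratings, and the 3-point home edge are all made up for illustration; they are not TeamRankings figures, and the source does not say what home field value was used.

```python
def expected_margin(home_rating, away_rating, home_field_advantage=0.0):
    """Expected point differential from the home team's perspective.

    A positive result means the home team is favored by that many points.
    Pass home_field_advantage=0.0 for a neutral-site game.
    """
    return home_rating - away_rating + home_field_advantage


# Hypothetical ratings and a hypothetical 3-point home edge.
at_home = expected_margin(home_rating=12.5, away_rating=8.0, home_field_advantage=3.0)
neutral = expected_margin(home_rating=12.5, away_rating=8.0)
print(at_home, neutral)
```

If the lower-rated team is the host, the same formula simply shrinks (or flips) the favorite's edge, which is the “plus or minus” in the description above.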

My final sample consisted of 328 rivalry games from 2003 to 2011. The expected point differential ranged from 0.1 (several games) to 35 (Vince Young’s 2005 Texas team over Texas A&M). I gave an underdog credit if it was able to outperform its expectation by even one point. The findings are summarized in the table below:

As you can see, underdogs have not outperformed their expectations in rivalry games. In fact, a slight majority of underdogs have done worse than expected. Even if you limit the sample to two-touchdown underdogs, the percentage that outperform expectations is 47%. (Note that I took the Predictive Power Rankings from the week before the rivalry game was played, so I avoid any issues of using rankings that already contain the rivalry game outcome.)
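For readers who want to try this on their own data, the tally could be sketched roughly as below. This is an assumed reading of the rule, not the actual code behind the numbers above: margins are taken from the favorite's perspective, “by even one point” is treated as a one-point threshold, and the sample games are invented.

```python
def underdog_outperformed(expected, actual, threshold=1.0):
    """True if the underdog beat its expectation by at least `threshold` points.

    Both margins are from the favorite's perspective (positive = favorite ahead),
    so the underdog outperforms when the favorite falls short of its expected margin.
    """
    return (expected - actual) >= threshold


def share_outperforming(games, min_expected=0.0):
    """Fraction of underdogs that outperformed, among games with an
    expected margin of at least `min_expected` points."""
    eligible = [(e, a) for e, a in games if e >= min_expected]
    if not eligible:
        return None
    hits = sum(underdog_outperformed(e, a) for e, a in eligible)
    return hits / len(eligible)


# Invented (expected margin, actual margin) pairs, favorite's perspective.
sample_games = [(7.0, 3), (14.5, 21), (3.0, -4), (20.0, 20)]
print(share_outperforming(sample_games))                   # all games
print(share_outperforming(sample_games, min_expected=14))  # two-touchdown underdogs only
```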

But perhaps the cliche is true not because the average underdog outperforms its expectation, but because there are more outliers (i.e., cases of vast outperformance). For this to be true, the standard deviation of results from the expected margin would have to be greater in rivalry games than in non-rivalry games. I compared the results of rivalry and non-rivalry November games and found no meaningful difference: the standard deviation from the predicted margin was 15 points for rivalries and 16.2 for non-rivalries.
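One way to compute that spread is the root-mean-square deviation of actual margins from predicted margins, sketched below. Whether the figures above were computed exactly this way is an assumption, and the two game lists are invented purely to show the mechanics.

```python
import math


def spread_from_prediction(games):
    """Root-mean-square deviation of actual margins from predicted margins.

    `games` is a list of (predicted_margin, actual_margin) pairs, both from
    the same team's perspective.
    """
    squared_errors = [(actual - predicted) ** 2 for predicted, actual in games]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented data: compare the two groups the same way.
rivalry_games = [(7, 10), (3, 0), (14, 24), (1, -9)]
other_games = [(7, 20), (3, -8), (14, 14), (10, 2)]
print(spread_from_prediction(rivalry_games))
print(spread_from_prediction(other_games))
```

A larger value for the rivalry group would be what the “anything can happen” version of the cliche predicts.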

Rivalry games are fantastic, and one of the reasons I love college football. They are tradition-laden and frequently exciting. They are not, however, statistically different from “non-rivalry” games or any more likely to produce upsets. When you inevitably read or hear someone telling you to “throw out the record books” when two teams meet this weekend, remember that throwing out data is usually not a good idea, and this is no exception.

The cliche isn’t that the underdog might “outperform expectations” but rather that it might win at a greater rate than a similarly positioned underdog without the “rivalry factor.” To test that, you would need to compare the rate of “upset wins” in rivalry games (probably controlling for differing point spread ranges) with the rate of upset wins in the population at large within those ranges.
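The comparison suggested here could be sketched along these lines. Everything in it is hypothetical: the bucket edges, the tuple layout, and the sample games are illustrative choices, not anything taken from the analysis above.

```python
from collections import defaultdict


def upset_rates(games, bucket_edges=(0, 7, 14, 100)):
    """Upset rate per (expected-margin bucket, rivalry flag).

    `games` holds (expected_margin, actual_margin, is_rivalry) tuples, with
    margins from the favorite's perspective. An upset is a game the favorite
    lost outright (actual margin below zero).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [upsets, games]
    for expected, actual, is_rivalry in games:
        for low, high in zip(bucket_edges, bucket_edges[1:]):
            if low <= expected < high:
                tally = counts[((low, high), is_rivalry)]
                tally[0] += int(actual < 0)
                tally[1] += 1
                break
    return {key: upsets / total for key, (upsets, total) in counts.items()}


# Invented games, only to show the shape of the output.
sample_games = [
    (3, -1, True), (3, 5, True),
    (4, 2, False), (10, -3, False), (10, 7, False),
]
for (bucket, is_rivalry), rate in sorted(upset_rates(sample_games).items()):
    print(bucket, "rivalry" if is_rivalry else "other", rate)
```

Comparing the rivalry and non-rivalry rates within the same bucket is what keeps the test from being skewed by rivalry games simply having closer expected margins.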