Remember that I likened the method of maximum likelihood to trying to twist a bunch of dials (one for each team) so that a particular quantity is as big as possible. If you're looking at a season of 1A college football, you've got 119 dials, the thing you're trying to maximize has about 800 parts to it, and each of the dials directly controls about 12 of those parts.

Suppose you're twiddling with the Florida dial. In that mess of 800 factors, you see (R_flo / (R_flo + R_vandy)). Turning up the Florida dial increases that piece. So turn it up. Likewise, cranking the Florida dial increases the (R_flo / (R_flo + R_arkansas)) bit, so you turn it up some more. But then you notice there's a (R_auburn / (R_auburn + R_flo)) piece in there. Turning up the Florida dial decreases this part. You could counteract that by turning up the Auburn dial, but you know you're going to have to pay a price for that eventually because of the (R_georgia / (R_auburn + R_georgia)) piece, among others.

The point is, there is a setting that is "just right" for the Florida dial. They won a lot of games, many of them against good teams (this creates big denominators), so you want to turn their dial up. But you can't turn it up too much, or else it will turn down that Auburn/Florida piece, to the detriment of the entire product.
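To make the dial-twisting concrete, here's a minimal sketch in Python. The five-team schedule and team abbreviations are invented for illustration, and the iterative update is the standard minorization-maximization scheme for this kind of model, not necessarily the exact procedure any particular BCS computer uses:

```python
import math

# Invented toy schedule: (winner, loser). Teams and results are hypothetical.
games = [("FLA", "VAN"), ("FLA", "ARK"), ("AUB", "FLA"),
         ("GA", "AUB"), ("FLA", "GA")]

teams = {t for g in games for t in g}
ratings = {t: 1.0 for t in teams}  # start every dial at 1

def log_likelihood(r):
    # log of the big product: one log(R_winner / (R_winner + R_loser)) per game
    return sum(math.log(r[w] / (r[w] + r[l])) for w, l in games)

# Standard minorization-maximization update for this model:
#   R_t <- (number of wins by t) / sum over t's games of 1/(R_t + R_opponent)
for _ in range(200):
    new = {}
    for t in teams:
        wins = sum(1 for w, _ in games if w == t)
        denom = sum(1.0 / (ratings[a] + ratings[b])
                    for a, b in games if t in (a, b))
        new[t] = wins / denom
    total = sum(new.values())
    ratings = {t: r * len(new) / total for t, r in new.items()}  # fix the scale
```

Each pass nudges every dial toward the balance point described above: a team's rating settles where the pull of its wins exactly offsets the drag of its losses. Note that a winless team's dial gets turned all the way down to zero, a mirror image of the infinite-rating problem discussed next.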

Now consider Ohio State's dial. Turn it up. Now turn it up some more. Now turn it up some more. Keep turning it up and, because the Buckeyes never lost a game, you'll never run into any problem. There's nothing stopping you from turning Ohio State's dial up to infinity. You can always make that product bigger by turning Ohio State's dial up. Their rating has to be infinite.
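You can watch this happen numerically. Below is an invented three-team mini-season in which OSU never loses; cranking only its dial makes the product climb forever:

```python
# Invented mini-season: OSU unbeaten, plus one game between its victims.
games = [("OSU", "MICH"), ("OSU", "WIS"), ("MICH", "WIS")]
ratings = {"OSU": 1.0, "MICH": 1.0, "WIS": 1.0}

def likelihood(r):
    # the big product: one R_winner / (R_winner + R_loser) factor per game
    prod = 1.0
    for w, l in games:
        prod *= r[w] / (r[w] + r[l])
    return prod

# Crank only the OSU dial and record the product at each setting.
vals = []
for r_osu in (1, 10, 100, 1000):
    ratings["OSU"] = r_osu
    vals.append(likelihood(ratings))
# vals is strictly increasing: an unbeaten team appears only in numerators,
# so no term ever punishes a bigger OSU rating. The product approaches its
# ceiling (here 0.5, set by the MICH-over-WIS factor) but never stops improving.
```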

That's OK, you say. Ohio State was undefeated and should be ranked first, right? Right, but then note that the same thinking applies to Boise State. They must, in a sense, necessarily be tied with Ohio State with an infinite rating. Is that what we want? Maybe, and maybe not, but I'm pretty sure most people don't want a system that mandates that undefeated teams always rank at the top no matter what.

But the plot thickens. Michigan's only loss was to Ohio State. So the only way it hurts you to turn up Michigan's dial is because of this term: (R_osu / (R_osu + R_mich)). But if Ohio State's ranking is infinite, then you can turn up Michigan's dial without penalty. And since they won all the rest of their games, turning up the Michigan dial helps increase the product. So Michigan, it turns out, needs an infinite rating as well, though not quite as big of an infinite rating as Ohio State's [yes, I'm getting sloppy with the infinities here --- my goal is to give an impression of the way things work, not to be mathematically precise].

Now who else needs an infinite rating? Wisconsin, whose only loss was to Michigan. Once Michigan's dial is jacked up to a gazillion, it doesn't hurt you much to jack Wisconsin's up to a few million.

Rather than start talking about the technicalities of this infinity business, let's just summarize with this: the method of maximum likelihood, in its purest form, mandates that, no matter what the schedules look like, the top ranked teams must be those that have never lost, or have only lost to teams that have never lost, or have only lost to teams that have only lost to teams that have never lost, or ....

In many situations --- basketball, baseball, NFL --- this isn't generally a problem. For college football, it's a huge problem. It's certainly defensible to have Michigan ranked ahead of Florida. But even setting aside Boise State, I don't know too many people who think Wisconsin should be ranked ahead of Florida. Further, if you wanted to rank all 706 college football teams, then any undefeated Division III or NAIA team would have to rank ahead of Florida too.

In my opinion, maximum likelihood is one of the best rating systems around: it has a sound theoretical basis, is relatively easy to understand, and produces what most people consider to be sensible results in most cases. But all models break in some situations, and this one unfortunately happens to break right when and where it's needed most: at the top of the standings of a typical college football season.

But there are some ways to fix it.

One way is simply to count each win as 99% of a win and 1% of a loss. How do you do that? Well, the easiest way to think about it is to pretend that every game is 100 games, 99 of which were won by the winner and one of which was won by the loser. Now Ohio State isn't 12-0; they're 1188-12. But the point is that they are now in the denominator of a few terms for which they are not also in the numerator. So their rating won't be infinite. If you do this with the pre-bowl 2006 college football data, you knock Wisconsin down to #9.
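Here's a sketch of that fix, using an invented four-team schedule. Rather than literally replaying every game 100 times, the same effect comes from crediting the winner with a fraction P of a win and the loser with 1 − P:

```python
# Invented results: OSU goes unbeaten, everyone else trades losses.
games = [("OSU", "MICH"), ("OSU", "WIS"), ("MICH", "WIS"),
         ("WIS", "FLA"), ("FLA", "MICH")]

P = 0.99  # fraction of a win credited to the actual winner

teams = {t for g in games for t in g}
ratings = {t: 1.0 for t in teams}

for _ in range(500):
    new = {}
    for t in teams:
        # fractional win count: P per actual win, (1 - P) per actual loss
        wins = (P * sum(1 for w, _ in games if w == t)
                + (1 - P) * sum(1 for _, l in games if l == t))
        denom = sum(1.0 / (ratings[a] + ratings[b])
                    for a, b in games if t in (a, b))
        new[t] = wins / denom
    total = sum(new.values())
    ratings = {t: r * len(new) / total for t, r in new.items()}
```

Because the unbeaten team now carries a sliver of a loss in every game it played, its dial has a finite sweet spot, and the iteration settles instead of running away.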

This practicality, however, is gained at the expense of elegance. In particular, why 99%? Why not count a win as 94% of a win, or 63%, or 99.99%? The higher that number is, the more your rating system will depend on wins and losses. The lower it is, the more it will depend on strength of schedule. As soon as it gets below 94%, for example, Florida starts to rank ahead of Ohio State. [Astute observers will at this point suggest varying that percentage according to the margin of victory: a 1-point win could count as 60% of a win, for example, while a 28-point win could count as 99% of a win. This indeed can be done --- and I'll do it in a future post --- but for now I'm playing by BCS rules: only Ws and Ls.]

An arbitrary parameter just jars my sensibilities. It might "work" (depending on what you mean by "work"), but it ruins the nice clean description of this method. I have seen a couple of academic papers that employ more complicated fixes, but they also have a parameter and no objective basis for determining what that parameter ought to be.

What I prefer is the simple fix proposed by David Mease. He introduces a dummy team and gives every team a win and a loss against that dummy team. Problem solved; now no team is undefeated and no team will have an infinite rating. If you find this a kludgy or arbitrary solution that ruins the theoretical beauty of the method, then you can read Mease's paper, where he explains how the introduction of the dummy team can serve as a set of Bayesian priors. If you're into that kind of thing.
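The dummy-team fix is easy to sketch. Below, a made-up three-team schedule in which OSU is unbeaten; splitting a pair of fictitious games with the dummy keeps every rating finite. (The schedule is invented, and this is my reading of the basic idea, not Mease's actual implementation.)

```python
# Mease-style fix (sketch): add a dummy team that every real team
# both beats and loses to exactly once.
real_games = [("OSU", "MICH"), ("OSU", "WIS"), ("MICH", "WIS")]
teams = {t for g in real_games for t in g}

games = list(real_games)
for t in teams:
    games.append((t, "DUMMY"))   # each team beats the dummy once...
    games.append(("DUMMY", t))   # ...and loses to it once
teams.add("DUMMY")

ratings = {t: 1.0 for t in teams}

for _ in range(500):
    new = {}
    for t in teams:
        wins = sum(1 for w, _ in games if w == t)
        denom = sum(1.0 / (ratings[a] + ratings[b])
                    for a, b in games if t in (a, b))
        new[t] = wins / denom
    total = sum(new.values())
    ratings = {t: r * len(new) / total for t, r in new.items()}
```

No real game is altered, yet every team now has at least one loss on the books, so the undefeated team's dial stops at a finite value while still landing on top.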

Mease's ratings are among my favorites and, if I were running the BCS, they'd be a part of it. Now back to Peter Wolfe, whose ratings are included in the BCS and who uses something he describes as a maximum likelihood method. He does not specify exactly how he fixes the infinite rating problem. I keep meaning to email and ask him, but for some reason I only remember to do so every year around early December, and I figure he's probably got enough emails to deal with in early December.

I have tried putting in a dummy team. I've tried counting wins as P percent wins for various values of P. But I can't replicate the order of Wolfe's rankings. That might have to do with the fact that Wolfe ranks all 706 college football teams, whereas I'm only ranking the D1 teams (with an additional "generic 1AA team" included to soak up the games against 1AA teams). Or he might have some elegant fix that I'm not aware of. Maybe in February or March I'll remember to email him and ask.

This entry was posted on Friday, December 15th, 2006 at 5:37 am and is filed under BCS, Statgeekery.

Consider a two-team league where A and B have played once so far, and A was victorious.

If we want to predict the result of their next game, estimating A's chance of winning by maximizing A/(A+B) (i.e., by setting B equal to zero) is clearly inferior to other options. [An obvious improvement for the NFL would be to look at recent instances of teams facing each other twice in the same season, and to set A equal to the percentage of sweeps and to set B equal to the percentage of splits.]

But BCS-style rankings aren't supposed to be predictive, so to point out a way to increase their predictive value is not necessarily to point out a potential improvement.

Suppose that A beat B because A got extraordinarily lucky. A recovered seven out of seven fumbles during the game; B uncharacteristically missed all seven of its field goal attempts; and A managed to squeak out a win at home by a very small margin despite being way on the short end of total yards, first downs, etc.

A very good predictive system may well make B the favorite in their next meeting.

But any retrodictive system must give A more credit for the outcome of the previous meeting than it gives B, and thus must crown A as the champion if the season ends after the first game.

That said, if sweeps occur 50% more often than splits (i.e., 60% to 40%), I think even a retrodictive system would benefit from treating each win as .6 wins and .4 losses, rather than 1 win and 0 losses, for purposes of the maximum likelihood formula. (I don't know if .6 and .4 are the right values, or are even very close; but assuming they are close to the actual ratio of sweeps to splits, does using those values produce any obviously absurd results in the NCAA or NFL this year?) Is "Florida > Ohio State" obviously absurd from a retrodictive standpoint? I don't think it's always absurd to retrodictively rank a 12-1 team ahead of a 12-0 team if their strengths of schedule are sufficiently disparate. [Note: I don't follow college football at all anymore, so I'm not even sure that Florida's schedule was more difficult, but I don't know why else this sort of modified maximum likelihood rating would put it ahead of Ohio State.]

Here is my proposition: The best (i.e., most fair) retrodictive system is simply the best predictive system that uses no input other than wins and losses. [This is not provable, of course. But I can't think of anything obviously wrong with it.]

Corollary: If the predictive value of the maximum likelihood model increases by counting a win as 0.6 wins instead of as one win, then the retrodictive value of that model also increases by making that same modification.

Why in the world would this fella, Wolfe, want to tell you his little secret? Here's the deal: his made-up arbitrary way is no better or worse than any other made-up arbitrary way, and as soon as he reveals the secret formula, pencil-eared wise guys like you will criticize it, probably rightly so - and ultimately, someone in charge will listen to enough pencil-eared geeks and get rid of his formula as part of the equation. And then what? Well, he's back to being a geek like the rest of us - and he's already done that - he has no intention of going back there. Going over tight end variances at 3:00 to win a lousy FFL league week 15 game? I don't think so - boooooooring.

Have you considered that the guy carries over previous-year data, perhaps even weighted ever so slightly? That's what World Cup soccer and rugby do to get their official rankings. Just a thought. Even if a team goes 40-0, if you go back 41 games, they ain't infinite anymore. If he produces a ranking very early in the season, I'd look into it.

No, he doesn't post any rankings until later in the season. I think he just posted his basketball rankings for the first time this season recently. Would it be possible to count wins over better teams at a higher percentage than wins over losing teams, or something like that?