Ramblings of a software developer with a degree in bioinformatics. Agile development mixed with DNA sequencing - what could go wrong?

Wednesday, March 14, 2007

Sports and power ranking systems

I've long been interested in the science of sports team rankings, for various reasons, and filling out my tournament bracket forcibly brought it back to mind. I'm always extremely mediocre in such game-picking contests, and when one blogger whom I respect said something similar, I started to think about how rankings could be done for college basketball. I looked a little closer at the Pomeroy Rankings, which seem pretty nice, although if Ken has published his exact formula for creating them, I couldn't find it. IMO a ranking system can't be taken seriously if its method isn't known. Take Jeff Sagarin: everyone prints his rankings very seriously, but we don't know what he's doing, so he might as well be making them up and just pretending it's math.

But Ken has something much more valuable on his site than a ranking system: a game database. For the most part, this information is not available in any easy-to-get-at form, so if you want to create rankings, you have to get down and do the data entry every year, which is why I've never created a system that lasted more than a year. But now, with Ken's files, maybe something useful could be done.

So I did a little research, thinking that the most effective system was probably going to be some kind of balance between a single-game Pythagorean expectation and the strength of the opponent, repeated until the numbers converged. I'm sure I read a paper about that some years ago, but I can't find it now. Instead, I found this paper, which describes a technique that doesn't take the scores into account at all!

But it's interesting, because it's based on the age-old theory of game transitivity; to wit: my team beat team X and team X beat your team, so my team is better than yours. Yah. It's a principle that's been widely derided for years, and people make hobbies out of finding weird cycles of games proving that Prairie View A&M is really better than Michigan after all. But there's obviously a kernel of truth in it. The paper goes into a lot of detail about setting up the graphs and putting weights on things and, you know, math, but really the principle is pretty simple. It works like this:

For each game that my team wins, it gets partial credit for each win the team it beat has.
For each game that my team loses, it gets partial debit for each loss the team it lost to has.
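
Here is a minimal Python sketch of those two rules, not the paper's actual formulation. The function name `two_level_scores`, the (winner, loser) game format, and the choice to count a direct win as +1 and a direct loss as -1 are all my own assumptions; `credit` is the partial credit for each indirect win or loss.

```python
from collections import defaultdict

def two_level_scores(games, credit):
    """games: list of (winner, loser) pairs.
    credit: weight given to each indirect win or loss (hypothetical parameter)."""
    wins = defaultdict(int)
    losses = defaultdict(int)
    for winner, loser in games:
        wins[winner] += 1
        losses[loser] += 1

    scores = defaultdict(float)
    for winner, loser in games:
        # Assumption: a direct win is worth +1 and a direct loss -1.
        # The winner also gets partial credit for each win the loser has.
        scores[winner] += 1 + credit * wins[loser]
        # The loser also gets partial debit for each loss the winner has.
        scores[loser] -= 1 + credit * losses[winner]
    return dict(scores)

games = [("A", "B"), ("B", "C"), ("A", "C")]
print(two_level_scores(games, credit=0.5))
```

With these three games, A beat B, and B has one win, so A picks up one indirect win on top of its two direct ones.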

That's it. The questions are, do you want to go deeper and credit my team for a third or fourth level, and just how much credit do you give for each "indirect win"? The second question is easier for our purposes, because the authors of the paper do a lot more of that math stuff and come up with a simple equation for us:

Let k equal the average number of games played by each team.
The credit is 2k / (k^2 - k).
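
To get a feel for the numbers, here is the equation exactly as written above, as a small Python function. The name `indirect_credit` is hypothetical, and the check on k is my own addition. Note that, taken literally, 2k / (k^2 - k) simplifies to 2 / (k - 1) for any k greater than 1.

```python
def indirect_credit(k):
    """Credit per indirect win, 2k / (k^2 - k), where k is the
    average number of games played by each team."""
    if k <= 1:
        # The denominator is zero at k = 1 and the result is not meaningful below it.
        raise ValueError("k must be greater than 1")
    return (2 * k) / (k ** 2 - k)

for k in (21, 30):
    print(k, indirect_credit(k))
```

An average of 21 games per team gives a credit of 42 / 420, or 10% of a win; at 30 games it drops to 60 / 870, a little under 7%.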

For a third level, you'd square the credit, etc. But do you want to do the third level? Say the credit is 0.1, or 10% of a win. For the third level the credit would be 0.01, which doesn't seem like much, but there are also a lot more third-level games to count. So I'm going to have to use Ken's game database and do some research on this. Any code I create will be open-source, of course. I won't be able to do anything useful before this year's games start, but next year, watch out!
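
As a starting point for that research, here is a rough Python sketch that extends the idea to any number of levels, so the effect of a third or fourth level can be compared directly. Everything here is an assumption on my part rather than the paper's method: the name `multi_level_scores`, the (winner, loser) game format, counting direct results at full weight, and weighting level n by the credit raised to the power n - 1 (which matches squaring the credit at the third level).

```python
def multi_level_scores(games, credit, levels):
    """games: list of (winner, loser) pairs.
    Level 1 is direct results; level n is weighted by credit ** (n - 1)."""
    teams = {team for game in games for team in game}
    # Number of win/loss chains of the current length starting at each team.
    # Every team has exactly one chain of length zero.
    win_chains = {t: 1 for t in teams}
    loss_chains = {t: 1 for t in teams}
    scores = {t: 0.0 for t in teams}

    for level in range(1, levels + 1):
        next_wins = {t: 0 for t in teams}
        next_losses = {t: 0 for t in teams}
        for winner, loser in games:
            # Each win extends every shorter win chain of the beaten team.
            next_wins[winner] += win_chains[loser]
            # Each loss extends every shorter loss chain of the team lost to.
            next_losses[loser] += loss_chains[winner]
        win_chains, loss_chains = next_wins, next_losses

        weight = credit ** (level - 1)
        for t in teams:
            scores[t] += weight * (win_chains[t] - loss_chains[t])
    return scores

games = [("A", "B"), ("B", "C"), ("A", "C")]
print(multi_level_scores(games, credit=0.5, levels=2))
print(multi_level_scores(games, credit=0.5, levels=3))
```

Running it with `levels=2` and then `levels=3` on a full season would show how much the rankings actually move when the third level is included.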