Friday, August 10, 2007

As discussed previously, football predictions (mine and others') are largely based on averages of past performances. The past performances themselves fall above and below the average by various amounts, and the future performance is almost certainly going to be above or below that mean by some amount. But we don't know in which direction or by how much. That's the limitation of using averages. If we have all of the observations from the season so far, can we do a better job of estimating performance? In other words, we want to estimate the following:

P(X_w = x_w | X_{w-1} = x_{w-1}, X_{w-2} = x_{w-2}, ..., X_1 = x_1)

I'm pretty sure the expected value of performance would still be the mean, since it's the sum of all possible outcomes weighted by their probabilities, but there's still value in knowing the probability distribution of the possible outcomes. If Teams A and B both averaged 4.0 yards a carry, you might think they're about equally good. But if Team A has a 60% chance of averaging less than 4.0 yards a carry in a game, while Team B's chance is only 35%, then it's clear that Team B is more reliable. Team B would be the better choice to bet on. This would also be useful for fantasy football, I suspect. If there's a 5% chance a running back is going to go for 220 yards in today's game, you might start him even though his expected performance (the mean) is only 95 yards. It depends on your risk aversion, however.
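
To make the Team A versus Team B point concrete, here is a minimal sketch in Python. The per-game yards-per-carry numbers are invented purely for illustration (they are not real data), chosen so that both teams average 4.0 while landing under 4.0 in 60% and 35% of games respectively. The helper name `prob_below` is my own.

```python
from statistics import mean

# Hypothetical per-game yards-per-carry figures, made up for this example.
team_a = [3.1, 3.4, 3.6, 3.7, 3.8, 3.9, 4.2, 4.5, 4.8, 5.0] * 2
team_b = [2.5, 2.8, 3.0, 3.2, 3.5, 3.7, 3.9,
          4.0, 4.1, 4.1, 4.2, 4.2, 4.3, 4.4,
          4.5, 4.5, 4.6, 4.7, 4.8, 5.0]


def prob_below(samples, threshold):
    """Empirical probability that a game falls strictly below the threshold."""
    return sum(1 for x in samples if x < threshold) / len(samples)


for name, games in (("Team A", team_a), ("Team B", team_b)):
    print(name, "mean:", round(mean(games), 2),
          "P(under 4.0):", prob_below(games, 4.0))
```

Both teams have the same mean, so an average-based prediction can't tell them apart. The empirical frequencies can, although with a season's worth of games they are only rough estimates of the true probabilities.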

My current idea for doing this is a hidden Markov model. Given a series of observations at times 1 through t-1, it estimates the probability of each thing you might observe at time t. I'd break down performance into certain ranges (5.75-6 yards a pass attempt, 6-6.25, etc.) and use those as the evidence variables. I'm thinking the unobserved state variables would be the average performance, broken down into ranges in a similar way.
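
Here is a rough sketch of how that prediction step might look, assuming a discrete hidden Markov model with the standard forward (filtering) recursion. Everything specific in it is an assumption for illustration: the three bins, the initial, transition, and emission probabilities, and the function name `predict_next_observation`. In practice those probabilities would have to be estimated from data.

```python
def predict_next_observation(observations, initial, transition, emission):
    """Return P(next observation bin | observations so far) for a discrete HMM.

    initial[s]       : P(hidden state s in week 1)
    transition[s][t] : P(state t next week | state s this week)
    emission[s][o]   : P(observing bin o | hidden state s)
    """
    n_states = len(initial)
    n_bins = len(emission[0])
    belief = list(initial)  # belief over the current week's hidden state
    for obs in observations:
        # Update: weight each state by how well it explains this week's bin.
        weighted = [belief[s] * emission[s][obs] for s in range(n_states)]
        total = sum(weighted)
        if total == 0:
            raise ValueError("observation has zero probability under the model")
        belief = [w / total for w in weighted]
        # Predict: carry the belief forward one week through the transitions.
        belief = [sum(belief[s] * transition[s][nxt] for s in range(n_states))
                  for nxt in range(n_states)]
    # Mix the emission rows according to the predicted state belief.
    return [sum(belief[s] * emission[s][o] for s in range(n_states))
            for o in range(n_bins)]


# Made-up model: hidden "true average" ranges and observed per-game ranges
# of yards per pass attempt, both coarsened to three bins for brevity.
bins = ["under 6.0", "6.0-6.5", "over 6.5"]
initial = [0.3, 0.4, 0.3]
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]
emission = [[0.60, 0.30, 0.10],
            [0.25, 0.50, 0.25],
            [0.10, 0.30, 0.60]]

observed = [2, 2, 1, 2]  # bin index seen in each game so far (hypothetical)
forecast = predict_next_observation(observed, initial, transition, emission)
for label, p in zip(bins, forecast):
    print(f"{label}: {p:.3f}")
```

The output is a full distribution over next week's bins rather than a single average, which is the extra information the post is after. Finer bins like the quarter-yard ranges above would work the same way, but would need more games to estimate the probabilities reliably.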

Before going to all the trouble of implementing it, I was curious to hear people's reactions, thoughts, and suggestions. Plus, I'm moving next week, so I'll take the time to develop the idea while I get settled in.

2 comments:

I have never attempted Markov chain models before, but I know they are very appropriate for baseball.

From your second paragraph, it sounded more like you were going for a Bayesian approach. Each team would have its own distribution of yard expectancies as a prior. Every subsequent data point would refine the expectancy. But that's seriously high-powered stuff.

About the Author

My degree is in computer science, and the football research started as an independent study in artificial neural networks. As a lifelong NFL fan, I wanted to explore the relative importance of different factors in winning games. Since the research is still nascent, I wanted to put it out in the public domain and hopefully find others interested in teaming up. Once it becomes profitable, though... I just hope the mafia families running Vegas don't come to hurt me.