Wednesday, August 29, 2007

Preseason wins and losses are essentially meaningless, but maybe team performance, as measured by things like run efficiency and pass efficiency, still carries some information over into the regular season. Intuitively, first-string players are still playing first-string players, so some true reflections of skill are bound to show up. So I decided to take preseason box scores from 1997-2006 and see what the stats show about teams' regular season performance. Please note that separating out the first-stringers' offensive stats is too time-consuming (and possibly not that useful), so I'm just using total team stats for the game. Because of missing stats and the questionable usefulness of those stats, I've dropped punt returns, kick returns, and penalty first downs from my models.

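Preseason and regular season league averages

Stat    Preseason    Regular Season
R       3.8239       4.0708
P       5.5977       5.8866
SR      0.069901     0.068252
3C      0.37317      0.3777
PY      61.886       54.266
IR      0.026319     0.029888
FR      0.04069      0.031635

R=Run yards per play, P=Pass yards per play, SR=Sack rate, 3C=Third down conversion rate, PY=Penalty yards per game, IR=Interception rate, FR=Fumble rate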

Correlation of preseason league average with regular season league average

Simply put, the preseason game is appreciably different from the regular season game. The first thing you notice is that the preseason favors defense. Yards per play are lower, sack rates are higher, fewer third downs are converted, and fumble rates are higher. On the other hand, interception rates are lower. Penalty yardage is higher as well. Perhaps coaches are more conservative on offense, saving most of their plays for the regular season. Or maybe it's because coaches give almost all of their QBs playing time.

Meanwhile, pass yards per play, penalty yards per game, and fumble rate are the only three stats whose preseason averages show a noticeable correlation with regular season averages, but only penalty yards has a significant p-value. The p-value is essentially the probability that a correlation coefficient at least that extreme could arise from entirely random inputs. Usually, a stat is considered significant when its p-value is 5% or less. Given the small sample size of the preseason, we'll excuse the higher p-values of fumble rate and pass efficiency. Preseason average rushing efficiency, meanwhile, has essentially no correlation with regular season average rushing efficiency. If the preseason has little meaning on a league-wide level, then how does it fare at the team level?
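To make the "entirely random inputs" idea concrete, here is a minimal sketch of a permutation test: shuffle one series many times and count how often the shuffled correlation is at least as extreme as the observed one. This is an illustration only, not necessarily how the p-values in this post were computed. The function names and the ten sample values are made up for the example.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled (random) pairings correlate at least
    as strongly as the real pairing does (two-sided)."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed pairing itself as one arrangement.
    return (hits + 1) / (trials + 1)

# Hypothetical yearly league averages for ten seasons (NOT the post's data).
preseason = [61.2, 63.5, 60.1, 62.8, 64.0, 59.7, 61.9, 62.3, 60.8, 63.1]
regular = [54.0, 55.9, 52.8, 54.6, 56.1, 53.0, 54.2, 54.9, 53.3, 55.4]

print("r =", round(pearson_r(preseason, regular), 3))
print("p =", round(permutation_p_value(preseason, regular), 4))
```

With only ten seasons of league averages, even a fairly large correlation coefficient can come out of shuffled data often enough to miss the 5% cutoff, which is why the league-wide p-values are so unforgiving.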

Correlation of preseason stats with regular season stats, Unadj. VOLA

Stat    Corr. coef.    P-value
RO      0.18156        0.0012547
RD      0.15345        0.0065266
PO      0.28506        2.8993e-007
PD      0.25502        4.8902e-006
SRM     0.10118        0.073865
SRA     0.24271        1.4125e-005
3CM     0.20165        0.00033031
3CA     0.17538        0.0018427
PY      0.22266        7.0842e-005
IRG     0.10879        0.054507
IRT     0.118          0.036931
FRG     0.047389       0.40343
FRT     0.11199        0.047738

O=Offense, D=Defense, M=Made, A=Allowed, G=Given, T=Taken
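For anyone who wants to sanity-check how a correlation coefficient turns into a p-value, here is a rough sketch using the Fisher z-transformation with a normal approximation. It is an approximation and not necessarily the method used for the table. The sample size of 313 is my assumption (30 teams in 1997-98, 31 in 1999-2001, 32 in 2002-06, one row per team per season); the post does not state how many rows went into each correlation.

```python
import math

def fisher_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r
    computed from n paired observations (Fisher z, normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Assumption: one row per team per season, 1997-2006.
N_TEAM_SEASONS = 30 + 30 + 31 + 31 + 31 + 32 * 5  # 313

# Correlation coefficients copied from the table above.
for stat, r in [("RO", 0.18156), ("PO", 0.28506), ("FRG", 0.047389)]:
    p = fisher_p_value(r, N_TEAM_SEASONS)
    print(f"{stat}: r={r:.5f}  approx p={p:.2g}")
```

Under that assumed sample size, the approximation lands close to the tabled p-values, which shows why coefficients as small as 0.1 to 0.2 can still clear the 5% bar when there are a few hundred team-seasons behind them.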

Offensive performance seems to correlate better overall between preseason and regular season than defensive performance, with turnover rates being the only exception. Surprisingly, almost all of the p-values are below 5%. Sack rate made (0.074) and interception rate given (0.055) narrowly miss the cutoff, and fumble rate given (0.40) is the only clear exception. In other words, it's highly unlikely that random inputs could create similar correlation coefficients, so it's safe to assume that overall team preseason performance means something, just not much. If your team does well in the preseason, that's great, but it's hardly a guarantee of success. If your team does poorly, it's really not all that much to sweat about. Of course, this meets our expectations because second-string and third-string players get playing time they won't get in the regular season. If someone wants to take the time to sort through the box scores to figure out the efficiency stats for first stringers, they can be my guest. It's questionable how much the correlation coefficients would actually improve. Similar results can be seen with the correlation coefficients of preseason stats with regular season wins.

Correlation of preseason stats with regular season wins, Unadj. VOLA

Stat    Corr. coef.    P-value
RO      0.03494        0.53798
RD      0.049808       0.37983
PO      0.18824        0.00081724
PD      0.18979        0.00073831
SRM     0.087023       0.12445
SRA     0.14418        0.01065
3CM     0.14428        0.010596
3CA     0.21909        9.2999e-005
PY      0.054488       0.33663
IRG     0.11554        0.04107
IRT     0.029016       0.60908
FRG     -0.023628      0.67711
FRT     0.093994       0.096927

Out of curiosity, I decided to create a linear regression model of regular season win totals using the following preseason stats: pass efficiency, sack rates, and third down conversion rates. With 1997-2006 stats, I tested on each year in 1998-2006, using all previous years as training data. On average, the predicted win totals have a correlation of 0.30593 with the actual win totals, which is not very high. The yearly average of mean absolute error was 3.1519 games, about twice what it is when using regular season stats. The average R² was 0.67495, which took me by surprise a little. 67.495% of the variance is accounted for by this data, compared to 79% for the regular season stats? I was expecting 40-50% tops.
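The train-on-all-previous-years, test-on-the-next-year procedure can be sketched roughly as follows. This is a bare-bones illustration in plain Python, not the actual code behind the numbers above. The helper names (`fit_ols`, `predict`, `walk_forward`) and the toy data are hypothetical, and the regression is ordinary least squares solved through the normal equations.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]          # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, x):
    """Predicted wins for one team's feature vector."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))

def walk_forward(seasons):
    """seasons maps year -> list of (features, wins) rows, one per team.
    For each year after the first, train on all earlier years and
    return that year's mean absolute error."""
    years = sorted(seasons)
    mae_by_year = {}
    for i, year in enumerate(years[1:], start=1):
        train = [row for y in years[:i] for row in seasons[y]]
        coefs = fit_ols([f for f, _ in train], [w for _, w in train])
        errors = [abs(predict(coefs, f) - w) for f, w in seasons[year]]
        mae_by_year[year] = sum(errors) / len(errors)
    return mae_by_year

# Toy data only: wins = 2 + 3*a - b exactly, so the errors should be ~0.
toy = {
    2001: [((1, 0), 5), ((0, 1), 1), ((2, 1), 7), ((1, 3), 2)],
    2002: [((3, 2), 9), ((0, 0), 2)],
    2003: [((1, 1), 4)],
}
print(walk_forward(toy))
```

Averaging the per-year values returned by `walk_forward` gives a "yearly average of mean absolute error" in the same sense as the figure quoted above. With real data, each feature tuple would hold a team's preseason pass efficiency, sack rates, and third down conversion rates.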

What's more interesting about the model, however, is its ability to predict which teams will improve/decline the following season. In a manner similar to what I did here, I looked at which teams exceeded or fell short of their predicted win totals by more than the mean absolute error. Teams that outperformed their projected win total based on preseason stats are predicted to decline the next year, and teams that underperformed their projected win total are predicted to improve. Because there is some positive correlation between regular season and preseason stats, I expected some of the success using regular season stats to carry over. What I found, however, was that the model with preseason stats is slightly more accurate than the model with regular season stats. This might be a result of noise created by the extra inputs in the regular season stats model (e.g. kick and punt returns).
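The riser/faller rule described above boils down to a threshold check on the gap between actual and projected wins. Here is a small sketch. The function name is made up, and the cutoff is passed in as a parameter because the exact mean absolute error used for the cutoff is not stated here; the 3.1519 below is just the yearly average quoted earlier.

```python
def classify(expected_wins, actual_wins, threshold):
    """Label a team by how far its actual wins landed from the projection."""
    gap = actual_wins - expected_wins
    if gap > threshold:
        return "faller"   # beat the projection by a lot -> predicted to decline
    if gap < -threshold:
        return "riser"    # fell well short of the projection -> predicted to improve
    return "neither"

MAE = 3.1519  # assumption: using the yearly average MAE as the cutoff

print(classify(9.7623, 6, MAE))    # projected 9.76 wins, won 6
print(classify(8.2891, 13, MAE))   # projected 8.29 wins, won 13
print(classify(8.0, 9, MAE))       # hypothetical team near its projection
```

Because the cutoff applies in both directions, a team has to land outside a band roughly twice the mean absolute error wide before it gets flagged at all.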

Preseason seems to have some useful meaning then. On the other hand, we're talking about a roughly 6-game range that a team has to fall outside of to count as a faller/riser. If the projection is 8 wins (average), a team could be bad (5 wins) or very good (11 wins) and still be within the average error. The method predicts about 6-7 risers and 6-7 fallers every year, so it's accurately predicting 8-9 teams to improve/decline each year. That's pretty good, I think. Without further ado, here are the projected risers and fallers for 2007:

Risers

Houston Texans (9.7623 expected wins vs. 6 actual wins)

Jacksonville Jaguars (12.371 vs. 8)

Oakland Raiders (10.005 vs. 2)

Dallas Cowboys (15.354 vs. 9)

New York Giants (11.979 vs. 8)

Tampa Bay Buccaneers (8.6278 vs. 4)

Based on the accuracy and what other projection models have shown, I'd pick Jacksonville, Oakland, Dallas, and Tampa Bay as the ones to actually improve.

Fallers

New York Jets (4.9253 expected wins vs. 10 actual wins)

Baltimore Ravens (8.2891 vs. 13)

Kansas City Chiefs (2.9868 vs. 9)

Chicago Bears (8.6448 vs. 13)

New Orleans Saints (5.9049 vs. 10)

San Francisco 49ers (4.1093 vs. 7)

Seattle Seahawks (2.0842 vs. 9)

Of these, I'd pick the Jets, Ravens, Chiefs, and Bears to decline. It really could go either way with the 49ers and Seahawks. That division is chaos.

After jumping through some hoops, I do seem to have found some relevance to the preseason. But it's nothing you couldn't find using regular season performance. As intuition would tell you, preseason performance is only slightly indicative of regular season performance.

Comments:

At first I thought you might be on to something big here. I thought we could isolate pre-season "starting squad" stats by using known starting RB and QB individual stats as proxies for team proficiency. Then regress those stats onto following regular season win totals.

We might get a fairly good projection of regular season performance. But then I thought, that's a lot of work and it's not likely to beat 1) conventional wisdom or 2) last year's wins.

A second thought was we could look at the pre-season of "surprise" teams, such as last year's Saints, for indications they'd be very good in the regular season. If we saw similar patterns in other teams this pre-season, we might be able to make some "out on a limb" predictions.

Also, keep in mind a 4-game sample is affected very strongly by opponent strength.

I actually tried "starting squad" stats after week 3 and plugged those into the regular season win totals regression model. That's Part I, which I plan on redoing now that the preseason's over. The sample size is not even 4 games. It's closer to one and change because most of the starters don't play that long, so clearly opponent effects and "luck" are going to play an inordinately large role in the projections.

About the Author

My degree is in computer science, and the football research started as an independent study in artificial neural networks. As a lifelong NFL fan, I wanted to explore the relative importance of different factors in winning games. Since the research is still nascent, I wanted to put it out in the public domain and hopefully find others interested in teaming up. Once it becomes profitable, though... I just hope the mafia families running Vegas don't come to hurt me.