Sunday, July 1, 2007

One question that's been bugging me for the last few months is why home teams did so well in 2005 and so poorly in 2006. When you consider the Saints' home games in 2005, the numbers seem even stranger. Does the variance show up in all types of matchups, or does one specific type of matchup have more inherent variance?

Since 2002, when the league expanded to 32 teams, NFL scheduling has been much simpler, with 16 teams in each conference and 4 teams in each division. Now, each team in the AFC plays all the teams in one NFC division each year, rotating through the NFC divisions so that it meets each one once every four years, and vice versa. So interconference schedules for each team change drastically from year to year in terms of opponent quality and regions of the country visited (and thus climates experienced). The same holds true, though to a lesser extent, for intraconference, interdivision play, as teams still have to play at least one team from each other division within the conference every year.

So the hypothesis is as follows: The scheduling procedure implemented by the NFL since the league's expansion to 32 teams has made home field advantage less stable from year to year, increasing the variance of home team winning percentage and average result (home team points - away team points). Interconference games have the most year-to-year variance, followed by interdivisional games. Intradivisional games should have the least variance. The interconference games are largely responsible for the aberrant numbers for the 2005 and 2006 seasons.
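To make the three categories concrete, here is a minimal sketch of how a game could be sorted into one of them. The `TEAM_DIVISIONS` lookup and both function names are hypothetical, invented for illustration; only a handful of teams are listed, and the per-team game counts (6 intradivision, 6 interdivision, 4 interconference) are my reading of the post-2002 schedule rather than anything taken from the original analysis code.

```python
# Hypothetical lookup: team -> (conference, division), post-2002 alignment.
# Only a few teams are listed here; a full table would cover all 32.
TEAM_DIVISIONS = {
    "Bills": ("AFC", "East"),
    "Dolphins": ("AFC", "East"),
    "Bengals": ("AFC", "North"),
    "Saints": ("NFC", "South"),
}


def classify_matchup(home, away):
    """Return the matchup category used in this post for one game."""
    home_conf, home_div = TEAM_DIVISIONS[home]
    away_conf, away_div = TEAM_DIVISIONS[away]
    if home_conf != away_conf:
        return "interconference"
    if home_div != away_div:
        # Same conference, different division.
        return "interdivision"
    return "intradivision"


def games_per_season(teams, games_per_team):
    """League-wide game count: every game involves two teams."""
    return teams * games_per_team // 2


print(classify_matchup("Bills", "Dolphins"))
print(classify_matchup("Bills", "Bengals"))
print(classify_matchup("Saints", "Bills"))

# Assumed post-2002 per-team split: 6 intradivision, 6 interdivision,
# 4 interconference games out of 16.
print(games_per_season(32, 6), games_per_season(32, 6), games_per_season(32, 4))
```

With 32 teams, that split works out to 96 intradivision, 96 interdivision, and 64 interconference games a season, 256 in total.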

| Home Win % (mean/std. dev.) | Interconference | Interdivision | Intradivision |
|---|---|---|---|
| 1994-2001 | 59.199/4.74 | 60.466/4.42 | 58.137/3.57 |
| 2002-2006 | 60.625/7.19 | 56.875/2.40 | 56.25/2.21 |

| Avg Result (mean/std. dev.) | Interconference | Interdivision | Intradivision |
|---|---|---|---|
| 1994-2001 | 3.0489/1.8585 | 2.0519/0.910 | 2.7675/1.228 |
| 2002-2006 | 2.6875/2.5992 | 2.9896/0.824 | 2.0354/0.816 |

As predicted, the standard deviation for home team winning percentage increased overall, but by category, only the standard deviation for interconference games increased, though it did so significantly.

Before the realignment in 2002, each division had 4, 5, or 6 teams, so the number of intradivisional games played by each team was not always even. Even in 1995-1998, when each of the six divisions had five teams, interdivisional games were tougher to schedule. Of the 8 non-intradivisional games, 4 were interconference and 4 were interdivisional, meaning no team would play every team in any other division. So, for example, though the AFC East might be matched up with the AFC Central, the Bills might get an easier schedule than the Dolphins because the Bills get to play the Bengals. The scheduling might also have been tinkered with to give worse teams easier schedules, resulting in some teams not meeting each other for many years, whereas the new system guarantees that won't happen. The closer teams are in quality, the more variance one would expect in the outcome. Therefore, it makes sense that the new scheduling formula decreases variance for interdivision and intradivision games.

Nevertheless, in both time periods, variance was highest for interconference games, followed by interdivision games and then intradivision games. Surprisingly, home field advantage seems to have lost some value since the realignment. Fewer games are won by the home team, and by fewer points. With fewer teams in the division, intradivisional games might involve more parity and thus more variance in outcomes. The slight uptick in home team winning percentage for interconference games might have to do with the imbalance between the conferences. Though fewer interdivisional games are won by the home team, the average result has increased in favor of the home team by nearly a whole point. The converse is true for interconference games: the home team wins slightly more often, but by a smaller margin. In both cases, I'm not really sure why that happens with the average result. At any rate, home field advantage ain't what it used to be, so I might have to go back and rerun experiments training only on 2002 and beyond.

| Year | Games (All) | Home Win% (All) | Avg Result (All) | Games (Interconference) | Home Win% (Interconference) | Avg Result (Interconference) |
|---|---|---|---|---|---|---|
| 1994 | 224 | 0.57143 | 1.4598 | 52 | 0.51923 | 0.05769 |
| 1995 | 240 | 0.6 | 2.025 | 60 | 0.65 | 3.0667 |
| 1996 | 240 | 0.62083 | 3.7208 | 60 | 0.6 | 3.4833 |
| 1997 | 240 | 0.60417 | 2.7958 | 60 | 0.61667 | 3.9167 |
| 1998 | 240 | 0.62917 | 3.5042 | 60 | 0.61667 | 3.4333 |
| 1999 | 248 | 0.59677 | 3.0645 | 60 | 0.56667 | 2.1167 |
| 2000 | 248 | 0.55645 | 2.8226 | 60 | 0.63333 | 6.4833 |
| 2001 | 248 | 0.55242 | 2.0444 | 60 | 0.53333 | 1.8333 |
| 2002 | 256 | 0.57813 | 2.2461 | 64 | 0.65625 | 3.9375 |
| 2003 | 256 | 0.61328 | 3.5313 | 64 | 0.65625 | 4.125 |
| 2004 | 256 | 0.56641 | 2.5078 | 64 | 0.5625 | 2 |
| 2005 | 256 | 0.58984 | 3.6484 | 64 | 0.65625 | 4.9219 |
| 2006 | 256 | 0.53125 | 0.84766 | 64 | 0.5 | -1.5469 |

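| Year | Games (Interdivision) | Home Win% (Interdivision) | Avg Result (Interdivision) | Games (Intradivision) | Home Win% (Intradivision) | Avg Result (Intradivision) |
|---|---|---|---|---|---|---|
| 1994 | 68 | 0.64706 | 3.3824 | 104 | 0.54808 | 0.90385 |
| 1995 | 60 | 0.56667 | 1.3 | 120 | 0.59167 | 1.8667 |
| 1996 | 60 | 0.63333 | 2.2 | 120 | 0.625 | 4.6 |
| 1997 | 60 | 0.61667 | 1.4333 | 120 | 0.59167 | 2.9167 |
| 1998 | 60 | 0.66667 | 2.6167 | 120 | 0.61667 | 3.9833 |
| 1999 | 58 | 0.60345 | 3.1379 | 130 | 0.60769 | 3.4692 |
| 2000 | 58 | 0.55172 | 1.2586 | 130 | 0.52308 | 1.8308 |
| 2001 | 58 | 0.55172 | 1.0862 | 130 | 0.56154 | 2.5692 |
| 2002 | 96 | 0.54167 | 2.1563 | 96 | 0.5625 | 1.2083 |
| 2003 | 96 | 0.60417 | 3.7083 | 96 | 0.59375 | 2.9583 |
| 2004 | 96 | 0.57292 | 3.0833 | 96 | 0.5625 | 2.2708 |
| 2005 | 96 | 0.57292 | 3.8646 | 96 | 0.5625 | 2.5833 |
| 2006 | 96 | 0.55208 | 2.1354 | 96 | 0.53125 | 1.1563 |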
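For anyone who wants to check the summary numbers, here is a minimal sketch that recomputes the interconference home win % mean and standard deviation for each period from the yearly values. It treats each season as one data point (an unweighted mean of the yearly percentages) and uses the sample standard deviation with an n - 1 denominator. Both choices are assumptions on my part, though they appear to reproduce the reported 59.199/4.74 and 60.625/7.19. The `summarize` helper is a name invented for this example.

```python
from statistics import mean, stdev

# Interconference home win % by year, copied from the yearly table.
INTERCONF_HOME_WIN = {
    1994: 0.51923, 1995: 0.65, 1996: 0.6, 1997: 0.61667,
    1998: 0.61667, 1999: 0.56667, 2000: 0.63333, 2001: 0.53333,
    2002: 0.65625, 2003: 0.65625, 2004: 0.5625, 2005: 0.65625,
    2006: 0.5,
}


def summarize(values_by_year, first, last):
    """Mean and sample std. dev. (in percent) over seasons first..last."""
    vals = [100 * v for year, v in values_by_year.items()
            if first <= year <= last]
    # Assumption: each season counts equally, regardless of games played.
    return mean(vals), stdev(vals)


pre = summarize(INTERCONF_HOME_WIN, 1994, 2001)
post = summarize(INTERCONF_HOME_WIN, 2002, 2006)
print("1994-2001: %.3f / %.2f" % pre)
print("2002-2006: %.3f / %.2f" % post)
```

Swapping in the other columns of the yearly tables should give the remaining cells of the summary tables in the same way.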

Now, let's look at how the numbers break down by season. In 2005, although 58.98% of games were won by the home team, which is about average, the average result was very high at 3.6484. Only 1996 had a higher average result, so it's at the extremes of what's been observed before. The interconference games had an average result of 4.9219. On average, the home team won those games by nearly 5 points, which is very high, but it is still within what has been observed before: in 2000, the interconference average result was 6.4833. The average result of interdivisional games was the highest on record in 2005 at 3.8646, while the home field advantage in intradivisional games was slightly below average that year. In 2006, interconference games made all the difference. Only 50% of the games were won by the home team, and the average result was actually in favor of the away team at -1.5469. The numbers for the intraconference games, while well below average, did not set any record lows. Given this data, it is reasonably safe to say that the year-to-year variance in home field advantage is largely due to interconference games.

In theory, what's happening is that as interconference matchups are rotated, strong teams are getting matched up with weak opponents. So some of this variance should be predictable. In 2006, only 40.63% of home teams in interconference games had better records than their opponents in the previous season. In 2002-2005, the numbers were 46.88%, 48.44%, 43.75%, and 43.75% respectively. The correlation of this stat to the proportion of interconference games won by the home team in that year is very strong, 0.82427, though 2005 was still better than expected, given 2004. Five data points are too few to reasonably use linear regression, but we can still take a guess at how 2007 will turn out. It turns out that the 2007 stat matches 2003 at 48.44%, so expect home field advantage to return to at least normal levels in 2007.
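As a rough check on that correlation, here is a minimal sketch that computes the Pearson correlation between the two five-season series. The percentages are the ones quoted above and in the yearly table; the `pearson` helper is written out by hand for illustration and is not necessarily how the original figure was produced.

```python
from math import sqrt

# Seasons 2002..2006, in order.
# Share of interconference home teams with the better prior-season record (%).
prev_better = [46.88, 48.44, 43.75, 43.75, 40.63]
# Interconference home win % for the same seasons.
home_win = [65.625, 65.625, 56.25, 65.625, 50.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(prev_better, home_win)
print(round(r, 3))
```

With only five seasons, a single odd year like 2005 moves this number a lot, so it is better read as a hint than as a fitted relationship.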

So how does a prediction system like the spread handle interconference and interdivision games? Since 2002, the spread has had 63.75%, 63.96%, and 66.25% accuracy on interconference, interdivision, and intradivision games respectively, while the home team was favored in 67.50%, 68.54%, and 65.83% of those games. The standard deviations of the percentage of favorites that were home teams are 4.65%, 2.16%, and 2.74%. So the spread does seem to be sensitive to the variance, but not sensitive enough. If similar numbers hold for my linear regression model, then perhaps better opponent adjustments are needed. Given that 75% of the season is played intraconference, I'm wondering if stats should be adjusted based on conference averages rather than league averages. I know they do similar things for baseball. It's something I'll tinker around with in the future. In short, I just traveled a long, long road for a maybe. As usual, answering one question led to several new questions popping up.


About the Author

My degree is in computer science, and the football research started as an independent study in artificial neural networks. As a lifelong NFL fan, I wanted to explore the relative importance of different factors in winning games. Since the research is still nascent, I wanted to put it out in the public domain and hopefully find others interested in teaming up. Once it becomes profitable, though... I just hope the mafia families running Vegas don't come to hurt me.