Tuesday, November 01, 2005

If you follow baseball at all, you've probably heard of stats-guru Bill James and the many formulas he devised for analyzing baseball. In 1980, James suggested that a baseball team's winning percentage could be predicted based on the number of runs they scored and the number of runs they allowed. He devised a formula to calculate this, and called it the Pythagorean Method due to its resemblance to the famous geometry theorem. In its basic form, it looks like this:

Expected Wins = G * PF² / (PF² + PA²)

G=Total Games; PF=Points (or Runs) For; PA=Points Against.
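As a quick illustration, the formula above can be written as a small function. This is only a sketch: the function name, the `exponent` parameter, and the sample season are made up for the example and are not from James.

```python
def expected_wins(games, points_for, points_against, exponent=2.0):
    """Pythagorean expected wins: G * PF^x / (PF^x + PA^x)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    if pf + pa == 0:
        raise ValueError("need at least one point scored or allowed")
    return games * pf / (pf + pa)


# Hypothetical baseball season (made-up totals, not a real team):
# 162 games, 800 runs scored, 700 runs allowed.
wins = expected_wins(162, 800, 700)
print(f"Expected wins: {wins:.1f}")
```

A team that scores exactly as much as it allows comes out at half its games, whatever the exponent.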

The method isn't perfect, but it works remarkably well. Over a 162-game baseball season, the Pythagorean method predicts the final record for most teams within three games of their actual performance.

Several statisticians have taken the Pythagorean Method and applied it to other sports, including football. However, because a football season has far fewer games than a baseball season, the statistical sample is a lot smaller, and thus the results aren't as precise. Despite the higher margin of error, it's still an interesting benchmark for gauging a team's actual performance against its expected win total.

One of the things that James realized was that the value of the exponent shouldn't necessarily be fixed at 2. As you change that value in the formula, moving it up or down, you can decrease the error rate and bring the projections closer to the actual results. For the sum total of Notre Dame football the most accurate exponent seems to be 1.8.

I applied the Pythagorean Method against every Notre Dame football season from Rockne on. Hit the link for the complete table.
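For anyone curious how that kind of exponent tuning might work, here is a rough sketch. It assumes "error" means the average absolute gap between actual and expected winning percentage, which is my guess and not necessarily the measure used to arrive at 1.8. The four sample seasons are rows from the tables later in this post; they are the extreme cases, so they only show the mechanics and shouldn't be expected to land on 1.8.

```python
def expected_win_pct(points_for, points_against, exponent):
    """Pythagorean expected winning percentage for a given exponent."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def mean_abs_error(seasons, exponent):
    """Average gap between actual and expected win% (assumed error measure)."""
    total = 0.0
    for games, pf, pa, wins in seasons:
        total += abs(wins / games - expected_win_pct(pf, pa, exponent))
    return total / len(seasons)


def best_exponent(seasons, candidates):
    """Return the candidate exponent with the smallest average error."""
    return min(candidates, key=lambda x: mean_abs_error(seasons, x))


# (games, PF, PA, wins) for 1931, 1965, 1988 and 2002, from this post's tables.
sample = [(9, 215, 40, 6), (10, 270, 73, 7), (12, 393, 156, 12), (13, 290, 217, 10)]

# Try exponents from 1.0 to 3.0 in steps of 0.1.
candidates = [round(1.0 + 0.1 * i, 1) for i in range(21)]
print("Best exponent on this sample:", best_exponent(sample, candidates))
```

Run against every season instead of a hand-picked few, the same search is one way to arrive at a sport-specific exponent.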

Some analysts like to explain the method as showing you which teams "over-achieved", but I'm not sure that nomenclature accurately reflects what the formula reveals. In my opinion, it's better to talk about teams as "over-performing" or "under-performing", because what the Pythagorean method really measures is how many games you were supposed to win based on a strict measurement of points scored and points given up; it's not a measurement of how good a team really is. Perhaps another way to talk about it is in terms of Fate: which teams were "luckiest", and which teams were snakebitten.

The funny thing is, you can spin this formula any number of ways. For instance, I'm sure no one's surprised to see Willingham's 2002 team on the Lucky end of things; that was a squad that won a lot of games with smoke and mirrors. Note, though, that Holtz's '88 team is even "luckier": wins of 19-17 and 31-30 probably had something to do with it. As you can see, the formula seems to imply less about the overall quality of the individual season, and more about the vagaries of lucky bounces, the agony of missed field goals, and the gray twilight of hard-fought, close games, where one team was just a little bit better that day.

One thing that the method does show is that teams that under-perform their point differential tend to get better the following year, and teams that over-perform tend to get worse. This doesn't always hold true, but it's pretty common. For instance, check out the ten most under-performing teams from the Rockne era onwards:

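| Season | Coach | Games | PF | PA | Wins | Win% | Ex.W | Ex.W% | Diff. | Next Yr | Imprv. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1931 | ANDERSON | 9 | 215 | 40 | 6 | .667 | 8.58 | .954 | -.287 | .778 | +.111 |
| 1965 | PARSEGHIAN | 10 | 270 | 73 | 7 | .700 | 9.13 | .913 | -.213 | .900 | +.200 |
| 1981 | FAUST | 11 | 232 | 160 | 5 | .455 | 7.27 | .661 | -.207 | .545 | +.091 |
| 1932 | ANDERSON | 9 | 255 | 31 | 7 | .778 | 8.80 | .978 | -.200 | .333 | -.444 |
| 1925 | ROCKNE | 10 | 200 | 64 | 7 | .700 | 8.86 | .886 | -.186 | .900 | +.200 |
| 1986 | HOLTZ | 11 | 299 | 219 | 5 | .455 | 7.00 | .637 | -.182 | .667 | +.212 |
| 1922 | ROCKNE | 10 | 222 | 27 | 8 | .800 | 9.78 | .978 | -.178 | .900 | +.100 |
| 1983 | FAUST | 12 | 316 | 177 | 7 | .583 | 8.87 | .739 | -.156 | .583 | .000 |
| 1969 | PARSEGHIAN | 11 | 351 | 134 | 8 | .727 | 9.35 | .850 | -.123 | .909 | +.182 |
| 1942 | LEAHY | 11 | 184 | 99 | 7 | .636 | 8.29 | .753 | -.117 | .900 | +.264 |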

Lots of improvement the following year. Likewise, the ten most over-performing, or "luckiest" teams usually suffered a letdown:

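| Season | Coach | Games | PF | PA | Wins | Win% | Ex.W | Ex.W% | Diff. | Next Yr | Imprv. |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1933 | ANDERSON | 9 | 32 | 80 | 3 | .333 | 1.45 | .161 | +.172 | .667 | +.333 |
| 1988 | HOLTZ | 12 | 393 | 156 | 12 | 1.000 | 10.09 | .841 | +.159 | .923 | -.077 |
| 1993 | HOLTZ | 12 | 427 | 215 | 11 | .917 | 9.30 | .775 | +.142 | .500 | -.417 |
| 2002 | WILLINGHAM | 13 | 290 | 217 | 10 | .769 | 8.16 | .628 | +.142 | .417 | -.353 |
| 1939 | LAYDEN | 9 | 100 | 73 | 7 | .778 | 5.74 | .638 | +.140 | .778 | .000 |
| 1998 | DAVIE | 12 | 320 | 248 | 9 | .750 | 7.35 | .613 | +.137 | .417 | -.333 |
| 2000 | DAVIE | 12 | 353 | 267 | 9 | .750 | 7.48 | .623 | +.127 | .455 | -.295 |
| 1954 | BRENNAN | 10 | 231 | 115 | 9 | .900 | 7.78 | .778 | +.122 | .800 | -.100 |
| 1989 | HOLTZ | 13 | 427 | 189 | 12 | .923 | 10.56 | .813 | +.110 | .750 | -.173 |
| 1990 | HOLTZ | 12 | 359 | 259 | 9 | .750 | 7.71 | .643 | +.107 | .769 | +.019 |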
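The derived columns in the two tables above can be reproduced with a few lines of code. This sketch assumes the tables were built with the 1.8 exponent mentioned earlier, that Diff. is actual minus expected winning percentage, and that Imprv. is next year's winning percentage minus this year's. The function and field names are made up for the example.

```python
EXPONENT = 1.8  # assumption: the exponent the post says fits Notre Dame best


def season_row(games, pf, pa, wins, next_year_win_pct):
    """Recompute the derived table columns for one season."""
    win_pct = wins / games
    exp_pct = pf ** EXPONENT / (pf ** EXPONENT + pa ** EXPONENT)
    return {
        "win_pct": win_pct,
        "exp_wins": games * exp_pct,
        "exp_pct": exp_pct,
        "diff": win_pct - exp_pct,                  # negative = under-performed
        "improvement": next_year_win_pct - win_pct,
    }


# 1931, Anderson: 9 games, 215 PF, 40 PA, 6 wins, .778 the following year.
row = season_row(9, 215, 40, 6, 0.778)
for name, value in row.items():
    print(f"{name:12s} {value:+.3f}")
```

The printed values can be compared against the 1931 row; small differences in the last digit are just rounding.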

How lucky will Charlie be this year? Let's look at the season to date, and extrapolate scores of the remaining games.

Right now, we're about 11 points better on offense than our opponents usually give up, and about 7 points better on defense. Applying those averages against what our remaining opponents usually do gives us this set of predictions:

| Opponent | Avg PA | Avg PF | ND | Opp |
|---|---|---|---|---|
| Tennessee | 16 | 16.14 | 27 | 9 |
| Navy | 25.71 | 29.86 | 37 | 22 |
| Syracuse | 24.13 | 15.63 | 35 | 8 |
| Stanford | 29.14 | 27.71 | 40 | 20 |
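The projection behind that table can be sketched roughly as follows: add the offensive edge to what each opponent usually allows, and subtract the defensive edge from what each opponent usually scores. The edges here are the rounded "about 11" and "about 7" from the text, so this won't necessarily match every number in the table to the point. The names are invented for the example.

```python
OFFENSE_EDGE = 11  # "about 11 points better" than opponents usually give up
DEFENSE_EDGE = 7   # "about 7 points better" than opponents usually score

# opponent: (average points allowed, average points scored), from the table
opponents = {
    "Tennessee": (16.00, 16.14),
    "Navy": (25.71, 29.86),
    "Syracuse": (24.13, 15.63),
    "Stanford": (29.14, 27.71),
}


def project(avg_pa, avg_pf):
    """Projected (ND score, opponent score) for one game."""
    nd = avg_pa + OFFENSE_EDGE
    opp = max(avg_pf - DEFENSE_EDGE, 0)  # a score can't go below zero
    return nd, opp


for name, (avg_pa, avg_pf) in opponents.items():
    nd, opp = project(avg_pa, avg_pf)
    print(f"{name:10s} ND {nd:5.1f}  Opp {opp:5.1f}")
```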

So for the whole season, that gives us a total Points For of 401, and total Points Against of 231. Using those point totals, Pythagoras says we should expect 8 wins.

| Season | Coach | Games | PF | PA | Wins | Win% | Ex.W | Ex.W% | Diff. |
|---|---|---|---|---|---|---|---|---|---|
| 2005 | WEIS | 11 | 401 | 231 | 9 | .818 | 8.03 | .730 | +.089 |
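For what it's worth, the 8.03 works out if you plug the projected totals into the formula with the 1.8 exponent mentioned earlier (assuming that is the exponent used for this row):

$$
\text{Expected Wins} = 11 \times \frac{401^{1.8}}{401^{1.8} + 231^{1.8}} \approx 11 \times 0.730 \approx 8.03
$$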

Eight wins expected, but we got 9. Pythagoras would say that Charlie was somewhat Lucky. Based on what I saw against Michigan State and Southern Cal, I might beg to differ.

All told, I think the Pythagorean Method falls into the category of "amusing statistical confection" rather than hardcore analysis. I wouldn't count on this to tell you anything earth-shattering, but it is kind of fun to look at the formula and the numbers it spits out. If you're interested in some further reading on the Pythagorean Method, here are a few links.