
Secret Sauce Revisited

by Danny Tuccitto

Almost six years ago, Football Outsiders published a column by Bill Barnwell titled, "Why Doesn't Bill Polian's S--t Work in the Playoffs?" Eight days ago, in the comments section of our DVOA ratings column, reader pm asked if we would be running an update of Barnwell's research. Well, it just so happens that, as part of FO's 10th anniversary, we were already planning to run a series of articles throughout the offseason updating some of the seminal research findings that helped propel the site to where it is today and that form the basis of our "Pregame Show" essay. So I thought, "What the heck! Give the people a taste of what's to come."

As the title of Barnwell's column lays bare, his research was a football adaptation of Baseball Prospectus' "Why Doesn't Billy Beane's S--t Work in the Playoffs?" essay from the book Baseball Between the Numbers. The central conceit of both was that the untimely demise of highly seeded teams might be explained by the idea that predictors of regular season success are different from (and often in direct conflict with) predictors of postseason success. Both articles attempted to find some kind of postseason "secret sauce" -- areas where a team should improve if it wanted better playoff results than its record would otherwise forecast. (As an aside, if only he had known in January 2007 that Polian would later call us morons, reveal that he doesn't get statistics, and actually win the Super Bowl, Barnwell might have recast the title role.)

Of course, two crucial aspects of Barnwell's analysis have changed since 2007. First, historical DVOA was only available for 1997-2005 back then, and we've since added it for 1989-1996 and 2006-2012. That development gives us more data to work with, and also allows us to see whether predictors of postseason success have changed over time. Second, we've normalized DVOA to put it in the context of a season-specific league environment, so even the data available to Barnwell at the time is more valid now than it was back then.

Besides including more data, I also made a couple of methodological improvements, one of which addresses issues with our measure of playoff success, while the other simply indulges the "hardcore stats" side of my brain. With respect to the former, Barnwell used a measure of playoff performance called Playoff Success Points (PSP), which he adapted to the NFL from what Baseball Prospectus used in their analysis of MLB. All Barnwell's NFL version entailed was assigning two points to each team for a home playoff win, three points for a road playoff win, and five points for a Super Bowl win. Using this system, for instance, three teams earned the maximum 14 points by winning three road games and the Super Bowl: the 2005 Pittsburgh Steelers, the 2007 New York Giants, and the 2010 Green Bay Packers.

PSP was fine as a first step in this kind of analysis, and Barnwell freely admitted it wasn't ideal, welcoming future improvements. Well, the future is now, and so I'm going to fix its biggest flaw: the assumption that playoff games are created equal from a win probability perspective. For starters, the fixed ratio of road win points to home win points implies that home teams win 60 percent of the time in the playoffs, and that this is true of every game. However, even in the 1997-2005 data set Barnwell used, home teams won 67 percent of the time, and win probabilities based on the Vegas line averaged 65 percent, ranging from 34 percent for the New Orleans Saints against the St. Louis Rams in 2000 (which the Saints won) to 89 percent for the Minnesota Vikings against the Arizona Cardinals in 1998 (which the Vikings won). Furthermore, these probabilities change from round to round. For instance, over that same time frame, home teams had a line-based expectation of 61 percent in the Wild Card round (winning 67 percent of the time), an expectation of 68 percent in the Divisional round (winning 78 percent), and an expectation of 65 percent in the Conference Championship round (winning 44 percent).
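(The article doesn't reproduce the exact line-based model, but as a rough sketch of how a Vegas line maps to a win probability, one common approximation -- not necessarily the one used here -- treats the final scoring margin as normally distributed around the spread. The ~13.86-point standard deviation below is a frequently cited historical figure for NFL margins and is an assumption of this sketch, not a number from the article.)

```python
from math import erf, sqrt

def win_prob_from_spread(spread, sd=13.86):
    """Approximate win probability for a team favored by `spread` points,
    treating the final margin as normal with mean `spread`.
    A negative spread means the team is an underdog."""
    return 0.5 * (1 + erf(spread / (sd * sqrt(2))))

print(round(win_prob_from_spread(8.5), 2))   # a heavy favorite, roughly 0.73
print(round(win_prob_from_spread(-3.0), 2))  # a field-goal underdog, roughly 0.41
```

A pick'em game (spread of zero) comes out to exactly 50 percent, as it should.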

Piggybacking off that idea, a related problem is that having a static five-point reward for winning the Super Bowl implies that, at the start of the playoffs, every team has the same chance of doing so. Now, Barnwell specifically addressed that critique in the original piece -- and kudos to him for acknowledging it -- but I still think it doesn't pass muster. His argument was that it's reasonable for a team that wins three road games but loses the Super Bowl to score the same PSP as a team that wins two home games and the Super Bowl. And it does seem reasonable at first glance -- even with the common sense knowledge that it's harder for a No. 6 seed to make the Super Bowl than it is for a No. 1 seed to go all the way. On second glance, though, the question becomes, "Well, how much harder is it, exactly?"

Spend way too many hours of free time figuring out the math, and you learn the answer: It's about three times harder, and that renders Barnwell's PSP proposition unreasonable. Without boring you with details, if you assume that every home team wins 60 percent of the time and that the Super Bowl is a 50-50 game (both of which are wrong for any specific matchup of two teams, but they're what PSP assumes), and you plot out every possible trajectory for the six seeds in a conference, it turns out that the No. 1 seed has an 18 percent chance of winning the Super Bowl, whereas the No. 6 seed has a 6 percent chance of even getting there. And if you change the home-team assumption to 67 percent (i.e., to something slightly more in line with reality), the likelihoods diverge even more: 22 percent for the No. 1 seed winning the Super Bowl, but only 4 percent for the No. 6 seed winning its conference.
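The trajectory math is easy to verify for the two extreme seeds, because under the 1990-2012 format the No. 1 seed takes a bye and then plays every pre-Super Bowl game at home, while the No. 6 seed plays every one on the road. Their paths therefore reduce to closed-form expressions under PSP's fixed-probability assumptions; a minimal sketch:

```python
def reach_sb_prob(seed, p_home):
    """Probability a seed reaches the Super Bowl under a fixed home-win
    probability (1990-2012 six-seed bracket). Seed 1: bye plus two home
    games. Seed 6: three straight road games. Other seeds depend on which
    opponents survive, so they aren't closed-form."""
    if seed == 1:
        return p_home ** 2
    if seed == 6:
        return (1 - p_home) ** 3
    raise ValueError("only seeds 1 and 6 have closed-form paths")

# PSP's implicit assumptions: 60 percent home teams, 50-50 Super Bowl
print(round(reach_sb_prob(1, 0.60) * 0.5, 3))  # No. 1 seed wins it all: 0.18
print(round(reach_sb_prob(6, 0.60), 3))        # No. 6 seed reaches the SB: 0.064

# Nudging the home-win rate to the observed 67 percent widens the gap
print(round(reach_sb_prob(1, 0.67) * 0.5, 3))  # 0.224
print(round(reach_sb_prob(6, 0.67), 3))        # 0.036
```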

The fix for this involves allowing win probability to vary across games. The solution I devised is based on that old statistics standby: observed minus expected. First, I went to Pro Football Reference (PFR) and got all the necessary data for playoff games from 1990 to 2012. (Even though we have DVOA stats for 1989, I'm starting with 1990 because that's the year the NFL added a sixth playoff team in each conference.) Then, I used the model that PFR introduced this season, which is based on the Vegas line, to calculate each playoff team's win probability for each of the games they played. Next, I simply subtracted the number of games each team was expected to win from the number of games they actually won to produce a statistic we'll call "Observed Playoff Wins Minus Expected Playoff Wins (OPWMEPW)." Just kidding, let's go with "Playoff Success Added (PSA)?" What? That acronym's taken? Alright, fine, then it's "Playoff Wins Added (PWA)."
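The observed-minus-expected calculation is simple enough to sketch. Using the rounded per-game win probabilities quoted below for the 2007 Giants, it lands at 2.83; the table's +2.82 comes from unrounded inputs:

```python
def playoff_wins_added(results):
    """PWA = observed wins minus expected wins. Each game contributes
    (1 if won else 0) minus the team's pregame win probability."""
    return sum(won - p for won, p in results)

# 2007 Giants: four wins with win probabilities of 40%, 30%, 29%, and 18%
giants_2007 = [(1, 0.40), (1, 0.30), (1, 0.29), (1, 0.18)]
print(round(playoff_wins_added(giants_2007), 2))  # 2.83

# A heavy favorite that loses its only game gets a negative PWA
print(playoff_wins_added([(0, 0.75)]))  # -0.75
```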

According to PWA, here are the 12 most overachieving and 12 most underachieving playoff teams since 1990:

Biggest Playoff Overachievers, 1990-2012

Year  Team  Wins  Exp Wins    PWA
2007  NYG      4      1.18  +2.82
2012  BAL      4      1.61  +2.39
2011  NYG      4      1.72  +2.28
2000  BAL      4      1.85  +2.15
1997  DEN      4      1.86  +2.14
2005  PIT      4      1.87  +2.13
2001  NE       3      0.97  +2.03
2010  GB       4      2.07  +1.93
1990  NYG      3      1.29  +1.71
2006  IND      4      2.34  +1.66
2002  TB       3      1.45  +1.55
2008  ARI      3      1.50  +1.50

Biggest Playoff Underachievers, 1990-2012

Year  Team  Wins  Exp Wins    PWA
1996  DEN      0      0.82  -0.82
2007  IND      0      0.79  -0.79
2008  CAR      0      0.77  -0.77
2010  NO       0      0.77  -0.77
2010  NE       0      0.76  -0.76
1995  SF       0      0.76  -0.76
2012  DEN      0      0.75  -0.75
2009  SD       0      0.75  -0.75
1996  BUF      0      0.74  -0.74
2005  IND      0      0.74  -0.74
2011  GB       0      0.72  -0.72
1995  KC       0      0.72  -0.72

You'll recall that the 2005 Steelers, 2007 Giants, and 2010 Packers scored the maximum according to PSP. And not surprisingly, each of them appears in the top 12 according to PWA. However, it's clear from the table that the Giants' run was much harder than those of the Steelers and Packers. In each of their four games, the Giants were no better than a 3-to-2 underdog, and they became bigger underdogs with each successive round: 40 percent at Tampa Bay in the Wild Card round, 30 percent at Dallas in the Divisional round, 29 percent at Green Bay in the NFC Championship game, and 18 percent against New England in the Super Bowl. Meanwhile, the Packers were no worse than a 3-to-2 underdog, and actually were favorites in both the NFC Championship game and the Super Bowl. Pittsburgh was also a favorite in two games, including the Super Bowl.

The ability of PWA to quantitatively differentiate between PSP peers is also an advantage on the other side of the ledger -- perhaps even more so. That's because, according to PSP, every team that doesn't win a playoff game gets 0 points, and there were 117 of them from 1990-2012. In other words, PSP considers over 40 percent of the playoff teams from that span to be equally bad, even though we know that a No. 1 seed losing to a No. 6 seed in the Divisional round is a much worse outcome than a No. 6 seed losing in the Wild Card round. To wit, 10 of the 12 biggest playoff underachievers according to PWA were heavy home favorites after a first-round bye, and that's the way it should be. The only two exceptions are the 2010 New Orleans Saints, who infamously lost to Beast Mode and the 7-9 Seattle Seahawks, and the 1996 Buffalo Bills, who lost as an 8.5-point home favorite to the same Jaguars team that crowned the Broncos as PWA's top underachiever the following week.

So with a more valid playoff success measure in tow, all that's left to do is calculate correlations between PWA and the hundred or so regular-season DVOA splits we have in our Premium database, and answer the following two questions:

1. For 1997-2005, does PWA come to the same general conclusions about what DVOA splits predicted playoff success as Barnwell's original analysis using PSP did?

2. Has the formula for playoff success (in terms of DVOA splits) changed over time?

To answer both questions, I added one methodological wrinkle, leaning on statistical inference tests as a crutch: in order for me to conclude that a DVOA split was predictive, its correlation had to have a p-value less than or equal to 0.05. So, without further ado, below is a table showing PWA correlations for each of the three time periods, provided that at least one of them was statistically significant. It's sorted to delineate which DVOA splits were important during each time period, and shading in the original table corresponded to the correlation's level of significance (i.e., p ≤ 0.01 was darker green if the split led to more playoff success and darker red if it led to less, p ≤ 0.05 was lighter green or lighter red, respectively, and nonsignificant correlations were unshaded):

DVOA Split                       1990-1996   1997-2005   2006-2012
Pass Defense, 1st Down              -0.006      -0.290      -0.069
Defense, 1st Down                   -0.042      -0.260      -0.049
Run Defense, 2nd Down                0.105      -0.252      -0.050
Special Teams, Variance             -0.008       0.251       0.035
Defense, Red Zone                    0.082      -0.249      -0.061
Run Defense, Unadjusted              0.084      -0.230       0.015
Defense, Away                        0.064      -0.225       0.029
Special Teams, Punt Returns          0.402       0.219      -0.154
Defense, Unadjusted                  0.169      -0.214      -0.026
Defense, Goal-to-Go                  0.057      -0.211       0.102
Special Teams, Unadjusted            0.153       0.198      -0.002
Run Defense, Weeks 10-17            -0.010      -0.198      -0.029
Defense, Tied/Winning Small          0.137      -0.194      -0.048
Special Teams                        0.188       0.190       0.037
Run Defense, Weighted                0.013      -0.187       0.019
Offense, Goal-to-Go                  0.357      -0.013      -0.205
Pass Offense, Weeks 1-9              0.273      -0.086      -0.213
Offense, Momentum                   -0.266      -0.140      -0.137
Offense, Weeks 1-9                   0.250      -0.075      -0.170
Offense, 3rd-and-Long                0.249      -0.116      -0.161
Offense, 2nd Down                    0.248      -0.003      -0.265
Offense, Home                        0.245      -0.009      -0.180
Offense, 2nd-and-Short               0.237      -0.007      -0.277
Defense, 2nd-and-Short               0.236      -0.052      -0.002
Special Teams, Weather Points       -0.228       0.020      -0.148
Offense, Winning Small               0.227      -0.078      -0.267
Pass Offense, 2nd Down               0.227       0.042      -0.262
Pass Offense                         0.226      -0.062      -0.274
Offense, 1st Quarter                 0.213      -0.037      -0.339
Pass Offense, Unadjusted             0.209      -0.091      -0.292
Offense                              0.205      -0.081      -0.240
Offense, Unadjusted                  0.201      -0.100      -0.264
Offense, 3rd Down                    0.195      -0.116      -0.246
Offense, 2nd-and-Long                0.193      -0.047      -0.239
Pass Offense, 3rd Down               0.185      -0.118      -0.261
Pass Offense, Weighted               0.141      -0.107      -0.292
Offense, 1st Half                    0.165      -0.078      -0.280
Offense, Weighted                    0.103      -0.124      -0.267
Pass Offense, Weeks 10-17            0.118      -0.017      -0.260
Pass Offense, Red Zone               0.107       0.049      -0.248
Offense, Weeks 10-17                 0.091      -0.059      -0.245
Offense, Away                        0.093      -0.133      -0.239
Total, Unadjusted                    0.092       0.145      -0.225
Offense, Tied/Losing Small           0.067      -0.098      -0.222
Offense, Late & Close                0.165      -0.130      -0.220
Offense, 3rd-and-Short               0.027      -0.064      -0.216

(Before moving on, here are a few more notes about reading the table. First, remember that since DVOAs get lower as defenses get better, playoff success for better defensive DVOAs is shown by negative correlations. Second, the "Momentum" split is just the difference between the unit's weighted DVOA and unweighted DVOA, so a positive correlation means teams playing better towards the end of the season had more playoff success. Third, "Unadjusted" means VOA, which is not adjusted for opponents. Finally, "Special Teams, Weather Points" is a measure of how much weather and altitude was responsible for a team's success on special teams. It will be high for Denver and dome teams, and low for cold-weather teams other than Denver.)
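The significance screen amounts to running a correlation test per split and keeping only those that clear p ≤ 0.05. Here is a sketch of that step; the data is synthetic for illustration (the real inputs would be each playoff team's PWA and its regular-season DVOA splits), and the split names are placeholders:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n = 108  # playoff teams, 1997-2005: 12 per year over 9 seasons

# Synthetic PWA, one split built to track it (inverted, since better
# defenses have lower DVOA), and one pure-noise split.
pwa = rng.normal(0.0, 0.8, n)
splits = {
    "Defense, 1st Down": -1.0 * pwa + rng.normal(0.0, 0.5, n),
    "Some Noise Split": rng.normal(0.0, 1.0, n),
}

for name, values in splits.items():
    r, p = pearsonr(values, pwa)
    verdict = "predictive" if p <= 0.05 else "not significant"
    print(f"{name}: r = {r:+.3f}, p = {p:.4f} -> {verdict}")
```

Note that testing a hundred-odd splits at the 0.05 level invites false positives, a point several commenters raise below.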

With respect to comparing the results for 1997-2005 using PWA to those using PSP, the details are slightly different, but the general conclusion is the same. Of the 15 statistically significant correlations in that time period, none involved offense. To boot, the closest any offensive DVOA split came isn't even on the table because it also wasn't significant for the other two time periods (strength of schedule at -0.175). Focusing in on the DVOA splits that were predictive of PWA from 1997 to 2005, four matched up with Barnwell's column: First-down pass defense, first-down defense, red-zone defense, and away defense. This isn't surprising when you consider the teams that overachieved during that time. At -74.4%, the 2002 Tampa Bay Buccaneers remain the best first-down pass defense in DVOA history, and they posted +1.55 PWA during that postseason. The second-best pass defense from 1997 to 2005 was the 2000 Baltimore Ravens (waaaaaay behind the Bucs at -35.3% DVOA), and they ended up with the highest PWA of that era (+2.15).

In terms of our second question, the pattern of correlations leaves no doubt that the recipe for playoff success has changed over time: There wasn't a single DVOA split that was statistically significant across all three time periods, and only eight of the 46 in the table overlapped across two time periods.

What's more, it's as if we're looking at three distinct eras of different s--t working in the playoffs. (Again, Barnwell's timing was impeccable, writing his piece about DVOA correlations that ended up being mostly inapplicable to eras before or after.) I already discussed the "defense plus special teams" recipe for 1997-2005, so let's focus on the other two. From 1990 to 1996, playoff success was most enjoyed by teams with a good punt return unit and a good overall offense that slumped towards the end of the regular season. The poster child for that era was its first champion and owner of its highest PWA (+1.71): the 1990 New York Giants. New York finished the regular season ranked seventh at 10.5% Offense DVOA, but their Weighted Offense DVOA was only 4.9% because of a -33.3% showing in a Week 13 loss to San Francisco (Breathe, Danny. Breathe.), and Dave Meggett propelled their punt return unit to a No. 1 finish (+10.3 net expected points).

The last seven postseasons have been like Bizarro 1990-1996: Offense is important again, but it's bad offenses that have won more games than expected. Of the 24 statistically significant correlations for this time period, only Total VOA doesn't have "offense" in the name. And every one of those offensive DVOAs has a negative effect on playoff success. The 2007 New York Giants -- I'm sensing a theme here -- had the highest PWA since 1990 (not just from 2006 to 2012), but ranked 18th with -1.1% Offense DVOA. The 2009 New York Jets -- seriously, what's with the New York thing today -- amassed +1.06 PWA despite finishing the regular season at -12.5% Offense DVOA (ranked 22nd). Meanwhile, the 2010 New England Patriots, who currently own the second-best Offense DVOA of all time (so far), were one and done thanks to the N+1 incarnation of that Jets team despite being a 3-to-1 favorite to win the game. Finally, the team with the worst PWA of this era (-0.79) was the 2007 Indianapolis Colts, they of the 22.2% Offense DVOA and Divisional round exit at the hands of Billy Volek.

The fact that having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure these days is puzzling to me for two reasons. First, if that's the case, then why isn't having a good defense -- especially a good pass defense -- part of the recipe for success? Of 126 DVOA splits, the most influential defensive correlation ranks 28th (-.191 for front zone) and pass defense doesn't show up until 48th (-.129 for Weeks 10-17). Second, and more importantly, what the hell's going on out here? Anyone who is either in tune with NFL stat analysis or has happened to watch an NFL game over the past few years knows that having a high-octane pass offense, usually on the shoulders of an elite quarterback, is the shortest distance between showing up and winning. Over the past seven years, however, 14 of the 18 teams that finished the regular season in the top 3 of Pass Offense DVOA underachieved in the playoffs, including the last six No. 1s. Of course, in the irony to end all ironies, the only No. 1 pass offense to win the Super Bowl during this period was the 2006 Indianapolis Colts. After 2,500 words, it turns out Bill Polian's s--t comes up smelling like roses to this moron.

If I had to guess, I would point the finger at two culprits: sample size and missing variables. Regarding the former, my sample for the 2006-2012 correlations comprised 84 teams, and that's woefully small in this era of big data. That said, I had no problem finding that good offense led to playoff success from 1990 to 1996, which involved an identical sample size. Therefore, it's probably more an issue of missing variables. I've made a couple of crucial improvements to Barnwell's analysis, but what's really needed is to see how well regular-season DVOA splits autocorrelate with postseason DVOA splits. It might very well be that bad regular-season offenses are sleeping giants these days, raising their games (for whatever reason) during the playoffs. In other words, this might be a case of statistical mediation: Postseason success may very well depend on good offense, but it's the worse regular-season offenses that are more likely to be good come playoff time.

To finish things up, I'm going to apply what I've learned from this analysis to the 2013 playoffs. For fun, I'll apply it two different ways: (1) Assuming the 2006-2012 postseasons are the most predictive, and (2) assuming this postseason is best predicted by an amalgam of the previous 23. Under both assumptions, we don't know the final Vegas lines past the Wild Card round, so in the spirit of Baseball Prospectus' original foray into the topic, I'm just going to create a composite score for all 12 teams using the following method: (1) Include rankings only for those DVOA splits that were statistically significant over the time period assumed to be the most predictive, using some common sense discretion when it comes to overlapping splits; and (2) weight the included rankings by the magnitude of the correlation. (If you want more details about the weighting procedure, ask me in the comments.)
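The weighting details are deferred to the comments, so here is just one plausible reading of "weight the included rankings by the magnitude of the correlation": a weighted average of regular-season rankings, with weights proportional to each split's |r|. The split names, weights, and team ranks below are illustrative assumptions, not numbers from the composite actually used:

```python
def composite_score(team_ranks, weights):
    """Weighted average of a team's regular-season rankings, with each
    included DVOA split weighted by the magnitude of its correlation.
    For bad-is-good splits, a higher (worse) rank yields a higher score."""
    total_w = sum(weights.values())
    return sum(team_ranks[s] * weights[s] for s in weights) / total_w

# hypothetical weights: |r| values for two bad-is-good splits
weights = {"Offense, 1st Quarter": 0.339, "Pass Offense, Weighted": 0.292}

team_a = {"Offense, 1st Quarter": 16, "Pass Offense, Weighted": 20}  # mediocre offense
team_b = {"Offense, 1st Quarter": 3, "Pass Offense, Weighted": 5}    # elite offense

print(round(composite_score(team_a, weights), 2))  # 17.85 -- better playoff outlook
print(round(composite_score(team_b, weights), 2))  # 3.93
```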

So, for the 2006-2012 method, after paring down the 24 statistically significant DVOA splits that appear in the table, I ended up with these 11 DVOA predictors of playoff success:

Offense, First Quarter

Pass Offense, Weighted

Offense, Winning Small

Pass Offense, Second Down

Pass Offense, Third Down

Pass Offense, Red Zone

Offense, Away

Total, Unadjusted

Offense, Late and Close

Offense, 3rd-and-Short

Pass Offense, Weeks 1-9

You might have read that and yelled, "But at least five of those involve things we know to be nonpredictive in general!" If so, you're right... in general. Some of these splits are kind of random, but we're talking about the playoffs, where randomness abounds. In small sample applications, I don't mind including things that tend to only work in a small sample.

For the second method, I'll use rankings for the following eight DVOA splits, which are based on a correlation analysis of data from 1990 to 2012 (listed from strongest to weakest significance, direction in parentheses):

Offense, Momentum (-)

Pass Defense, First Down (-)

Special Teams, Kickoff Returns (+)

Special Teams, Weighted (+)

Special Teams, Punt Returns (+)

Offense, Away (-)

Offense, Front Zone (-)

Pass Offense, Weighted (-)

Below is a table showing the 2013 playoff teams ranked from first to 12th in expected playoff success according to the two methods I just described:

Team  2006-2012  1990-2012
DEN          12         12
NE            6          8
CIN           4          6
IND           1          3
KC            2          2
SD           11          9
SEA           5          4
CAR           3          7
PHI          10         10
GB            7          1
SF            9          5
NO            8         11

In the AFC, the sans-Polian Indianapolis Colts are the team most likely to overachieve during this year's playoffs, especially now that they've eliminated the team second-most likely to overachieve. The Colts ranked in the bottom half of the NFL for each of the six most influential DVOA splits from 2006 to 2012 (see above) for which low rankings are advantageous: 16th in first-quarter offense, 20th in weighted pass offense, 20th in offense when winning by a touchdown or less, 16th in second-down offense, 22nd in third-down offense, and 24th in red-zone offense. They also ranked 15th or worse in four of the five bad-is-good splits I used from 1990 to 2012. Speaking of which, the full-sample method was bullish on the Chiefs because of their No. 1 Weighted Special Teams DVOA, their No. 1 punt return unit, and their No. 2 kick return unit. Too bad these "random" variables ended up playing far less of a role than another "random" variable: injuries.

Meanwhile, both methods hate the chances of both the Chargers and the Broncos, so the Patriots-Colts winner has the inside track to a Super Bowl berth -- at least according to this analysis. Of the 19 DVOA splits I'm looking at across both systems, Denver is on the wrong end of 18. For the 11 bad-is-good predictors based on 2006-2012, they rank fourth or better in all of them. And for the 1990-2012 predictors, the only bright spot is a slumping offense (-6.6% difference between Offense DVOA and Weighted Offense DVOA), but that's offset by low rankings on two of the three good-is-good predictors: Weighted Special Teams DVOA and net expected points on punt returns. For San Diego, the main strike against them comes from the recent-history system: the Chargers don't measure up well in 10 of those 11 predictors. The fact that they won this weekend (I think) tells us more about how poorly Cincinnati played than about San Diego's prospects going forward.

The results are far less clear-cut for the NFC, where the most concrete thing I can say is that the top two seeds have a slight edge, and -- in the reverse of the Colts-Chiefs situation -- New Orleans' elimination of Philadelphia anointed the Saints as the remaining team most likely to underachieve from here on out. The Eagles got killed in both systems by having an offense that was too good and special teams that were too poor. The Saints, meanwhile, get demerits in the 1990-2012 system for their special teams rankings (27th in Weighted Special Teams DVOA and 27th in net expected points on punt returns). And the 2006-2012 system does them no favors because they rank in the top half of the NFL in 10 of 11 bad-is-good DVOA splits.

Elsewhere, Green Bay's loss appears on the surface to be an indictment of the 2006-2012 system considering they were its most likely team to overachieve. And even in the 1990-2012 system, they were still a better bet than the 49ers. However, a few seconds of deep thought reveals that the systems were so high on the Packers mainly because of the bad offense they displayed during Aaron Rodgers' injury absence. No doubt, a full season of Rodgers would have placed Green Bay higher than 22nd in Weighted Pass Offense DVOA, which has been the second-most (negatively) influential split in recent times. Furthermore, the Packers' offensive momentum wouldn't have been -8.0% DVOA with a healthy Rodgers, and that's the No. 1 predictor -- albeit in a negative direction -- according to the full-sample correlations.

In closing, I'll go ahead and state a few things I've taken away from this research project. First, regardless of what you just read, the Colts remain long shots. Going back to the tedious math I mentioned a couple thousand words ago, the No. 4 seed is theoretically about 12-to-1 to even make it to the Super Bowl, and the Colts' win over Kansas City only shortened those odds to 7-to-1. In other words, even giving credit to the two systems for correctly identifying Indianapolis as a playoff overachiever this postseason, that doesn't necessarily mean they're going all the way. Second, I'll reiterate what our dear leader has said before: Over time, parity has decreased during the regular season, but increased during the playoffs. To wit, this exercise has proven to me that, like Hulk Hogan was in the 80s, playoff success is really hard to pin down cleanly. Finally, even with the improvements I've made to Barnwell's original analysis, mine is only a second step. There are plenty of ways to make it even better (e.g., using a win probability model based on a multivariate logistic regression rather than the Vegas line). Let's work on that over the next 10 years, OK?

Posted by: Danny Tuccitto on 07 Jan 2014


Comments

Injuries seem to be the most obvious potentially important variable that's worth digging into: injuries over the regular season as a whole, injuries down the stretch, and injuries for playoff games (at the least, I'd guess that most injuries are pre-game and are known in advance). I don't know how good AGL is, but it seems like a reasonable starting point.

Well, the analysis itself was based on a small sample. Also, the Colts had to win more than one game to win the Super Bowl.

Data mining theories like this can be fairly fragile when it comes to the addition of counter-examples. The basic problem is that there are so many possible theories that any small data set is going to support some of them, even when no real pattern should exist.

RickD wrote: so many possible theories that any small data set is going to support some of them

Yes, that compounded with our human proclivity to see patterns where none exist. We need explanations, so we invent them. And, Stevie Wonder sings "Superstition" in the background and our beer ads tell us "it's only crazy if it doesn't work".

Would there be some way to factor PWA into regular season DVOA in the context of each individual matchup? And, if yes, is there an argument that such an approach might yield a good representation of which teams were best situated to take advantage of their 'secret sauce' to marginally increase their chances for victory?

Hazy intuition: could it be true that defense and special teams are more important than offense in the playoffs because they're more important in close games and playoff games are more likely to be close? And that teams that win a lot of blowouts in the regular season are just .500 teams in close games like everyone else so they underperform their regular-season WP% in the playoffs when their opponent is non-blowout-able?

Again, I have no idea whether this makes even basic sense when it's quantified. I'm just brainstorming.

Yeah, and we'll see how Luck works with all this. His Colts are something like 16-2 in one-score games the past two seasons. 11 of them, of course, the famous 4Q come-from-behind games. That's kind of a ridiculous stat. Not like being 10-0 in your first ten playoff games, but it's a start.

Unless I missed this -- did you do correlations across all 3 playoff time periods? So, don't chop up 1990-2012 into 3 different time periods, just treat it as one time period. Are there any predictive factors then?

Given the three time periods are broadly the same, isn't it possible to just add the correlations together to get something vaguely akin to a total for each metric? That's what I just did. It reveals that the top 5 positive correlations are:

I'm sorry, but I found this entire exercise to be questionable at best.

"So with a more valid playoff success measure in tow, all that's left to do is calculate correlations between PWA and the hundred or so regular-season DVOA splits we have in our Premium database"

That feels like blindly throwing darts at a wall and drawing bull's-eyes around the darts. First, you have the problem of unfettered data-mining. If you test 100 things, about 5 will be "statistically significant" at the .05 level and about 1 will be at the .01 level, just due to chance. But -- and I'm no expert on this -- are the DVOA splits normally distributed? The problem could be a lot worse if not. Also, there is probably a lot of correlation between certain specific splits, which seems like it would also exacerbate this problem.

"In terms of our second question, the pattern of correlations leaves no doubt that the recipe for playoff success has changed over time"

I disagree--"the pattern of correlations" I see only makes me think "random random random", not that any "recipe for playoff success has changed." First Down pass defense is correlated with success for one arbitrary time period and then not for another? What the hell do you do with that information? Especially when you've been testing 100 other splits? So I hate to say it, but I found this article shoddy in terms of statistical inference, and I left feeling like I haven't learned anything of value.

At the same time, this is the best football stats site probably on the internet...I only post this comment because you guys are the best and the level of debate here is so high.

One has to be careful with data mining of this sort. It's important to have a divide between your hypothesis generation and your hypothesis testing. It's typical in machine learning circles to use a subset of the data to generate hypotheses and then test them against the rest of the data. That way you're not simply over-valuing random artifacts.

RickD writes: It's typical in machine learning circles to use a subset of the data to generate hypotheses and then test them against the rest of the data.

And actually, by dividing the data up into different periods, that's what Danny did. The fact that the correlations from one period did not match the correlations from another suggests that our machine learning algorithm has not successfully learned. (At least that's what I understand from what I've read on the topic, not being a statistician, machine learning expert, or having any other credentials.)

I suspect that the issue has more to do with Danny's definition of success, which I'll talk about in another reply.

I promised to write more on the definition of success. However, I think that has been pretty well covered by numerous people, so I'll try to be brief. There are two main issues with PWA.

1) It compares the results to "Las Vegas lines". That means that if the lines already properly price success, the best one can do is break even. Note, I think the lines are more about getting even betting on both sides, so there may be some room for improvement, but it would be slight.

2) It defines success by "exceeding expectations". As many have noted, this skews the measurement toward rewarding teams with depressed expectations, rather than rewarding teams that won (even if they were expected to win).

As several people (me included) have mentioned, a simpler metric based solely on wins would be more convincing. And note, that should be the point of the article: presenting convincing evidence.

I would take it further, to allow DVOA-like analysis, e.g. do the various DVOA metrics still properly correlate in the postseason? If they diverge, that would be news. Of course, if they diverge, the bar for proving it will be high, because you have small-sample issues and an ingrained belief that they shouldn't diverge.

I'm with you. This is pretty average work. The table of correlations makes me think that the playoffs are just random.

Also, I find this type of statistical work questionable with dependent events (since one team winning a game necessitates another team losing). I took some statistics courses, but not to the degree of quite a few others on FO boards, so maybe I'm off base, but it just seems less sound to do this analysis with a small set of dependent events (Team X cannot win two games unless they win the first game, Team B cannot win three games unless Teams C-G lose a game, etc.)

The thing you're missing is that even if you conclude the playoffs are random, that is important in and of itself. Why? Because the regular season isn't. Not even close - you can actually show that very easily by just looking at the distribution of wins, and it's not even close to what you would expect from a random draw.
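The coin-flip comparison is easy to make explicit. Under a pure-randomness model, each team's win total in a 16-game season is Binomial(16, 0.5); the sketch below (my own illustration) computes what that null model predicts, which can then be held up against the much wider spread of actual NFL records:

```python
import math

# Under a pure coin-flip model, a team's wins in a 16-game season
# follow Binomial(n=16, p=0.5).
n, p = 16, 0.5
sd_random = math.sqrt(n * p * (1 - p))  # standard deviation of 2.0 wins

# P(a given team goes 13-3 or better) under the coin-flip model:
p_13_plus = sum(math.comb(n, k) for k in range(13, n + 1)) / 2 ** n
```

That probability works out to about 1%, so a 32-team coin-flip league would produce a 13-win team only about once every three seasons; the real NFL produces them far more often, which is the sense in which the regular season is demonstrably not random.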

Since we know they actually are playing the same game in the playoffs (so it can't magically become purely random) it must mean that the playoff selection process produces some bias in the remaining teams that changes things.

If you believed that the playoff selection process was perfect, it could possibly be random. Imagine, for instance, a perfect BCS, selecting only the top 2 teams - if the talent distribution in the NFL was such that there isn't much difference between the #1 and #2 team, then you'd end up with a nearly-random playoff.

But the playoff selection process certainly isn't perfect - the division winners are almost random, in some sense: it could produce a very bad playoff contender. So how in the world can a selection process that lets in between 4 and 6 almost-random teams convert a 'non-random' game into a 'random' game?

I think Danny's conclusion is spot on: that there's some variable that's not being measured that's driving playoff success.

He is starting with the Vegas line as a baseline. All this seems to show is that there's no secret sauce to beat Vegas. It's not at all showing that the playoffs are random. Just that playoff over/under-performance relative to the Vegas line is random.

Which means the Vegas line is appropriately calibrated, and if there's any additional information about what works better in the playoffs than the regular season it's already factored into the Vegas lines.

Adding to Dan S's comments, I may have missed something, but it seems like there's another big flaw here.

Let's say that teams with great offenses tend to do well in the regular season and get high seeds in the playoffs. Therefore, teams with great offenses tend to have a relatively high number of expected wins. This means that if a great offensive team/high seed loses early in the playoffs it will score a relatively high negative PWA. At the same time, when a team with a great offense/high seed wins the Super Bowl, it will have accrued a relatively small (for a Super Bowl winner) PWA. Therefore, the ineffectiveness of great offenses is overstated when the team loses early and the effectiveness is understated when the team ultimately wins the Super Bowl.
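The asymmetry described above can be put in numbers. Assuming (my assumption, based on how the comments describe it) that per-game PWA is the actual result minus the Vegas-implied win probability, a heavy favorite's whole championship run is worth no more than a single early loss costs it:

```python
# Assumed definition: PWA per game = (1 if win else 0) - vegas_win_prob.
def game_pwa(won, win_prob):
    return (1 if won else 0) - win_prob

# A high seed favored at 75% in each round:
champion_run = sum(game_pwa(True, 0.75) for _ in range(3))  # three wins: +0.75
early_exit = game_pwa(False, 0.75)                          # one loss:  -0.75
```

Three straight wins and a title accrue +0.75 PWA; one upset loss wipes out the same amount in a single game.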

Yup. Now that should balance out in the long run because the relatively high negative PWA shouldn't happen very often. However, we know that an unexpectedly high number of top seeds lost from 2007-2012, meaning top seeds accrued lots of negative PWA during that timeframe, much more than we expected.

The question is: did those top seeds lose because something about the playoffs is different than the regular season, from which we based our expectations, or is it simply a randomly unlucky stretch that would have regressed to the mean if given more trials? The results would look the same either way.
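One way to weigh the "randomly unlucky stretch" hypothesis is a quick Monte Carlo. All the numbers below are placeholders of mine (games per stretch, the favorites' assumed 70% win rate, the bad-run threshold), but the shape of the question is: how often does pure chance produce a run this bad?

```python
import random

def p_bad_stretch(n_games=24, p_win=0.70, max_wins=12, trials=100_000, seed=1):
    """Monte Carlo: chance that favorites win <= max_wins of n_games
    even though nothing about the playoffs actually changed."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        wins = sum(rng.random() < p_win for _ in range(n_games))
        if wins <= max_wins:
            bad += 1
    return bad / trials
```

If that probability comes out non-trivial (say, a few percent), a six-year slump by top seeds is perfectly consistent with "unlucky stretch," and, as the comment says, the data alone can't distinguish the two stories.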

I agree there is a problem with the basic logic of the scoring system. It seems you have created a measure of success that rewards the teams that perform badly in the regular season but still squeezes into the playoffs. This eliminates the effect of splits we know lead to success in the regular season, although these splits are very likely to be relevant in the playoffs as well. What you have left is a tiny bit of information and a lot of statistical noise, and it is impossible to distinguish.

I suggest you reward success points only weighted by home field advantage, which is pretty much undeniable, possibly awarding more points for more important games (conf finals and SB) based on some subjective definition of 'playoff success'. Award negative points for losses, zero for byes so you don't create another bias for wild card teams.

I am guessing the results will still be noisy, since an entire playoff season represents fewer games than a single week of the regular season, but you might come up with something a little better than 'bad offense is good'.

Or, we could just move on and everyone agree that playoff success is more likely for good teams but still pretty random.

Why is that a flaw, if the point of the exercise (as I understood it) is: why do some teams overperform/underperform in the playoffs? To answer that question, you have to categorize teams as Meets Expectations, Exceeds Expectations, and Does Not Meet Expectations. If a team that is rated the best team in football wins the Super Bowl, by definition it Meets Expectations. If it loses earlier, by definition that team has Underperformed.

Well and good, but the whole point of Secret Sauce (as I understand it) is not to function as a betting aid, but to predict which teams will succeed and fail in the playoffs. You might want something that will help you beat the spread, but I'm pretty sure that's not what Barnwell and the Baseball Prospectus article before him were trying to do, nor do I think that's what readers are looking for when they read a Secret Sauce article.

"which teams will succeed and fail in the playoffs"
Define 'succeed'.
The first gut feeling says wins and losses, and that a Super Bowl win is more success than a divisional win. But random chance is always a part of success or failure (if you define them with W's and L's), and a good game in the conference championship is just as good a game as the same game in the Super Bowl.
I don't think 'Wins' is what you're chasing here. What you want to know is which teams perform well, playoff DVOA is what I would be looking for.

The question I'd ask is: What part of a team increases a team's playoff DVOA?

[EDIT] I read that Eddo is saying pretty much the same thing in post #19 and I agree with him totally.
Why would you want to measure whether a team exceeds expectations when you're actually looking for the special ingredient of success in the playoffs?

Agree completely.... this seems like a horrible case of trying to over-fit the data.

No reason to lump the eras into smaller sample sizes unless you think there was something rules-wise that would make those eras theoretically significant.

So much of the rest of the over-analysis is similar.

It could have been a very nice article/analysis: look at some common large factors (e.g., OFF vs. DEF/ST), then maybe look at pass/run splits. Doing so, we could see whether there are indeed different "predictors of postseason success." You could also use theory to concoct other testable hypotheses (e.g., number of starters lost for the playoffs might be interesting, as it would show teams that got healthy for playoff runs).

But this "throw spagetti against the wall to see if it sticks" method really calls the whole excercise into quesiton.

Another point--
Can you just tell us whether overall DVOA, defensive DVOA, offensive DVOA, and ST DVOA predict anything over your entire data sample? That's the most obvious thing to test but I didn't see it (and apologies if I missed it). Maybe it's not here because these obvious stats have no explanatory power.

Given that--maybe the only thing you can say about playoff success is that it's random as balls. Why do a bunch of questionable arithmetic acrobatics to convince yourself otherwise?

Actually, survival analysis would be a more appropriate statistical method for studying playoff success, as teams keep playing until they lose. It's pretty simple to plug the teams' stats into a multivariate survival model and use something like AIC to select the variables most important for playoff survival.
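To make the survival framing concrete, here is a bare-bones Kaplan-Meier estimator (a sketch of mine, not the multivariate model the comment proposes): each team's "event" is its elimination round, and the champion is treated as censored because it is never eliminated.

```python
def kaplan_meier(data):
    """Kaplan-Meier survival curve over playoff rounds.
    data: list of (round_reached, censored) pairs, where round_reached
    is the round in which the team's run ended (1 = wild card) and
    censored is True for the champion, which was never eliminated."""
    surv, curve = 1.0, {}
    for r in sorted({rr for rr, _ in data}):
        n_t = sum(1 for rr, _ in data if rr >= r)           # at risk entering round r
        d_t = sum(1 for rr, c in data if rr == r and not c)  # eliminated in round r
        surv *= 1 - d_t / n_t
        curve[r] = surv
    return curve

# A 12-team field: 4 wild-card losers, 4 divisional losers,
# 2 conference losers, 1 Super Bowl loser, 1 champion (censored).
bracket = ([(1, False)] * 4 + [(2, False)] * 4 +
           [(3, False)] * 2 + [(4, False), (4, True)])
curve = kaplan_meier(bracket)
```

With no censoring except the champion this just reproduces the advancement fractions, but the framework is what lets you bolt on covariates (a Cox model on DVOA splits, say) and do the AIC-based variable selection the comment suggests.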

So a team with a great regular season offensive DVOA will be expected to succeed in the playoffs.

Here's the problem...you are then computing the "special sauce" by their observed playoff results relative to that expectation. In that case it's unsurprising that factors positively correlated with high "expected" values will be negatively correlated with "observed minus expected".

To say it colloquially, good offense means you should be favored in playoff matchups. Which makes it hard for good offenses to do better than expected.

This entire article isn't looking at what makes a team successful in the playoffs. It's about what makes a team more (or less) successful than people expect it to be (as measured by the Vegas lines). So it's as much about measuring what drives people to think highly of a team as it is about how well the team actually does.

I wonder if some of the cause of the negative recent trends you see on offense are tied to fantasy football. One effect of the increased popularity of fantasy football is that people are, in general, more knowledgeable about offensive players, and more likely to think highly of a team in real life that got lots of fantasy points. So offensive juggernauts with porous defenses are more highly thought of than tough defenses that win a lot of 20-7 games. Hence, if a team that won all its games 34-27 or so goes up against a 20-7 defensive team, I would wonder if the Vegas lines would favor the offensive team more than they should, and hence we'll see a negative correlation in what you've done.

I think your methodology might be worth applying to simply playoff success--NOT playoff success compared to the expected success. Something closer to Barnwell's original methodology, but corrected for the more accurate win probabilities. I would bet that you would see very different predictive splits.

The baseball prospectus PSP score was not based on expectation at all, it merely assessed wins and losses and how far teams advanced in the playoffs. I think adding expectation to Barnwell's PSP and PWA was a bad idea.

Obviously it's harder to measure success in the NFL playoffs because there are fewer games and teams play an uneven number of games. But I think a better scoring system would be based solely on how far teams advanced. In baseball, losing a playoff series 4-3 is better than losing 4-0, but otherwise that's all PSP is measuring: how far a team advanced.

So: 1 point for advancing (via win or bye) to the divisional round, another point for advancing to the conference championship, another for making the Super Bowl, and a fourth for winning it. Seems like that would solve the problem by removing expectation, and the special advantage teams with a bye receive seems deserved and therefore acceptable anyway.
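That scoring scheme fits in a few lines. The stage names below are placeholders of mine; the point values are exactly the ones proposed above (0 for a wild-card exit up through 4 for a champion):

```python
# One point per stage reached past the wild-card round, as suggested
# above; a first-round bye counts as "advancing" to the divisional round.
ROUNDS = ["divisional", "conference", "super_bowl", "champion"]

def playoff_points(furthest_stage):
    """furthest_stage: the last stage a team reached, e.g. 'conference'
    for a team that won its divisional-round game. Anything not in
    ROUNDS (e.g. 'wild_card_loss') scores 0."""
    if furthest_stage in ROUNDS:
        return ROUNDS.index(furthest_stage) + 1
    return 0
```

Unlike PWA, this gives the same credit to every champion regardless of how the team was seeded or priced by Vegas, which is the whole argument of the comment.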

Were there any patterns that were statistically significant over the full 1990-2012 sample? If not, my takeaway from this is that there is in fact no "secret sauce". I find that much more likely than the idea that the secret sauce exists, but changes completely every 7 years or so for no discernible reason. Random wins and losses happen, and if in a 6-8 year period a couple of those random winners have similar profiles (and/or random losers have opposite profiles) then you will see patterns emerge that have no meaning.

My biggest concern is that PWA [EDIT: removed parenthetical(*)] isn't measuring "what works in the playoffs", but rather "what works more in the playoffs than in the regular season".

Consider the extreme example of a team that was 16-0, won every game by 70 points, and wound up being considered 99% favorites in every playoff game by Vegas. Then, that same team goes out and wins its three playoff games by 70 points each.

Clearly, that team's shit works in the playoffs, as it dominated three opponents on its way to a championship. But its PWA is going to be very small, and thus whatever it does well wouldn't have much effect on the conclusions of this study.

Part of me thinks you should use VOA/DVOA or SRS in playoff games to determine playoff success. This would be a proxy for how well teams played once they got to the playoffs. Instead, what you're measuring is how well they do as compared to expectations, which isn't quite the same thing.

(*) I never really cared for Barnwell's PSP measure either, mainly due to its arbitrary assignment of value for various wins.

Another method for calculating PWA would be using Bill James's Log5 method, which Barnwell used when he calculated the probability of teams making the playoffs.
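For reference, the Log5 formula itself is tiny: given two teams' true winning percentages, it returns the probability the first beats the second.

```python
def log5(p_a, p_b):
    """Bill James's Log5 estimate: probability that a team with true
    winning percentage p_a beats a team with true winning percentage p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)
```

It has the sanity-check properties you'd want: two equal teams split 50/50, and against a .500 opponent a team simply wins at its own percentage.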

An alternative method for looking for the "secret sauce" would be comparing the most successful teams to each other. You could (arbitrarily) group all the teams that won two or more playoff games and see if there was any stat that was similar across teams.

I think the overall topic is very interesting, but this particular analysis path is lacking. I'm not trying to jump on the pile here, so take this however you want.

Most importantly, I think you're confusing what correlation means. Correlation measures the extent to which there is a linear relationship between two variables. It is NOT the magnitude of the impact of one variable on another. Higher correlation means the relationship is more linear, NOT that the slope of the line is steeper.
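The correlation-versus-slope distinction is easy to demonstrate with toy numbers (entirely my own illustration): two relationships can have identical, perfect correlation while their slopes differ by a factor of ten.

```python
import statistics as st

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (st.pstdev(xs) * st.pstdev(ys) * len(xs))

def slope(xs, ys):
    """Least-squares slope: magnitude of the effect."""
    mx, my = st.mean(xs), st.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

xs = [1, 2, 3, 4, 5]
ys_big = [2, 4, 6, 8, 10]             # slope 2.0
ys_small = [0.2, 0.4, 0.6, 0.8, 1.0]  # slope 0.2, same perfect correlation
```

Both pairs give r = 1.0, so a table of correlations by itself says nothing about how *much* a DVOA split moves playoff results.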

Second, as (dan s) mentioned above, there must exist a lot of correlation between the DVOA splits you're using, which means you're not estimating "marginal" or "independent" effects of any one given split.

I think some sort of regression analysis would be best suited to address these issues.

Finally, this is just food for thought. It's possible that part of the effects you're describing here are driven by the process that determines favorites/lines in Vegas. In other words, it could be the case that the true "secret sauce" for success in the playoffs has not changed over time, rather that different qualities are valued differently by line setters in Vegas. For example, offensive performance has gained popularity in the past decade, which has caused strong offensive teams to be overrated by Vegas, making it more likely they under-perform in the playoffs.

There's no question that variables being collinear is a big issue here. This isn't meant to be the end of the discussion, certainly. This is why Danny says at the end, "this is only a second step."

I definitely would come to the conclusion that this research suggests that there is no such thing as secret sauce.

This is the kind of post that really brings out the trolls and the people who believe that nothing is worth doing unless it is perfect, but thank you to everyone providing useful constructive criticism (18 being one example).

Aaron - I don't think the large number of complaints here are just a bunch of "trolls"... we just think this article needed a healthy dose of editorial/analytical feedback before releasing... it is long, difficult to read, and because of all that, doesn't tell us much.

The reason folks are complaining so much is because the premise of the article is actually really good, and there are certainly interesting questions that could be asked here. But simple things, like a simple analysis of whether DEF is more important in the playoffs, seem left out. The science/process of this analysis was just REALLY BAD, and the process used is why stats analyses can get bad names sometimes.

We hold you guys in FO to high standards because we know you can meet them. This was not your team's finest effort.

The biggest change in this site over the last couple years has been that Aaron has gone from assuming comments may have some valid concerns, to assuming that everyone who disagrees with him is a troll. I partially blame the CBS/ESPN cross-posting.

Frankly, it's led to some huge declines in the quality of analysis done. This place used to be about questioning the common wisdom thoughtfully. Now it's about peddling a different common wisdom.

Yep, generally way too defensive when any criticism arises. Although I enjoy and appreciate a lot of FO's work, I'm sure I'm not alone in thinking the best asset this site has is the community of posters and the general quality of the casual discussion they provide.

There are a couple of trollish comments on this thread (slating the article without offering any reasoning as to why it is wrong, or how it might be improved), but overall the standard of the comments is incredibly high given it is just a bunch of random internet people. The criticism is nearly all constructive, from people who clearly know what they are talking about. Dismissing them as trolls is ridiculous.

Agree 100% about the first part. You won't get more constructive comments than on this website; probably 80% of the comments on this thread are very well thought out and relevant. Yet they are being dismissed as a bunch of trolls with one or two relevant ones. This is just wrong, and makes Aaron look like a full-of-himself editor unable to accept criticism. And this is far from the first time it has happened, sadly.

The level of interaction between writers and fans since the beginning of FO has diminished, and I know this is bound to happen with FO getting bigger and getting more exposure, but still.

For example, on this "secret sauce", in an ideal world, FO would have run an article saying, "We want to revisit the secret sauce article, this is what we have been thinking about: PWA, correlation analysis pwa/dvoa splits, etc. What do you guys think".
I am 100% sure the level of feedback in the comments would have been amazingly good, and would have allowed good writers/analysts such as Danny/Aaron to produce a great study and article off those. Now we are stuck with an article with some flaws (the biggest one being how PWA is supposed to measure success in the playoffs), which is not a big issue in itself, and the editor calling most of the comments trolls, which is one.

I think you are over-reacting to Aaron's troll comment, which in itself was (in my belief) a little stronger and more dismissive than he actually intended. There are some terse comments that could be taken as trolling, but lots of well-reasoned replies that I cannot believe are simply being dismissed. Of course, some of those are a bit harsh, even though they are simply pointing out that PWA as a metric is not measuring what is claimed and that the statistics and presentation used aren't convincing. However, I definitely don't want the writers here to go into circle-the-wagons mode and become overly defensive. Therefore, I think we should go out of our way not to take the trolling comment personally, as I honestly believe it was not meant that way, and just "came out bad". Otherwise, we are just making it us-versus-them. I'd like to believe we are actually on the same side, trying to understand football at a deeper level.

I agree with your point that if this article had been presented as a discussion piece, it would probably have received fewer, "this is bad" without critique comments. Then, again, the internet has not improved general civility, so perhaps not.

I myself look forward to the rewritten article that it sounds like is in the works. I hope that it does not get scotched by the number of negative comments here and the amount of work that may be needed to address even just the PWA issue. One of the things I like about FO is that they do try different things and revisit them to improve them. It may take 4 or 5 follow-ons (e.g. in 5 years from now and so forth) to the "secret sauce" article before we have really gotten to conclusions that are even generally accepted by this group, much less percolated into the general community.

Did you guys run a stepwise regression? There are certainly some issues with that but since you're looking for only the statistically significant splits, I would think you could develop a somewhat decent model rather than only looking at correlations.

Alternatively, maybe a simpler method along the lines of just predicting wins in the playoffs and seeing what DVOA splits, if any, model that. So a logistic regression with binary values for wins and losses. The reason this may make more sense is that you're looking at each game as basically an independent event, as opposed to assuming past games have an effect on playoff results, as the Vegas lines do (and from the results of this article, it seems they don't).
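A bare-bones version of that logistic regression can be written without any libraries. This is a sketch under my own assumptions (toy gradient-descent fitter, features standing in for regular-season DVOA splits, 1/0 for a playoff win/loss), not FO's actual model:

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Tiny stochastic-gradient-descent logistic regression.
    X: list of feature vectors (e.g. DVOA splits), y: 0/1 game results.
    Returns weights [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1 / (1 + math.exp(-z))
            err = yi - p
            w[0] += lr * err
            for j, xj in enumerate(xi):
                w[j + 1] += lr * err * xj
    return w

def predict(w, xi):
    """P(win) for one game's feature vector."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 / (1 + math.exp(-z))
```

With each playoff game as one row, the fitted coefficients (and their significance) answer directly which splits move win probability, with no Vegas baseline in the loop.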

Aaron, let me assure you that writing off customers, any customer, as 'trolls' is never a good idea. Especially not in the vicinity of other customers.
Sure, we all applaud the 'criticism template' for those who clearly do not understand what this site is about. Then again, you should cherish customers with good intentions, even if their comments are not what you hoped for.

I re-read posts 1-19. All except 7 and 11 gave cogent responses with some useful content. 7 and 11 were just statements that the article was bad, which, although not very useful, is at least accurate.

The reason this "kind of post" brings out negative feedback is that it deserves negative feedback. The solution is to improve your quality control, not to bitch about trolls. I've enjoyed much of Danny's content. But this "analysis" should never have seen the light of day.

Since you crave specifics:

1) It "improves" its measure of success by destroying it. "Winning more often than Vegas predicts" is not a better measure of success than "Winning more often". The object of playing a football game is to win. If you want to get more fine grained than that, the object of each football play is to increase the chance of winning the game, usually by increasing the net expected value of the next score.
2) It sets as one of its goals the ability to assign different levels of success to teams that go one-and-done. To translate that into a clear example, it wants to be able to say "The Bengals were more successful than the Chiefs in the 2014 playoffs, because the Bengals were worse in the regular season". That is neither correct nor interesting.
3) It fails to understand that home team winning percentages in the playoffs are attributable to more than home field advantage. Teams are awarded better seeds because they have been more successful in the regular season.
4) It fails to understand that teams are underdogs precisely because Vegas thinks they are less likely to be successful.
5) Even if the goal of the article were to find a "secret sauce" for beating Vegas instead of just winning playoff games, it does this poorly. Why not just ask "did the team beat the spread"? That's what defines "success" to Vegas.
6) Having made all the above mistakes, it gets obviously strange results and uncritically accepts them.
7) Having accepted bogus results, it misinterprets them. "The fact that having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure these days is puzzling to me..." If you replace "playoff failure" with "under-performing Vegas's expectations" it is not puzzling at all.

I could go on and on.

The best thing you could do for this article is to attach an "editor's note" admitting that the use of Vegas odds as a baseline invalidated the whole effort, then rewriting it using sensible measures of success. To wit:

1) A loss is exactly 0 success.
2) A win is 1 success, multiplied by an "importance" factor based on playoff round. That factor can be flat (a win is a win) or increasing (Super Bowl > Conference > Divisional > Wild Card) - but is made for a clear reason.
3) A portion of home wins are siphoned off to "regular season success" because that's when the seedings are earned.

Once you have a decent measure of playoff success, you can go searching for correlations. You'll still have a fluffy piece based on mining mostly noise, but you might find something useful. If you find a likely "secret sauce" ingredient, test it by seeing if it holds up in past playoffs. For example, if it looks like pass D on first down in the regular season is a strong predictor of playoff success, see if pass D on first downs in the playoffs correlates well, too.

Read all the other suggestions given in the comments here. Follow many of them if they look reasonable.

I think it is a mistake to give more playoff success points to teams that overachieved their expectations. The goal is winning the Super Bowl. Period. Why are we giving bonus points to teams that played poorly enough in the regular season to garner negative expectations? I think it makes far more sense to value success based on how far a team made it in the playoffs. For instance: Super Bowl win = 8 points, Conference win = 4 points, Division round win = 2 points, Wildcard win = 1 point, No wins = 0 points.

I don't think it makes any sense to say that the 2007 Giants were more successful than any other Super Bowl winning team. Also, tying postseason success to expectations this way is cooking the books against positive correlation with regular season statistics. The 2007 Giants had a low win expectation BECAUSE people saw what they did in the regular season. It feels like you're counting their poor regular season performance twice, once in their regular season DVOA splits and again in their playoff win expectancy.

Agree with this... a measure of how much teams over-performed their expectation is interesting, but not really related to the search for "secret sauce" to win championships which I thought was the premise of the analysis.

"Why are we giving bonus points to teams that played poorly enough in the regular season to garner negative expectations?"

The goal of the article was to see if there was a difference between regular season and postseason play, so it makes sense to see if teams that played relatively poorly in the regular season significantly overperformed in the playoffs, and then see if the overperformers had similar characteristics that indicated there was a difference.

Unfortunately the methodology will not find any such differences because the Vegas line presumably already factors in any differences in its lines.

My problem with this article is that it is billed as "What do successful playoff teams have in common?" That is not what the data was looking at. The data instead compares regular season performance with Vegas postseason betting lines. The question these tables actually answered is "What regular season attributes are over/under-valued in Vegas postseason betting lines?" Then the analysis that follows the tables was written as if the data addressed the first question.

I don't mean to be critical, because it's an impossibly difficult topic, but the information simply wasn't presented in a way that was helpful for understanding it.

Part of the reason everything is so muddled is that I have yet to hear a reasonable argument for why anything should matter more in the playoffs.

Are the playoffs really any less consistent than any given four-week stretch of NFL games?

If we only looked at the last 20 four-week segments of regular-season games, and then cherry-picked only the games with the best teams playing each other so we had a roughly 80-game sample size, would it produce similar or different results than the playoffs?

Logically, I can't think of any reason why the playoffs are any more high pressure than early December games. They are usually "do or die" games for many teams too.

Do we know definitively the playoffs are different? Might they just feel different because of the one and done aspect? If the Chargers beat the Broncos in Week 15 it raises an eyebrow. If they do it this weekend IT MEANS SOMETHING!!!!

The only difference is that a single upset throws off the whole system because we lose the Broncos from the equation and they can't go 2-1, they can only go 0-1.

So you have a small sample, prone to be destabilized by a single outlier, and voilà! Chaos.

You bring up an issue that has always bugged me (and relates to "clutch"): Why should any one game mean any more to the players? They are paid millions, there is an incredible amount of pressure on them, scrutiny, hard work all year long, coming to a boil every Sunday... and do we think they can give the proverbial 110% come January? I suppose it's more likely that they are giving 90% before then and find some extra motivation, but my first question would be "why the hell are you slacking off for four months!?!?"

If I am out there busting my hump, risking my body (and brain), trying to make millions, and my teammate next to me is not pouring everything he has into it every freakin' day, he's gonna hear it from me. (I'm not a very cuddly person)

So to my mind, they are all relatively the same. Yes, you can let down your guard if you feel your opponent is a patsy, but otherwise, you should play 95-100% every Sunday, and in practices, too. Same for coaches. In Sept when you're 1-1, in December when you're 9-6 and fighting for a playoff spot or 15-0 and fighting for immortality. To me that's logical.

That being said, we're fickle mammals, we humans, and every player--100%--says that playoff football is different, it's a step up, tougher, more intense, etc. Are they just repeating the crap they're told? Do they not realize that implies they don't give it all from Sept - Dec?

As a lifelong Colts fan, well, the visual evidence is that, yes, things ARE different in the playoffs. I just don't understand how or why. ("are you going to believe me, or your lying eyes?") I think these articles help nudge me along the path to discover what MIGHT be different. I've only kept my sanity since about 2005 by saying they're largely a crapshoot.

"You bring up an issue that has always bugged me (and relates to "clutch"): Why should any one game mean any more to the players."

Why shouldn't it? Is salary the sole means of assessing motivation? Consider the NHL -- players actually make less money during the playoffs than during the regular season. Alexander Ovechkin makes around $110,000 per game played during the regular season. The Caps were eliminated in 7 games in Round 1; he made $10,000 total for those 7 games. Ask him which game that season meant the most to him. It wasn't game 7 of the regular season.

You're ignoring the value of pride, fame, infamy, internal factors, and all other reasons for playing games. You might as well wonder why all soldiers don't act like Medal of Honor winners full-time.

Well, most Medal of Honor winners are deceased, so there's that aspect. Most guys would like to be heroes, but come home alive. But I know what you mean and I agree--humans are not automatons... unfortunately. I did say that we're notoriously fickle--I meant behavioral variance as opposed to what our favorite flavor of ice cream is.

You would think that this variance follows a fairly normal distribution across the 45 actives for each team--each week 5 guys give 100%, 5 give 93%, 5 give 72.6% and so on to fill the bell curve. And that would be for each team, different guys each week (after all some are hurt, some have the flu, some have lost loved ones, and some are great) which would make that behavioral variance fairly vanilla. And if each guy can squeeze out a little more in the playoffs ("give 10% more than you did last week!"), then that would be expected to be the same for every team as well. In the playoffs.

Maybe the difference is coaching--the guys who can motivate their teams to give at least 90% every week, constantly strive to improve through the season, and give 100% when it matters most--those coaches are the secret sauce. Parcells was a famously good motivator, and pretty successful in the post-season.

Consider the Bulls of the NBA. Thibodeau is known for driving his players to full effort during the regular season. As such, the Bulls have been one of the most successful regular season teams over his tenure. He's also 3-3 in playoff series.

Gregg Popovich has never been terribly concerned about the regular season, to the point of holding most of his starting lineup out of games for rest purposes, so long as they are comfortably making the playoffs. He's won 4 rings.

Thibodeau's method doesn't work because it wears his players down with meaningless extra minutes over a grueling season and because when you play every game at 100%, you don't have any reserve capacity for when the other guys go from 85% effort to 95% effort. (Games where even great players are truly going full out are vanishingly rare)

The problem is that the nature of the games are different. The goal of the regular season is to make the playoffs. The goal of the playoffs is to avoid elimination. The strategy of managing effort and present-vs-future can be entirely different.

Exactly.
The issue wasn't Schottenheimer choking in the playoffs so much (although there may have been an element of conservative play calling that helped do him in) as that he was overachieving in the regular season, often with mediocre quarterbacks.
Although in this case I wouldn't characterize it as burning his teams out so much as maximizing the talent. I don't have an opinion on which it is in Thibodeau's situation.

I can't speak to what goes on with a player's psyche from week to week, but it seems the difference in football is that healthy players don't ever get a chance to rest (except maybe in week 17 if the team has nothing to play for).

Also, the larger point about the NBA is valid, but it's still worth noting that of the 3 series the Bulls have lost recently, 2 were to the Heat and 2 were in large part due to the absence of their best player (granted, last year's loss to the Heat almost certainly would have happened with Rose).

I think his point is that Rose would stay healthy with a coach who is, I don't want to say less demanding, but one who acknowledged that it might be better to play at less than 100% in the regular season in exchange for playoff success.

The only reason for this article is that the Football Outsiders system somehow fails to predict innovations in the playoff outcomes.

A more reasonable question to ask would be: is it perhaps the case that Football Outsiders system generally fails to predict innovations?

To have a system with 20 predictive variables you need roughly 120 * 20 = 2400 samples / observations.

Considering the VERY HIGH bar of predicting better than the Vegas spread or even just beyond what some basic seeding model would suggest is a VERY LOFTY goal.

I am very skeptical that given the number of observations / samples available, that is possible.

You have 10K+ regular season games that are "In the FO database".

For playoffs, you have 11 games per season, so around 250 games total.

I think significantly more advanced statistical techniques are needed, or the objective function needs to be changed in order to obtain a statistically sound model.

An example of an advanced statistical technique that could help in a situation like this would be resampling, e.g., particle filtering, Markov chain Monte Carlo, etc.
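To make the resampling idea concrete, here's a minimal bootstrap sketch (the simplest member of that family) on synthetic data sized like the ~200-game playoff sample; the predictor, outcome, and true correlation of 0.2 are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ~200 playoff games: a predictor with a
# true correlation of about 0.2 against an outcome.
n = 200
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)

# Bootstrap: resample the games with replacement many times and
# recompute the correlation each time.
boots = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)
    boots.append(np.corrcoef(x[idx], y[idx])[0, 1])

lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"point estimate: {np.corrcoef(x, y)[0, 1]:.2f}")
print(f"95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```

With only 200 games the interval spans a couple tenths of a correlation point, which is exactly the kind of uncertainty a single reported correlation can't show.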

And lastly, this problem might be impossible to solve, given the data. Imagine there are hidden variables such as employing slightly new plays, game plans, schemes, etc., which are only revealed during the playoffs. I am not saying that a team goes and develops a completely new playbook, but there are alterations significant enough that a model cannot capture them.

"The fact that having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure these days is puzzling to me for two reasons. First, if that's the case, then why isn't having a good defense -- especially a good pass defense -- part of the recipe for success?"

Because after 2005 and the rule changes and emphasis changes, and now that a collision between a 260-lb TE and a 170-lb DB is a personal foul on the DB, there just aren't any good pass defenses anymore?

It seems like we're still seeing teams whose relative strength is their defense win over teams whose relative strength is their offense.

First: I applaud what Danny's doing. But, just from observation, one problem I have is with the dependent variable itself. If you look at the extremes as he's listed them, the left tail is far more extreme than the right. Indeed, I wouldn't be surprised if the distribution itself was either highly skewed, highly kurtotic, or both. And that ultimately isn't too much of a surprise. If I understand PWA correctly, the potential for PWA is higher if you advance, while you're one-and-done if you don't.

I love math, but most of what FO does is way over my head.
I do have a question: isn't this analysis just saying the Colts are the most likely to overachieve? Which isn't the same as most likely to win.

"PWA" is a horrible way to measure or define playoff success. It is a measure of the ability to be more successful than odd-makers expect you to be, which is a far cry from pure success. It's more a measure of how to depress expectations than it is a measure of playing well.

To take an extreme example, imagine a fantastically awesome team with an expected winning percentage of 100%. Naturally, they would get the first seed in their conference and go on to win the Super Bowl. PWA would rate them as dead average in the playoffs. Every skill or attribute they possessed would be deemed unimportant to playoff success. Epic. Fail.

Imagine the scenario you are describing, but with slight modification:

- you have an awesome team, with an expected winning percentage of 95%,
- also imagine that you have record of 10,000+ NFL seasons: I kid you not; it is year 12,014, and we have over 10,000 seasons of NFL, all uniform, i.e., 32 teams in current format, 12 playoff teams, current format playoffs, etc,
- now, imagine that during EVERY season, there is one awesome team, as described in the first bullet, with a naive model predicting 95% success for that team, every season,

Now, imagine a BETTER model, which predicts the outcome 100%, i.e., in those 5 percent of the seasons when the awesome team ACTUALLY DOES NOT WIN, our model CORRECTLY predicts that. Of course, our model also predicts the 95% when the awesome team wins.

Useless? Definitely not.
Lofty? Absolutely so.

The reason your example looks useless is that when the variable is non-random, there are no correlations to be discovered.

You need a random variable, i.e., some outcome that has a non-trivial distribution, and the purpose of your model is VARIANCE REDUCTION.

I.e., you come up with one or more predictors, such that AFTER the predictors are applied, the unexplained portion of your dependent variable vanishes.
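That variance-reduction framing can be shown in a few lines; the 0.8 coefficient and the normal noise below are arbitrary choices for illustration, not anything from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

# An outcome driven partly by a known predictor plus irreducible noise.
n = 1000
predictor = rng.normal(size=n)
outcome = 0.8 * predictor + rng.normal(size=n)

# Fit a one-variable least-squares line and look at what's left over.
slope, intercept = np.polyfit(predictor, outcome, 1)
residuals = outcome - (slope * predictor + intercept)

print(f"variance before applying the predictor: {outcome.var():.2f}")
print(f"variance after applying the predictor:  {residuals.var():.2f}")
```

Note that the unexplained portion doesn't fully vanish here, and that's the point: a good predictor can only shrink the residual variance toward the noise floor.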

Generally speaking, a good bet for the team "most likely to overachieve" is the team with the lowest expectation of winning.

Look at it this way. The best you can do is win the Super Bowl, no matter who you are. If you're favored to go all the way, then you can hardly overachieve at all. But if you're expected to be one and done you have a huge upside possibility.

Defining the outcome metric as "overachieving" relative to win probabilities creates a ceiling effect. Doing it with Las Vegas lines has that ceiling effect plus a bunch of additional (potential) bias.
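The ceiling effect is easy to see in a toy model of a four-round, single-elimination run where a team wins each game with probability p (my simplification for illustration, not the article's actual expected-wins method):

```python
# Expected wins for a team that must keep winning to keep playing:
# it plays game k only if it won games 1..k-1.
def expected_wins(p, rounds=4):
    return sum(p ** k for k in range(1, rounds + 1))

for p in (0.3, 0.5, 0.7, 0.9):
    e = expected_wins(p)
    # The most any team can do is win all four games, so the room
    # to "overachieve" shrinks as the expectation grows.
    print(f"p={p:.1f}  expected wins={e:.2f}  max overachievement={4 - e:.2f}")
```

A 90% favorite can beat its expectation by at most about 0.9 wins, while a longshot has over 3.5 wins of upside, which is exactly the asymmetry described above.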

The opposite of the GW Bush term "the soft bigotry of low expectations." It's the harsh and unrealistic prejudice of high expectations. Or what I like to call the reason 50% of internet moron comments call Peyton Manning a choker, a loser, a waste, etc. If only there had been an internet back in Marino's and Fouts's day.... They also forget that it's a team sport, but that's a discussion for another day.

Except that the DVOA favorite Seattle Seahawks are the 2nd or 3rd most likely to overachieve of all the teams remaining. They're already on the list of the greatest DVOA teams of all time, so I'm not sure what overachieving would mean for them. Go back in time and take away their opponents' prior championships, I guess.

I agree with those who have suggested that these results seem to imply randomness more than anything else, but to me there seems to be an obvious explanation for that:
PWA is based on the comparison of actual results vs. Vegas predictions, so doesn't the randomness just suggest that Vegas is doing a good job? If there were a real relationship between some measurable factor (e.g. defense, special teams, etc.) and playoff success, then the Vegas oddsmakers could identify it and account for it in setting lines, spreads, or whatever the predicted wins here are based on. If they do a good job, then we should expect to see that measurable factors don't consistently predict consistent deviations from their expectations.

This question is a very interesting one and gets to the heart of DVOA analysis. I am going to paraphrase the question just to emphasize the key point: what does it mean to be successful in the playoffs? When you understand that, you have determined what to measure. Do you want to measure actual wins and chances of winning the Super Bowl, or do you want to measure how well the teams performed? Those aren't the only estimates, of course; you could also measure scores, or guts/stomps, or any other metric.

The PSP and PWA are both answers to that question. The PSP answer to the question is "winning playoff games is success in the playoffs," with the slight modification that winning an "away" game is considered more of a success than a "home" game. The PWA answer is "beating the 'expectation' is success in the playoffs." As I read it, the intent of PWA was to mix two estimates to determine that expectation: regular-season DVOA (an estimate of the team's quality based on past performance) and the Las Vegas line (an estimate of the "conventional wisdom").

If you are really interested in determining Super Bowl winners, I think the PSP measurement is actually closer to the ideal. It is closer to a measurement of actual winning. In fact, I would go further in that direction and use a simple hierarchical weighting as an estimate, doubling the value at each level closer to winning the Super Bowl: losers in the Wild Card round get 1 point, losers in the divisional round get 2 points, losers in the conference round get 4 points, Super Bowl losers get 8 points, and the Super Bowl winner gets 16 points. This makes the simplifying assumption that the teams which lost out at lower levels needed more wins to win the Super Bowl (and each win is a 50-50 shot), i.e. the Super Bowl loser only needed 1 win to be the Super Bowl winner, the conference losers needed 2 wins, the divisional losers needed 3 wins, and the Wild Card losers needed 4 wins. Statistics could be used to adjust the weights based on the probabilities, but we run into small sample sizes if we do. Mike Harris's playoff prediction charts could be a stand-in for a tighter probability model; here I am assuming Mike Harris has done some analysis that suggests to him that the simulation he runs approximates each team's chance of winning each game.
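The doubling scheme is simple enough to sketch directly; the round labels below are my own naming, not an FO convention:

```python
# One point per Wild Card-round loser, doubling at each level closer
# to the title, per the weighting proposed above.
ROUND_POINTS = {
    "wild_card_loser": 1,
    "divisional_loser": 2,
    "conference_loser": 4,
    "super_bowl_loser": 8,
    "super_bowl_winner": 16,
}

# A hypothetical 12-team playoff field's results.
results = (["wild_card_loser"] * 4 + ["divisional_loser"] * 4 +
           ["conference_loser"] * 2 +
           ["super_bowl_loser", "super_bowl_winner"])

total = sum(ROUND_POINTS[r] for r in results)
print(f"points handed out across the field: {total}")
```

One nice property of the doubling: the winner's 16 points equal the combined points of the four Wild Card losers, the four divisional losers, and half the conference losers, so deep runs dominate the scoring the way the proposal intends.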

Alternately, we can use DVOA itself to determine playoff success. As I recall, the DVOA system is tweaked regularly to try to improve its self-correlation (my stat background is too far in the past to remember whether "auto-correlation" is the right term, so I won't use it), so that DVOA after n games is a reasonable prediction of DVOA after n+1 games. The measurements that don't tend to self-correlate, e.g. fumble luck, get excluded from DVOA, or at least from certain variations on it. DVOA also attempts to incorporate a model of what winning play looks like, e.g. gaining 60% of the first-down distance on the first play is considered successful. Thus, DVOA not only self-correlates, but also embodies an idealized model in which the successful team should win, and thus should correlate with actual victories, albeit without accounting for "fluke" factors that change a particular game but are not predictive of future game successes.

If we ignore the Las Vegas line part of the PWA estimate, PWA seems close to that model. Although, as Brian Anderson mentioned, you don't need to beat the expectation to be successful, just replicate it; or at least, if it were replicated, DVOA would be the best estimate of playoff success. That is, a team's DVOA at the end of the playoffs should be as close as possible to its DVOA in the preceding games (as an estimate of the team's strength). The only real question is whether the various measurements in DVOA are capable of predicting future success, especially for that same metric.

If the model of the DVOA metric is good, the Pythagorean win prediction [in the playoffs] should approximate the actual playoff performance. Because of the elimination nature of the playoffs, the Pythagorean model may need to be adjusted (perhaps by fitting on the logarithm of the actual wins), but it should be fittable. The small-number-of-data-points problem is present, but could be ameliorated, as RickD mentioned, by formulating the Pythagorean model on a randomly selected subset of the playoff years and testing on the remaining years.
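For readers unfamiliar with it, the baseline Pythagorean projection is just a points-for/points-against ratio. The 2.37 exponent below is one commonly cited football value and an assumption on my part, not FO's official number, and none of the playoff adjustment discussed above is attempted here:

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Pythagorean expectation: projected winning percentage from
    points scored and points allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# A team that outscores its opponents 450-350 over a season:
pct = pythagorean_win_pct(450, 350)
print(f"projected win pct: {pct:.3f} (~{pct * 16:.1f} wins over 16 games)")
```

A team with equal points for and against projects to exactly .500 regardless of the exponent, which is the sanity check for any refit of the model.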

I have to agree with 6, 18, 43, and others that there are a lot of problems with this methodology for the stated purposes (to wit, to test if the "predictors of regular season success are different from predictors of postseason success").

I see massive confounding variables throughout the study, in particular the Vegas line. Any postseason "secret sauce" variables the line takes into account will already be factored into the expected wins and thus not show up in the PWA. Furthermore, the Vegas line is dynamic and NOT independent of past results! If Vegas came to the same conclusions as Barnwell in 2005 and then adjusted its future lines, it'd be no surprise if those variables suddenly stopped overperforming!

Let's take a somewhat silly example: Tom Brady. Before the 2001 and 2003 playoffs, Vegas sets its lines assuming Brady is a typical playoff QB, before Brady overperforms and wins both Super Bowls. Using this methodology, he'd show up as part of the "secret sauce" responsible for ~3.5 PWA. Vegas responds by adjusting its lines before the 2004 playoffs with a "Brady" variable. Brady still wins the SB in 2004, but because his expected wins were higher after 2001/2003, his PWA is much lower. Nothing about Brady or the "secret sauce" changed, but, solely because Vegas changed, PWA no longer identifies Brady as part of the sauce!

In addition, the sample of studied teams is not independent either--by definition, an overperforming team is overrepresented in each sample, as it appears in multiple games. This is obvious from looking at the 12 most overperforming and underperforming teams: the 12 most overperforming teams add 24.3 PWA, while the 12 most underperforming teams only lose 9.1 PWA. While it's important to know what works in the playoffs, it should be equally important to know what does NOT work. In the end we end up nearly triple-counting the best teams compared to the worst (24.3/9.1 = 2.7). If some of these overperforming teams won not based on some "secret sauce" but on pure randomness, we would never be able to tell with these small samples, yet they would dominate the results.

For a sanity check, I'd be curious to see this same methodology applied to the 2013 season broken into thirds and/or the final 5 weeks of three consecutive seasons (if you're worried about weather) which both have roughly equivalent game samples as the article. Then we can compare if the accepted season long predictors hold up under similarly small sample sizes.

I wonder if you are doing the calculation for expected wins correctly. I feel like there should be some correction for different sample sizes or that the expected number of wins should include some amount of later round wins that were thrown away. For example, 1996 Denver would have been favored in the AFC conference game and would have had a chance in the Super Bowl, but the calculation of expected wins throws that away.

This may be more relevant if you want a stronger metric for which teams underachieved in the playoffs, as opposed to a method for measuring what works in the playoffs.

Agree that this post set its own bar far too high. Instead of trying to find whether any one of a zillion factors (punt returns, etc.) was predictive in tiny sample sizes (why chop up the analysis into three arbitrary time slices??), I think you could find some very interesting things by simplifying. For instance:

1) Use the entire playoff set, 1990-2012, as one data set, not three. Splitting it up just creates noise for no reason.
2) Focus on just the big predictors: Offense, Pass Offense, Run Offense, Defense, Pass Defense, Run Defense, Special Teams. Just those seven.
3) Now, what do you find?
4) Compare these winning percentages vs. regular-season winning percentages, adjusting for HFA.

In other words, just try to determine which major factors are most predictive of regular-season wins, then compare that to playoff wins. My guess is you won't find anything, but that is a better exercise than trying to find whether punt returns were more correlated with playoff wins in bucket 1 than in bucket 2.
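As a sketch of the exercise being proposed (entirely synthetic data; the factor names follow the seven listed above, and the relationship I planted between ratings and wins is invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# One row per team-season: seven unit ratings plus regular-season
# and playoff win totals. All synthetic stand-ins for real data.
factors = ["off", "pass_off", "run_off", "def", "pass_def", "run_def", "st"]
data = {f: rng.normal(size=300) for f in factors}
reg_wins = data["off"] - data["def"] + rng.normal(size=300)
po_wins = 0.5 * (data["off"] - data["def"]) + rng.normal(size=300)

# The comparison: does each factor predict playoff wins as well as
# it predicts regular-season wins?
for f in factors:
    r_reg = np.corrcoef(data[f], reg_wins)[0, 1]
    r_po = np.corrcoef(data[f], po_wins)[0, 1]
    print(f"{f:9s} regular: {r_reg:+.2f}   playoff: {r_po:+.2f}")
```

With real FO ratings in place of the synthetic columns, a gap between the two correlation columns for some factor would be the "secret sauce" signal the commenter is after.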

As I was reading through this, the negative influence of the highly-ranked passing offense stuck out to me. I would think running the ball, particularly situational running, might end up being a factor.

Forgive me for not recalling the term, but the "4-minute drill" is where you need to run out the clock by getting some first downs, making the other team use their time outs and then run out the clock or score the winning TD/FG, right? Someone was talking about Reid and the reliance on passing, even in these end-game conditions, was a huge drawback.

I realize I am walking dangerously close to "run the ball to win" but maybe that's what comes out.

There seems to me to be a very simple solution to improve the methodology of this article. Instead of comparing underachieving top seeds to overachieving bottom seeds, which rewards teams for sucking in the regular season and punishes them for playing well, we should be comparing teams against similar teams.

That will be able to show us which top seeds succeed or fail, relative to other top seeds, and the same for the other tiers. It solves the problem of saying the 2013 Broncos are most likely to underperform simply because they have the highest expectations. Instead, we can talk about the 2013 Broncos compared to their peers. Alternatively, if "secret sauce" is really about picking the candidates from the WC teams who are most dangerous to make a deep run, the factors that cause top teams to fail are pretty irrelevant, but what distinguishes a WC team who makes a run from a WC team which is 1-and-done?

I share many of the more layman-accessible concerns about methodology, and am quite prepared to believe that many or all of the ones which are going above my head are also well-founded.

But, let's suppose for a minute that despite these problems we are seeing a real effect. What would be the mechanism? Why would strong pass offenses have become, over the last 8 years or thereabouts, an indicator of playoff underachievement?

Well, we know that over that time frame pass offense - and especially the good pass offenses - has got more prolific, more efficient, more leaned-upon, more dependent on short passes, more likely to be run out of the shotgun and more based around the spread. I think this is both a direct consequence of the 2004 changes in illegal contact rules and (perhaps more importantly) a result of the tactical innovations those changes triggered.

If that's the important thing that's changed, then there must be something about playoff football which responds to those changes differently to regular-season football. The best candidates I can think of are quality of opposition (the average playoff opponent is significantly stronger than the average regular season opponent - perhaps playoff teams are generally good enough to make a higher proportion of games into near-random last-possession-wins shootouts in an era of pass offense dominance, or perhaps good defenses are less susceptible to this style of offense) and weather (a higher proportion of playoff games are played in bad and - especially - cold conditions, which might reduce the impact of passing).

It seems to me like it ought to be fairly possible to test both these hypotheses. We can split playoff games into good and bad weather games: if the correlations are being driven by bad weather games, that would suggest that this was the mechanism. And we can run a similar analysis for regular-season games between teams that wound up making the playoffs, which should do a good job of revealing whether team quality is the issue here.

Of course, it is very possible (even likely) that the null hypothesis is correct. But these seem like fairly straightforward things that could be done to check.

Unless myself and quite a few other commenters are totally wrong, there's not really any basis for attaching explanation to that negative correlation between offensive DVOA splits and "overachieving". The observed negative correlation is almost certainly baked into the very definition of "overachieving".

My own reading of the situation is that it would be a virtual impossibility for those numbers to ever be anything other than moderate negative correlations, regardless of what is or isn't different about playoff football and regular season football.

It's akin to running the clocks forward for Daylight Saving Time and observing that it not only stays light one hour "later" in the evenings, but it also stays dark one hour "later" in the mornings. You don't need an elaborate speculative theory about how the earth's orbit may be changing overnight... the entire effect is due to setting the clock forward.

Perhaps I'm being obtuse, but if it were virtually impossible for those numbers to ever be other than moderate negative correlations, why was the correlation in the opposite direction 20 years ago, and neutral 10?

I completely see how it might well be random noise, but I don't quite follow the idea that it's baked into the definition.

Ok, one of us is definitely missing something. I'm talking about the findings displayed in the big table with correlations for a whole bunch of stats in different mini-eras, with some boxes shaded red or green. That table covers all three periods with the same methodology, and PWA as an outcome variable. It shows those pass offense correlations as having changed (inverted, actually) over time.

Hi everyone. This is actually the start of a series of FO 10th anniversary columns which will seek to update old FO analysis, so the basic idea here was simply to update Barnwell's original analysis in a similar fashion, using more data and improving it a little bit. Obviously, this is not something that I would ever submit to a peer-reviewed journal. I do appreciate the constructive suggestions for more strenuous statistical methodology being made in the comments. In fact, most of these improvements are things I originally considered making a part of this article; in the end, I decided to leave these more complex ideas for future expansion upon this research, perhaps in FOA 2014. That includes such issues as making Bonferroni corrections to account for the number of correlations, or dealing with the multicollinearity in the factors I listed in the 2013 section at the end. But the goal of this piece was not to be 100 percent statistically rigorous; it was really to post all the raw correlations out there for public consumption, and start a discussion, much like Barnwell's article and the Nate Silver Baseball Prospectus article that inspired it.

The largest problem you have is the metric you are using. This is correlating factors with Vegas expectations, not playoff success. It doesn't matter how fancy your statistical approach is if the metric you are building the model against is flawed.

In that regard, this is _worse_ than the original. The sample size improved, but this was not taken into account, and the metric got worse.

Don't get me wrong, I really found it interesting and enjoyed it.... but did not find it enlightening or very meaningful other than as a description of over-achievers and under-achievers from different eras.

While I agree with many of the "this isn't showing me what I thought it would show me" (I wanted it to tell me how to take my excellent 14-2 team that fails in the playoffs and turn it into a team, no matter the record, that wins the Super Bowl - or at least head that direction), I just wanted to say that I really like saying "pwa". Almost as much as "pwomp".

The problem with "regular season" high-powered offense is that it is too dependent on officiating. Differences on how a single crew officiates PI and illegal contact can allow high scoring or low scoring contests. It all depends on what the DBs are allowed to do and I don't believe this is measurable.

You have to convince me that officiating assignments don't look like a random distribution, or at least that particular teams get crews that call things to one extreme or the other. Otherwise, I'll continue to believe that it all balances out over the course of a season.

I think the implication is that in the play-offs, refs tend to "swallow their whistles" and not call the games as tightly. According to SI, last week was a definite case of the total penalties being way down compared to the regular season.

An attempt to succinctly (and constructively) summarize the main problem with this analysis (as stated in various ways by others):
If a variable (e.g. offense DVOA) is the best predictor of success over 16 games, then it is highly unlikely to be an even better predictor of success (compared to the other variables) over 1-3 games. It is the very definition of small sample size. It still might be worthwhile to look for a special sauce for playoff success, but it should be no surprise that the best-correlated variables over a large sample size are less highly correlated over a small sample size. Furthermore, the only way to find a special sauce is to look for general trends over pooled sample sets, not to slice and dice the variables and samples, as presented here.
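The small-sample point can be demonstrated directly: a metric with a fixed true correlation to game outcomes looks far noisier when estimated over playoff-length samples than over a 16-game season (all numbers below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)

def corr_spread(n_games, trials=2000, true_r=0.4):
    """Std dev of the sample correlation across many samples of
    n_games, when the true correlation is always true_r."""
    rs = []
    for _ in range(trials):
        x = rng.normal(size=n_games)
        y = true_r * x + np.sqrt(1 - true_r ** 2) * rng.normal(size=n_games)
        rs.append(np.corrcoef(x, y)[0, 1])
    return float(np.std(rs))

print(f"spread over 16-game samples: {corr_spread(16):.2f}")
print(f"spread over  3-game samples: {corr_spread(3):.2f}")
```

Same underlying relationship in every trial, but the three-game estimates bounce around several times as much, so a "best predictor over 16 games" can easily look mediocre, or even inverted, over one playoff run.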

Wouldn't a better thing to measure the correlation of the different stats against simply be wins? And compare that correlation to the correlation against wins in the regular season? Because isn't that what you're actually looking to find out--whether certain attributes are more useful in the playoffs than in the regular season?

I realise that would include a hell of a lot of work (as I imagine you'd need to compile single game data for every single game played between 1990 and now in every variable measured).

The use of the Vegas odds likelihood-of-victory thingy seems like an unnecessary overcomplication, and possibly a case of measuring something other than what you were setting out to measure.

"...[H]aving a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure..."

To be clear, your study found that Vegas lines overvalue "a good offense -- especially a good pass offense." It did not find that "having a good offense -- especially a good pass offense -- seems to be a recipe for playoff failure." You are equating exceeding Vegas' expectations with success and failing to meet Vegas' expectations with failure. That is invalid. The entire article is very confusing because the analysis seems to be ignorant of the data collection methodology.

Step 1: Win the regular season, get first round bye. Best way to do that is have a top offense.
Step 2: 'Underachieve' -- which is simply in this case to fail to live up to expectations. As the top seed, you underachieve even if you do as well as all of the other teams.

All this is saying is that recently top seeded teams have been less successful.

I don't want to know what variables lead to "overachieving" or "underachieving" in the playoffs. I want to know what variables lead to SUCCESS. Success in the playoffs is not relative to expectations, it is absolute.

Haven't had the necessary amount of coffee yet to comprehend the entire article, but I'm wondering whether it's possible to use this data to show how individual teams can cancel out each others' strengths when the teams go head-to-head in the postseason.

Shame, shame to cite Gregg Easterbrook, but I got this idea from him in the year of the Colts/Bears Super Bowl. Easterbrook suggested that since the Colts had a great offense and the Bears a great defense, the game would be decided by what the Bears offense could do against the Colts defense.
In other words, the teams' greatest strengths would cancel each other out and the game would be decided by the relative strengths of the remaining factors.

The data above, which appears to show that a great, reliable passing game is non-predictive of playoff victory, could be logical if that hypothesis is proved correct.
Obviously, a great passing offense wins games in the regular season, where any team has a 50/50 chance of facing a below-average pass defense. However, in the postseason, one can expect a higher-quality opponent.

Pass defense is hardly mentioned in the color-coded table above, but where it is shown, the numbers are very close to zero. Could this suggest a neutralizing effect? Is neutral good enough?
And, if pass-offense is neutralized, could victory be the result of the "non-predictive" other qualities shown in the later table? Perhaps those small, non-predictive numbers are actually showing the slim margin of victory.

Interesting stuff. Haven't read ALL the comments, but I would urge people to not read too much into it. It simply shows that teams with bad offenses and/or cold offenses do better than expected in the playoffs. Not particularly earth shattering - don't build this up into a magical formula.

Second point I'd make is that it's possible that NFL teams themselves have too much data by playoff time and overadjust, leading to random-looking results, e.g. last week's Saints vs. Eagles. The data said the Eagles' strength on D was run D, and the Saints' weakness on O was run O. Yet what happened? The Saints' running game was the difference. Probably because the Eagles didn't prepare for it as much: Thomas was out, they weren't scared of Ingram, etc. This would probably happen with a lot of playoff matchups, where a team's weakness is underestimated or a strength overestimated, based on previous matchups or regular season data.

I know you're just joking, but this came up last year, too. With a little digging you can find that during those seven (not eight, I think) years...

a) No NFC team got a bye through doing better against the AFC East than they did against the NFC.
b) No NFC team got into the playoffs through doing better against the AFC East than they did against the NFC.
c) NFC playoff teams that played the AFC East during the regular season have a record in excess of .750 in the playoffs.

It's all much too small a sample to do anything with. But if you must insist, the conclusion to draw is that playing the AFC East makes you stronger in the playoffs.

However, you may have stumbled across something greater. Perhaps it's the team with greater machismo that has the advantage, or the team that has out-machismo(d) other teams? If that's the case, and we measured machismo and out-machismo(ism) in the remaining teams, perhaps we could come up with a favourite. At the moment my money would be on Seattle, but I think this is a weekend project for someone!

It kind of makes sense that special teams is correlating well because:

1) Special teams can be an indicator of team depth. The starters play on offense and defense; the back-ups play on special teams.
2) Kicker and punter are the two least opponent-influenced positions on the field, and therefore two of the most consistent.
3) IF (BIG IF) a team has figured out an edge on special teams, it can be a huge factor in games. In most games, teams don't have an edge on special teams, so you won't see it in massive sample sizes. But when special teams gets outside the normal range of performance, it can have a massive impact on an individual game. If this can be done predictably, it will have an impact (similar to having a home run hitter in baseball). In addition, there is such a diversity of special teams edges (a 6'10" player consistently blocking kicks, a punt returner with unusual acceleration and an ability to hit seams, and an effective gunner on punts are a few I have seen) that it's really hard to tease out one that matters and rely on it to be predictable; you almost have to be breaking down game film on a specific team all season.
4) Special teams is sometimes an indicator of team morale. I used to not buy into this theory, and I don't think it's absolute, but when a team quits on its coach and is playing just to get through the game without getting hurt, I think it shows up on special teams first.

When one of these is reliably influencing a team, I feel that special teams actually becomes more predictable than offense or defense.

"You might have read that and yelled, 'But at least five of those involve things we know to be nonpredictive in general!' If so, you're right... in general. Some of these splits are kind of random, but we're talking about the playoffs, where randomness abounds. In small sample applications, I don't mind including things that tend to only work in a small sample."

I don't think this is quite the moment of cognitive dissonance that you make it out to be. The important thing to remember is that playoffs are not a representative sample, so we shouldn't expect the results to mirror the population at large.

I'd be interested to see if a similar analysis of matchups of "good teams" (say, top-10 DVOA?) in general would produce a similar result.

If you ranked every playoff team just by Jeff Sagarin's strength-of-schedule rankings, the team that played a harder schedule is 36-12 straight-up and 36-12 against the spread since 2002. When you narrow those numbers to teams that finished at least 10 SOS spots higher than their opponent, they improve to a staggering 23-3 straight-up and 22-3-1 against the spread (including New Orleans, Indy and San Francisco last weekend).
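The tally described above is easy to reproduce once you have the rankings in hand. Here is a minimal sketch of the method, with made-up game data for illustration; the real exercise would use Jeff Sagarin's published strength-of-schedule ranks for each playoff matchup since 2002 (the function name and data below are hypothetical, not from the original comment).

```python
# Hypothetical sketch: for each playoff game, compare the two teams'
# strength-of-schedule (SOS) ranks and tally the straight-up record of
# the team that faced the harder schedule. A LOWER rank number means a
# harder schedule. The sample data is invented for illustration only.

def sos_record(games, gap=0):
    """games: list of (winner_sos_rank, loser_sos_rank) tuples.
    Returns (wins, losses) for the harder-schedule team, counting
    only games where the rank gap is at least `gap` spots."""
    wins = losses = 0
    for winner_rank, loser_rank in games:
        diff = loser_rank - winner_rank  # positive: winner had harder SOS
        if abs(diff) < gap:
            continue  # rank gap too small to count
        if diff > 0:
            wins += 1    # harder-schedule team won
        else:
            losses += 1  # harder-schedule team lost
    return wins, losses

# Made-up sample games (winner's SOS rank, loser's SOS rank):
sample = [(3, 18), (25, 7), (5, 30), (12, 14)]
print(sos_record(sample))           # record across all games
print(sos_record(sample, gap=10))   # only matchups with a 10+ spot gap
```

With real Sagarin data, the `gap=10` filter corresponds to the "at least 10 SOS spots higher" split cited in the comment.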