FO Mailbag: DVOA Without Fourth Quarter

OK, it's time for a quiz.

Each of these teams sees its offensive DVOA drop in the fourth quarter: Green Bay, Houston, New England, and Pittsburgh. Can you put them in order from the team with the biggest drop to the team with the smallest drop?

While you work on that, here's the mailbag question from today's DVOA discussion thread.

Paul M.: If there was a "First Three Quarters" DVOA, I would think the Packers would dominate and begin to show they are more of a historically dominant team than they are currently credited for, but then again that would discount some teams (any one in particular that comes to mind??? Hmmmm..... maybe they play in the Mountain Time Zone??) and their ability to rally late.

Nat: Aaron, could you publish the rankings/numbers for "First Three Quarters" DVOA? ... In theory, this all-but-late DVOA should avoid the prevent-defense, garbage time, hail-mary, shut-the-offense down, play-the-backups issues -- while still being a large enough sample to characterize each team pretty well. Pretty please?

OK, so, first let's answer the quiz question above. The answer, in order from biggest drop to smallest, goes: Pittsburgh, Houston, New England, Green Bay.

Didn't expect that, I bet?

An idea that came up in the DVOA discussion thread today is that the Packers take their foot off the gas in the fourth quarter, and that's the biggest reason they don't have a historically dominant DVOA that compares with teams like the 2007 Patriots and 1998 Broncos. Well, if DVOA is to be believed, this is simply not true. In general, compared to this year's other top offenses, the Packers don't drop off much in the fourth quarter. This week against the Raiders was a dramatic exception, with the Packers putting up 33.4% DVOA in the first three quarters and then -155.4% DVOA in the fourth quarter (on only eight plays, compared to 53 plays in the first three quarters).

Actually, the offense which drops off the most in the fourth quarter is Miami, which is slightly above average for three quarters and then the worst offense in the league in the fourth quarter. Apparently, the Dolphins take their foot off the gas even when they are losing the race. And of course, we know which team improves the most in the fourth quarter.

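| TEAM | Q1-3 OFF | RK | Q4 OFF | RK | DIF |
|------|----------|----|--------|----|-----|
| MIA | 7.1% | 14 | -38.4% | 32 | -45.5% |
| PIT | 28.8% | 4 | -16.0% | 27 | -44.8% |
| HOU | 25.7% | 5 | -4.2% | 24 | -29.9% |
| CHI | -2.3% | 21 | -25.5% | 29 | -23.2% |
| NE | 41.0% | 1 | 19.1% | 8 | -21.9% |
| SD | 19.7% | 7 | -0.7% | 19 | -20.5% |
| STL | -22.1% | 31 | -36.9% | 31 | -14.9% |
| CIN | 9.0% | 10 | -3.4% | 23 | -12.4% |
| KC | -16.0% | 29 | -26.6% | 30 | -10.6% |
| GB | 37.2% | 2 | 26.9% | 3 | -10.4% |
| BAL | 10.8% | 8 | 4.5% | 14 | -6.3% |
| BUF | 6.6% | 15 | 0.6% | 17 | -6.0% |
| OAK | 2.8% | 18 | -1.7% | 20 | -4.5% |
| DET | 7.2% | 13 | 3.3% | 16 | -3.8% |
| ATL | 10.1% | 9 | 7.0% | 13 | -3.1% |
| WAS | -8.5% | 24 | -10.4% | 26 | -1.8% |
| SF | -2.0% | 20 | -3.2% | 22 | -1.2% |
| CAR | 19.8% | 6 | 19.6% | 7 | -0.2% |
| MIN | -2.5% | 22 | -1.8% | 21 | 0.7% |
| JAC | -22.1% | 32 | -19.0% | 28 | 3.1% |
| PHI | 6.3% | 16 | 9.6% | 11 | 3.3% |
| NO | 32.4% | 3 | 37.2% | 1 | 4.9% |
| IND | -15.7% | 28 | -9.9% | 25 | 5.8% |
| CLE | -9.9% | 25 | -0.1% | 18 | 9.8% |
| DAL | 8.8% | 12 | 19.8% | 6 | 11.0% |
| TEN | 4.6% | 17 | 18.9% | 9 | 14.3% |
| TB | -8.3% | 23 | 7.6% | 12 | 15.9% |
| NYG | 8.9% | 11 | 30.2% | 2 | 21.3% |
| NYJ | 0.7% | 19 | 23.5% | 4 | 22.8% |
| ARI | -19.9% | 30 | 3.8% | 15 | 23.7% |
| SEA | -12.6% | 26 | 13.7% | 10 | 26.2% |
| DEN | -12.9% | 27 | 22.8% | 5 | 35.6% |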
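For anyone who wants to check the quiz answer by hand, here is a minimal Python sketch. It assumes DIF is simply Q4 offensive DVOA minus Q1-3 offensive DVOA, which matches the published column to within rounding. The variable names are illustrative and not part of the DVOA system.

```python
# Offensive DVOA (percent) for the four quiz teams, copied from the figures above:
# team -> (Q1-3 OFF, Q4 OFF)
quiz_teams = {
    "GB": (37.2, 26.9),
    "HOU": (25.7, -4.2),
    "NE": (41.0, 19.1),
    "PIT": (28.8, -16.0),
}

# Assumption: DIF = Q4 minus Q1-3, so a negative DIF is a fourth-quarter drop.
dif = {team: round(q4 - q13, 1) for team, (q13, q4) in quiz_teams.items()}

# Biggest drop first, i.e. most negative DIF first.
biggest_drop_first = sorted(dif, key=dif.get)

for team in biggest_drop_first:
    print(f"{team}: {dif[team]:+.1f}%")
```

Green Bay comes out at -10.3% here rather than the published -10.4%, presumably because the listed Q1-3 and Q4 figures are themselves rounded.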

Actually, Green Bay seems to take its foot off the gas more on defense; its defense would rank 17th if we didn't include the fourth quarter. But San Francisco and New England see their defensive DVOA ratings decline even more in the fourth quarter than Green Bay's.

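| TEAM | Q1-3 DEF | RK | Q4 DEF | RK | DIF |
|------|----------|----|--------|----|-----|
| CAR | 9.1% | 24 | 42.7% | 32 | 33.6% |
| SF | -20.9% | 2 | 9.9% | 20 | 30.8% |
| PHI | -3.0% | 10 | 23.8% | 29 | 26.8% |
| NE | 8.0% | 21 | 32.9% | 31 | 24.9% |
| NYG | 4.4% | 16 | 28.0% | 30 | 23.6% |
| MIA | -5.6% | 8 | 15.8% | 23 | 21.4% |
| SD | 8.6% | 22 | 21.8% | 28 | 13.2% |
| GB | 5.7% | 17 | 17.7% | 25 | 12.1% |
| MIN | 8.8% | 23 | 20.2% | 26 | 11.3% |
| DET | -10.1% | 5 | -1.9% | 13 | 8.3% |
| WAS | -0.6% | 12 | 7.2% | 17 | 7.8% |
| ARI | 6.7% | 18 | 14.2% | 22 | 7.5% |
| BUF | 13.1% | 29 | 20.4% | 27 | 7.2% |
| DAL | 1.6% | 13 | 8.8% | 19 | 7.1% |
| HOU | -10.0% | 6 | -3.8% | 11 | 6.2% |
| BAL | -22.2% | 1 | -18.0% | 2 | 4.2% |
| TB | 13.0% | 28 | 16.1% | 24 | 3.1% |
| NYJ | -12.9% | 3 | -10.5% | 8 | 2.5% |
| OAK | 7.0% | 19 | 7.5% | 18 | 0.5% |
| JAC | -10.6% | 4 | -11.6% | 7 | -1.0% |
| TEN | 2.4% | 15 | -0.4% | 15 | -2.8% |
| NO | 17.6% | 31 | 12.7% | 21 | -4.9% |
| STL | 11.4% | 27 | 5.0% | 16 | -6.4% |
| DEN | 7.1% | 20 | -1.2% | 14 | -8.4% |
| CHI | -7.1% | 7 | -16.9% | 4 | -9.8% |
| SEA | 2.2% | 14 | -10.5% | 9 | -12.7% |
| PIT | -0.8% | 11 | -17.6% | 3 | -16.8% |
| CLE | 13.8% | 30 | -4.8% | 10 | -18.6% |
| ATL | -3.1% | 9 | -22.6% | 1 | -19.5% |
| KC | 10.7% | 26 | -13.9% | 6 | -24.6% |
| CIN | 10.5% | 25 | -15.9% | 5 | -26.4% |
| IND | 25.1% | 32 | -2.1% | 12 | -27.1% |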

Here is what the overall ratings would look like if we just included the first three quarters -- except in special teams, where frankly I'm too lazy right now to go do a whole new set of "first three quarters" special teams ratings.

I was wondering about this too, since the Week 12 DVOA article noted an improvement in the run offense after switching from BT to AT. What I really wanted to see, though, was their 1-3/4 defensive DVOA split.

Excluding special teams, Denver appears to improve in the fourth quarter (+44.0%) just as much as New England tanks (-46.8%). This could be a fun game to watch.
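Those two figures can be reproduced from the DIF columns of the offense and defense tables. A small sketch, assuming the net change is offensive DIF minus defensive DIF (since a lower DVOA is better on defense); the function name is made up for illustration.

```python
def net_q4_change(off_dif, def_dif):
    """Net fourth-quarter change excluding special teams, in percentage points.

    Assumption: offense counts as-is, and defense is subtracted because
    a negative defensive DVOA is good for the defense.
    """
    return round(off_dif - def_dif, 1)

# (offensive DIF, defensive DIF) copied from the two tables
dif_by_team = {
    "DEN": (35.6, -8.4),
    "NE": (-21.9, 24.9),
}

for team, (off_dif, def_dif) in dif_by_team.items():
    print(f"{team}: {net_q4_change(off_dif, def_dif):+.1f}%")
```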

This Bears fan's anecdotal impression is that the Tebowfence wears teams out really badly (well it did the Bears but part of that has to be a function of the Hanie and Co not moving the ball at all). Constantly having to work to maintain gap integrity on every single down and then chase Tebow around the field on third downs - and then the bugger runs over a crowd to pick up another third down - seems to really take it out of a team's legs. The variety in the running game combined with the spread looks and Tebow constantly trying to roll out to his left means that on every play defenders have extra responsibilities; extra stuff to counter against means more effort expended to ensure you are responding to the correct play and haven't missed your keys and gotten lost. Playing at Mile High can't hurt either (and not because it brings Tebow that bit closer to God).

My response was to a Broncos fan that thinks the reason for the Broncos doing better in the fourth quarter last week was down to the use of the spread. My point is that if Denver had played the spread all game I don't think they would have scored any more points. The issue was fatigue; I have watched pretty much every snap of Urlacher's career and rarely seen him looking so gassed.

Who was it in the NYT's Fifth Down blog who showed that, adjusting for score and time remaining, GB was the pass-happiest team in the league? The data was only for the first half of the season, but he certainly didn't find any conservative late game pattern.

I came in to post a similar sentiment. It may just be me (or just short-term memory), but it seems like this year stands out in FO's overall willingness to come out and address these types of complaints when they're voiced over several weeks and from several ... vocal commenters. I also appreciate this effort, and have found these posts very enlightening. Thanks to all behind the curtain.

That the 49ers decline in 4th quarter defence doesn't surprise me, they don't substitute at all on defence, partly because they don't have great depth and partly because the starting ends move to tackle to pretty good effect. I would also point out that the one pass rusher that they do substitute in, Aldon Smith, produces at a pretty high level. Increasing their defensive depth should be an offseason priority.

I tend to trust this more than full game DVOA as a measure of the teams' strength. But for some of these teams, the difference between the fourth quarter and the rest of the game is the story of the year.

As bad as the Patriots defense is, their prevent defense is far worse. Denver really does take off the training wheels in the fourth quarter.

A game doesn't have to be out of hand. It just needs a large enough margin that teams no longer play for what DVOA thinks they play for - the best next score (on average) regardless of time consumed.

I wouldn't suggest cutting the fourth quarter out of DYAR - the plays really happen after all. But DVOA is an average of a measurement with known issues in the fourth quarter. Why not use a sample that doesn't suffer from those issues? Do you think teams are trying to play badly in the first three quarters? Do you think that the rules of the game change?

By the way, drives with two score leads or deficits are very common in the fourth quarter. But these are precisely the drives where coaches are tempted to alter their schemes to save or burn time. It varies from coach to coach. Some coaches go full-prevent way too early. Others wait too long. On the other side of the ball, some teams get desperate early and others aren't desperate soon enough. But it's only the fourth quarter that has this problem in a meaningful way.

If those scenarios are so common, and if coaches are so tempted to alter their schemes, wouldn't you expect the 4th quarter to have somewhat different outcomes than the others and wouldn't that imply that it's important to consider? Frequent and different seems like a bad reason to discount a situation.

True. But that argues for tracking two separate DVOAs: most-of-the-game and fourth-quarter.

But here's the rub. We don't know if fourth quarter DVOA even works. There are a number of known problems with DVOA that are either specific to the fourth quarter or much worse in the fourth quarter.

(1) DVOA's success formula doesn't model correct fourth quarter strategic goals in many games
(2) The baselines for different down-distance-time-margin situations suffer from selection bias. (that is, teams that are desperate are more likely to be bad - polluting the baseline used for comparison)
(3) Fourth quarter plays often include scrubs put in to "try out" for a higher spot in the depth chart
(4) Fourth quarter plays include many meaningless plays, where neither team has incentive to do more than just practice something, much like preseason games

To simplify and overstate the case about the fourth quarter:
DVOA values the wrong things, compares to the wrong mix of teams, both grades and compares to the wrong players, and mixes in meaningless noise.

It doesn't do this all the time, because many fourth quarters are competitive. But it's often enough to make fourth quarters look as different from the first three as one season looks compared to the next.

Aaron has mentioned a number of times that he ran the numbers without the garbage time possessions, and the accuracy got worse. Certainly there can be improvements - he frequently admits to this as well - but it's not like the difference in the way the game is played late is being ignored.

Sure. Eliminating just garbage time by whatever criteria introduces more selection bias, and makes things worse. That's why it's better to just look at the quarters. That way you eliminate the known issues while maintaining an unbiased sample of plays.

The tests that Aaron needs to do are these:

(1) To what extent is DVOA in each quarter "predictive" of other quarters?
(2) To what extent is DVOA in each quarter predictive of DVOA in the same quarter in the next season?

I suspect that quarters 1-3 are predictive of each other pretty well, and reasonably predictive of their next seasons, but that fourth quarter DVOA is not predictive of other quarters, and not as good at predicting its next season.
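A rough version of test (1) can be run on the numbers already posted, though only as Q1-3 versus Q4 and only for this one season. The sketch below computes a plain Pearson correlation across the 32 teams; the `pearson` helper and the dictionary names are illustrative, and the data is copied from the two tables above. A single-season sample like this is suggestive at best.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# team -> (Q1-3 DVOA, Q4 DVOA), in percent, from the offense table
offense = {
    "MIA": (7.1, -38.4), "PIT": (28.8, -16.0), "HOU": (25.7, -4.2), "CHI": (-2.3, -25.5),
    "NE": (41.0, 19.1), "SD": (19.7, -0.7), "STL": (-22.1, -36.9), "CIN": (9.0, -3.4),
    "KC": (-16.0, -26.6), "GB": (37.2, 26.9), "BAL": (10.8, 4.5), "BUF": (6.6, 0.6),
    "OAK": (2.8, -1.7), "DET": (7.2, 3.3), "ATL": (10.1, 7.0), "WAS": (-8.5, -10.4),
    "SF": (-2.0, -3.2), "CAR": (19.8, 19.6), "MIN": (-2.5, -1.8), "JAC": (-22.1, -19.0),
    "PHI": (6.3, 9.6), "NO": (32.4, 37.2), "IND": (-15.7, -9.9), "CLE": (-9.9, -0.1),
    "DAL": (8.8, 19.8), "TEN": (4.6, 18.9), "TB": (-8.3, 7.6), "NYG": (8.9, 30.2),
    "NYJ": (0.7, 23.5), "ARI": (-19.9, 3.8), "SEA": (-12.6, 13.7), "DEN": (-12.9, 22.8),
}

# team -> (Q1-3 DVOA, Q4 DVOA), in percent, from the defense table
defense = {
    "CAR": (9.1, 42.7), "SF": (-20.9, 9.9), "PHI": (-3.0, 23.8), "NE": (8.0, 32.9),
    "NYG": (4.4, 28.0), "MIA": (-5.6, 15.8), "SD": (8.6, 21.8), "GB": (5.7, 17.7),
    "MIN": (8.8, 20.2), "DET": (-10.1, -1.9), "WAS": (-0.6, 7.2), "ARI": (6.7, 14.2),
    "BUF": (13.1, 20.4), "DAL": (1.6, 8.8), "HOU": (-10.0, -3.8), "BAL": (-22.2, -18.0),
    "TB": (13.0, 16.1), "NYJ": (-12.9, -10.5), "OAK": (7.0, 7.5), "JAC": (-10.6, -11.6),
    "TEN": (2.4, -0.4), "NO": (17.6, 12.7), "STL": (11.4, 5.0), "DEN": (7.1, -1.2),
    "CHI": (-7.1, -16.9), "SEA": (2.2, -10.5), "PIT": (-0.8, -17.6), "CLE": (13.8, -4.8),
    "ATL": (-3.1, -22.6), "KC": (10.7, -13.9), "CIN": (10.5, -15.9), "IND": (25.1, -2.1),
}

def split_correlation(table):
    """Correlation between the Q1-3 and Q4 columns across all teams."""
    q13 = [pair[0] for pair in table.values()]
    q4 = [pair[1] for pair in table.values()]
    return pearson(q13, q4)

r_off = split_correlation(offense)
r_def = split_correlation(defense)
print(f"offense Q1-3 vs Q4: r = {r_off:.2f}")
print(f"defense Q1-3 vs Q4: r = {r_def:.2f}")
```

Checking each quarter against every other quarter, which is what the test really calls for, would need per-quarter splits that haven't been posted.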

I believe this because while the first three quarters DVOA is determined primarily by your team's skill, fourth quarter DVOA is dictated by both skill and the mix of game situations you face. I think the mix of fourth quarter situations is determined largely by factors that have nothing to do with skill (luck, small sample size, opponent strength to name three).

Well, thanks Aaron. I appreciate the digging. I turned out to be half-right-- Packers are clearly a better 1-3 Quarter team and more dominant viewed that way, but they're not alone. So I am duly chastised. And as for the Broncos, Holy Crunch Time, Batman!! And the Giants get all silly at both ends of the field, don't they??

Eli being insane in the 4th quarter this year is largely due to the fact that the defense is usually collapsing at the same time.

I think this is the year that Giants/Eli most closely resemble the past couple years of Peyton/Colts. Both have shitty defenses and no running game. QB and WR play getting it done almost entirely by themselves.

Do people just not remember the 2009 season? Before rest-a-palooza, the Colts had the #9 DVOA defense, and #2 scoring defense in the league. Sure, in 2010 the defense was quite bad, but you have to go back to 2004 for the defense to be that bad again.

I think part of it is that the Colts became so infamous for pulling starters in the last two weeks and losing games to inferior competition that everyone started discounting them as a potential 16-0 team.

The Manning Colts could lose week 17 games to the Little Sisters of the Blind.
The Painter Colts could lose that game in weeks 1-16, too.

Maybe it's the lack of depth on defense that's killing the Packers (I think the Pats have the same problem). Injuries mean the d-line rotation is short; the last game was played with two middle linebackers with a total of one full and two partial seasons between them; losing Nick Collins means Charlie Peprah is playing all the time, and Pat Lee is missing, too; Matthews and Woodson don't practice a lot because they're dinged up.

Well, based on this, the Saints & Packers are going to have a 4Q shootout if they meet in the NFCCG. If the Saints can keep it close in the 1st 3 Q's, their D gets better in the 4th, & GB's goes down--enough that they pass each other in the rankings.

What would really be enlightening is separating out garbage time vs. non-garbage time in the 4th quarter. This is the big question concerning the Packers because they have had more garbage time than probably every other team. Yet they have had some close 4th quarters as well, so this data doesn't show if the offense drops off uniformly across all 4th quarters, or if it drops off a lot in blowouts and is offset by strong play in the fewer close games.

You already know the Packers are undefeated-- that means they've done well (enough) in the 4th quarter of all games close at the end. I'm not sure there's much more to learn.

Including selected 4th quarter plays would certainly boost the Packers' DVOA; but it won't give them a dominant first place DVOA ranking, if that's what you're looking for, since the Q1-3 DVOA puts them right in the midst of BAL, HOU, NE, and PIT.

Once again, 4th Quarter DVOA is worse at 'predicting' performance. In fact, it's essentially useless as a predictor. (3Quarter Def DVOA isn't that great a predictor either)

What's it mean? Either fourth quarter football requires a separate kind of football skill, or something other than football skill is being measured by fourth quarter DVOA. Either way, they should be reported and judged separately.

I would suggest fatigue on defense. Defensive coordinators seem to realise (and have been paraphrased as having said so) that they can only expect their defensive players to last 55 snaps or so (their number not mine - although FO reader Kal studied the Oregon offense and found the 55 number there as well so maybe there is something to it). It might be that some schemes suffer more than others either in terms of wearing down faster or the scheme not working as players slow down.

I would love to see if either offensive or defensive DVOA varies with the number of snaps played.

Every single play run in the NFL gets a "success value" based on this system, and then that number gets compared to the average success values of plays in similar situations for all players, adjusted for a number of variables. These include down and distance, field location, time remaining in game, and current scoring lead or deficit.

DVOA certainly does know how much time is remaining. There is room for improvement, but a lot of people on this thread don't seem to understand that the system does not think 5 yards on 1st and 10 from the 20 on the first play of the game is the same as 5 yards on 1st and 10 from the 20 with 2 minutes to go and down 9.

DVOA uses time remaining only to establish its buckets of "similar plays". This does nothing to fix the underlying issue that DVOA can mistake bad plays for good plays when the clock becomes a major factor. (See Barber, Marion: brain-fart thereof)

The use of time remaining in defining the buckets of similar plays also makes the selection bias in the average success values even worse, which biases both the success over average for each individual play and the percentage calculation used for VOA and DVOA.

If you didn't understand those two facts, count yourself among the "lot of people" who don't understand the implications of time remaining in DVOA.

It's still just comparing teams to average. Sure, with that little time remaining the average gain is probably longer, but DVOA still doesn't account for ending the play in or out of bounds. It doesn't know that an incomplete pass is better than a pass for a short gain. There are probably some other things too.

You've just shown why the success value used by DVOA is a problem in so many fourth quarters. Teams run different plays when clock becomes an issue because they have different goals when clock becomes an issue. Success value and thus DVOA doesn't know that, and thus mis-characterizes plays as good or bad or in between. In these situations, "success value" measures the wrong thing. No amount of adjustment will fix that problem. The necessary data is already lost.

Comparing to an average play result does not address this problem. It just adds a second problem: the "average" for such plays is heavily biased in terms of the quality of the teams in each bucket. Bad teams get in more bad situations; teams with leads are more likely to be good teams. That bias pollutes both the "success value over average" and the "total baseline" used to calculate VOA and DVOA.

In theory, these problems happen in every play of the game. In practice, they are a big problem in the fourth quarter, and a smaller one at other times... so much so that 3-quarter DVOA is a good predictor of future success, while 4th quarter DVOA is a bad predictor, and in the case of defensive DVOA, no predictor at all.

Correlations of 2010 DVOAs to 2011 DVOAs (so far) using all four quarters....

Offense: 0.47
Defense: 0.08

So if you want to project next year's DVOAs, you're better off working from this year's three-quarters DVOA than from the DVOA for complete games.

Yet more evidence that you should be using DVOA for three quarters to project a team's strength. The fourth quarter seems to have issues that make it much less predictive. (It would be worth checking other years.)

What do we as consumers and fans of DVOA do this season? And what should Aaron and the rest of FO do to improve DVOA now or during the off-season?

Aaron is right to resist making mid-season changes to DVOA. DVOA is what it is for a season. It takes time and focus to propose, prototype, and test DVOA changes; time that Aaron does not have until February. Quick fixes are likely to cause more problems than they solve. But there are things which can be usefully done with little risk.

For DVOA, it might be nice to repeat this 3Quarters/4thQuarter DVOA report, perhaps as a lead-in to the playoffs.

For DYAR, it would be great if unusually divergent fourth quarters could be called out in the comments, as already happens sometimes.

For us fans, we just need to be aware that there is some noise in DVOA and DYAR due to the wonky fourth quarters. Some of that wonkiness is real differences on the field, and some is probably due to soft spots in DVOA that affect the fourth quarter more than other parts of the game.

Off-season, there's work to do. I have no idea how Aaron could include clock management in the DVOA success formula. But the issues regarding the baselines can be fixed.

(1) Correct for selection bias in the baselines, so the "A" in DVOA approximates an average team playing against an average team.
(2) Use a stable denominator when calculating VOA as a percentage, rather than summing the baselines for all plays in the game. That way each play's value over average would depend only on itself, and not be measured on a scale that varies by the mix of other situations in the game.

And if Aaron can't come up with a fix to the success formula for clock management time...
(3) Make fourth quarter DVOA and DYAR a regular part of the charts, so fans can judge for themselves whether garbage time plays or baselines are skewing the results.
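To make suggestion (2) concrete, here is a toy sketch. It assumes, as described above, that VOA is the total success value over baseline divided by the sum of the baselines for the plays involved. That is a reading of this comment, not a published formula, and the play values below are hypothetical numbers chosen only to show the effect.

```python
def voa_summed_baselines(plays):
    """VOA with the denominator described above: the sum of per-play baselines.

    plays: list of (success_value, baseline) pairs.
    """
    value_over_average = sum(s - b for s, b in plays)
    return value_over_average / sum(b for _, b in plays)

def voa_stable_denominator(plays, per_play_scale):
    """VOA with a fixed scale per play (hypothetical alternative)."""
    value_over_average = sum(s - b for s, b in plays)
    return value_over_average / (per_play_scale * len(plays))

# Hypothetical data: both games beat the baseline by 0.5 on every play,
# but game B is played in situations with higher baselines.
game_a = [(1.5, 1.0)] * 10
game_b = [(2.5, 2.0)] * 10

print(voa_summed_baselines(game_a), voa_summed_baselines(game_b))
print(voa_stable_denominator(game_a, 1.0), voa_stable_denominator(game_b, 1.0))
```

With the summed-baseline denominator, the same per-play edge grades out half as large in game B simply because of the situations faced. With a fixed per-play scale the two games grade identically, which is the property the suggestion is after.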

You seem to have several clear problems with your analysis. First, I have no idea why you compared DVOA for just the 4th quarter to DVOA for the first 3 quarters. I would imagine that DVOA for a randomly selected 3 quarters would correlate better than any individually selected quarter. Second, it is not clear to me why you are willing to make conclusions based on one season's worth of correlation data. Finally, it seems to me that the difference between all 4 quarter DVOA and first 3 quarter DVOA in correlation is so trivial (.04) as to be not particularly meaningful. I will also add that in order to test your thesis you'd also have to look at other 3 quarter selected periods and check their correlation.

I used three quarters vs one quarter because that's what Aaron posted.

If all four quarters were equally predictive measures of football skill, I doubt the fourth quarter correlation would have been so much lower. But, yes, it would be easier to follow if I had checked each quarter separately. I don't have that data.

I did one season because that's what Aaron posted. I'd love to have Aaron run the same check in other years. But truly, there are enough other reasons to distrust fourth quarter DVOA, I wasn't at all surprised by the result.

Finally, if I mix one quarter of poorly correlated data with three quarters of better correlated data, it's no surprise that the final correlation is between the two, and closer to the three quarters portion of the data. If I mix a teaspoon of manure with two liters of Coca Cola, it's still mostly Coke. But that doesn't make it a good idea.

Hold your horses there. You've shown that for two seasons 3Q DVOA correlates about as well as full game DVOA (the difference is negligible) and that 4thQ DVOA correlates less well. This is all interesting and good data, but it's not enough to go into wholesale change mode.

First, I'd expect one quarter to correlate less well. To me that result isn't the interesting one. It's the 3Q vs full game that's interesting. But the numbers are so close that it doesn't seem to matter. How would looking at more seasons affect the correlation? Most importantly, while good to know, DVOA isn't attempting to correlate current DVOA against previous season DVOA. Teams do get better or worse. DVOA is attempting to correlate to winning football games.

I was just being brief. Elsewhere I included the usual "do more study" caveats. Please don't give me a hard time because I don't put them in every single post.

One quarter should correlate less well, I agree. But by this much? I doubt it. The good check to do is to see how the different quarters correlate with each other during a season. I'd bet the fourth quarter is the odd man out.

And lastly, you are wrong about DVOA.

It's VOA that is supposed to correlate with winning, although for offenses and defenses it's designed to correlate with maximizing the next score. That validates the success factor and baseline concepts.

DVOA itself is supposed to correlate with the next season's DVOA. That validates the use of opponent adjustments to arrive at a predictive measure of repeatable skill. I merely used last year's DVOA as if it were next year's DVOA. I did put 'predict' in quotes when I remembered to. But if DVOA is working well, it should correlate with the previous season as well as it correlates with the next. Correlations have no "arrow of time".

Green Bay has had a decent lead over the entire league for the entire season in estimated wins. You have to go back to week 15 of last season before Green Bay drops out of the top 5 of estimated wins for the regular season.

Estimated wins uses a statistic known as "Forest Index" that emphasizes consistency as well as DVOA in the most important specific situations: red zone defense, first quarter offense, and performance in the second half when the score is close. It then projects a number of wins adjusted to a league-average schedule and a league-average rate of recovering fumbles. Teams that have had their bye week are projected as if they had played one game per week.

Looking at defense in the 4th vs 1-3rd quarters you see that the average change across all teams is about -2.5%. This makes me wonder what DVOA is doing. If the baselines were consistent across the whole game then one would be fine with a defense getting more tired than an offense and performing worse, but the baselines are supposed to move, in some mysterious way, with the progression of the game.
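The league-wide average is easy to check from the DIF column of the defense table. A quick sketch, with the 32 values copied from that column; the variable names are illustrative. Keep in mind that positive DVOA is bad for a defense (the top-ranked units have negative ratings), so a positive mean here says defenses as a group grade out worse in the fourth quarter.

```python
# Defensive DIF (Q4 minus Q1-3, in percentage points), in table order
defense_dif = [
    33.6, 30.8, 26.8, 24.9, 23.6, 21.4, 13.2, 12.1,
    11.3, 8.3, 7.8, 7.5, 7.2, 7.1, 6.2, 4.2,
    3.1, 2.5, 0.5, -1.0, -2.8, -4.9, -6.4, -8.4,
    -9.8, -12.7, -16.8, -18.6, -19.5, -24.6, -26.4, -27.1,
]

mean_dif = sum(defense_dif) / len(defense_dif)
print(f"teams: {len(defense_dif)}, mean defensive DIF: {mean_dif:+.1f}%")
```

On the figures as published the mean works out to roughly +2.3%, which is close in size to the number quoted above; the sign depends on which direction you count as a decline.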