Romer took NFL play-by-play data from 1998, 1999, and 2000. (He looked only at first quarter results, in order to minimize the effects of score and time remaining.) He then used “dynamic programming” to fit a smooth curve to the point values of first downs on every yard line on the field. Having done that, he’s now easily able to figure the best fourth-down strategy.

For instance, Romer’s analysis found that having the ball first-and-ten on your own 30-yard line is worth about +0.9 points (these numbers may not be exact – I’m reading them off a graph). The opponent having the ball in the same spot is worth –2.9 points.

Now, suppose you have a fourth-and-short situation on your 30. If you go for it and make it, you’ll be at +0.9 points. If you go for it and miss, you’re at –2.9. If you kick to your opponent’s 30, you’re at –0.9 (since the other team will be at +0.9).

Therefore, Romer would find that if you can make fourth down with at least 45% probability, you should go for it. That’s because (45% of 0.9) + (55% of –2.9) equals the sure -0.9 you get from punting.
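The breakeven calculation can be sketched in a few lines of Python. The function name is hypothetical, and the point values are the approximate graph readings quoted above, not Romer's exact figures. Note that with these rounded readings the breakeven works out nearer 53% than 45%, so the 45% figure presumably depends on more precise values than can be read off the graph.

```python
def breakeven_probability(success_value, failure_value, punt_value):
    """Conversion probability p at which going for it equals punting.

    Solves p * success_value + (1 - p) * failure_value = punt_value for p.
    """
    return (punt_value - failure_value) / (success_value - failure_value)


# Approximate values read off the graph (assumptions, not exact figures).
success = 0.9    # you convert: first-and-ten on your own 30
failure = -2.9   # you fail: opponent takes over on your 30
punt = -0.9      # you punt: opponent starts on their own 30

p = breakeven_probability(success, failure, punt)
print(f"Breakeven conversion probability: {p:.1%}")
```

Small changes in the inputs move the answer a lot: the result depends only on the ratio of the punt-versus-fail gap to the convert-versus-fail gap, which is why the choice of situation values matters so much in what follows.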

Romer then goes one more step. He figures out that the chance of making five yards or more is very close to that 45% chance. Therefore, a team facing fourth-and-5 would presumably be indifferent (in the economic sense) between punting and going for it. And so you’d expect that teams in that situation would punt about half the time.

But in real life, teams don’t go for it 50% of the time. Romer doesn’t tell us the actual percentage. However, he does figure out the empirical “yards to go” breakeven point for coaches based on their behavior. That is, if coaches won’t go for five yards 50% of the time, when *will* they go 50% of the time? Is it three yards? Two? One?

In their own territory, it’s effectively zero – coaches never went for it more than 50% of the time, even on fourth and 1. In the opponent’s half of the field, though, they did. Eyeballing the graph shows the trend is irregular, but appears to average about 1/4 of the theoretical yardage. For instance, on the opponent’s 30, the breakeven point is fourth-and-7. But coaches acted as if the breakeven were about fourth-and-3. (This information comes from figure 5, which is a very interesting chart.)

It would be interesting to see the converse – now that we know that coaches go for it half the time on fourth and 3, how often do coaches actually go for it on fourth and 7? Is it 10%? 20%? Never? Romer doesn’t give us this information. This means (at least in theory) that coaches may be rational, but their own calculations are different. For instance, maybe they think that fourth-and-6 is the breakeven, so they seldom go for it with 7 yards to go, but often go for it with 5 yards to go. This is theoretically possible, especially since in “The Hidden Game of Football,” authors Bob Carroll, Pete Palmer, and John Thorn [CPT] come up with different numbers.

CPT’s methods seem similar to Romer’s, but their conclusions are different. Romer writes that CPT “do not spell out their method for estimating the values of different situations ... and it yields implausible results.” (I wasn’t able to notice anything that implausible, but I’ll give Chapter 10 another read.) Also, Romer writes that “their conclusions differ substantially from mine.”

One significant difference is that Romer advocates going for it on your own 30 when you have fewer than 5 yards to go. CPT, on the other hand, advocate punting even on fourth-and-1 (according to CPT, punting is worth –0.5 points, going for it on fourth-and-1 is worth –0.8, and going for it on fourth-and-6 is worth –1.9).

The difference seems to be the values assigned to the different field positions. Both studies give –2 points for first-and-ten on your own goal line, and +6 points for first-and-goal on the opponent’s one-inch line. But CPT assigns values linearly between those two values, while Romer has an S-shaped curve, flat in the middle but steep near both goal lines. To CPT, the value of a punt from one 30-yard line to the other is more than three points – but to Romer, it’s only two points. That’s a big difference, and it’s enough to make the difference between punting and not punting.
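To see where "more than three points" comes from, here is a rough sketch that draws a straight line through the two endpoints quoted above. The function `linear_value` is hypothetical and only approximates CPT's values, and the Romer figures are the graph readings used earlier; neither is taken directly from the authors' tables.

```python
def linear_value(yards_from_own_goal):
    """Straight line from -2 points at your own goal line to +6 at the opponent's."""
    return -2.0 + 8.0 * yards_from_own_goal / 100.0


# Failing on your own 30: the opponent has the ball 70 yards from their own goal.
fail_linear = -linear_value(70)
# Punting to the opponent's 30: they have the ball 30 yards from their own goal.
punt_linear = -linear_value(30)
print(f"Linear model: punting beats failing by {punt_linear - fail_linear:.1f} points")

# Same comparison using the approximate readings from Romer's curve.
fail_romer, punt_romer = -2.9, -0.9
print(f"Romer's curve: punting beats failing by {punt_romer - fail_romer:.1f} points")
```

On a straight line, every yard is worth the same 0.08 points, so the 40 yards of field position gained by the punt are worth 3.2 points. Romer's flatter middle section shrinks that to about 2.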

I don’t know whose numbers are right. It would have been nice if either source had shown us a chart of actuals – that is, how many times did team X have the ball on their 30, and what were the eventual results? That way, we could try to evaluate both authors ourselves.

In any case, it’s probably true that all the conclusions, on both sides, are dependent on the situation values. Without actuals, we really don’t know which is correct. If CPT are correct, and Romer is wrong, it could be that coaches aren’t all that conservative at all.

This is a meaty paper. It’s interesting to read, and provides much data (mostly in graphs) that would be useful to football sabermetricians. I do wish I understood Romer’s method a bit better – I’m not familiar with “dynamic programming” – so I could form an opinion on whether his values are empirically superior to CPT’s. (CPT don’t provide any raw data either.)

But even with Romer’s interesting data, I’d still wish for more. We are told that when it’s a 50% shot, coaches go for it less than 50% of the time. But what about when it’s a 70% shot? Or a 90% shot? Does that make a difference? Do coaches go by yards, so that they’ll always go for it on fourth-and-inches regardless of field position? And, as I wondered above, is it possible that coaches agree with Romer’s analysis, but are just using different numbers?

It seemed like Romer found one way of looking at the data that showed strong risk aversion, and stopped there. Sabermetricians, who are more interested in the particulars of coaches’ strategy than the yes/no question of whether they maximize, may be frustrated that the author has so much data yet chose to show only a small part of it.

And if anyone has any data or suggestions on figuring out whether Romer’s data is better or worse than CPT’s, I’d be interested in knowing.

13 Comments:

I agree that coaches are too conservative on going for it on fourth down. However, it appears to me (and correct me if I'm wrong) that this study does not allow for the fact that the defensive tactics change with the down. If I'm a defensive coach and it's fourth and five, I'm not going to worry so much about my opponents going for a big play. I'm going to do my best to prevent them from making those five yards. On other downs, I'm going to worry about the big play as well. It seems to me that this would make a significant difference.

The study did assume that fourth down success rates are the same as third down success rates. The idea, I guess, is that teams feel that if they stop a team on third-and-5, they get the ball back -- same as on fourth-and-5 (although the rewards are greater on fourth down, because there's no punt).

That seems reasonable to me, but I think others have criticized that assumption.

The author covers this issue on page 20 of his study ... he says he looked at fourth-down plays from his play-by-play sample and found no significant difference in success rates.

Romer provides a very interesting analysis. I can believe that coaches are too conservative, but I'm skeptical that the gap is as big as suggested by Romer (though it's certainly possible).

On the 3rd vs. 4th down issue, I think a possible difference is that the consequences of failure by the offense are different: failing on 4th down is often much worse as you give up possession rather than punting (depends on field position). This means that the offense will be concerned with maximizing probability of 1st down, and will rarely go for long yardage. In practice, this means they should run more often than average. However, if the defense knows a run is coming, they can defend more effectively.

That's one point I think Romer misses: when the offense faces constraints on their choice of play, that helps the defense, even if both sides have equal knowledge of the constraint. The single biggest edge the offense has is they alone know what play is being called; anything that narrows their effective choice of plays has to help the defense. As for the empirical data from later quarters, I just didn't find it that convincing. So, it may be true that 3rd-down success is an adequate proxy for what would happen on 4th downs, but I don't think he quite proves his case.

Other issues:

I find it surprising that the value of field position is linear (except perhaps at the two extremes, according to Romer). Because scoring usually requires several successive first downs, it would seem that the probability of scoring should decline exponentially with yardage from the opposing goal line. For the same reason there is just one 50+ game hitting streak in baseball, but six 40+ streaks and 42 streaks of 30+ games, it would seem that moving from your 20 to your 40 yard line should be worth less than moving from your 40 to the opposing 40. So I wonder whether Romer is underestimating the cost of failing on 4th down deep in your own territory, as well as overestimating the value of converting in that situation and keeping possession.

I also wonder how a win expectancy approach would modify Romer's conclusions, if at all. He treats all points, and thus (for the most part) all yards, as equally valuable. But in football, unlike baseball, scoring events are not independent. Once a team has a lead, it can influence (somewhat) the time available to the other team to score. Also, the trailing team at some point has no choice but to take greater risks (going for touchdowns over field goals), which must on average reduce their points scored. So getting the lead in the early quarters -- and preventing your opponent from doing the same -- is extremely important.

Another analysis of the value of field position was done by Aaron Schatz at Football Outsiders here.

Bill Krasker built his own dynamic programming model based on win probability and uses it to examine fourth down decisions (among many other strategy questions) on footballcommentary.com.

Other similar models are here and here and go as far back as Carter and Machol's 1971 and 1978 Operations Research papers.

Romer's conclusions tend to be more aggressive compared to CPT, Krasker, and others. But the most extreme analysis is by Jason Scheib, who proposes a strategy of never punting.

It turns out that even the most aggressive NFL coaches don't go for it anywhere near as often as the most conservative of the recommendations suggested by the available research. All those cited above have come to the same conclusion, one way or another, that NFL coaches are indeed too conservative on fourth down.

Not sure I agree with you about the third vs. fourth down thing. I think the chances of success should be *higher* on fourth down.

Suppose that team X makes it 55% of the time on third down. And suppose they play exactly the same strategy on fourth down. They should still make it 55%, no? Unless they decide to play a different strategy (say, run all the time). But they'd do that only if they could *increase* their chance beyond 55%. If they can't, because the defense is onto them, they can just stick with the 55% strategy.

Now, the defense could try a different strategy themselves to try to reduce the chance below 55%. But if they could do that, why wouldn't they do that on third down too?

Here's how field position could be linear: suppose the chance of making a first down is 70%. Then, as you point out, the chance of making six straight from their 20 to score a touchdown is lower than 1/3 of the chance of making two straight from the opposing 30. However, there's always the chance of a monster 70-yard play. That benefits the two situations equally, and may cause things to balance out.
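The arithmetic behind that comparison, as a quick sketch. It assumes, as the comment does, a flat 70% chance per first down, and further assumes each series is independent of the others. The drive lengths (six and two first downs) are the ones named in the comment.

```python
p_first_down = 0.70  # assumed chance of converting each series

# Six straight first downs from your own 20, two from the opposing 30.
p_from_own_20 = p_first_down ** 6
p_from_opp_30 = p_first_down ** 2

print(f"Six straight: {p_from_own_20:.3f}")
print(f"Two straight: {p_from_opp_30:.3f}")
print(f"One third of two straight: {p_from_opp_30 / 3:.3f}")
```

This ignores big plays entirely, which is the point of the caveat about 70-yard gains: any chance of scoring in one play adds roughly the same amount to both situations and pulls the two numbers closer together.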

Phil: Re 3rd/4th down: Let's say teams convert on 3rd and X 55% of the time. I don't agree this reflects a distribution of plays that maximizes conversion. It should also include some long-yardage attempts with a lower success rate but a higher payoff when successful (more yards or a score). What's being maximized is the expected points at the end of that play, which isn't the same as the chance of getting a first down. Plus you have a game theory component: if a team never goes long, it will soon find that it's less successful on runs and short passes (as defense adjusts).

But on 4th down, the offense has less latitude because a failure to convert brings a much worse penalty. Some teams may still attempt a 25-yard pass, but only if they're at midfield. Certainly at 4th and 2 on your own 30, virtually every team will try a run or short pass. And that knowledge has to help the defense at least a little. In any case, that's my theory.

Re: linear value of field position: I agree that the possibility of big yardage plays should flatten the curve a bit. But even if 70-yard passes were as common as 50-yard passes, they can't be more common. So I don't see how this could result in a straight-line relationship. However, there are other complicating factors as well: FGs, where field position value must max out at around the 25; similarly, once you are past the opponents' 40 or so there must be few gains in terms of opponent field position if you have to punt. Still, I have trouble seeing why the curve isn't somewhat exponential.

I see your point about field position. Let me think about it some more.

On the fourth down thing, I'm not arguing that the optimal strategy on fourth down is the same as on third down. I'm just saying that the optimal fourth down strategy has to lead to *at least* the same probability of making it as third down. If not, the team could just revert to their third-down strategy and get that percentage.

Suppose that on third and 5, the offense runs half the time and passes half the time. They make a first down with 60% probability on the run, and 40% on the pass, for an overall 50-50 chance of making it.

On fourth down, they might go for the run, so their probabilities change. But if the probabilities were less than 50% of making 5 yards, they just won't go for the run!

I would agree with you that the expected gain *on a run* on fourth down may be less than the expected gain *on a run* on third down, for the reasons you mention. That is, the chance may drop from 60% to 55% because the defense is anticipating it. But if it dropped to less than 50%, the offense would just pass sometimes on fourth down to foil the defense.
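A small sketch of the numbers in this argument. The helper `conversion_rate` is hypothetical, and it treats the run and pass success rates as fixed numbers, which is exactly the simplification the next comment challenges.

```python
def conversion_rate(run_share, p_run, p_pass):
    """Overall first-down probability for a given run/pass mix."""
    return run_share * p_run + (1 - run_share) * p_pass


# Third-and-5: half runs at 60%, half passes at 40%.
third_down = conversion_rate(0.5, 0.60, 0.40)

# Fourth-and-5, always running, with the run dropping to 55%
# because the defense expects it.
fourth_down_all_run = conversion_rate(1.0, 0.55, 0.40)

print(f"Third down, 50/50 mix: {third_down:.0%}")
print(f"Fourth down, all runs: {fourth_down_all_run:.0%}")
```

Under these assumed rates the all-run approach still beats the third-down mix. If the run rate fell below 50%, the offense could go back to mixing in passes, which is the floor the argument relies on.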

Phil: You may be right, but I'm not sure. You're assuming that on 3rd down the offense is trying to maximize its conversion %, while the defense is trying to minimize it. If so, then you're correct: the offense could convert at least as frequently on 4th downs as it does on 3rd. But that shouldn't be the goal on 3rd down: the teams should be trying to maximize/minimize total point value of the play. So the defense might be capable of driving down 3rd down conversion to 45% (rather than 50%), but only by giving up more damaging long pass completions. On 3rd down, that's a bad trade and they don't do it. But on 4th down, it becomes a good trade because the reward for preventing conversion is much greater. So the defense changes their strategy. The offense can then try to adjust as well, but they still have to give more weight to maximizing conversion % than they did on 3rd down. I don't think we can know that the end result is a success rate as high as on 3rd down.

OK, I see what you're saying ... you're saying that even if the offense doesn't change its strategy, the *defense* will, because the reward for preventing a first down has increased. I agree fully that you're right about that.

The question then becomes: (a) how much does that affect the chances of the offense making it, and (b) how much does the offense's adjustments counteract the defense's new strategy?

Intuitively, I suspect the overall difference between third and fourth down is low after taking the strategies into account. (It is even theoretically possible that the expected point value is *higher* on fourth down -- if the defense is doing something stupid -- but I doubt very much that it is.)

But I don't agree that the offense faces constraints on fourth down but not on third down. They face a different strategy from the defense, and they have to adjust accordingly, but that's not really a constraint.

Regarding 3rd vs. 4th down strategies: 4th down may actually allow riskier strategies. The cost of a turnover is no worse than that of failing to convert.

For example, a team can select riskier pass plays because an interception is no worse than an incompletion in most cases. (And might be better in others.)

4th down conversion rates might be higher because it would be irrational for a QB to "throw it away" out of bounds. On 4th down, he could force a pass without (much) fear of the interception. (There is always the threat of a runback.) On 3rd down the QB would be smart to throw out of bounds rather than force a pass.

Hi guys. Interesting discussion here. I had one question that was somewhat germane to it, and I hoped one of you might be able to answer. Specifically, I was trying to figure out what % of NFL plays took place on 4th down.

I couldn't care less whether the play was a punt, field goal, conversion, etc. I'm just curious how often 4th down occurs relative to all the other downs combined.

My offhand guess would be ~10-12% of all plays occur on 4th down; if anyone can add any informed light, that'd be great, and much appreciated.