Sunday, January 10, 2010

Run/pass, continued

The previous post gave a simple model for play sequencing that results in unequal run/pass payoffs, but that's not the only way for things to be complicated in play-calling. In the previous example, you've still got a stable Nash equilibrium, located at the minimax point of the two-play sequence. Given the variation we see in NFL playcalling, it seems unlikely that a stable Nash equilibrium exists at all - coaches almost seem to choose the fraction of times that they run based on personal preference, rather than some optimal mix.

Competition dynamics can produce more complicated behavior than a simple stable Nash equilibrium, but doing so essentially requires nonlinear behavior in the "utility function" - that is, the function that gives the payoff for a choice as a function of the opposing player's choice. One possibility is a fixed point which is unstable - that is, one of the players has incentive to move away - but which admits oscillations about that point. You see behavior like this with predator-prey dynamics.

A simple, non-linear model

Consider the following game:

Each team chooses a number of plays, either run or pass.

The plays are then presented, one against each other, randomly.

The payoff for a run vs. run is -1-(number of previous run vs. run plays)/(number of previous plays)

The payoff for a run vs. pass is 1-(number of previous run vs. pass plays)/(number of previous plays)

The payoff for a pass vs. pass is -0.5-(number of previous pass vs. pass plays)/(number of previous plays)

The payoff for a pass vs. run is 2.5-2.5*(number of previous pass vs. run plays)/(number of previous plays)

This may seem very bizarre, but basically the idea is that as the various personnel see a certain type of play more, they react quicker to it rather than the other option. One note about this model is that things jitter around a lot early on (as the fractions jump around a lot) but then settle as the game goes on and approach an equilibrium behavior. I'll only talk about the equilibrium behavior.
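To see the early jitter and the later settling, here's a minimal simulation sketch of the game as described. A few things in it are my assumptions rather than part of the rules above: each play is drawn independently (the offense runs with probability x, the defense plays pass D with probability y) instead of shuffling a fixed list of plays, the "previous plays" fraction is taken to be zero on the very first play, and the names (simulate_game, BASE, DECAY) are just for illustration.

```python
import random

# Matchups are keyed as (offense call, defense call).
# BASE is the starting payoff; DECAY scales the "seen it before" penalty.
BASE = {("run", "run"): -1.0, ("run", "pass"): 1.0,
        ("pass", "pass"): -0.5, ("pass", "run"): 2.5}
DECAY = {("run", "run"): 1.0, ("run", "pass"): 1.0,
         ("pass", "pass"): 1.0, ("pass", "run"): 2.5}


def simulate_game(x, y, n_plays, seed=0):
    """Return the running average payoff per play after each play.

    x: probability the offense calls a run (assumed independent draws).
    y: probability the defense calls pass D (assumed independent draws).
    """
    rng = random.Random(seed)
    seen = {key: 0 for key in BASE}  # how often each matchup has occurred
    total = 0.0
    running_avg = []
    for n in range(n_plays):
        offense = "run" if rng.random() < x else "pass"
        defense = "pass" if rng.random() < y else "run"
        key = (offense, defense)
        # Fraction of previous plays that were this same matchup
        # (assumed to be 0 when there are no previous plays).
        frac = seen[key] / n if n > 0 else 0.0
        payoff = BASE[key] - DECAY[key] * frac
        seen[key] += 1
        total += payoff
        running_avg.append(total / (n + 1))
    return running_avg


if __name__ == "__main__":
    avg = simulate_game(0.5, 0.5, 20000)
    # Early values bounce around; late values change very little.
    print(avg[9], avg[99], avg[-1])
```

Comparing the running average after 10, 100, and 20,000 plays gives a feel for how quickly the fractions stop jumping around.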

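Assume that the fraction of runs the offense attempts is x, and the fraction of pass Ds the defense attempts is y (yes, continuing my ridiculously silly column order mixup). As the number of plays gets large, the fraction of run vs. run plays tends to x*(1-y), run vs. pass to x*y, pass vs. pass to (1-x)*y, and pass vs. run to (1-x)*(1-y), so the utility functions for run and pass approach:

U_run(x, y) = (1-y)*(-1 - x*(1-y)) + y*(1 - x*y)

U_pass(x, y) = y*(-0.5 - (1-x)*y) + (1-y)*(2.5 - 2.5*(1-x)*(1-y))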

As you can see, the utility function is clearly non-linear, and moreover, depends on the offense's choice. Passing or running 100% of the time is a poor choice regardless of what the defense does: for a pass, the payoff is zero if the opponent plays run all the time, and -1.5 if the opponent plays pass all the time.
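The all-pass numbers can be checked with a short sketch of the limiting utilities. I'm deriving these from the payoff rules by assuming the long-run fraction of each matchup is just the product of the two teams' fractions (for example, run vs. run tends to x*(1-y)); the function names are mine.

```python
def run_utility(x, y):
    """Long-run payoff of a run; x = offense run fraction, y = defense pass-D fraction."""
    vs_run_d = -1.0 - x * (1.0 - y)   # run vs. run D, seen x*(1-y) of the time
    vs_pass_d = 1.0 - x * y           # run vs. pass D, seen x*y of the time
    return (1.0 - y) * vs_run_d + y * vs_pass_d


def pass_utility(x, y):
    """Long-run payoff of a pass, same conventions as run_utility."""
    vs_pass_d = -0.5 - (1.0 - x) * y                   # pass vs. pass D
    vs_run_d = 2.5 - 2.5 * (1.0 - x) * (1.0 - y)       # pass vs. run D
    return y * vs_pass_d + (1.0 - y) * vs_run_d


if __name__ == "__main__":
    # Offense passes 100% of the time (x = 0).
    print(pass_utility(0.0, 0.0))  # defense always in run D
    print(pass_utility(0.0, 1.0))  # defense always in pass D
    # Offense runs 100% of the time (x = 1).
    print(run_utility(1.0, 0.0), run_utility(1.0, 1.0))
```

With x = 0, this gives 0 against all run D and -1.5 against all pass D, matching the numbers quoted above.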

This model isn't quite as silly as the previous one - in my mind, it's just as viable a model for football as the simplistic zero-sum, constant payoff game. Even if the defense is in a pass formation or in pass personnel, if an opponent constantly runs similar plays at them, they'll get better at recognizing it and will perform better against it. Not as good as the proper formation would, but better.

However, the "optimal strategies" for this kind of a game are extremely different. One way to see this is to look at the average payoff per play, as a function of pass fraction and run D fraction. Here's the plot for the simple zero-sum game, with coefficients ((0.5,-1.5),(-0.5,1.5)).

I've compressed the scale and shifted it a bit to make it easier to see the main structure, which is that cross-shaped structure around (0.5,0.75). That's the minimax point, and the Nash equilibrium - note that at a run D fraction of 0.75, the defense has no incentive to change its playcalling, since the result is always the same. Similarly for the offense at 0.5. And, as we expect, at these points, the payoffs for the two options are the same. Thus, the equilibrium is stable - if either player changes his strategy at this point, that player's payoff will decline or stay the same, and the other player will have a better strategy available as well.
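The (0.5, 0.75) point can be recovered from the coefficients with the usual indifference conditions for a 2x2 zero-sum game. This is a sketch under my own assumptions: I'm reading ((0.5,-1.5),(-0.5,1.5)) as rows for the offense's two options and columns for the defense's two options, and the function name is just for illustration. It only handles the case where the equilibrium is a genuine mix (no saddle point).

```python
def mixed_equilibrium_2x2(a, b, c, d):
    """Mixed equilibrium of a zero-sum game with payoffs [[a, b], [c, d]] to the row player.

    Returns (p, q, value): p is the row player's weight on row 1,
    q is the column player's weight on column 1.
    Assumes an interior mixed equilibrium exists (denominator nonzero).
    """
    denom = a - b - c + d
    p = (d - c) / denom            # makes the column player indifferent
    q = (d - b) / denom            # makes the row player indifferent
    value = (a * d - b * c) / denom
    return p, q, value


if __name__ == "__main__":
    p, q, value = mixed_equilibrium_2x2(0.5, -1.5, -0.5, 1.5)
    print(p, q, value)  # 0.5 0.75 0.0
```

At that mix both of the offense's options pay the same, which is exactly the "payoffs for the two options are the same" condition.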

But what about for our new "learning defense" game?

Wow, that looks completely different. There is something "kinda like" the equilibrium structure, at about (0.75, 0.75) here, but it's tilted almost 45 degrees. That is, past this point, "more passing/more run D" is strictly better for the offense, "more passing/less run D" is strictly better for the defense, and so forth. Note that this point is already weird - it's saying "run 75% of the time, but play pass defense 75% of the time."

This is not located at the minimax point for both players - the offense minimizes the defense's maximum payout at about 60% running, whereas the defense minimizes the offense's maximum payout at about 70% pass D.

That point is also not stable! At (0.75,0.75) the defense thinks it can do better by playing either run D more or pass D more. The offense thinks it's perfect. But when the defense plays, say, more run D, the offense can then improve by playing more pass. What you end up with are orbits around that point - in fact, if you model each team's behavior as "if you can do better, you try it" you get what's called a limit cycle, where defenses and offense continually chase each other's tail.
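One way to play with the "if you can do better, you try it" idea is a small-step update where each side nudges its fraction in whichever direction improves its own result. The sketch below is only one possible reading: the gradient-following rule, the step size, and the limiting payoff formula (derived by assuming matchup fractions are products of the two teams' fractions) are all my assumptions, and I'm not claiming it reproduces the orbits described above. What the path does depends a lot on the starting point and step size.

```python
def average_payoff(x, y):
    """Long-run offensive payoff per play; x = run fraction, y = pass-D fraction."""
    run = (1 - y) * (-1 - x * (1 - y)) + y * (1 - x * y)
    pas = y * (-0.5 - (1 - x) * y) + (1 - y) * (2.5 - 2.5 * (1 - x) * (1 - y))
    return x * run + (1 - x) * pas


def clip(v):
    """Keep a fraction inside [0, 1]."""
    return min(1.0, max(0.0, v))


def chase(x, y, steps=2000, rate=0.01, eps=1e-4):
    """Follow both teams' small-step adjustments and return the (x, y) path.

    The offense moves x uphill on the average payoff; the defense moves y
    downhill on it. Slopes are estimated with central finite differences.
    """
    path = [(x, y)]
    for _ in range(steps):
        slope_x = (average_payoff(x + eps, y) - average_payoff(x - eps, y)) / (2 * eps)
        slope_y = (average_payoff(x, y + eps) - average_payoff(x, y - eps)) / (2 * eps)
        # Both teams update at the same time, each using the current state.
        x, y = clip(x + rate * slope_x), clip(y - rate * slope_y)
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = chase(0.7, 0.7)
    print(path[0], path[len(path) // 2], path[-1])
```

Trying different starting points (and smaller or larger rate values) is the quickest way to see whether a given start spirals, settles, or runs into an edge of the unit square.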

And that's the key to having unequal payoffs between runs/passes here: the "learning D" game model results in a situation where defenses always can do better, but offenses can always counter, and the method by which the two do better results in them oscillating over time between "run heavy/pass heavy." However, in this situation, over time, sometimes runs would be better, sometimes passes would be better - in a long term average, they would be somewhat close to equal. So the problem here may be that coaches aren't stupid, and over any small timeframe, passes and runs wouldn't be equal, but long term, things would somewhat balance out.

But! There's another equilibrium here, and it's stable: at roughly 90% passing, 10% running, with defense playing 100% run D (at 0.1, 0). The offense thinks it's doing the best it can, as does the defense for small changes - it can't see that if it played 40% run D (a huge change) it'd get the same results, and even less run D would give it even better results (pushing back to the limit cycle).

At this point, the run/pass payoffs aren't equal at all! Runs produce -1.1, and passes produce 0.25. Think about how insane this seems: the offense is passing 90% of the time, but you're playing 100% run defense - because it's 'good enough,' and playing the pass 'a little bit' makes things worse. From the offense's point of view, runs are god-awful, but mixing them in even a little boosts the output of your passing game a lot (from zero to 0.25 in this case).
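The numbers at (0.1, 0) are easy to check by hand, since with the defense in run D on every play only the run vs. run and pass vs. run rules matter. Here's a quick arithmetic sketch; the variable names are mine, and the matchup fractions are taken to be x for run vs. run and (1 - x) for pass vs. run, which is what the rules give in the long run when y = 0.

```python
def payoffs_vs_all_run_d(x):
    """Return (run payoff, pass payoff) when the defense always plays run D.

    x is the offense's run fraction. Every run is a run vs. run play and
    every pass is a pass vs. run play, so the long-run fractions are x and 1 - x.
    """
    run_payoff = -1.0 - x
    pass_payoff = 2.5 - 2.5 * (1.0 - x)
    return run_payoff, pass_payoff


if __name__ == "__main__":
    run_payoff, pass_payoff = payoffs_vs_all_run_d(0.1)
    print(round(run_payoff, 3), round(pass_payoff, 3))   # -1.1 0.25
    # With no runs at all, the passing game gets nothing against run D.
    print(round(payoffs_vs_all_run_d(0.0)[1], 3))        # 0.0
```

So the 10% of plays that are runs each lose about a point, but they are what lifts every pass from 0 to 0.25.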

Sounds like a plausible description of the current situation.

So now we have two ways that we can have unequal run/pass payoffs without stupid coaches:

Offenses may be optimizing multi-play sequences, rather than one play at a time.

Defenses may play better when exposed to the same situations.

In the second situation, there may be a better option available (the league could be 'stuck') but the coaches aren't being stupid, because the 'better option' doesn't really appear 'better' unless you drastically change; small changes just make things worse.

In this case, the game theorists could be both right and wrong; it may be that the current situation isn't ideal, but the league is trapped in a local optimum away from a global optimum.