Thursday, July 27, 2006

Hockey study: how much is a faceoff worth?

A recent issue of the Journal of Quantitative Analysis in Sports (JQAS) contains an interesting academic study on hockey that uses Markov chain analysis to reach a few conclusions about strategy.

(I tried to write an easy primer on Markov chains, but I should have it checked by people who know what they’re talking about first. For Markov chains in baseball, the world leader is Mark Pankin, who has done a whole lot of batting order studies.)

The author, Andrew C. Thomas, divided a hockey game into 19 states. Nine states are the 3x3 combinations of who has the puck (team A, team B, or neither, at a faceoff) and where it is (team A’s zone, team B’s zone, or the neutral zone). Two more states are faceoffs at each of the two blue lines. Two more are a goal by each team. Four states are possession in the offensive or defensive zone after a turnover. And the last two are possession in the defensive zone after a team deliberately retreats to avoid a forechecker.
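As a quick check that those pieces add up (9 + 2 + 2 + 4 + 2 = 19), here is the state space written out as a plain list. The labels, and the split of the turnover and retreat states by team, are assumptions for illustration, not the paper's notation.

```python
from itertools import product

def build_states():
    """Return 19 illustrative state labels (hypothetical names, not the paper's)."""
    holders = ["A has puck", "B has puck", "faceoff"]
    zones = ["A's zone", "B's zone", "neutral zone"]
    # 9 states: who has the puck (or a faceoff) crossed with where it is
    states = [f"{h}, {z}" for h, z in product(holders, zones)]
    # 2 states: faceoffs at each blue line
    states += ["faceoff, A's blue line", "faceoff, B's blue line"]
    # 2 states: a goal by each team
    states += ["goal by A", "goal by B"]
    # 4 states: possession after a turnover, offensive or defensive zone, per team (assumed split)
    states += [f"{t} has puck after turnover, {where} zone"
               for t in ("A", "B") for where in ("offensive", "defensive")]
    # 2 states: defensive-zone possession after retreating from a forechecker, per team
    states += [f"{t} has puck after retreat, defensive zone" for t in ("A", "B")]
    return states

states = build_states()
print(len(states))  # 19
```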

Based on observations of 18 games played by the Harvard men’s hockey team, he then calculated the probabilities of moving from one state to another. With the states and probabilities in hand, he could use Markov chain techniques to analyze certain aspects of the game.
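The most straightforward way to get transition probabilities from game observations is to count each observed move and divide by the number of times the starting state occurred. This is a sketch of that count-and-divide estimate (the maximum-likelihood estimate), not necessarily how the paper computes its numbers. The state sequence is made up, not the Harvard data.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next state | current state) as observed proportions."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs

# Hypothetical sequence of observed states in one stretch of play.
observed = ["faceoff/neutral", "A/neutral", "A/B zone", "B/B zone", "B/neutral",
            "A/neutral", "A/B zone", "goal A"]
probs = transition_probabilities(observed)
for state, row in probs.items():
    print(state, row)
```

Each row of the result is one starting state, and its probabilities sum to 1, which is exactly what a Markov chain needs.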

I don’t understand everything Thomas did, but it seems more complicated than I thought it needed to be. For instance, the study does a lot of work to include a continuous-time factor in the Markov chain. In reality, it doesn’t matter how long it takes a team to move from the defensive zone into the neutral zone; there’s no 24-second rule in hockey, so you can take as long as you want. All that extra math (compared with a discrete-time Markov chain, like the ones used for baseball lineups) doesn’t seem to add much to the conclusions the study draws.
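One way to see the point about time, assuming the continuous-time model is a standard continuous-time Markov chain with constant transition rates: the sequence of states it visits is itself an ordinary discrete-time chain (the "jump chain"), where the probability of moving from state i to state j is the rate from i to j divided by the total rate out of i. Doubling every rate, as if the game were played twice as fast, changes the timing but not where the puck goes next. The states and rates below are made up.

```python
def jump_chain(rates):
    """Embedded jump chain of a continuous-time Markov chain.

    rates[i][j] is the transition rate from state i to state j (per second).
    The probability that the next move from i goes to j is rates[i][j]
    divided by the total rate out of i, so holding times drop out.
    """
    probs = {}
    for i, row in rates.items():
        out = sum(r for j, r in row.items() if j != i)
        probs[i] = {j: r / out for j, r in row.items() if j != i}
    return probs

# Hypothetical rates out of the defensive zone; "fast" is "slow" with every rate doubled.
slow = {"defensive": {"neutral": 0.15, "turnover": 0.04, "goal against": 0.01}}
fast = {"defensive": {"neutral": 0.30, "turnover": 0.08, "goal against": 0.02}}

print(jump_chain(slow))
print(jump_chain(fast))
```

The rates themselves only come into play for questions asked in seconds rather than in moves.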

Also, I can’t figure out from the study how Thomas gets his probabilities. I would have thought he would just watch the games, count how often things happened, and use those observed frequencies as probabilities. But he does something more complicated: “Bayesian inference with a multinomial/Dirichlet model.”

(I’m not an expert on Bayesian statistics, but I know you use that kind of model when you have prior information on what to expect. For instance, if a player goes 3-for-4, the naïve statistician would estimate that he’s a .750 hitter. The Bayesian statistician would note that he can’t be a .750 hitter, because batting averages follow a roughly normal, bell-shaped curve that ends well below .400. The Bayesian approach is to ask: what can you expect from the 3-for-4 hitter *given* that he’s drawn from that normal (prior) distribution? And the answer might be that he’s a .275 hitter on average.)
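The 3-for-4 example can be worked through in a few lines. This sketch swaps the normal prior for a beta prior, the standard conjugate choice for batting averages, because it gives a closed form: the estimate is (hits + a) / (at-bats + a + b). The prior's center (.270) and weight (worth 500 at-bats) are assumptions picked for illustration.

```python
def beta_binomial_posterior_mean(hits, at_bats, prior_mean, prior_weight):
    """Posterior mean batting average under a Beta prior.

    prior_mean: center of the prior, e.g. a typical league average (assumed)
    prior_weight: how many at-bats the prior is "worth" (assumed)
    """
    a = prior_mean * prior_weight          # prior pseudo-hits
    b = (1 - prior_mean) * prior_weight    # prior pseudo-outs
    return (hits + a) / (at_bats + a + b)

naive = 3 / 4
bayes = beta_binomial_posterior_mean(3, 4, prior_mean=0.270, prior_weight=500)
print(f"naive {naive:.3f}, Bayesian {bayes:.3f}")  # naive 0.750, Bayesian 0.274
```

Four at-bats barely move the estimate off the prior; the more at-bats a player has, the more his own record outweighs the prior.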

The implication is that there is prior knowledge about what that number should be, so even if the puck goes from the defensive zone to the neutral zone 75% of the time in real life, you can’t take that figure at face value. But I can’t figure out what that knowledge is, or why, if the observed proportion of pucks brought out of the zone is 75%, the study wouldn’t just go ahead and use 75%.
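For what it's worth, here is how a multinomial/Dirichlet model typically behaves, as a sketch: under a Dirichlet prior, the posterior mean of each transition probability is (observed count + prior pseudo-count) / (total observations + total pseudo-counts). The counts and the flat prior (one pseudo-count per outcome) are assumptions for illustration, not the paper's data or prior.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of multinomial probabilities under a Dirichlet(alpha) prior."""
    total = sum(counts.values()) + sum(alpha.values())
    return {k: (counts[k] + alpha[k]) / total for k in counts}

# Hypothetical outcomes of 100 defensive-zone possessions.
counts = {"to neutral zone": 75, "turnover": 20, "goal against": 5}
flat_prior = {k: 1.0 for k in counts}  # assumed: one pseudo-count per outcome

n = sum(counts.values())
raw = {k: c / n for k, c in counts.items()}
post = dirichlet_posterior_mean(counts, flat_prior)
for k in counts:
    print(f"{k:16s} raw {raw[k]:.3f}  posterior {post[k]:.3f}")
```

With a flat prior and 100 observations, the 75% barely moves (to about 73.8%). The prior matters much more for states that were observed only a handful of times.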

Or maybe I just don’t understand the Bayesian technique at all.

Anyway, given the model, Thomas comes up with these findings:

After 40 seconds, the current situation is no longer dependent on the starting situation. That is, if you start out in your own zone, you’re less likely to score in the first ten seconds than if you’re in the opponent’s zone. But you’re *equally* likely to score between the 40th and 50th second no matter where you start.
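The "starting point stops mattering" result is the usual convergence of a Markov chain. A toy discrete-time illustration: raise a transition matrix to higher and higher powers, and the rows (one per starting state) converge to the same distribution. The three zones and their probabilities are made up, and the steps are moves rather than seconds; the paper's chain has 19 states and runs in continuous time, so this only shows the idea.

```python
def mat_mult(p, q):
    """Multiply two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(p, steps):
    """Return P^steps: row i is the distribution after `steps` moves from state i."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mult(result, p)
    return result

# Hypothetical one-step probabilities; states are own zone, neutral zone, opponent's zone.
P = [
    [0.5, 0.4, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.4, 0.5],
]

for steps in (1, 5, 20):
    rows = n_step(P, steps)
    gap = max(abs(a - b) for a, b in zip(rows[0], rows[2]))
    print(f"after {steps:2d} moves: from own zone {[round(x, 3) for x in rows[0]]}, "
          f"from opponent's zone {[round(x, 3) for x in rows[2]]}, gap {gap:.1e}")
```

For these particular numbers, both rows settle on (0.3, 0.4, 0.3), and the gap between them shrinks by a factor of 0.4 with every move.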

Carrying the puck into the opponent’s zone is, in terms of goal differential, almost exactly as valuable as the dump-and-chase. However, the dump-and-chase leads to a slightly lower probability of either team scoring, and so is perhaps slightly worse for a team trailing by one goal in the closing minutes of a game.

If you start with the puck in your own zone, you should expect to be outscored by .0043 goals over the next 40 seconds. That is, every 233 own-zone starts cost you one goal.

If the other team has the puck in your zone, you should expect to be outscored by .0258 goals. That’s one goal for every 39 possessions.

If you give the puck away in your own zone, it only costs you .0244 goals (1 in 41). That’s actually less than if the opposition brings the puck in themselves.

Faceoffs are worth one goal for every 47 won in the offensive zone, one for every 143 won in the neutral zone, and one for every 67 won at the blue line.
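The "one goal every N" figures are just reciprocals of the expected goal differentials. A quick check of the arithmetic, using only the numbers quoted above (the faceoff figures are converted the other way, back into goals per faceoff won):

```python
def events_per_goal(goal_value):
    """Convert an expected goal differential per event into 'one goal every N events'."""
    return round(1 / goal_value)

def goals_per_event(n_events):
    """Convert 'one goal every N events' back into goals per event."""
    return 1 / n_events

# Expected goals lost per occurrence, as quoted above.
costs = {
    "start with the puck in your own zone": 0.0043,
    "opponent has the puck in your zone": 0.0258,
    "give the puck away in your own zone": 0.0244,
}
for situation, goals in costs.items():
    print(f"{situation}: one goal every {events_per_goal(goals)}")

# Faceoff wins, quoted as "one goal every N".
faceoffs = {"offensive zone": 47, "neutral zone": 143, "blue line": 67}
for where, n in faceoffs.items():
    print(f"faceoff won, {where}: {goals_per_event(n):.4f} goals")
```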

Yeah, I guess it is similar to regression to the mean. The formula for figuring out how much to "regress" is more complicated, though. In fact, there's no single formula: it all depends on the distribution of the prior, I think.

And, in theory, you could be going *away* from the mean. Suppose all pitchers are .100 hitters, and all other players are .300 hitters. A player goes 11 for 100. He's almost certainly a pitcher, so you'd probably adjust his .110 to .100, thus moving away from the league mean.

You don't necessarily need to regress to the mean of the population. The right mean to regress to is the *best* mean. Let me explain. In your example, if player A goes 11 for 100 and is a pitcher, then you'd regress to .100 if that is where your expectation of his performance lies.
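The pitcher example can be worked through with Bayes' rule and a two-group prior: weight each group by its share of players times the chance of going 11 for 100 at that group's average, then average the groups' true averages using those weights. The 10% pitcher share is an assumption; the .100 and .300 averages come from the example above.

```python
def posterior_average(hits, at_bats, groups):
    """Bayes' rule with a two-point (or n-point) prior.

    groups maps a name to (share_of_players, true_average); every player in a
    group is assumed to hit exactly that group's average. The binomial
    coefficient is the same for every group, so it cancels and is left out.
    """
    weights = {
        name: share * avg ** hits * (1 - avg) ** (at_bats - hits)
        for name, (share, avg) in groups.items()
    }
    total = sum(weights.values())
    probs = {name: w / total for name, w in weights.items()}
    estimate = sum(probs[name] * groups[name][1] for name in groups)
    return probs, estimate

# Assumed: 10% of hitters are pitchers; pitchers hit .100, everyone else .300.
groups = {"pitcher": (0.10, 0.100), "position player": (0.90, 0.300)}
league_mean = sum(share * avg for share, avg in groups.values())

probs, estimate = posterior_average(11, 100, groups)
print(f"league mean {league_mean:.3f}, P(pitcher) {probs['pitcher']:.4f}, "
      f"estimate {estimate:.3f}")
```

The estimate lands essentially at .100, below both the observed .110 and the .280 league mean, because the data make "pitcher" overwhelmingly likely.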

You are right about the article, though. I had another look through it, and it does get extremely complicated for the layman to follow!