Sunday, December 23, 2007

2007-08 5v5 Goaltender Performance

I've put together a simple first-order system to analyze goaltender performance. First, I calculated the probability of scoring from each 2x2-foot grid square using data from 2001 through last week's games:

I then took every shot faced by each goaltender and calculated the number of goals we'd expect an average goaltender to allow if he faced the exact same set of shots. [Note that shot distances are relative to the end boards, not to the goal line.] This is obviously a coarse estimate: we have no information on whether shots were defended or screened, and because shot types are recorded differently in different arenas, I haven't differentiated between them. [Yes, in hopes of improving the system, I am happy to accept critical comments!]
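For anyone who wants to try this at home, here is a minimal Python sketch of the calculation described above. It is not the code behind the table: the function names, the `(x_ft, y_ft, is_goal)` shot format and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict

CELL_FT = 2.0  # side length of each grid square, in feet


def cell_of(x_ft, y_ft):
    """Map a shot location in feet to the index of its 2x2-foot cell."""
    return (int(x_ft // CELL_FT), int(y_ft // CELL_FT))


def scoring_probability(league_shots):
    """league_shots: iterable of (x_ft, y_ft, is_goal). Returns {cell: P(goal)}."""
    shots = defaultdict(int)
    goals = defaultdict(int)
    for x, y, is_goal in league_shots:
        c = cell_of(x, y)
        shots[c] += 1
        goals[c] += int(is_goal)
    return {c: goals[c] / shots[c] for c in shots}


def expected_goals(goalie_shots, prob, default=0.0):
    """Sum the league scoring probability over every shot one goalie faced."""
    return sum(prob.get(cell_of(x, y), default) for x, y, _ in goalie_shots)


# Made-up league sample: a close-in cell (1 goal on 4 shots)
# and a far cell (1 goal on 10 shots).
league = [(10.0, 1.0, True), (10.5, 1.5, False), (11.0, 0.5, False), (11.9, 1.9, False)]
league += [(40.0, 20.0, False)] * 9 + [(41.0, 21.0, True)]
prob = scoring_probability(league)

# Made-up goalie sample: two close-in shots and one far shot.
goalie = [(10.2, 0.3, False), (11.5, 1.0, True), (40.5, 20.5, False)]
xg = expected_goals(goalie, prob)
actual = sum(int(g) for _, _, g in goalie)
print(f"actual goals: {actual}, expected goals: {xg:.2f}")
```

A cell the league sample never saw gets `default` here, which is a simplification; with real data you would probably want to pool sparse cells with their neighbours.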

At any rate, the table below shows the number of goals allowed by each NHL goaltender this season (minimum: 300 mins) and the expected number of goals allowed by an average goaltender.

As an example, Henrik Lundqvist has allowed 41 goals on 567 shots at 5-on-5, for a 927 save percentage. However, an average goalie who faced 567 shots from the same locations would be expected to allow 58.3 goals, for an 897 save percentage. On a per-60-minute basis, Lundqvist's goals-against average is 1.90, which is 0.80 goals lower than an average goalie's. I'll publish some charts for individual goaltenders so we can compare the best goaltenders to the worst in this metric.
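The arithmetic behind the Lundqvist example can be checked in a few lines of Python. The four input numbers come from the paragraph above; the variable names are mine, and the average goalie's per-60 figure is derived on the assumption that he plays the same minutes, so goals per 60 scale directly with goals allowed.

```python
shots = 567
goals_allowed = 41
expected_goals = 58.3  # what an average goalie would allow on the same shots
gaa = 1.90             # Lundqvist's 5-on-5 goals-against average

save_pct = 1 - goals_allowed / shots
expected_save_pct = 1 - expected_goals / shots

# Same shots and same minutes (assumption), so goals per 60 scale
# in proportion to goals allowed.
expected_gaa = gaa * expected_goals / goals_allowed
gaa_saved = expected_gaa - gaa

print(f"save pct: {save_pct:.3f}  expected: {expected_save_pct:.3f}")
print(f"expected GAA: {expected_gaa:.2f}  difference: {gaa_saved:.2f}")
```

Rounded to three places, the first figure comes out at .928; the 927 quoted above looks truncated rather than rounded.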

Actually, even absent the goalie's personal stats, the analysis is interesting. What I mean is that it's fascinating to see which teams limit opponents' chances to the kind of shots that an average goalie would stop, versus teams that allow shots more apt to beat an average goalie.

The goalie's stats laid over top of that are just icing (so to speak).

I played around with this a few years ago. It's cool stuff. I would suggest using road shots only to come up with a factor, though. Alan Ryder looked at the problem and concluded that there's some serious bias in the data; the Rangers' scorers are insane, for one. Good stuff though, this is absolutely the future.

I'm going to make some assumptions about what you did, but I personally would recommend using rink as a random effect in the model for expected goals. Then use the broadest inference space to estimate the expected goals (you could get more particular and compare each goalie against the narrow inference space for each game, but that sounds a bit too awkward for my liking).

Something that I think would be cool to do with this: as I see it now, this is a good measure of how a goalie "fits" into his specific situation and how valuable he is to his team compared to how other goalies would do in his situation (maybe certain goalies are better at stopping shots from certain places). You could also use it to compare goalies more generally, though. Say you find how many shots are taken from each specific grid square (in your sample space) and assume that every goalie got exactly that set of shots per game (so that each goalie is facing an equal number of shots from equal distances and so on), then see which goalie would have the best save percentage in that case. That would seem more fair for actually ranking the goalies (not that that was necessarily your goal).
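To make that idea concrete, here is a rough Python sketch of one way to do it: weight each goalie's save rate in a grid square by the league-wide share of shots from that square. The function name, the `(cell, is_goal)` format and the numbers are invented for illustration, and falling back to the league rate where a goalie has faced no shots is my own assumption.

```python
from collections import Counter


def standardized_save_pct(goalie_shots, league_shots):
    """Save percentage a goalie would post against the league-wide shot mix.

    Both arguments are iterables of (cell, is_goal). In cells where the
    goalie has faced no shots, the league save rate is used (assumption).
    """
    goalie_shots = list(goalie_shots)
    league_shots = list(league_shots)
    league_n = Counter(c for c, _ in league_shots)
    league_g = Counter(c for c, g in league_shots if g)
    goalie_n = Counter(c for c, _ in goalie_shots)
    goalie_g = Counter(c for c, g in goalie_shots if g)

    total = sum(league_n.values())
    sv = 0.0
    for cell, n in league_n.items():
        if goalie_n[cell] > 0:
            rate = 1 - goalie_g[cell] / goalie_n[cell]
        else:
            rate = 1 - league_g[cell] / n
        sv += (n / total) * rate  # weight by league share of shots
    return sv


# Made-up league mix: 25% of shots from the slot, 75% from the point.
league = [("slot", True)] * 2 + [("slot", False)] * 8
league += [("point", True)] * 1 + [("point", False)] * 29

# Made-up goalie who faces mostly slot shots.
goalie = [("slot", True)] + [("slot", False)] * 3 + [("point", False)] * 2

raw = 1 - sum(1 for _, g in goalie if g) / len(goalie)
std = standardized_save_pct(goalie, league)
print(f"raw: {raw:.3f}  standardized: {std:.3f}")
```

With small samples the per-square rates will be very noisy, so in practice you would likely need coarser zones or some shrinkage toward the league rate before ranking anyone this way.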