Advancing Our Knowledge of Hockey Through Statistical Analysis

Let’s talk Corsi…

I know I am in a bit of a minority, but in my opinion one of the greatest failings of hockey analytics thus far is overstating the importance of Corsi at both the team and (especially) the individual level.

We care about Corsi% because it predicts future goals for/against better than just using goals for/against.

The problem is, this is only partly true and is missing an important qualifier at the end of the sentence. It should read:

We care about Corsi% because it predicts future goals for/against better than just using goals for/against when sample sizes are not sufficiently large.

We can debate what ‘sufficiently large’ sample sizes are, but based on some past research I have done, I’d suggest that at the team level it is something less than a full season’s worth of data, and at the player level it is probably between 500 and 750 minutes of ice time, depending on shot rates.

Winning in puck possession and scoring chances is important and will lead to wins but does not encompass the full game. The largest factors outside of possession and chances are luck (i.e., bounces), special teams, and a combination of goaltending and shot quality (probably in that order).

The problem with that paragraph is that it gives no context of sample size, and sample size means everything when writing a sentence like that. If the sample was 3 games played by a particular team, luck is quite probably the most important factor in determining how many of those 3 games the team wins. If the sample is 300 games, luck is mostly irrelevant. Without considering sample size, there is no way of knowing what the ‘luck factor’ truly is. Furthermore, luck mostly impacts goaltending (save percentage) and shot quality (shooting percentage), so while goaltending and shooting talent may have minimal impact on winning over small samples, we can’t know what impact they have over the long haul without looking at larger samples. Far too many conclusions about shot quality and goaltending have been drawn from samples that are too small, and far too few people have attempted to actually quantify the importance of shooting talent at the team level. As a result, far too often I hear statements like “Team X’s shooting percentage is unsustainable” when in reality it is sustainable.
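To make the sample-size point concrete, here is a minimal sketch that treats every shot as an independent coin flip that goes in with some fixed probability (a simplifying binomial assumption, not a model of real hockey) and computes how far an observed shooting percentage can wander from true talent purely by luck. The 8% talent level and the shot counts are hypothetical.

```python
import math

def sh_pct_luck_sd(true_sh_pct: float, shots: int) -> float:
    """Standard deviation, in percentage points, of an observed shooting
    percentage when each shot independently scores with probability
    true_sh_pct / 100 (binomial assumption)."""
    p = true_sh_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / shots)

# HYPOTHETICAL shot counts: roughly a handful of games up to several seasons.
for shots in (100, 1000, 5000):
    print(shots, round(sh_pct_luck_sd(8.0, shots), 2))
# 100 2.71
# 1000 0.86
# 5000 0.38
```

The noise shrinks with the square root of the number of shots, so luck can swamp talent differences of a point or so over a few games while becoming a much smaller factor over several seasons.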

Below is a table of the top 5 and bottom 5 teams in 5v5close shooting percentage over the five seasons from 2007-08 to 2011-12, along with their shooting percentages last season (2012-13) and this season (2013-14) through Saturday’s games.

5v5close shooting percentage (%)

Team             2007-12   2012-13   2013-14
Pittsburgh          8.45     10.12      8.28
Philadelphia        8.30      8.96      8.14
Tampa               8.29      7.68      7.40
Edmonton            8.17      7.79      9.01
Toronto             8.16     10.52      8.05
Top 5 Avg           8.27      9.01      8.18
Bottom 5 Avg        7.14      6.51      6.68
NY Islanders        7.23      8.14      7.38
San Jose            7.19      6.59      7.23
New Jersey          7.14      6.35      6.65
NY Rangers          7.11      5.99      6.11
Florida             7.05      5.49      6.05

What you will see is that the top 5 teams had an average 5-year shooting percentage 1.13 percentage points higher than the bottom 5 teams. This is not insignificant either: given an equal number of shots, the top 5 teams would score almost 16% more goals than the bottom 5 teams purely because of their higher shooting percentage. If one looks at 5-year CF/60, the top 5 teams are just over 17% higher than the bottom 5 teams. Thus, at the 5-year level, the spread in shooting percentage is nearly as large as the spread in Corsi rates.
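The arithmetic behind the "almost 16%" figure can be reproduced from the 2007-12 column of the table. This sketch simply averages each group and compares the averages; it assumes, as the paragraph does, that both groups take the same number of shots, so goals scale directly with shooting percentage.

```python
# 2007-12 5v5close Sh% values from the table above.
top5 = {"Pittsburgh": 8.45, "Philadelphia": 8.30, "Tampa": 8.29,
        "Edmonton": 8.17, "Toronto": 8.16}
bottom5 = {"NY Islanders": 7.23, "San Jose": 7.19, "New Jersey": 7.14,
           "NY Rangers": 7.11, "Florida": 7.05}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

top_avg = mean(top5.values())
bottom_avg = mean(bottom5.values())
gap_points = top_avg - bottom_avg          # difference in percentage points
goal_boost = top_avg / bottom_avg - 1      # extra goals per shot, equal shot totals assumed
print(f"{top_avg:.2f} {bottom_avg:.2f} {gap_points:.2f} {goal_boost:.1%}")
# 8.27 7.14 1.13 15.8%
```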

Now, are shooting percentages sustainable? Well, in the 2 seasons since (one lockout-shortened and one not yet complete), the 5-year top 5 teams have, on average, improved, while the bottom 5 teams have, on average, gotten worse. Aside from the 2012-13 NY Islanders, all the other bottom 5 teams remained well below average and nowhere close to any of the top 5 teams. There is no observable regression occurring here.
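One simple way to check for group-level regression is to track the gap between the two groups' averages in each period. If the 5-year gap were mostly luck, it should shrink toward zero in the later seasons. This sketch uses only the average rows from the table; with just one shortened and one partial follow-up season it is an illustration, not a formal test.

```python
# Group averages copied from the table (5v5close Sh%).
periods = ["2007-12", "2012-13", "2013-14"]
top_avg = [8.27, 9.01, 8.18]
bottom_avg = [7.14, 6.51, 6.68]

baseline_gap = top_avg[0] - bottom_avg[0]
for period, top, bottom in zip(periods, top_avg, bottom_avg):
    gap = top - bottom
    # A ratio below 1.0 would indicate the gap shrinking (regression);
    # a ratio at or above 1.0 means it has held up or widened.
    print(f"{period}: gap {gap:.2f} points, {gap / baseline_gap:.2f}x the 5-year gap")
```

In both follow-up periods the ratio comes out above 1.0, which is the pattern described above.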

Based on these observations, one can conclude that, when it comes to scoring goals at the team level, shooting percentage is pretty close to being as important as shot generation. I won’t show it here, but if one did a similar study at the player ‘on-ice’ level, one would find that the difference between the best and worst shooting-percentage players is significantly more important than the difference in shot generation.

I don’t quite know why hockey analytics got this all wrong and has largely not yet come around to the importance of shot quality (it is slowly moving, but not there yet), as there have been some good posts showing the importance of shot quality, but they largely get ignored by the masses. Part of the problem is certainly that some of the early studies of shot quality looked at too small a sample. Another reason is that 2009-10 seems to have been a really strange year for shooting percentages at the team level. Toronto, Edmonton and Philadelphia (top 5 teams from above) ranked 25th, 23rd and 20th in shooting percentage, while San Jose, NY Islanders and New Jersey (bottom 5 teams from above) ranked 6th, 10th and 13th. These were anomalies for all those teams, so any year-over-year studies that used 2009-10 probably produced atypical results and less valid conclusions. Finally, I think part of the problem is that hockey analytics has followed the lead of a few very vocal people and dismissed some other important but less vocal voices. Regardless of how we got here, for hockey analytics to move forward we need to move past the notion that shot-based metrics are more important than goal-based metrics.

Shot-based metrics are OK to use only when we don’t have a very large sample size. The thing is, that isn’t the case for most players and teams. The majority of NHL players have played multiple seasons in the NHL, and teams have a history of data we can look at. We can look at multiple years of data to see how sustainable a particular team’s or player’s percentages are. It isn’t that difficult to do and will tell us far more about the player than looking at his CF% this season.

When I am asked to look at a player I am not particularly knowledgeable about, the first thing I typically do is open up my WOWY pages for that player at stats.hockeyanalysis.com, especially the graphs that quickly show how the player performs relative to his teammates. I’ll maybe look at a multi-year WOWY first, and then at several single-year WOWYs to see if there are any trends I can spot. I’ll primarily look at GF% WOWYs but will consider CF% WOWYs too, and maybe even GF20/GA20/CF20/CA20 WOWYs. I look for trends over time, not how the player did during any particular year. This is because the percentages can matter a lot for some players, and it is important to know which players can post good percentages consistently from year to year. I then may look at that player’s individual numbers, such as GF/60, Pts/60 and Assists/60, as well as IPP, IGP and IAP, to determine how involved he was in the offense while he was on the ice (and I’ll do this looking at several individual seasons and at multiple seasons combined). Then I’ll take a look at his linemates, quality of competition, and usage (zone starts, PP/PK ice time, etc.). Only then will I start to feel comfortable drawing any kind of conclusions about the player.
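For readers unfamiliar with the terms, here is a rough sketch of the basic calculations behind a GF% WOWY comparison and the involvement stats. It reflects my reading of the usual definitions (GF% as GF/(GF+GA); IPP, IGP and IAP as the share of on-ice goals for that the player had a point, goal or assist on; GF20 as goals for per 20 minutes), not the actual code behind stats.hockeyanalysis.com, and all the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OnIce:
    gf: int     # goals for while on the ice
    ga: int     # goals against while on the ice
    toi: float  # ice time in minutes

    def gf_pct(self) -> float:
        return 100.0 * self.gf / (self.gf + self.ga)

    def gf20(self) -> float:
        return 20.0 * self.gf / self.toi

def wowy(together: OnIce, teammate_apart: OnIce) -> float:
    """Change in a teammate's GF% when playing with the player vs. without him."""
    return together.gf_pct() - teammate_apart.gf_pct()

def involvement(goals: int, assists: int, on_ice_gf: int) -> dict:
    """IPP, IGP and IAP as percentages of on-ice goals for."""
    return {
        "IPP": 100.0 * (goals + assists) / on_ice_gf,
        "IGP": 100.0 * goals / on_ice_gf,
        "IAP": 100.0 * assists / on_ice_gf,
    }

# HYPOTHETICAL multi-season totals for one player and one teammate.
with_player = OnIce(gf=60, ga=40, toi=900)
teammate_without = OnIce(gf=45, ga=45, toi=700)
print(round(wowy(with_player, teammate_without), 1))  # 10.0 (60.0 - 50.0)
print(round(with_player.gf20(), 2))                   # 1.33
print(involvement(goals=20, assists=30, on_ice_gf=70))
```

A real WOWY page repeats the comparison for every regular teammate; a consistently positive difference across many teammates and seasons is the kind of trend worth trusting.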

I recently wrote an article suggesting hockey analytics is hard, and the above explains why. There is no single stat we can look at to find an answer. A goal-based analysis has flaws. A corsi-based analysis has flaws. Looking at just a single season has flaws. Looking at multiple seasons has flaws. There are score effects, quality of teammates, quality of opponents and zone starts that we need to consider, not to mention sample sizes. Coaching/style of play is another area that hockey analytics has barely touched, and yet it probably has a significant impact on statistics and results (maybe an especially significant impact on corsi statistics). Hockey analytics is hard and corsi doesn’t have all the answers, so it is important not to reduce hockey analytics to looking up some corsi stats and drawing conclusions. I fear that hockey analytics has over-hyped the importance of corsi at the expense of other important factors, and that is unfortunate.


… any year over year studies that used 2009-10 probably resulted in atypical results and less valid conclusions.
What makes 2009-10 less ‘typical’ than other years? Unless there were particular circumstances (rule changes and the like) that affected just that season, I don’t see why you’d consider it less valid than any other.

In terms of those 10 teams (5 top teams, 5 bottom teams), 2009-10 seems to be an anomaly. Every other year there was significant separation between the top 5 and bottom 5, and rarely would a bottom 5 team post a shooting percentage above any of the top 5 teams. In 2009-10 several bottom 5 teams surpassed several top 5 teams, and on average the two groups’ shooting percentages were about the same. I don’t know why 2009-10 was so strange; it could be just randomness gone wild, but whatever the reason, it seems to be an anomaly.

“It means that the top 5 teams will score almost 16% more goals than the bottom 5 teams just based on differences in their shooting percentage.” I’m curious how you arrived at that. Using the 07-12 sample, isn’t a top 5 vs bot 5 13.6% different? Or are we talking about different things?

Also, I think using team wide samples dating back to 07-08 is specious. The rosters are almost totally different, the coaches are almost totally different, I don’t see how a comparison including styles/rosters that are totally different makes much sense. Unless the argument is that simply by virtue of donning an Oilers jersey you will improve your shooting percentage, which I don’t think you’re arguing.

Because of that, I’m not sure what you mean when you say there is no observable regression. I would argue your numbers show almost the exact opposite. There is a tremendous amount of regression. Just for starters, Toronto’s and Pittsburgh’s numbers have regressed 23% and 19% from last season to this season, and San Jose’s by 10%. I don’t think there’s as much misunderstanding of shot quality as you’re arguing. No one’s saying all the teams will regress to identical percentages, but that doesn’t mean outliers won’t regress toward a mean.

When given the option it is still a good idea to fade high team wide shooting percentages all day.

I’m curious how you arrived at that. Using the 07-12 sample, isn’t a top 5 vs bot 5 13.6% different? Or are we talking about different things?

8.27/7.14 = 1.158 implying top 5 are 15.8% better than bottom 5.
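Both figures describe the same 1.13-point gap; they differ only in which group is used as the base. The sketch below shows both directions. Presumably the 13.6% figure comes from dividing by the top group (1 − 7.14/8.27, which is about 13.7% when rounded), while the "more goals for the top 5" framing divides by the bottom group.

```python
top, bottom = 8.27, 7.14  # 2007-12 group averages from the table

top_vs_bottom = top / bottom - 1   # measured relative to the bottom group
bottom_vs_top = 1 - bottom / top   # measured relative to the top group
print(f"top 5 are {top_vs_bottom:.1%} above bottom 5")
print(f"bottom 5 are {bottom_vs_top:.1%} below top 5")
# top 5 are 15.8% above bottom 5
# bottom 5 are 13.7% below top 5
```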

Also, I think using team wide samples dating back to 07-08 is specious. The rosters are almost totally different, the coaches are almost totally different, I don’t see how a comparison including styles/rosters that are totally different makes much sense.

Are they really? Sure there are major changes, but the majority of teams, especially the good ones, have the same core. San Jose is Thornton-Marleau-Pavelski, etc. Anaheim is Getzlaf-Selanne-Perry-Koivu, etc. Pittsburgh is Crosby-Malkin-Dupuis-Kunitz-Letang etc. For a lot of teams the core has been around for 7 years. What changes more frequently are coaches. What is interesting is over longer periods of time corsi loses persistence while goal rates increase persistence. This could be that coaches and playing style impact corsi more than the percentages.

Because of that, I’m not sure what you mean when you say there is no observable regression. I would argue your numbers show almost the exact opposite. There is a tremendous amount of regression. Just for starters, Toronto and Pittsburgh’s numbers have regressed 23% and 19% from last season to this season. San Jose’s by 10%.

What you are observing is randomness, not regression, or at least a significant part of it is. As a group the top 5 did not regress, and as a group the bottom 5 did not regress to the mean either. Randomness will make some teams look like they are regressing if you look at them in isolation, but for every one that moved toward the mean, others moved in the opposite direction, and that is randomness.
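Why look at groups rather than individual teams? Under the same simplifying binomial assumption as before (true talent fixed, every shot an independent coin flip), a single team's shooting percentage can swing noticeably from one season to the next on luck alone, while the average of five such teams swings less by a factor of about √5. The 8% talent level and 1,000 shots per team-season in this sketch are hypothetical.

```python
import math

def yoy_change_sd(true_sh_pct: float, shots_per_season: int, n_teams: int = 1) -> float:
    """SD (percentage points) of the season-to-season change in observed Sh%,
    averaged over n_teams, when talent is fixed and only binomial luck varies."""
    p = true_sh_pct / 100.0
    single_season_sd = math.sqrt(p * (1 - p) / shots_per_season)
    change_sd = math.sqrt(2) * single_season_sd   # difference of two independent seasons
    return 100 * change_sd / math.sqrt(n_teams)   # averaging n independent teams

# HYPOTHETICAL: 1000 shots per team-season at a true 8% rate.
print(round(yoy_change_sd(8.0, 1000), 2))     # one team
print(round(yoy_change_sd(8.0, 1000, 5), 2))  # average of a 5-team group
```

Big individual moves in both directions are therefore expected even with no change in talent; the group averages are the better place to look for genuine regression.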

When given the option it is still a good idea to fade high team wide shooting percentages all day.

Only if there is zero evidence that they can sustain it. Pittsburgh’s 5v5 shooting percentage is 8.4% this season, which is well above average (and the league average has fallen from previous years), but I wouldn’t regress theirs at all. If anything, I’d suggest their talent level is probably greater than 8.4%. Similarly, New Jersey’s shooting percentage is a below-average 7.1% this year, but should I expect them to regress toward the mean in the future? No, because they have a history of being significantly worse.

Now, there are extreme examples, like the Leafs last year and St. Louis this year, that need regressing, but my point is that we can’t just regress everyone to the mean. We need to look at their past history and see whether their current shooting percentage is sustainable. An 8.5% shooting percentage may be perfectly sustainable for one team, while for another team a 7.8% shooting percentage is significantly inflated. We normally can’t just look at a single year’s shooting percentage and draw any conclusions from it. Things just aren’t that simple, which is my whole point.
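One common way to express "regress toward the team's own history rather than the league mean" is a simple shrinkage estimate: blend this season's goals and shots with a prior rate, treating the prior as if it were a fixed number of extra shots. This is a generic sketch of that technique, not the method used on this site; the prior weight of 2,000 shots, the 7.6% league average and the 8.3% team history are all hypothetical.

```python
def regressed_sh_pct(season_goals: int, season_shots: int,
                     prior_sh_pct: float, prior_weight_shots: float) -> float:
    """Blend this season's Sh% with a prior, treating the prior as
    prior_weight_shots additional shots at prior_sh_pct."""
    prior_goals = prior_weight_shots * prior_sh_pct / 100.0
    return 100.0 * (season_goals + prior_goals) / (season_shots + prior_weight_shots)

# HYPOTHETICAL team: 80 goals on 1000 shots (8.0%) this season.
league_avg, team_history = 7.6, 8.3
print(round(regressed_sh_pct(80, 1000, league_avg, 2000), 2))    # 7.73: pulled down
print(round(regressed_sh_pct(80, 1000, team_history, 2000), 2))  # 8.2: pulled up
```

The same 8.0% season is regressed downward against the league average but upward against a team history of 8.3%, which is the distinction being drawn above. How much weight the prior deserves is itself an empirical question.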

I don’t really want to address all your grievances, but the team roster one is just so galling I feel I must. I really think you are grossly overstating how similar the rosters are.
Using Wikipedia and going by the year each player was acquired, the following is a list of the teams you named and the number of players still on their rosters from 2007-08 to the present day.

Feel free to use a 2 or 3 year sample and compare to the following 2 years and see what you find. Or of course you could look here to see how well goals and corsi do at predicting future point totals. Or you could look here to see how well corsi and goal rates predict future goal rates at the player “on-ice” level. Or you could look here to see that at the player level what makes good players better than bad players is more shot quality and shooting percentage than corsi.

Or of course, you could remain oblivious to reality and continue to nitpick the details.

I’m not sure it’s nitpicking to point out a critical error with your methodology that torpedoes your argument. If your argument is about teams having consistently high shooting percentage it’s imperative that your definition of “team” is consistent. Using your sample it is not. It’s possible you’ve done work that demonstrates your point before. Due to critical methodological errors, you haven’t done that here. I wasn’t attacking your conclusion as much as I was questioning your sample.

The most recent 2-year sample with a full follow-up season would mean using 2009-11 and then the 2011-12 season.

Pointing out one ‘error’ while ignoring the overall conclusions supported by ample evidence is nitpicking.

And while you may be right in that a lot of players have changed over the years, you are wrong in that it nullifies the argument. If a GM can replace players with equally good shooters then not only is shooting a talent, a GM’s ability to build teams around that talent is also a talent.

David, I think you’re missing a key point: no one disagrees that shot quality matters; the issue is that when the talent dispersal is narrow, as it is in the NHL, it’s more randomness than skill. The World Juniors and the Olympics are great examples of true talent disparities being large enough that they lead to high SH% and SV% for the good teams and low for the bad teams.

I think Canada (in the olympics) and Kings (NHL) are demonstrating that shot quality can be minimally affected…DOWNWARDS depending on strategy. I guess it can technically go up but it’s overall very narrow and SH% that hover around 10%, by all accounts, do not appear sustainable.

I will agree with you that the larger the sample size the smaller a role luck plays, but I don’t think anyone is arguing the contrary.

the issue is that when the talent dispersal is narrow, as it is in the NHL, it’s more randomness than skill.

My point is I don’t think it is as narrow as people make it out to be. Also, the talent differences at the player level are far more significant than at the team level and thus any player level evaluations need to consider shooting percentage or you are not getting an accurate evaluation of the player. If I am a GM of an NHL team and primarily use corsi as an evaluation tool I am doing a disservice to my team. Shooting percentage talent is at least on par with shot generation talent on a player level.

Finally, people always say “no one disagrees that shot quality matters,” but I frequently have people telling me shot quality doesn’t matter or matters very little. For years there was a debate as to whether shot quality was even worth looking at, and anyone who suggested otherwise was mocked by some very vocal members of the analytics community. Trust me, I have had many of these debates over the years. Things have changed slightly, but I still believe that corsi is wildly oversold and shot quality wildly undersold among most in the hockey analytics community.

Also, saying “no one disagrees” is not understanding reality and saying “no one disagrees” and then following it up with “but it doesn’t matter much” is missing the point too.

OK, there’s a lot to unpack here: a lot of reasonable arguments, none of which I have seen from anyone within analytics that really matters.

1. I thought it was well accepted that certain players have persistently higher than avg shooting percentages, and an even more select group who can sustainably raise on-ice SH%. I assume that everyone takes this into account, is there someone who is important within the fancy stats community who doesn’t in general agree that shooting percentage which is sustained over long term is probably a good talent indicator?

2. Are you suggesting contemplating a shot quality strategy that takes advantage of better players against worse…? It’s called line matching, but the flip side is the other coach is doing that too, so it becomes a wash. Maybe I’m missing your point, so if I am, please feel free to clarify.

3. I think you are misunderstanding those that say ‘shot quality doesn’t matter’ as I think the real argument is ‘shot quality matters a lot but it comes about mostly randomly (except for the teams I mentioned above) so it’s not as important within a discussion of tactics and player evaluation as something as year to year repeatable as corsi.’ Maybe I’m wrong though.

I thought it was well accepted that certain players have persistently higher than avg shooting percentages

Yes, this is true.

and an even more select group who can sustainably raise on-ice SH%

Somewhat true, though I would suggest that “even more select group” downplays the spread a little. There are 110 forwards with more than 3000 minutes of 5v5close ice time over the past 6 seasons. The 30th-worst on-ice shooting percentage on that list is Mike Fisher’s at 7.86%; the 30th-best is Mike Ribeiro’s at 9.01%, which would result in a 14.6% boost in goals scored given an equal number of shots. The 30th-worst forward in SF20 is Legwand at 9.841, while the 30th-best is Milan Michalek at 10.722, which equates to just 8.95% better, a spread about 39% smaller than the spread in shooting percentage. Even if we regressed all shooting percentages a little, the spread would still be significant. And because of the 3000-minute cutoff, we are only looking at 110 players, essentially first- and second-line players. As far as offense goes, driving on-ice shooting percentage is at least as important as shot generation.
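The comparison above boils down to a few ratios, reproduced here from the quoted 30th-best and 30th-worst values. As in the text, the shooting-percentage spread is read as extra goals per shot (equal shots assumed) and the SF20 spread as extra shots per 20 minutes.

```python
# On-ice 5v5close figures quoted above (forwards with >3000 minutes, past 6 seasons).
sh_pct_30th_worst, sh_pct_30th_best = 7.86, 9.01   # Fisher, Ribeiro
sf20_30th_worst, sf20_30th_best = 9.841, 10.722    # Legwand, Michalek

sh_spread = sh_pct_30th_best / sh_pct_30th_worst - 1  # extra goals per shot
sf_spread = sf20_30th_best / sf20_30th_worst - 1      # extra shots per 20 minutes
print(f"Sh% spread: {sh_spread:.1%}")                       # 14.6%
print(f"SF20 spread: {sf_spread:.2%}")                      # 8.95%
print(f"SF20 spread is {1 - sf_spread / sh_spread:.0%} smaller")  # 39%
```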

Are you suggesting contemplating a shot quality strategy that takes advantage of better players against worse…?

No, I am mostly talking about player evaluation and team building. If there is a significant spread in player on-ice shooting talent, then it must be considered in player evaluation, and if individual players can exhibit that spread, there is no reason a team built by smart management with shooting percentage in mind can’t. As I showed above, some teams have maintained above-average shooting percentages over several seasons, so not only can it be done in theory, it is being done in practice.

‘shot quality matters a lot but it comes about mostly randomly (except for the teams I mentioned above) so it’s not as important within a discussion of tactics and player evaluation as something as year to year repeatable as corsi.’

That may be the argument, but it is one I disagree with. What I am saying is that the randomness in shooting percentage makes it difficult to identify as a talent, but every attempt to minimize the impact of luck, whether by looking at several years’ worth of data or by grouping similar players as Tom Awad did, shows that shooting percentage is clearly an important factor in player talent and should be considered in player evaluation.

Even in the long run, when the randomness in shooting percentage has largely evened out, we still see shot differential being three times as important a factor as shot quality in today’s NHL.

This is what I disagree with. At the team level I think it is probably more significant, especially at the extremes. Maybe approaching 50/50 historically. What I really want to emphasize is that at the player level it is probably more shooting percentage than shot generation and because of that it is theoretically possible to build a team that relies more significantly on shooting percentage than shot generation and thus we cannot dismiss that possibility so easily.

In the comments to Eric’s post Benjamin Wendorf wrote:

As for your question “how can we so readily dismiss it?” Because statistically there doesn’t seem to be (again, at the NHL level) a way to repeatedly take better quality shots. Everybody’s trying to take high quality shots, everyone’s trying to prevent high quality shots, and the talent is close enough across the league that nobody gets a major edge from that struggle.

Again, this is what I disagree with, and it is comments like this, and the ones I quoted in my article above, that I believe oversell corsi and shot generation and undersell shooting percentage and shot quality.

“No, I am mostly talking about player evaluation and team building. If there is a significant spread in player on-ice shooting talent then it must be considered in player evaluation and if players can exhibit that spread, there is no reason smart management who smartly builds a team through consideration for shooting percentage can’t.”

So GMs should build teams around Sh%, à la Toronto and EDM, but not SJ and NYR?

I am not suggesting we ignore shot generation/prevention. I am suggesting we not ignore shooting percentage. Are you suggesting that SJ and the NYR shouldn’t try and improve their shooting talent and continue to be satisfied with first or second round playoff exits?

For what it’s worth, I actually agree that the analytics community is often too quick to dismiss shot quality arguments, and I think there’s value to be gained in looking at it on a team-by-team basis. One often encounters a broad assumption that any high team Sh% is bound to regress; in some cases that’s true, but in others it isn’t. For example, dating back to the beginning of last season, the Ducks and Leafs have shot 9.0% and 9.4% (respectively) at even strength, despite having played over 120 games in that time (my first impulse was to wonder about score effects, but those Sh% numbers are 9.3% and 9.0% when we limit to 5v5 Tied). At this point, regression back to league average is going to require an implausibly long cold stretch.
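The "implausibly long cold stretch" point can be checked with simple arithmetic: given a shooting percentage observed over some number of shots, what rate would a team need on its next shots for the cumulative figure to land back on league average? The inputs below (9.4% on 1,500 shots, an 8.0% league average, 500 further shots) are hypothetical, since the actual shot totals are not given here.

```python
def required_future_sh_pct(observed_sh_pct: float, shots_so_far: int,
                           target_sh_pct: float, future_shots: int) -> float:
    """Shooting % needed on the next future_shots shots for the
    cumulative rate to land exactly on target_sh_pct."""
    goals_so_far = shots_so_far * observed_sh_pct / 100
    goals_needed_total = (shots_so_far + future_shots) * target_sh_pct / 100
    return 100 * (goals_needed_total - goals_so_far) / future_shots

# HYPOTHETICAL: 9.4% on 1500 shots, assumed league average of 8.0%.
print(round(required_future_sh_pct(9.4, 1500, 8.0, 500), 1))  # 3.8
```

Under these assumptions the team would need to shoot well under half the league rate for a sustained stretch, which is why the cumulative figure is unlikely to fall all the way back to average.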

Nevertheless, for the shot quality argument to gain traction, we may have to go beyond numbers and rely on video, if the hope is to identify features of team systems that lead to elevated Sh%. Where I do think the Corsi-based arguments are stronger is on the question of repeatability: after digging into the subject in some detail, it looks like shot attempts against are the one measure that teams seem best able to control. Of course, with more data, my analysis could be replicated team-by-team . . .

Nevertheless, for the shot quality argument to gain traction, we may have to go beyond numbers and rely on video, if the hope is to identify features of team systems that lead to elevated Sh%.

Not sure I agree with this statement. If people aren’t willing to admit that there are significant differences in the on-ice shooting percentages of the top quartile players vs the bottom quartile players by looking at the numbers (which to me are about as clear as day), I am not sure video will change that. I think what needs to happen is a new group of analysts come up and take hockey analytics in a new direction, leaving those that choose not to change their thinking in the past.

I do think you are right about shots against. I would view corsi as more of a defensive stat but for that we must look at corsi against independent of corsi for. I have toyed with the idea of developing a ratings system based primarily on corsi against and goals for but haven’t spent the time researching that yet. I do think that corsi is heavily driven by playing style and probably most easily impacted by a coaching change (see what happened in Toronto when Carlyle took over, corsi tanked despite largely the same players).

I think what needs to happen is a new group of analysts come up and take hockey analytics in a new direction, leaving those that choose not to change their thinking in the past.

And I think this is happening. One of the great things about hockey stats is that the barriers to entry are basically nonexistent: if you have the interest, some knowledge of stats, an internet connection, and enough time to set up a site (which is really easy via Blogger or WordPress), you can be part of the community in no time. Even in just the past year, it seems like the number of people contributing to this field has exploded. And, of course, there are people like Chris Boyle taking innovative approaches to the shot-quality question.

As far as the contribution of coaching, I did take a stab at that here. In general, not enough data to tease many things out. The best example I found of a coach transforming a team’s style of play was Bylsma.

I have toyed with the idea of developing a ratings system based primarily on corsi against and goals for but haven’t spent the time researching that yet.

I do something similar when I evaluate our Penguins players. I like to look at Fenwick Sh% and Corsi Sv% as part of the evaluation.

I like Fenwick for Sh% because I felt that a player’s ability to hit the net should also be factored into their ability to score. If they hit the net only half the time they shoot the puck it kind of makes it less impressive than a player with a slightly lower percentage of goals who hits the net most of the time he aims for it. But the shooter has no real control over whether or not his shot gets blocked, so I tend to downplay the impact of Corsi For.

And then I use Corsi for Sv% because a defensive player’s ability to block shots (and, to a lesser extent, to disrupt plays and cause opponents to take low-quality chances) should be taken into account when looking at his defensive skill. I still prefer Fenwick Against over Corsi Against when just looking at shot totals, though, because blocked shots are an active defensive strategy, so counting them as a negative just doesn’t make sense to me. The opponent is going to miss 100% of the shots you block, so why does Corsi count that as a negative?
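For reference, here is a small sketch of the two ratios described above, using the standard definitions: Fenwick counts unblocked shot attempts (shots on goal plus missed shots) and Corsi adds blocked attempts. Fenwick Sh% is then goals for divided by Fenwick for, and Corsi Sv% is one minus goals against divided by Corsi against. The on-ice totals are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OnIceCounts:
    goals: int
    shots_on_goal: int  # includes goals, as in standard NHL shot totals
    missed: int
    blocked: int

    @property
    def fenwick(self) -> int:  # unblocked shot attempts
        return self.shots_on_goal + self.missed

    @property
    def corsi(self) -> int:    # all shot attempts
        return self.fenwick + self.blocked

def fenwick_sh_pct(for_counts: OnIceCounts) -> float:
    return 100.0 * for_counts.goals / for_counts.fenwick

def corsi_sv_pct(against_counts: OnIceCounts) -> float:
    return 100.0 * (1 - against_counts.goals / against_counts.corsi)

# HYPOTHETICAL on-ice totals for one player.
attempts_for = OnIceCounts(goals=30, shots_on_goal=350, missed=150, blocked=120)
attempts_against = OnIceCounts(goals=25, shots_on_goal=320, missed=140, blocked=140)
print(round(fenwick_sh_pct(attempts_for), 2), round(corsi_sv_pct(attempts_against), 2))
# 6.0 95.83
```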

Of course I don’t have a fancy way of looking at it like the formulae you use in your HARO/HARD stats. It would be much more meaningful if I could find a way to look at it in relation to their teammates and opponents. But I look forward to reading your future endeavors into improving the statistics community at large.

Thanks for the post, David. I agree with the premise of the article, that shooting percentage does vary from team to team and can be important in predicting wins. The problem up until this point has been that it’s difficult to separate the signal from the noise when doing these analyses, as even player Sh% can vary widely from year to year. Also, the team player turnover (as stated above) is astonishing and makes the numbers meaningless.

I would propose that instead of using historical ‘team’ SH%, that you use historical individual player SH%. Build your team SH% by averaging all your current forwards’ SH%, factored by expected ice time. As roster changes occur, recalculate the team data. Now, compare this to the league average to find the ‘skill’ advantage of that particular team.

Take your Score Adj Fenwick (or your favourite flavor of corsi stat) and adjust it by this variance. I would wager that your predictions might get a little more accurate.
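A rough sketch of the roster-based idea: average the current forwards' individual shooting percentages weighted by expected ice time, then scale Fenwick-for by the ratio of that roster figure to league average. The scaling step is one possible reading of "adjust it by this variance", and every number below (player rates, minutes, the 9.0% league average, 2,500 Fenwick-for) is hypothetical. As noted in the reply below, this ignores goaltending entirely.

```python
def roster_sh_pct(players):
    """Ice-time-weighted average of individual shooting percentages.
    players: list of (sh_pct, expected_toi_minutes) tuples."""
    total_toi = sum(toi for _, toi in players)
    return sum(sh * toi for sh, toi in players) / total_toi

def skill_adjusted_fenwick_for(fenwick_for, players, league_sh_pct):
    """Scale Fenwick-for by how far the roster's expected Sh% sits from league average."""
    return fenwick_for * roster_sh_pct(players) / league_sh_pct

# HYPOTHETICAL forward group: (career Sh%, expected 5v5 minutes).
forwards = [(12.0, 1200), (10.0, 1100), (9.0, 1000), (7.0, 800)]
print(round(roster_sh_pct(forwards), 2))                         # 9.76
print(round(skill_adjusted_fenwick_for(2500, forwards, 9.0), 1))  # 2710.0
```

Weighting by expected ice time follows the suggestion above; weighting by each player's expected shots would be a reasonable alternative, since players who shoot more contribute more to team Sh%.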

That doesn’t account for save percentages. Goalies are so unpredictable that it may not be possible to do it.

But you can’t just use raw individual shooting data. You have to put it into a “per 60” type of stat and then look at their deployment.

Take Clarkson for example. On the Devils he got far more ES and PPTOI than he does now with the Leafs. As a result, his overall shot total per game is way down, but his shot total/60 is more or less the same.
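The per-60 conversion is just a count divided by ice time and scaled to 60 minutes. In this sketch, a hypothetical player with very different raw shot totals under heavy and light deployment ends up with the same shot rate (these are illustrative numbers, not Clarkson's actual totals).

```python
def per_60(count: float, toi_minutes: float) -> float:
    """Convert a raw count to a rate per 60 minutes of ice time."""
    return 60.0 * count / toi_minutes

# HYPOTHETICAL seasons for one player under two deployments.
heavy_usage = {"shots": 200, "toi": 1400}  # more ES and PP minutes
light_usage = {"shots": 120, "toi": 840}   # fewer minutes
print(round(per_60(heavy_usage["shots"], heavy_usage["toi"]), 2),
      round(per_60(light_usage["shots"], light_usage["toi"]), 2))
# 8.57 8.57
```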

Of course. There’s also the success Toronto has had in the shootout. These are added points that only consider individual shot % (and opponent sv%) and have no connection to possession whatsoever. The shootout alone is probably destroying possession based predictions. There must be a way to incorporate this into the equation.

I think the shootout argument is a little overblown. Take the Leafs, for example. They are 9-4 in the shootout. That looks lopsided, but if they were 7-6 no one would say a thing, and it is only 2 points in the standings. For most teams it isn’t a big deal, though 0-8 New Jersey may have something to complain about: were they 4-4, they’d be in a playoff spot right now. Generally speaking, though, it isn’t a significant factor.
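The standings arithmetic is easy to check: a game decided in a shootout gives the winner 2 points and the loser 1, so each win converted to a loss costs exactly one point.

```python
def shootout_points(wins: int, losses: int) -> int:
    """Standings points from games decided in a shootout: 2 for a win, 1 for a loss."""
    return 2 * wins + losses

print(shootout_points(9, 4) - shootout_points(7, 6))  # Leafs, actual vs. hypothetical: 2
print(shootout_points(4, 4) - shootout_points(0, 8))  # Devils, hypothetical vs. actual: 4
```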

Although not directly the same, soccer and hockey share similarities. In soccer, a shot taken close to the goal with no defender near you (and only the goalkeeper in the way) is going to have an expected goal probability closer to 0.5 than to 0 (an expected goal probability of 1 would mean that the shot is guaranteed to go in, no matter what), and if you kick a shot from midfield with 2 defenders draped all over you, the expected goal probability will be closer to 0 than to 0.5. I imagine this is similar in hockey. I understand that with a large sample, a lot of the noise in shooting percentage is drowned out, but that doesn’t mean there is no element of shot quality. There is a reason why forwards have better shooting percentages than defensemen, and there is a reason why top-liners have better shooting percentages than fourth-liners, even after regressing over multiple seasons.

This is a project that I am interested in doing in the offseason, to determine the expected shooting percentage (even strength, and maybe power play) of a team, based on the players on the roster.


Welcome

Welcome to HockeyAnalysis.com, where I strive to get a better understanding of the game of hockey through the use of statistical analysis. I hope you enjoy whatever time you spend here and maybe even learn a little. If you have any questions or comments, feel free to drop me an e-mail at david (at) hockeyanalysis.com.