The importance of quality of competition

Abstract

Quality of competition is often invoked qualitatively when assessing a player’s performance, but a quantitative adjustment has proven elusive. The difficulty is widely presumed to arise from stratification of playing time: if the players who face top competition are usually good players themselves, then facing top competition will not correlate with poor results.

Here, we look at results at the individual shift level rather than the season level, so that we can compare how the top players do in their shifts against good players to how they do in their shifts against weak players. We find that they are, as expected, more successful in their shifts against weaker competition. However, for the most widespread competition metrics, no player faces extremely strong or weak competition on average – the measured differences, while real and persistent, are small and scarcely worth correcting for.

Introduction: Statistical corrections for a player’s usage

An important function of non-traditional hockey statistics such as those found at behindthenet.ca is to give insight into a player’s usage, which provides context for evaluating his results. If we only looked at the stats that measure outcomes, a player with average results would be considered average. However, if we know that he achieved those average results while taking on particularly tough minutes, we interpret that as a better-than-average performance.

In effect, we are grading on a curve – judging players not by their absolute performance but by their performance relative to what we expect from the average player put in those situations. The most common things to look at when evaluating a player’s usage are the quality of his teammates, the quality of his competition, and how often he started in the offensive or defensive zone. Each of these factors likely has an impact on the player’s outcomes, and as statistical analysts we seek to quantify that impact so we can correct for it when evaluating performance.

Using that approach, an average player who started in the offensive zone as often as Daniel Sedin did last year would have been expected to see his team get 56.1% of the shots, while an average player put in Manny Malhotra’s skates would see his team get just 39.2% of the shots. With differences that large, we can understand why it would be important to correct for a player’s zone starts.

Attempts to correct for the strength of a player’s teammates are also common. For example, rather than look at Corsi shot differential, we will look at a player’s relative Corsi: the difference between the team’s shot differential when he is on the ice and when he is off the ice. By asking how much better his team did when the player was on the ice, we remove some of the teammate effects and hope to get closer to understanding which players are the best and worst on a given team. Teammate effects are still in play – being one of the better forwards on the Bruins is more impressive than being one of the better forwards on the Blue Jackets, after all – but though imperfect, the quantitative corrections are very useful.
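As a minimal sketch of the relative Corsi calculation described above, the snippet below compares a team's shot differential with a player on the ice to its differential with him off the ice. Expressing both as a rate per 60 minutes is an assumption made here so that unequal ice times are comparable; the function names and the sample totals are hypothetical.

```python
def corsi_per_60(shots_for, shots_against, minutes):
    """Shot-attempt differential per 60 minutes of ice time."""
    return (shots_for - shots_against) * 60.0 / minutes


def relative_corsi(on_for, on_against, on_minutes,
                   off_for, off_against, off_minutes):
    """On-ice differential rate minus off-ice differential rate."""
    on_rate = corsi_per_60(on_for, on_against, on_minutes)
    off_rate = corsi_per_60(off_for, off_against, off_minutes)
    return on_rate - off_rate


# Hypothetical season totals: the team is +100 in 1000 minutes with the
# player on the ice and -100 in 3000 minutes with him on the bench.
example = relative_corsi(900, 800, 1000, 2000, 2100, 3000)
print(example)
```

A positive value says the team's shot differential was better with the player on the ice than without him, which is the sense in which the metric strips out some of the team-level effect.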

Correcting for the quality of competition that a player faces is surprisingly rare. One reason is that the analysis is not straightforward. A simple regression shows no relationship between a player’s Corsi or relative Corsi and his quality of competition, and multivariable analysis suggests competition has just a very small impact. This is counter-intuitive, and it is often presumed that we don’t see an effect because usage and skill are linked. If goons mostly face other goons, they’ll break even (on average), and the simple statistical look will conclude that competition doesn’t matter much: even the guys who get to play against goons all the time only break even.

Developing a correction for quality of competition

So if we just ask “how does a guy do if he faces goons most of the year”, the answer will be “not so great”, because he’s probably a goon himself. We never get to run the experiment where Pavel Datsyuk plays against goons for the better part of a season. But he does get the occasional shift against them, and we can finally answer this question by looking at the data at the individual shift level instead of at the season level. We can reframe the question as “how much better does the average player do in the shifts where he faces goons than the shifts where he faces top lines” and thereby remove the influence of how often he actually faces each opponent and how good he is.
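The shift-level reframing can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact procedure used for the plot: each shift record is assumed to carry a player, a rating for the opponents he faced, and the shot attempts for and against; the bucket labels, the `bucket_of` helper, and the sample data are all hypothetical.

```python
from collections import defaultdict


def shot_share_by_bucket(shifts, bucket_of):
    """Pool shot attempts per (player, competition bucket) and
    return the share of attempts that went the player's way."""
    totals = defaultdict(lambda: [0, 0])
    for player, opp_quality, shots_for, shots_against in shifts:
        key = (player, bucket_of(opp_quality))
        totals[key][0] += shots_for
        totals[key][1] += shots_against
    return {key: f / (f + a) for key, (f, a) in totals.items() if f + a > 0}


def mean_within_player_gap(shares, weak="weak", strong="strong"):
    """Average, over players seen in both buckets, of
    (share against weak opponents) - (share against strong opponents)."""
    players = {player for player, _ in shares}
    gaps = [shares[(p, weak)] - shares[(p, strong)]
            for p in players
            if (p, weak) in shares and (p, strong) in shares]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical shifts: (player, opponent rating, shots for, shots against)
shifts = [("A", -5, 3, 1), ("A", 5, 2, 2),
          ("B", -5, 2, 2), ("B", 5, 1, 3)]
shares = shot_share_by_bucket(shifts, lambda q: "weak" if q < 0 else "strong")
print(mean_within_player_gap(shares))
```

Because each player is compared only with himself, a strong player who rarely sees weak opponents and a weak player who sees them constantly contribute to the gap on equal terms, which is what removes the link between usage and skill.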

Here is a plot of how even strength results changed as a function of quality of competition in 2010-2011. It is immediately evident that quality of competition is very important – we can now see that if the Datsyuks of the world played against goons all year, their teams would get almost 2/3 of the shots while they were on the ice.

It is clear that quality of competition is an important aspect of context: players do substantially better in their shifts against weaker opponents. However, it is still an open question whether this is something we can and should correct for at the season level – does anyone face competition that is consistently strong (or weak) enough that we need to correct for it?

How big are differences of usage?

We know that coaches manipulate ice time to get certain matchups. This results in real, persistent differences between players in the competition metrics; the one based on relative Corsi is particularly effective at identifying which players the coach tries to use against the opponent’s top line.

However, although the differences between players in quality of competition are real, they are not very large. The plot above goes from +10 to -10, but actual year-end totals go from roughly +1.5 to -1.5, with most players falling in a much narrower range than that. Everyone faces opponents with both good and bad shot differential, and by these metrics the differences in time spent against opponents of various strengths are minimal. The Flyers were relatively focused on matching lines in 2010-2011 – Andreas Nodl ranked 15th in competition among NHL forwards with at least 500 even strength minutes played, while Blair Betts ranked 290th (out of 337). And yet a histogram showing how much of their ice time was spent against opponents of various strengths shows scarcely any difference between them:

Now we see why it has proven so difficult to apply a correction factor based on competition faced: to a first approximation, everyone is facing more or less the same competition by these metrics. The difference between being 95th percentile in competition and 13th percentile is a scarcely perceptible shift in their ice time. This makes sense – if there were players who played the majority of their ice time against top-tier opponents, we would see quality of competition numbers in the +5 to +10 range, but we do not see anything like that in practice. There is no Manny Malhotra for our competition metrics.
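To see why season-level numbers stay so close to zero, it helps to sketch a competition metric as an ice-time-weighted average of opponent ratings. That weighting scheme is an assumption about how such metrics are built, and the two ice-time distributions below are invented for illustration; they are not the actual Nodl or Betts data.

```python
def weighted_competition(toi_by_opponent_rating):
    """Ice-time-weighted mean of opponent ratings.

    toi_by_opponent_rating: list of (opponent_rating, minutes_against) pairs.
    """
    total_minutes = sum(minutes for _, minutes in toi_by_opponent_rating)
    weighted_sum = sum(rating * minutes
                       for rating, minutes in toi_by_opponent_rating)
    return weighted_sum / total_minutes


# Hypothetical usage: both players see the full range of opponents,
# with only a modest tilt toward stronger or weaker ones.
tough_usage = [(-10, 100), (-5, 200), (0, 300), (5, 250), (10, 150)]
soft_usage = [(-10, 150), (-5, 250), (0, 300), (5, 200), (10, 100)]

print(weighted_competition(tough_usage))
print(weighted_competition(soft_usage))
```

Even though individual opponents range from -10 to +10, shifting a few percent of ice time from one end to the other moves the weighted average by less than one unit in this toy example, which is the same flattening we see in the year-end totals.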

Choosing a competition metric

All of the plots and analysis in this article used Corsi quality of competition, but identical analyses were also performed using relative Corsi quality of competition and showed exactly the same patterns.

I believe part of the explanation for why Ferrari’s study found a larger competition effect arises from the selection of the metrics. In this analysis, I have used a player’s shot differential as the measure of how good he is (in the competition metric) and of his results. However, as we have already discussed, simple shot differential requires various correction factors to account for the impact of usage. Ferrari’s study included corrections for zone starts and quality of teammates, which improve the assessment of both the player’s performance and his competition. It may simply be the case that this more sophisticated metric more precisely identifies the strength of competition and therefore finds more differences between players in competition faced.

I also suspect that some statistical quirk made the results of his study come out stronger than the true long-run effect. Ferrari found that differences in competition faced accounted for 55% of the non-luck component of performance. Simple symmetry arguments would suggest that each team should be responsible for 50% of the non-luck component on any given shift. The spread in quality of competition that players face is almost certainly narrower than the spread in quality of teammates they play with – Evgeni Malkin spent a lot more time playing on a line with as much skill as Malkin-James Neal-Chris Kunitz than he did playing against lines with that much skill.

So over the course of a season, differences in competition should even out more and account for less than 50% of the difference between players, which makes me suspicious that random chance played a role in Ferrari’s 55% finding. I am looking forward to the work Jared Lunsford is doing to replicate Ferrari’s approach in part because I hope to better understand the numbers Ferrari found.

Conclusion

While competition certainly does play a big role in determining how a player will do in any given shift, with these competition metrics we see nobody with usage extreme enough to require a major correction factor. Using the curve for the average player on the first chart, we can calculate that Nodl’s 95th percentile usage is only harsh enough to bump an average player’s Corsi down to 49.5%, while Betts’ 13th percentile usage would be soft enough to allow an average player to post a 50.6% Corsi.
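The kind of calculation behind those figures can be sketched as an average of the shot-share curve over a player's distribution of ice time. The linear curve and the usage fractions below are hypothetical stand-ins (the real curve comes from the first chart and the real fractions from the histogram), so this sketch illustrates the method and does not reproduce the 49.5% and 50.6% results.

```python
def expected_share(time_fractions, curve):
    """Average a shot-share curve over the fraction of ice time
    spent in each competition bin.

    time_fractions: dict mapping competition rating -> fraction of ice time.
    curve: function mapping competition rating -> expected shot share.
    """
    total = sum(time_fractions.values())
    weighted = sum(frac * curve(rating)
                   for rating, frac in time_fractions.items())
    return weighted / total


def linear_curve(rating, slope=-0.01):
    """Hypothetical average-player curve: 50% of shots against average
    competition, losing one percentage point per unit of competition."""
    return 0.5 + slope * rating


# Hypothetical tough usage, tilted slightly toward strong opponents.
tough_usage = {-10: 0.10, -5: 0.20, 0: 0.30, 5: 0.25, 10: 0.15}
print(expected_share(tough_usage, linear_curve))
```

Dividing by the total of the fractions keeps the result a proper average even if the histogram bins do not sum exactly to one.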

The analogous situation in zone starts would be if everyone’s offensive zone start percentage were between 48% and 52%; such small corrections are scarcely worth the effort, and a person who ignored competition when evaluating players would not be wrong by much. Quality of competition is very similar to shot quality: it plays a huge role in individual shifts/shots, but over the course of a season the differences across teams and players are small enough that it can usually be neglected.

These competition metrics provide valuable insight into what a coach thinks of a player and how he tries to use him, but in practice they do not show differences large enough to have a significant impact on the player’s results.