Statistics

The unmentioned, underutilized, and novel

There’s a seemingly endless number of hockey statistics. Many are commonplace, such as shots and goals. Some are openly debated, e.g., plus-minus (+/-). Some are often overlooked, like icing and offside events, or independently researched but not implemented on a large scale, e.g., shot quality. The backbone of icetistics.com is the set of stats that are often overlooked, not commonly implemented, or completely novel in nature. Follow the links below to learn more.

Puck Possession and Natural Turnovers

Background

Indirect estimates of puck possession and possession efficiency, such as Corsi, Fenwick, and Tango, are commonly used by analysts. These statistics offer insights into offensive and defensive efficiency, but they are quantified independently of time. Rate metrics such as Corsi FOR/AGST per 60 minutes do exist, but they do not quantify temporal puck possession, i.e., the length of time that a given team was in possession of the puck. To our knowledge, no publicly disseminated metric in the hockey analytics community provides direct (temporal) estimates of puck possession. We’re changing that.

Instantaneous puck possession changes, i.e., those that happen ‘on-the-fly’ and not after a stoppage of play, are documented in the NHL play-by-play files as giveaways and takeaways. It can be argued that both metrics are subjective in nature. They’re clearly turnovers, but a large gray area exists when deciding whether a team truly committed a giveaway or a takeaway. In our article on quality assurance and quality control (QA/QC), we show that many documented giveaways and takeaways are invalid. For more details, see that article, here.

Here, we define natural puck turnovers as changes in puck possession that are i) not documented as giveaways or takeaways and ii) not directly identified within the play-by-play datasets. Quantifying natural puck turnovers allows us to derive estimates of temporal puck possession. We believe that temporal puck possession estimates may be a valuable resource for the hockey analytics community, especially when used in conjunction with other statistics. Our approach is detailed below.

Methods

By investigating certain information among sequential rows (i.e., a time series) of an NHL play-by-play file, we can discern, with some degree of error, puck turnovers and, subsequently, which team had the puck and for how long. In the NHL play-by-play files, information is documented with respect to the event team and not necessarily with respect to the team possessing the puck. As such, logic is needed to identify the team possessing the puck. A handful of event types confirm that the event team has control of the puck, such as a shot or a missed shot. Similarly, a handful of event types indicate that the documented event team does not have the puck, e.g., a hit. We use this information to assign puck possession to a given team for each documented event in a hockey game. A single possession is discerned by investigating sequential lines of an NHL play-by-play file and identifying points in the game (indices) where it is clear the puck has changed from team A to team B or vice versa.
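
The event-based logic can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the full algorithm: the event codes (SHOT, MISS, GOAL, HIT), the inclusion of GOAL among the confirming events, and the function name possessing_team are all hypothetical choices made for this example.

```python
# Hypothetical event codes; actual play-by-play files may label events differently.
EVENT_TEAM_HAS_PUCK = {"SHOT", "MISS", "GOAL"}  # assumed: event team controls the puck
EVENT_TEAM_LACKS_PUCK = {"HIT"}                 # assumed: the opponent controls the puck


def possessing_team(event_type, event_team, teams):
    """Return the team assumed to hold the puck for one event, or None if unknown.

    teams is a pair such as ("WSH", "DAL"); event_team must be one of them.
    """
    opponent = teams[1] if event_team == teams[0] else teams[0]
    if event_type in EVENT_TEAM_HAS_PUCK:
        return event_team
    if event_type in EVENT_TEAM_LACKS_PUCK:
        return opponent
    return None  # the event type alone does not settle possession


teams = ("WSH", "DAL")
print(possessing_team("SHOT", "WSH", teams))  # WSH
print(possessing_team("HIT", "WSH", teams))   # DAL
```

Events that return None (a stoppage, for example) would need context from neighboring rows, which is where the sequential comparison described above comes in.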

Consider Table 1, which shows a truncated play-by-play file for a 110 second span of a single game between the Washington Capitals (WSH) and Dallas Stars (DAL) that took place on 2018-11-04. Can you identify which team has the puck and where natural turnovers may have occurred?

Table 1. Example NHL play-by-play file for game 20200 of the 2018-2019 season between the Washington Capitals and the Dallas Stars. seconds = elapsed seconds; ev.team = team committing the event; eventTypeMore = event type; description = description of the event (includes the event type and applicable players).

How did you do? Could you identify which team had the puck and when? Were you able to identify all five of the natural puck turnovers in addition to the two documented giveaways?

Now look at Table 2. Notice that it contains an additional column named possession. This column directly indicates which team has possession of the puck. The possession column and the rows identifying natural turnovers (those highlighted with red boxes) are derived via our algorithms and are not included in standard NHL play-by-play files. Notice also that these added rows contain time estimates in the seconds column. Each estimate is defined as the temporal midpoint between the two sandwiching events. This approach is straightforward but introduces some temporal uncertainty; the more time that elapses between documented events, the greater the uncertainty in the exact time of the turnover.

Table 2. Example NHL play-by-play file with turnovers (possession changes) identified for game 20200 of the 2018-2019 season between the Washington Capitals and Dallas Stars

Let’s quickly work through the beginning of the play-by-play file to validate our estimated turnovers. We know that DAL won the faceoff at 752 seconds (Jason Spezza is on DAL and Evgeny Kuznetsov is on WSH). Faceoff wins/losses are heavily scrutinized statistics in the NHL and because of this we assume that this information is correct. The next documented event is a giveaway (GIVE) by Matt Niskanen of WSH at 777 seconds. How is it possible that Matt Niskanen of WSH gave the puck away to DAL when the last documented event was a faceoff win by Dallas? It’s not. A player can’t give the puck away when it isn’t in their possession. This means Matt Niskanen or another member of WSH had gained control of the puck at a point in time after Dallas’s faceoff win and before Washington’s giveaway. Our automated algorithm identifies this and places a possession turnover event at 764.5 seconds, i.e., exactly halfway temporally between the two sandwiching events (see Table 2).
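
The midpoint rule in this example can be sketched in Python as follows. This is a minimal illustration, not the production algorithm: the field names holds_before and holds_after (the team holding the puck just before and just after each event), the TURNOVER label, and the function name are assumptions made for the example.

```python
def insert_natural_turnovers(events):
    """Insert a turnover row wherever possession must have changed between events.

    events: time-ordered list of dicts with 'seconds', 'event',
    'holds_before' and 'holds_after' (team codes, or None if unknown).
    """
    out = []
    prev = None
    for ev in events:
        if (
            prev is not None
            and prev["holds_after"] is not None
            and ev["holds_before"] is not None
            and prev["holds_after"] != ev["holds_before"]
        ):
            out.append({
                # Place the turnover halfway between the two sandwiching events.
                "seconds": (prev["seconds"] + ev["seconds"]) / 2,
                "event": "TURNOVER",
                "holds_before": prev["holds_after"],
                "holds_after": ev["holds_before"],
            })
        out.append(ev)
        prev = ev
    return out


events = [
    # DAL wins the faceoff at 752 seconds.
    {"seconds": 752, "event": "FAC", "holds_before": None, "holds_after": "DAL"},
    # WSH gives the puck away at 777 seconds, so WSH must have held it just before.
    {"seconds": 777, "event": "GIVE", "holds_before": "WSH", "holds_after": "DAL"},
]
for row in insert_natural_turnovers(events):
    print(row["seconds"], row["event"])
# 752 FAC
# 764.5 TURNOVER
# 777 GIVE
```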

Further Processing

Individual Possession Statistics

After identifying natural turnovers for a given game, we generate a new dataset that is broken down by possession. Table 3 provides an example of this for a game between the St. Louis Blues and the Winnipeg Jets from the 2018-2019 season.

Table 3. Possession summaries for game 20011 of the 2018-2019 season between the St. Louis Blues and the Winnipeg Jets

Each row of the dataset displays a summary of an individual possession for a given team. The possession summaries are organized in the sequential order of the game, like the play-by-play datasets. For each possession, we generate several pieces of information including, but not limited to:

the start and end times of the possession, as well as the duration (seconds)

the zone where each event occurred

the descriptive strength (man power) of the team possessing the puck, e.g., “Power Play (up 1 man)”.

score differential

period

sums of individual events, e.g., shot, hit, goal, that occurred during the possession. In addition to common NHL stats, we include a handful of statistics that are not commonly discussed (such as icing, offside, and shot quality) or are novel in nature (e.g., SHOAG). You can learn more about these metrics here.
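
A possession summary of this kind can be sketched in Python by grouping consecutive play-by-play rows that share the same possessing team. The sketch is illustrative only: the team codes AAA and BBB, the rows, the field names, and the rule that a possession ends when the next one begins are all assumptions made for the example, and only a few of the fields listed above are produced.

```python
from collections import Counter
from itertools import groupby


def summarize_possessions(rows):
    """Collapse time-ordered rows into one summary per uninterrupted possession.

    rows: dicts with 'seconds', 'possession' (team code) and 'event'.
    """
    runs = [list(g) for _, g in groupby(rows, key=lambda r: r["possession"])]
    summaries = []
    for i, run in enumerate(runs):
        start = run[0]["seconds"]
        # Assumption: a possession ends at the first row of the next possession;
        # the final possession ends at its own last documented event.
        end = runs[i + 1][0]["seconds"] if i + 1 < len(runs) else run[-1]["seconds"]
        summaries.append({
            "team": run[0]["possession"],
            "start": start,
            "end": end,
            "duration": end - start,
            "events": dict(Counter(r["event"] for r in run)),
        })
    return summaries


rows = [  # made-up rows for illustration, not real game data
    {"seconds": 0, "possession": "AAA", "event": "FAC"},
    {"seconds": 12, "possession": "AAA", "event": "SHOT"},
    {"seconds": 20.5, "possession": "BBB", "event": "TURNOVER"},
    {"seconds": 31, "possession": "BBB", "event": "MISS"},
]
for summary in summarize_possessions(rows):
    print(summary["team"], summary["start"], summary["end"], summary["duration"])
# AAA 0 20.5 20.5
# BBB 20.5 31 10.5
```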

Possession Splits

After the possession summaries are generated, we produce possession ‘splits’ by aggregating the possession summaries for a given game by:

team

team + strength (man power)

team + period

team + score differential

team + period + score differential
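
The aggregation behind these splits can be sketched as a group-by over the possession summaries. The Python below is a minimal illustration: the summaries, the team codes, the output field names (possessions, possTime), and the function name possession_split are hypothetical, and only possession count and possession time are aggregated.

```python
from collections import defaultdict


def possession_split(summaries, keys=("team",)):
    """Aggregate possession summaries by the given grouping keys."""
    totals = defaultdict(lambda: {"possessions": 0, "possTime": 0.0})
    for s in summaries:
        group = tuple(s[k] for k in keys)
        totals[group]["possessions"] += 1
        totals[group]["possTime"] += s["duration"]
    return dict(totals)


summaries = [  # made-up possession summaries, not real game data
    {"team": "AAA", "period": 1, "duration": 20.5},
    {"team": "BBB", "period": 1, "duration": 10.5},
    {"team": "AAA", "period": 2, "duration": 8.0},
]

by_team = possession_split(summaries)
print(by_team[("AAA",)])  # {'possessions': 2, 'possTime': 28.5}

by_team_period = possession_split(summaries, keys=("team", "period"))
print(by_team_period[("AAA", 2)])  # {'possessions': 1, 'possTime': 8.0}
```

Passing a different keys tuple, for example team plus strength or team plus score differential, would produce the other splits listed above, provided the summaries carry those fields.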

Table 4 provides an example of a team-level split for the exemplar game between the St. Louis Blues and the Winnipeg Jets. As you can see, the Winnipeg Jets beat the Blues by a score of 5 to 1. Interestingly, the Blues had puck possession for substantially longer than the Jets. In fact, the Blues had the puck for over 530 seconds (that’s just under 9 minutes) longer than the Jets during the game! The Corsi, Fenwick, and Tango statistics of the Blues are also greater than those of the Jets. All of this raises an obvious question: how did the Blues lose so badly?

Table 4. Team level splits for game 20011 of the 2018-2019 season between the St. Louis Blues and the Winnipeg Jets.

To answer the previous question, it is beneficial to check out some of the more granular splits that we produce. Table 5 shows the score-differential splits with commonly documented NHL stats for the same game. It is shown in descending order of puck possession time relative to the score-differentials shown in the scoreTeam column.

Table 5. Score differential splits with common stats for game 20011 of the 2018-2019 season between the St. Louis Blues and the Winnipeg Jets.

Several things are evident in Table 5. First, the game was tied for only a short time (241 seconds, roughly four minutes) before Winnipeg took an early lead on a power-play goal (note the Penalties Drawn column). Second, the Winnipeg Jets held a one-goal lead over the Blues for the majority (roughly 70%) of the game. During this time the Jets committed three penalties and, as a result, the Blues went on three separate power-plays. This helped the Blues possess the puck for over five minutes longer than the Jets (see rows 1 and 2 of Table 5). Despite this, the Blues were unable to score a single goal during any of these power-plays. If we filter the data to even-strength play during the portion of the game where the score differential is +/- 1, the possession times drop to 1189 and 1036 seconds for the Blues and Jets, respectively. These findings hint at a notable phenomenon: the leading team tends to play more ‘defensively’ while the trailing team tends to play more ‘offensively’. This commonly occurs in fluid sports like soccer (Bunnel 2018), basketball (Goldman and Rao 2011), and hockey. When analyzing temporal puck possession estimates for the first 660 games of the 2018-2019 season, we find that the average NHL team (for lack of a better term), when leading in a game, controls the puck roughly 4% less than its trailing opponent. In this game, the Jets controlled the puck nearly 14% less than the Blues (i.e., -14%) while playing with a lead. This falls below the lower bound of the interquartile range (IQR) for the NHL season data (see red dot in Figure 1).

Figure 1. Temporal differences (percentage) for leading, tied, and trailing situations as calculated between two opposing teams across all the 660 games. Note that the ‘trailing’ data display a mirror image of the ‘leading’ data. Outliers have been removed from the plot.
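
To make a possession-time comparison concrete, the Python sketch below expresses the gap between two teams as a percentage of their combined possession time, using the even-strength figures quoted above (1189 seconds for the Blues and 1036 seconds for the Jets). The function name and this particular definition of the percentage difference are assumptions made for illustration; the percentages shown in Figure 1 may be computed differently.

```python
def possession_share_diff(time_a, time_b):
    """Difference in possession time (a minus b) as a percentage of combined time.

    Assumed definition for illustration only.
    """
    total = time_a + time_b
    if total == 0:
        return 0.0  # neither team had documented possession
    return 100.0 * (time_a - time_b) / total


# Jets relative to Blues at even strength with a one-goal differential.
diff = possession_share_diff(1036, 1189)
print(round(diff, 1))  # -6.9
```

A negative value means the first team held the puck for less time than its opponent.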

That the Jets committed three penalties to the Blues’ one during this time further supports this phenomenon. The team in possession of the puck is usually not the one committing a penalty, unless it’s a penalty where both teams are guilty, e.g., fighting. Although it might seem counterintuitive that the trailing team draws more penalties than the leading team, it’s true. In a study on NHL penalties, Shuckers and Brozowski (2012) revealed that the leading team is usually penalized more than the trailing team. Routley (2015) reports the same pattern in his study of over eight years’ worth of play-by-play data, and notes that this may suggest a levelling bias in penalty calling. The data for the game between the Jets and the Blues conform to these findings.

Both points are evident in the stats from when the game was at a five-goal differential (Table 5, rows 5 and 6). The Jets never gained puck possession once they obtained a five-goal lead, which is why all their stats are zeros during this portion of the game. After the Jets scored their fifth goal of the game, the Blues won the following faceoff and immediately drew a penalty. The Blues then went on to barrage Connor Hellebuyck, the Jets’ goalie, for the entire power-play and scored a goal just after play returned to even-strength. This made the score 5-1. While all of this happened late in the third period and had little to no impact on the result of the game, it does exemplify both of our previous points: leading teams tend to be penalized more and to possess the puck less than their trailing opponents. These concepts work in conjunction with one another because of the manpower advantage: a team on a power-play usually possesses the puck longer than its penalty-killing opponent.

Possession Efficiency

As noted in an earlier part of this document, we calculate a handful of somewhat less common, but important statistics, such as shot quality (see Krzywicki 2010; Ryder 2004; Ryder 2007 for more details), and novel statistics generated by us. Background information on these statistics can be found here. For the sake of brevity, we’ll refer to these types of metrics as ‘advanced’ stats. Table 6 shows the score-differential splits along with some of these advanced statistics. We’ve removed the row of zeros for the Jets when they had the five-goal lead.

Table 6. Score differential splits with advanced statistics for game 20011 of the 2018-2019 season between the St. Louis Blues and the Winnipeg Jets. Data for the Jets at a five-goal lead are not displayed because all stats are zero as a result of no puck possession.

Check out the shot quality (SQ) of each team for all score differential scenarios. The Jets displayed superior shot quality compared to the Blues in every scenario except when they held a lead of at least four goals. In other words, the Jets had superior shot quality when it mattered.

The Jets had a one-goal lead going into the third period and then scored three goals within the first eight minutes of the third period to take a four-goal lead. According to Routley (2015), there is a 67% probability that the away team will win the game when it holds a one-goal lead going into the third period. In other words, the Jets already had probability in their favor going into the third period. After they scored three goals in the third, the probability of the Jets winning was >95%. They could have abandoned shooting the puck altogether (which they did once they had a five-goal lead) and still won the game.

Although the Jets committed twice as many penalties as the Blues, all of which occurred while they held the lead, a strong argument can be made for the Jets’ discipline and possession efficiency. Check out the number of icing events committed by each team. The Blues iced the puck a total of seven times in the game, five of which occurred when they trailed by one goal (Table 6). This was an integral part of the game, and the Blues essentially forfeited possession five different times on their own. The Jets iced the puck only once all game, and it occurred when they had a four-goal lead.

The Jets scored three of their five goals at even-strength play (Table 7). Two of these occurred without an intermediate event between gaining puck possession and scoring (see the events and eventsTOT columns). In other words, after the Jets gained possession of the puck, their next action was scoring a goal. That’s arguably the most efficient play sequence possible. The Jets’ other two goals were scored on manpower differentials: their first was a power-play goal and their second was a short-handed goal. Averaged across the Jets’ goals, the number of events occurring between gaining possession and scoring equals one. That’s remarkable. The Blues’ lone goal came shortly after their power-play expired. After winning a faceoff in their offensive zone, they finally put the puck in the back of the net after seven other events occurred.

If we look at the goal sequences temporally (possTime column), the average duration of each possession-to-goal sequence for the Jets equals roughly 25 seconds. If you sum the possession times of each goal sequence for the Jets, the total is 127 seconds. That’s the same amount of possession time it took the Blues to score their only goal of the game. Essentially, the Jets scored five goals in the same amount of possession time that the Blues needed to score one.

Table 7. Possession sequences ending in goals for game 20011 of the 2018-2019 season between the St. Louis Blues and the Winnipeg Jets.

Current Work

We are currently focused on exploratory analyses of temporal estimates of puck possession. We will continue to investigate the topics and patterns brought forth in the above paragraphs. Much of this work can be found on our Head to Head Matchups pages.

Future Work

We plan to tweak our algorithms to better estimate times of natural turnovers (possession changes). We are aware that our current algorithm may contain a great deal of uncertainty in instances where the time between sandwiching events is large (≥ 45 seconds). Luckily, this accounts for only a small portion of the data.
We’re hoping to generate predictive analytics that will inform future team performance (and matchups) as a function of the many possession- and efficiency-related statistics.

References

Bunnel D (2018, Jul 5) Which World Cup Team is the Best at Wasting Time? Retrieved from: https://fivethirtyeight.com/features/which-world-cup-team-is-the-best-at-wasting-time/