If you've been following my posts here, you'll know that I am studying machine learning for my master's in computer science and applying those techniques to hockey analytics. I have found some pretty interesting results. Some of my first posts here (namely this one and this one) were based on my academic research, which I have since turned into an academic paper accepted at a sports analytics conference.

Introduction

Whether it's on a per-game basis or season to season, we spend an awful lot of time looking at shot differentials to get a better feel for the strength of a hockey team. Unfortunately, the hockey blogosphere doesn't actively track rolling numbers at the team level, even though doing so could give us a better handle on which teams are truly gaining or fading over the course of a season.

After the jump, a compilation of rolling team Corsi for the Western Conference during the 2012-2013 season.
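The rolling numbers in the compilation can be sketched roughly as follows: for each game, sum a team's shot attempts for and against over the previous few games and take the share. This is a minimal illustration, not the post's actual pipeline; the game-by-game attempt counts below are made-up numbers, and the 10-game window is an assumption.

```python
# Illustrative sketch: rolling Corsi-for percentage (CF%) for one team,
# using a simple fixed-size window over its game log. The shot-attempt
# counts here are invented for demonstration purposes.

def rolling_cf_pct(corsi_for, corsi_against, window=10):
    """Return the rolling CF% (shot-attempt share) over `window` games."""
    result = []
    for i in range(window - 1, len(corsi_for)):
        cf = sum(corsi_for[i - window + 1 : i + 1])
        ca = sum(corsi_against[i - window + 1 : i + 1])
        result.append(cf / (cf + ca))
    return result

# Hypothetical 12-game stretch of shot attempts for and against.
cf = [55, 48, 60, 52, 47, 58, 50, 62, 49, 53, 57, 51]
ca = [50, 52, 45, 55, 60, 48, 53, 47, 58, 50, 46, 54]
print(rolling_cf_pct(cf, ca, window=10))
```

Plotting these windowed values over a season is what produces the "rolling" curves discussed below: short hot or cold streaks wash out, and sustained shifts in shot share stand out.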

Introduction

In my last article I looked at prediction limits for machine learning and sports. More specifically, I answered the question of how much of the standings is due to luck (i.e., random chance, a stochastic process). Using classical test theory, I compared the variance of the observed win percentage over the seven seasons from 2005-2006 to 2011-2012 against a theoretical league in which team talent is normally distributed. From this we were able to conclude that luck explains ~38% of the variance in the standings.
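The classical-test-theory decomposition behind that number can be sketched in a few lines: observed variance splits into talent variance plus luck variance, and the luck variance of a win percentage over n games is the binomial term p(1-p)/n. This is a minimal sketch under assumed inputs; the observed standard deviation below is an illustrative value, not the figure measured in the original study.

```python
# Sketch of the classical-test-theory decomposition:
#   var(observed) = var(talent) + var(luck)
# In an "all luck" league where every game is a coin flip, the variance
# of win% over n games is p * (1 - p) / n.

n_games = 82
p = 0.5                            # league-average win probability
var_luck = p * (1 - p) / n_games   # variance if outcomes were pure chance

sd_observed = 0.09                 # assumed spread of team win% (illustrative)
var_observed = sd_observed ** 2

luck_share = var_luck / var_observed
print(f"luck explains ~{luck_share:.0%} of the variance")
```

With these inputs the luck share comes out near 38%; the real estimate depends on the actual observed spread of win percentages and on how overtime and shootout points are handled.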

This is interesting, and much higher than one might initially think, but it makes sense, and I will discuss it further on. Since my area of research is machine learning, and in particular using machine learning to make predictions in hockey, the question I am curious about today is: is there a theoretical limit to the predictions we can make in hockey?

Introduction

If you've been following along with us here at NHL Numbers, you'll know that this is the fourth and final installment of a series in which we've broken down forward lines and defensive pairings into "tiers" and compared their production not only to that of their own team, but also to the tiers of the other clubs in their conference.

We separated the conferences into two posts since there were no cross-conference games during the lockout-shortened season, so it doesn't make sense to compare the two. We took a look at the Western Conference defensemen on Monday. Today, it's time to evaluate the Eastern Conference blueliners.

Introduction

On Thursday and Friday of last week, Travis Yost made his debut on this platform with a fascinating little research project. His goal was to take a look at how individual tiers of forwards for each team performed relative to their team's overall production, using even-strength zone-adjusted Corsi. Now, I've been tasked with doing the same, but for defensemen.

Why are we choosing to use "tiers" instead of terms such as "lines" and "pairings", which you are likely more familiar with? By doing so, we're able to account for things such as injuries, trades, and call-ups, all of which naturally take place over the course of a season. There are so many little variations, and so much tinkering with specific usage, that it would be a real hassle to evaluate things using those more common terms. These tiers usually correlate very strongly with the most regularly used combinations anyway (i.e., the team's top two defensemen are more often than not in tier 1, and so on).
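The post doesn't spell out the exact assignment rule, but a usage-based grouping along the lines described above might look roughly like this: rank a team's defensemen by ice time and group them in pairs. The player labels, minutes, and pair-of-two rule are all invented for illustration.

```python
# Hypothetical sketch of tier assignment: rank each team's defensemen
# by ice time and bucket them in descending order, two per tier.
# Labels and minutes are made up for demonstration.

def assign_tiers(toi_by_player, tier_size=2):
    """Map each player to a tier (1 = most used) by descending ice time."""
    ranked = sorted(toi_by_player, key=toi_by_player.get, reverse=True)
    return {player: i // tier_size + 1 for i, player in enumerate(ranked)}

toi = {"D1": 1450, "D2": 1390, "D3": 1200, "D4": 1150, "D5": 900, "D6": 850}
print(assign_tiers(toi))
```

Because the grouping follows actual usage rather than nominal pairings, a call-up who absorbs top-pairing minutes after an injury lands in tier 1 automatically, which is exactly the behavior the tier approach is meant to capture.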

Just remember: context is crucial, especially when looking at data like this. A substantial injury to a key player can thrust an unexpected candidate into a bigger role (more specifically, a higher tier, based on more frequent usage). But that's sort of the point of this exercise: it tells us how well teams performed with the pieces they had at their disposal over an entire season.