I began writing about hockey in 2005. Through a combination of timing and proximity, I have had the fortune of a ringside view of the genesis, dissemination and popularization of hockey's so-called advanced stats. Over this two-part series, I will share some of the insights engendered by this somewhat unusual perspective. My focus will be on what's happening in the league now as teams flock to build analytics departments around possession theory, as well as why the movement grew outside of the league's front offices and where we may expect this sort of analysis to go in the future.

The off-season of 2014 may well be remembered as the summer of stats, although corsi numbers and their various accoutrements made their way into popular discourse earlier in the year when they began popping up in national broadcasts and game day discussions. No doubt the new numbers began to spread in part due to the spectacular failure of the Toronto Maple Leafs, a club that had been deemed a bellwether for possession-based theory at the onset of the season. Their subsequent 84-point, 12th place finish in the face of expanded expectations and executive confidence was the metaphorical canary in the coal mine.

The Leafs case study also seems to be the main impetus behind the recent rash of stats-nerd hires by NHL teams. Led by the Leafs' acquisition of Kyle Dubas, Darryl Metcalf (Extra Skater), Cam Charron and Rob Pettapiece, a wave of official interest has crashed over the shores of the autodidactic amateurs who were instrumental in spurring the movement, subsequently washing advanced stats further inland into public consciousness by giving the numbers an imprimatur of authority that was previously lacking.

From my own knowledge base and observations as well as discussions with NHL teams and stats hires, here's a general road map for what is - or, at least, what should be - occurring in the new analytics departments springing up around the league.

Establishing Roles

It's a mistake to claim that "analytics" is new to NHL clubs. Indeed, no doubt every team already had at least one executive and a handful of interns (or healthy scratches) busily collecting internal metrics, watching video and making recommendations. What's new, instead, is an understanding of and adherence to 1.) the work and theory that underpins corsi and related measures and 2.) the collection of skills and abilities required to conduct true "big data" statistical research.

As such, the emerging blueprint for the evolution of NHL analytics departments is hinted at by the Toronto Maple Leafs hires: a collection of individuals with related, but differing, skill sets that, when combined, can theoretically perform rigorous empirical studies and create valid statistical models. The interdependent roles roughly translate to:

1.) Lead Analyst(s)

A person or persons with deep background knowledge of the existing work and theory in the field, as well as hockey in general and the team in question. This role would involve filtering through both qualitative and quantitative information to conjure insights about individual players, trends in player valuation, on-ice tactics and exploitable inefficiencies in the market.

Deep existing knowledge would mean not replicating work or investigating previously abandoned blind alleys. A familiarity with behavioral economics and probabilistic thinking would be a strong requirement of the role since it requires big picture, "forest" (rather than "trees") thinking and an ability to resist many of the common psychological pitfalls that currently tend to skew decision making in the NHL (and many other human endeavours).

2.) Computer Scientist

Building a robust internal dataset requires the creation of databases, stats counting applications and user interfaces. Teams should be looking for a computer scientist who is not only capable of building spreadsheets, but also one familiar with what has worked "in the wild" already. Although this is a task that could be contracted out to any moderately skilled developer, the need to educate someone from scratch about what is important in hockey in general and what is important in advanced stats specifically could result in a lot of wasted time and work.

3.) Math man

Though the lead analyst(s) should be conversant in statistical concepts and modelling, it may make sense to split this particular set of skills into a separate role entirely. Having a deep background in statistical theory and modelling would ensure the potential insights rendered from data collection are mathematically valid.

4.) Champion

Perhaps the most important role is that of the analytics champion. With these departments suddenly being grafted onto existing appendages in an organization, there is no small threat of the "stats guys" being segregated behind a thick, soundproof wall of internal ambivalence or apathy, particularly since some of the insights of corsi-based theory tend to fly in the face of hockey orthodoxy.

If corsi concepts and strategies are to trickle up into the front office or down into on-ice tactics, a club will likely need at least one stakeholder who is willing to advocate for their validity and utility.

Low Hanging Fruit

Besides the fact that the NHL is a "me-too" league, there are practical reasons for teams to be jumping on the bandwagon sooner rather than later (with apologies to the few clubs that were ahead of their time). The two most notable are:

1.) Pluck the low hanging fruit

2.) Inoculate against opponents gaining a competitive advantage

It's true that hockey's fancy stats are still very much in their infancy. Though not perfect, the theory is at least developed enough to help decision makers be less wrong even as it matures. We might one day get closer to being more predictive with more precision, but corsi can already help GMs avoid landmines, make slightly better bets on young players or understand right away when their roster construction is fundamentally flawed. If integrating this sort of knowledge can help a team avoid just a single damaging, albatross contract in the realm of David Clarkson or Brooks Orpik, it will pay for the entire first 5 years of the department's existence. At least.

The second and more cogent issue is the possibility of falling behind other clubs in terms of statistical understanding and tools, both in perception and reality. It could be rather damaging for a GM to be taken to the cleaners in a few trades by clubs with known stats departments, for instance*.

*(Aside - I suspect Tyler Bozak will be put on the trade block any day now. Expect him to be offered to the clubs who haven't yet bought into possession theory first)

The Ideal State

There is obviously a long road ahead in terms of new stats actually having an impact at the NHL level. Though they have infiltrated analysis on mainstream websites and broadcasts to some degree and there have been a few high profile blogger hires so far, it's a different matter entirely for them to hold any sway in the show.

The ideal state for advanced stats advocates and organizations alike is to build understanding of current and legacy corsi theory; conduct internal processes to gain proprietary data and unique knowledge; and find a way to marry these things with practical applications and the existing knowledge base in the NHL.

Each step along the path is no small feat in isolation and I expect we'll see some clubs fail at one or all of them before finding their footing (or abandoning the experiment altogether). For those who get it right, the sweet spot will be the intersection of math, empiricism and real life experience, as expressed by this graphic (grabbed from Nassim Taleb's Facebook page):

NHL organizations mostly have the left bubble wrapped up. Their challenge now is to develop the other two spheres and integrate them accordingly.

Next up - The reasons why corsi grew up outside of the NHL and where it's going next

It can seem to the general public that, from behind the curtain of NHL management, the hiring of a "stats guru" means that the person will do something occult with the numbers to provide vague information to the club which will help them make better decisions.

An overlooked point is that having the numbers only gets you part of the way: if you don't also employ someone who can correctly and insightfully interpret those numbers, then the effort is wasted.

I look forward to this season more so now because of these recent changes around the league and seeing how they are dealt with by media members.

Good post though I think the teams that get it right will be the teams that don't get hung up on corsi/possession and look deeper into the stats. The next big advantage will not be corsi/possession.

I will say it seems teams are probably currently in four groups.

1. The teams who don't believe in analytics or believe they are doing enough but probably aren't looking at anything too creative or innovative and are mostly using them to reaffirm their pre-existing beliefs. There are probably still a fair number of these teams who are fooling themselves into thinking they have an analytics team.

2. Teams that have actually built a useful, functional internal analytics strategy over the past several years that didn't include hiring from the hockey analytic bloggers. There are probably only a few of these teams.

3. Teams that are dipping their toes in the analytics water by hiring bloggers as consultants. I'd put the Oilers' hire of Dellow and likely whoever hired Tulsky in this category.

4. Teams that are going all-in building a full-fledged internal analytics department like you outline above, including a programmer/database expert (or two), a statistician/math expert (or two) and maybe even a group of people to watch games and track events.

I suspect the transition from most teams being in category #1 to most teams being in category #4 will take several years but the process has started.

May be wrong but there appears to be a glaring hole in the roles. The Collector or something to that effect.

'The numbers' surely need to be collected within an organisation that is serious about this. Not every number possible but at least the stats on their own players should be their own, and able to be verified against the public databases...that aren't the property of the Leafs.

Having an econometric background and working in quantitative analysis, once the low hanging fruit is gone a key differentiator will be finding the relevant information and / or signals that give you the ability to be less wrong.

Also, a mate is studying a PhD in sports science using gps technology. Admittedly this work is on players running around 20kms per game across a massive field (http://oilersnation.com/2014/4/26/wanye-at-the-anzac-game) but that technology of having a small tracking unit carried by each player, in training and match day, would advance the stats quickly and efficiently for any team willing to invest. It would also be scalable, allowing a farm team to adopt similar processes.

I'm a believer, I'd love to see the Flames make a clean jump ahead on their competitors by being smart about their use of it.

All teams already have "stats collectors". The Flames themselves hire a couple of collectors and then a bunch of volunteer mules who run the data to and from the executive. I'm not exactly sure what they collect (if just traditional data or modern shot data and tracking as well) but nevertheless it is done. I suspect the collection part is more low level though, as pretty much anyone with a one page guide can count shots and saves, blocks, etc.
So the role you speak of is probably an unpaid one, is what I am saying.
And my guess is their pitch would be: hey, come hang out, get some free food from the executive box and watch the games live for free as your pay.

The next step is developing the evidentiary and logic bubbles to the point where they debunk or perhaps prove long held traditional ideas that some might believe to be myths. Myths like an enforcer on the ice makes the skill player better in the long term. Or that shut down defencemen need to be larger physical players. Or that one scoring line is better than distributed scoring throughout the lines. Or even that your top defenceman should play 30 minutes a night.

The 30+ minute defenceman is one I'd like to see quantified better. I remember when Ray Bourque played for the Bruins and it seemed like he was on the ice ALL the time . It always looked to me that he was playing to survive and would avoid physicality and it would end up hurting his team. How effective is that guy really late in the third if you have ridden him all game ? Sure he has more skills than the guy stapled to the bench - but at that point - is he more effective ?

The comparative one for me is starting pitching in baseball. Baseball has had a 5 man rotation for the last 45 years and a 4 man before that. It seems to me whenever I watch baseball that pitchers do pretty well their first 2 times through the lineup - the first 18 batters - if they get 15 out (a .167 obp) - they are only through the 5th inning. After that - and about 75 pitches - they seem to falter. Conventional wisdom in baseball is that in order to be a stud starter you must consistently get to the 6th inning or later and throw 100 pitches or more. The idea that 5 guys have to pitch to their physical limit once every 5 days - throwing those 100 pitches or more - while the 6 guys in the bullpen might throw 200 total over those same 5 days seems like a waste of resources to me. Are those 5 guys really 5 times better on average than the 6 relievers ? By pitch count they are. Is this dichotomy science based or based on tradition ? Why not plan on 2 guys for 60-75 pitches each every day - and have a 3 or 4 day rotation 2 man tag team, instead of relying so heavily on 5 guys - spread the load to 8. Your best 4 pitchers are going to get out more often but not kill themselves by pitching 100+.

I'd argued a few years back that the Oilers should look at installing video tech in the ceiling of the new arena to track the movements of each and every player, every shift.

It would allow them to gather data on not just their own players and opposition, but could be employed during junior games as well to collect some data on the WHL players/prospects as well.

As to collecting data, the easy stuff is just building data scrapers from the publicly available stuff off of NHL.com and other sites.

I mentioned before that RussianMachineNeverBreaks is working (and I believe Kent is part of the group) on something designed to remain public.

The trick is in deciding which metrics you feel are most representative for players and roles, giving a determined amount of weight to a statistical category for a particular player/role.

Eg: CorsiRel works well for forwards, but is less useful for D. Straight Corsi may have some more value for D as it relates a total game situation relative to opposition's puck control. Just as Fen, Fenclose, and QualComp adjusted as well as other variants are useful for determining some team strengths.

I tend to ignore some of the categories like PDO, sh%, and straight Corsi until the field is narrowed down a bit. With defenders I generally give more weight to Qualcomp and ES/TOI before resorting to the Corsi. It is these elementary steps in discriminating the data that teams need to figure out and each one is going to find differences in how they interpret the data in much the same way that scouting directors differ slightly in the way they rank prospects.
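To make the weighting idea concrete, here is a minimal sketch of a role-dependent composite score. Everything in it is an assumption for illustration: the `ROLE_WEIGHTS` table, the weight values and the `composite_score` helper are hypothetical, and the inputs are assumed to be z-scores (each metric already standardized against the league) so that differently scaled stats can be added together.

```python
# Hypothetical weights per role; each team would choose (and argue about) its own.
ROLE_WEIGHTS = {
    "forward": {"corsi_rel": 0.6, "qual_comp": 0.2, "es_toi": 0.2},
    "defence": {"corsi_rel": 0.2, "qual_comp": 0.4, "es_toi": 0.4},
}


def composite_score(role, z_scores):
    """Weighted sum of standardized metrics for a player in the given role.

    z_scores maps a metric name to the player's z-score for that metric.
    """
    weights = ROLE_WEIGHTS[role]
    return sum(weight * z_scores[name] for name, weight in weights.items())


# Made-up player: strong CorsiRel, decent competition faced, below-average ES ice time.
player = {"corsi_rel": 1.0, "qual_comp": 0.5, "es_toi": -0.5}
forward_score = composite_score("forward", player)
defence_score = composite_score("defence", player)
```

The same stat line scores differently depending on the role it is judged against, which is the kind of disagreement between teams described above.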

The first point you reference, with regards to enforcers on the ice, is probably reflected in WOWY numbers. Check Gazdic's WOWYs and you'll see virtually every player was worse with him than without.

Regarding Bourque: if we had the stats, he may have shown well in scoring chances (the old Roger Neilson system) for/against. So even though it looked like chaos, he created more than he surrendered and his coach lived with it.

I think this is what the Oilers have in mind with Justin Schultz (not that he is Bourque, just in that he will create more chances for than against). I'm not convinced of it, but it is probably the direction they are headed with him.

Hard to answer that last bit without all the biometric data. I don't know what the recovery time on a 60-75 (+warm-up) pitch count would be.

Additionally that's further complicated by a few factors, specifically the indefinite duration of a baseball game, the extent to which additional reliever innings impact their effectiveness (max effort vs. reserve effort), and lastly how it would affect your ability to pitch to the platoon advantage.

I'm less au fait with the detail (still learning the intricacies of the game itself) but I am very interested in what can be used by those with far more expertise (yourself, Kent and many others that contribute and comment here) to better a team's performance.

I'd love to see the Flames install a hybrid of your in stadia tracker and the analytic output tool of an EPL, NFL or AFL club.

WOWY numbers only capture the effect on linemates, if I understand correctly. I'd be interested to see the results for teammates. It wouldn't work for two players who dress for the majority of games together, but since enforcers are more likely to be bounced in and out of the lineup, there are probably some good results to be had.

Now, to be fair, at least Albertans, Edmontonians included, can count to thirteen.

Unlike some other prairie franchises.

Seriously, though, I'm no expert. I've picked up enough over the years to be familiar with what aspect of the game a statistic is theorized to reflect.

Corsi, CorsiRel, Fenwick and similar "possession" stats all basically relate to one old hockey quote from someone whom I know well enough not to name around here.

He said "you miss 100% of the shots you don't take". In other words, you can't score if you don't shoot and you can't shoot if you don't have the puck.

The most important stat in hockey is the score. The score derives from goals which (Steve Smith fiascos aside) derive from shots on goal. Shots on goal derive from having the puck on your stick. Measure shots, blocked, on net, or at net, and voila.

Fenwick is a refinement thereof and is very useful in determining puck possession teams while weeding out what is called "garbage time" in football parlance.
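For anyone who wants the counting spelled out, here is a small sketch using the commonly cited definitions: Corsi counts every shot attempt (on goal, missed and blocked), while Fenwick drops the blocked ones. The `ShotCounts` class, the function names and the numbers are made up for illustration, and returning 0.5 when there are no attempts at all is just a convention assumed here.

```python
from dataclasses import dataclass


@dataclass
class ShotCounts:
    """Shot attempts by one team in some slice of play (e.g. 5v5)."""
    on_goal: int   # shots on goal, goals included
    missed: int    # attempts that missed the net
    blocked: int   # attempts blocked by the other team


def corsi(counts: ShotCounts) -> int:
    """All shot attempts: on goal + missed + blocked."""
    return counts.on_goal + counts.missed + counts.blocked


def fenwick(counts: ShotCounts) -> int:
    """Unblocked shot attempts: on goal + missed."""
    return counts.on_goal + counts.missed


def share(for_count: int, against_count: int) -> float:
    """Fraction of attempts belonging to 'our' side; 0.5 if nothing happened."""
    total = for_count + against_count
    if total == 0:
        return 0.5  # assumed neutral value when there is no data
    return for_count / total


# Hypothetical single-game totals.
team_for = ShotCounts(on_goal=30, missed=12, blocked=14)
team_against = ShotCounts(on_goal=25, missed=10, blocked=9)

cf_pct = share(corsi(team_for), corsi(team_against))      # 56 of 100 attempts
ff_pct = share(fenwick(team_for), fenwick(team_against))  # 42 of 77 attempts
```

Restricting the same counts to situations where the score is close is the usual way of filtering out the "garbage time" effect.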

Some of this stuff can be used to improve performance. Eg: Taylor Hall last year reached out on Twitter to the analytics crowd to ask how he could improve his Corsi numbers after reading up on them. The response was, partially from the work that Tyler Dellow had done as well as a host of others, to carry the puck into the zone rather than dumping it in.

In most situations though, analytics can superficially be used to improve a team's efficiency when it comes to talent identification and acquisition/retention. Mikael Backlund is a good example. By his scoring boxcars he looks like an alright player. His advanced stats suggest though that he is an excellent possession player that the rest of the league will one day wake up to realize.

Rumours suggest he could have been acquired for a 2nd round draft pick last season or earlier. It is those kinds of subtle decisions that can propel a team forward significantly at a relatively minimal cost.

If analytics could improve a team's acquisition by trade or draft by even 5% it would have a massive impact on their ability to compete given the current level of parity in the league.

Enforcers can also be moved around the roster in some situations. Usually their level of play is such that you don't really need, or at least ought not to need, data to tell you they shouldn't be playing on the top line.

The data can tell you if the enforcer brought down the level of play if he spent time on a line with five bottom-six forwards over the course of a season.

You just have to compare the far-left columns in the larger Gazdic when apart and teammate when apart columns to see that he dragged down the Corsi numbers of virtually every single player he was on the ice with. The smallest sample size is 14:37 (Ben Eager), so it is a pretty fair scope.
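As a rough illustration of what a with-or-without-you comparison does, the sketch below splits one player's shifts into those shared with a given teammate and those without him, then compares the Corsi share in each bucket. The `Shift` record, the `wowy` function and the data are hypothetical; real WOWY tables are built from full play-by-play and time-on-ice data.

```python
from dataclasses import dataclass


@dataclass
class Shift:
    on_ice: frozenset  # names of the team's skaters on the ice
    cf: int            # shot attempts for during the shift
    ca: int            # shot attempts against during the shift


def corsi_pct(shifts):
    """Corsi-for share over a list of shifts, or None if there were no attempts."""
    cf = sum(s.cf for s in shifts)
    ca = sum(s.ca for s in shifts)
    return cf / (cf + ca) if (cf + ca) else None


def wowy(shifts, player, teammate):
    """Return (player's Corsi% with teammate, player's Corsi% without teammate)."""
    together = [s for s in shifts if player in s.on_ice and teammate in s.on_ice]
    apart = [s for s in shifts if player in s.on_ice and teammate not in s.on_ice]
    return corsi_pct(together), corsi_pct(apart)


# Made-up shifts for skaters "A", "B" and "C".
shifts = [
    Shift(frozenset({"A", "B"}), cf=3, ca=5),
    Shift(frozenset({"A", "B"}), cf=2, ca=4),
    Shift(frozenset({"A", "C"}), cf=6, ca=4),
    Shift(frozenset({"B", "C"}), cf=1, ca=3),
]
with_b, without_b = wowy(shifts, "A", "B")
```

In this toy data "A" controls 5 of 14 attempts with "B" and 6 of 10 without him, which is the pattern described for Gazdic's linemates. A real comparison would also need to watch sample size, as the 14:37 figure above suggests.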

Yes the WOWY numbers are a good start on the enforcers, but I think the traditional argument is that the mere presence of a Gazdic - even on the bench - changes the equation. Those who advocate this will say "by the eye" this team plays tougher when they have a guy like that. Quantifying this may be difficult - but ultimately could be very valuable.

I guess I would like to know with a guy like Ray Bourque - what the tipping point was. I'm sure at 20-23 minutes he was great; and 23-27 he was probably better than most - and a better option than what was on the bench. Clearly if a guy played 60 minutes he'd get pummelled; so at what amount of playing time is his optimal usage. If a guy knows he is going to play 35-40 minutes - does his play suffer all game long - or only late in the third ?

Definitely complicated, but I think baseball is still stuck in the past. In a 15 day period a starter is probably going to throw 300 pitches. Is 3 x 100 more effective than 4 x 75 or 5 x 60? Is taking 10 middle/late innings away from your best five pitchers every 5 days and giving them to your next 3 best guys going to hurt you ?

I guess I see it as baseball is way different today than even 30 years ago - with video scouting, computerized pitching data, etc. Batters know what the pitcher is going to throw, and after 2 at bats I think they have a much better chance of hitting that starting pitcher after the 4th inning. I haven't researched it but would be willing to bet the on base percentage is much higher on the 3rd time through the lineup. Baseball is stuck in its traditions though - they won't give the win to a starter unless he pitches 5 innings - so that perpetuates the idea that 100 pitches every 5 days is the way to go.

The most important part of stat analysis would seem to me to be the actual stats collected, i.e. the method and consistency of said collection of stats.
You suggest that the team would have volunteers counting SOG etc. when they would probably be more worried about counting the number of free hot dogs they eat, never mind the fact that 5 people may judge a scoring chance differently than another 5 people.
Maybe they can collect scoring chance and SOG data like amateur boxing, where 3 out of 5 stats collectors have to press a button within 1/2 a second of each other.

The point I'm making is that the analysis will only be as good as the collection.
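The boxing-style button idea could be sketched roughly as below: an event only counts when enough different collectors press within half a second of each other. The `consensus_events` function, its parameters and the sample presses are all hypothetical, and grouping presses around the earliest one in each cluster is just one simple way to do it.

```python
def consensus_events(presses, window=0.5, quorum=3):
    """Return timestamps of events confirmed by at least `quorum` collectors.

    presses: iterable of (timestamp_seconds, collector_id) pairs.
    A cluster is every press within `window` seconds of its first press,
    so all presses in a cluster are within `window` of each other.
    """
    presses = sorted(presses)
    events = []
    i = 0
    while i < len(presses):
        start = presses[i][0]
        collectors = set()
        j = i
        # Gather every press that falls inside the window opened by press i.
        while j < len(presses) and presses[j][0] - start <= window:
            collectors.add(presses[j][1])
            j += 1
        if len(collectors) >= quorum:
            events.append(start)
            i = j   # the whole cluster is used up by this event
        else:
            i += 1  # not enough agreement; try the next press as a start
    return events


# Made-up presses from collectors "a" to "e", in seconds of game time.
presses = [
    (10.0, "a"), (10.2, "b"), (10.4, "c"),  # three agree: counts
    (30.0, "a"), (30.3, "d"),               # only two agree: ignored
    (55.0, "e"),                            # lone press: ignored
]
confirmed = consensus_events(presses)
```

Only the first cluster is confirmed here, so a single trigger-happy volunteer cannot add a scoring chance on their own.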