Welcome back to this week's Journal of Hockey Analytics, here to help with your boring Mondays at work and your future hockey analytics research. This week we look at how simple drafting rules can beat scouts, review some raw season-long zone entry data, and even use some junior-level analytics to analyze Griffin Reinhart.

For all your links, continue past the jump!

I won't spend too long on this, but over the weekend we learned that Tore Purdy, aka JLikens of Objective NHL, had passed away. He was one of the pioneers of hockey analytics, doing much of the early work building up the ideas we use in our day-to-day work now. Many people on Twitter have been paying their respects, and summaries of the situation have been well written by The Edmonton Journal, Hockey Buzz and Puck Prediction.

On to this week's analytics work:

The most interesting article of the week came from Rhys J, who compares the results of the Vancouver Canucks scouting staff over the last decade against extremely simple drafting rules. The results are very surprising [Canucks Army]

Cam Charron explains why the simple rules can beat scouts and offers some interesting opinions on the subject [Canucks Army]

Unhappy with the method Rhys used, Daniel Wagner re-ran the experiment, this time making his selections based on the CSS rankings [Pass it to Bulis]

Garik of Lighthouse Hockey watched every NY Islanders game of the season and has released his data on the Islanders' neutral zone play. He has made the raw data available for your future work if you're interested. [Lighthouse Hockey]

Sens fan Manny has released similar data for the Ottawa Senators in raw Google Spreadsheet format. It includes zone entry stats, raw game data with every 5v5 zone entry and exit, and per-game and season-total stats for the team and its players. The info includes individual and on-ice player stats, including time shares and average time in zone per entry of both types. A lot of hard work went into this. [Google Docs]

Neil Paine looks at which NHL goalie has been the hottest during these playoffs [FiveThirtyEight]

Garik uses some CHL fancystats to try to predict what type of player Griffin Reinhart may be (if any at all) [Lighthouse Hockey]

Eric Tulsky looks at neutral zone play and why Chicago is having difficulty against L.A. [FiveThirtyEight]

While Sean McIndoe is better known as a comedy writer, he has done a great job of combining analytics with humour. Most recently he looks at how long the Kings/Blackhawks can sustain their strong cores [Grantland]

Anton Stralman is touted as a really good player, putting up great Corsi numbers despite rarely scoring [mc79hockey]

Looking at basketball (the same ideas can apply to hockey), Nate Silver examines when you should sign a basketball player to a max contract [FiveThirtyEight]

This isn't so much a problem with the online hockey analytics community as with the secret statistics offices: the willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results [PLOS ONE]

The Montreal Canadiens use numbers to show that Galchenyuk has been "clutch" in the playoffs [Canadiens.com]

Garret Hohl asks if you should judge Dustin Byfuglien on his Corsi% or his Goal Differential [Arctic Ice Hockey]

Scott Cullen uses some analytics in his piece on the Islanders signing Halak; it's nice to see them used in mainstream media pieces [TSN]

Byron Bader analyzes how teams have done at drafting through the different rounds over the years [Flames Nation]

I don't typically like to link to opinion pieces, since a lot of the arguments repeat themselves every time and little new comes out of them. But a few opinion pieces kept coming up this week, so I decided to include them.

Writing not about hockey but about fans and stats in baseball, Bob Ryan makes the argument that the average fan just doesn't care about stats. It carries over well to hockey [Boston Globe]

Eric Fingerhut analyzes a Washington Post article by Neil Greenberg on the interest in fancystats and sports [The Fingerman]

These anti-stat columns have been around for as long as stats have. This historical piece from the 1950s argued that stats were ruining baseball [Sports Illustrated]

I will be away in Europe for work this summer, and with the time change it will be difficult for me to keep up to date with Twitter. If you write anything about hockey analytics, or you see anything interesting, please send it my way so I can keep these posts updated throughout the summer!

I am a Van Fan in Bytown. Living in Ottawa for work, I research sports analytics and machine learning at the University of Ottawa. I play hockey about as well as a Timbit, but I compete in rowing with hopes of 2016 Olympic gold. Follow me on Twitter at @joshweissbock and feel free to send any questions or comments my way.

I'd like to know what teams beat the potato. The Nucks are low-hanging fruit, as they have the worst drafting record of anybody. What about the Detroits and the California teams? We know how Sutter did :-(

For giggles, I went and made my own fake scouting director for Vancouver (since that was the team in question)... I don't have enough time on my hands to do all the years, but I did a redraft of 2000 with my own fake intern. I call him "Lazy Cheapskate," and his marching orders were thus: we're cheap and don't want to spend any money on in-house scouts, so we're just going to use the CSS rankings of N.A. skaters (because goalies are voodoo, and we're also ethnocentric and don't want guys from Euro leagues) as our list. The results...

Thanks for posting that. I like CA more than most hockey blogs, but between the total lack of moderation to weed out the conspiracy theorists and juvenile trolls, and the idiotic doubling down on this "simple drafting method," it's been a little sad.

The flaw in the Sham model is really too bad, because it undermines what I think is a very good point.

I also think the flaw would be pretty easy to correct. A rule like "of the next 15 players on the ISS NA rankings, select the one with the most points per game" would probably produce solid results. And that's before you get into other pretty simple measures like controlling for scoring levels across the CHL leagues and mixing in defensemen.
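A rule like that is simple enough to sketch in a few lines of Python. This is a minimal illustration of the idea, with an entirely made-up draft board (the names, points, and games are invented, not real CSS data):

```python
# Sketch of the proposed rule: from the next `window` players still
# available on the CSS rankings, draft the one with the best points
# per game. All player data below is invented for illustration.

def pick_from_window(css_ranked, drafted, window=15):
    """css_ranked: (name, points, games) tuples in CSS rank order.
    drafted: set of names already off the board."""
    available = [p for p in css_ranked if p[0] not in drafted]
    # Highest points per game among the next `window` players wins.
    return max(available[:window], key=lambda p: p[1] / p[2])

board = [
    ("Player A", 40, 60),  # 0.67 points per game
    ("Player B", 85, 68),  # 1.25 points per game
    ("Player C", 70, 64),  # 1.09 points per game
]
print(pick_from_window(board, drafted={"Player A"})[0])  # Player B
```

Controlling for scoring levels across the CHL leagues, as suggested, would just mean scaling each `points` value by a league factor before the comparison.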

Except running it for all 30 teams would completely defeat the point. If you're looking for an edge over the other 29 teams, you're going to do something differently from them, pretty much by definition.

For giggles, I went and made my own fake scouting director for Vancouver (since that was the team in question)... I don't have enough time on my hands to do all the years, but I did a redraft of 2000 with my own fake intern. I call him "Lazy Cheapskate," and his marching orders were thus: we're cheap and don't want to spend any money on in-house scouts, so we're just going to use the CSS rankings of N.A. skaters (because goalies are voodoo, and we're also ethnocentric and don't want guys from Euro leagues) as our list. The results...

Right, but the PitB model JUST used the CSS rankings, which I think are prone to a lot of different biases, especially towards big players.

My proposal was basically a mash-up of Sham and the CSS. At each spot, look at the next 15 players on the CSS rankings, and take the one with the most points per game. So for example if the top 5 CSS players were gone, you'd pick the highest scoring player from spots 6-20.

I really think a basic system like that (with tweaks here and there) would produce really strong results.

Was it really that effective a thought experiment, though? Was it revelatory that the Canucks' drafting has been historically bad? It would have been a more interesting exercise to me if it had shown a method that said, "here's who we missed because of our reliance on guesswork and hunches." Statistics are already utilized in scouting and drafting, just not advanced ones and not particularly well. I've always hated the hunches -- let's take Libor Polasek because he's huge, or Antoski because he fought Lindros to a standstill, or Patrick White because he was a finalist for the Minnesota Mr. Hockey award, or Honzik because... no idea. But this whole exercise still seems lazy to me. Critiquing the current system means more than just assuming that all current scouts do is play hunches. That sets up as much of a straw man as saying that all analytics can provide is a sterile numbers-based approach.

Sure, I mean, as I said I thought the flaws in the method were unfortunate and undermined the larger point.

But I still think the concept of applying a basic drafting model to the Canucks, and showing that it can greatly surpass their actual record, can be very helpful in showing just how flawed the team's drafting record has been.

Except running it for all 30 teams would completely defeat the point. If you're looking for an edge over the other 29 teams, you're going to do something differently from them, pretty much by definition.

The 30 teams argument is a non-sequitur.

No I mean run an "all else being equal" simulation for all 30 teams independently.

For example, let's just say that Sham was going to take the highest scoring draft eligible CHL forward remaining on the board.

So when Sham is pretending to be the Canucks GM, he simply selects the highest scoring junior forward available.

And when Sham is pretending to be the Predators GM, he selects the highest scoring junior forward available.

That would mean no Jones, Ellis, Josi, Weber, Suter & Hamhuis, among others...

And when Sham is pretending to be the Red Wings GM, he selects the highest scoring junior forward available.

That would mean no Datsyuk, Zetterberg, or Franzen -- the overwhelming majority of the gems the organization has found in the last couple of decades.

Can Sham beat more than a handful of the 30 NHL teams?

Because if he can't, it's hardly evidence that a simple formula can beat a complex scouting system or some other such nonsense...

Right, but the PitB model JUST used the CSS rankings, which I think are prone to a lot of different biases, especially towards big players.

My proposal was basically a mash-up of Sham and the CSS. At each spot, look at the next 15 players on the CSS rankings, and take the one with the most points per game. So for example if the top 5 CSS players were gone, you'd pick the highest scoring player from spots 6-20.

I really think a basic system like that (with tweaks here and there) would produce really strong results.

First, I suspect the Sham model would beat more than a handful of teams.

Secondly, obviously the European and defensemen issues are why no one would ever seriously consider adopting Sham outright. The D issue, though, would be relatively easy to address - just value each defenseman point as worth, say, 1.3 of a forward point. (In other words, a d-man who scored 60 points would get credit for 78 points, levelling the playing field between forwards and D. 1.3 might not be the right number, but you get the idea.)

Finally, make the simple change I suggested above, and allow the model to select from a range on the CSS rankings.

I'd be willing to bet that a model like that would outperform a large portion of the league.
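The defenseman adjustment described above amounts to a one-line weighting. A minimal sketch (the 1.3 multiplier is the illustrative number from the comment, not a fitted value):

```python
# Weight each defenseman point as 1.3 forward points, so D and F
# scoring can be ranked on one list. The 1.3 is illustrative only.
D_WEIGHT = 1.3

def adjusted_points(points, position):
    """Return position-adjusted points ('D' for defensemen)."""
    return points * D_WEIGHT if position == "D" else points

print(adjusted_points(60, "D"))  # 78.0 -- the 60-point d-man above
print(adjusted_points(60, "F"))  # 60
```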

Was it really that effective a thought experiment, though? Was it revelatory that the Canucks' drafting has been historically bad? It would have been a more interesting exercise to me if it had shown a method that said, "here's who we missed because of our reliance on guesswork and hunches." Statistics are already utilized in scouting and drafting, just not advanced ones and not particularly well. I've always hated the hunches -- let's take Libor Polasek because he's huge, or Antoski because he fought Lindros to a standstill, or Patrick White because he was a finalist for the Minnesota Mr. Hockey award, or Honzik because... no idea. But this whole exercise still seems lazy to me. Critiquing the current system means more than just assuming that all current scouts do is play hunches. That sets up as much of a straw man as saying that all analytics can provide is a sterile numbers-based approach.

To be honest, I actually don't think the Canucks' drafting from 1999 - 2006 (or even 1998 - 2007) was particularly bad (relative to 29 other teams) considering the draft position and the asterisk that has to be put next to Bourdon (RIP).

By and large this was the foundation for the best Canucks team ever and there were quite a few Canuck selections that COULD have been part of the 2011 team:

More than 15? No idea for the Sham model, which was basically cheating anyways.

But yeah, I'm fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams.

I've never seen any in-depth statistical analysis of drafting from Europe, US high schools or the USNTDP, but I would be interested to see it attempted. You definitely have to rely much more heavily on scouting for those areas.

"I'm fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams."

I'm skeptical.

Not because teams don't miss a lot (they absolutely do) for some seemingly foolish reasons (such as gritty selections in the Bryan Allen/Luke Schenn mold).

Mostly because, on the whole, I suspect teams do well enough on the easier-to-predict players that tear up junior hockey and, even with the misses, make up for it with the harder to predict gems taken later.

Huh, how about that... saves me a lot of time switching between sheets in Excel.

And it demonstrates the point... some teams are so poor at scouting that they may as well have saved the money and not bothered doing any themselves. I mean, if someone using free universal resources (that don't even include half the hockey-playing world) can do better than your paid exclusive resources, then you're doin' it wrong.

More than 15? No idea for the Sham model, which was basically cheating anyways.

But yeah, I'm fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams.

I've never seen any in-depth statistical analysis of drafting from Europe, US high schools or the USNTDP, but I would be interested to see it attempted. You definitely have to rely much more heavily on scouting for those areas.

I would be interested in seeing the results of this.

Tweet C/Sham, and see if you two can collaborate and post it on CA.
NM00 might even be able to get in on it to make sure it passes his standards for relevance.

More than 15? No idea for the Sham model, which was basically cheating anyways.

But yeah, I'm fairly confident that a slightly more rigorous, but still basically simple system could top 15 teams.

I've never seen any in-depth statistical analysis of drafting from Europe, US high schools or the USNTDP, but I would be interested to see it attempted. You definitely have to rely much more heavily on scouting for those areas.

Actually,
pop me an email Jamie.. I'd be up for collaborating.
(anyone else that wants to help out can too)
austeane @gmail.com

I've been talking to Canucks management, and this would be an interesting thing to show them.

You know, I think the actual tongue-in-cheek point of the original post -- that Canucks (or other NHL teams') drafting is so haphazard or flawed that you might as well use a nonsensical method -- is a good one. But it's being taken up quite literally and seriously, and to do that, it's not just that the "method" is problematic; there also has to be a much more rigorous critique of the existing system(s) to establish a baseline against which we can compare actual draft records, if any of this is going to make sense. I agree that there's value here, but we can't start from the assumption that the only way teams currently draft is on gut instinct.

I think teams have all kinds of strategies going into drafts, and they have different methods of valuing player potential. Right now we're talking about drafting outcomes in the most basic (and I'd argue most useless) of ways -- looking only at a small group of scoring forwards from only one of the available pools tells us very little. And focusing only on points tells us nothing about the various skills you need to actually ice a competitive team. Adjusting the defense value by 1.3, as someone suggested, does nothing to evaluate the potential of a shut-down defenseman. Basing potential on age-17 points also tells us nothing about growth spurts or the skill development of late bloomers like Tanev or Lack, for example. Using the Sham method leaves you with a team like the Oilers, and we know how successful they've been at collecting shiny point producers.

I think a lot of teams are terrible at drafting. But it's not necessarily because they do everything on gut instinct. They might be like Gillis, drafting according to a certain rhyme (with little reason). They might be sticking to a particular pipeline they like. It would be good to know what needs to be replaced, rather than coming up with a whole new method that might be just as half-baked. I mean, I could say that tossing all the draft-eligible names into a hat and picking that way would beat the draft records of many a team.

@NM00, Jamie, & others: Great discussion! This is really interesting. I still think it's important to say that Rhys' original point is useful because it offered a baseline to evaluate your scouting against, not to replace it. PITB and Churko have since raised very valid criticisms. I like the idea of some points/CSS/ISS mash-up for the baseline.

I think one point you need to add to your evaluation is one the Bourdon (RIP) and maybe Sauve picks bring up: luck and draft position. Meaning: whatever system you come up with, you have to run it for all 30 teams and then compute a standard deviation of games played, points, or some other metric. Over the last 13 drafts, what is the average GP/points a team has gotten out of its drafting, and what was expected based on its draft positions? Then you can know whether a scouting group is really underperforming.
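That baseline calculation is easy to sketch. A toy Python version, where the expected games played per draft slot and the team results are entirely invented numbers, not real draft data:

```python
# Toy sketch of the baseline idea: compare each team's actual games
# played from its picks against the league-average yield of the slots
# it drafted from, then express the gap in standard deviations.
# All numbers below are invented for illustration.

from statistics import mean, stdev

# Hypothetical: average NHL games played historically yielded by each
# draft slot.
expected_gp_by_slot = {1: 700, 15: 400, 45: 150, 75: 60}

# Hypothetical team results: (slot drafted from, games played by pick).
teams = {
    "VAN": [(15, 120), (45, 30), (75, 0)],
    "DET": [(15, 500), (45, 400), (75, 250)],
    "OTT": [(15, 380), (45, 160), (75, 70)],
}

def surplus(picks):
    """Actual GP minus expected GP for a team's set of picks."""
    return sum(gp - expected_gp_by_slot[slot] for slot, gp in picks)

surpluses = {team: surplus(picks) for team, picks in teams.items()}
mu, sigma = mean(surpluses.values()), stdev(surpluses.values())
for team, s in surpluses.items():
    print(f"{team}: surplus {s:+d} GP, z = {(s - mu) / sigma:+.2f}")
```

With real data you would build `expected_gp_by_slot` from every pick league-wide over the 13 drafts, so that draft position and luck are priced in before any scouting group is labelled an underperformer.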

Btw, in evaluating the NHL careers of draft picks, it would probably be best to go with TOI, somehow normalized between d-men and forwards -- or at least a composite metric of TOI, points, and GP. Someone once noted that coaches are pretty reliable experts on players, and who they give ice time to, in the aggregate, is a good indication of those players' talent.

For all the criticism of the Canucks scouts, I would like to see the CA staff call their shot on this upcoming draft.

It's pretty simple: on or before the draft, come up with CA's ranked list of players using whatever rudimentary method they decide.
Then, after the draft, take each Canucks pick and say who the Canucks should have taken instead.

Then keep a running tally of the relevant stats or progress to compare each set of selections.

Maybe make it even more interesting by agreeing to donate $1 to the Canucks for Kids Fund for every extra game played by your picks versus the Canucks' picks that season.

Seems like material to fill articles for years to come. You're welcome.

Rhys' original point is that an algorithm purposefully built to be a bad evaluator, taking only one thing into consideration, should not be able to beat a paid team of scouts plus the GM's input. It's a baseline to see whether the scouts+GM are looking for the right things or just for coke machines that can hopefully skate. It's okay to suspect that the Canucks have been below average at drafting, but it's another thing entirely to figure out how to show it with data instead of just believing the narrative.

Now, Rhys' original algorithm has been smartly criticized for sneaking in scouting without admitting it. Fair enough. A better algorithm against expected performance would offer a better baseline.

Rhys' choice of 17-year-old scoring is based on a couple of assumptions he's written about before: players who are good at scoring are probably good at a lot of aspects of the game. (Remember, this is one of his concerns with the Horvat pick.) The second assumption is one Cam Charron also discussed in his follow-up post: GMs/scouts *might* put too much stock in "good character" plus a bunch of measurables like size and speed. They're asking: are NHL teams over-valuing these traits and under-valuing how well the player actually plays?

These are really good questions. Nothing absurd about the intent, even if you disagree with the specifics.

Thought-provoking? Rhys did a good job of raising his points. His post has already led to some good discussion here and two follow-up posts by other Canucks bloggers. A bunch of analytics bloggers outside of Canucksland have read it too, and who knows, maybe they'll follow up with some better algorithms. And, judging by Twitter, Rhys and Josh are themselves planning a follow-up with a more complex algorithm in response to the criticism.

So, the proof is in the pudding: Rhys was thought-provoking enough to provoke a lot of good discussion.

"Rhys' original point is that some algorithm that is purposefully built to be a bad evaluator, or only take one thing into consideration, should not beat a paid team of scouts, along with the GM's input."

Disagree.

Flaws in the method aside, is it really surprising that a system (any system) can beat 1 of 30 teams?

While I agree that a lot of good discussion has come out of this, by and large it has been based on criticisms of the methodology.

"is it really surprising that a system (any system) can beat 1 of 30 teams?"

Well... yes. How much do you imagine an NHL team spends on scouting? Let's say $1,000,000.00 per year, which seems reasonable when you take into account salaries and expenses. Now take that number and multiply it by 10 (for a ten-year period, 2000-2009, since those are the years you could expect to see results from): that's $10,000,000.00 total spent. If you're investing ten million dollars in something, it should be able to beat a system that costs $0.00. Sure, any system could get lucky and beat a team over a single draft, but 10 drafts? That shouldn't happen.