Thursday, February 02, 2006

Study: What Correlates with Winning?

The simple answer is to have the most talent and the best coaching, but I prefer to go a little deeper and pinpoint what is worth investing in and what isn't worth bothering to talk about.

The old 'key' mantras you've heard time and time again:
"Faceoffs and Puck Possession are the key to victory!"
"Special Teams are key to winning"
"Scoring the first goal and getting the early lead is key"
"Taking the body will wear down the opposition and win hockey games"
"What the f*****g hell is wrong with that ref??"

So, which is it? What really correlates well to winning?

With the help of Greg "Cartman" of The Puck Stops Here, I decided to take a look at team statistics for certain categories and see how well they correlated with the overall standings as of February 1st.

It's table time!

The highest correlation comes from +/-. This is obvious, as a team that scores more goals than it allows will win more hockey games. We see this in the PYTHAGOREAN standings that I produced before, so this isn't the best measure to use here. It is an 'after-measure' that describes results rather than measuring an actual attribute of how a team plays. It shouldn't be compared to the others for that reason.

I put Power Play %, Penalty Kill %, Hits, Faceoffs, Shots For and Against, and Scoring the First goal as things that teams do and try to do well at in order to win games.

The best correlation? Scoring first! I'm sure Cranky Tom Benjamin would say something like "Well, the best teams often score first because they are simply better...Stupid kids, get off of my porch!!" True, but we see scoring first is really far more important to a team's fortunes than any of the other stats. In second place, Power Play success %.

You can see how the Chicago Blackhawks, the Flyers, and the Carolina Hurricanes have PK% that really don't match their overall place in the standings. There are always more successful penalty kills than successful power plays, so good teams take advantage of the opportunities they are given. Giving a crappy team like the Dinner Jackets more Power Play opportunities isn't going to hurt you nearly as much as giving the Senators more chances to bulge the twine.

Not surprisingly, faceoffs have a low correlation to winning (told ya!) and hitting is even less telling of successful teams. I can theorize that lesser teams tend to have more thuggish players and better teams tend to have more skilled players that focus on scoring rather than hitting. It's true that the HITS stat is rather unreliable from arena to arena, but good teams don't necessarily have to rack up hits.

I also notice that Atlanta ranks highly in scoring first, shots per game and PP%, yet is 23rd overall. The obvious reason is their lack of goaltending. If I could have added SV% to this table, I would have. Perhaps SV% would correlate even better than scoring the first goal.

Obviously, this study is fairly crude and could always be improved.

Cartman had suggested some refinements. This is Stat Geek territory.

1. Using Raw Numbers - "On the level of this study it probably doesn't matter much, but if one team is way ahead or behind the rest of the pack this information gets lost in rankings (the same thing if lots of teams are very closely grouped)."

Now, the ranking system allows each stat to be a 1-30 variable. In order to correlate a set of data, you need to have that matching variable set. How can I correlate an 80.5% with a ranking of 3? I'm not a stats whiz, so if anyone knows, let me know.
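For what it's worth, here is a rough sketch of one way around the matching-scale problem. The usual (Pearson) correlation doesn't need both columns on the same scale, so a raw PK% can be correlated directly against win% instead of against a 1-30 ranking. Correlating the ranks instead gives what is usually called Spearman's rank correlation. The five teams and all of the numbers below are made up purely for illustration, and the helper names `pearson` and `ranks` are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values, higher_is_better=True):
    """Rank 1 = best. Tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


# Made-up numbers for five hypothetical teams (not real NHL data).
win_pct = [0.700, 0.610, 0.550, 0.480, 0.350]
pk_pct = [86.0, 80.5, 84.1, 79.9, 80.0]

raw_r = pearson(pk_pct, win_pct)                  # raw values, no ranking
rank_r = pearson(ranks(pk_pct), ranks(win_pct))   # ranks only (Spearman)

print(f"raw: {raw_r:.2f}  ranked: {rank_r:.2f}")
```

The two numbers will generally differ a little, which is exactly the information Cartman says gets lost when everything is squeezed into rankings.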

2. Covariances - "Another thing one can do is try to calculate partial covariance. If we conclude that (for example) shots per game strongly influences winning percentages and maybe hits are quite strongly correlated with shots per game and also with winning percentage, but we want to remove the influence of shots per game before measuring the correlation with hits, this can be done."

Well, I don't have that much time on my hands :). It certainly might be worth a look.
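For anyone who does have the time, here is a minimal sketch of the idea, assuming Cartman means something close to the standard first-order partial correlation: the correlation between two stats after the influence of a third has been removed. The formula used is r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The six teams and their numbers are invented for illustration, and the function names are my own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys with the linear influence of zs removed."""
    return partial_from_r(pearson(xs, ys), pearson(xs, zs), pearson(ys, zs))


# Made-up numbers for six hypothetical teams (not real NHL data).
shots = [33.1, 31.0, 29.5, 30.2, 27.8, 26.4]      # shots per game
hits = [20, 24, 18, 22, 25, 19]                   # hits per game
win_pct = [0.68, 0.60, 0.55, 0.50, 0.42, 0.38]

plain = pearson(hits, win_pct)
adjusted = partial_corr(hits, win_pct, shots)

print(f"hits vs. winning: {plain:.2f}")
print(f"hits vs. winning, shots removed: {adjusted:.2f}")
```

If the adjusted number is much closer to zero than the plain one, most of the apparent link between hits and winning was really coming through shots.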

Now, I'd love some feedback from the readers. How could this study be improved? What surprises/doesn't surprise you about the results? Did you learn anything?

Stats are almost worthless regarding tomorrow's game, especially this year when they are all screwed up due to a severely uneven schedule. Every item on your chart is part of the game, and teams that rank well in several categories will be ahead in the standings of teams that are weak in several categories. As you properly stated, stats are an 'after measure' and merely describe how a team got where it is. But tomorrow is a different story. In my Wednesday 'Joke of the day' there were 81 faceoffs and 21 penalties, yet 7 even strength goals. Gee, you don't suppose a game like that will falsify the stats: +/-, PK% and PP%?

Then take a close look at Wed. night's games and you will find a couple of games with real flow: CBJ 2/CGY 1 had only 7 minor penalties + 2 over-the-glass minors and only 55 faceoffs. How will that affect the stats? Yet it was a much preferable game to watch. StL gave Det a hard time last night, yet the game had good flow despite 69 faceoffs. That will do something completely different to the stats.

I think maybe we should devote the time to:
Who designed that trapezoid, and why is it backwards? Seems obvious it should be wider at the goal line than at the boards.
Why weren't the faceoff circles moved back when they moved the blue line?

- It would be interesting to see the correlation between fighting and winning this year. In previous years it's usually negative.
- I wonder how age correlates with winning. People often say you need "experience" to win. As far as I know, nobody's ever proven it.
- Shots for and shots against, individually, might not have a strong correlation, but I'd imagine it would be fairly high (>.7) if you looked at shot differential.

In response to your two comments:
1. Using raw numbers would be superior since it would eliminate the problems you mentioned. You'd rank teams by win% and correlate it to each of the other variables (without ranking them on a 1-to-30 scale). However, it's unlikely this would have a major impact on the results.
2. It's likely that many of the variables correlate with each other, which could skew the results. To check this with Excel, paste all the variables in one spreadsheet and go "Tools, Data Analysis, Correlation". I'm not sure how you would adjust the results, but it would still be useful to better understand the behaviour of the underlying data.
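If Excel isn't handy, the same every-variable-against-every-other check can be sketched in a few lines of Python. This is only an illustration of the idea, not a reproduction of the Excel tool: the column names and the numbers for the six hypothetical teams are made up, and `correlation_matrix` is my own helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up raw numbers for six hypothetical teams (not real NHL data).
stats = {
    "win_pct": [0.68, 0.60, 0.55, 0.50, 0.42, 0.38],
    "pp_pct": [21.5, 19.0, 17.2, 18.8, 15.1, 14.0],
    "pk_pct": [86.0, 80.5, 84.1, 79.9, 80.0, 82.3],
    "shots_for": [33.1, 31.0, 29.5, 30.2, 27.8, 26.4],
}

matrix = correlation_matrix(stats)

# Print one row per variable, one column per variable.
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:+.2f}" for value in row.values())
    print(f"{row_name:<10} {cells}")
```

Any pair of stats that shows a large value off the diagonal is a pair that is tangled together, which is the situation point 2 is warning about.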

In response to the previous comment: The stats might not be useful for predicting the results of an individual game, but they could be useful for a GM designing a team. For example this proves that face-offs have almost no relation to winning. This could tell a GM not to overpay for a good face-off man at the deadline.

Exactly! You could see Jason Strudwick score two goals in one game, and think he's a great goal scorer based on that one game. The long term trend would show otherwise.

The next time I do this study, probably at year-end, I'll see if I can add AGE, SV%, and Shot differential. It's easy to get the 1-30 rankings, but more actual work to use the raw data :)
Fights? You'd assume weaker teams have more goons and tend to fight more (when losing, it's easy to get pissed off and wanna go). I don't know how I could get team-by-team fight numbers. NHL does not publish them, for obvious Bettman-related reasons.

well first off, plus/minus is NOT "scoring more goals", it is scoring more even strength goals (well, and shorties too). BIG DIFFERENCE. the sob story of how well a team needs to be on special teams has always been a sob story when it comes to the true measure of a team: even strength

second off, yea you need some correction as some goals are counted three times :0) correlate a shortie first goal.

ya got a shot (yea!) a first goal (yea!) AND a plus. (or 4 if you are doing team total pluses instead of average +/- for a team's players)

dude you just won the game with the first shot! lol

i'd also prefer the two shots merge into only one column called shot differential.

as for other nasties, checking? going to correlate low, and i don't need to see the stats to know that.

"out checking" pah, you can't throw a legal check when you got possession. you ain't got possession? you ain't scoring. remember 100% of all shots not taken never score!

possible other niceties would be puck control factors.

does the team who controls the pucks in their own offensive zone win most times? i'd bet there would be your highest correlation.

if defense does win championships, what would correspond to defensive stats worth measuring?

there ain't none, even fantasy has given up and generally uses "penalty minutes" as a POSITIVE to measure grit and meanness, and make scott stevens worth owning.

Some people appear to be quite strongly against statistical analysis of hockey and make some pretty negative posts after a study is attempted.

It's obvious to me that no study will ever fully understand every nuance of hockey from its statistics. In part this is because much of what happens in a hockey game is not captured in statistics at all. Nevertheless, I see it as a good thing to attempt a study to see what, if anything, can be determined.

Some questions that had been brought up in the blogosphere were addressed here. For example, there was discussion as to whether or not faceoffs are a meaningful statistic. Here Jes attempted to quantify just how meaningful they are.

I definitely think that SV% is something you should look at... much like ERA for pitchers in baseball, the game of hockey really begins and ends with the goalies. I was not surprised that 1st goal was the most highly correlated, as it confirms the optimal strategy of grab a lead, play defense and counter-attack off your opponent's mistakes.. the Lemaire thing.

It might also be interesting to look at Power Play differential/PP% and wins. As in, "Is there a high correlation between teams that score a lot on the PP along with yielding fewer chances to the opposition and winning games."

I stumbled across a really impressive analysis of the RTSS stats here. It's somewhat raw, but it gives an idea how events affect the score over time - something I have yet to see in other statistical analyses.

Regarding faceoffs: everyone likes to point out how irrelevant they are (correlating them with wins, pointing out the difference between the average and the best is small, etc). There are a couple of things that make them slightly more important, IMO:
1) Situational wins - Some faceoffs are worth more than others. Winning the big ones should have a stronger correlation to winning the game than winning, say, the opening faceoff.
2) Setting up FO wins - Related to #1. I once read an article about Yanic Perreault in which he said he would often 'shark' his opponent, giving up unimportant face-offs and setting up a win in an important one. It's conceivable that you could have a losing face-off% and make a significant contribution by winning the right ones. It can be a really interesting chess match within the game.

Cranky Tom Benjamin would say something like "Well, the best teams often score first because they are simply better...Stupid kids, get off of my porch!!" True, but we see scoring first is really far more important to a team's fortunes than any of the other stats. In second place, Power Play success %.

Well, no. Cranky Tom Benjamin would say that scoring any goal will correlate with winning just as well as scoring the first one.

How does scoring the second goal correlate with winning? Scoring the third goal? I think you will find that every goal correlates at about the same rate.
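This is something that could be checked directly with game-by-game scoring logs. A rough sketch, assuming each game is written as a string of goals in the order they were scored ("H" for home, "A" for away) and that tied games are simply skipped. The five games below are invented to show the mechanics, so the output says nothing about the real NHL, and `nth_goal_win_rate` is a hypothetical helper name.

```python
def nth_goal_win_rate(games, n):
    """Share of decided games won by the team that scored the nth goal."""
    wins = 0
    counted = 0
    for goals in games:
        home = goals.count("H")
        away = goals.count("A")
        # Skip games that never had an nth goal, or that ended level.
        if len(goals) < n or home == away:
            continue
        winner = "H" if home > away else "A"
        counted += 1
        if goals[n - 1] == winner:
            wins += 1
    return wins / counted if counted else float("nan")


# Made-up scoring sequences for five hypothetical games.
games = ["HHA", "AHAA", "HAH", "HAA", "HAHH"]

for n in (1, 2, 3):
    rate = nth_goal_win_rate(games, n)
    print(f"team scoring goal #{n} won {rate:.0%} of decided games")
```

Run over a full season of real games, the rates for the first, second and third goals would show whether the first goal is special or whether every goal correlates at about the same rate.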