November 28, 2006

Individual Player Contributions

I've been working on many methods to scale out the line effects, and this one appears to be the most promising at this point. Not to make any comparisons, but David Johnson has created one as well.

Method

I assume players' scores add linearly and that the line effects are primarily caused by pairs (this method can be extended to groups of 3, 4 or even 5). Before going into the details, here is some quick shorthand: I12 = I21 = total ice time player 1 and player 2 spent together; S1 = score (goals/hour) for player 1; G12 = G21 = expected goals (based on shots for/against) while player 1 and player 2 were on the ice together.

Now I assume that I12*(S1+S2)/2 ~ G12, that is to say, the average of the two players' goals per hour multiplied by their shared ice time should be approximately the number of goals for/against. So if one player is very good defensively and the other is terrible defensively, they should be average defensively together.

I know G12, that is to say, I know how many (expected) goals for and against there were for any combination of players, and I also know how much ice time each player has spent with every other player. But I do not know S1 and S2; these are the individual scoring statistics (units: goals/hour). Depending on the team, there are 30 or so players who have played a game with the team, so there are 30!/(28!*2!) = 435 equations with 30 unknowns. Using these variables I can simply calculate the coefficients using a regression (with no constant term). I wrote my own regression code for this matrix, and as such I don't have error details: I don't know how well it performs.
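As a rough illustration of the pair equations above, here is a minimal Python sketch. It is not the regression code described in the post; it swaps in NumPy's least-squares solver, and the function name, the dictionary input layout, and the four-player numbers are all made up for the example. Each pair contributes one row with I12/2 in the columns for its two players and G12 on the right-hand side, with no constant column.

```python
from itertools import combinations

import numpy as np


def solve_player_rates(ice_time, expected_goals, n_players):
    """Least-squares fit of I_ij * (S_i + S_j) / 2 ~ G_ij with no constant term.

    ice_time and expected_goals are dicts keyed by (i, j) with i < j
    (a hypothetical input layout, not taken from the post).
    """
    # Keep only pairs that actually shared ice time.
    pairs = [p for p in combinations(range(n_players), 2)
             if ice_time.get(p, 0.0) > 0.0]
    A = np.zeros((len(pairs), n_players))
    b = np.zeros(len(pairs))
    for row, (i, j) in enumerate(pairs):
        # Each player in the pair gets half of the shared ice time.
        A[row, i] = A[row, j] = ice_time[(i, j)] / 2.0
        b[row] = expected_goals[(i, j)]
    rates, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    return rates, residuals


# Made-up example: 4 players with known rates (goals/hour) and shared hours.
true_rates = np.array([3.0, 2.0, 1.0, 2.5])
hours = {(0, 1): 10.0, (0, 2): 4.0, (0, 3): 1.0,
         (1, 2): 6.0, (1, 3): 3.0, (2, 3): 8.0}
# Expected goals built exactly from the model, so the fit should recover the rates.
goals = {(i, j): h * (true_rates[i] + true_rates[j]) / 2.0
         for (i, j), h in hours.items()}

rates, residuals = solve_player_rates(hours, goals, n_players=4)
print(np.round(rates, 3))
```

Because every row is scaled by the pair's shared ice time, pairs with few minutes together carry little weight in the squared error. When the system is overdetermined and full rank, `np.linalg.lstsq` also returns the sum of squared residuals, which is one simple starting point for the missing error details.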

Benefits:

The benefit is that this algorithm will not alter the actual statistics significantly, so if, for example, one Sedin has 1 extra goal compared to the other, they will still be rated equally.

It also allows significantly different scores for players who spend significant time together, given significant scoring differences, because a lower S1 can be made up by a higher S2.

It doesn't chase low-minute players' statistics, as their coefficients will be small and so they will contribute less to the squared error.

Drawbacks:

The scoring rate solutions (S1, S2) aren't very comparable between teams, and it isn't even fully understandable how the regression arrived at them.

Since S1 and S2 aren't very logical, this leaves me multiplying by ice time to get an approximate "individual plus minus" statistic.

I haven't tested it with past data, and this season doesn't have enough data to make these results anything but full of problems, but I am primarily posting this for reader response, that is to say, for people to criticize or compliment the results so I can see if I should continue this development. I find it interesting to see the offensive numbers (Plus) vs. defensive numbers (Minus). D just calculates the difference between them. I'm just posting the Northwest division to start with.
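To make the conversion from rates to the "individual plus minus" numbers concrete, here is a small Python sketch. It assumes that "ice time" means the player's own total ice time in hours, which the post does not spell out, and the function name and the sample numbers are made up.

```python
def individual_plus_minus(off_rate, def_rate, hours):
    """Turn fitted rates (goals/hour) into Plus, Minus and D.

    Assumes hours is the player's total ice time (an assumption,
    not something stated in the post).
    """
    plus = off_rate * hours    # offensive number (expected goals for)
    minus = def_rate * hours   # defensive number (expected goals against)
    return plus, minus, plus - minus


# Made-up player: 2.5 goals for/hour, 2.0 goals against/hour, 20 hours of ice time.
plus, minus, d = individual_plus_minus(2.5, 2.0, 20.0)
print(plus, minus, d)
```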

3 comments:

1 - It's possible that some pairs may be greater than the sum of their parts (i.e. a set-up guy and a finisher will produce more than two set-up guys). Maybe it's not linear. Linear is nice for its simplicity, though.

2 - Since shots for/against are the main criteria for productivity (understandable, to remove goaltending from the equation), is there a risk that a 'shoot-from-anywhere' type of player might be overstated?

1 - Technically speaking, a nonlinear model is ideal, and I hope to do this in the future. As with all nonlinear math, it's a lot more complicated. Assuming coaches aren't stupid (putting two set-up guys together with no finisher), then this is reasonably accurate, but we all know coaches. Of course, the more combinations of variables you use, the less accurate the resulting values will be (you'll be chasing the random variations). So I'll have to try and find a balance...

2 - Yes. My best example here is Josh Green [top ranked offensively, but no goals this year?]. This guy keeps taking shots and none go in. The problem here is telling what is random variation and who can't score. The shot quality stuff scales out a lot of problems, but it can't make up for players who simply can't score...

Is Rycroft really the best player on Colorado? Is Iginla really the second worst player on Calgary? Is Hamrlik the 4th worst and Phaneuf the 6th worst? Not sure about that.

I briefly looked at using shots in my rankings but seemed to get worse results so I am not sure how good they are. Maybe if you incorporated your shot quality stuff into the process the results will be better.

Another thing I would really try to do is make players on different teams comparable. One of the biggest problems with +/- is you cannot compare players across teams because +/- is so team dependent. Producing a single number that can do a decent job in comparing players on different teams who play with and against different players and in different situations was the main goal in creating my rankings.