Weighing Fantasy Stats

In lots of leagues winning one category is as good as winning any other, but you gotta know some are more equal than others. Imagine bored as hell Marty McFly slips you back next year’s NHL stats. After you call your bookie (is that how that works?) you start thinking about your fantasy team, cuz you’re a weirdo. You’ve got the stats, but how do you balance them?

Pairwise Comparison

Fil Salustri’s design site has an introduction to pairwise comparison (PWC), but tl;dr: you compare everything to everything else to establish relative importance. Then you use linear proportions (or whatever you want) to establish weights. If you go that route you should also add weight to the bottom-ranked value, since it’ll be zero unless there’s a tie, and if any criteria are actually valueless they should be omitted.¹

A worked example of pairwise comparison. Criterion 5 should be given a small boost since by virtue of inclusion it isn’t useless.

We’re only trying to get reasonable weights that outperform gut feelings. Pairwise comparison is great because it only needs ordinal, qualitative information to produce reasonable results. For n criteria with no ties, linear proportionality gives a max weight of 2/n (the top criterion wins n − 1 of the n(n − 1)/2 matchups), which is probably low for a lot of applications, but not unreasonable. I abuse the system by stringing several comparisons together, but I think the principle holds.
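Here is a minimal sketch of that bookkeeping, not the spreadsheet I actually used. The function name `pairwise_weights`, the `bonus` parameter, and the five placeholder criteria with their scores are all made up for illustration. Each matchup goes to whichever criterion has the higher score, wins are tallied, and the tallies are normalized into weights.

```python
from itertools import combinations


def pairwise_weights(scores, bonus=0.0):
    """Turn head-to-head matchups into linear-proportion weights.

    scores: criterion name -> number that settles a matchup (higher wins).
            No ties: an exact tie goes to whichever name was listed first.
    bonus:  points given to every criterion before normalizing, so the
            last-place criterion does not end up at exactly zero.
    """
    wins = {name: bonus for name in scores}
    for a, b in combinations(scores, 2):
        winner = a if scores[a] >= scores[b] else b
        wins[winner] += 1
    total = sum(wins.values())
    return {name: points / total for name, points in wins.items()}


# Made-up scores for five placeholder criteria (hypothetical data).
example_scores = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3, "E": 0.1}

plain = pairwise_weights(example_scores)             # top weight is 2/n = 0.4, E gets 0
boosted = pairwise_weights(example_scores, bonus=1)  # E now gets 1/15
```

With five criteria the plain version tops out at 2/5 and leaves the last-place criterion at zero. The `bonus=1` version is one way to give the bottom criterion the small boost mentioned above.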

One decision I made was to not allow ties. Since I used R² relationships to decide “matchups” throughout the project, I settled on a single, universal bias toward differentiation, rather than making equally questionable calls about what is and isn’t a tie.

System diagram for the project

Criteria

Three major points of differentiation stand out in fantasy stats: cost, predictability, and how they translate into wins.

Cost

Certain stats are harder to come by. They’re the big numbers people look for in the draft and you won’t find them on the waiver wire. Even if we found that penalties are key to winning fantasy leagues we would still pursue stats that are harder to come by, since we might not get a chance later. Accounting for cost also approximates trade value, and incorporates the wisdom of the crowd, which shouldn’t be ignored.

Note that this is a separate problem from preventing premature reaching for undervalued players. Weights should be applied before a reach-preventing measure is implemented. Weights fall more into player valuation than draft strategy.

We can sidestep the problem of modelling cost because pairwise comparison gives reasonable weights with qualitative data only. Instead, I quantified cost by checking the R² of publicly available draft ranks, using last year’s stats as independent variables.
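As a rough sketch of the R² step, assuming each stat is regressed against draft rank on its own (a single-variable fit, where R² is just the squared correlation). The helper `r_squared`, the stat names, and every number below are invented for illustration; they are not the NHL.com data.

```python
def r_squared(xs, ys):
    """R² of a one-variable least-squares fit (the squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented numbers for six hypothetical players: draft rank and last year's stats.
draft_rank = [1, 2, 3, 4, 5, 6]
last_year = {
    "ppp": [40, 35, 31, 24, 20, 16],
    "hits": [60, 150, 45, 120, 90, 30],
}

# One R² per stat: how well that stat alone explains where players were drafted.
cost_r2 = {stat: r_squared(values, draft_rank) for stat, values in last_year.items()}

# Most "expensive" stat first; this ordering is all pairwise comparison needs.
ranking = sorted(cost_r2, key=cost_r2.get, reverse=True)
```

Only the ordering feeds the matchups, which is why noisy rankings are tolerable here: a stat just has to beat another stat, not hit a particular R² value.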

The column of alternate weights is proportional to the R² of each stat. I would be comfortable using that; I just don’t like the sensitivity to the quality of the data. That’s part of the reason I used NHL.com rankings instead of a more popular source like Dobber (who only posts public prospect and keeper rankings anyway): for PWC it shouldn’t matter much, since we only need relative results. The differences aren’t that big, and once we include all the factors in our final weights they should be negligible. There’s no concrete justification for going through the extra steps of PWC, but I use it as an exercise.

Goals are lower than your gut might say, but last year over 80 players put 20 to 30 pucks in the net, which puts you a lot less than a goal per week behind Steve Stamkos and Sidney Crosby. For goalies, GAA sitting above wins is surprising, but it’s a valuable insight into how goalies (*cough*Jonathan Quick) are valued. The rest passes the sniff test: powerplay points are clustered among the quickest-to-go players, and peripherals like +/-, PIM, and hits are cheap.

Win Translation

Variance makes sports interesting. The better team doesn’t win every time. Predicting a fantasy category win is a lot easier. The team that scores more goals over the season still won’t win every week, but some stats are more reliable than others. I’m not 100% on what to call this characteristic, but I think it’s the most promising opportunity for exploiting market inefficiencies.

I pulled a small sample (two 10-team leagues) of fantasy team stats and category records and grabbed R² values again.

No shock, high-event categories are more predictive. Goalie wins are the notable exception, because wins are limited by the number of starts; a dark horse can’t come up and win 6 games in a week. The other three goalie stats are weak indicators of category wins, which is hard to square with the cost we saw GAA carry, while cheap-as-dirt hits and penalty minutes are worthwhile. +1 for Matt Martin.

To over-explain: I’m not saying draft for hits. I’m saying you should account for week-to-week variance when building a model, and that’s a criterion hits does really well on.

Predictability

So far we’ve assumed perfect data, via Time Bandits or whatever, but that isn’t what we have to work with. I pulled up my league-wide projections from last year (so those R² values are embarrassing). This is a spot where PWC excels: I’m not trying to pull out how good I am at predicting any specific stat; I want to know how hard different stats are to predict in general, irrespective of my model’s success or failure.

No surprise seeing +/- at the bottom 2 and powerplay points at the top. Again, GAA is an expensive stat to pursue, with unsure outcomes. The personal shock: I predicted assists a hair better than shots, since I always thought shots were easiest. This may have been an outlier year, but this is the empirical data I’ve got.

Weighing the Attributes

Here’s where I start stretching pairwise comparison: weighing weights. Now that we have an idea of predictability, translation, and cost, how important is each of those factors? And how about the fact every category ends up being worth the same? I’ve measured each stat on three axes; now to weigh those axes.

I believe predictability is the most important quality; if you can’t predict a stat you can’t make good decisions about it. Win translation is more important than cost, because wins are what we’re looking for: draft strategies and transactions are subservient to that. And, since all categories are scored equally, we should prioritize difficult-to-obtain ones.

I added a second column, adding an extra point to each attribute and normalizing to get a more uniform set of weights. I had an iteration where I got 0.35, 0.30, 0.25 and 0.10. At some point I lost that breakdown and I haven’t been able to reconstruct my thought process, but those numbers don’t seem unreasonable.
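To make the "weighing weights" step concrete, here is a hedged sketch. It assumes four attributes in a strict order (predictability, win translation, cost, then the equal scoring of categories), which is my reading of the paragraphs above rather than a recovered table. It also assumes the final weight of a stat is a weighted sum across attributes. The per-stat numbers and the names `normalize` and `combine` are invented.

```python
def normalize(points, bonus=0):
    """Scale matchup points (plus an optional bonus each) so they sum to 1."""
    total = sum(p + bonus for p in points.values())
    return {name: (p + bonus) / total for name, p in points.items()}


# Matchup wins per attribute under a strict ranking with no ties (assumed order).
attribute_wins = {"predictability": 3, "win_translation": 2, "cost": 1, "equal_scoring": 0}

raw = normalize(attribute_wins)                 # 0.5, 0.333, 0.167, 0.0
smoothed = normalize(attribute_wins, bonus=1)   # 0.4, 0.3, 0.2, 0.1


def combine(axis_weights, stat_weights_by_axis):
    """Final weight per stat: sum over axes of (axis weight x stat weight on that axis)."""
    stats = next(iter(stat_weights_by_axis.values())).keys()
    return {
        stat: sum(axis_weights[axis] * stat_weights_by_axis[axis][stat] for axis in axis_weights)
        for stat in stats
    }


# Invented per-stat weights on each axis; each row sums to 1.
stat_weights = {
    "predictability": {"assists": 0.5, "hits": 0.3, "plus_minus": 0.2},
    "win_translation": {"assists": 0.3, "hits": 0.5, "plus_minus": 0.2},
    "cost": {"assists": 0.6, "hits": 0.1, "plus_minus": 0.3},
    "equal_scoring": {"assists": 1 / 3, "hits": 1 / 3, "plus_minus": 1 / 3},
}

final = combine(smoothed, stat_weights)
```

Because every axis and the attribute weights each sum to 1, the combined weights also sum to 1, so they can be compared directly across stats.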

Position Adjustment and Results

It’s alarming seeing hits and penalty minutes so high, but it’s hard to justify why you would prefer goals: they’re harder to predict, don’t translate as well to wins, and aren’t worth any extra. Meanwhile, my good friend who only picks plus players is handicapping himself badly.

There is one major factor that hasn’t been accounted for yet: balance between goalies and skaters. This analysis has been position-agnostic, but these raw weights put Braden Holtby in the same bin as Nick Bjugstad and Mike Green. A typical fantasy team will have about five times as many skaters as goalies, contributing to only a couple more categories. A reasonable correction is to divide the categories by roster spots. If 6 skater categories are spread across 10 lineup spots (0.6 categories per spot), and 4 goalie categories are contributed by 2 guys (2 per spot), a goalie stat is worth 2 / 0.6 = 3.33 skater stats. That number changes with a league’s roster rules, but is easy to adjust and justify.
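That correction is simple enough to sketch for any roster setup. The function name `goalie_multiplier` is made up, and the inputs are the example league from the paragraph above.

```python
def goalie_multiplier(skater_cats, skater_slots, goalie_cats, goalie_slots):
    """How many skater stats one goalie stat is worth, by categories per roster spot."""
    skater_share = skater_cats / skater_slots   # categories carried per skater spot
    goalie_share = goalie_cats / goalie_slots   # categories carried per goalie spot
    return goalie_share / skater_share


# 6 skater categories over 10 spots, 4 goalie categories over 2 spots.
multiplier = goalie_multiplier(skater_cats=6, skater_slots=10, goalie_cats=4, goalie_slots=2)
print(round(multiplier, 2))  # 3.33
```

Swap in your own league’s category and lineup counts to get the multiplier for your rules.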

I took last year’s stats, converted them to standard deviations from the mean, and applied the weights. This is purely dummy data to look at the split between goalies and skaters. Goalies are ranked a lot higher in this model than is usually accepted, but I don’t think it’s worrisome, because my dummy data has disadvantages versus real draft rankings:

1. It would be unreasonable to project a realistic spread of goalie outcomes: save percentages are so unstable over a season that expecting any goalie to fly above the pack is ridiculous. You can make educated guesses based on career rates, but any model that expects 48 wins from Braden Holtby and .926 from Ben Bishop is over-confident, and don’t forget about Carey Price’s knee lubricating P.K. Subban’s transition to Nashville.
2. There are no adjustments for replacement level. Raw output is less important than marginal improvement over more easily obtained players.
3. There is no mechanism to prevent reaching. There are lots of simple-to-implement strategies to avoid picking a player who’ll be around on your next turn. None of those are in effect here.
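For reference, the dummy-data scoring (stats converted to standard deviations, then weighted) can be sketched like this. It assumes population standard deviation and a sign flip for lower-is-better stats like GAA; both are my assumptions for the sketch. The players, stats, and weights below are invented, and none of the three caveats above are addressed.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as standard deviations from the group mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def weighted_scores(players, stats, weights, lower_is_better=()):
    """Sum weight x z-score over every stat; flip the sign where lower is better."""
    totals = [0.0] * len(players)
    for stat, values in stats.items():
        sign = -1.0 if stat in lower_is_better else 1.0
        for i, z in enumerate(z_scores(values)):
            totals[i] += sign * weights[stat] * z
    return dict(zip(players, totals))


# Invented stat lines for three hypothetical skaters, with invented weights.
players = ["Skater A", "Skater B", "Skater C"]
stats = {"assists": [50, 30, 10], "hits": [20, 100, 60]}
weights = {"assists": 0.6, "hits": 0.4}

scores = weighted_scores(players, stats, weights)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the hitter with middling assists (Skater B) comes out ahead of the pure playmaker (Skater A), the same kind of result that makes the real lists look strange at first glance.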

These lists don’t look too close to Yahoo’s end-of-year Season Total rankings, but they do resemble Yahoo’s MVP list, which captures players who have a disproportionate presence on top public league teams. That list is full of the expected breakout stars (Shayne Gostisbehere, Artemi Panarin, Evgeny Kuznetsov), but puts Brent Burns at #1, ranks several goalies much higher than the Season Total rankings, and includes Oliver Ekman-Larsson, whom the Season Total rankings put all the way down at 29th. That’s a weak indicator that the model is tapping into something.

Conclusions

Not all stats are equal, even if every category is worth the same number of points. Pairwise comparison is a decent approach to systematic weights. Your confidence in your projections should form the lion’s share of the weight, followed by how the stat translates to category wins, and then by the consensus desirability. Win translation offers the most promise of exploiting inefficiencies.

Assists, powerplay points, and shots on goal do well across these metrics and you should aggressively pursue them. The three common non-sum stats (goals against average, save percentage, and plus-minus) perform very poorly, but once the divide between skaters and goalies is taken into account, goalie stats, especially goalie wins and shutouts, are much more valuable. Goals are less desirable on their own than grit peripherals, but a GM seeking powerplay points and shots will bring in goals without trying.

Although the specific weights I found aren’t directly applicable to most leagues, this process should give useful weights in any category-based fantasy league.

¹ Since I am weighing on multiple axes I didn’t bother with this step: letting an attribute score zero on one axis is less of a sin, since its importance will show up on the other axes.