Introduction

In my last article I looked at prediction limits for machine learning in sports. More specifically, I asked how much of the standings is due to luck (aka random chance, a stochastic process, etc.). Using classical test theory, I took the variance of the observed win percentage over the seven seasons from 2005-2006 to 2011-2012 and compared it to a theoretical league where team talent is normally distributed. From this we were able to conclude that luck explains ~38% of the variance in the standings.
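As a rough cross-check of that ~38% figure, here is a minimal sketch. It assumes an 82-game season, treats pure luck as a fair coin flip per game, and uses the observed SD of 0.09 quoted in the Data & Methodology section. The function name is made up for illustration, and the previous article's exact calculation may differ.

```python
def luck_share_of_variance(observed_sd, n_games=82):
    """Fraction of win % variance expected from pure coin-flip luck."""
    # Binomial variance of a win % when every game is a 50/50 flip.
    luck_variance = 0.5 * 0.5 / n_games
    return luck_variance / observed_sd ** 2

print(round(luck_share_of_variance(0.09), 2))  # 0.38
```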

This is interesting, and much higher than one might initially think, but it makes sense, and I will discuss this further on. My area of research is Machine Learning, specifically using Machine Learning to make predictions in hockey, so the question I want to answer today is: is there a theoretical limit to the predictions we can make in hockey?

Background

This subject is not brand new; it was looked at before by an author who wanted to know the theoretical limit for predictions in the NFL. He found that "The actual observed distribution of win-loss records in the NFL is indistinguishable from a league in which 52.5% of the games are decided at random and not by the comparative strength of each opponent." I use a similar method to that author's to calculate the prediction limit in the NHL.

Data & Methodology

Using the work from the first part of this, I have an observed standard deviation (SD) of win % of 0.09. This comes from the win-loss records of all teams between the 2005-2006 and 2011-2012 seasons, for a total of seven seasons. I also looked at the SD of an "all skill" league, where the better team always wins, and an "all luck" league, where each game is a 50/50 chance of winning. To determine these I used a 10,000-iteration Monte Carlo method. You can see the code for the Monte Carlo here. On each iteration each team was given a random strength and a full schedule was run. After all the iterations, the SD of win % was 0.3 for the "all skill" league and 0.053 for the "all luck" league. I used an F-Test to compare how similar they are to the observed league and got p = 0.02 for the "all luck" league and p = 4.8 x 10^-16 for the "all skill" league. I graphed them below to give a visualization.
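The two extreme leagues can be sketched in a few lines of Python. This is a minimal illustration, not the code linked above. It assumes 30 teams, 82 games each, uniform random strengths redrawn every season, and random pairings each round in place of the real NHL schedule. The function name and the 200-season default are also assumptions, chosen to keep the run short (the article used 10,000 iterations).

```python
import random
import statistics

def simulate_league_sd(mode, n_seasons=200, n_teams=30, n_games=82, seed=0):
    """Pooled SD of win % over simulated seasons.

    mode="skill": the stronger team always wins.
    mode="luck":  every game is a fair coin flip.
    n_teams must be even so every team plays in every round.
    """
    rng = random.Random(seed)
    win_pcts = []
    for _ in range(n_seasons):
        # Assumption: strengths are uniform random draws, redrawn each season.
        strength = [rng.random() for _ in range(n_teams)]
        wins = [0] * n_teams
        teams = list(range(n_teams))
        for _ in range(n_games):
            # Assumption: random pairings each round stand in for the schedule.
            rng.shuffle(teams)
            for a, b in zip(teams[0::2], teams[1::2]):
                if mode == "luck":
                    winner = a if rng.random() < 0.5 else b
                else:
                    winner = a if strength[a] > strength[b] else b
                wins[winner] += 1
        win_pcts.extend(w / n_games for w in wins)
    return statistics.pstdev(win_pcts)

if __name__ == "__main__":
    print("all skill:", round(simulate_league_sd("skill"), 3))
    print("all luck: ", round(simulate_league_sd("luck"), 3))
```

Under these assumptions the two SDs should land near the 0.3 and 0.053 reported above, though the exact values depend on the seed and the number of seasons.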

They are not close, but the "all luck" league appears to be more similar than the "all skill" league. To pin this down, I modified the Monte Carlo and tried varying degrees of luck and skill (e.g. 10% luck, 90% skill). The rule used was: if rand() < luck, then each team has a 50% chance of winning; else the better team wins. I kept trying various percentages until I found the one closest to the observed NHL. I tried 50% skill, 50% luck; 25% skill, 75% luck; 23% skill, 77% luck; and 24% skill, 76% luck. Their SDs and F-Test p-values, respectively, are: 0.1584 (p=0.002), 0.0923 (p=0.908), 0.0874 (p=0.894) and 0.0898 (p=0.992). I've graphed the distributions below:
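The mixing rule and the search over luck percentages can be sketched as follows. Again this is an illustration under assumptions, not the original code: 30 teams, 82 games, uniform random strengths, and random pairings each round. The helper names (play_game, league_sd, closest_luck) are hypothetical. It picks the closest candidate by absolute SD difference rather than by the F-Test, to stay within the standard library.

```python
import random
import statistics

OBSERVED_SD = 0.09  # observed NHL SD of win % quoted in the article

def play_game(a, b, strength, luck, rng):
    """Apply the rule to one game: coin flip with probability `luck`."""
    if rng.random() < luck:
        return a if rng.random() < 0.5 else b      # decided at random
    return a if strength[a] > strength[b] else b   # better team wins

def league_sd(luck, n_seasons=200, n_teams=30, n_games=82, seed=0):
    """Pooled SD of win % for a league with the given luck fraction."""
    rng = random.Random(seed)
    win_pcts = []
    for _ in range(n_seasons):
        strength = [rng.random() for _ in range(n_teams)]
        wins = [0] * n_teams
        teams = list(range(n_teams))
        for _ in range(n_games):
            rng.shuffle(teams)  # assumption: random pairings each round
            for a, b in zip(teams[0::2], teams[1::2]):
                wins[play_game(a, b, strength, luck, rng)] += 1
        win_pcts.extend(w / n_games for w in wins)
    return statistics.pstdev(win_pcts)

def closest_luck(candidates, target=OBSERVED_SD, **kwargs):
    """Return the candidate whose simulated SD is nearest the target."""
    sds = {luck: league_sd(luck, **kwargs) for luck in candidates}
    best = min(sds, key=lambda luck: abs(sds[luck] - target))
    return best, sds

if __name__ == "__main__":
    best, sds = closest_luck([0.50, 0.75, 0.76, 0.77])
    for luck, sd in sds.items():
        print(f"luck={luck:.2f}  SD={sd:.4f}")
    print("closest to observed:", best)
```

With only a few hundred seasons the neighbouring candidates (75%, 76%, 77%) are hard to tell apart, which is one reason to run many more iterations before trusting the winner.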

Results

After running the Monte Carlo simulation for these leagues and comparing them to the observed league, it appears that the 24% skill league is the closest to the observed NHL. To put this in the same words as the original author, to avoid confusion and misinterpretation:

The actual observed distribution of win-loss records in the NHL is indistinguishable from a league in which 76% of the games are decided at random and not by the comparative strength of each opponent.

To relate this to theoretical limits in prediction: 24% of outcomes are determined by the better team, and 76% by luck. When luck decides a game, the better team still wins half the time. This suggests the theoretical limit in prediction for hockey is 24% + (76%/2) = 62%.
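Written generally, with s standing for the share of games decided by skill (the symbol is introduced here only for illustration), the same arithmetic reads:

```latex
\[
P_{\max} = s + \frac{1 - s}{2} = \frac{1 + s}{2},
\qquad
s = 0.24 \;\Rightarrow\; P_{\max} = 0.24 + \frac{0.76}{2} = 0.62
\]
```

This assumes a predictor that always picks the better team: it is right in every skill-decided game and in half of the luck-decided ones.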

Discussion & Conclusion

62% seems low but it makes sense. A hockey game has very few scoring events (goals) (unless you're watching a Timbit league). A single goal can make all the difference between winning a game and losing it. In my own Machine Learning experiments the best I have gotten is 59.3%, which is close to the theoretical limit. I haven't seen machine learning applied to hockey before, so I have nothing to compare it to.

The original author gets results of approximately 76%, which falls in line with the football predictions in machine learning. I would hypothesize that basketball has an even higher limit, given the large number of events in a single game (200-250 points a game), and that machine learning can predict basketball with accuracy in the 80% range. Tennis also has a large number of events in a game, so I would assume a similar limit. Soccer would be interesting to look at too.

Credits

Thanks to the people who helped me with this: Michael Guerault, Adam Kubaryk and Patrick D.

Previous Work

I am a Van Fan in Bytown. Living in Ottawa for work, I research Sports Analytics and Machine Learning at the University of Ottawa. I play hockey as well as a timbit but I compete in rowing with hopes of 2016 Olympic Gold. Follow me on twitter at @joshweissbock and feel free to send any questions or comments my way.

Just to be clear, a specific team in a specific season may not be governed by the 24% skill / 76% luck split but by some other value, so some teams may have higher talent or be better skilled in any given year. However, the league-wide distribution over the long term indicates this is the average.

The question is how do we map this finding onto a specific team in a specific year?

Interesting thought... I wonder if luck plays a lesser role when the relative strength of teams is further apart? Like if one were to run a similar analysis of playoff game outcomes, would luck be far less of a factor in 1 vs 8 matchups than in 4 vs 5?

Two additional points... The amount of luck is higher in the regular season thanks to Gary's goofy loser point, as studies have shown shootouts are essentially random. But luck in the playoffs increases due to the closer spread of skill... it would be interesting to see what the top limit is for playoffs. My guess: ~56%... getting close to a flip of the coin.

You ever read some of Tom Tango's stuff on this? He identified tennis and basketball as being highly predictable. He suggested that the NBA should consider either shortening games or reducing the number of games in a regular season to give chance a bigger role, IIRC - the idea being that nobody cares about a league in which you can be pretty certain about who will win. Women's tennis suffers from this acutely - if they expanded matches to best of five sets, you'd lose a ton of randomness and make it incredibly predictable.

It's a kind of cool question from a structural perspective - what's the optimal quantity of chance and skill from an entertainment perspective?

The problem that you and others make in these kinds of analyses is to assume away the impact of home ice (or field or court) advantage. In leagues with high parity, a fair amount of what you assign as luck will be no more than a home team advantage.

Let's say, for instance, that home ice advantage is five goals for the home team (just trying to keep the math simple), and that you have 4 teams in a league. Each team's "true skill" level is defined by how many goals they would beat an average opponent by on neutral ice (negative means they would lose).

A 1
B 2
C -1
D -2

So, if results are entirely deterministic via skill and home ice with no luck involved, if A plays B on A's home ice, then the result is 1 - 2 + 5 = 4. So A wins by 4 goals despite having lower skill than B. But this wasn't lucky, it was deterministic.

If each team played the other teams both home and away, and the results were based ENTIRELY on skill and home ice advantage, then each team would have a 3-3 record (winning all home games and losing all road games). Without considering home ice advantage in your analysis, you would draw the conclusion that outcomes are determined entirely by luck because everyone's record is .500, even though they are completely deterministic.
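The four-team example above can be checked with a short sketch. The skill values and the five-goal home edge come from the example; the names HOME_EDGE, margin and season_records are made up for illustration.

```python
from itertools import permutations

HOME_EDGE = 5  # home ice advantage in goals, as in the example
SKILL = {"A": 1, "B": 2, "C": -1, "D": -2}

def margin(home, away):
    """Goal margin for the home team with no luck involved."""
    return SKILL[home] - SKILL[away] + HOME_EDGE

def season_records():
    """Each ordered pair (home, away) is one game; returns wins and losses."""
    records = {team: [0, 0] for team in SKILL}
    for home, away in permutations(SKILL, 2):
        # A positive margin is a home win; anything else goes to the visitor.
        winner, loser = (home, away) if margin(home, away) > 0 else (away, home)
        records[winner][0] += 1
        records[loser][1] += 1
    return records

if __name__ == "__main__":
    print("A hosts B, margin:", margin("A", "B"))
    for team, (w, l) in season_records().items():
        print(team, f"{w}-{l}")
```

Every team finishes 3-3 here because the five-goal home edge is larger than the biggest skill gap (four goals between B and D), so the home side always wins.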

There's clearly a big role of luck in hockey, but this type of math always overestimates it.

Great work!... Could you clarify?... How come the variance explained by luck is only 38%? Why isn't it 38% + 1/2 of 62%, or 69%...? In other words, don't luck events explain 2/3 of outcomes when you take into account the times the better-skilled team wins because of luck? Thx..