Every play is a data point!

Bayesian player evaluation

In sports, we often need to evaluate players with limited information. Should we sign a QB for whom we have observed only 30 pass attempts? What is our best estimate of the probability distribution of his yards per attempt? What about a kicker who has attempted only 5 field goals of 50+ yards? How can we decide whether to sign him? To answer questions like these, we can rely on one of the most important theorems in probability theory, namely Bayes' theorem. Bayes' theorem essentially operates the same way the scientific method does. We begin with a prior belief about a hypothesis and then, as we collect more evidence (i.e., data), we update our belief in this hypothesis.

Let us assume that we are evaluating a new kicker in practice. We ask the kicker to attempt 20 field goals from 50 yards. He makes 16 of them. What can we say about his success rate at 50-yard field goals?

To get a good estimate of the probability distribution of the kicker's success rate σ, we will use Bayes' theorem:

$$\pi(\sigma \mid \text{data}) = \frac{f(\text{data} \mid \sigma)\,\pi(\sigma)}{f(\text{data})}$$

In the above equation, π(σ) is the prior probability distribution for the success rate of the kicker, while π(σ|data) is the posterior distribution we estimate after taking into consideration the data we observed (in our case the 16/20 FGs). f(data|σ) is the likelihood of observing the data given the success rate σ. Finally, f(data) is the total probability of observing the data, averaged over all possible success rates:

$$f(\text{data}) = \int_0^1 f(\text{data} \mid \sigma)\,\pi(\sigma)\,d\sigma$$

What prior distribution can we use? We can simply look at all NFL kickers and use the distribution of their collective success rate on 50-yard FGs. The average success rate is around 70%. Some kickers are exceptional and way above average on 50-yard FGs (e.g., Justin Tucker), while the majority of kickers are around average. Therefore, one could use a Beta distribution for the prior π(σ), with an average of about 0.7. Given that the average of a Beta distribution is α/(α+β), where α and β are the distribution parameters, we choose α=5 and β=2, which gives a prior mean of 5/7 ≈ 0.71. This gives us a Beta(5, 2) prior distribution.
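To make the prior concrete, here is a minimal sketch that evaluates the Beta(5, 2) density and integrates it numerically. It uses only the Python standard library; the helper names `beta_pdf` and `prob_between` are made up for this illustration, and the midpoint rule is just one simple way to approximate the area under the curve.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def prob_between(lo, hi, a, b, steps=20000):
    """Midpoint-rule estimate of P(lo <= sigma <= hi) under Beta(a, b)."""
    width = (hi - lo) / steps
    return sum(beta_pdf(lo + (i + 0.5) * width, a, b) for i in range(steps)) * width

alpha, beta = 5, 2                      # prior parameters chosen in the text
prior_mean = alpha / (alpha + beta)     # mean of a Beta distribution
prior_tail = prob_between(0.8, 1.0, alpha, beta)

print(f"prior mean: {prior_mean:.3f}")
print(f"P(sigma >= 0.8) under the prior: {prior_tail:.3f}")
```

The tail probability comes out at roughly 0.34, i.e., about a third of the prior mass sits at a success rate of 80% or better.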

The next element we have to calculate is the likelihood function f(data|σ). Simply put, we need to calculate the likelihood of observing 16 successful kicks and 4 misses, given the success rate σ. This is nothing more than the binomial distribution:

$$f(\text{data} \mid \sigma) = \binom{20}{16}\,\sigma^{16}\,(1-\sigma)^{4}$$

Finally, the total probability of observing the data is:

$$f(\text{data}) = \int_0^1 \binom{20}{16}\,\sigma^{16}\,(1-\sigma)^{4}\,\pi(\sigma)\,d\sigma$$

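Combining all of these, we obtain the posterior probability function for the success rate σ of our kicker at 50-yard FGs. Because the Beta(5, 2) prior is proportional to σ^4(1-σ) and the likelihood is proportional to σ^16(1-σ)^4, their product is proportional to σ^20(1-σ)^5, so the posterior is again a Beta distribution, now with α = 5 + 16 = 21 and β = 2 + 4 = 6:

$$\pi(\sigma \mid \text{data}) \propto \sigma^{20}\,(1-\sigma)^{5}$$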
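The update can also be carried out numerically, without relying on any closed form. The sketch below applies Bayes' theorem on a grid of success rates: it multiplies prior by likelihood, approximates f(data) with a midpoint sum, and normalizes. It then compares the result with a Beta(21, 6) density, which is what the standard Beta-binomial conjugacy result predicts for a Beta(5, 2) prior and 16 makes in 20 attempts. The grid size and the helper names are choices made for this illustration only.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def binom_likelihood(sigma, makes, attempts):
    """Probability of `makes` successes in `attempts` kicks at success rate sigma."""
    misses = attempts - makes
    return math.comb(attempts, makes) * sigma**makes * (1 - sigma)**misses

steps = 20000
width = 1.0 / steps
grid = [(i + 0.5) * width for i in range(steps)]   # midpoints, avoids 0 and 1

# Numerator of Bayes' theorem: likelihood times prior at each grid point.
unnormalized = [binom_likelihood(s, 16, 20) * beta_pdf(s, 5, 2) for s in grid]

# Denominator f(data): the integral of the numerator over sigma.
evidence = sum(unnormalized) * width
posterior = [u / evidence for u in unnormalized]

# Area under the posterior between 0.8 and 1.
tail = sum(p for s, p in zip(grid, posterior) if s >= 0.8) * width

# Largest pointwise difference from the conjugate Beta(21, 6) density.
max_gap = max(abs(p - beta_pdf(s, 21, 6)) for s, p in zip(grid, posterior))

print(f"P(sigma >= 0.8 | data): {tail:.3f}")
print(f"max difference from Beta(21, 6): {max_gap:.2e}")
```

The grid approach is slower than the closed form, but it works unchanged if the prior is swapped for something that is not a Beta distribution.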

There is less uncertainty associated with this posterior distribution than with the prior, since we now have some data to support it. For example, by calculating the area under the posterior distribution between σ = 0.8 and σ = 1, we find that the kicker has a 42% chance of being an 80% or better kicker at 50-yard field goals. A generic NFL kicker (i.e., one drawn from the prior distribution) has only a 34% probability of being an 80% or better kicker at 50-yard field goals. With more data we can further update our beliefs. For example, if we give the kicker another 30 attempts and he makes 25 of them, our updated posterior distribution for the success rate is a Beta distribution with α = 5 + 16 + 25 = 46 and β = 2 + 4 + 5 = 11.

Using this posterior distribution we now can say that our kicker has a 58% probability of being an 80% or better kicker at 50-yard FGs.
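The sequence of updates can be sketched in a few lines. The code below assumes the Beta(5, 2) prior from above and uses the standard conjugate update for a Beta prior with binomial data (add makes to α and misses to β); the function names are hypothetical, and the tail probability is approximated with a simple midpoint sum.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def prob_at_least(threshold, a, b, steps=20000):
    """Midpoint-rule estimate of P(sigma >= threshold) under Beta(a, b)."""
    width = (1.0 - threshold) / steps
    total = sum(beta_pdf(threshold + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width

def update(a, b, makes, attempts):
    """Conjugate Beta-binomial update: makes go to alpha, misses go to beta."""
    return a + makes, b + (attempts - makes)

a, b = 5, 2                                  # prior from the text
history = [("prior", a, b, prob_at_least(0.8, a, b))]

# Each batch is (makes, attempts): first 16/20, then 25/30.
for makes, attempts in [(16, 20), (25, 30)]:
    a, b = update(a, b, makes, attempts)
    history.append((f"after {makes}/{attempts}", a, b, prob_at_least(0.8, a, b)))

for label, a_i, b_i, tail in history:
    print(f"{label:>12}: Beta({a_i}, {b_i}), P(sigma >= 0.8) = {tail:.2f}")
```

Running it walks through Beta(5, 2), Beta(21, 6), and Beta(46, 11), with the probability of being an 80% or better kicker rising at each step, in line with the 34%, 42%, and 58% figures quoted in the text.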

It should be evident that Bayes' theorem is a very powerful tool that allows us to make probabilistic inferences and update them with every new data point we obtain. Brian Burke of ESPN has used a similar analysis to find that Garoppolo has an edge over rookie QBs. In particular, there is a 64% chance that Garoppolo is better than a generic first-round QB.