
A Model for Football Pass Prediction (source code + dataset)

In this blog post, I will discuss the data challenge of the Machine Learning for Sport Analytics workshop (MLSA 2018) at PKDD 2018. The challenge consisted of predicting the receivers of football passes (pass prediction).

I will first briefly describe the data and then give an overview of my model, called FPP (Football Pass Predictor), which was accepted as a paper at the workshop.

The dataset

The football pass prediction dataset consists of records describing thousands of football passes made during fifteen matches played by a Belgian team against other teams. Each record is a football pass. It gives the X, Y positions of the 14 players of each team (not all of them are on the field at any given time), the timestamps at which the pass started and ended, and the player who sent the pass and the one who received it. One limitation of the data is that the records of all fifteen matches are shuffled, so a pass cannot be analyzed within its context in the overall game. Besides, it is unclear whether the X, Y positions were recorded when the pass started or when it ended. The names of the teams and players are also not provided, nor is whether a team is playing on the left or right side of the field (although this information can be inferred from player positions).
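To make the record layout concrete, here is a minimal Java sketch of how one pass record could be held in memory, together with a rough way to infer which side of the field a team plays on. The class name PassRecord, its fields, the zero-based player indices (0 to 13 for one team, 14 to 27 for the other), the use of NaN for players who are not on the field, and the convention that "left" means smaller X values are all assumptions for illustration, not the dataset's actual column layout.

```java
/**
 * Hypothetical in-memory form of one pass record. Field names, zero-based
 * player indices (0-13 for one team, 14-27 for the other) and NaN for
 * players who are not on the field are assumptions.
 */
final class PassRecord {
    static final int PLAYERS_PER_TEAM = 14;

    final double[] x;      // X position of each of the 28 players
    final double[] y;      // Y position of each of the 28 players
    final long startTime;  // timestamp at which the pass started
    final long endTime;    // timestamp at which the pass ended
    final int sender;      // index of the player who sent the ball
    final int receiver;    // index of the player who received the ball

    PassRecord(double[] x, double[] y, long startTime, long endTime,
               int sender, int receiver) {
        if (x.length != 2 * PLAYERS_PER_TEAM || y.length != 2 * PLAYERS_PER_TEAM) {
            throw new IllegalArgumentException("expected 28 positions per axis");
        }
        this.x = x.clone();
        this.y = y.clone();
        this.startTime = startTime;
        this.endTime = endTime;
        this.sender = sender;
        this.receiver = receiver;
    }

    /** Team of a player: 0 for indices 0-13, 1 for indices 14-27. */
    static int teamOf(int player) {
        return player / PLAYERS_PER_TEAM;
    }

    /** A player is treated as on the field if both coordinates are known. */
    boolean onField(int player) {
        return !Double.isNaN(x[player]) && !Double.isNaN(y[player]);
    }

    /** Mean X position of the on-field players of a team (NaN if none). */
    double meanX(int team) {
        double sum = 0.0;
        int count = 0;
        for (int p = team * PLAYERS_PER_TEAM; p < (team + 1) * PLAYERS_PER_TEAM; p++) {
            if (onField(p)) {
                sum += x[p];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /**
     * Rough per-record guess, assuming "left" means smaller X: the team whose
     * players are, on average, further left plays on the left side.
     * Returns false if either mean is NaN.
     */
    boolean playsOnLeft(int team) {
        return meanX(team) < meanX(1 - team);
    }
}
```

Since each record is a single snapshot, comparing mean X positions is only a rough guess: a team pushing forward may temporarily have most of its players in the opponent's half.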

The goal

The goal of the challenge is to predict which player will receive each pass. However, no evaluation criteria were proposed for the challenge, and the organizers did not split the data into training and testing sets to evaluate solutions. Thus, I decided to simply use accuracy as the evaluation measure. Accuracy is the number of correct predictions divided by the total number of predictions (records). I also considered the accuracy when two predictions are made instead of one, that is, when a prediction counts as correct if the actual receiver is one of the two predicted players.
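The two measures correspond to top-1 and top-2 accuracy. The sketch below assumes that a model outputs, for each pass, a ranked array of candidate receivers (most likely first), and that a two-guess prediction is correct when the actual receiver is among the first two candidates. The class and method names are hypothetical.

```java
/** Top-k accuracy: fraction of passes whose actual receiver is among the first k guesses. */
final class PassAccuracy {
    /**
     * rankings[i] lists candidate receivers for pass i, most likely first;
     * actual[i] is the player who really received pass i.
     */
    static double topKAccuracy(int[][] rankings, int[] actual, int k) {
        if (rankings.length != actual.length || actual.length == 0) {
            throw new IllegalArgumentException("need one ranking per pass and at least one pass");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            // Only the first k candidates (or fewer, if the ranking is shorter) count.
            int guesses = Math.min(k, rankings[i].length);
            for (int j = 0; j < guesses; j++) {
                if (rankings[i][j] == actual[i]) {
                    correct++;
                    break;
                }
            }
        }
        return (double) correct / actual.length;
    }
}
```

For example, with rankings {3, 5}, {2, 7}, {4, 1}, {6, 9} and actual receivers 3, 7, 8, 6, top-1 accuracy is 2/4 = 0.5 and top-2 accuracy is 3/4 = 0.75.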

The Football Pass Predictor model

Since the dataset is quite simple and I did not have much time, I designed a simple model to solve the problem of pass prediction. The model consists of a set of heuristics. After defining each heuristic, I fine-tuned its parameters by hand to achieve high accuracy. If I had had more time, I would have used a genetic algorithm to tune the parameters automatically. I tried many heuristics and kept the ones that increased accuracy. I give an overview of the model below.

The model is based on the observation that few passes are intercepted (fewer than 15%). Thus, it makes sense to predict only successful passes, that is, to always predict that the receiver is a teammate of the sender. The model uses four heuristics:

A player generally prefers to send the ball to the closest player of the same team.

A player is less likely to send the ball to a teammate if that teammate is close to a player from the opposing team.

A player is less likely to send the ball to a teammate if that teammate is close to two players from the opposing team.

A player generally prefers to send the ball forward rather than backward.
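One way to combine the four heuristics is to give each candidate receiver a score and rank the sender's teammates by it. The following Java sketch does this with a distance term plus fixed penalties. The penalty values, the marking radius, and the attackDirection parameter (which way the sender's team attacks, since the dataset does not state it) are hypothetical placeholders; they are not the hand-tuned parameters of FPP, and the structure is not taken from its actual code. Positions use the same assumed layout as before: 28 players, indices 0 to 13 and 14 to 27, NaN for players off the field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical scorer combining the four heuristics; lower score = more likely receiver. */
final class PassScorer {
    static final int PLAYERS_PER_TEAM = 14;
    static final double MARKING_RADIUS = 5.0;        // opponent closer than this "marks" a player
    static final double ONE_MARKER_PENALTY = 10.0;   // heuristic 2 (placeholder value)
    static final double TWO_MARKERS_PENALTY = 20.0;  // heuristic 3, added on top of heuristic 2
    static final double BACKWARD_PENALTY = 8.0;      // heuristic 4 (placeholder value)

    static double dist(double[] x, double[] y, int a, int b) {
        return Math.hypot(x[a] - x[b], y[a] - y[b]);
    }

    /** attackDirection is +1 if the sender's team attacks towards larger X, -1 otherwise. */
    static double score(double[] x, double[] y, int sender, int candidate, int attackDirection) {
        // Heuristic 1: closer teammates are preferred.
        double s = dist(x, y, sender, candidate);

        // Heuristics 2 and 3: count on-field opponents near the candidate.
        int opponents = 1 - sender / PLAYERS_PER_TEAM;
        int markers = 0;
        for (int p = opponents * PLAYERS_PER_TEAM; p < (opponents + 1) * PLAYERS_PER_TEAM; p++) {
            if (!Double.isNaN(x[p]) && dist(x, y, p, candidate) < MARKING_RADIUS) {
                markers++;
            }
        }
        if (markers >= 1) s += ONE_MARKER_PENALTY;
        if (markers >= 2) s += TWO_MARKERS_PENALTY;

        // Heuristic 4: backward passes are penalized.
        if ((x[candidate] - x[sender]) * attackDirection < 0) s += BACKWARD_PENALTY;
        return s;
    }

    /** On-field teammates of the sender, ranked from most to least likely receiver. */
    static int[] rankReceivers(double[] x, double[] y, int sender, int attackDirection) {
        int team = sender / PLAYERS_PER_TEAM;
        List<Integer> candidates = new ArrayList<>();
        for (int p = team * PLAYERS_PER_TEAM; p < (team + 1) * PLAYERS_PER_TEAM; p++) {
            // Only teammates are candidates, since every pass is predicted to succeed.
            if (p != sender && !Double.isNaN(x[p]) && !Double.isNaN(y[p])) {
                candidates.add(p);
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (Integer p) -> score(x, y, sender, p, attackDirection)));
        return candidates.stream().mapToInt(Integer::intValue).toArray();
    }
}
```

Taking the first element of the ranking gives a one-guess prediction, and the first two elements give a two-guess prediction. Using additive penalties keeps each heuristic's effect tunable by a single number, which fits the kind of manual tuning described above.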

Using these heuristics, the proposed FPP (Football Pass Predictor) model achieves more than 33% accuracy with one guess and more than 50% with two guesses. This is considerably higher than a random prediction model, which achieves about 8%.

I also tried more complex heuristics, such as checking whether a player of the opposing team is between the sender and a potential receiver by calculating angles, but they did not improve accuracy.
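For reference, an angle-based check of this kind could look like the following Java sketch: an opponent is considered to block the pass lane if it is closer to the sender than the receiver is and the angle between the sender-to-receiver and sender-to-opponent directions is small. The 15-degree threshold and the class name are arbitrary assumptions, and this is not the code that was actually tested for FPP.

```java
/** Hypothetical angle-based test of whether an opponent stands in the pass lane. */
final class PassLaneCheck {
    // Arbitrary placeholder threshold: opponents within 15 degrees of the pass line count.
    static final double MAX_ANGLE_RADIANS = Math.toRadians(15.0);

    /**
     * (sx, sy) is the sender, (rx, ry) the potential receiver, (ox, oy) an opponent.
     * Returns true if the opponent is nearer to the sender than the receiver is
     * and lies within MAX_ANGLE_RADIANS of the sender-to-receiver direction.
     */
    static boolean isBlocked(double sx, double sy, double rx, double ry, double ox, double oy) {
        double dxR = rx - sx, dyR = ry - sy;  // sender -> receiver
        double dxO = ox - sx, dyO = oy - sy;  // sender -> opponent
        double distR = Math.hypot(dxR, dyR);
        double distO = Math.hypot(dxO, dyO);
        if (distR == 0.0 || distO == 0.0) {
            return false;  // degenerate positions: no direction to compare
        }
        if (distO >= distR) {
            return false;  // opponent is not nearer than the receiver
        }
        // Angle between the two directions, from the normalized dot product
        // (clamped to [-1, 1] to guard against rounding before acos).
        double cos = (dxR * dxO + dyR * dyO) / (distR * distO);
        double angle = Math.acos(Math.max(-1.0, Math.min(1.0, cos)));
        return angle <= MAX_ANGLE_RADIANS;
    }
}
```

With the sender at (0, 0) and the receiver at (10, 0), an opponent at (5, 0.5) is about 5.7 degrees off the pass line and counts as blocking, while one at (5, 5) (45 degrees) or at (12, 0) (beyond the receiver) does not.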

The source code of the proposed FPP model can be downloaded from my website at http://www.philippe-fournier-viger.com/foot2018/ (it includes the dataset, which was originally obtained from the workshop website). The model is implemented in Java and released under the GPL 3 open source license.

In addition, a simple video presentation of the paper can be found here (HTML5 video for playback on various devices).

Conclusion

In this blog post, I discussed the problem of football pass prediction and presented the FPP (Football Pass Predictor) model, which is simple but achieves considerably higher accuracy than random prediction. It would certainly be possible to improve the model further.