Tuesday, 16 October 2012

Results of Guess the Goal (& Cryptic Tease)

A week or so ago I put out a challenge to readers to try to deduce the number of goals scored by 10 players from the 2011/12 season, based on each player's underlying stats for the season: Shots (S), Shots in Box (Sin) and Shots on Target (SoT). In judging the entries I prioritised the players' relative order over the actual number of goals scored. From an FPL perspective I believe it's not that important to predict the precise number of goals Player A will score, more so that Player A will do better than Player B.

Thanks first and foremost to JohnDoe2008 (aka SuperGrover), who took part and got the relative order almost correct, or as close as was possible given that there were a couple of big outliers in the data.

I intentionally picked these 10 players to try and represent a healthy mix of players performing on, above and below the curve, not to make this harder, but to examine the difficulty posed when making predictions like this.

Please note: goal totals exclude penalties and free kicks. These are set-piece situations rather than ordinary chances, so they are not representative of the shot data. The most notable exclusions are Agüero and Adebayor, who each scored 3 penalties last season.

Player      Name             S     Sin   SoT   Goals
Player 1    Suarez           126   98    46    11
Player 2    Dempsey          142   84    53    15
Player 3    Ba               109   75    44    13
Player 4    Cissé            39    31    21    13
Player 5    Agüero           130   100   51    20
Player 6    Holt             77    68    33    13
Player 7    Hernández        47    42    19    9
Player 8    Adebayor         100   87    46    14
Player 9    Fletcher         71    55    27    12
Player 10   van der Vaart    102   46    42    10

Below are three charts showing the straight-up correlation and R-squared value between each of the shot data columns and goals scored.

As can be seen, the relationship between each individual shot stat and goals scored is very poor when taken at face value. However, there are some trends hidden in there, particularly in the Shots in Box data, if you remove the two data points that sit furthest from the line. These outliers are picked out in red on the charts below.
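For anyone who wants to reproduce the comparison, here is a minimal sketch of the calculation, assuming a simple straight-line fit of goals against each shot column (in which case R-squared is just the squared Pearson correlation). The data is copied from the table above; the function and variable names are my own for illustration, and the charts may have been produced in a different way.

```python
# (name, S, Sin, SoT, goals) copied from the table above.
# Goals exclude penalties and free kicks.
PLAYERS = [
    ("Suarez", 126, 98, 46, 11),
    ("Dempsey", 142, 84, 53, 15),
    ("Ba", 109, 75, 44, 13),
    ("Cissé", 39, 31, 21, 13),
    ("Agüero", 130, 100, 51, 20),
    ("Holt", 77, 68, 33, 13),
    ("Hernández", 47, 42, 19, 9),
    ("Adebayor", 100, 87, 46, 14),
    ("Fletcher", 71, 55, 27, 12),
    ("van der Vaart", 102, 46, 42, 10),
]

# Column index of each shot stat inside a PLAYERS row.
COLUMNS = {"S": 1, "Sin": 2, "SoT": 3}


def r_squared(xs, ys):
    """R-squared of a straight-line fit of ys on xs (squared Pearson r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def report(players):
    """Return {column label: R-squared against goals} for the given players."""
    goals = [row[4] for row in players]
    return {
        label: r_squared([row[idx] for row in players], goals)
        for label, idx in COLUMNS.items()
    }


# Assumption: the two points to drop are Suarez and Cissé.
OUTLIERS = ("Suarez", "Cissé")

all_players = report(PLAYERS)
without_outliers = report([row for row in PLAYERS if row[0] not in OUTLIERS])

for label in COLUMNS:
    print(f"{label:<4} all: {all_players[label]:.2f}   "
          f"without outliers: {without_outliers[label]:.2f}")
```

With only 10 players (8 once the outliers are dropped) these values are very sensitive to individual data points, so treat them as indicative rather than as a proper model.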

It will not surprise you to learn that the outliers were Suarez and Cissé, with the latter scoring essentially the same number of goals (actually 2 more) from roughly a third of the opportunities.
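The "third of the opportunities" point is easy to check with some quick arithmetic. The sketch below simply divides goals by total shots for each player in the table; `goals_per_shot` is a name I've made up for illustration, and this is a crude conversion rate rather than any kind of model.

```python
# (name, shots, goals) copied from the table above.
# Goals exclude penalties and free kicks.
SHOTS_AND_GOALS = [
    ("Suarez", 126, 11),
    ("Dempsey", 142, 15),
    ("Ba", 109, 13),
    ("Cissé", 39, 13),
    ("Agüero", 130, 20),
    ("Holt", 77, 13),
    ("Hernández", 47, 9),
    ("Adebayor", 100, 14),
    ("Fletcher", 71, 12),
    ("van der Vaart", 102, 10),
]


def goals_per_shot(shots, goals):
    """Crude conversion rate: goals divided by total shots."""
    return goals / shots


# Most clinical first, most wasteful last.
ranked = sorted(
    SHOTS_AND_GOALS,
    key=lambda row: goals_per_shot(row[1], row[2]),
    reverse=True,
)

for name, shots, goals in ranked:
    print(f"{name:<14} {goals_per_shot(shots, goals):.3f}")
```

On this measure Cissé comes out top of the ten and Suarez bottom: Cissé scored with one in every three shots, while Suarez needed more than eleven shots per goal.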

The first thing this prompted me to do was re-evaluate the importance of Sin over, or alongside, SoT. There is reason to include both factors as major drivers in any model. But the bigger question it raised was this: what, if anything, is the major difference in the underlying performance data between Suarez and Cissé, subjectively one of the most wasteful strikers last season and the most clinical, respectively? There is one, and I've found it, or almost, but I will have to leave that for a later post.