Animating the Premier League using gganimate

Ever wondered what an evolving gif of each Premier League team’s goal difference vs points would look like made in R? Look no further! Most of this is going to be setting up the data (as always) rather than actually plotting it. To get the data into shape, we're going to be using the tidyverse and lubridate, which you can install the usual way via install.packages(). To animate the data we'll be using the gganimate package. At the time of writing this is not on CRAN and so must be installed from GitHub, which you can do via the devtools package:

devtools::install_github("thomasp85/gganimate")

To get started, let’s attach the relevant packages:

library("tidyverse")
library("lubridate")
library("gganimate")

We’re going to use the last full season of matches in the Premier League, which was the 17/18 season. The data was sourced from football-data.co.uk.

We’re only interested in the date, teams, result and home/away goals. These variables sit in the columns from Date to FTR (inclusive). We also need to convert Date to a date object:

prem = prem %>%
  select(Date:FTR) %>%
  mutate(Date = dmy(Date))
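If you want to see what this step does without downloading anything, here is a minimal sketch on a hand-typed toy data frame. The name prem_toy and its rows are mine, for illustration only. I'm assuming the usual football-data.co.uk layout, where Date is a day/month/year string and extra columns such as Div and HTHG sit either side of the ones we want.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the raw file: two hand-typed matches plus two columns
# (Div, HTHG) that we don't need
prem_toy = tibble(
  Div = "E0",
  Date = c("11/08/17", "12/08/17"),
  HomeTeam = c("Arsenal", "Watford"),
  AwayTeam = c("Leicester", "Liverpool"),
  FTHG = c(4, 3),
  FTAG = c(3, 3),
  FTR = c("H", "D"),
  HTHG = c(2, 2)
)

prem_toy = prem_toy %>%
  select(Date:FTR) %>%        # keep every column from Date to FTR inclusive
  mutate(Date = dmy(Date))    # parse day/month/year strings into Date objects

prem_toy
```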

Cumulative points per day per team

There’s probably a better way to do this, but here is mine. I added a column for each team onto the data then, using a for loop (I know, I’m sorry), I transferred the “H”, “A” and “D” status of the full time result into points for each team in their respective column. For you non-football heads, that's 3 for a win, 1 for a draw and 0 for a loss.
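A rough sketch of that loop on a toy data frame is below. This is a reconstruction under my own assumptions, not necessarily the exact code used for the gif: prem_toy is a hand-typed stand-in, and I build the team list from both HomeTeam and AwayTeam because, unlike a full season, not every team in the toy data plays at home.

```r
# Toy stand-in for the cleaned match data (four hand-typed matches)
prem_toy = data.frame(
  Date = as.Date(c("2017-08-11", "2017-08-12", "2017-08-19", "2017-08-27")),
  HomeTeam = c("Arsenal", "Watford", "Stoke", "Liverpool"),
  AwayTeam = c("Leicester", "Liverpool", "Arsenal", "Arsenal"),
  FTHG = c(4, 3, 1, 4),
  FTAG = c(3, 3, 0, 0),
  FTR = c("H", "D", "H", "H"),
  stringsAsFactors = FALSE
)

# One empty (NA) column per team
teams = sort(unique(c(prem_toy$HomeTeam, prem_toy$AwayTeam)))
prem_toy[teams] = NA_real_

# Turn the full time result into points in each team's own column
for (i in seq_len(nrow(prem_toy))) {
  home = prem_toy$HomeTeam[i]
  away = prem_toy$AwayTeam[i]
  result = prem_toy$FTR[i]
  prem_toy[i, home] = switch(result, H = 3, D = 1, A = 0)
  prem_toy[i, away] = switch(result, H = 0, D = 1, A = 3)
}

prem_toy
```

Every team that didn't play in a given match keeps an NA in that row, which is what makes the later filtering step possible.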

You can see that where Arsenal beat Leicester 4-3, there is a 3 in the Arsenal variable. Now, it would be nice to have this data in long form, for plotting purposes later, so we’ll use gather(). I then drop any rows with an NA in the Points variable, as these only occur if a team hasn't played on that day.
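Here is a small sketch of the reshaping step, assuming the long data ends up with columns called Team and Points (Team is my choice of name). The wide data frame is a cut-down toy version of the table above with only the date and three team columns.

```r
library(dplyr)
library(tidyr)

# Toy wide data: one row per match day, one column per team,
# NA where a team wasn't involved
wide = tibble(
  Date = as.Date(c("2017-08-11", "2017-08-19")),
  Arsenal = c(3, 0),
  Leicester = c(0, NA),
  Stoke = c(NA, 3)
)

long = wide %>%
  gather(key = "Team", value = "Points", -Date) %>%  # every column except Date
  filter(!is.na(Points))                             # drop teams that didn't play

long
```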

At the moment, we only have one row for each match on each day. Later, we’ll need to work out the position of each team on each day. To do this, we need the points for each team on each day, even if they didn’t play. So I’m going to create an empty data set of days and teams, join it to the points data, then fill in the NAs with 0s.
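One way to sketch this, assuming long data with Date, Team and Points columns, is to build every day/team combination with tidyr's crossing(), left-join the points onto it, and then take a running total per team. The toy data and object names below are mine.

```r
library(dplyr)
library(tidyr)

# Toy long data: only the teams that actually played on each day
long = tibble(
  Date = as.Date(c("2017-08-11", "2017-08-11", "2017-08-19", "2017-08-19")),
  Team = c("Arsenal", "Leicester", "Stoke", "Arsenal"),
  Points = c(3, 0, 3, 0)
)

# The "empty" data set: one row for every day and team combination
grid = crossing(Date = unique(long$Date), Team = unique(long$Team))

points = grid %>%
  left_join(long, by = c("Date", "Team")) %>%
  mutate(Points = replace_na(Points, 0)) %>%  # no match that day means 0 points gained
  arrange(Team, Date) %>%
  group_by(Team) %>%
  mutate(Points = cumsum(Points)) %>%         # running total over the season
  ungroup()

points
```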

We now have a row for each team on each day there was a Premier League match, even if that team didn't play. Here you can see Arsenal won on the first day of the season (they beat Leicester 4-3) and didn't gather any more points until they won again on the 9th of September.

Cumulative goal difference per team per day

We’re going to follow the exact same process to do this job. So let’s start by overwriting those columns of points in prem with columns of NAs, ready for the goal difference:

prem[sort(unique(prem$HomeTeam))] = NA

Now, using a for loop again (again, I’m sorry), for each home team and away team we calculate the goal difference by simply subtracting the away team's goals from the home team's goals, or vice versa.
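As a sketch, the goal difference loop might look like the following on the same kind of toy data. Again, prem_toy and the way I build the team list are assumptions for illustration rather than the original code.

```r
# Toy stand-in for the cleaned match data (four hand-typed matches)
prem_toy = data.frame(
  Date = as.Date(c("2017-08-11", "2017-08-12", "2017-08-19", "2017-08-27")),
  HomeTeam = c("Arsenal", "Watford", "Stoke", "Liverpool"),
  AwayTeam = c("Leicester", "Liverpool", "Arsenal", "Arsenal"),
  FTHG = c(4, 3, 1, 4),
  FTAG = c(3, 3, 0, 0),
  stringsAsFactors = FALSE
)

# One empty (NA) column per team
teams = sort(unique(c(prem_toy$HomeTeam, prem_toy$AwayTeam)))
prem_toy[teams] = NA_real_

for (i in seq_len(nrow(prem_toy))) {
  home = prem_toy$HomeTeam[i]
  away = prem_toy$AwayTeam[i]
  # Home team: goals for minus goals against; away team: the reverse
  prem_toy[i, home] = prem_toy$FTHG[i] - prem_toy$FTAG[i]
  prem_toy[i, away] = prem_toy$FTAG[i] - prem_toy$FTHG[i]
}

# Running goal difference for one team, ignoring days they didn't play
arsenal_gd = cumsum(prem_toy$Arsenal[!is.na(prem_toy$Arsenal)])
```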

You can see now that for the match where Arsenal beat Leicester 4-3, instead of having a 3 in the Arsenal variable, we have a 1 to signify Arsenal won by 1 goal. Now we follow the same process as before: we gather the data into long format, join with the empty data set of days, turn the NAs into 0s and then calculate the cumulative goal difference over the season.

Now we can see not only when Arsenal picked up points, but when they dropped points as well. For example, on the 27th of August, they were beaten by 4 goals, as their goal difference shifted from 0 to -4.

We’re not done there! For the gif, we want to be able to display the current status of the team on each day, i.e. Champions League (4th or above), Europa League (5th - 7th), Top Half (8th - 10th), Bottom Half (11th - 17th) or Relegation Zone (18th or below). To do this, on each day, we first need to retrieve the order of each team based on their points and goal difference.
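A minimal sketch of the ranking step is below, assuming a data frame with one row per team per day and columns Date, Team, Points and GD (the column names and the made-up 20-team table are mine). Teams are ordered within each day by points, then goal difference, and the position is mapped to the bands listed above.

```r
library(dplyr)

status_levels = c("Champions League", "Europa League", "Top Half",
                  "Bottom Half", "Relegation Zone")

# Made-up table for a single day: 20 hypothetical teams
standings = tibble(
  Date = as.Date("2018-05-13"),
  Team = paste("Team", LETTERS[1:20]),
  Points = seq(95, 19, by = -4),
  GD = seq(57, -38, by = -5)
)

standings = standings %>%
  group_by(Date) %>%
  arrange(desc(Points), desc(GD), .by_group = TRUE) %>%  # points first, then goal difference
  mutate(
    Position = row_number(),
    Status = case_when(
      Position <= 4 ~ "Champions League",
      Position <= 7 ~ "Europa League",
      Position <= 10 ~ "Top Half",
      Position <= 17 ~ "Bottom Half",
      TRUE ~ "Relegation Zone"
    ),
    # A factor with explicit levels fixes the order of the legend later on
    Status = factor(Status, levels = status_levels)
  ) %>%
  ungroup()

count(standings, Status)
```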

Note that here I’m using a factor to reorganise the legend in the plot we’re about to make. We’re looking for a path of a team's points and goal difference over a season, with a colour scheme for where they are in the table at that point. This is what that looks like for one team (here I'm using Newcastle United).
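As a rough idea of how such a path plot can be put together, here is a sketch using made-up numbers for a single hypothetical team (these are not Newcastle's real results). I've put goal difference on the x axis and points on the y axis, which is my assumption about the layout. The group = 1 aesthetic keeps the path as one continuous line even though the colour changes along it.

```r
library(ggplot2)

status_levels = c("Champions League", "Europa League", "Top Half",
                  "Bottom Half", "Relegation Zone")

# Made-up season start for one hypothetical team
one_team = data.frame(
  Date = as.Date("2017-08-13") + 7 * 0:5,
  Points = c(0, 0, 3, 6, 9, 9),
  GD = c(-2, -3, -1, 0, 1, 0),
  Status = factor(
    c("Relegation Zone", "Relegation Zone", "Bottom Half",
      "Bottom Half", "Top Half", "Top Half"),
    levels = status_levels
  )
)

p = ggplot(one_team, aes(x = GD, y = Points, colour = Status, group = 1)) +
  geom_path() +   # joins the rows in the order they appear, i.e. by date
  geom_point() +
  labs(x = "Goal difference", y = "Points")

p
```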

Bear in mind we’re going to have 20 teams on the graph, so instead of just using points, we’re going to use labels with the team’s name on them.
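A sketch of the label version for a single day might look like this, using geom_label() from ggplot2. The four teams and their numbers are made up, and filling the label by Status is just one way of showing the table position.

```r
library(ggplot2)

status_levels = c("Champions League", "Europa League", "Top Half",
                  "Bottom Half", "Relegation Zone")

# Made-up snapshot of four hypothetical teams on one day
table_toy = data.frame(
  Team = c("Team A", "Team B", "Team C", "Team D"),
  Points = c(9, 6, 4, 1),
  GD = c(5, 2, -1, -6),
  Status = factor(
    c("Champions League", "Europa League", "Bottom Half", "Relegation Zone"),
    levels = status_levels
  )
)

p = ggplot(table_toy, aes(x = GD, y = Points, label = Team, fill = Status)) +
  geom_label() +  # draws each team's name in a filled box at its position
  labs(x = "Goal difference", y = "Points")

p
```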

Now, adding gganimate is relatively pain-free. The package comes with lots of functions titled transition_*(). These dictate by what variable your gif will change. We want our gif to change over time, i.e. over the variable Date. There is a specific transition function that works with time, called transition_time(). gganimate is also lovely in the way that we can just add these functions to regular ggplots:
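To make that concrete, here is a minimal sketch on a tiny made-up data set with two hypothetical teams over three match days. The only animation-specific line is transition_time(Date). The {frame_time} placeholder in the title is filled in by gganimate with the date of the current frame.

```r
library(ggplot2)
library(gganimate)

# Made-up cumulative points and goal difference for two hypothetical teams
toy = data.frame(
  Date = rep(as.Date("2017-08-12") + 7 * 0:2, each = 2),
  Team = rep(c("Team A", "Team B"), times = 3),
  Points = c(3, 0, 4, 1, 7, 1),
  GD = c(2, -2, 2, -2, 3, -3)
)

# A regular ggplot; group = Team tells gganimate which label is which across frames
p = ggplot(toy, aes(x = GD, y = Points, label = Team, group = Team)) +
  geom_label()

# Adding the transition is what turns the static plot into an animation
anim = p +
  transition_time(Date) +
  labs(title = "Date: {frame_time}")
```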

We've only added one function here. Easy! If you want to split the animation up by something more arbitrary (a character variable, let's say), then you would use transition_states(). Then all that is needed is the animate() function! Within animate(), the nframes argument is the total number of frames, whilst the fps argument is the number of frames shown per second. If we wanted our gif to be a bit quicker, we'd go for a higher fps.
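Here is a small sketch of the rendering step on a made-up two-team data set. The values of nframes and fps are arbitrary choices for illustration. The length of the gif is simply nframes divided by fps, so doubling fps with the same nframes halves the running time. Rendering needs a renderer package such as gifski to be installed in order to get an actual gif back.

```r
library(ggplot2)
library(gganimate)

# Made-up cumulative points and goal difference for two hypothetical teams
toy = data.frame(
  Date = rep(as.Date("2017-08-12") + 7 * 0:2, each = 2),
  Team = rep(c("Team A", "Team B"), times = 3),
  Points = c(3, 0, 4, 1, 7, 1),
  GD = c(2, -2, 2, -2, 3, -3)
)

anim = ggplot(toy, aes(x = GD, y = Points, label = Team, group = Team)) +
  geom_label() +
  transition_time(Date)

nframes = 20  # total number of frames to render
fps = 10      # frames shown per second

gif = animate(anim, nframes = nframes, fps = fps)

# Running time of the gif in seconds
duration_seconds = nframes / fps
```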

Thanks for your response. I’m glad you enjoyed the blog and the gif! Using your method, I’m sure the data would end up in the same place (and yours might even be quicker!). However, the two pieces of code are not the same. I’m wanting to populate 20 separate columns (one for each team), whereas your method populates two separate columns (one for the home team and one for the away team).