Visualising Running Totals with Line Charts

Cumulative line charts feature in many popular visualisations across the football analytics community. Most commonly, they show xG or shot counts building up over a game. In the example from Ben Mayhew below, we can see how a good visualisation gives us far more detail than a total xG figure would: it shows each team’s periods of dominance and how both teams’ chances were spread across the game.

1) Import our data

Data for this tutorial comes from http://football-data.co.uk/ – a great resource for match-by-match results data from a number of global leagues. Feel free to use any of the leagues provided to follow along with.

Download a csv of a season’s worth of matches (or part of an ongoing season) and put it in the same folder as your script.

After that, import your data with pandas and assign it to a dataframe:

In [2]:

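#Import the libraries used in this tutorial
import pandas as pd
import matplotlib.pyplot as plt

#Import our data and assign it to 'data'
data = pd.read_csv("1718EPL.csv")

#Show the top of the dataframe
data.head()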

Out[2]:

   Div      Date        HomeTeam      AwayTeam  FTHG  FTAG  FTR  HTHG  HTAG  HTR  …  BbAv<2.5  BbAH  BbAHh  BbMxAHH  BbAvAHH  BbMxAHA  BbAvAHA   PSCH  PSCD   PSCA
0   E0  11/08/17         Arsenal     Leicester     4     3    H     2     2    D  …      2.32    21  -1.00     1.91     1.85     2.10     2.02   1.49  4.73   7.25
1   E0  12/08/17        Brighton      Man City     0     2    A     0     0    D  …      2.27    20   1.50     1.95     1.91     2.01     1.96  11.75  6.15   1.29
2   E0  12/08/17         Chelsea       Burnley     2     3    A     0     3    A  …      2.23    20  -1.75     2.03     1.97     1.95     1.90   1.33  5.40  12.25
3   E0  12/08/17  Crystal Palace  Huddersfield     0     3    A     0     2    A  …      1.72    18  -0.75     2.10     2.05     1.86     1.83   1.79  3.56   5.51
4   E0  12/08/17         Everton         Stoke     1     0    H     1     0    H  …      1.76    19  -0.75     1.94     1.90     2.01     1.98   1.82  3.49   5.42

5 rows × 65 columns

2) Transform our data into a usable format

Our data is a match-by-match look at a season, which won’t help us much for a line chart. We need to reshape it into what we actually want to plot: a list of cumulative points totals for each team over the season.

Ideally, we will do this for every team at once, rather than one team at a time. So let’s first create a list of the unique teams in our dataframe.

We’ll then use a dictionary comprehension to loop over those teams and give each one a list that starts with 0, as every team starts the season with 0 points.

In [3]:

#Create a list of unique teams from the home team column
Teams = data.HomeTeam.unique()

#Create a dictionary called TeamLists. There will be an entry for each team with the list [0]
TeamLists = {Team: [0] for Team in Teams}
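The comprehension matters here because each team needs its own separate list. As a quick sketch (with two team names picked purely for illustration), this is what goes wrong with the tempting dict.fromkeys shortcut, which hands the same list object to every team:

```python
Teams = ["Arsenal", "Leicester"]

# dict.fromkeys reuses ONE list object for every key
Shared = dict.fromkeys(Teams, [0])
Shared["Arsenal"].append(3)
print(Shared["Leicester"])     # [0, 3] - Leicester picked up Arsenal's points

# The comprehension builds a fresh list for each team
TeamLists = {Team: [0] for Team in Teams}
TeamLists["Arsenal"].append(3)
print(TeamLists["Leicester"])  # [0]
```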

With a starter list ready for each team, we just need to run through each match, find out who won and add a new entry into the correct team’s list with their points.

Let’s do this by working through each line of our dataframe, learning who the home team and away team are, then running an if statement to learn the result. Once we know the result, we can add each team’s points with the append method:

In [4]:

#For each row in our dataframe, I want to do the following:
for row in data.itertuples():

    #Add the home and away team names to the correct variable
    Home = row.HomeTeam
    Away = row.AwayTeam

    #If the home team goals (FTHG column in the dataframe) are higher than the away team, give the correct points to each team
    if row.FTHG > row.FTAG:
        TeamLists[Home].append(3)
        TeamLists[Away].append(0)

    #If the home team goals are less than the away team, give the correct points
    elif row.FTHG < row.FTAG:
        TeamLists[Home].append(0)
        TeamLists[Away].append(3)

    #In any other case (a draw), give the correct points
    else:
        TeamLists[Home].append(1)
        TeamLists[Away].append(1)

We have stored the lists inside the TeamLists dictionary, so let’s check out the Arsenal entry in there.

In [5]:

TeamLists["Arsenal"]

Out[5]:

[0, 3, 0,...3, 0, 3, 0, 3]

Ah, we have just appended the points, but done nothing to run these as cumulative totals throughout the season.

To achieve this, we need to access the previous running total and add the new result to it. Our lists tutorial goes through accessing certain values, but we can navigate backwards through a list with a negative value in square brackets, e.g. myList[-1] gives the last item.

So let’s reset our Teams and TeamLists variables so that they do not contain our previous data. With that all cleaned up, we can repeat our for loop above, but instead of appending the points, we will append the sum of the new points and the previous value.
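The loop for this step isn’t shown here, so below is a minimal sketch of what it could look like rather than the exact original cell. To keep it runnable without the csv, it uses a small namedtuple called Match (a stand-in invented for this example) in place of the rows from data.itertuples(). Only the first result comes from the table above; the other two are made up. The HomePoints and AwayPoints names are also just illustrative.

```python
from collections import namedtuple

# Stand-in for the rows produced by data.itertuples(); only the four
# columns the loop needs are included. Results 2 and 3 are made up.
Match = namedtuple("Match", ["HomeTeam", "AwayTeam", "FTHG", "FTAG"])
matches = [
    Match("Arsenal", "Leicester", 4, 3),
    Match("Leicester", "Brighton", 2, 0),
    Match("Brighton", "Arsenal", 1, 1),
]

# Reset the starter lists so no earlier data carries over
Teams = sorted({m.HomeTeam for m in matches})
TeamLists = {Team: [0] for Team in Teams}

for row in matches:  # with the real data: for row in data.itertuples():
    Home = row.HomeTeam
    Away = row.AwayTeam

    # Work out the points each side earns from this match
    if row.FTHG > row.FTAG:
        HomePoints, AwayPoints = 3, 0
    elif row.FTHG < row.FTAG:
        HomePoints, AwayPoints = 0, 3
    else:
        HomePoints, AwayPoints = 1, 1

    # Append the previous running total ([-1]) plus the new points
    TeamLists[Home].append(TeamLists[Home][-1] + HomePoints)
    TeamLists[Away].append(TeamLists[Away][-1] + AwayPoints)

print(TeamLists["Arsenal"])  # [0, 3, 4]
```

Working out the points first and appending once at the end keeps the [-1] lookup in a single place, instead of repeating it in all three branches.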

Let’s check out Arsenal again – hopefully this makes more sense as a running total.

In [7]:

TeamLists["Arsenal"]

Out[7]:

[0, 3, 3,...57, 57, 60, 60, 63]

Perfect! Let’s get onto putting this into an easy visualisation:

3) Put our data into a basic viz

Matplotlib makes it very simple to create a line chart. The .plot function ideally takes at least two arguments: the x and y locations of each point on the line. Our points lists provide the y coordinates, so we just need to create a list containing the numbers 0-38 for our matchdays (0 is the starting point) to use as the x coordinates.

We can do this by using the range function within the list function. For this, range needs two numbers, the starting number and the end number + 1:

In [8]:

Matchday=list(range(0,39))
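The hard-coded 39 assumes a complete 38-game season. If you are following along with part of an ongoing season, or a league with a different number of games, one option is to build the x values from the length of the team’s list instead. A small sketch with made-up totals:

```python
# Illustrative running totals for a team part-way through a season
TeamLists = {"Arsenal": [0, 3, 3, 6, 7]}

# One x value per entry, however many games have been played
Matchday = list(range(len(TeamLists["Arsenal"])))
print(Matchday)  # [0, 1, 2, 3, 4]
```

Bear in mind that mid-season, teams may have played different numbers of games, so each team would need x values built from its own list.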

Now let’s take advantage of matplotlib’s beautifully easy plotting, by using .plot along with our matchday and team lists:

In [9]:

#Create a line plot with matchday and teamlist figures for two teams
plt.plot(Matchday, TeamLists["Southampton"])
plt.plot(Matchday, TeamLists["Swansea"])

Out[9]:

[<matplotlib.lines.Line2D at 0x1ad48ed3828>]

Plenty that needs to be done to improve this, but a really solid start!

4) Styling the visualisation

Like any default visualisation, this style has been seen a million times. Whether you’re following along here or creating visualisations in Excel or elsewhere, it is a great idea to develop a clean style that is identifiably your own.

#Create the bare bones of what will be our visualisation
fig, ax = plt.subplots()

#Add our data as before, but setting colours and widths of lines
plt.plot(Matchday, TeamLists["Man City"], color="#6CABDD", linewidth=2)
plt.plot(Matchday, TeamLists["Swansea"], color="#231F20", linewidth=2)

#Give the axes and plot a title each
plt.xlabel('Gameweek')
plt.ylabel('Points')
plt.title('Man City v Swansea Running Points')

#Add a faint grey grid
plt.grid()
ax.xaxis.grid(color="#F8F8F8")
ax.yaxis.grid(color="#F9F9F9")

#Remove the margins between our lines and the axes
plt.margins(x=0, y=0)

#Remove the spines of the chart on the top and right sides
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

And there we go, a clean look at two teams’ running points over the season. There is still plenty that we could change to improve it, such as using new fonts, adding all the other teams as greyed-out lines or adding some data labels. Give it a go and take a look through the documentation/Google when you get stuck!
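As a starting point for the greyed-out lines idea, here is one possible sketch. The running totals are made up so that it runs on its own, and the Highlight dictionary is just a name chosen for this example; swap in your real TeamLists to try it on the full season.

```python
import matplotlib.pyplot as plt

# Made-up running totals standing in for the full TeamLists dictionary
TeamLists = {
    "Man City": [0, 3, 6, 9, 12],
    "Swansea": [0, 0, 1, 1, 4],
    "Arsenal": [0, 3, 3, 6, 7],
    "Stoke": [0, 1, 1, 2, 5],
}

# Teams to pick out, with the colours used in the chart above
Highlight = {"Man City": "#6CABDD", "Swansea": "#231F20"}

fig, ax = plt.subplots()

for Team, Points in TeamLists.items():
    Matchday = list(range(len(Points)))
    if Team in Highlight:
        # Featured teams: club colour, thicker line, drawn on top
        ax.plot(Matchday, Points, color=Highlight[Team],
                linewidth=2, zorder=2, label=Team)
    else:
        # Everyone else: thin light grey line in the background
        ax.plot(Matchday, Points, color="#CCCCCC", linewidth=1, zorder=1)

# Only the labelled (highlighted) lines appear in the legend
ax.legend()
```

The zorder values keep the coloured lines on top of the grey ones regardless of the order the teams are plotted in.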

In this tutorial, we have seen how to take a match-by-match dataset and transform it into a format that allows for a line chart with the running total. You can apply the same logic to any metric throughout a match, a different variable through a season or even a single player’s running goals total throughout a career.
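For example, a player’s running goals total could be built in much the same way. The sketch below uses itertools.accumulate from the standard library instead of the loop from earlier, and the goal counts belong to a hypothetical player:

```python
from itertools import accumulate

# Goals scored by one hypothetical player in consecutive appearances
GoalsPerGame = [1, 0, 2, 0, 0, 1]

# Start from 0, as with the points lists, then keep a running sum
RunningGoals = list(accumulate([0] + GoalsPerGame))
print(RunningGoals)  # [0, 1, 1, 3, 3, 3, 4]
```

The resulting list can be passed to plt.plot against a list of appearance numbers, just like the points lists.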

Find our other visualisation tutorials here, and show us what you come up with @FC_Python!