A football blog revolving around numbers. Focus often on Liverpool F.C.

Category Archives: Expected Goals

While I probably watched Liverpool play before then, the first match I remember watching was on the 4th of January 1994, when a nine-year-old me saw them come back from three goals down, which would become something of a theme. As is the wont of memory, the events that leave an indelible mark are the ones that stand out; my first actual football memory is Paul Bodin missing that penalty and not really understanding the scale of the disappointment. It turned out Wales’ last World Cup match was in 1958, when some no-mark seventeen-year-old called Edson Arantes do Nascimento scored his first international goal and knocked them out in the quarter-final.

Other early memories include one of God’s defining miracles, with a hat-trick notched up in four minutes and thirty-three seconds, and learning about player ageing curves when I realised that the slow yet classy guy in midfield used to be one of the most devastating and exciting wide-players the game had ever seen. My first match at Anfield was Ian Rush’s last there in a red shirt, while subsequent visits took in thrilling cup matches under the gaze of King Kenny and the best live sporting experience of my life as I bounced out of Anfield full of hope in April 2014.

While a league title has proved elusive during my supporting life, Europe has provided the greatest thrills, with tomorrow marking a third European Cup Final to go along with two finals in the junior competition. A European Cup Final once every eight years on average, with all three in the last fourteen years is pretty good going for a non-super club, albeit one with significant resources.

Real Madrid are clearly going to be a tough nut to crack, with FiveThirtyEight, Club Elo and Euro Club Index all ranking them as the second best team around. The same systems have Liverpool as the fifth, seventh and eleventh best respectively, so Liverpool are underdogs with a good chance at glory overall.

According to Club Elo, the 2018 edition of Liverpool will be the best to contest a European Cup Final this century but on the flip-side, Real Madrid are stronger than either of the AC Milan teams that they faced in 2005 and 2007. Despite this, Liverpool are given a slightly better shot at taking home Old Big Ears than they had in 2005, as the gap between them and their opponents is narrower. The strides that the team made under Rafa between the 2005 and 2007 finals meant that the latter was contested by two equal teams.

Liverpool should evidently be approaching the final with optimism and further evidence of this is illustrated in the figure below, which shows the top-fifty teams by non-penalty expected goal difference in the past eight Premier League seasons. The current incarnation of Liverpool sit fifth and would usually be well-positioned to seriously challenge for the title. As the figure also illustrates, the scale of Manchester City’s dominance in their incredible season is well-warranted.

Top-fifty teams by non-penalty expected goal difference over the past eight Premier League seasons. Liverpool are highlighted in red, with the 17/18 season marked by the star marker. Data via Opta.

Liverpool’s stride forward under Klopp this past season has taken them beyond the 13/14 and 12/13 incarnations in terms of their underlying numbers. In retrospect, Rodgers’ first season was quietly impressive even if it wasn’t reflected in the table and it set the platform for the title challenge the following season.

Compared to those Luis Suárez-infused 12/13 and 13/14 seasons, the attacking output this past season is slightly ahead, with the team sitting sixth in the eight-season sample, which is their best over the period. Including penalties would take the 13/14 vintage beyond the latest incarnation, with the former scoring ten from the twelve (!) awarded, while 17/18 saw only three awarded (two scored).

The main difference with the current incarnation, though, is at the defensive end: the team posted the fifth best record for non-penalty expected goals conceded in the eight-year sample this past season. The 13/14 season’s defence was the seventh worst by the club in this eight-year period and lay thirty-fourth overall. These contrasting records equate to a swing of eight non-penalty expected goals in defensive performance.

While the exhilarating attacking intent of this Liverpool side is well-established, they are up against another attacking heavyweight; could it be that the defensive side of the game is the most decisive? The second half of this season is especially encouraging on this front, with improvements in both expected and actual performance. This period represents the sixth best defensive half-season over these eight seasons (out of a total of 320) and a three-goal swing compared to the first half of the season. This was slightly offset by a reduction in attacking output of two non-penalty expected goals, but the overall story is one of improvement.

The loss of Coutinho, addition of van Dijk and employing a keeper with hands (edit 2203 26/05/18: well at least he gets his hands to it usually) between the sticks is a clear demarcation in Liverpool’s season and it is this period that has seen the thrilling run to the European Cup Final. The improved balance between attack and defence bodes well and I can’t wait to see what this team can do on the biggest stage in club football.

Expected goals has found itself outside the confines of the analytics community this season, which has brought renewed questions regarding its flaws, foibles and failures. The poster-child for expected goals flaws has been Burnley and their over-performing defence, even finding themselves in the New York Times courtesy of Rory Smith. Smith’s article is a fine piece blending comments and insights from the analytics community and Sean Dyche himself.

Smith quotes Dyche describing Burnley’s game-plan when defending:

The way it is designed is to put a player in a position that it is statistically, visually and from experience, harder to score from.

Several analyses have dug deeper into Burnley’s defence last season, including an excellent piece by Mark Thompson for StatsBomb. In his article, Mark used data from Stratagem to describe how Burnley put more men between the shooter and the goal than their peers, which may go some way to explaining their over-performance compared with expected goals.

Further work on the EightyFivePoints blog quantified how the number of intervening defenders affected chance quality. The author found that when comparing an expected goal model with and without information on the number of intervening defenders and a rating for defensive pressure, Burnley saw the largest absolute difference between these models (approximately 4 goals over the season).

If there is a quibble with Smith’s article it is that it mainly focuses on this season, which was only 12 games old at the time of writing. Much can happen in small samples where variance often reigns, so perhaps expanding the analysis to more seasons would be prudent.

The table below summarises Burnley’s goals and expected goals conceded over the past three and a bit seasons.

Burnley’s non-penalty expected goals and goals conceded over the past four seasons. Figures for first 13 games of 2017/18 season. Data via Opta.

Each season is marked by over-performance, with fewer goals conceded than expected. This ranges from 5 goals last season to a scarcely believable 15 goals during their promotion season in the Championship.

The above table and cited articles paint a picture of a team that has developed a game-plan that somewhat flummoxes expected goals and gains Burnley an edge when defending. However, if we dig a little deeper, the story isn’t quite as neat as would perhaps be expected.

Below are cumulative totals for goals and expected goals as well as the cumulative difference over each season.

Burnley’s cumulative non-penalty goals and expected goals conceded (left) and the cumulative difference between them (right) over the past four seasons. Figures for first 13 games of 2017/18 season. Data via Opta.

In their 2014/15 season, Burnley were actually conceding more goals than expected for the majority of the season until a run of clean sheets at the end of the season saw them out-perform expected goals. After the 12 game mark in their Championship season, they steadily out-performed expected goals, yielding a huge over-performance. This continued into their 2016/17 season in the Premier League over the first 10 games where they conceded only 12 goals compared to an expected 19. However, over the remainder of the season, they slightly under-performed as they conceded 39 goals compared with 36 expected goals.
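The cumulative comparison above amounts to a running sum of the difference between goals and expected goals conceded. A minimal Python sketch, using made-up per-match figures rather than the actual Opta data:

```python
from itertools import accumulate

# Illustrative per-match (goals conceded, xG conceded) figures -- not the real data
matches = [(0, 1.4), (2, 1.1), (0, 1.9), (1, 0.8), (0, 1.6)]

cum_goals = list(accumulate(g for g, _ in matches))
cum_xg = list(accumulate(x for _, x in matches))

# Negative values = fewer goals conceded than expected (over-performance)
cum_diff = [round(g - x, 2) for g, x in zip(cum_goals, cum_xg)]
print(cum_diff)
```

Plotting the running difference rather than the season total is what reveals whether over-performance accrues steadily or arrives in a handful of games.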

The above illustrates that Burnley’s over-performance in previous Premier League seasons is actually driven by just a handful of games, rather than a systematic edge when examining expected goals.

This leaves us needing to put a lot of weight on their Championship season if we’re going to believe that Burnley are magical when it comes to expected goals. While the descriptive and predictive qualities of expected goals in the Premier League are well established, there is less supporting evidence for the Championship. Consequently, it may be wise to take their Championship figures with a few grains of salt.

This season and last have seen Burnley get off to hot starts, with last season looking like a stone-cold classic example of regression to the mean. If we ignore positive variance for a moment, perhaps their opponents got wise to their defensive tactics and adapted last season; but then you have to assume that they’ve either forgotten those lessons this season or that Dyche has instigated a new and improved game-plan.

The cumulative timelines paint a different picture to the season aggregated numbers, which might lead us to conclude that Burnley’s tactics don’t give them quite the edge that we’ve been led to believe. In truth we’ll never be able to pin down exactly how much is positive variance and how much is driven by their game-plan.

However, we can state that given our knowledge of the greater predictive qualities of expected goals when compared to actual goals, we would expect Burnley’s goals against column to be closer to their expected rate (1.3 goals per game) than their current rate (0.6 goals per game) over the rest of the season.

When asked how his Liverpool team would play by the media horde who greeted his unveiling as manager two years ago, Jürgen Klopp responded:

We will conquer the ball, yeah, each fucking time! We will chase the ball, we will run more, fight more.

The above is a neat synopsis of Klopp’s preferred style of play, which focuses on pressing the opponent after losing the ball and quickly transitioning into attack. It is a tactic that he successfully deployed at Borussia Dortmund and one that he has employed regularly at Liverpool.

Liverpool’s pass disruption map for the past three seasons is shown below. Red signifies more disruption (greater pressure), while blue indicates less disruption (less pressure). In the 2015/16 and 2016/17 seasons, the team pressed effectively high up the pitch, but that pressure has slackened to a significant extent this season. There is some disruption in the midfield zone, but at a lower level than previously.

Liverpool’s zonal pass completion disruption across the past three seasons. Teams are attacking from left-to-right, so defensive zones are to the left of each plot. Data via Opta.

The above numbers are corroborated by the length of Liverpool’s opponent possessions increasing by approximately 10% this season compared to the rest of Klopp’s reign. Their opponents so far this season have an average possession length of 6.5 seconds, which is still below the league average but contrasts strongly with previous seasons, when Liverpool’s figures were among the shortest in the league.

Examining their pass disruption figures game-by-game reveals further the reduced pressure that Liverpool are putting on their opponents. During 2015/16 and 2016/17, their average disruption value was around -2.5%, which they’ve only surpassed once in Premier League matches this season, with the average standing at -0.66%.
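The disruption metric isn’t defined in full here, but a plausible zonal version compares an opponent’s pass completion rate in each zone against their season-long baseline. A sketch with illustrative numbers (not the model’s actual figures):

```python
# Sketch of zonal pass-completion disruption (illustrative numbers, not the
# article's model): how much lower is an opponent's completion rate in each
# zone against the pressing side than their season-long baseline?
baseline = {"defensive": 0.82, "midfield": 0.80, "attacking": 0.70}
vs_press = {"defensive": 0.78, "midfield": 0.75, "attacking": 0.69}

# Expressed in percentage points; negative values = opponents disrupted
disruption = {zone: round((vs_press[zone] - baseline[zone]) * 100, 1)
              for zone in baseline}
print(disruption)
```

Averaging the zonal values over a match gives a single disruption figure of the kind quoted above (e.g. -2.5%).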

The Leicester match is the major outlier and examining their passing further indicates that the high pass disruption was a consequence of them attempting a lot of failed long passes. This is a common response to Liverpool’s press as teams go long to bypass the pressure.

Liverpool’s diminished press is likely a deliberate tactic that is driven by the added Champions League matches the team has faced so far this season. The slightly worrisome aspect of this tactical shift is that Liverpool’s defensive numbers have taken a hit.

In open-play, Liverpool’s expected goals against figure is 0.81 per game, which is up from 0.62 last season. Furthermore, their expected goals per shot has risen to 0.13 from 0.11 in open-play. To add further defensive misery, Liverpool’s set-piece woes (specifically corners) have actually got worse this season. The team currently sit eleventh in expected goals conceded this season, which is a fall from fifth last year.

This decline in underlying defensive performance has at least been offset by a rise on the attacking side of 0.4 expected goals per game to 1.78 this season. Overall, their expected goal difference of 0.79 this season almost exactly matches the 0.81 of last season.

Liverpool’s major problem last season was their soft under-belly but they were often able to count on their pressing game denying their opponents opportunities to exploit it. What seems to be happening this season is that the deficiencies at the back are being exploited more with the reduced pressure ahead of them.

With the season still being relatively fresh, the alarm bells shouldn’t be ringing too loudly, but there is at least cause for concern in the numbers. As ever, the delicate balancing act between maximising the side’s attacking output while protecting the defence is the key.

Klopp will be searching for home-grown solutions in the near-term and a return to the familiar pressing game may be one avenue. Given the competition at the top of the table, he’ll need to find a solution sooner rather than later, lest they be left behind.

Liverpool enter the season with aspirations of challenging for the title after an at times hugely promising and exciting first full season under Jürgen Klopp. The prospect of European adventures returning on Tuesday or Wednesday nights is tantalisingly close, provided they negotiate their Champions League qualifying round.

The story so far

Liverpool’s tally of 76 points last season was their joint-third best tally over the last decade and only their second top-four finish since the Benitez years. In fact, after a run of four top-four finishes, Liverpool haven’t registered back-to-back Champions League qualifications since Rafa left and have on average finished sixth during that time with 65 points on the board.

With the above in mind, it’s tempting to view a season of consolidation as the priority for the coming season, alongside beginning to re-establish the team as a European force. Liverpool’s underlying performance last season is encouraging, with their goal return reasonably in-line with expectation and their expected goal difference placing them well in contention for a title push.

Drilling further into their expected goal numbers reveals a team whose underlying performance fluctuated over the course of the season, with a significant decline once 2017 was rung in. The graphic below illustrates this alongside a longer-term outlook encompassing the past five seasons.

The heights of 2016/17 are close to those of the Suárez-powered team under Rodgers, while the low-point is more in-line with Klopp’s early tenure at the club. The past season thus illustrated a team capable of title-contending performances at times, but one that also slipped to the level of a side competing for the fourth-place trophy at best.

Upping the pace

Closer examination of the downturn in performance using my ‘team strategy analysis’ shows a drying up of shot generation via high-quality chances born of fast-paced attacks from deep and after midfield-transitions.

Sadio Mané was evidently missed due to AFCON duties and injury over the latter half of the season and this is borne out by the numbers. According to my model, he was second best in the EPL (0.11 per 90) in terms of xG-contribution (the sum of expected goals and assists) from fast-paced attacks following a midfield-transition. For fast-attacks from deep, he ranked sixth for xG-contribution (0.12 per 90).

Thankfully, Mohamed Salah, the club’s major acquisition so far, brings complementary qualities to the table and adds much-needed depth to the wide-forward ranks. James Yorke of this parish has already praised the signing earlier this summer and my only addition is that Salah showed up quite highly for xG-contribution (0.07 per 90, ranking eleventh in Serie A) for fast-paced attacks following a midfield-transition. The addition of Salah improves what was already a healthy front-line attack.

Defensive issues

According to the Objective Football website run by Benjamin Pugsley, Liverpool conceded just 8.1 non-penalty shots per game, ranking second over the past eight seasons behind last year’s Pep-infused Manchester City. Shots-on-target conceded (3.0 per game) told a similar story, ranking joint-sixth over the same period. However, they combined these extraordinary shot-suppression numbers with the highest expected goals per shot conceded in the league (0.11), the worst value I have on record over the past five seasons. When Liverpool conceded shots, they were of high quality, which ultimately saw them sit fifth in terms of expected goals against last season.

Klopp’s tactical system deserves credit for melding a highly exciting attack with strong defensive aspects in terms of shot-suppression. The optimistic take here is that tweaks and a greater familiarity with his counter-pressing tactics could bring about improvements in shot quality conceded, thereby seeing better defensive numbers. It’s worth noting the period during November and December 2016 when their expected goals against was the lowest it has been consistently over a 19-game span in the past five seasons, so the current squad is capable of sustained excellence in this realm.

The pursuit of Virgil van Dijk does suggest that the club are aiming to recruit a new starting centre-back. That saga remains running at the time of writing as the world waits to find out just how costly a single ice cream can be. Centre-back depth is an issue that needs to be rectified; the fact that Lucas Leiva made six appearances at centre-back last term is all the evidence needed for that statement.

The other aspect of Liverpool’s defence that could improve is in the goalkeeping stakes. From a pure shot-stopping perspective, Karius has the best pedigree; in my goalkeeper shot-stopping analysis, Karius came 31st across the data-set with a rating of 91%, which is a pretty decent indication that he is an above-average shot-stopper. Mignolet fared much worse with a rating of just 25%, which puts him at best as an average shot-stopper during his Liverpool career to date. I haven’t looked at numbers for the Championship but Mark Taylor’s numbers for Ward at Huddersfield were not encouraging. Playing Karius would be a bold move by Klopp given his limited exposure to English football thus far, but Mignolet doesn’t provide much confidence either; personally, I would go with Karius.

Title talk

If I’ve learnt anything while sifting through the data for this preview, it’s that Manchester City should be strong favourites for the title this coming season.

Can Liverpool challenge them, while also competing in Europe? At present, I’d side with no, given that the depth issues of last season have yet to be addressed and question marks remain over the defence.

Liverpool’s other transfer saga involving Naby Keita could be a game-changer given that he could have a transformative impact on the team’s midfield but the likelihood of him signing appears to be receding by the day. Midfield depth is also potentially an issue unless Klopp is happy to rely on youth to cover midfield absentees over the season.

With potentially five teams in the Champions League group stages, progress to the latter rounds could have a strong bearing on league form post-Christmas. Six into four is likely the maths heading into the new season and Liverpool should be well in the mix.

Goalkeepers have typically been a tough nut to crack from a data analytics point-of-view. Randomness is an inherent aspect of goal-scoring, particularly over small samples, which makes drawing robust conclusions at best challenging and at worst foolhardy. Are we identifying skill in our ratings or are we just being sent down the proverbial garden path by variance?

To investigate some of these issues, I’ve built an expected save model that takes into account shot location and angle, whether the shot is a header or not and shot placement. So a shot taken centrally in the penalty area sailing into the top-corner will be unlikely to be saved, while a long-range shot straight at the keeper in the centre of goal should usually prove easier to handle.

The model is built using data from the past four seasons of the English, Spanish, German and Italian top leagues. Penalties are excluded from the analysis.

The model thus provides an expected goal value for each shot that a goalkeeper faces, which we can then compare with the actual outcome. In a simpler world, we could easily identify shot-stopping skill by taking the difference between reality and expectation and then ranking goalkeepers by who has the best (or worst) difference.
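As a rough illustration of how such a model works, here is a toy logistic sketch; the features mirror those listed above, but the coefficients are invented for illustration and are not the fitted values behind the actual model:

```python
import math

def expected_save(distance_m, angle_deg, header, corner_dist):
    """Toy logistic expected-save model. All coefficients are invented
    for illustration; they are not the fitted values of the real model.
    distance_m: shot distance from goal (longer range -> easier to save)
    angle_deg: visible goal angle (central, close shots -> harder to save)
    header: 1 for headers (typically easier to save), else 0
    corner_dist: placement, 0 = straight at the centre, 1 = right in the corner
    """
    z = (-1.0
         + 0.15 * distance_m
         - 0.03 * angle_deg
         + 0.5 * header
         - 1.5 * corner_dist)
    return 1 / (1 + math.exp(-z))  # probability the shot is saved

# Central shot in the box sailing into the top corner vs a long-range
# shot straight at the keeper, as in the examples above
top_corner = expected_save(8, 60, 0, 1.0)
straight_at_keeper = expected_save(25, 15, 0, 0.0)
```

Even with made-up coefficients, the sketch reproduces the qualitative behaviour described above: the close, well-placed shot comes out far less likely to be saved than the long-range effort straight at the keeper.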

However, this isn’t a simple world, so we run into problems like those illustrated in the graphic below.

Each individual red marker is a player’s shot-stopper rating over the past four seasons versus the number of shots they’ve faced. We see that for low shot totals, there is a huge range in the shot-stopper rating but that the spread decreases as the number of shots increases, which is an example of regression to the mean.

To illustrate this further, I used a technique called bootstrapping to re-sample the data and generate confidence intervals for an average goalkeeper. Groups of shots are randomly extracted from the data-set 10,000 times; for each sample, the actual and expected save percentages are calculated and the difference between them taken, building up a probability distribution. We see a strong narrowing of the blue uncertainty envelope up to around 50 shots, with further narrowing up to about 200 shots. After this, the narrowing is less steep.
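The bootstrap procedure described above can be sketched as follows; the simulated shots and the resampling details are illustrative assumptions, not the article’s actual data:

```python
import random

random.seed(42)

def bootstrap_interval(shots, n_boot=2000):
    """Bootstrap a 95% interval for (actual - expected) save percentage.
    `shots` is a list of (saved, xg) tuples, where xg is the chance the
    shot is scored. An illustrative sketch of the procedure in the text."""
    n = len(shots)
    diffs = []
    for _ in range(n_boot):
        sample = [random.choice(shots) for _ in range(n)]  # resample with replacement
        actual = sum(saved for saved, _ in sample) / n
        expected = sum(1 - xg for _, xg in sample) / n
        diffs.append(actual - expected)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

def simulated_shots(n):
    """An exactly average keeper: each shot is saved with probability 1 - xg."""
    shots = []
    for _ in range(n):
        xg = random.uniform(0.05, 0.5)            # chance the shot is scored
        saved = 1 if random.random() > xg else 0  # keeper saves at the base rate
        shots.append((saved, xg))
    return shots

lo50, hi50 = bootstrap_interval(simulated_shots(50))
lo400, hi400 = bootstrap_interval(simulated_shots(400))
# The uncertainty envelope narrows as the number of shots faced grows
```

Running this for a range of shot totals traces out exactly the kind of shrinking envelope shown in the graphic.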

What this effectively means is that there is a large band of possible outcomes that we can’t realistically separate from noise for an average goalkeeper. Over a season, a goalkeeper faces a little over 100 shots on target (119 on average according to the data used here). Thus, there is a huge opportunity for randomness to play a role and it is therefore of little surprise to find that there is little repeatability year-on-year for save percentage.

Things do start to settle down as shot totals increase though. After 200 shots, a goalkeeper would need to be performing at more than ±4% on the shot-stopper rating scale to stand up to a reasonable level of statistical significance. After 400 shots, signal is easier to discern, with a keeper needing to register more than ±2% to emerge from the noise. That is not to say that we should be beholden to statistical significance, but it is certainly worth bearing in mind in any assessment; an understanding of the uncertainty inherent in analytics can be a powerful weapon to wield.

What we do see in the graphic above are many goalkeepers outside of the blue uncertainty envelope. This suggests that we might be able to identify keepers who are performing better or worse than the average goalkeeper, which would be pretty handy for player assessment purposes. Luckily, we can employ some more maths courtesy of Pete Owen, who presented a binomial method to rank shot-stopping performance in a series of posts available here and here.
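Pete Owen’s exact formulation is set out in his posts, but the general idea of a binomial ranking can be sketched as follows: treat each shot as a Bernoulli trial at the model’s expected save rate and ask how likely an exactly average keeper would be to match or beat the observed save count.

```python
from math import comb

def binomial_rating(saves, shots_faced, expected_save_rate):
    """P(an average keeper makes at least `saves` saves from `shots_faced`
    shots), with each shot saved independently at the model's expected rate.
    Small values point to genuine over-performance. A sketch of the general
    idea, not necessarily Pete Owen's exact formulation."""
    p = expected_save_rate
    return sum(comb(shots_faced, k) * p**k * (1 - p)**(shots_faced - k)
               for k in range(saves, shots_faced + 1))

# A keeper saving all 5 shots when the model expects a 50% save rate
print(binomial_rating(5, 5, 0.5))  # 0.5**5 = 0.03125
```

Ranking keepers by this probability naturally rewards sustained over-performance across large shot samples rather than hot streaks over small ones.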

The table below lists the top-10 goalkeepers who have faced more than 200 shots over the past four seasons by the binomial ranking method.

I don’t know about you but that doesn’t look like too shabby a list of the top keepers. It may be that some of the names on the list have serious flaws in their game aside from shot-stopping but that will have to wait another day and another analysis.

So where does that leave us in terms of goalkeeping analytics? On one hand, we have noisy unrepeatable metrics from season-to-season. On the other, we appear to have some methods available to extract the signal from the noise over larger samples. Even then, we might be being fooled by aspects not included in the model or the simple fact that we expect to observe outliers.

Deficiencies in the model are likely our primary concern but these should be checked by a skilled eye and video clips, which should already be part of the review process (quit sniggering at the back there). Consequently, the risks ingrained in using an imperfect model can be at least partially mitigated against.

Requiring 2-3 seasons of data to get a truly robust view on shot-stopping ability may be too long in some cases. However, perhaps we can afford to take a longer-term view for such an important position that doesn’t typically see too much turnover of personnel compared to other positions. The level of confidence you might want when short-listing might well depend on the situation at hand; perhaps an 80% chance of your target being an above average shot-stopper would be palatable in some cases?

All this is to say that I think you can assess goalkeepers by the saves they do or do not make. You just need to be willing to embrace a little uncertainty in the process.

One of the most enduring aspects of football is the multitude of tactical and stylistic approaches that can be employed to be successful. Context is king in analytics and football as a whole, so the ability to identify and quantify these approaches is crucial for both opposition scouting and player transfer profiles.

One such style I identified was ‘fast attacks from deep’, which were a distinct class of shots born of fast and direct possessions originating in the defensive zone. While these aren’t entirely synonymous with counter-attacks, there is likely a lot of overlap; the classical counter-attack is likely a subset of the deep fast-attacks identified in the data.

These fast-attacks from deep typically offer good scoring chances, with above average shot conversion (10.7%) due to the better shot locations afforded to them. They made up approximately 23% of the shots in my analysis.

So what do they look like?

To provide an overview of the key features of these attacks, I’ve averaged them together to get a broad picture of their progression up the pitch. I’ve presented this below and included a look at attacks from deep that involve more build-up play for comparison.

Comparison between fast-attacks from deep and attacks from deep that focus on slower build-up play. Vertical pitch position refers to the progression of an attack towards the opponent’s goal (vertical pitch position equal to 100). Both attack types start and end in similar locations on average but their progress with time is quite different. The shading is the standard deviation to give an idea of the spread inherent in the data. Data via Opta.

Fast-attacks from deep are characterised by an initial speedy progression towards goal within a team’s own half, followed by a steadier advance in the attacking half. This makes sense qualitatively as counter-attacks often see a quick transition in their early stages to properly establish the attacking opportunity. The attack can then be less frenetic as a team seeks to create the best opportunity possible from the situation.

Over the past five seasons, the stand out teams as rated by shot volume and expected goals have been various incarnations of Arsenal, Manchester City, Chelsea and Liverpool.

The architects

Player-level metrics can be used to figure out who the crucial architects of a counter-attacking situation are. One method of examining this is how many yards a player’s passing progressed the ball during deep fast-attacking possessions.
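One plausible reading of the ball-progression metric (not necessarily the exact definition used here) is the net distance a player’s passes move the ball towards the opponent’s goal, scaled to 90 minutes:

```python
def progression_per_90(passes, minutes):
    """Net yards a player's passes moved the ball towards the opponent's goal,
    per 90 minutes. `passes` holds (start_x, end_x) positions in yards along
    the pitch's long axis; backward passes count against the total. A plausible
    reading of the metric, not necessarily the article's exact definition."""
    net_yards = sum(end - start for start, end in passes)
    return net_yards * 90 / minutes

# Three passes during deep fast-attacks, one of them played backwards
print(progression_per_90([(20, 55), (30, 38), (45, 40)], 90))  # 35 + 8 - 5 = 38.0
```

Whether backward passes should subtract from the total, or simply be excluded, is a design choice; counting them negatively penalises players who recycle possession rather than progress it.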

Below I’ve listed the top 10 players from the 2016/17 season by this metric on a per 90 minute basis, alongside some other metrics for your delectation.

Top players ranked by ball progression per 90 minutes (in yards) during fast-attacks from deep for the 2016/17 Premier League season. xGoals and Goals per 90 are for possessions that a player is involved in (known as xGChain in some parts). Players with more than 1800 minutes only. Data via Opta.

While the focus was often on him kicking people rather than the ball, we see that Granit Xhaka stands alone in terms of ball progression, with Daley Blind a long way behind him in second place. Xhaka’s long-range passing skills are well known, so combining this with the most passes per 90 in such situations propels him to the top of the pile.

The graphic below illustrates Xhaka’s passing during deep fast-attacks, with his penchant for long passes spread all over the midfield zone evident. For comparison, I’ve included Eden Hazard’s passing map as someone who played many important passes that were limited in terms of ball progression as they were typically shorter or lateral passes in the final third.

Passes played by Granit Xhaka and Eden Hazard during fast-attacks from deep during the 2016/17 season. Solid circles denote pass origin, while the arrows indicate the direction and end point of each pass. Data via Opta.

Evidently there is a link between position and ball progression, as players in deeper positions have greater scope to progress the ball as they have more grass in front of them. The likes of Coutinho, Özil and De Bruyne residing so high up the rankings is therefore impressive.

Passes played by Philippe Coutinho and Kevin De Bruyne during fast-attacks from deep during the 2016/17 season. Data via Opta.

Coutinho’s passing chalkboard above illustrates his keen eye for a pass from midfield areas through opposition defensive lines, as does De Bruyne’s ability to find teammates inside the penalty area. De Bruyne’s contribution actually ranks highest in terms of xG per 90 for the past season.

The finishers

While ball progression through the defensive and midfield zones is important for these fast-attacks from deep, they still require the finishing touches in the final third. There are few more frustrating sights in football than watching a counter-attack be botched in its final moments.

The graphic below summarises the top players in this crucial aspect by examining their expected goal and assist outputs. Unsurprisingly, Kevin De Bruyne leads the way here and is powered by his exceptional creative passing.

The list is dominated by players from the top-6 clubs, with Negredo the only interloper inside the top-10 ranking. Middlesbrough’s minimal attacking output left few scraps of solace for Negredo but at least he did get a few shots away in these high-value situations to alleviate the boredom.

Conclusion

The investigation of tactical and stylistic approaches carried out above merely scratches the surface of possibilities for opposition scouting and player profiling.

Being able to identify ‘successful’ attacking moves opens the door to examining ‘failed’ possessions, which would allow efficiency to be studied as well as defensive aspects. This is an area rich with promise that I’ll examine in the future, along with other styles identified within the same framework.

Leicester City’s rise to the top of the Premier League has led to many an analysis by now. Reasons for their ascent have mainly focused on smart recruitment and their counter-attacking style of play, as well as a healthy dose of luck. While their underlying defensive numbers leave something to be desired, their attack is genuinely good. The pace and directness of their attack has regularly been identified as a key facet of their style by writers with analytical leanings.

Analysis by Daniel Altman has been cited in both the Economist and the Guardian, with the crux being that the ‘key’ to stopping Leicester is to ‘slow them down’. Using slightly different metrics, David Sumpter illustrated this further at the recent Opta Pro Forum and on the Sky Sports website, where his analysis concluded that:

For Leicester, it’s about the speed of the attack.

An obvious and somewhat unaddressed question here is whether the pace of Leicester’s attack is actually the key to their increased effectiveness this season. Equating style with success in football is often a fraught exercise; the frequently tedious, pale imitations of Guardiola’s possession-orientated approach are one recent example across football.

Below are a raft of numbers comparing various facets of Leicester’s style and effectiveness this season with last season.

Comparison between Leicester City’s speed of attack and shot profile from ‘fast’ possessions. A possession is a passage of play where a team maintains unbroken control of the ball. Possessions moving at greater than 5 m/s on average are classed as ‘fast’. All values are for open-play possessions only. Data via Opta.

The take-home message here is that the average pace of Leicester’s play has barely shifted this season compared to last. Over the past four years, only Burnley in 2014/15 and Aston Villa in 2013/14 have attacked at a greater pace than Leicester this season.

The proportion of their shots generated via fast paced possessions has risen this year (from 27.5% to 32.1%) and Leicester currently occupy the top position by this metric over this period. In terms of counter-attacking situations, their numbers have barely changed this season (20.1%) compared to last season (20.8%), with only the aforementioned Aston Villa having a greater proportion (21.3%) than them in my dataset.

What has altered is the effectiveness of their attacks this season, as we can see that their expected goal figures have risen. Below are charts comparing their shots from counter-attacking situations, where we can see more shots in the central zone of the penalty area this season and several better quality chances.

Their improvement this year sees them currently rank first and second in expected goals per game from fast-attacks and counter-attacks respectively over the past four seasons (THAT Liverpool team rank second and first). Based on my figures, Leicester’s goals from these situations are also closely in line with expectations (N.B. my expected goal model doesn’t explicitly account for counter-attacking moves).

The figure below shows how this has evolved over the past two seasons, where we see fast-attacks helping to drive their improved attack at the end of 2014/15, which continued into this season. There has been a gradual decline since an early-season peak, although their expected goals from fast-attacks have fallen more than their overall attacking output in open-play, indicating some compensation from other forms of attack.

Rolling ten-match samples of Leicester City’s expected goals for in 2014/15 and 2015/16. All data is for open-play shots only. Data via Opta.

The effectiveness of these attacks has gone a long way to improving Leicester’s offensive numbers. According to my expected goal figures in open-play, they’ve improved from 0.70 per game to 0.94 per game this season. About half of that improvement has come from ‘fast’ paced possessions, with many of these possessions starting from deep areas in their own half.

Examining the way these chances are being created highlights that Leicester are completing more through-balls during their build-up play this season. The absolute numbers are small, with an increase from 11 to 17 through-balls during ‘fast’ possessions and from 6 to 12 during ‘fast’ possessions from their own half, but they do help to explain the increased effectiveness of their play. Approximately 27% of their shots from counter-attacks include a through-ball during their build-up this season, compared to just 11% last season. Through-balls are an effective means of opening up space and increasing the likelihood of scoring during these fast-paced moves. Leicester’s counter-attacks are also far less reliant on crosses this season, with just 2 of these attacks featuring a cross during build-up compared to 9 last season, which will further increase the likelihood of scoring.

Speed is an illusion. Leicester’s doubly so.

Overall, attacking at pace is a difficult skill to master but the rewards can be high. The pace and verve of Leicester’s attack has been eye-catching but it is the execution of these attacks, rather than the actual speed of them that has been the most important factor. Slowing Leicester down isn’t the key to stopping them, rather the focus should be either on denying them those potential counter-attacking situations or diluting their impact should you find yourself on the receiving end of one.

Whether they can sustain their attacking output from these situations is a difficult question to answer. If we examine how well output is maintained from one year to the next, the correlation for expected goals from counter-attacks is reasonable (0.55), while goal expectation per shot is lower (0.30). Many factors will determine the values here, not least the relatively small number of shots per season of this type, as well as a host of other intrinsic football factors. For fast-attacks, the correlations rise to 0.59 for expected goals and 0.52 for expected goals per shot. For comparison, the values for all open-play shots in my data-set are 0.91 and 0.63.
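The repeatability check described above is a straightforward correlation between the same teams’ metric in consecutive seasons. A minimal sketch of the calculation; the per-team values below are invented for illustration, not the post’s actual data:

```python
# Year-to-year repeatability of a team metric: correlate the same teams'
# values in consecutive seasons. Illustrative numbers, not real Opta data.
import numpy as np

def year_to_year_r(metric_year1, metric_year2):
    """Pearson correlation between the same teams' metric in consecutive seasons."""
    return np.corrcoef(metric_year1, metric_year2)[0, 1]

# Hypothetical counter-attack xG per game for six teams in two seasons
y1 = [0.10, 0.22, 0.15, 0.08, 0.30, 0.12]
y2 = [0.12, 0.18, 0.16, 0.10, 0.24, 0.09]
print(round(year_to_year_r(y1, y2), 2))
```

A high correlation here would suggest the metric is a stable team trait rather than noise, which is the distinction the post is drawing between fast-attack and counter-attack output.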

Examining the data in a little more depth suggests that the better counter-attacking and/or fast-paced teams tend to maintain their output, particularly if they retain managerial and squad continuity. Leicester have a good attack overall that is excellent at exploiting space with fast-attacking moves.

Retaining and perhaps even supplementing their attacking core over the summer would likely go a long way to maintaining a style of play that has brought them rich rewards.

Over on StatsBomb, I’ve written about Leicester’s attacking exploits this season, specifically focusing on the style and effectiveness of their attack. That required a fair amount of research into the speed and directness of teams’ attacks, an area I’ve been exploring since I started working with possessions and expected goals.

One output of all that is a bunch of numbers at the team and player level stretching back over the past four seasons about fast-attacks and counter-attacks, some of which I will post below along with some comments.

As a brief reminder, a possession is a passage of play where a team maintains unbroken control of the ball. I class a possession moving at greater than 5 m/s on average as ‘fast’, based on examining diagnostics for all possessions, i.e. not just those ending with a shot. The threshold is fairly arbitrary, as I went with a round number rather than a precisely calculated one, but the interpretation of the results didn’t shift much when altering the boundary. Looking at the data, there is probably some separation into slow attacks (<2 m/s), medium-paced attacks (2-5 m/s) and fast attacks (>5 m/s). Note that some attacks move away from goal and so end up with a negative speed (technically I’m calculating velocity here, but I’ll leave that for another time); the speeds quoted are measured towards the goal.

Counter-attacks are fast-paced moves that begin in a team’s own half. Again, this is fairly arbitrary from a data point of view, but it at least fits with what I think most would consider a counter-attack, and it’s easy to split the data into narrower bands in future.
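The two definitions can be sketched in code. This is a minimal illustration using the thresholds stated above; the possession summary (start/end coordinates measured towards the opponent’s goal, plus duration) is an assumed data structure for illustration, not the actual Opta feed:

```python
# Classify a possession by its average speed towards goal, per the thresholds
# in the post: slow (<2 m/s), medium (2-5 m/s), fast (>5 m/s). A counter-attack
# is a fast possession that starts in a team's own half. The coordinate
# convention (metres towards the opponent's goal on a 105 m pitch) is assumed.

def attack_speed(start_x, end_x, duration_s):
    """Average speed towards goal in m/s (negative if the move goes backwards)."""
    return (end_x - start_x) / duration_s

def classify_possession(start_x, end_x, duration_s, pitch_length=105.0):
    speed = attack_speed(start_x, end_x, duration_s)
    if speed > 5.0:
        band = "fast"
    elif speed >= 2.0:
        band = "medium"
    else:
        band = "slow"
    # Counter-attack: fast possession beginning in the team's own half
    counter = band == "fast" and start_x < pitch_length / 2
    return band, counter

# e.g. a move from the edge of one's own box to the opponent's box in 9 seconds
print(classify_possession(start_x=20.0, end_x=90.0, duration_s=9.0))
```

Note the negative-speed case falls into the ‘slow’ band here, consistent with the remark above that some attacks move away from goal.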

All of the numbers below are based on my expected goals model using open-play shots only. I don’t include a speed of attack or counter-attacking adjustment in my model.

So, without further ado, here are some graphs…

Top-20 offensive fast-attacking teams

Top 20 teams in terms of fast-attacking expected goals for over the past four seasons.

Champions-elect Leicester City sit atop the pile with a reasonable gap to THAT Liverpool team, and a fairly big drop to the chasing pack behind. Arsenal and Manchester City are quite well represented here, illustrating the diversity of their attacks – while both are typically among the slowest teams on average, they can step it up effectively when presented with the opportunity.

Top-20 offensive counter-attacking teams

Top 20 teams in terms of counter-attacking expected goals for over the past four seasons.

Number one isn’t a huge shock, with this year’s Leicester City narrowly ahead of the 12/13 iteration of Liverpool. A lot of the same teams are found in both the fast-attacking and counter-attacking brackets, which is perhaps no great surprise.

Southampton this year are perhaps a little surprising, and it is a big shift from previous seasons (0.056-0.075 per game), although I’ll admit I haven’t paid them that much attention this year. Their defence is also the 6th worst in this period on counter-attacks (3rd worst on fast-attacks). When did Southampton become a basketball team?

What is particularly noticeable is the prevalence of teams from the past two seasons in the top-10. A trend towards more transition-orientated play? Something to examine in more detail at another time perhaps.

Top-20 defensive fast-attacking teams

Top 20 teams in terms of fast-attacking expected goals against over the past four seasons.

Most of the best performances on the defensive side are from the 12/13 and 13/14 seasons, which might lend some credence to a greater recent emphasis on transitions, along with an inability to cope with them.

The list overall is populated by the relative mainstays of Manchester City, Liverpool and West Brom, along with various fingerprints from Mourinho, Warnock and Pulis.

Top-20 defensive counter-attacking teams

Top 20 teams in terms of counter-attacking expected goals against over the past four seasons.

Interestingly, there is greater divergence between the counter-attacking and fast-attacking metrics on the defensive side of the ball than on the offensive side, which might point to potential strengths and/or weaknesses in certain teams.

Spurs last season rank as the worst defensive side in terms of counter-attacking expected goals against, and are narrowly beaten into second spot for fast-attacks by the truly awful 2012/13 Reading team.

Top-20 fast-attacking players

Top 20 players in terms of fast-attacking expected goals per 90 minutes over the past four seasons. Minimum 2,700 minutes played.

Lastly, we’ll take a quick look at players. For now, I’m just isolating the player who took the shot, rather than those who participated in the build-up to the goal. A lot of this will be tied up in playing style and team effects.

Jamie Vardy is clearly the standout name here, followed by Daniel Sturridge and Danny Ings. Sturridge leads the chart in terms of actual goals with 0.21 goals per 90 minutes, with Vardy third on 0.18.

Vardy’s overall open-play expected goals per 90 minutes stands at 0.26 by my numbers over the past two seasons, so over half of his xG per 90 comes from getting on the end of fast-attacking moves. He sits in 16th place overall among those with over 2,700 minutes played, which is respectable, but he is clearly elite when it comes to faster-paced attacks.

Top-20 counter-attacking players

Top 20 players in terms of counter-attacking expected goals per 90 minutes over the past four seasons. Minimum 2,700 minutes played.

Danny Ings sits on top when it comes to counter-attacking, which bodes well for his future under Jürgen Klopp at Liverpool, providing his injury hasn’t unduly affected him. Again, Sturridge leads the list in terms of actual goals with 0.13 per 90 minutes, with Vardy second on 0.12. The sample sizes are lower here, so we would expect a greater degree of variance in terms of the comparison between reality and expectation.

One of the interesting things when comparing these lists is their divergence from and/or similarity to the overall goalscorer chart. For example, Edin Džeko and Wilfried Bony sit in first and fourth place respectively in the overall table for this period but lie outside the top-20 when it comes to faster-paced attacks. A clear application of this type of work is player profiling to fit the particular style and needs of a prospective team, which Paul Riley has previously shown to be a useful method for evaluating forwards.

Moving forward

I wanted to post these as a starting point for discussion before I drill down further into the details in the future. The data presented here, and that underlying it, are very rich in detail and potential applications, which I have already started to explore. In particular, there is a lot of spatial information encapsulated in the data that can inform how teams attack and defend, which can help to build further descriptive elements of team styles alongside measures of their effectiveness.

Football is a complex game that has many facets that are tough to represent with numbers. As far as public analytics goes, the metrics available are best at assessing team strength, while individual player assessments are strongest for attacking players due to their heavy reliance on counting statistics relating to on-the-ball numbers. This makes assessing defenders and goalkeepers a particular challenge as we miss the off-ball positional adjustments and awareness that marks out the best proponents of the defensive side of the game.

One potential avenue is to examine metrics from a ‘top-down’ perspective i.e. we look at overall results and attempt to untangle how a player contributed to that result. This has the benefit of not relying on the incomplete picture provided by on-ball statistics but we do lose process level information on how a player contributes to overall team performance (although we could use other methods to investigate this).

As far as football is concerned, there are a few methods that aim to do this, with Goalimpact being probably the most well-known. Goalimpact attempts to measure ‘the extent that a player contributes to the goal difference per minute of a team’ via a complex method and impressively broad dataset. Daniel Altman has a metric based on ‘Shapley‘ values that looks at how individual players contribute to the expected goals created and conceded while playing.

Outside of football, one of the most popular statistics for measuring player contribution to overall results is the concept of plus-minus (or +/-), which is commonly used within basketball, as well as ice hockey. The most basic of these metrics simply counts the goals or points scored and conceded while a player is on the pitch and comes up with an overall number to represent their contribution. There are many issues with such an approach, such as who a player is playing alongside, their opponents and the venue of a match; James Grayson memorably illustrated some of these issues within football when WhoScored claimed that Barcelona were a better team without Xavi Hernández.

Several methods exist in other sports to control for these factors (basically they add in a lot more maths) and some of these have found their way to football. Ford Bohrmann and Howard Hamilton had a crack at the problem here and here respectively but found the results unsatisfactory. Martin Eastwood used a Bayesian approach to rate players based on the goal difference of their team while they are playing, which came up with more encouraging results.

Expected goals

One of the potential issues with applying plus-minus to football is the low scoring nature of the sport. A heavily influential player could play a run of games where his side can’t hit the proverbial barn door, whereas another player could be fortunate to play during a hot-streak from one of his fellow players. Goal-scoring is noisy in football, so perhaps we can utilise a measure that irons out some of this noise but still represents a good measure of team performance. Step forward expected goals.

Instead of basing the plus-minus calculation on goals, I’ve used my non-shot expected goal numbers as the input. The method splits each match into separate segments and logs which players are on the pitch at a given time. A new segment starts when a lineup changes, i.e. when a substitution occurs or a player is sent off. The expected goals for each team are then calculated for each segment and converted to a value per 90 minutes. Each player is a ‘variable’ in the equation, with the idea being that their contribution to a team’s expected goal difference can be ‘solved’ via the regression.

For more details on the maths side of plus-minus, I would recommend checking out Howard Hamilton’s article. I used ridge regression, which is similar to linear regression but the calculated coefficients tend to be pulled towards zero (essentially it increases bias while limiting huge outliers, so there is a tradeoff between bias and variance).
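A minimal sketch of how such a regression could be set up with scikit-learn’s `Ridge`. The design-matrix convention (+1 for a home-side player on the pitch, -1 for an away-side player, 0 otherwise) and the tiny dataset are illustrative assumptions; the post doesn’t specify its exact implementation beyond the description above:

```python
# xG plus-minus via ridge regression: each row is a match segment with an
# unchanged lineup, each column a player (+1 home, -1 away, 0 off the pitch),
# and the target is the segment's home-minus-away xG per 90. Invented data.
import numpy as np
from sklearn.linear_model import Ridge

players = ["A", "B", "C", "D"]
X = np.array([
    [ 1,  1, -1, -1],   # segment 1: A & B (home) vs C & D (away)
    [ 1,  0, -1,  0],   # segment 2: after substitutions
    [ 0,  1,  0, -1],   # segment 3
])
y = np.array([0.4, 0.1, 0.3])      # xG difference per 90 in each segment
weights = np.array([45, 30, 15])   # segment lengths in minutes

model = Ridge(alpha=50.0, fit_intercept=False)
model.fit(X, y, sample_weight=weights)

# Each coefficient is a player's estimated contribution to team xG difference
for name, coef in zip(players, model.coef_):
    print(f"{name}: {coef:+.3f}")
```

Weighting segments by their length means a 5-minute cameo doesn’t count as much as a full half, which matches the per-90 framing of the method.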

As a first step, I’ve calculated the plus-minus figures over the previous three English Premier League seasons (2012/13 to 2014/15). Every player that has appeared in the league is included as I didn’t find there was much difference when excluding players under a certain threshold of minutes played (this also avoids having to include such players in some other manner, which is typically done in basketball plus-minus). However, estimates for players with fewer than approximately 900 minutes played are less robust.

The chart below shows the proportion of players with a certain plus-minus score per 90 minutes played. As far as interpretation goes, if we took a team made up of 11 players, each with a plus-minus score of zero, the expected goal difference of the team would add up to zero. If we then replaced one of the players with one with a plus-minus of 0.10, the team’s expected goal difference would be raised to 0.10.

Distribution of xG plus-minus scores.

The range of plus-minus scores is from -0.15 to 0.15, so replacing a player with a plus-minus score of zero with one with a score of 0.15 would equate to an extra 5.7 goals over a Premier League season. Based on this analysis by James Grayson, that would correspond to approximately 3.5-4.0 points over a season on average. This is comparable to figures published relating to calculations based on the Goalimpact metric system discussed earlier. That probably seems a little on the low side for what we might generally assume would be the impact of a single player, which could point towards the method either narrowing the distribution too much (my hunch) or an overestimate in our intuition. Validation will have to wait for another day.
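The goals figure quoted here is simple arithmetic, scaling the per-90 swing over a full season:

```python
# A plus-minus swing of 0.15 xG per 90, sustained over a 38-match season
plus_minus_per_90 = 0.15
matches = 38
extra_goals = plus_minus_per_90 * matches
print(round(extra_goals, 1))  # 5.7 extra expected goals over the season
```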

Most valuable players

Below is a table of the top 13 players according to the model. Vincent Kompany is ranked the highest by this method; on one hand this is surprising given the often strong criticism that he receives, but on the other, when he is missing, those replacing him in Manchester City’s back-line look far worse and the team overall suffers. According to my non-shots xG model, Manchester City have been comfortably the best team over the previous three seasons and are accordingly well represented here.

Top 13 players by xG plus-minus scores for the 2012/13-2014/15 Premier League seasons. Minimum minutes played was 3420 i.e. equivalent to a full 38 match season.

Probably the most surprising name on the list is at number three…step forward Joe Allen! I doubt even Joe’s closest relatives would rate him as the third best player in the league, but I think what the model is trying to say here is that Allen is a very valuable cog who improves the overall performance level of the team. Framed in that way, it is perhaps more believable (if only slightly) that his skill set gets more out of his team mates. When fit, Allen does bring added intelligence to the team, and as a Liverpool fan, ‘intelligence’ isn’t usually a word I associate with the side. Highlighting players who don’t typically stand out is one of the goals of this sort of analysis, so I’ll run with it for now while maintaining a healthy dose of skepticism.

I chose 13 as the cutoff in the table so that the top goalkeeper on the list, Hugo Lloris, is included, meaning an actual team could be put together. Note that this doesn’t factor in shot-stopping (I’ve actually excluded rebound shots, which might have been one way for goalkeepers to influence the scores more directly), so the rating for goalkeepers should primarily reflect other aspects of goalkeeping skill. Goalkeepers are probably still quite difficult to nail down with this method due to them rarely missing matches, so there is a fairly large caveat with their ratings.

As this is just an initial look, I’m going to hold off on putting out a full list, but I definitely will in time once I’ve done some more validation work and ironed out some kinks.

Validation, Repeatability & Errors

Fairly technical section. You’ve been warned.

One of the key facets of using ridge regression is choosing a ‘suitable’ regularization parameter, which is what controls the bias-to-variance tradeoff; essentially larger values will pull the scores closer to zero. Choosing this objectively is difficult and in reality, some level of subjectivity is going to be involved at some stage of the analysis. I did A LOT of cross-validation analysis where I split the match segments into even and odd sets and ran the regression while varying a bunch of parameters (e.g. minutes cutoff, weighting of segment length, the regularization value). I then looked at the error between the regression coefficients (the player plus-minus scores) in the out-of-sample set compared to the in-sample set to choose my parameters. For the regularization parameter, I chose a value of 50 as that was where the error reached a minimum initially with relatively little change for larger values.
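The even/odd cross-validation described above could be sketched as follows. The synthetic data stands in for real match segments, and the disagreement measure (RMSE between the two halves’ coefficient sets) is one plausible choice of error, assumed for illustration:

```python
# Choosing the ridge regularization parameter by splitting segments into
# even/odd halves, fitting each, and measuring how far the two coefficient
# sets disagree. Synthetic data stands in for the real match segments.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_segments, n_players = 200, 30
X = rng.choice([-1, 0, 1], size=(n_segments, n_players))   # player indicators
true_coef = rng.normal(0, 0.05, n_players)                 # hidden player values
y = X @ true_coef + rng.normal(0, 0.3, n_segments)         # noisy segment xG diff

def coef_for(X, y, alpha):
    return Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

errors = {}
for alpha in [1, 10, 50, 200]:
    even = coef_for(X[0::2], y[0::2], alpha)
    odd = coef_for(X[1::2], y[1::2], alpha)
    errors[alpha] = float(np.sqrt(np.mean((even - odd) ** 2)))
    print(f"alpha={alpha:>4}: coefficient RMSE between halves = {errors[alpha]:.4f}")
```

Larger alpha values shrink both coefficient sets towards zero, so the between-half error keeps falling with alpha; the post’s choice of 50 reflects picking the point where the error first flattens out rather than the smallest value attainable.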

I also did some repeatability testing comparing consecutive seasons. As is common with plus-minus, the repeatability is very limited. That isn’t much of a surprise as the method is data-hungry and a single season doesn’t really cut it for most players. The bias introduced by the regularization doesn’t help either here. I don’t think that this is a death-knell for the method though, given the challenges involved and the limitations of the data.

In the table above, you probably noticed I included a column for errors, specifically the standard error. Typically, this has been where plus-minus has fallen down, particularly in relation to football. Simply put, the errors have been massive and have rendered interpretation practically impossible e.g. the errors for even the most highly rated players have been so large that statistically speaking it has been difficult to evaluate whether a player is even ‘above-average’.

I calculated the errors from the ridge regression via bootstrap resampling. There are some issues with combining ridge regression and bootstrapping (see discussion here and page 18 here) but these errors should give us some handle on the variability in the ratings.
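A sketch of the bootstrap procedure: resample match segments with replacement, refit the ridge regression each time, and take the standard deviation of each player’s coefficient across refits. The data shapes here are illustrative, not the post’s actual dataset:

```python
# Bootstrap standard errors for ridge plus-minus coefficients: resample
# segments with replacement, refit, and take the per-player coefficient
# standard deviation across refits. Synthetic stand-in data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_segments, n_players = 200, 30
X = rng.choice([-1, 0, 1], size=(n_segments, n_players))
y = X @ rng.normal(0, 0.05, n_players) + rng.normal(0, 0.3, n_segments)

def bootstrap_se(X, y, alpha=50.0, n_boot=200):
    coefs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # resample segments
        fit = Ridge(alpha=alpha, fit_intercept=False).fit(X[idx], y[idx])
        coefs.append(fit.coef_)
    return np.std(coefs, axis=0)                # one standard error per player

se = bootstrap_se(X, y)
print(se.round(3))
```

As noted above, the shrinkage from the ridge penalty biases these intervals somewhat, but they still give a usable handle on the variability of each rating.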

You can see above that the errors are reasonably large, so the separation between players isn’t as good as you would want. In terms of their magnitude relative to the average scores, the errors are comparable to those I’ve found published for basketball. That provides some level of confidence as they’ve been demonstrated to have genuine utility there. Note that I’ve not cherry-picked the players above in terms of their standard errors either; encouragingly the errors don’t show any relationship with minutes played after approximately 900 minutes.

The gold road’s sure a long road

That is essentially it so far in terms of what I’m ready to share publicly. In terms of next steps, I want to expand this to include other leagues so that the model can keep track of players transferring in and out of a league. For example, Luis Suárez disappears when the model reaches the 2014/15 season, when in reality he was settling in quite nicely at Barcelona. That likely means that his rating isn’t a true reflection of his overall level over the period.

Evaluating performance over time is also a big thing I want to be able to do; a three year average is probably not ideal, so either some weighting for more recent seasons or a moving two season window would be better. This is typically what has been done in basketball and based on initial testing, it doesn’t appear to add more noise to the results.

Validating the ratings in some fashion is going to be a challenge but I have some ideas on how to go about that. One of the advantages of plus-minus style metrics is that they break-down team level performance to the player level, which is great as it means that adding the players back up into a team or squad essentially correlates perfectly with team performance (as represented by expected goals here). However, that does result in a tautology if the validation is based on evaluating team performance unless there are fundamental shifts in team makeup e.g. a large number of transfers in and out of a squad or injuries to key personnel.

This is just a start, so there will be more to come over time. The aim isn’t to provide a perfect representation of player contribution but to add an extra viewpoint to squad and player evaluation. Combining it with other data analysis and scouting would be the longer-term goal.

I’ll leave you with piano carrier extraordinaire, Joe Allen.

Joe Allen on hearing that he is Liverpool’s most important player over the past three years.

The narrative surrounding Arsenal has been strong this week, with their fall to fourth place in the table coming on Groundhog Day no less. This came despite a strong second half showing against Southampton, with Fraser Forster denying them. Arsenal’s season has been characterised by several excellent performances in terms of expected goals but the scoreline hasn’t always reflected their statistical dominance. Colin Trainor illustrated their travails in front of goal in this tweet.

I wrote in this post on how Arsenal’s patient approach eschews more speculative shots in search of high quality chances and that this was seemingly more pronounced this season. Arsenal are highly rated by expected goal models this season but traditional shot metrics are nowhere near as convinced.

Analytical folk will point to the high quality of Arsenal’s shots this season to explain the difference, where quality is denoted by the average probability that a shot will be scored. For example, a team with an average shot quality of 0.10 would ‘expect’ to score around 10% of their shots taken.

In the chart below, I’ve looked at the full distribution of Arsenal’s shots in open-play this season in terms of ‘shot quality’ and compared them with their previous incarnations and peers from the 2012/13 season through to the present. Looking at shot quality in this manner illustrates that the majority of shots are of relatively low quality (less than 10% chance of being scored) and that the distribution is heavily-skewed.

Proportion of total shots in open-play according to the probability of them being scored (expected goals per shot). Grey lines are non-Arsenal teams from the English Premier League from 2012/13 to the present. Blue lines are previous Arsenal teams, while red is Arsenal from this season. Data via Opta.
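The distribution in the chart can be reproduced by binning each shot’s expected-goal value into 0.1-wide brackets and computing the proportion of shots per bracket. The xG values below are invented purely to show the mechanics:

```python
# Shot-quality distribution: bin per-shot xG into 0.1-wide brackets and
# report the proportion of shots in each. Illustrative xG values.
import numpy as np

shot_xg = np.array([0.03, 0.05, 0.07, 0.09, 0.12, 0.15, 0.22, 0.28, 0.35, 0.55])
bins = np.arange(0.0, 1.1, 0.1)
counts, _ = np.histogram(shot_xg, bins=bins)
proportions = counts / counts.sum()

for lo, p in zip(bins[:-1], proportions):
    if p > 0:
        print(f"{lo:.1f}-{lo + 0.1:.1f}: {p:.0%}")
```

With real data, the first bracket dominates for every team, which is the heavy skew the chart illustrates; Arsenal’s current season is unusual precisely because that first bar is so much smaller.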

In terms of Arsenal, what stands out here is that their current incarnation are taking a smaller proportion of ‘low-quality’ shots (those with an expected goal estimate from 0-0.1) than any previous team by a fairly wide margin. At present, 59% of Arsenal’s shots reside in this bracket, with the next lowest sitting at 64%. Their absolute number of shots in this bracket has also fallen compared to previous seasons.

Moving along the scale, Arsenal reside along the upper edge in terms of these higher quality shots and actually have the largest proportion in the 0.2-0.3 and 0.3-0.4 ranges. As you would expect, they’ve traded lower quality efforts for higher quality shots according to the data.

Arsenal typically post above average shot quality figures but the shift this season appears to be significant. The question is why?

Mesut Özil?

One big change this season is the sustained presence (and excellence) of Mesut Özil; so far this season he has made 22 appearances (playing in 88% of available minutes) compared to 22 appearances last season (54%) and 26 matches in his debut season (63%). According to numbers from the Football in the Clouds website, his contribution to Arsenal’s shots while he is on the pitch is at 40% compared to 30% in 2014/15. Daniel Altman also illustrated Özil’s growing influence in his post in December.

Özil is the star that Arsenal’s band of attacking talent orbits, so it is possible that he is driving this focus on quality via his creative skills. His attacking contribution in terms of shots and shot-assists is among the highest in the league but is heavily-skewed towards assisting others, which is unusual among high-volume contributors.

Looking at the two previous seasons though, there doesn’t appear to be any great shift in Arsenal’s shot quality during the periods when Özil was out of the team through injury. His greater influence and regular presence in the side this season has probably shifted the dial but quantifying how much would require further analysis.

Analytics?

Another potential driver could be that Wenger and his coaching staff have attempted to adjust Arsenal’s tactics/style with a greater focus on quality.

Below is a table of Arsenal’s ‘volume’ shooters over the past few seasons, where I’ve listed their number of shots from outside of the box per 90 minutes and the proportion of their shots from outside the box. Note that these are for all shots, so set-pieces are included but it shouldn’t skew the story too much.

The general trend is that Arsenal’s players have been taking fewer shots from outside of the box this season compared to previous and that there has been a decline proportionally for most players also. Some of that may be driven by changing roles/positions in the team but there appears to be a clear shift in their shot profiles. Giroud for example has taken just 3 shots from outside the box this season, which is in stark contrast to his previous profile.

Given the data I’ve already outlined, the above isn’t unexpected but then we’re back to the question of why?

Wenger has mentioned expected goals on a few occasions now and has reportedly been working more closely with the analytics team that Arsenal acquired in 2012. Given his history and reputation, we can be relatively sure that Wenger would appreciate the merits of shot quality; could the closer working relationship and trust developed with the analytics team have led to him placing an even greater emphasis on seeking better shooting opportunities?

The above is just a theory but the shift in emphasis does appear to be significant and is an interesting feature to ponder.

Adjusted expectations?

Whatever has driven this shift in Arsenal’s shot profile, the change is quite pronounced. From an opposition strategy perspective, this presents an interesting question: if you’re aware of this shift in emphasis, whether through video analysis or data, do you alter your defensive strategy accordingly?

While Arsenal’s under-performance in terms of goals versus expected goals currently looks like a case of variance biting hard, could this be prolonged if their opponents adjust? It doesn’t look like their opponents have altered tactics thus far based on examining the data but having shifted the goalposts in terms of shot quality, could this be their undoing?