Premier League preview week: Is this the year advanced stats, Expected Goals finally go mainstream?

Arsenal is among the first Premier League clubs to fully embrace analytics. (Getty)

Welcome to FC Yahoo’s Premier League preview week. We’ll take a look at each team in our aggregated predicted table, counting down from No. 20 to No. 1, and also reflect on some issues surrounding the league as kickoff approaches on Friday. Follow along with everything here.

It was exactly a month ago at an upscale hotel in central London that Danny Murphy — yes, the same Danny Murphy who used to patrol Premier League midfields for Liverpool and Fulham — sat on stage, in a slick white dress shirt, and in his new element. Murphy, now a popular BBC pundit, was part of a five-man panel at an Opta event discussing the future of data in soccer. And he was on a roll.

Almost 30 minutes in, he was animatedly explaining the value of using chances created, as opposed to assists, to judge playmaking ability when the panel’s moderator chimed in: “That’s relatively new,” he pointed out, referring to the chances created stat. Murphy responded: “Yeah, it is. It is. Exactly.” And he continued, his hands gesturing, his weight shifting side to side.

Not 30 seconds later, though, Murphy’s head snapped to his left. Duncan Alexander, Opta’s head of UK content, had interjected. “It’s interesting that you said that’s relatively new,” he began. “But back in ’96, Opta was collecting chances created data. It just took longer to reach the general public.”

OK, it didn’t just take longer. It took ages. Sometimes over a decade. But finally, it seems, the most basic of advanced stats are beginning to find their way into mainstream media coverage of the Premier League. Sky Sports is using per-90 metrics on TV broadcasts. Sky, the BBC, the Guardian, the Telegraph and others have run explainers on expected goals. Gradually, everyone is catching on.

Nothing you’ll see on a TV screen or in a Guardian article is eye-opening to anybody working inside a Premier League club. Opta, the sports data company headquartered in London, has been nurturing expected goals since last decade. Arsene Wenger has discussed it at news conferences. Wenger’s club, Arsenal, purchased an entire data company three years ago.

Now, several years after many clubs began integrating complex data analysis into scouting and recruitment, the same models and stats, albeit simplified versions of them, are bleeding out into the open.

“These things take a while to seep through to public consciousness,” Alexander said at the Opta event.

And now that they have, fans will be able to better understand the game they love.

Explaining and assessing expected goals

“Expected goals.” That’s the stat you’ll hear about most. It’s often abbreviated as xG. And, once you get comfortable with it, it’s a relatively simple concept.

Expected goals, on a game-by-game or full-season basis, is a measure that blends chance quantity with quality. In a way, it’s a beefed-up, more involved and more accurate version of pure shot totals. On an individual shot basis, it’s an intuitive numerical way to assess chance quality that is easy to digest.

Expected goals models, which are developed based on analysis of hundreds of thousands of shots from old games, assign a number, between 0.0 and 1.0, to every shot taken in a match. The number is the probability that the shot, based on a variety of factors, would result in a goal if taken by an average player. An xG value of 0.5 means there is a 50 percent chance the shot results in a goal. A value of 0.01 means there is a 1 percent chance. The factors a model typically weighs include:

Shot location. The farther away a player is from goal, the less likely he is to score. And the farther a player moves away from the center of the field — as his visible angle of the goal decreases — the less likely he is to score.

Shot type. A shot with the foot is more likely to result in a goal than a header.

Assist type. For example, a chance created by a through ball is more likely to yield a goal than a chance created by a cross or a long ball.

Take-ons. If a shot comes after an attacker has beaten a defender with a dribble, the shot is more likely to result in a goal.

Speed of attack. Shots at the end of attacks that cover a lot of ground in a short time (i.e. counterattacks) are generally more likely to result in goals than shots at the end of slow buildups. This is a feature of analyst Michael Caley’s model.

1-v-1s. They are, naturally, more likely to result in goals than other shots.

There is also one subjective factor: Big Chances. Opta’s game coders are responsible for tagging some shots as “Big Chances,” or “chances where a player should reasonably be expected to score.” This is a way of accounting for factors, such as an empty net, that data alone can’t pick up.

(Note: There is not one all-powerful xG model that every statistician uses; Caley’s model, for example, is different from Opta’s, which is probably different from Arsenal’s. But each likely has its own way of accounting for all these factors, even if the categorizations or methods of calculation are different.)
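To make the factor list concrete, here is a minimal sketch of how a model might turn those inputs into a probability. It is not Opta’s or Caley’s model: the `Shot` fields, the logistic form and every weight below are invented for illustration, whereas real models are fitted to hundreds of thousands of historical shots.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_m: float            # distance from goal, in meters
    angle_deg: float             # visible angle of the goal mouth, in degrees
    header: bool = False         # shot type
    through_ball: bool = False   # assist type
    after_take_on: bool = False  # shooter beat a defender first
    big_chance: bool = False     # subjective tag from a human coder


# Hypothetical weights, chosen only so the signs match the factors above.
WEIGHTS = {
    "intercept": -1.0,
    "distance_m": -0.10,
    "angle_deg": 0.02,
    "header": -0.6,
    "through_ball": 0.5,
    "after_take_on": 0.3,
    "big_chance": 1.2,
}


def shot_xg(shot: Shot) -> float:
    """Return a toy goal probability between 0 and 1 for one shot."""
    z = WEIGHTS["intercept"]
    z += WEIGHTS["distance_m"] * shot.distance_m
    z += WEIGHTS["angle_deg"] * shot.angle_deg
    for flag in ("header", "through_ball", "after_take_on", "big_chance"):
        if getattr(shot, flag):
            z += WEIGHTS[flag]
    # The logistic function squashes any score into the 0-to-1 range.
    return 1.0 / (1.0 + math.exp(-z))


close_range = Shot(distance_m=8, angle_deg=40, through_ball=True)
long_range = Shot(distance_m=25, angle_deg=15)
print(f"close: {shot_xg(close_range):.2f}, long: {shot_xg(long_range):.2f}")
```

The only thing this toy version is meant to get right is direction: closer, more central, foot-struck and better-assisted shots score higher.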

The idea is not that a computer will spit out the exact probability of a goal being scored at the moment the ball leaves a striker’s foot. The idea is that xG models provide an as-good-as-possible estimate of that probability, based on objective factors, if a league-average player were on the end of the chance. Those estimates do not and should not serve as the be-all, end-all of soccer analysis, but they are particularly useful in a few main ways.

How to use expected goals

Expected goals can be used when analyzing a game retrospectively. Let’s take a look back at an early-season Liverpool-Burnley game from last August. The Reds lost 2-0. But did they really play that poorly? The xG map actually gave a fairly comprehensive answer:

Liverpool was unlucky to lose. A quick glance at the map tells us that. But so does a traditional post-match stat package. Jurgen Klopp’s team bossed the game, with 81 percent possession, and 26 shots to Burnley’s three. On most days, that apparent dominance would have been sufficient.

But xG tells us more than the simple stats. It shows us that, for all their monopolization of the ball, the Reds didn’t really create that many quality chances. Burnley conceded the ball, and conceded low-percentage shots from outside the penalty area, but held firm at the edge of the box. Liverpool’s efforts to penetrate Burnley’s rear guard were ineffective, and its attack, despite all the possession and shots, stalled. It wasn’t a terrible performance; but it wasn’t actually all that dominant either.

Because not all shots are created equal, shot totals, or even shots on target totals, can be deceiving representations of how a game transpired. Expected goals isn’t perfect either — for example, it won’t pick up that 2-v-1 break when a winger’s square pass is just out of the reach of a striker at the back post — but it is generally a good tool to judge “who had the better of the play.”
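The arithmetic behind a match-level xG figure is just addition: a team’s xG is the sum of its shots’ values. The sketch below uses made-up shot values, not the real numbers from the Liverpool-Burnley game, to show how a pile of low-percentage efforts can add up to less than the raw shot count implies.

```python
def match_xg(shot_values):
    """Sum per-shot probabilities into a team's expected goals."""
    return sum(shot_values)


def xg_per_shot(shot_values):
    """Average chance quality; 0.0 if the team took no shots."""
    return match_xg(shot_values) / len(shot_values) if shot_values else 0.0


# Hypothetical values: many speculative shots vs. a few good looks.
dominant_side = [0.04] * 26
counterattacking_side = [0.35, 0.30, 0.10]

for name, shots in [("dominant", dominant_side), ("counter", counterattacking_side)]:
    print(f"{name}: {len(shots)} shots, {match_xg(shots):.2f} xG, "
          f"{xg_per_shot(shots):.2f} xG per shot")
```

In this invented example the shot count is lopsided, but the xG totals are close, because the side with fewer shots took far better ones.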

Expected goals numbers can also be broken down and used for descriptive analysis. On a very simple level, xG per shot ratios can tell us how selective a team is with its shooting. On a more complex level, because xG has so many components, those components can be isolated to tell us about various styles and tendencies that might not be obvious to the naked eye. But that’s another discussion for another day.

The other main utility of expected goals is its predictive capacity. Let’s say we want to predict how a team will perform in the upcoming 2017-18 EPL season. One stat to use would be that team’s point total from last season. But, in reality, goal differential is a better predictor than points. And total shots ratio (a team’s shots as a share of all shots taken in its games) is a better predictor than goal differential. And, you guessed it, expected goals is a better predictor than total shots ratio.
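For reference, total shots ratio is conventionally computed as a team’s share of all shots in its games, while goal differential is a simple subtraction. A minimal sketch, with illustrative function names:

```python
def goal_differential(goals_for, goals_against):
    """Goals scored minus goals conceded."""
    return goals_for - goals_against


def total_shots_ratio(shots_for, shots_against):
    """Share of all shots taken by the team; 0.5 means an even split."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("no shots recorded")
    return shots_for / total


# Shot counts from the Liverpool-Burnley game discussed earlier: 26 to 3.
print(f"Liverpool TSR: {total_shots_ratio(26, 3):.3f}")
```

A ratio near 0.9 alongside a 2-0 defeat is a neat illustration of why shot volume alone can mislead.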

Thus, cumulative expected goals (and expected goals against) totals from the previous campaign are a better starting point for analysis of the upcoming season than 2016-17 order of finish.

What do the numbers tell us about the 2017-18 title race?

Expected goals is one reason Manchester City is the clear favorite in the Premier League this season. Although Pep Guardiola’s team finished third last time out, 15 points off Chelsea’s pace, it underperformed its expected goal differential (xGD) by around 10, a significant margin considering top teams with top attacking talent and top goalkeepers usually overperform their xGD.

That’s exactly what Antonio Conte’s Chelsea did. The Blues overperformed their xGD by almost 20, with the full disparity (and then some) coming from the attacking end. Tottenham overperformed its xGD by more than any team in the league.

City led last season’s xGD table by a significant margin with an expected differential of around 50. Therefore, with better goalkeeping — Ederson has been brought in to replace the dreadful Claudio Bravo — and better finishing — Sergio Aguero led the league in expected goals, but had an uncharacteristically average year when it came to actually scoring them — regression to the mean should propel the Citizens to the top of the table.
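The over- and underperformance figures above come from one subtraction: actual goal differential minus expected goal differential. Here is a sketch using the rounded numbers quoted above. City’s actual differential of about 40 is implied by those figures (roughly 50 minus roughly 10), and the function name is illustrative.

```python
def xgd_overperformance(goal_diff, expected_goal_diff):
    """Positive: results beat the chances; negative: results lagged them."""
    return goal_diff - expected_goal_diff


# Rounded figures: City's xGD was around 50 and it fell about 10 short of it.
city = xgd_overperformance(goal_diff=40, expected_goal_diff=50)
print(f"Manchester City: {city:+d}")
```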

What can the numbers tell us about individual players?

Here’s where things get a bit more complicated. Expected goals can tell us something about strikers, but exactly what it tells us differs on a case-by-case basis. (It’s also a matter of constant debate.)

We can, abstractly, break goal-scoring talent down into two separate abilities: the ability to put oneself in the position to score and the ability to put a chance away. Expected goals measures the former. Can it tell us about the latter?

Let’s look at two examples. One: Harry Kane. Kane burst onto the scene in 2014-15 with 31 goals in all competitions, and 21 in the Premier League. He scored 18 non-penalty goals (side note: always use non-penalty goals when assessing goal-scoring) on roughly 14 expected goals.

Many speculated that Kane’s breakout season might have been slightly flukey. Some even dared use the term “one-year wonder.” But then in 2015-16, Kane improved his xG output to roughly 19 and still outpaced it with another 20 non-penalty goals. This past season, his xG total fell to around 12, but he scored a whopping 24 non-penalty goals, the most in the league.

Kane has now finished at an above-average rate in his age 21, 22 and 23 seasons. That’s probably enough evidence to suggest that his xG overperformance is not randomness but skill.

Two: Josh King. The Cherries’ top scorer tallied 14 non-penalty goals last season, but on approximately 7 xG. King, unlike Kane, has no history of finishing at such a high rate. He scored six goals on roughly five xG in 2015-16. He scored just five total goals in 64 appearances for Blackburn in the Championship the three years prior.

So what’s more likely? That King suddenly turned into a world-class finisher in 2016-17? Or that his season was somewhat flukey?

To definitively answer that question, far more in-depth analysis (including non-statistical analysis) is required. But xG can give us hints as to which of the explanations for King’s breakout campaign is more plausible. It’s the second one.
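The Kane-King comparison can be laid out as goals minus xG per season. The figures are the approximate ones quoted above; the data layout and helper name are illustrative.

```python
# (season, goals, approximate xG) as quoted in the text above.
kane = [("2014-15", 18, 14), ("2015-16", 20, 19), ("2016-17", 24, 12)]
king = [("2015-16", 6, 5), ("2016-17", 14, 7)]


def finishing_deltas(seasons):
    """Goals minus xG for each season; positive means beating the model."""
    return {season: goals - xg for season, goals, xg in seasons}


for name, seasons in [("Kane", kane), ("King", king)]:
    print(name, finishing_deltas(seasons))
```

Kane’s surplus is positive in three straight seasons, while King’s large surplus shows up only once, which is why a single-season number settles little on its own.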

To take xG beyond strikers and goal-scoring, the same principles can be applied to calculate xA, or expected assists. Expected assists are to chances created and assists what expected goals are to shots and goals scored. They’re a better representation of a playmaker’s ability to set up not just chances but quality chances.
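One common simplification, and not necessarily how any particular provider computes it, credits the passer with the xG of the shot his pass created. The sketch below sums those credits per player; the player labels and values are invented.

```python
from collections import defaultdict


def expected_assists(chances):
    """Sum, per passer, the xG of every shot that passer set up."""
    totals = defaultdict(float)
    for passer, shot_xg in chances:
        totals[passer] += shot_xg
    return dict(totals)


# Hypothetical chances created: (passer, xG of the resulting shot).
chances = [
    ("playmaker_a", 0.40),
    ("playmaker_a", 0.05),
    ("playmaker_b", 0.10),
    ("playmaker_b", 0.08),
    ("playmaker_b", 0.07),
]
xa = expected_assists(chances)
print({name: round(value, 2) for name, value in xa.items()})
```

Here playmaker_b created more chances, but playmaker_a created better ones, which is the distinction xA is meant to capture.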

Whether it’s xA or xG, or another single-number metric, extrapolation requires care and discretion. Just like no two chances are the same, no two xG figures are the same. The numbers must be placed in context, and will never tell the whole story. But they can certainly help tell it.

How data has already changed soccer

For those inside the sport, data-driven analysis is not a new concept. It is used, to varying degrees at different clubs, to identify and evaluate transfer targets. It is used to scout and prepare for upcoming opponents. It is used to review performance and pinpoint areas of potential improvement.

Shooting models in particular have already influenced the way the game is played on the field. The 35-yard wonder goal has largely gone extinct, probably because analytics have enlightened teams and players to the inefficiency of a shot closer to midfield than the goal line. In many cases, a pass is the more efficient option.

Over time, that knowledge has been drilled into players, both on and off the training pitch. As a result, goals aren’t as majestic, but attacks are generally more efficient. Analytics aren’t the only reason, but six of the seven highest-scoring seasons in the history of the Premier League have occurred between 2009-10 and the present. (The 2016-17 season ranked second.)

As the data-driven analytical capabilities of clubs continue to improve, the sport will continue to evolve. It’s a sport that notoriously loathes change. But it’s one rife with inefficiencies waiting to be exploited.

And as the mainstream media, whose job it is to tell the sport’s most compelling and influential stories, recognizes the evolution, its coverage will evolve too. Sky’s xG/90 charts sent pockets of analytics Twitter into binges of joy. But it’s just the start. Advanced stats are coming. There’s no turning back now.