
Monday, 11 April 2016

This model is wrong but it may be useful

Expected Goals (xG/Goal Likelihood) models are increasingly
common. They try to add a bit more understanding of what’s going on in a game and
get beyond just the scoreline or the top-level shot stats.

None of this is to knock a lot of good work that’s being
done in this field (there’s a list of useful links at the end of this), but
anyone can build a model. Technically you could call what Lawro does in his
predictions a model, even if it’s probably a fairly simple ‘mental flowchart’
to pop out the score at the other end.

The aim of this piece is to give a very basic introduction to
looking at Expected Goals. It’s not even that big a leap from when managers
talk about restricting the other team’s ‘big chances’.

All figures that follow come from Opta data but will have
been butchered and filleted by my own fair hands, so any errors/omissions etc.
will be mine, not theirs.

The analysis excludes Penalties and Free Kicks, which are special
cases and will be dealt with separately some other time, although there’s plenty
on free kicks from across the Big 5 Leagues on my blog.

If we start with a very simple model where we assume all shots
are created equal we get the following for Premier League data:

So if we apply the 2010-2014 conversion rate to 2014/15
activity (excluding set piece/own goals) we get the following:

This, as I’ve mentioned before, is a model, but just because you
call something a model doesn’t necessarily make it any good.
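
As a minimal sketch of the 'all shots are created equal' calculation: take a historical conversion rate (goals divided by shots) and multiply it by each team's shot count for the season you care about. The totals and team names below are made-up placeholders, not Opta figures, and `flat_xg` is just an illustrative name.

```python
# Flat-rate xG: every shot is worth the same historical conversion rate.
# All figures below are hypothetical placeholders, not Opta data.

def flat_xg(shots: int, conversion_rate: float) -> float:
    """Expected goals if every shot converts at the same rate."""
    return shots * conversion_rate

# Hypothetical historical totals used to estimate the league-wide rate.
historical_goals = 900
historical_shots = 10_000
rate = historical_goals / historical_shots

# Hypothetical season totals: team -> (shots, actual goals).
season = {"Team A": (500, 60), "Team B": (400, 30)}

for team, (shots, goals) in season.items():
    xg = flat_xg(shots, rate)
    # A positive difference means the team scored more than the model expected.
    print(f"{team}: xG={xg:.1f}, goals={goals}, diff={goals - xg:+.1f}")
```

The difference column is where the inferences start: the model only tells you that a team over- or under-shot the average, not why.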

As always, the key thing with looking at any numbers is the
inferences you draw. You could look at the numbers and say:

Chelsea were lucky

Chelsea were more skilful in converting chances

Chelsea created better than average chances

The model's crap

Many a time I’ve been in a meeting after a marketing
campaign and the conversation has gone something like this:

Boss: We forecast sales of £100k but only sold £73k, why was
that?

Marketing Wonk 1: The weather hasn’t been very good the last
couple of weeks?

Marketing Wonk 2: People’s budgets are stretched
post-Christmas?

Me (in my head, as I'm a coward): It’s because the £100k figure was a nice
round number you pulled out of thin air and has no real basis in fact

It’s a natural trait to try and rationalise any figures you
see after the event, but you should always be wary of any explanation
you’re given and should at least run a quick sense test on things.

A nice example of this is the Baresi/Maldini stat of 23
goals conceded in 196 games that was doing the rounds last year even though
it’s completely wrong (there’s a good piece here
on it).

Going back to our crude expected goals model, obviously not
all shots are equal. If we do a basic inside/outside box split we get the
following:

Shots from inside the box convert at 4-5 times the rate of those from outside
the box, so this split helps differentiate between teams who are shot heavy
outside the box (e.g., QPR with 47% of shots coming from outside the box) and
those doing their work in better positions (Man City with only 27% of their shots
from outside the box).
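
To see how much the split matters, here is a rough sketch that gives two teams the same number of shots but different shares from outside the box (47% and 27%, as quoted above). The two conversion rates and the shot total are assumptions picked only so that inside-the-box shots convert at roughly four times the outside rate; they are not the real Premier League figures.

```python
# Two-zone xG: separate conversion rates for inside and outside the box.
# Both rates are hypothetical, chosen so inside is about 4x outside.
RATE_INSIDE = 0.12
RATE_OUTSIDE = 0.03

def split_xg(total_shots: int, share_outside: float) -> float:
    """Expected goals given a shot total and the share taken outside the box."""
    outside = total_shots * share_outside
    inside = total_shots - outside
    return inside * RATE_INSIDE + outside * RATE_OUTSIDE

shots = 500  # hypothetical: same shot volume for both teams
heavy_outside = split_xg(shots, 0.47)  # 47% of shots from outside the box
mostly_inside = split_xg(shots, 0.27)  # 27% of shots from outside the box

print(f"47% outside: xG={heavy_outside:.2f}")
print(f"27% outside: xG={mostly_inside:.2f}")
```

With identical shot counts, the flat model would rate these two teams the same; the split model does not.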

At Leicester’s ‘Tactical
Insights’ performance analysis event a few weeks ago, when asked about
stats, Roy Hodgson said that if Shots suddenly were the key thing he’d get people to
shoot from the halfway line. He was being a bit flippant, as even a 3-year-old knows a
shot at an open goal from 3 yards has a better chance of being a goal than a
punt from 40 yards.

What you’re left with then is a balance: more and more detail (angle of shot,
headers, defensive pressure) better explains a team’s or player’s activity, but
risks over-complicating things and creating a ‘black box’ approach which spits
out a final number that may be harder to explain to players/management.

Using data is better than not using data, using detailed
data (e.g., Shot Location) is better still, and better again is combining
it with video.

In the chance below, Rooney has a simple tap-in, so at the
point the shot is taken the likelihood of a goal is close to 1. If you treated
that shot the same as all others from that location then Rooney (and Man Utd)
would be outperforming xG, but unless you had the video you wouldn’t know
if this was luck, good finishing, good positioning by the forward or good
chance creation.

Beyond the shot itself, subsequent versions would have a
more fluid xG at any given time (there are a number of people building non-shots
models), which basically says ‘I have the ball in this position, what’s the
likelihood I score before giving away possession?’. In the case of the Rooney goal
you could take it several steps back, with the likelihood increasing as each part
of the move is successfully completed.
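
One very crude way to picture a non-shots model, and not a description of how any of the people working on them actually do it: for each zone of the pitch, count how often a possession that reached that zone ended in a goal. The zone names and the toy possessions below are invented for illustration, and the resulting rates are far higher than anything you would see in real data.

```python
from collections import defaultdict

# Toy possessions: zones the ball passed through, and whether it ended in a goal.
# Entirely made-up data; real rates would be much lower.
possessions = [
    (["own_half", "midfield", "final_third", "box"], True),
    (["own_half", "midfield"], False),
    (["own_half"], False),
    (["own_half", "midfield", "final_third"], False),
    (["midfield", "final_third", "box"], True),
    (["own_half", "midfield", "final_third", "box"], False),
    (["own_half"], False),
    (["final_third", "box"], True),
]

def goal_likelihood_by_zone(possessions):
    """Share of possessions reaching each zone that ended in a goal."""
    seen = defaultdict(int)
    scored = defaultdict(int)
    for zones, ended_in_goal in possessions:
        for zone in set(zones):  # count each zone once per possession
            seen[zone] += 1
            if ended_in_goal:
                scored[zone] += 1
    return {zone: scored[zone] / seen[zone] for zone in seen}

likelihood = goal_likelihood_by_zone(possessions)
for zone in ["own_half", "midfield", "final_third", "box"]:
    print(f"{zone}: {likelihood[zone]:.2f}")
```

In this toy data the likelihood climbs as the ball moves from the team's own half towards the box, which is the 'increasing as each part of the move is completed' idea in miniature.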

The example below from Basketball shows this in practice:

The concept that some areas are better to shoot from than
others isn’t a hugely difficult one to grasp, although if the example
from American Football earlier this year is anything to go by, maybe teams aren’t
fully optimising their activity.