
Saturday, 30 March 2013

If Manchester United continue to score goals at their
current average rate, they will finish the season with 90 goals, their highest
total since 1999/00 when they claimed 97. It will also be the second highest
of their Premiership history, including the two seasons when each team in a 22 team
Premiership played 42 matches.

Equally, if we scale up their projected number of goals
conceded, they will also have allowed one of their highest goal totals over a
single season. The defence has so far yielded 31 in 29 matches, so a final
reckoning in the region of 40 wouldn’t be too surprising. United’s previous defensive low point saw
them concede 45 goals in the 1999/00 season, coincidentally the same year they
were at their most prolific at the opposite end of the field. Anyone with a
ticket for a United game at the turn of that decade was guaranteed goals.
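The scaling used above is simple enough to sketch. This is only an illustration of projecting a season total from the current rate; the function name and the 38 game season length are my own assumptions, while the 31 goals in 29 matches come from the post.

```python
def project_season_total(goals_so_far, matches_played, season_matches=38):
    """Scale the current per-game rate up to a full season."""
    rate_per_game = goals_so_far / matches_played
    return rate_per_game * season_matches


# United have conceded 31 goals in 29 matches so far.
projected_conceded = project_season_total(31, 29)
print(f"Projected goals conceded: {projected_conceded:.1f}")
```

The projection assumes the remaining fixtures are of similar difficulty to those already played, which is rarely exactly true.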

No Premiership team has performed at as consistently high
a level as Manchester United over the life of the EPL. For example, Arsenal and
Chelsea were locked together in the 1993/94 season, but in 10th
and 11th positions respectively, rather than scaling their more common recent
heights, and Manchester City have not even been a permanent member of the elite
division. Therefore, if we look at United’s Premiership record, we are comparing
sides of genuinely consistent high quality.

The two seasons in which they concede more than usual, but
also score at a more prolific rate, hint at an ability to score more goals if required. If you plot total goals
scored against total goals conceded for United over their Premiership life, the
correlation is reasonable (r^2 of 0.21) and
in the direction expected.

As well as conceding more often this term, United have also conceded
the game’s first goal at an elevated rate for future champions. They have failed to land the opening blow in
around a dozen league games and they were similarly generous in 1999/00. However,
then as now, they have stormed back to win matches in which they have
trailed at rates well in excess of those you would expect from even a top class
EPL side.

This raises the question: are United, and especially their attack,
in horse racing parlance, an unexposed “horse”?

Teams rarely win competitions by goal difference (you have
to go back almost ten months to find the last occasion where the EPL title was
decided in such a way). During matches, sides try to maximize their likelihood of
winning the games, but they aren’t obligated to maximize the margin by which
they achieve victory.

International matches played in the early part of the
century saw the top tier nations typically record 2-0 wins against the emerging
European nations, as they adopted a safety first approach to winning such
matches, especially when faced with well organised defences, but mediocre
attacks.

United’s profile of greatly increased scoring ability when
required to overcome a clumsy or unlucky defence is shared by Arsenal at their
contending best, but no other teams exhibit it as a strongly repeatable trait.

An extremely high quality racehorse can often beat lesser
rivals with very little effort and in doing so record performance ratings well
below its true capabilities. The excess ability is obvious, but there is no
need to demonstrate it. The sight of United trailing to virtually half the
Premiership in 2012/13, yet still winning "going away" in the majority of cases, may
be a manifestation of this extra ability being called upon only when it is required to win a particular football match.

Are Manchester United football's equivalent of a racehorse winning with something in hand?

If this is the case, it has implications for every ratings
based attempt to predict match outcomes, right down to individual player
ratings. The ratings you record and use are already subject to the different
game states and match contexts in which they are measured, but for
the best teams you may not know just how good a side can be until they
experience a season where the concession of the opening goal swings toward
the extremes of what might be expected.

As ever, bookmakers’ prices are the quickest place to
validate these ideas. Average scoring rates, collected over time, corrected
to allow for opponent strength and home field advantage, and then converted via
a device such as a Poisson distribution to produce match odds, give good agreement with
quoted odds for a wide range of run of the mill Premiership games. But this
method often underestimates the chances of top sides compared to quoted
bookmaker odds.
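As a rough sketch of the conversion step, the snippet below turns a pair of goal expectancies into home win, draw and away win probabilities by treating each side's goals as independent Poisson variables. The goal expectancies of 1.8 and 0.9 are invented for illustration, the function names are my own, and independence between the two scores is a simplifying assumption rather than anything a bookmaker necessarily uses.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def match_odds(home_exp, away_exp, max_goals=10):
    """Return (home win, draw, away win) probabilities from goal expectancies."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical goal expectancies for a strong home side.
home_p, draw_p, away_p = match_odds(1.8, 0.9)
print(f"Home {home_p:.1%}  Draw {draw_p:.1%}  Away {away_p:.1%}")
```

Dividing one by each probability gives the fair decimal odds that can then be compared with the quoted prices.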

The professional modellers, it would seem, are allowing for
hidden and rarely needed improvement in the Premiership’s unexposed dark horses, such as
Manchester United and Arsenal. When looking at the very best, there is unlikely to be a simple, linear correlation between their previously measured performance and their future predicted achievements.

Friday, 29 March 2013

Teams in the bottom half of the Premiership have reached the nervous thirties and with 18th placed Wigan a game behind, potentially we could see a logjam of sides separated by just four points. In this guest post I look at how struggling sides perform on average during the final run in and more importantly for those currently in the apparently safe mid table positions, the extremes of performance that relegation threatened teams can muster over a limited number of games.

Thursday, 28 March 2013

Two goals, separated by nine months, but with similar Champions League significance for firstly, Chelsea and then Tottenham. On Monday night*, Gareth Bale picked himself off the floor just as the clock was ticking into the red zone and then produced the kind of dipping, careering shot into West Ham's top corner that is still the exclusive preserve of the game's very best strikers of a football. One point became three and next year's Champions League participation inched slightly closer for this year's Europa League contenders.

Rewind to last May in Munich and the importance of corner kicks to the very best teams is reinforced. Hot on the heels of Manchester City edging the title on goal difference with an overwhelmingly superior corner conversion rate, Chelsea send the Champions League final into extra time when Didier Drogba thumps Mata's inswinging near post delivery into the top right hand corner of the net.

At first glance, two very different goals. There was nothing particularly challenging about setting up Bale with possession in the final third and all of the hard work started as the Welshman drew his foot back just before the strike. In contrast, Mata's delivery to what is statistically the most dangerous area of a packed penalty area had to be exact and Drogba then needed to fight his way through a congested box to arrive first to the cross. Once he was in place for the header, a goal was very likely.

However, the goals do share some common ground in that both efforts ended up crossing the goal line at roughly the same point, just below the crossbar and to the left of the right hand upright from the striker's perspective.

Shot placement is likely to be a minor, but still significant contributing factor to include along with the x, y co-ordinates of the shot origin when estimating the average goal expectancy of a particular goal bound effort. A goal from Bale's position, wide of centre and well outside the penalty box, was an extremely unlikely outcome, but intuitively you could assume that by choosing to aim for the top corner, compared to say the bottom corner, the chances of success were slightly tweaked in his favour. Similarly, Drogba's header may well have been marginally more taxing for the keeper had he directed his header downwards rather than at head height. Data collection can only improve shooting models such as these by including shot placement.

If shot placement tweaks goal expectancy defined by shot origin, conversion rates for different areas of the goal will also be significantly affected by position on the field from where such attempts originated. Attempts which are aimed at the top corner will show very different conversion rates if the sample is dominated by "Drogba" type efforts as opposed to "Bale" type ones. We can, however, begin to appreciate the average importance of shot placement by looking at aggregated conversion rates for all EPL sides spread over multiple seasons.

Shot placement conversion rates across different sections of the goal are readily available and can be manipulated to give likely average rates for individual locations. For example, a shot along the ground to the centre of the goal, a point where the keeper is highly likely to be positioned, has a generic success rate of just over 10%, based on the average shot placement success rates for the last four seasons. If a player manages to scrape the paint off the upright and crossbar with a shot to the very top corner, the average success rate jumps to over 70%.

The data to connect shot placement to shooting distance and angle isn't available, but we can use aggregated data for shot placement, and hope that the shooting distances even out, in an attempt to estimate the sustainability of the records of individual teams over recent seasons.

Sites such as EPLIndex supply success and failure rates for six equal areas of the goal, but the most interesting are the four top and bottom corners of the net. Sample sizes, even for teams, are small, so I've further aggregated left and right sided corners for both lower level and higher level shots to produce individual, seasonal team figures for efforts aimed at the top and bottom corners of the goal.

Small sample sizes are prone to large amounts of random variation around the actual talent level that is being measured and, even after aggregating the data, the number of shots aimed at an opponent's top right and left corners of the net only hits highs of 40 over a season and drops to near single figures for some teams. Toss a fair coin once and you are guaranteed to be as far away from the true likelihood of a head appearing as it is possible to be. You'll see 100% or 0% heads compared to the true likelihood of 50%.

So we should expect poor converters of top corner chances to appear by random chance in such limited sample sizes, even when the true abilities of sides are closely matched. In my four season sample, Chelsea recorded a league highest 73% success rate with top corner shots during one season and Sunderland managed a low of 15% in another.

We need to know whether the spread of the conversion rates seen by all Premiership sides over a four year period is characteristic of just randomness or whether the spread is significantly wider, possibly indicating an element of repeatable skill at work.

Maybe surprisingly, for shots aimed at the top corner there is no evidence that the spread of conversion rates differs much from that expected by chance if each side had league average conversion figures. Sunderland's 15% season was just a league average 43% side getting unlucky. The Wearsiders jumped to 63% in a subsequent season.
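To get a feel for how easily a league average side can post a season like Sunderland's, here is a minimal binomial sketch. The 43% league average is from the post, but the sample of 13 shots and 2 goals (roughly 15%) is my own assumption, chosen only because seasonal top corner samples are described as small.

```python
from math import comb, sqrt


def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent attempts."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


LEAGUE_AVERAGE = 0.43  # top corner conversion rate quoted in the post
shots = 13             # assumed seasonal sample size (hypothetical)
goals = 2              # 2 from 13 is roughly a 15% conversion rate

# Chance that a genuinely average side converts this few or fewer.
p_this_unlucky = binom_cdf(goals, shots, LEAGUE_AVERAGE)

# Standard deviation of the observed conversion rate at this sample size.
rate_sd = sqrt(LEAGUE_AVERAGE * (1 - LEAGUE_AVERAGE) / shots)

print(f"P({goals} or fewer from {shots}) = {p_this_unlucky:.3f}")
print(f"Standard deviation of seasonal rate = {rate_sd:.3f}")
```

With 20 teams over four seasons there are 80 team seasons in the sample, so even a fairly small single season probability can be expected to turn up somewhere.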

No side recorded consistently above or below average conversion rates for top corner shots over four seasons, again pointing to randomness as the main cause of differing conversion records across teams. If you wanted to forecast a team's record in subsequent years, the league average would be a better indicator overall than their previous, individual achievements.

For shots aimed at the bottom corners, however, there is strong evidence for a substantial component of repeatable skill, based on the actual distribution of conversion rates. In addition, both Manchester clubs are above average over all four seasons. Along with Arsenal and Chelsea, they also occupy the top six performing seasons, until the almost inevitable arrival of Stoke City. Now if you wish to project conversion rates for bottom corner attempts, league average and past, actual team performance should be included.

Converting shots is a skill, but why converting in some areas of the goal is a more repeatable skill than converting in others will have to wait for another post.

* out of date reference because I wrote half of this post last month :-).

Tuesday, 19 March 2013

Step forward Italy. Not only did they record two victories in the recent rugby Six Nations competition, they also held England to a narrow seven point victory at Twickenham on the penultimate weekend, thus ensuring that England's visit to Wales was not just about the hosts denying England the Grand Slam. The destination of the title was also very much up for grabs.

By holding England to their narrowest home winning margin against Italy, the Azzurri gave Wales the chance to win the title if they defeated England by at least seven points, assuming the Welsh maintained their overall try supremacy across the tournament as a whole. England went into the final game as narrow three point favourites, implying that they were almost a converted try superior to their hosts. But with the roof of the Millennium Stadium closed and passion aplenty, Wales pummeled England into first half submission, before drawing well clear in the second period with a brace of Alex Cuthbert tries.

Wales and England will of course meet again in the 2014 tournament, but an even more eagerly awaited rematch takes place in the opening group phases of the 2015 World Cup. The Rugby World Cup initially consists of four groups of five teams, so the meeting of two highly ranked teams at this early stage isn't particularly unusual. But as Simon Gleave points out here, Group A also contains Australia. So one of the sides currently ranked 3rd, 4th and 5th in the world will fall before the knockout stages.

The draw for the 2015 competition took place on the 3rd of December 2012, a year after the conclusion of the 2011 tournament in which Wales finished 4th. So how did Wales return from the World Cup in New Zealand seeded inside the top eight and rise to fifth at the conclusion of the 2013 Six Nations, having retained the title they also won in 2012, yet find themselves outside the seeded top eight for the all important 2015 World Cup draw in December 2012?

48 hours before the draw, Wales lay in 7th place in the IRB rankings in front of Samoa and Argentina, who were 8th and 9th respectively and separated by the fourth decimal place of their rating figure. Neither of Wales' rivals was playing prior to the draw, but Wales entertained 3rd ranked Australia.

Crucially for Wales, the IRB ratings operate on a "points exchange" system, where the winning team always takes ranking points from the team they defeat, with the defeated team losing the same number of ranking points. In addition, margin of defeat or victory is only accounted for where the margin is greater than 15 points. So a narrow defeat, even against one of the best teams in the world, will always see the defeated team's rating fall.
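A minimal sketch of a points exchange of this kind is shown below. The zero sum exchange and the 15 point margin threshold are described in the post; the home advantage of three rating points, the cap of ten on the rating gap, the 1.5 multiplier for big margins and the ratings of 80 and 86 are all assumptions on my part, recalled from the published scheme or invented for illustration, and should be checked against the official description before being relied upon.

```python
def points_exchange(home_rating, away_rating, home_score, away_score,
                    home_advantage=3.0, gap_cap=10.0,
                    big_margin=15, big_margin_multiplier=1.5):
    """Return the new (home, away) ratings after a zero-sum points exchange.

    All default parameter values are assumptions made for this sketch.
    """
    # Rating gap from the home side's viewpoint, capped in both directions.
    gap = home_rating + home_advantage - away_rating
    gap = max(-gap_cap, min(gap_cap, gap))

    # `swing` is the number of points moving TO the home side.
    if home_score > away_score:
        swing = 1.0 - gap / 10.0
    elif home_score < away_score:
        swing = -(1.0 + gap / 10.0)
    else:
        swing = -gap / 10.0

    # Margin only matters once it exceeds the threshold.
    if abs(home_score - away_score) > big_margin:
        swing *= big_margin_multiplier

    return home_rating + swing, away_rating - swing


# Hypothetical ratings: a home side on 80 loses to a visitor on 86.
print(points_exchange(80.0, 86.0, 12, 14))  # two point defeat
print(points_exchange(80.0, 86.0, 12, 27))  # fifteen point defeat
```

Under these assumptions the two calls return identical ratings, which is the sense in which a two point defeat may as well be a fifteen point one.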

Australia defeated Wales on the game's last play, scoring 26 seconds before the clock turned red. The margin of defeat was just two points, although it may as well have been 15, and with Samoa and Argentina idle, Wales slipped to 9th in the seedings and into the 2015 Group of Death, ironically alongside Australia.

A more satisfactory approach would be to use a least squares or power rating approach, where unbalanced and incomplete schedules are accounted for and margins of victory or defeat are evaluated with full regard for the quality of the opposition faced. Teams which perform well, even in defeat against superior opponents aren't automatically penalised under this method and collateral form lines quickly appear between teams which rarely meet through results achieved against common opponents.

In short, ratings are created that best explain the observed match scorelines in previous contests and the difference in the ratings of two teams can be used to form a view of the likely margin of victory in any future matches.
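The idea can be sketched in a few lines. This is not the calculation behind the figures in this post, just a toy least squares fit, assuming no home advantage term and using three invented teams and scorelines. Each rating is repeatedly reset to the average of "margin plus opponent's rating" over that team's games, which is one simple way of minimising the squared error between rating differences and observed margins.

```python
from collections import defaultdict


def power_ratings(results, sweeps=500):
    """Least squares ratings from (team_a, team_b, margin) results.

    margin is team_a's score minus team_b's score. Ratings are shifted so
    that the weakest side sits on zero.
    """
    games = defaultdict(list)
    for team_a, team_b, margin in results:
        games[team_a].append((team_b, margin))
        games[team_b].append((team_a, -margin))

    ratings = {team: 0.0 for team in games}
    for _ in range(sweeps):
        for team, played in games.items():
            # Best rating for this team given the current opponent ratings.
            ratings[team] = sum(m + ratings[opp] for opp, m in played) / len(played)

    floor = min(ratings.values())
    return {team: rating - floor for team, rating in ratings.items()}


# Invented results between three hypothetical teams.
toy_results = [("Alpha", "Beta", 6), ("Beta", "Gamma", 4), ("Alpha", "Gamma", 10)]
toy_ratings = power_ratings(toy_results)
for team, rating in sorted(toy_ratings.items(), key=lambda item: -item[1]):
    print(f"{team:6s} {rating:5.1f}")
```

The difference between two ratings is then the expected margin on neutral ground, so a side that loses narrowly to a much higher rated opponent can see its rating rise rather than fall.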

From the start of 2011 to the 2015 draw in December 2012 Wales played 26 games against teams rated inside the IRB top 12, including 6 matches against Australia, each of which they lost, albeit usually by just a point or two and often in the dying seconds. Argentina played 16 such games and Samoa just 6. So not only is sample size an issue, but Wales, by testing themselves often and against the very best, laid themselves open to leaking IRB ranking points by playing close, but losing encounters against the Southern Hemisphere's best. Had the final match against Australia, just two days before the draw, not taken place, Wales would have remained in the all important second pot of seeds.

In order to compare the IRB ratings method, where margin of victory is ignored until it reaches an arbitrary 16 points, to a least squares based approach where margin of victory is continuously applied, I calculated power ratings for international rugby sides, firstly from the start of 2011 up to the date of the draw on December 3rd 2012 and secondly for the post World Cup 2011 period alone, a period replete with narrow Welsh defeats.

Power Ratings Incorporating Strength of Schedule and Margin of Victory or Defeat.

| Team | Power Rating. Post 2011 World Cup until 2015 Draw | Power Rating. Start 2011 Year until 2015 Draw |
| --- | --- | --- |
| New Zealand | 27.9 | 31 |
| France | 16.2 | 18.6 |
| South Africa | 15.9 | 18.4 |
| England | 14.3 | 17.6 |
| Australia | 9.9 | 16.5 |
| Ireland | 9.2 | 13.3 |
| Wales | 8.9 | 15.3 |
| Samoa | 8.2 | 13.1 |
| Argentina | 3.3 | 6.1 |
| Scotland | 1.4 | 8 |
| Tonga | 0.9 | 2 |
| Italy | 0 | 0 |

Wales were seeded in a comfortable eighth position under the IRB rankings at the conclusion of the 2011 World Cup, well clear of Samoa. By December 2012 Samoa had overtaken them and escaped from pot three under the IRB rankings. Under the official rankings, Wales had leaked ranking points through narrow defeats to good sides and Samoa had largely remained static through inactivity, coupled to two narrow wins.

It's hard not to conclude that Wales' ranking suffered because the IRB ratings overvalue, and are too sensitive to, simply winning and ignore performance, even in defeat. Both of the tabulated methods above, which account for strength of schedule and margin of victory, keep Wales comfortably within the top eight nations.

So for almost two years prior to the 2015 draw and even during their poor run, post WC 2011, the Welsh had proved themselves superior to both Argentina and Samoa when actual scorelines took a more prominent part in the calculation.

England discovered that margins matter when they defeated Italy by only seven points on the penultimate weekend, handing a lifeline to Wales in this year's Six Nations and margins should probably also play a more prominent role in deciding World Cup seedings.

Thursday, 14 March 2013

Omar presented some very informative stats on his blog concerning the progress of teams in the knockout phases of such competitions as the Champions League, where away goals allow the visiting side to progress at the end of either normal or extra time. The only way that such ties can reach extra time is by the first leg score being identically reversed in the second match. In these instances, the home side has an extra 30 minutes of home advantage in which to progress and to balance this advantage, the away side can claim the decisive away goal if the goals scored in extra time are equally shared between the teams.

Omar's figures hint at the apparent fairness of this bargain because the number of home and away teams progressing without the need for a penalty shootout is reasonably equal. This is what we would expect if the quality of the home sides in the second leg was equivalent to that of the visitors over the course of the sample.

A theoretically based approach to the problem involves modelling the outcome of an extra 30 minutes between two equally matched sides to see if their chances of progressing in the extra time period are roughly equal. 1.4 goals per 90 minutes would be a typical goal expectancy for the home side compared to 1.0 for the visitors. However, over a 30 minute period these values would be greatly reduced. Figures of 0.57 goals and 0.41 would be typical values for the goal expectations of each team over the final 30 minutes of the 90 in such a match up. So if both teams played in a similar manner to a normal game, these are the kind of goal expectancies we would see in extra time.

At the end of 90 minutes of the second leg, each team will have had an equal amount of playing time on their own turf to have built up a winning advantage. Therefore it is desirable that the extra thirty minute format shouldn't unduly favour one side over and above the difference in ability between the sides. So for two equally matched sides, the chances of progressing should be as near to 50/50 as possible.

Once extra time is reached, the home side has two routes to winning the tie. They can take and maintain a lead in extra time or they can keep the game scoreless and then progress on penalties. The away side has the same opportunities to reach the next phase, but also can progress with a score draw in the additional 30 minutes of play.

Chances Of An Equally Matched Home Side Progressing From Extra Time in the UCL.

| Team | Win in ET | Draw ET period 0-0 | Win Shootout | Overall Chance of Progressing |
| --- | --- | --- | --- | --- |
| Home Team | 32% | 38% | 50% | 51% |

Chances Of An Equally Matched Away Side Progressing From Extra Time in the UCL.

| Team | Win in ET | Draw ET period 0-0 | Score Draw in ET | Win Shootout | Overall Chance of Progressing |
| --- | --- | --- | --- | --- | --- |
| Away Team | 21% | 38% | 9% | 50% | 49% |

We can get a reasonable estimate of the chances of these individual outcomes occurring from a Poisson based calculation on the decayed pregame goal expectancy of our generic, equally talented home and away sides. For the shootout I've assumed each side has an equal chance of winning the penalty kick contest.
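For anyone wanting to repeat the calculation, here is a minimal sketch under the assumptions stated in this post: independent Poisson scoring with extra time goal expectancies of 0.57 for the home side and 0.41 for the visitors, and a 50/50 shootout. The function names and the cut off at ten goals per side are my own choices.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return exp(-mean) * mean ** k / factorial(k)


def extra_time_chances(home_exp=0.57, away_exp=0.41,
                       home_shootout=0.5, max_goals=10):
    """Split extra time into the outcomes that decide an away goals tie."""
    home_win = away_win = score_draw = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home_win += p
            elif a > h:
                away_win += p
            elif h > 0:
                score_draw += p  # level with goals: away side goes through
    goalless = poisson_pmf(0, home_exp) * poisson_pmf(0, away_exp)
    return {
        "home_win": home_win,
        "away_win": away_win,
        "score_draw": score_draw,
        "goalless": goalless,
        "home_progress": home_win + goalless * home_shootout,
        "away_progress": away_win + score_draw + goalless * (1 - home_shootout),
    }


chances = extra_time_chances()
for outcome, probability in chances.items():
    print(f"{outcome:14s} {probability:.1%}")
```

Changing `home_shootout` or the two goal expectancies shows how sensitive the near 50/50 split is to the assumption of equally matched sides.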

Either by accident or design, by allowing the away side the opportunity to still score an away goal in the additional 30 minute period, UEFA have almost entirely eliminated the home side's advantage of playing a larger proportion of the tie on home turf. The rules as they stand excellently perform the task of adding an extra half hour of potentially dramatic open play action, while still remaining fair to both sides.

Wednesday, 13 March 2013

A couple of days ago I wrote up a post on the age of Premiership players, most notably, Ryan Giggs. During the number crunching I worked out the weighted age by actual minutes on the field of each Premiership team for the current season and in addition I broke the numbers down for midfielders, strikers and defenders. The exercise is reasonably interesting because it paints a rough picture of where individual teams are playing with youth or experience, either through accident or design. It's unlikely that Villa's defence and QPR's attack had much in common to talk about when the teams met back in December.

Out of idle curiosity I then looked to see if there was a correlation between the weighted age of each group of playing positions and another. In short, do some team units age gracefully together or get torn down and rebuilt in step?

Average Age of Premiership Positions, Weighted By Actual Playing Time. 2012-13.

| Team | Weighted Age of Defence | Weighted Age of Midfield | Weighted Age of Attack |
| --- | --- | --- | --- |
| Arsenal | 26.5 | 25.4 | 25.3 |
| Aston Villa | 23.1 | 25.3 | 23.6 |
| Chelsea | 27.5 | 24.5 | 26.0 |
| Everton | 30.5 | 28.8 | 25.8 |
| Fulham | 30.6 | 29.8 | 29.4 |
| Liverpool | 27.2 | 24.6 | 25.0 |
| Manchester City | 27.4 | 26.8 | 25.7 |
| Manchester United | 27.8 | 28.5 | 26.5 |
| Newcastle | 26.6 | 25.7 | 26.8 |
| Norwich | 27.0 | 25.8 | 27.5 |
| QPR | 28.6 | 26.8 | 30.7 |
| Reading | 27.6 | 28.2 | 28.4 |
| Southampton | 24.3 | 25.4 | 24.8 |
| Stoke | 26.5 | 27.8 | 29.6 |
| Sunderland | 25.4 | 25.4 | 26.6 |
| Swansea | 26.3 | 26.0 | 27.2 |
| Tottenham | 25.4 | 25.7 | 28.0 |
| WBA | 29.1 | 27.0 | 26.2 |
| WHU | 27.4 | 27.6 | 25.9 |
| Wigan | 30.0 | 25.8 | 26.7 |

I didn't expect to see a correlation, so I was surprised to get r^2 values of around 0.2 and when I randomly jumbled up one set of averages the correlation was almost always close to zero. So there did seem to be a link, even though it made little intuitive sense.

If you look at the average, weighted overall age of all teams, two stand out. Fulham are particularly old, with an average weighted playing age of nearly 31 and Villa are unusually young. These two outliers almost guarantee that the average age of each positional group within their team will be very close. For example, for Fulham to have say a group of midfielders with a weighted average of 28, to maintain their overall average they would need their strikers and defenders to average around 33 years of age, weighted by playing time. It is much more likely that at the extremes the weighted ages of the three separate groups will be similar and this proves to be the case for both Fulham and Villa.

However, the strong correlation between the weighted age of positional groups seen at Fulham and to a lesser degree at Villa may be almost totally responsible for the mild correlation seen when we plot the group of EPL teams as a whole. And that appears to be what has happened. If we remove Fulham and Villa from the regression, the r^2 value drops to nearly zero indicating that there is really no general connection between the ages of midfielders and strikers or strikers and defenders in the bulk of EPL clubs. A spurious post narrowly averted.
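The check is easy to repeat from the table of weighted ages above. The sketch below computes r^2 for each pair of positional groups, first with all 20 teams and then with Fulham and Aston Villa removed. The helper names are my own, and because the table values are rounded to one decimal place the output need not match the figures quoted in this post exactly.

```python
from itertools import combinations

# (defence, midfield, attack) weighted ages from the table in this post.
AGES = {
    "Arsenal": (26.5, 25.4, 25.3), "Aston Villa": (23.1, 25.3, 23.6),
    "Chelsea": (27.5, 24.5, 26.0), "Everton": (30.5, 28.8, 25.8),
    "Fulham": (30.6, 29.8, 29.4), "Liverpool": (27.2, 24.6, 25.0),
    "Manchester City": (27.4, 26.8, 25.7), "Manchester United": (27.8, 28.5, 26.5),
    "Newcastle": (26.6, 25.7, 26.8), "Norwich": (27.0, 25.8, 27.5),
    "QPR": (28.6, 26.8, 30.7), "Reading": (27.6, 28.2, 28.4),
    "Southampton": (24.3, 25.4, 24.8), "Stoke": (26.5, 27.8, 29.6),
    "Sunderland": (25.4, 25.4, 26.6), "Swansea": (26.3, 26.0, 27.2),
    "Tottenham": (25.4, 25.7, 28.0), "WBA": (29.1, 27.0, 26.2),
    "WHU": (27.4, 27.6, 25.9), "Wigan": (30.0, 25.8, 26.7),
}
GROUPS = ("defence", "midfield", "attack")


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def pairwise_r_squared(ages, exclude=()):
    """r^2 for every pair of positional groups, optionally dropping teams."""
    rows = [values for team, values in ages.items() if team not in exclude]
    return {
        (GROUPS[i], GROUPS[j]): r_squared([r[i] for r in rows], [r[j] for r in rows])
        for i, j in combinations(range(3), 2)
    }


all_teams = pairwise_r_squared(AGES)
without_outliers = pairwise_r_squared(AGES, exclude=("Fulham", "Aston Villa"))
for pair in all_teams:
    print(pair, f"all: {all_teams[pair]:.2f}", f"trimmed: {without_outliers[pair]:.2f}")
```

Comparing the two columns of output shows how much of any apparent relationship rests on the two extreme sides.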

A more mainstream example of this (badly) explained phenomenon occurs with the apparently strong connection between possession and success. If you plot seasonal possession for the EPL against a success based metric such as points or shooting accuracy, you get a reasonably straight line with a healthy enough r^2 with good teams at one end and bad ones seemingly at the other. So the idea that possession is a good thing, essential for success appears to be confirmed.

I've made a few posts proposing that possession is a meaningless stat, so I should try to explain this apparent contradiction. If you remove the big four teams, who invariably make possession count, and repeat the regression, then the correlation virtually disappears. Just as Fulham and Villa currently drive an apparent, but bogus league wide age based correlation, Manchester United and company do the same for possession relating to success over a season in the EPL. The correlation is strong for the big four, but almost non existent for the rest of the league.

The same is seen in Spain. Remove the big two and possession doesn't correlate to success, put them back into the sample and you have r^2 evidence that it does.

Monday, 11 March 2013

Sir Stanley Matthews managed to play one game at English football's highest grade when he was five days past his fiftieth birthday, but the increasing physical demands of playing top flight Premiership football mean that, aside from goalkeepers, we are very unlikely to see many players performing at such a level past even their fortieth birthday. Ryan Giggs is fast approaching that particular landmark and while he is unlikely to receive a similar elevation to Sir Stan, Giggs' footballing longevity stands shoulder to shoulder with that of Hanley's most famous son.

The average age of today's nominated Premiership squads gives a crude idea of the scale of Giggs' achievement, but we can really begin to appreciate just how unusual it is for a player to play on towards his fortieth birthday if we produce a team average weighted by playing time. For example, if a 30 year old keeper shares playing time with a 20 year old, the average age of the two keepers is obviously 25. But if the older keeper performs for the bulk of the minutes, then the weighted average will be much closer to his age of 30 and will give a more informative picture.
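The keeper example can be written out as a short sketch. The ages of 30 and 20 come from the example above, but the split of 3000 and 420 minutes is invented purely to illustrate the weighting.

```python
def weighted_age(players):
    """Average age weighted by minutes played.

    players is a list of (age_in_years, minutes_played) pairs.
    """
    total_minutes = sum(minutes for _, minutes in players)
    return sum(age * minutes for age, minutes in players) / total_minutes


# Hypothetical minutes: the older keeper plays the bulk of the season.
keepers = [(30.0, 3000), (20.0, 420)]
print(f"Simple average:   {sum(age for age, _ in keepers) / len(keepers):.1f}")
print(f"Weighted average: {weighted_age(keepers):.1f}")
```

With equal minutes the weighted figure falls back to the simple average of 25.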

Above I've plotted the weighted average age for all Premiership teams from this year's renewal. The average has been calculated on actual minutes played, rather than appearances, and I've used a player's actual age in years and days on gameday. The scale has been chosen to try to show the differences between teams, but without unduly exaggerating those differences, a risk inherent in using a non-zero vertical axis.

It shouldn't surprise that Fulham consistently send out teams with weighted averages in the thirties. Hughes, Hangeland, Riise, Berbatov, Schwarzer and Duff are regular contributors who have each passed that birthday. The current team is old by Premiership standards and also compared to Roy Hodgson's first full season in charge in 2008-09, when only a couple of regulars were in their thirties and the weighted average age of the team was around 28.5 years. Martin Jol still has players from Hodgson's successful team of almost five years ago.

Villa are a young and inexperienced team and this is reflected in the figures. It should possibly be a cause for concern in their current perilous state and the same worry applies to Southampton. More successful teams which also appear to have youth on their side include the trio of title wannabees in Arsenal, Liverpool and Tottenham.

Manchester United sit just above midtable, but to fully appreciate where Ryan Giggs stands in relation to other current players we need to break the figures down into positions. Keepers are a special case. Not only are they less reliant on high levels of endurance fitness to maintain high levels of performance, they are also under represented in a side, as a team only has one keeper on the pitch at one time. Therefore, I've split the players into defenders, forwards and midfielders and looked at the proportion of playing time seen by each age group compared to the group as a whole, with age measured at the start of this calendar year.

Playing Time and Age For Premiership Defenders. 2012-13.

Playing Time and Age For Premiership Attackers. 2012-13.

Playing Time and Age For Premiership Midfielders. 2012-13.

The timescale isn't a whole season, but already a typical distribution is appearing. Premiership team selection, weighted by minutes played shows that the majority of playing time goes to players in their mid to late twenties. Defenders would appear to be able to carry on for slightly longer before age begins to whittle them down in number and opportunity. Once a striker reaches thirty, they begin to rapidly disappear from the first choice eleven, with 32 appearing to be a significant birthday if this season is typical.

You would expect that midfielders play in the most physically demanding role, where there is little opportunity to rest and while the peak appears earlier than the other two positions, there is also a significant secondary peak centered around 30 years of age. Players represented in this older group include Lampard, Gerrard, Britton, Carrick and Nolan, possibly indicating that midfielders have a more mixed and diverse role compared to other positions on the pitch.

Steven Gerrard has played over 2500 Premiership minutes despite having celebrated his 32nd birthday and Frank Lampard has maintained his form, if not his playing time. Ryan Giggs is at the extreme right of the distribution at three years older than the oldest playing defender in the current season and four years older than the comparable striker. A marvelous achievement and one that he very nearly shares with his 38 year old team mate, Paul Scholes.

Friday, 8 March 2013

How you identify and separate the contribution of luck from the component of skill in football has been a recurring theme on this blog and so it was refreshing to see the issue addressed during some of the discussion groups at Sloan earlier this month. The impact of luck on both player and team performance metrics influences how these metrics correlate across seasons and has implications if we then use these measurements in a predictive way. Modelling luck instead of likely skill can inflate the worth of the "good" but temporarily lucky at the expense of equally good, but latterly unfortunate. Recognizing that luck exists in a sporting environment, where cause and effect makes for a more appealing and satisfactory narrative, is half the battle.

Luck in its rawest form was on show on Tuesday night, when Nani's dismissal for a waist high challenge after an hour not only reset the game probabilities to near pre kick off levels, but also split the opinions of Twitter users and former professional referees alike. Of the latter group, some thought the decision harsh and others sided with the call that was made on the night. Similar incidents have resulted in widely different outcomes for the perpetrator. De Jong's assault on Xabi Alonso in a World Cup final saw Howard Webb produce a yellow, and last weekend Peter Crouch's unintentional, full-blooded kick to the head of West Ham's Matthew Taylor merely saw the Hammer being led unsteadily to the dressing room and no further action being taken.

In each incident a whole raft of minor factors, ranging from the reaction of the players and the importance or otherwise of the match to the mood and viewpoint of the official, combined to produce three entirely different conclusions to, by the letter of the law, three near identical events. Holland and Stoke possibly got lucky, although both ultimately lost the game, with Stoke fortunate to remain with 11 players, only to be beaten by a goal from Taylor's enforced replacement, Jack Collison.

More often randomness in the process is harder to spot: a ball that could have beaten the last defender instead fails through a slight lack of zip on the pass, or a goalscorer distributes his talent to produce consolation rather than match meaningful strikes. Single incidents such as these, if we are lucky enough to spot them, can influence the outcome of individual matches, but over a season the signal shines through enough for Manchester United to finish near to the top of the league and West Ham or Stoke to finish slightly above 18th spot. The good or bad random content that was so keenly felt at the time in a single match is often forgotten because a team usually finishes close to where their talent levels decree they should, although the bitter taste lingers longer in knockout football.

Which finally brings us to match fixing. Teams or individuals have little control over luck or Nani would have chosen to get sent off in the last minute rather than the 58th and Peter Crouch would have poleaxed a defender instead of an attacking midfielder with an able and talented young replacement on the bench.

In games of pure chance an individual has no control over the result. You cannot "fix" a fair coin toss by paying off the person tossing the coin because they have no (or very little) way of influencing the outcome in either the short or long term.

But football is different. Nani was unlucky in the current environment where "intent" is tacitly acknowledged by many officials, even if it isn't written into the rule book. So, reluctantly, he has become the standard bearer for random misfortune. Skill, however, has a choice. A player can do his utmost to find the top left hand corner with a shot or, if he wishes, harness his skill to deliberately hook the ball wide, and the current Europol list of allegedly fixed games implies that some do.

The revelation that both skill and luck coexist in football shouldn't need constantly restating, but much of the current analysis ignores the latter entirely. If Nani's dismissal helps to reinforce an acceptance that some events are beyond the control of the participants, then his sacrifice won't have been in vain.

Thursday, 7 March 2013

Gareth Bale's near injury time winner at West Ham last week was as spectacular as it was dramatic. Surging runs and shots from distance are fast becoming his trademark and his Monday night goal contained both, but how does his effort match up with two similarly outrageous long range goals from last term? Using x,y data in this guest post, I try to reduce the Welshman's goal down to figures and probabilities, but there is a video for those who just want to relive the goal.

Wednesday, 6 March 2013

The NFL season ended last month, but I couldn't resist writing about a post on the University of Washington website from the Washington State climatologist, concerning the Seattle Seahawks' apparent liking for rain and snow. The posting uses game day weather information to demonstrate that if it is raining or snowing the Seahawks perform noticeably better than if it is sunny or just overcast.

Since 2002 their record with precipitation is 17-4 for a winning percentage of 0.81 and a positive points differential of 12. Without rain or snow their record is a near palindromic 42-25 for 0.63 and a points differential of 5.

On the face of it the evidence is persuasive. However, two problems aren't really addressed. The sample with precipitation is very small, just 21 games and small samples can very easily lead to extreme results that aren't indicative of performance over a much larger number of matches. Also there is no account for team strength of the opposition. In 2003 there was intermittent rain for the visit of the then 3-6 Detroit Lions.

Extending the sample size isn't an option, although you could include the two seasons between the demolition of the indoor Kingdome and the move to Qwest Field when the Seahawks played at Husky Stadium. On the 28th of October 2001 the Miami Dolphins visited Husky Stadium. The game was played outdoors, the weather was sunny, temperatures hit 43 degrees F, humidity was 82% and an 8 mph north easterly wind blew. Shaun Alexander led the running game and Matt Hasselbeck was under centre. The Dolphins won 24-20 in regulation.

So there is a large quantity of available information concerning the prevailing conditions. I've therefore sorted all home games by precipitation. The original survey sticks to just home games, although logically the 'hawks should also play above their usual level of road form if they arrive in say Green Bay and are surprised, but presumably delighted to see snow falling.

I found 22 home games since 2000 where precipitation is mentioned on the game log. The majority description was rain, with a couple of showers or intermittent showers, and two games were played in snow. The 'hawks won 16 and lost 6 (73%), slightly different from the quoted 17-4, although the phrase precipitation "at or in the immediate vicinity" of the stadium is used on the Washington site. In dry or overcast weather, the win loss record was 57-31 (65%). So slightly higher for the larger "dry" weather sample and a fair bit lower for the smaller "wet" games. Note how a couple of wins or losses one way or another can really bounce around the % figures if you are using limited sample sizes.
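To put a rough number on how little separates those two records, here is a minimal Python sketch that runs a Fisher exact test on my 16-6 "wet" and 57-31 "dry" splits. The choice of test and the function name are my own assumptions for illustration, not anything used in the Washington piece, and the test still takes no account of the strength of the opposition.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x wins in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

wet_wins, wet_losses = 16, 6     # home games with precipitation
dry_wins, dry_losses = 57, 31    # home games without

wet_pct = wet_wins / (wet_wins + wet_losses)
dry_pct = dry_wins / (dry_wins + dry_losses)

# How far a single result moves each percentage, in points
wet_swing = 100 / (wet_wins + wet_losses)
dry_swing = 100 / (dry_wins + dry_losses)

p_value = fisher_exact_two_sided(wet_wins, wet_losses, dry_wins, dry_losses)
print(f"wet {wet_pct:.0%}, dry {dry_pct:.0%}, p-value {p_value:.2f}")
print(f"one game moves wet by {wet_swing:.1f} points, dry by {dry_swing:.1f}")
```

A single result shifts the wet percentage by about four and a half points but the dry figure by only about one, which is the bouncing around described above.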

If we look at margin of victory, the strength of schedule issue is still unresolved. In my sample, Seattle win by an average of 8.5 points in the wet and 5 in the dry, compared to 12 and 5 respectively in the study. However, all we really have is slight discrepancies to eyeball. If we look at individual games, one stands out: Arizona 0 Seattle 58. The Cardinals quarterback in week 14 was John Skelton. He threw four interceptions, took one sack, fumbled the ball twice and lost it once. The game day weather was rain. His only other NFL game when the match saw rain was in week 11 of 2011 at San Francisco, when he completed just 30% of his 19 passes, was intercepted three times, fumbled once and was sacked once, before he was mercifully replaced by Richard Bartel.

Such an outlier skews the margin of victory for Seattle in wet weather, but the narrative could just as easily be that John Skelton is a liability in the wet rather than Seattle, one of his beneficiaries, being a margin of victory revelation.

To insert context and team strength into the study we must look at each individual matchup. I could run my NFL model for the early seasons, but the results would be very close to the Vegas line for Seattle's games, so I will use the Vegas figures. Over the 13 seasons, Seattle beat the estimated margin of victory or defeat set by Vegas 109 times and failed 109 times, so the Vegas estimations of the quality of Seattle and their opponents would appear accurate over a large sample size. The handicap given or received by Seattle has now effectively turned each game into a coin toss, and we can compare the success rate of Seattle against the handicap to see if either "dry" weather Seattle or "wet" weather Seattle produce against the spread results that are statistically different from the 50% overall rate that the Vegas line aims for. And neither is.
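The coin toss comparison can be sketched as an exact binomial test against a 50% cover rate. The snippet below is an illustration under my own assumptions rather than the exact calculation behind the conclusion above: the function name is made up, and because the wet and dry against the spread splits aren't listed here, it only checks the overall 109-109 record and then asks how lopsided a 22 game sample would need to be before it looked unusual.

```python
from math import comb

def binom_two_sided_p(successes, n, p=0.5):
    """Exact two-sided binomial p-value against a success rate of p."""
    def prob(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    observed = prob(successes)
    # Add up every outcome that is no more likely than the observed one
    return sum(prob(k) for k in range(n + 1)
               if prob(k) <= observed * (1 + 1e-9))

# Overall record against the Vegas line: 109 covers in 218 games
overall_p = binom_two_sided_p(109, 218)

# Fewest covers in a 22 game sample that would differ from a
# coin toss at the conventional 5% level
n_wet = 22
min_covers = next(k for k in range(n_wet // 2, n_wet + 1)
                  if binom_two_sided_p(k, n_wet) < 0.05)

print(f"overall p-value: {overall_p:.2f}")
print(f"covers needed out of {n_wet}: {min_covers}")
```

On these assumptions a 22 game sample needs 17 covers before it departs from a fair coin at the 5% level, which shows how much room random chance has in a sample this small.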

Seattle's results vary by weather, but by no more than you would expect from random chance once you account for the strength of their opponents and sample size. There is no evidence that they are either a wet or dry weather team. The omission of additional weather related factors, such as wind strength, possibly combined with evidence that Seattle are reliant on passing the ball would also help to eliminate other possible causes if the difference was significant. The story is interesting, but poorly constructed and certainly unproven.

Google trends is a great tool. Type in a word or phrase and Google will return the popularity of that search term over time. In 2033 if you enter "Harlem Shake", you should return a sharp peak separated by two flat, horizontal lines, unless Martin Jol's Fulham has embarked on a 20th anniversary tour. In short, it is an excellent tool for tracking net chatter from the last decade.

"Soccer Analytics" first makes an appearance in May 2008, registering around 50% of peak interest, before topping 100% in May of 2010 and then falling back to near its initial levels again in 2013. "Football Analytics" makes an appearance a year earlier, but this term has probably suffered some cross contamination with America's most popular sport, the NFL. So it is probably wise to actually google the phrase and scan the results to ensure that you are researching the correct subject.

The Poisson distribution is a useful mathematical tool to predict the likelihood of rare events occurring if you know the average rate at which those events occur. It has long been the staple content of many sites used to discuss the modelling of the number of goals you would expect to see in football/soccer matches. The term "football poisson" predates "soccer analytics" on Google Trends by over two years and the reason for this disconnect is that much of what could today be termed "soccer analytics" was being presented on out and out gambling sites in the mid to early part of the last decade. Therefore, much of the initial ground work that exists in mainly American based sports, such as baseball, but appears to be lacking in football, is actually contained on the sub forums of bigger gambling sites or possibly behind pay walls.

The surge of interest in ways to mathematically describe the major events and outcomes surrounding a football match coincided with the relaxing of betting rules, especially in the UK. A decade or so ago the minimum requirement when betting on the outcome of a non live football game was for three games to be included as an accumulator. Therefore, the incentive to produce time consuming mathematical models to predict the outcome of individual football matches was limited. Even if you found one match where the bookmaker's estimate was still wrong once they had applied an overround, you were required to pair that game with two others.
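A quick worked example shows why the three game minimum blunted the incentive. Every number below is hypothetical and chosen only for illustration: one leg where the true chance is better than the price implies, two legs priced with a typical overround, and all three results assumed to be independent.

```python
# Each leg is (true probability, decimal odds offered); all values hypothetical
value_leg = (0.50, 2.10)             # bookmaker has this one wrong
filler_legs = [(0.50, 1.90),         # fairly judged, but priced
               (0.50, 1.90)]         # with an overround

def expected_return(legs):
    """Expected return per unit staked on an accumulator of independent legs."""
    total = 1.0
    for true_prob, odds in legs:
        total *= true_prob * odds
    return total

single_ev = expected_return([value_leg])
treble_ev = expected_return([value_leg] + filler_legs)

print(f"single: {single_ev:.3f} per unit staked")
print(f"treble: {treble_ev:.3f} per unit staked")
```

With these made up prices the single bet returns more than the stake on average, but the treble returns less, because the margin on the two extra legs swallows the edge on the first.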

Once single bets became the norm, football modelling on the net took off, initially using the Poisson approach, but often extending to neural nets, least squares or individual team ratings based on goals or shots. The Poisson was and remains the most accessible route into estimating the massive array of gambling opportunities presented by the bookmakers. An average rate of goal scoring leads to team specific numbers, which lead to predicted correct scorelines and on to individual match odds. Easily derived spin offs include the likelihood of winning individual halves, winning or losing from a goal up against 10 men with 40 minutes to play or the chances of scoring the first goal... and then on to diagonally inflated bivariate models if you caught the bug.
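The chain from scoring rates to scorelines to match odds can be sketched in a few lines of Python. This is the simplest version, which assumes that the two teams' goals are independent Poisson counts. The goal expectations of 1.6 and 1.1 are made up for illustration, as are the function names, and the diagonal inflation mentioned above is left out.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k goals given an average scoring rate."""
    return exp(-rate) * rate**k / factorial(k)

def match_probabilities(home_rate, away_rate, max_goals=10):
    """Home win, draw and away win probabilities from independent Poissons."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            # Probability of the correct score h-a
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Hypothetical team specific goal expectations
home_rate, away_rate = 1.6, 1.1
home_win, draw, away_win = match_probabilities(home_rate, away_rate)

# Fair decimal odds, before any bookmaker overround is applied
for label, p in (("home", home_win), ("draw", draw), ("away", away_win)):
    print(f"{label}: probability {p:.3f}, fair odds {1 / p:.2f}")
```

Capping the grid at ten goals a side loses only a negligible sliver of probability at scoring rates like these, and the same grid of correct scores can be summed in different ways to price most of the spin off markets.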

I recall first seeing a Pythag for football, corrected for scoring environment being posted by a well known name from sabermetrics in the mid 2000's. Home field advantage was widely appreciated from the start, along with continental advantage in World Cups and regression towards the mean and by implication the importance of random variation due to luck was practiced, if not explicitly expressed. Most of the posters went under pseudonyms, based on exotic particles from physics (guilty) or pioneering communication satellites. The analytics sites of the present are partly retracing territory already covered by the gambling sites of the last decade.

Gambling organisations are obliged to price up many eventualities, and the emergence of spread betting, where poor judgement from either side is punished by potentially large losses and good judgement is rewarded, has extended the range of events which require modelling. A sensible business needs to know and manage the risks that it is taking. So it is reasonable to assume that bookmakers are confident that they have the best available estimate about a footballing event actually taking place. Modelling football's basic events is possible, because bookmakers and to a lesser degree punters have been practicing the dark arts on gambling orientated sites for at least as long as Google Trends has been tracking search results. The remnants of these sites provide a great primer for those interested in the base rate figures for such connected events as goal expectancy and the most likely time of the first goal. As has been stated, this type of analysis describes football as it is, but it really is essential knowledge.