Ultimate Poseur

Tuesday, June 7, 2016

Carlos Carballo, 45, from Spain will be one of 18 referees at Euro 2016. If you see this guy officiating a match in Euro 2016, expect cards, lots of cards!

Euro 2016 is coming up in a few days, and there’s no shortage of coverage about players, tactics, managers etc. (The Guardian’s Euro 2016 section is particularly good.) But one thing a lot of the articles will be missing is a look at the one person with the most influence on the outcome – the referee.

The thing about refereeing is that it’s one of the most high-profile, but completely thankless, jobs around. If you do everything right, no one ever takes notice, but make one mistake in a crucial match and the whole world is on your case.

So refereeing is a high-pressure role that not everyone can handle. The 18 who will be going to the Euros are some of the best referees Europe has to offer.

One of them, Mark Clattenburg, had the privilege of officiating the FA Cup final and the Champions League final in the span of a week. (Yup, he was the one who went full lizard tongue at Real Madrid defender Pepe!)

So how have these referees officiated over the years? What are their styles? Are some of them the kind to issue yellows early and often to stamp their authority?

Or are some of them the kind to avoid early bookings and let the game flow, taking the card out only if they really need to?

Based on data from Soccerway.com for around 5,000 matches over the past 14 years, with at least 200 matches taken for each Euro 2016 referee, we can understand how these officials have been dealing out yellow and red cards over their careers.

We were able to figure out four different things:
a) who books people the most
b) who takes out cards the quickest
c) who most gives out second yellows to send people off and
d) who gives out multiple cards in a match the most

Curiously, the answer to all four questions is the Spanish referee, 45-year-old Carlos Carballo. You may not know him by name, but if you watch the Champions League regularly, you will most definitely recognise him.

Now on to the analysis.

Carballo has officiated in 318 matches (that we have data for) and there have been 8,580 appearances by players in those matches. Note that I’m saying appearances and not players because a player can make multiple appearances, i.e. play in different matches. Of those player appearances, 1,716 resulted in a yellow or red card, which makes the figure for Carballo one in five.
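A minimal Python sketch of that arithmetic follows. The function name `card_rate` is an illustrative assumption, not something taken from the notebook linked in this post.

```python
def card_rate(carded_appearances: int, total_appearances: int) -> float:
    """Share of player appearances that ended with a yellow or red card."""
    if total_appearances <= 0:
        raise ValueError("total_appearances must be positive")
    return carded_appearances / total_appearances


# Figures quoted above for Carballo: 1,716 carded appearances out of 8,580.
carballo_rate = card_rate(1716, 8580)
print(f"{carballo_rate:.1%} of appearances carded")
```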

The graph above looks at when referees first give out a card on average. As explained in the graph, there’s data from over 200 matches for each referee, so we looked at each match, noted when the first card was given and averaged those times. And voila, the Spanish referee is the earliest on average to give a card out, in the 27th minute.
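The averaging step could look roughly like the sketch below. The data layout (one list of card minutes per match) and the decision to skip matches without any card are assumptions made for illustration; the sample minutes are made up.

```python
from statistics import mean


def average_first_card_minute(matches):
    """Average minute of the first card across matches.

    matches: list of lists, each holding the minutes at which cards
    were shown in one match. Matches without cards are skipped
    (an assumption; the post does not say how they were handled).
    """
    first_minutes = [min(cards) for cards in matches if cards]
    if not first_minutes:
        raise ValueError("no match in the sample has a card")
    return mean(first_minutes)


# Made-up sample: three matches with cards, one without.
sample_matches = [[12, 55, 80], [40], [], [29, 30]]
avg_first_card = average_first_card_minute(sample_matches)
print(avg_first_card)
```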

Here, we’re looking at who converts yellow cards to reds the most. That is, once a referee has given a yellow in a match, how often does he give a second yellow to the same player to send him off? (So straight reds don’t come into the picture here.)

Looking at the data, Carlos Carballo again comes first, giving out second yellows in 69 out of 1,740 cases where he has already given a yellow card, or 4 out of every 100 yellow cards.

Martin Atkinson, a 45-year-old referee from England, is the most lenient when it comes to this, only issuing second yellows in 21 out of 1,214 cases, or 1.7 second yellows for every 100 yellow cards.
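For anyone who wants to check the conversion rates, here is a small sketch using the counts quoted for the two referees. The function name is a hypothetical label chosen for this example.

```python
def second_yellows_per_100(second_yellows: int, yellow_cards: int) -> float:
    """Second yellows issued for every 100 yellow cards shown."""
    if yellow_cards <= 0:
        raise ValueError("yellow_cards must be positive")
    return 100 * second_yellows / yellow_cards


carballo_conversion = second_yellows_per_100(69, 1740)
atkinson_conversion = second_yellows_per_100(21, 1214)
print(round(carballo_conversion, 1), round(atkinson_conversion, 1))
```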

The chart above may look complicated, but it’s simple in concept really. It asks the question: Who has historically tended to give out multiple cards (red/yellow) in a match? We looked at the cards for each referee and then checked how many of those cards were the first to be issued in a match, how many were the second and so on. Which referees just keep on booking everyone in sight?

Looking at the distribution, and to absolutely no one’s surprise, Carlos Carballo again comes out on top. Only 16.8% of all the cards he has given out over 12 years were first cards. Compare that to Svein Moen from Norway, for whom 33% of his 597 cards were the first card in a match.
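The distribution behind that chart can be sketched as follows: for each match, number the cards in the order they were shown, then work out what share of all cards were first cards, second cards and so on. The data layout and the sample minutes are assumptions for illustration only.

```python
from collections import Counter


def card_order_shares(matches):
    """Share of all cards that were the 1st, 2nd, 3rd... card of their match.

    matches: list of lists of card minutes, one inner list per match.
    """
    counts = Counter()
    for cards in matches:
        for position, _minute in enumerate(sorted(cards), start=1):
            counts[position] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {position: n / total for position, n in sorted(counts.items())}


# Made-up sample: six cards across three matches.
order_shares = card_order_shares([[12, 55, 80], [40], [29, 30]])
print(order_shares)
```

A referee who hands out many cards per match ends up with a low share of first cards, because there can only be one first card in any match.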

In fact, if you look towards the bottom of the four charts, Moen is one name that keeps cropping up at the opposite end of the table from Carballo.

Moen, 37, only books someone in 1 out of every 10 player appearances, or at half the rate of Carballo. The Norwegian referee also makes his first booking on average in the 43rd minute, 16 minutes later than Carballo.

So, all things considered, Moen is pretty much the Christ to Carballo’s Antichrist!

If you want a walkthrough of all the code I used to get the data and analyse it, you can check my IPython notebook.

Tuesday, June 10, 2014

This is a version of the Kevin Bacon game but one which uses players at the World Cup. What it does is challenge you to connect Wayne Rooney to another player at the World Cup in as few steps as possible.

You can connect two players only if they've spent at least half a season together as part of the first team for a club. Chances are you may be able to connect Rooney ultimately to another player, but not in a way that's shorter than the "path" given by the visualization below. Play around with the "viz" a little to get a better idea of how it works.
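The connection rule can be sketched in Python as below. This is only an illustration under stated assumptions: the record layout, the function names and the sample players and clubs are all made up, and each record stands for one half-season spent in a club's first team, labelled in the form 2011-H1 (January to June) or 2011-H2 (July to December).

```python
from collections import defaultdict
from itertools import combinations


def half_label(year: int, month: int) -> str:
    """Label a month as the first (Jan-Jun) or second (Jul-Dec) half of a year."""
    return f"{year}-H{1 if month <= 6 else 2}"


def teammate_links(stints):
    """Pairs of players who shared a club during the same half-season.

    stints: iterable of (player, club, half_season_label) tuples.
    Returns a set of frozensets, one per linked pair.
    """
    squads = defaultdict(set)
    for player, club, half in stints:
        squads[(club, half)].add(player)
    links = set()
    for players in squads.values():
        for a, b in combinations(sorted(players), 2):
            links.add(frozenset((a, b)))
    return links


# Made-up records, purely for illustration.
sample_stints = [
    ("Player A", "Club X", half_label(2011, 3)),
    ("Player B", "Club X", half_label(2011, 5)),
    ("Player C", "Club X", half_label(2011, 9)),  # arrived a half-season later
    ("Player C", "Club Y", half_label(2012, 2)),
]
links = teammate_links(sample_stints)
print(links)
```

Here only Player A and Player B are linked, because Player C's time at Club X falls in a different half-season.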

T&Cs
There are a few caveats to keep in mind though. One is that you can only use club affiliations, not national team ones. So if you want to connect Rooney to Luis Suarez of Uruguay, you can't go "Rooney plays for England with Daniel Sturridge who plays in Liverpool with Luis Suarez". That's cheating!

Also, you can't use players that aren't there at the World Cup. So if you're trying to connect Rooney to Marco Verratti of Paris Saint-Germain, you can't use his clubmate Zlatan Ibrahimovic along the way as he isn't at the tournament. (This is a visualization I've made especially for the World Cup after all.)

Another thing to keep in mind is that I only take into account teams that players have actually played for, not the team that holds their registration or who they're contracted to or whatever the proper legalese is. So Romelu Lukaku of Belgium is listed as playing for West Brom and Everton for the last two seasons respectively, even though he may have technically been a Chelsea player then.

Also, you can't use clubs that players are transferring to after the World Cup. So Ciro Immobile of Italy may be going to play for Borussia Dortmund next season, but you can't use future affiliations, just clubs that players have actually played for.

You also can't use stints with youth teams, Under-18 teams or B-teams along the way; only seasons together as part of the first-team squad count. Also, a lot of players, when they're young, especially in the Premier League, go on one-month loans to teams in lower divisions. I haven't included those either.

I know a lot of players have been dropping like flies in the build-up to the tournament, so I'll remove missing players and update the data after the first round of matches is done, since countries won't be able to modify their squads after that.

(Just as a point of explanation, if you look at the table above, you'll see expressions like 2011-H1 or 2003-H2, so when I say H1, I mean the first half of the calendar year, i.e. Jan-Jun, or the second half of most European seasons. Similarly, H2 stands for the second half of the year or the months Jul-Dec.)

Who Rooney isn't connected to
Note that I've used "almost" in the headline of my blog post. That's because there are 36 of the 736 players at the World Cup that Rooney can't be connected to, and chances are no one else can be either.

I haven't really looked at why those 36 players are unconnected. It could be their age, or the fact that they play for really small clubs in their own country and so have little to connect them with the stars of their national team, which would really set them on their way to being part of Rooney's network. I don't know. The thing to keep in mind is that I've just used the 736 players (32 teams x 23-man squads) at the World Cup, so there may well be players who aren't at the tournament who could connect these 36 players to Rooney in some way.

This is the breakup of the 36 unconnected players by country:

If you look at the graph above, the fact that there are so many African teams with unconnected players isn't a surprise, but the fact that players from the USA and Australia are in there is noteworthy. Is that an indicator of how players are opting to stay in the MLS or A-League more instead of moving to play in Europe? Again, don't know, but if you want the list of the 36 unconnected players, it's in the dataset linked to further down the page.

And for those keeping score, of all the 699 players that are connected to Rooney, the player furthest out (Azubuike Egwuekwe of Nigeria) is, indeed, six degrees of separation away from Rooney!

What's the other table in the viz for?
Well, there were a number of things I could have done with the data but, sticking with the overall theme of "connectedness", what I chose to do is make it easier to find out if players from any two national teams have ever played together in the past.

Sometimes you get players in international matches going for those 50/50 balls a little harder with some opposing players more than others, and you can't really figure out why. There's a good chance that those players may have either played with each other in the past for the same club or for rival clubs in the same league. This viz will help you find out if that's the case.

Apart from that, it just lets you answer that very basic football-geek question of whether players from the two teams on the pitch ever played with each other in the past.

Note that I've used FIFA country codes here and most of them are pretty clear, but be wary of confusing codes like BIH for Bosnia, SUI for Switzerland, CIV for Ivory Coast and CRC for Costa Rica.

How this was done
First things first, here's a link to the dataset and the Pajek .net file.

I used transfermarkt.com, kicker.de, soccerway.com & footballdatabase.eu to compile the raw data. (Any mistakes in the data are my own and not those of the websites, yada, yada.) I then coded it and created a Pajek format .net file, which was then processed using igraph & R to find the shortest paths from Rooney to every other player, selecting the most recent ones for the visualization. Kind of went down a rabbit hole on this one with all the computer science-y and "social network analysis" stuff I had to wade through, but I'm glad to have come out the other side alive!
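To give a feel for the shortest-path step, here is a plain breadth-first search in Python. It is not the igraph and R code used for this post, just a stand-in sketch of the same idea on a made-up network; the player names and the dictionary layout are assumptions.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal, or None if unconnected."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Made-up network: each key lists the players it shares a club history with.
sample_graph = {
    "Player A": ["Player B"],
    "Player B": ["Player A", "Player C"],
    "Player C": ["Player B"],
    "Player D": [],
}
path = shortest_path(sample_graph, "Player A", "Player C")
unreachable = shortest_path(sample_graph, "Player A", "Player D")
print(path, unreachable)
```

The number of steps between two players is `len(path) - 1`, and a result of `None` corresponds to an unconnected player.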

As always, suggestions, criticism, bouquets and brickbats all welcome in the comments section below!

2014-06-11 UPDATE: Serves me right for not double checking before I uploaded my viz, but it turns out records for nine players were missing from the data I originally uploaded to Tableau, but I've corrected it now and all 736 players should be listed now. Also, because of those missing records, my original figure for unconnected players added up to 30 instead of 36 as it is now. All this happened because there was a stage between the processing of the raw data by R and formatting it for Tableau that wasn't automated and involved significant editing by hand, and so increased the scope for human error. Looks like I'll have to get better at things like programming to avoid slip-ups like this! Don't worry about it, the "viz" is still good, in fact, now it's even better, so use it confidently!

2014-06-12 UPDATE: Had to correct an issue resulting not from missing data, but from too much data! So I had entered multiple records for players like Nabil Bentaleb (ALG) and Remy Cabella (FRA) because there were multiple "shortest paths" to them from Rooney that were all relatively recent, and because I couldn't choose between them, I thought it would be better to just keep them all. But for some reason that confused Tableau and if you entered Bentaleb's or Cabella's name, Tableau wouldn't give an answer. So I deleted any instances of multiple records and everything's working properly now! On another note, a player from Costa Rica has been withdrawn, so will update the data to reflect that change after the first round of matches is over.

Thursday, June 13, 2013

The last few weeks have seen managerial changes at several big clubs around Europe: Moyes coming in at ManU, Mourinho going back to Chelsea and Mazzarri taking over at Inter. We could see several more over the summer, with PSG, Real Madrid and maybe even Barcelona on the hunt for new managers.

So being the good Samaritan that I am, I’ve created a simple Tableau visualization to help fans and supporters of big European clubs figure out who their next manager should be.

SO WHAT DO ALL THOSE OPTIONS REPRESENT?
Let me just explain what you see in the data viz a bit. The age filter is pretty self-explanatory, the default is an age range from 35 to 55, you can change that to whatever you want.

Now for the points range. I’ve awarded points to managers for their achievements over the past 10 years in the “big 7” leagues in Western Europe and UEFA’s Champions and Europa League.

I just added up all those points for those achievements over the past 10 years to come up with a manager’s score. For example, in the default display, Mourinho has a points score of 72, Ancelotti has 53 etc. The graph just plots managers’ scores against their ages. (Didn’t really have anything planned in mind when I did that. I guess one good thing about it is that it adds a visual element to the presentation, otherwise you’re essentially just looking at filtered views of a spreadsheet.)

So you adjust the age and points range filter, and you get a shortlist of managers in the middle. You click on a name there and on the right of the shortlist, you’ll get to see that particular manager’s record for the past 10 years.

And if you want to fine-tune things further, you can use the filters at the bottom to choose managers by nationality or if you want, you can use the “Seasons to consider” filter to only take the managerial record over the past five years into consideration, if you think the default of 10 years goes a little too far back.
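The scoring and the “Seasons to consider” filter described above can be sketched roughly as follows. The record layout (season start year plus points) and the sample values are made up for illustration and do not come from the dataset.

```python
def manager_score(achievements, latest_season, seasons_to_consider=10):
    """Add up a manager's points over the most recent seasons.

    achievements: list of (season_start_year, points) tuples.
    latest_season: start year of the most recent season in the data.
    """
    earliest = latest_season - seasons_to_consider + 1
    return sum(
        points
        for season, points in achievements
        if earliest <= season <= latest_season
    )


# Made-up record for a hypothetical manager.
sample_record = [(2004, 6), (2009, 4), (2012, 1)]
ten_year_score = manager_score(sample_record, latest_season=2012)
five_year_score = manager_score(sample_record, latest_season=2012, seasons_to_consider=5)
print(ten_year_score, five_year_score)
```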

With the “Led team to 1st place in” and other filters of its kind, you can deselect certain leagues like the Portuguese Primeira Liga if you think winning a title there shouldn’t matter and that managers should only be judged by their achievements in the “bigger” of the big 7 leagues. You would be narrowing your selection pool if you did think that way but ultimately those choices are up to you.

(Also note that they work using the “OR” operator and not “AND”. If that sentence confused you, what I was trying to say was that if, for example, you take the “Led team to 1st place in” dropdown menu and you select France and England, you won’t just get someone who’s won titles in both France AND England like Carlo Ancelotti but also a larger group of managers who have won titles in either France OR England.)
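In code terms, the difference between the two operators looks something like this sketch. Only the Ancelotti entry reflects what is said in the post (and it lists just the two leagues mentioned, not his full record); the other managers are hypothetical placeholders.

```python
def matches_any(selected_leagues, leagues_won):
    """OR behaviour: at least one selected league is among the titles won."""
    return bool(set(selected_leagues) & set(leagues_won))


def matches_all(selected_leagues, leagues_won):
    """AND behaviour: every selected league is among the titles won."""
    return set(selected_leagues) <= set(leagues_won)


titles = {
    "Carlo Ancelotti": {"France", "England"},  # leagues named in the post
    "Manager B": {"England"},                  # hypothetical
    "Manager C": {"Spain"},                    # hypothetical
}
selected = ["France", "England"]

or_shortlist = sorted(m for m, won in titles.items() if matches_any(selected, won))
and_shortlist = sorted(m for m, won in titles.items() if matches_all(selected, won))
print(or_shortlist, and_shortlist)
```

The filters in the viz behave like `matches_any`, which is why the shortlist is the larger of the two.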

That’s it for my explanation of what all the ‘options’ do for you. After this, I’ll be going a little into how I got all my information and explain some of the choices I've made. It might not make for very interesting reading, so if you’ve read this far and want to move on to other things, I appreciate your taking time out to read my post and hope you have a great day ahead! (Don’t forget to spread/mail/share/facebook/tweet the word about this post too!)

HOW I GOT THE DATA
First off, here’s the link to the complete dataset. Feel free to use it any way you want!

I just looked at the managers who have led their clubs to at least 3rd place in the seven big European leagues or at least the semifinals in the two European club competitions over the past 10 years. I ended up with a list of around 100 managers and, using the website transfermarkt.co.uk, I got the details on their ages, nationalities, present club affiliations etc. and put them all together to get the dataset linked to above. Needless to say I, and not transfermarkt.co.uk, am responsible for any errors that may have crept into it. (There shouldn’t be any though, I re-checked everything as rigorously as I could. But if you find anything wrong, do let me know.)

MY REASONS FOR DOING THIS
I’m an Arsenal supporter and I felt the need to create this data viz primarily to answer the question, “Who should succeed Wenger?”. On the Arsenal-centric Tuesday Club podcast, according to Alan Davies & co, every time they ask themselves this question, they start off with a huge longlist of managers but eventually always narrow it down to David Moyes. (Guess ManU must have started off with the same longlist too!) I wanted to see what answer I would get after going about this task in a slightly more objective manner.

So if I want someone in the Wenger mould to succeed Arsene, how would I go about it? As a starting point, I would see what Wenger had achieved before he came over to Arsenal and see which present-day managers in their 40s (Wenger was 46 when he was appointed at Arsenal) have achievements that could match those of Wenger. Now in the 10 years before he took over at Arsenal, Wenger had won Ligue 1 with Monaco in 1987-88, was league runner-up in 1991-92, was a runner-up in the European Cup Winners Cup in 1991-92 and reached the semifinals of the European Cup/Champions League in 1989-90 and 1993-94.

So I felt that whichever manager Arsenal goes for next should be expected to demonstrate a similar pedigree in the big 7 leagues as well as in European competition. That is why I restricted my search to managers who have done well in just those leagues and competitions for the past 10 years. Because of the level of competition, resources and players, these leagues are a cut above the rest and so a manager who does well over there can reasonably be expected to do well in the English first division too. I know, it's a subjective impression but it's one that should stand up to more rigorous analysis. (I would appreciate it if anyone who has the skills to do that statistical analysis could let me know if it is the case.)

Now this is not to say that whoever matches up with Wenger’s achievements will be “the next Wenger”. A resume tells us nothing about the style with which a manager makes his team play or how great he is in developing players. But it’s a good place to start as we can see which manager fulfills the absolute minimum criteria to be even considered for a shortlist.

TARGET AUDIENCE
So I may have done this for very personal reasons but ideally, fans and supporters of any big club in the seven leagues should benefit from this and get lists of managers they can fantasize over! I’m sure that they’d be using pretty much the same criteria as I did in selecting the next Wenger. But if they want a potential manager to satisfy, at a minimum, a different set of criteria, they can do that too.

SHOULD AGE MATTER?
Ideally, it shouldn’t and if you were offended by my inclusion of an age filter, I sincerely apologize. But let me see if I can at least make you appreciate my point of view. Now, if you want someone to have a long run at your club, you would expect them to be younger. That is an indisputable fact you can’t escape from.

I could be mistaken though in wanting things to be seen from such a long-term viewpoint. Tenures like that of Wenger and Ferguson are anomalies and besides, it wasn’t as if the two of them were granted 15-year contracts at the outset. They got there through a succession of three- or four-year contracts.
By discounting managers who happen to be on the wrong side of 55, I am possibly being ageist to my club's own detriment.

Now that I think about it, I should make a distinction between the question “Who is the next Wenger” (a question where age restrictions probably make sense) and the very different question “Who could I possibly see managing Arsenal next?” (where the age restrictions should be relaxed).

WHAT'S UP WITH THIS POINTS SCHEME?
It is completely arbitrary and subjective, I’ll own up to that. But since I don’t have, and am not able to create, an impartial ranking comparing achievements in different leagues and competitions, a subjective ranking is better than nothing at all. And looking at it, you wouldn’t deny that it might correspond to a large degree to how those achievements might actually rank. (Again, I would appreciate it if anyone who has the skills to do the statistical analysis could let me know how well it does correspond.)

And that’s the thing, it’s more a reflection of how, say, leading a team to 3rd place in the Portuguese League ranks compared to winning the Champions League rather than how much bigger/smaller an achievement the latter is. To get all jargon-y about it, it’s an ordinal ranking that’s been converted into a crude cardinal score.

(Let me explain. If I were to rank all those achievements in the table above from 1 to 6, getting a 3rd place in the Portuguese League would count least for me, so I’ve ranked that as 6 and given a manager who secures that place 1 point. On the other hand, winning the Champions League would be ranked 1st and so I've given a manager who achieves that 6 points. Now the latter may be a greater achievement, we just don’t know how much greater an achievement it is. So it might just be a 5-point difference according to my scheme but it, in reality, could be much larger.)
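Put as a formula, the conversion from rank to points in that example is simply "number of ranks plus one, minus the rank". A minimal sketch, assuming six ranks as in the example and a hypothetical function name:

```python
def points_from_rank(rank: int, number_of_ranks: int = 6) -> int:
    """Turn an ordinal rank (1 = best) into a crude points score."""
    if not 1 <= rank <= number_of_ranks:
        raise ValueError("rank must be between 1 and number_of_ranks")
    return number_of_ranks + 1 - rank


champions_league_win = points_from_rank(1)    # ranked 1st in the example
portuguese_third_place = points_from_rank(6)  # ranked 6th in the example
print(champions_league_win - portuguese_third_place)
```

The gap in points only reflects the gap in rank; it says nothing about how much bigger one achievement really is than the other.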

LIMITATIONS OF THE DATA VIZ
1) It leaves out wins in domestic cups, supercups, club world cups etc. Don’t think they count and just don’t really care about them, sorry! (I know I'm being dismissive about them but when I tried to interrogate myself further on why I thought that way, my internal argument went along the lines of "performances in the cups don't translate into performances in the league or Champions League". This means that my disregard for domestic cups is ingrained at such a fundamental level that even when I try to judge their utility, the cups are judged not on their own terms, but in terms of how they affect performances in the two competitions I do care about!)

2) As I mentioned earlier, the data viz tells us nothing about the manager’s style or how well he gets along with everyone (e.g. Mourinho’s excellent resume tells us nothing about how abrasive he is or how destructive he is when it comes to relationships within a club). So all this visualization tells us is how the manager looks on paper.

3) Club managers outside the seven domestic leagues are ignored in this. For example, Mircea Lucescu’s trophies with Shakhtar Donetsk in the Ukrainian league are ignored because I don’t think the Ukrainian league is up to the standard of any of the big 7. Because of this, even managers who made a name for themselves in the big 7 leagues but then moved elsewhere are penalized, for example Luciano Spalletti of Zenit St. Petersburg in Russia or Guus Hiddink of Anzhi Makhachkala.

On a side note, I wish I could have included the Russian league. It’s attracting top coaches and players, and there’s a lot of money going around, but I doubt how well Russian coaches (like their players) “travel” outside Russia. So I don’t know whether Leonid Slutski, a 42-year-old Russian manager who led CSKA Moscow to the title recently, could do a “job” or do well in Western Europe. Also if there happen to be foreign coaches there, they’re inevitably ones who have made their name in the big leagues. So the names of foreign coaches from Russia will still appear in the results, just that they won’t have done as well under my points scheme.

4) It penalizes managers who may have taken jobs with national teams during this 10-year period. So people like Guus Hiddink and Fabio Capello have suffered.

5) The data viz also disadvantages managers who have had some time off from the game, not because they weren’t getting job offers but because they wanted to spend time with the family or take a ‘sabbatical’ (Guardiola) or were waiting for the right opportunity to come up (Benitez). Genuinely wish there was a way I could un-penalize this group.

6) Another problem is that with all these achievements, you can’t really control for all other factors and separate out what the contribution of the manager actually was. For example, how much was Chelsea’s success in the Champions League down to Di Matteo? I’m reminded of something (I'm guessing?) Guardian journalist Barry Glendenning said on a football podcast about how Di Matteo then was the equivalent of a teddy bear tied to the front of a runaway train. It could just be that the Chelsea team was that good (or that lucky!) and any manager (other than Villas Boas) could have won them the Champions League.

It also works the other way in that a manager could be really good but because his team isn’t that great, it’ll take time for him to build a name. For example, Frank De Boer may be a great coach; he’s led Ajax to a third successive title in Holland. But because his Ajax team are never strong enough to compete in Europe, he will never do as well under the points scheme as he could with a bigger team than Ajax.

7) You don’t really get much of a diverse nationality pool because of the focus on the big 7 leagues. Now the data viz is about coaches with experience in these leagues rather than citizens of those countries, so you will get people like Pellegrini of Chile or Bielsa of Argentina, but I realize that managers of the country where a league is based will be favoured and get more chances there, and so be represented more. Anyway, below is the nationality breakup. (Also, if you’re English, look away now. But, actually, why should anyone be surprised at how few of them are English managers?)

8) There might also be a bias towards managers of a certain age. If you’re interested in the age breakup of the managers, it’s here.

In a way it’s understandable because someone younger won’t get the same kind of opportunities to manage a big club, unless they happen to be a prodigy like Andre Villas Boas or unless they happen to be famous players shifting immediately to management like Didier Deschamps did when he managed Monaco. You usually don’t get to jump the rungs like that. Most people have to pay their dues the way Steve Clarke did. He had several jobs, including assistant manager at Chelsea, before he made the move up to manager at West Brom.

Now I was going to say that this bias towards older managers was because of a “recognition lag”. That it takes bigger clubs time to realize how good a manager is and that by the time they do, the manager would have already passed his 40s. The problem with this idea is that it assumes that a manager’s ability level is static and can’t improve with time. The thing is managers do improve with experience and that could be the reason why bigger clubs go for more seasoned managers and this isn't the outcome of some recognition lag.

(Addendum 2013-06-15: Realized a day after I posted this on my blog, that the ages of the managers in the data viz are their present ages, and not their ages when they were appointed or when they won their trophies in the past. So it's ok for the purposes of this data viz and for its age filter, but it wouldn't be right to make any broader statements on which age groups are shown a bias when it comes to big club appointments, if all I based it on were the present ages of managers. This doesn't mean that all that I've written in the previous two paragraphs is now invalid, but this section is definitely something I will have to review and look into further, possibly in another post.)

FOOTNOTES
Only go on further if you want to punish yourself!

--If there's a question mark next to a club or country under “present affiliation”, it means that a manager’s position was in doubt at the time of writing and might well have changed by the time you read this.

--The Italian first division title was stripped from Juventus in 2004-05 and as far as I know, wasn't awarded to anyone, so Fabio Capello, the then Juve manager, doesn’t get the credit for it nor does anyone else.

--Bobby Robson who got Newcastle to a semifinal place in the UEFA Cup in 2003/04 passed away a few years ago, but I’ve included him just to be complete and for his age, I’ve just put in how old he was at the time of death

--There are some choices I made that seem reasonable to me but are essentially debatable, so thought I should catalogue all of them.

1) In 2005/06, 2006/07 and 2007/08, Netherlands had these silly Champions League playoffs involving teams placed from 2nd to 5th. I'm ignoring them and just taking the positions after the regular season into consideration and the managers who led the teams into those positions, not those whose teams cleared the playoffs.

2) This relates to Bayer Leverkusen’s 3rd place finish in the German first division last season (2012/13). They had a kind of a joint coaching arrangement between Sascha Lewandowski and Sami Hyypiä but since Lewandowski was given the nominally superior designation of “coach” compared to Hyypiä's title “team manager”, I'm giving the credit for the 3rd place finish to Lewandowski.

3) The biggest issue I faced was whom to give the credit for a trophy or a placing in the domestic league, if the club had more than one manager during a season. In some cases, it is pretty clear cut, like there is no way Andre Villas Boas could get the credit for Chelsea’s Champions League win in 2011/12 even though he managed them for the majority of the season and Roberto Di Matteo was only there for the last few months. Ok so technically, Villas Boas did lead them through the group stages so he should get partial credit for Chelsea’s success. But there’s something unsatisfying about giving the credit for a trophy to two managers. So I’ll give it to Di Matteo because, unquestionably, he played a bigger part in the Champions League win than Villas Boas did.

In other cases though, it’s difficult to figure out so I (a) used my judgment if I have some memory of what went on or (b) just made a call based on how many months the caretaker manager was in charge after the first manager was fired/quit/resigned/moved to another club etc.

Cases where I went with the manager who began the season with the club
• Germany
o 2008/09-Germany-2nd place-Bayern Munich: Given it to Jurgen Klinsmann and not then-caretaker manager Jupp Heynckes
o 2010/11-Germany-3rd place-Bayern Munich: Given it to Louis Van Gaal and not interim manager Andries Jonker
o 2010/11-Champions League-3rd place-Schalke: Given it to Felix Magath and not caretaker manager Ralf Rangnick. Magath took them to the quarterfinals; Rangnick just won the quarterfinal against Inter before losing badly to ManU in the semifinals
o 2007/08-Germany-3rd place-Schalke: Given it to Mirko Slomka and not caretaker manager Michael Buskens
• Italy
o 2008/09-Italy-2nd place-Juventus: Given it to Claudio Ranieri and not caretaker manager Ciro Ferrara who just came in for the last two matches of that season
• Netherlands
o 2004/05-Netherlands-2nd place-Ajax Amsterdam: Given it to Ronald Koeman (even though he left in Feb 2005) and not to interim manager Danny Blind
o 2011/12-Netherlands-3rd place-PSV Eindhoven: Given it to Fred Rutten and not Philip Cocu

Cases where I went with the manager who ended the season with the club
• England
o 2008/09-England-3rd place-Chelsea & 2008/09-Champions League-3rd place-Chelsea: Given it to Guus Hiddink and not Luiz Felipe Scolari
o 2011/12-Champions League-1st place-Chelsea: Given it to Roberto Di Matteo and not Andre Villas Boas
• Portugal
o 2004/05-Portugal-2nd place-Porto: Given it to caretaker manager Jose Couceiro who was appointed in February and not Victor Fernandez
o 2010/11-Portugal-3rd place-Sporting Lisbon: Given it to caretaker manager Jose Couceiro and not the first manager of the season Paulo Sergio
o 2011/12-Portugal-3rd place-Sporting Lisbon: Given it to caretaker Sa Pinto and not previous manager Domingos Paciencia

SHOUT OUTS
And lastly, I want to thank everyone who may have retweeted my link to this post. I don't have many followers, so it’s difficult for me to get the word out without shamelessly spamming Guardian comment sections or begging more influential tweeters for retweets. So thanks again everyone!

Sunday, March 17, 2013

We’ve got two weeks of international football coming our way, so I thought I’d do something on the World Cup and who gets to qualify for it. Specifically, whether certain confederations (to borrow FIFA’s terminology) like UEFA or CONMEBOL (the South American equivalent of UEFA) are over-represented at the World Cup.

(Before I get into this, I want to let everyone know that I don’t really have an agenda, I’m just doing this because it seemed like something worth doing. I mean, the headline could just as well have been something dry like ‘World Cup Qualification 1930 – 2014’.)

I tried to use various criteria to judge how over- or under-represented a confederation is: population size, area of the globe covered, contribution to the world’s GDP, military expenditure and FIFA’s own world rankings. I go into why I selected these factors and how I calculated them in the methodology portion of this post below. (I could have gone into it now but I didn’t want to drive away someone who may be a casual reader and is just looking to play with the Tableau visualization a bit. Want to make sure she or he gets something out of this post too.)

Simply put, what I’m saying is that, if say, the countries in Africa together have 20% of the world's population, they might have grounds for asking for 20% of the 32 places in the World Cup too. (Or 31 places, given that the tournament hosts qualify automatically.)
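
Here's a minimal sketch of that arithmetic. The function name is made up for illustration and the 20% is just the hypothetical figure from the paragraph above, not real data.

```python
def proportional_slots(share, total_slots=31):
    """Slots a confederation would get if places followed its share of a criterion.

    share is a fraction between 0 and 1 (0.20 means 20% of the world total).
    total_slots defaults to 31 because the host qualifies automatically.
    """
    return share * total_slots


# Hypothetical example: a confederation with 20% of the world's population.
africa_slots = proportional_slots(0.20)
print(round(africa_slots, 1))  # 6.2
```

So a 20% share would work out to roughly 6 of the 31 places.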

WHAT’S IN IT
So what do you get in the visualization? There are three charts. Top-left is a bar chart which shows us how many slots out of the 31 available should be given to a confederation according to the various criteria mentioned. You use the drop-down menu to select a confederation, and the bars will resize according to the confederation’s ‘strength’ in the respective field.

If you look to the right of the confederation drop-down menu, you will see an option for ‘Actual no. / percentage’. If you select ‘percentage’, the figures are simply recalculated on a base of 100 instead of the base of 31 used in the ‘Actual no.’ option. Incidentally, changing that will affect both the bar chart and the pie chart to its right. The pie chart is nothing but the distribution of seats among the various confederations according to the criterion you choose in the drop-down menu above it.

Finally, the line chart at the bottom gives us the number of slots awarded to each confederation over the years. So you tick the boxes of the confederation you’re interested in learning about and lines will appear along with a colour legend to let you know what’s what. Again, you can get the actual number of slots competed for or you could get the percentage of slots awarded. In this case, using percentage makes the figures comparable across years, because the base kept changing as the tournament got bigger, from 14 to 22 to 30 and now 31. So using percentage instead of the actual number of slots gives us a truer picture of how confederations have been treated by FIFA over the years.
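
To see why percentages travel better across years than raw counts, here's a rough sketch. The bases (14, 22, 30, 31) and Europe's 13 places are from this post; the single-slot confederation is a hypothetical example, and the function name is my own.

```python
def slot_percentage(slots, base):
    """Share of the places on offer, as a percentage of that year's base."""
    return slots * 100 / base


# Bases mentioned in the post: places up for grabs as the tournament grew.
bases = [14, 22, 30, 31]

# Hypothetical confederation holding exactly one slot under every base.
one_slot_shares = [round(slot_percentage(1, base), 1) for base in bases]
print(one_slot_shares)  # [7.1, 4.5, 3.3, 3.2]

# Europe's 13 places out of the 31 currently on offer.
uefa_share = round(slot_percentage(13, 31), 1)
print(uefa_share)  # 41.9
```

The same single slot is worth less and less of the tournament as the base grows, which is what the raw count hides.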

WHAT TO LOOK AT
If you want some initial advice on what options to choose, I have just two words for you--dig in! Now you can pretty much guess what most options will result in. (With two exceptions that I will come to later.) For example, if you choose population for the pie chart, you know that it will grant the majority of slots to the Asian confederation because China’s there. Or that if you choose military expenditure, CONCACAF (Confederation of North, Central American and Caribbean Association Football) will get most of the slots because of the US and its huge defense budget.

What I did find interesting was that if you take the UEFA option in the bar chart, according to every single criterion I use, the 13 seats Europe has been allocated are more than it deserves. (That should be the default view you’re presented with as your data viz loads.) Apart from Europe, if you click on Oceania and look at the FIFA ranking, Oceania is surprisingly under-represented. Long the forgotten stepchild of FIFA, its winner having to compete every 4 years in a playoff with a nation from Asia or South America depending on which side of the bed Sepp Blatter’s gotten up on, it seems its member states have actually been doing well enough for the region to get its own automatic qualifying spot.

THE PLEA
The way things are now, spots at the World Cup are gained and lost through a long attritional process of negotiation and horse-trading and there is hardly any transparency to the procedure at all. There is no periodic reassessment of the slots a confederation is awarded, in the way UEFA does when it takes a Champions League spot away from Serie A and gives it to the Bundesliga; not because the German FA haggled harder but because German teams performed better and a consensually-agreed upon statistical formula rewarded them for that.
I realize that this is an issue that most people aren’t really aware of but if I’ve made this an issue that people discuss, or at least think about, then the purpose of this blog-post is served.

Things are going to get a bit boring from here on out, so if I’m already testing your patience with this long blog post, you can get on with the rest of your day. Thanks for stopping by!

All that I did was calculate weights according to how much each confederation contributed to the world’s area, its population, the GDP (PPP) and the military expenditure. Got the figures for the first three criteria from the CIA World Factbook and the military expenditure data from the SIPRI military expenditure database.

Now why did I choose these criteria? I guess I took the ‘world’ in ‘World Cup’ a little too literally, and was determined to find out how much of the world was actually represented at the tournament. So factors like area covered and population size seemed natural indicators to use. GDP (PPP), I guess, was used as some sort of proxy for economic power and military expenditure as a proxy indicator for political power. I still have reservations over using military expenditure but in the absence of another readily-available indicator I could borrow to represent political influence, SIPRI’s data will have to do.

I kind of anticipated the objection people would make that the World Cup is not just about representation but also about merit and about the world’s best teams playing each other. So I used FIFA’s ranking data to arrive at some kind of meritocratic measure.

In order to arrive at the strength of a confederation, what I did was calculate the average number of points of the top 10 nations in each confederation and used that to arrive at a weight. Now I think that’s a relatively unsophisticated but still reasonable way of going about it but if anyone has a different and better idea of how it should be done, do let me know at ultimateposeur@gmail.com and I'll make sure to incorporate that method in the next visualization I make (whenever that is). I'd also be interested in seeing your take on this and in fact, I would welcome it if you could use the dataset provided and make your own graph, chart etc. with your software of choice.
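
One way the top-10 averaging could be sketched is below. The function names and the sample points are invented for illustration, and turning each average into a weight by dividing by the sum of all the averages is my assumption about the normalization step, not something spelled out above.

```python
def confederation_strength(points, top_n=10):
    """Average ranking points of a confederation's top_n nations."""
    top = sorted(points, reverse=True)[:top_n]
    return sum(top) / len(top)


def ranking_weights(points_by_confederation, top_n=10):
    """Normalize each confederation's strength so the weights sum to 1 (assumed step)."""
    strengths = {
        name: confederation_strength(points, top_n)
        for name, points in points_by_confederation.items()
    }
    total = sum(strengths.values())
    return {name: strength / total for name, strength in strengths.items()}


# Made-up points for two imaginary confederations, using top_n=2 to keep it short.
sample = {"A": [1000, 600, 800], "B": [200, 400]}
weights = ranking_weights(sample, top_n=2)
print(weights)  # {'A': 0.75, 'B': 0.25}
```

Multiplying each weight by 31 would then give the number of slots the ranking criterion suggests.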

The historical data for the line chart I got from, where else?, RSSSF.com.

FOOTNOTES INDICATOR-WISE
SIPRI military data
--Now all the SIPRI figures are from 2011 and I've taken a few shortcuts that academics might have heart-attacks over, such as using 2009 figures when there are no 2011 figures available for a country. Now this isn't meant for publication in an academic journal, so I think getting some sort of idea is better than having no insight at all.
--This is what I did: I took the figures for Central African Republic from 2010, Benin from 2008, Equatorial Guinea put at 0, Iceland from 2009, Iran from 2008, North Korea (reluctantly) put at 0, Libya from 2008, Luxembourg from 2007, Malawi from 2007, Mauritania from 2009, Myanmar at 0, Somalia at 0, Sudan from 2006, Qatar from 2008, Tajikistan from 2004, Turkmenistan from 1999, Uzbekistan from 2003, Yemen from 2008. Also, countries put at 0 are most likely not at 0; I did it that way because SIPRI didn't have figures for them.
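
The shortcut boils down to "use 2011, else the most recent earlier year, else 0". A minimal sketch of that rule, with a hypothetical function name and made-up figures (these are not SIPRI numbers):

```python
def latest_available(figures_by_year, target_year=2011):
    """Return (value, year) for target_year or the most recent earlier year with data.

    Falls back to (0.0, None) when no figure exists at all, mirroring the
    'put at 0' shortcut, which understates the true spending.
    """
    for year in sorted(figures_by_year, reverse=True):
        value = figures_by_year[year]
        if year <= target_year and value is not None:
            return value, year
    return 0.0, None


# Made-up figures: no 2011 entry, so the 2009 one is used instead.
print(latest_available({2008: 4.0, 2009: 5.0, 2011: None}))  # (5.0, 2009)

# No data at all: counted as 0.
print(latest_available({}))  # (0.0, None)
```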

Population indicator
--Used figures from Gaza and West Bank in CIA Factbook for Palestine
--Used figures from French Polynesia in CIA Factbook for Tahiti

Historical timeline data
--Playoff places are counted as 0.5, seems to be the best way to deal with that problem
--What created additional problems for me was that FIFA used to adjust qualifying places according to which continent was hosting and whether a country from that continent was the defending champion. For example, in Italia 90, countries from South America were competing for 2.5 places, instead of the 4 places on offer in Mexico '86, because Argentina was the defending champion. So then the question is whether I should consider CONMEBOL as up for 2.5 slots or 3.5 slots including Argentina.
--Up to and including 1982, Oceania didn’t have a separate group of its own but instead was treated as part of Asia. There were quasi-Oceania type groups though in the Asian zone for the 1978 and 1982 World Cups but not before that.
--Africa only got a separate slot of its own from 1970 onwards; there were combined Asia, Africa and Oceania groups before that.
--Just 13 teams were at the 1950 World Cup, but a lot of teams that had otherwise qualified withdrew, so I'm going to treat it as the 16 team tournament it was meant to be
--1938 was meant to be a 16 team tournament but only 15 teams competed
--In the inaugural World Cup, there was no qualification, so I'm going to just count the affiliations of those who were invited
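
The playoff rule above can be sketched as follows. The function and the holder_included flag are hypothetical, and reading Italia 90's 2.5 places as 2 direct places plus 1 playoff place is my assumption.

```python
def slots_contested(direct_places, playoff_places, holder_included=False):
    """Count a confederation's slots, with each playoff place worth 0.5.

    holder_included adds one slot for a defending champion who qualified
    automatically, which is the open question raised for Italia 90.
    """
    slots = direct_places + 0.5 * playoff_places
    if holder_included:
        slots += 1
    return slots


# CONMEBOL at Italia 90, assuming 2 direct places and 1 playoff place.
italia_90 = slots_contested(2, 1)
italia_90_with_holder = slots_contested(2, 1, holder_included=True)
print(italia_90, italia_90_with_holder)  # 2.5 3.5
```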

COMMENTS INVITED -- BE NICE!
If you have to be critical, be gentle, imagine that I'm a friend standing in front of you and you're trying not to hurt my feelings but still hope to point out where I went wrong. Please don't use the fact you're not with me in person as a license to be mean!

Monday, June 11, 2012

Ok, maybe not the Guardian's homepage but have gotten on to Guardian football's homepage where they've listed my post on pre-tournament odds as one of their favourite/favorite things this week. Visual proof below!

Big thank you to Sean Ingle ( @seaningle ) the sports editor of Guardian online (and their resident betting aficionado!) for putting me on there. Has definitely made my day!

The interactive dashboard below helps you explore the 2011-12 season data for goals and assists of players from clubs in the first divisions of 8 leagues - England, Spain, Germany, Italy, Russia, Portugal, Holland, France.

This includes goals and assists for those clubs in all the domestic (Leagues, Cups, Supercups) and European (Champions & Europa League) competitions the clubs were involved in.

You can filter the data by league by ticking or unticking boxes, and you can also build custom lists and compare your favourite players and clubs using the two search boxes.

Imscouting.com, which is where I got the data from, doesn't give you the number of appearances each player made but they do note how many minutes they played. So I used that data to see how effective players were in scoring goals and making assists in a 90-minute period, which would be the duration of a typical match.
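
The per-90 calculation is just a rescaling of the season totals. A minimal sketch, with a made-up function name and a hypothetical player (not a figure from the dashboard):

```python
def per_90(count, minutes_played):
    """Goals or assists per 90 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return count * 90 / minutes_played


# Hypothetical player: 10 goals in 1800 minutes, i.e. 20 full matches' worth.
goals_per_90 = per_90(10, 1800)
print(goals_per_90)  # 0.5
```

A minimum-minutes cut-off matters here because a player with one goal in a handful of minutes would otherwise top the list.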

The default setting for minutes played in the 2011-12 season is 1000 minutes; you might want to lower or raise that.

As mentioned before, all of this data has been obtained from the imscouting.com website.

--Have not taken the Club World Cup into account, which is why Messi is at 71 and not 73 goals in my dashboard.

--Have taken some justifiable liberties, such as restricting players to a single club. So if, like the Austrian player Marc Janko, you transferred from Twente to Porto, only your current affiliation is listed.

--Do note that imscouting's data regarding assists is different from the Opta stats used by whoscored.com etc. For example, Fabregas is at 18 assists according to Opta but he's at 13 according to Imscouting.

Wednesday, June 6, 2012

Had created a graphic a month back comparing Arsenal's performances under Wenger to the champions of each Premier League season. Kept on hearing from all the pundits that this has been Arsenal's worst season, so I wanted to see what measure would truly tell us what has been Arsenal's worst season.

So what you'll see from the graphic below is the number of points we were behind the title winners each season (the figures in white font with the red bars) and also a truer comparison, the line chart above the bar chart: our points as a percentage of the winning points total.
So even if we were 12 points behind in both 2004-05 and 2010-11, you can see that, on a percentage basis, we did worse in 2010-11 (85 per cent of the winner's total compared to 87.5 per cent in 2004-05).

The surprising thing from this graph is that in terms of percentages and absolute points, 2005-06 was our worst ever season. We were 24 points behind and we were at the lowest percentage of 73.6. (In case you're wondering, in the season that's just gotten over, 2011-12, the percentage was 78.5, which is the 3rd worst.)
Of course, some of you might rightly point out that Arsenal got to the champions league final that year, so that year can't really be termed our worst. And you might be right. In terms of premier league points though, you can't deny it.
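
For anyone who wants to redo the two measures, here's a small sketch. The function names are made up, and the raw points totals are my assumption (68 v 80 for 2010-11 and 67 v 91 for 2005-06), picked because they reproduce the gaps and percentages quoted above.

```python
def points_gap(points, winner_points):
    """How many points behind the title winners a team finished."""
    return winner_points - points


def percent_of_winner(points, winner_points):
    """A team's points as a percentage of the winning total, to one decimal."""
    return round(points * 100 / winner_points, 1)


# Assumed totals as (Arsenal points, champions' points).
seasons = {"2010-11": (68, 80), "2005-06": (67, 91)}

summary = {
    season: (points_gap(ours, theirs), percent_of_winner(ours, theirs))
    for season, (ours, theirs) in seasons.items()
}
print(summary)  # {'2010-11': (12, 85.0), '2005-06': (24, 73.6)}
```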