World Cup Prognostication

June 29, 2010

On Saturday morning, inspired by Andrew Moylan’s article on Wolfram’s Blog, I sat down to work out a simulation of the knockout stage of the World Cup competition. I used the bracket shown at right, and found Elo ratings of the sixteen teams, as of that morning, at Wikipedia:

The table shows that there are forty-four national teams with ratings higher than Slovakia’s rating of 1654; Slovakia is lucky to be in the tournament.

The likelihood that a team will win its match can be computed from the Elo ratings of the team and its opponent according to the formula $W_e = 1 / (10^{-dr/400} + 1)$, where $dr$ is the team’s rating minus its opponent’s rating. Thus, the United States had a 60.5% expectation of winning its match against Ghana this afternoon, and Ghana had a 39.5% expectation of defeating the United States. Harrumph!
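As a rough sketch, the win expectation can be written as a small Python function. The function and parameter names are my own, and the sample ratings are the Netherlands and Uruguay figures quoted in the follow-up comment further down, used here only for illustration.

```python
def win_expectation(rating, opponent_rating):
    """Expected score of the team rated `rating` against `opponent_rating`.

    Implements W_e = 1 / (10^(-dr/400) + 1), where dr is the rating difference.
    """
    dr = rating - opponent_rating
    return 1.0 / (10 ** (-dr / 400) + 1)


if __name__ == "__main__":
    # Equal ratings give an even match.
    print(win_expectation(2000, 2000))            # 0.5
    # Netherlands (2085) against Uruguay (1895): about 0.749.
    print(round(win_expectation(2085, 1895), 3))
```

The two expectations for a match always sum to 1, which is why the 60.5% and 39.5% figures above are complements.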

Every time a match is played, the Elo rating of a team changes. The amount of the change is based on the actual result as compared to the expected result. If a team wins when it has a high expectation of winning, its Elo rating goes up by a small amount, since it was expected to win. However, if a team wins when it has a low expectation of winning, its Elo rating goes up by a large amount. The formula is $R_{new} = R_{old} + K \cdot G \cdot (W - W_e)$, where $K$ is a weighting for the importance of the game ($K$ is 60 for the World Cup), $G$ is a parameter based on the goal differential (we’ll assume that all games are won by a single goal, so $G = 1$), $W$ is 1 for a win and 0 for a loss, and $W_e$ is the winning expectation calculated by the formula given above.
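To make the "small gain for the favourite, large gain for the underdog" behaviour concrete, here is a minimal Python sketch of the update rule. The names `updated_rating` and `K_WORLD_CUP` are hypothetical, the sample ratings are the Netherlands and Uruguay figures from the follow-up comment below, and only wins and losses are modelled, as in the description above.

```python
K_WORLD_CUP = 60  # weighting for World Cup matches, as stated above


def win_expectation(rating, opponent_rating):
    """W_e = 1 / (10^(-dr/400) + 1), with dr the rating difference."""
    return 1.0 / (10 ** (-(rating - opponent_rating) / 400) + 1)


def updated_rating(rating, opponent_rating, won, k=K_WORLD_CUP, g=1):
    """Return the new rating: old + K * G * (W - W_e).

    `won` is True for a win and False for a loss; g=1 assumes a
    single-goal margin.
    """
    w = 1.0 if won else 0.0
    return rating + k * g * (w - win_expectation(rating, opponent_rating))


if __name__ == "__main__":
    # The favourite (2085) beating the underdog (1895) gains relatively little...
    print(updated_rating(2085, 1895, won=True) - 2085)
    # ...while the underdog winning the same match gains considerably more.
    print(updated_rating(1895, 2085, won=True) - 1895)
```

Because the two expectations sum to 1, the two gains printed above always add up to K times G, here 60 points.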

Your task is to use the data and formulas described above to simulate the knockout stage of the World Cup a million times and report the number of times each nation wins. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
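One possible way to set up the simulation is sketched below in Python. This is an illustration under stated assumptions rather than the suggested solution: each match is decided by drawing a uniform random number and comparing it to the win expectation, the winner's rating is updated before the next round, and the function names are my own. As sample input it uses the four-team semi-final bracket and ratings quoted in the follow-up comment below.

```python
import random
from collections import Counter

K = 60  # World Cup weighting
G = 1   # assume every match is won by a single goal


def win_expectation(rating, opponent_rating):
    """W_e = 1 / (10^(-dr/400) + 1), with dr the rating difference."""
    return 1.0 / (10 ** (-(rating - opponent_rating) / 400) + 1)


def play(a, b, rng):
    """Play one match between (name, rating) pairs; return the winner
    with its rating updated by K * G * (W - W_e)."""
    we = win_expectation(a[1], b[1])
    if rng.random() < we:                      # a wins with probability we
        return (a[0], a[1] + K * G * (1 - we))
    return (b[0], b[1] + K * G * we)           # b's expectation is 1 - we


def simulate_tournament(teams, rng):
    """Teams are listed in bracket order; adjacent pairs meet each round."""
    remaining = list(teams)
    while len(remaining) > 1:
        remaining = [play(remaining[i], remaining[i + 1], rng)
                     for i in range(0, len(remaining), 2)]
    return remaining[0][0]


def simulate(teams, trials, seed=None):
    """Count how often each team wins over `trials` simulated tournaments."""
    rng = random.Random(seed)
    return Counter(simulate_tournament(teams, rng) for _ in range(trials))


if __name__ == "__main__":
    semi_finals = [("URU", 1895), ("NED", 2085), ("GER", 2044), ("ESP", 2085)]
    for name, wins in simulate(semi_finals, 100_000).most_common():
        print(name, wins)
```

The same `simulate` function should work for the full sixteen-team bracket, provided the teams are listed in bracket order and their number is a power of two. Losers' ratings are not tracked because an eliminated team plays no further matches.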

Hm, my code doesn’t seem to want to show up. Phil, could you have a look to see if it ended up in the spam queue? The only other explanation I can think of is that the source code highlighter doesn’t like the pragma in my code.

I re-ran my simulation after the quarter finals. The new Elo rankings put Netherlands and Spain in a tie for first place with 2085 points, Brazil dropped from first to third with 2072 points, Germany in fourth with 2044 points, Argentina in fifth with 1940 points, and Uruguay in sixth with 1895 points; the United States drops to twenty-fifth with 1749 points. Germany is the big mover with an increase from 1930 points to 2044 points following two wins with big goal differentials against higher-ranked teams. The new teams variable reflecting the semi-final bracket and the new Elo ratings is (("URU" 1895) ("NED" 2085) ("GER" 2044) ("ESP" 2085)), and the result of a million simulated tournaments is (("NED" . 378294) ("ESP" . 317881) ("GER" . 231524) ("URU" . 72301)). I have been quite impressed with Germany; they are a young team that is visibly improving with each half they play, and I wouldn’t be surprised to see them win the World Cup or, at least, defeat Spain then lose in the final. However, I am sticking to my prediction that the winner of the Brazil/Netherlands game, who we now know to be Netherlands, will win the tournament; Netherlands are a great side, and the numbers back me up.

Something I don’t quite understand is how you can infer the winner based on a random number (I noticed Remco does something along the same lines too). Are you basically saying the winner of a match is, in spite of the Elo ratings, the toss of a coin? If that’s not the case, are you using actual World Cup results to aid in your computations?

Great problem statement by the way (decidedly apropos). Keep them coming!