Are the chess World Champions just lucky?

9/4/2013 – Some players – notably Fischer – were expected to win the title and did so convincingly. With others, such as the FIDE knockout world champions, luck was clearly involved. Statisticians like PhD student Matthew Wilson can tell us what to expect from world championship matches, based on the players' previous results and the event format. This is part one (of two) of his statistical study.

Are the World Champions Just Lucky?

By Matthew S. Wilson

“A good player is always lucky” – Capablanca

The year is 1972. Fischer has just crushed Spassky by 12.5–8.5.
The way he qualified for the match is no less impressive: in 1970 he won
the Interzonal with a staggering 18.5/23, 3.5 points ahead of his nearest
rival, and in 1971 he defeated Taimanov and Larsen 6–0 each. In the
final Candidates match, former world champion Tigran Petrosian could only
score 2.5–6.5 against him. There can be no doubt that Fischer is the
strongest player in the world; how could such dominating results occur by
chance?

But not all world champions won so convincingly. Euwe barely beat Alekhine,
winning by only 15.5–14.5, even though Alekhine had drinking problems
and severely underestimated his challenger. Two years later, a reinvigorated
Alekhine scored 15.5–9.5 in a return match. Was Euwe’s 1935
victory just luck? For a long time Euwe was probably the least respected
world champion, until the FIDE knockout tournaments crowned players like
Khalifman, Ponomariov, and Kasimdzhanov. Subsequently the knockout tournaments
were denounced as being little more than lotteries and FIDE overhauled the
world championship cycle.

So how do we distinguish between “lucky” world champions like
Khalifman and “convincing” world champions like Fischer?
How can we be confident that the winner of a match is really the best player
in the world? Fortunately, there are statistical techniques to answer these
questions.

Let’s start by analyzing Fischer’s historic 1972 victory over
Spassky. The process has three steps:

1. Assume that each player is equally likely to win in every game
and then run simulations of the match. Well, actually we need a
bit more information, since a game could also be drawn. What draw rate
should we assume? I analyzed all the world championship matches, and
they seem to fall into three categories. In the “Pre-Capablanca”
era, only 31% of world championship games were drawn. As chess understanding
improved, errors became less frequent and the draw rate rose to 53%
in the games from 1927–1963. Ever since 1966, the draw rate has
been fairly stable at 66%. Fischer’s match belongs to the last
category. Thus, for each game in the simulations, the computer will
assume that there is a 66% chance of a draw, a 17% chance of a Fischer
victory, and a 17% chance of a Spassky victory. Later we will test these
assumptions. After simulating 40,000 matches, we move on to step 2.

2. Take the result that actually occurred and use the simulations
to find out how likely it is. Fischer didn’t show up for
game 2 and lost by forfeit. If we exclude this game, then he won the
match 12.5–7.5. Returning to the simulations, we then measure
the probability of winning by at least 12.5–7.5. This is done
by counting all the simulations that ended in 12.5–7.5, 13–7,
13.5–6.5, and even more lopsided margins. These results occur
in only 8.3% of the simulations – this number is called the p-value.

3. Use the result from step 2 to reevaluate the assumption in step
1. At the beginning we assumed that the players were equally skilled.
But if this were true, then there would be only an 8.3% chance of seeing a
victory as dominating as 12.5–7.5. This is unlikely, so our assumption
that the players are equal is probably wrong. Instead we conclude that
Fischer is better.
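
The three steps can be sketched in code. The following is a minimal Python illustration, not the author's actual program. It assumes a fixed-length match of 20 games (the forfeited game excluded), the 17%/17%/66% probabilities from step 1, and that a lopsided win by either player counts as "at least as dominating". The function names and the random seed are made up for this example.

```python
import random


def simulate_margin(n_games, p_win, p_loss, rng):
    """Play one simulated match; return player A's wins minus losses."""
    margin = 0
    for _ in range(n_games):
        r = rng.random()
        if r < p_win:
            margin += 1          # player A wins the game
        elif r < p_win + p_loss:
            margin -= 1          # player A loses the game
        # otherwise the game is drawn and the margin is unchanged
    return margin


def p_value(n_games, observed_margin, p_win=0.17, p_loss=0.17,
            n_sims=40_000, seed=0):
    """Share of simulated matches decided by at least observed_margin points.

    Assumption: a big win by EITHER player counts, so the absolute
    margin is compared with the observed one.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if abs(simulate_margin(n_games, p_win, p_loss, rng)) >= observed_margin:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # 12.5-7.5 over 20 games is a margin of 5 points (wins minus losses).
    print(f"Estimated p-value: {p_value(20, 5):.3f}")
```

Under these assumptions the estimate should land near the 8.3% quoted in step 2, although the exact figure varies with the random seed and the number of simulations.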

Looking back at Fischer’s candidates matches, there is only a 0.16%
chance of one player winning 6–0 against an equally skillful opponent,
so Fischer’s demolitions of Taimanov and Larsen almost surely represent
superiority rather than luck.

Now what about the other assumptions I made in order to run the simulations?
Is the draw rate really constant at 66% throughout the match? It is easy
to imagine that wins can change the draw rate. Maybe after winning, a player
is energized and more confident, so the probability of winning rises while
draws become less likely. Or perhaps the winner becomes complacent in the
next game while the loser is motivated to take revenge. Decisive games can
surely shift the players psychologically, but do they actually affect the
outcome of the next game? To test this, I examined the results of games
immediately following a win or a loss. In the modern era, the player who
won the previous game has a 15% chance of winning, a 19% chance of losing,
and a 66% chance of drawing the next game. This differs slightly from our
assumption of 17%,
17%, and 66%, respectively. However, a simple statistical test reveals that
the differences are not significant. I ran similar tests for 1927–1963
and pre-1927, and reached the same conclusion: there is not enough evidence
to reject our assumptions.
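
The article does not say which test was used. One common choice for comparing observed frequencies against assumed probabilities is Pearson's chi-square goodness-of-fit test, sketched below in Python. The sample of 100 games is hypothetical (only percentages are reported here), so the printed p-value is purely illustrative. With three outcomes and fully specified probabilities the test has two degrees of freedom, and for that case the tail probability has the closed form exp(-x/2).

```python
import math


def chi_square_statistic(observed_counts, expected_probs):
    """Pearson's chi-square statistic for observed counts vs. assumed probabilities."""
    n = sum(observed_counts)
    return sum((obs - n * p) ** 2 / (n * p)
               for obs, p in zip(observed_counts, expected_probs))


def p_value_two_df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# HYPOTHETICAL sample: 100 games played right after a decisive game,
# split as wins / losses / draws for the previous game's winner.
observed = [15, 19, 66]
assumed = [0.17, 0.17, 0.66]   # the probabilities assumed in step 1

stat = chi_square_statistic(observed, assumed)
print(f"chi-square = {stat:.3f}, p-value = {p_value_two_df(stat):.3f}")
```

A large p-value here means the observed split is compatible with the assumed 17%/17%/66%. With a much larger sample the same percentages could become significant, which is why the real game counts matter.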

So we can safely conclude that Fischer was better than Spassky. Should
we also find Euwe’s win over Alekhine convincing? Once again, we start
by assuming that Euwe and Alekhine are equal. In the match, Euwe prevailed
by 15.5–14.5. How likely is such a victory? It turns out that if two
equally skilled players play a match under the same conditions, a victory
at least as good as Euwe’s will occur 88.3% of the time. Since similar
victories occur so frequently, there is no reason to reverse our initial
assumption that the players are equal. Euwe did not convincingly show superiority
– his triumph could very well have been just luck. In the rematch
in 1937, Alekhine prevailed by 15.5–9.5, and such a large margin of
victory is seen in only 10.5% of the simulations if we assume that the players
are equal. In other words, the p-value is 10.5%. Typically statisticians
reject their initial assumption only if the p-value is less than 10%, but
here the value is very close. In view of Alekhine’s big victory, it
is unlikely that Euwe is his equal.
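
For a fixed number of games the same kind of p-value can also be computed exactly, without simulation, by building up the distribution of the final margin one game at a time. The Python sketch below does this under the assumptions used above (equal players, a 53% draw rate for 1927–1963) and, as an added assumption, counts a win by either player. The function names are illustrative, and the exact figures need not coincide with the simulated ones quoted in the text.

```python
def margin_distribution(n_games, p_win, p_loss):
    """Exact distribution of (wins - losses) after n_games, as {margin: probability}."""
    p_draw = 1.0 - p_win - p_loss
    dist = {0: 1.0}
    for _ in range(n_games):
        nxt = {}
        for margin, prob in dist.items():
            for step, p in ((1, p_win), (-1, p_loss), (0, p_draw)):
                nxt[margin + step] = nxt.get(margin + step, 0.0) + prob * p
        dist = nxt
    return dist


def two_sided_p_value(n_games, observed_margin, draw_rate):
    """Probability that either of two equal players wins by at least observed_margin."""
    p = (1.0 - draw_rate) / 2          # equal players share the decisive games
    dist = margin_distribution(n_games, p, p)
    return sum(prob for margin, prob in dist.items()
               if abs(margin) >= observed_margin)


if __name__ == "__main__":
    # 1935: 15.5-14.5 over 30 games is a margin of 1 point.
    print(f"Euwe 1935:     {two_sided_p_value(30, 1, 0.53):.3f}")
    # 1937: 15.5-9.5 over 25 games is a margin of 6 points.
    print(f"Alekhine 1937: {two_sided_p_value(25, 6, 0.53):.3f}")
```

The exact approach removes the sampling noise of a simulation, but it relies on the match having a fixed length; formats that stop early or add tiebreaks are easier to handle by simulation.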

However, Euwe is hardly alone in scoring match victories that wouldn’t
convince a statistician. Perhaps Capablanca really did deserve a return
match after losing by 15.5–18.5 to Alekhine, since the p-value is
62.6%. We can’t reject the possibility that Alekhine and Capablanca
were actually equal. Even Botvinnik’s 13–8 against Tal in 1961
fails to impress: the p-value, at 16.6%, is higher than 10%. Karpov’s
+6, –2, =10 versus Korchnoi in 1981 is one of the few exceptions,
having a p-value of 9.1%. Except for Anand’s 3.5–0.5 score against
Shirov in 2000 (p-value = 2.9%!), the FIDE knockout tournament champions
have been quite unpersuasive in their claim to the ultimate chess title.
The best of the rest, Ponomariov’s 4.5–2.5 versus Ivanchuk, still
has quite a high p-value: 32.2%. So are the world champions just lucky?

First of all, we should be careful in interpreting our results. Note that
a p-value of 16.6% for Botvinnik–Tal in 1961 does not prove
that they are equally good players. Rather, it only means that we cannot
reject the possibility that they are equal. So we haven’t proven
that Botvinnik’s 13–8 is just luck. In addition, there are some
factors that the statistical analysis can’t take into account. If
one player consistently outplays his opponent, as chess players we can see
that he is better even if the score in the match is close. But the statistics
only look at the results and not at the quality of play. Most world champions
build up the credibility of their match victories by also dominating elite
tournaments – this is also ignored in the statistical analysis. For
instance, none of Kasparov’s match victories over Karpov were statistically
significant. We can reframe the question slightly: starting from 1985, what
is the probability of scoring 3½ out of four matches against Karpov (three
match wins and one drawn match)? The p-value is slightly below 20%. This is
still not statistically significant.
But if we combine this information with Kasparov’s numerous tournament
victories and his long reign as the #1 rated player (there are only two
rating lists from 1984–2005 that do not have him at the top), then
it is easy to persuade yourself that Kasparov truly was the best player
in the world at his time.

– In part two of his article, to be published
soon, Matthew Wilson does 40,000 match simulations to assess the chances
of World Champion Viswanathan Anand and Challenger Magnus Carlsen in the
November match –

About the author

Matthew Wilson is a PhD student studying Economics. He graduated
from the University of Washington in 2010 with a B.S. in Economics
and a B.A. in Math, and then earned an M.S. in Economics from the
University of Oregon. He works as a teaching assistant at the University
of Oregon, where he has taught applied statistics (econometrics) and
economic theory. Currently his research focuses on macroeconomic theory
and rational choice.

In chess, he used to be a promising junior, tying for first place
in the Washington State Elementary Championships (grades 4-6). He
frequently appeared in Washington's top ten and the USCF's top 100
for his age group. But now there is not as much time to study chess,
and his rating peaked at 1952.
