A researcher from Toulouse has proposed a novel approach that would rank the greatest chess masters more fairly than the Elo system.

In a ring, on a pitch or around a board, a match determines who is the best at a given time. But how can these individual results be incorporated into an overall ranking? Sports federations use a host of different methods.

In the 1970s, the highly rational and mathematical chess community largely adopted the rating system developed by the Hungarian-born Arpad Elo (1903-1992). Points are awarded according to match outcomes and accumulate without limit: the higher the players' level, the more points they hold. For example, the current world champion, Magnus Carlsen, reached a rating of 2,882 points in May 2014, the highest ever achieved. But does that necessarily make him the best player in history?

The limitations of the Elo rating

This system is now being challenged and new ways of ranking players are being put forward. In a paper published in the International Computer Games Association's journal, Jean-Marc Alliot, a senior researcher at the Toulouse Institute of Computer Science Research (IRIT), attempts to find a technique for rating players according to their talent rather than their competition results alone. While the latter might be thought to reflect each player's ability, there is more to a game than victory or defeat.

"The Elo system is based on the idea that the number of points changes according to the outcome of a match and to the opponent's rating," Alliot explains. "Although it works very well, this method does not take into account the quality of the moves. It is perfectly possible to win while playing poorly, as long as your challenger does even worse."

Using pre-established tables, the difference between the Elo ratings of two competitors can be used to estimate each one's chances of winning. For instance, a player whose rating is 100 points higher than their opponent's has a 64% chance of winning, and a 95% chance if their rating is 500 points higher. Points are gained when a player does better than expected, and lost when they don't do as well.
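These tables follow the standard Elo logistic formula. The sketch below reproduces the figures quoted above; the K-factor of 20 is an illustrative choice, as federations use various values:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B under the
    standard Elo logistic model (1 = certain win, 0 = certain loss)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def update(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """New rating after a game: points are gained when a player does
    better than expected, lost otherwise (actual: 1 win, 0.5 draw, 0 loss)."""
    return rating + k * (actual - expected)

# A 100-point edge gives roughly a 64% expected score,
# a 500-point edge roughly 95%, matching the figures above.
print(round(expected_score(2100, 2000), 2))  # 0.64
print(round(expected_score(2500, 2000), 2))  # 0.95
```

A win against an equally rated opponent (expected score 0.5) would thus gain k/2 = 10 points under this choice of K.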

Earlier rating systems that co-existed before Elo's was adopted were based on the same principle, including the Ingo system, in which the best players had the lowest scores. At no point was the quality of play assessed. Ratings were also subject to inflation and drift, making it impossible to use them to compare competitors from different eras.

The problem is even more blatant in other sports. For instance, in the well-known ATP rankings in tennis, points are awarded according to tournament results over the previous 52 weeks. It doesn't matter whether a player smashes the world's number one or a series of withdrawals eliminates their competitors: winning a Grand Slam tournament will earn them 2,000 points either way.

The Norwegian Magnus Carlsen playing against the Cuban Leinier Dominguez during a tournament.

Taking moves rather than matches into account

Alliot's system therefore ranks players according to the quality of their moves. To achieve this, he relies on a rather special chess master called Stockfish. The world's best chess engine, free and open-source, easily beats the most talented human players. Its choices can thus be considered "almost" perfect, which means that a competitor's decisions can be scored according to how closely they match those made by Stockfish.

Although statistical comparisons between human and machine players have been attempted before, they lacked the computing power available today. In this case, all 26,000 games played by every world champion since the reign of Wilhelm Steinitz (1836-1900), the founder of modern chess, were retrieved. Installed on the OSIRIM supercomputer in Toulouse, Stockfish took 62,000 hours of computing time to evaluate more than two million positions.

"There is however one pitfall," Alliot points out. "Who is the best player, the one who is generally better but occasionally makes serious mistakes, or the one who never makes quite the best move but avoids major errors? In addition, when victory is guaranteed, many players prefer to go for the simplest move rather than the best."

DroidFish is an Android open-source chess programme based on Stockfish.

The programme assesses each position in points, depending on how many more or fewer pieces a player has compared with their opponent. This evaluation remains constant when the best move is played, making it possible to measure how far champions diverge from it.
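A common way to turn such engine evaluations into a per-move quality measure is the average evaluation loss relative to the engine's best move. The sketch below illustrates that general idea only; the function names and the pawn-unit scale are assumptions, not Alliot's exact metric:

```python
def move_loss(best_eval: float, played_eval: float) -> float:
    """Evaluation drop (in pawns, from the mover's point of view)
    caused by playing a move other than the engine's best.
    Playing the best move gives a loss of 0, so the position's
    evaluation stays constant, as described above."""
    return max(0.0, best_eval - played_eval)

def average_loss(evals: list[tuple[float, float]]) -> float:
    """Mean divergence from the engine over a list of
    (best_eval, played_eval) pairs for one player's moves."""
    return sum(move_loss(b, p) for b, p in evals) / len(evals)

# Hypothetical game: the player matches the engine twice,
# then gives up 0.3 pawns on the third move.
print(round(average_loss([(0.5, 0.5), (0.2, 0.2), (0.4, 0.1)]), 2))  # 0.1
```

A lower average loss means play closer to Stockfish's "almost" perfect choices, though, as Alliot notes below, a low average can hide rare but serious mistakes.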

A more predictive ranking

These statistics establish the probability of a player making a mistake in a given position, making it possible to simulate matches between all the members of a sample. Since many of the competitors selected for the study actually played one another, these estimates could be compared with the official results. The method not only works very well, it also improves on predictions based on the Elo ranking.

Alliot's work compares pairs of players, but does not yet provide a global ranking, since it is difficult to order the experts coherently. Things are more straightforward at the very top, however: predicted to beat all nineteen other champions, Magnus Carlsen can legitimately be considered the best chess player of all time. This result bolsters the idea that the overall level has improved, even if Bobby Fischer (1943-2008) comes third, ahead of contemporary players such as Viswanathan Anand.

"The Elo system could be refined, but it has been stable over a long period, as well as being practical and easy to use," Alliot admits. "It would take significant advantages to convince the federations to switch to a new system and modify their rankings. A method like mine still requires numerous checks, and there is certainly room for improvement. However, increased computing power should make it possible to generalize it to all players and compete with the Elo ranking."

Author

Martin Koppe

A graduate of the School of Journalism in Lille, Martin Koppe has worked for a number of publications including Dossiers d’archéologie, Science et Vie Junior and La Recherche, as well as the website Maxisciences.com. He also holds degrees in art history, archaeometry, and epistemology.