Are Today’s Top Players Better Than 20 Years Ago? Not Necessarily.

Sixty years ago, if two players wanted to prove who was better, the only real way (other than comparing past accomplishments, which were just that, in the past) was to play a game or a series of games.

That all changed with the advent of the rating system, first developed by Kenneth Harkness in 1950 and then significantly improved by Arpad Elo, a professor of physics at Marquette University in Milwaukee. The Elo rating system was a mathematically rigorous method to determine each player’s relative ability. It was adopted by the United States Chess Federation in 1960 and by the World Chess Federation in 1970.
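Elo's method reduces to two formulas: an expected score computed from the rating difference on a logistic 400-point scale, and an update that nudges a rating toward the actual result. A minimal sketch in Python (the K-factor of 32 is just one common choice, not a value taken from this article):

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A
    against player B, on the standard logistic 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def updated_rating(rating, expected, actual, k=32):
    """New rating after one game; K controls how fast ratings move."""
    return rating + k * (actual - expected)

# Equal ratings give an expected score of 0.5; a win then gains K/2 points.
e = expected_score(1500, 1500)        # 0.5
new = updated_rating(1500, e, 1.0)    # 1516.0 with K = 32
```

Because only the rating *difference* enters the formula, the system measures players relative to one another, which is exactly why comparisons across eras are so fraught.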

Over the years, the rating system has become well established (although there are competitors, such as Chessmetrics), and has even been adopted by other games and sports. While players now routinely ask each other their ratings, and obsess about whether their ratings are going up or going down, the system’s widespread acceptance has not provided clear-cut answers for a favorite topic of discussion: Who are the greatest players of all time? And how do players from different eras compare?

Another problem is so-called rating inflation — the apparent tendency of the average rating of all players to rise over time because of some inherent flaw in the system. A good illustration can be seen in the average rating of the top 100 players in the World Chess Federation rankings.

A consequence of this inflation is that Sergei Movsesian of Slovakia, currently No. 10, and rated 2751, has a higher rating than the peak rating (2690 in January 1971) of Boris Spassky, the 10th world champion. While Movsesian is a talented player, it would be surprising if he turned up on anyone’s list of the greatest players of all time, particularly ahead of Spassky.

So, is rating inflation real? If it is, can an adjustment be made so that players of different eras can be accurately measured and compared?

Probably not, according to Mark Glickman, who since 1992 has been the chairman of the United States Chess Federation’s ratings committee. “You get into dangerous territory when estimating ratings of the past,” he said. “It’s possible, but it is very, very hard.”

Glickman, who is an associate professor of health policy and management at Boston University, explained that ratings are meant to be a relative measure. “It is possible to say how much better Capablanca was than his contemporaries,” said Glickman. “If it were possible to put Capablanca in a time machine and transport him to this time, it would be virtually impossible to predict how strong he would be.”

Glickman pointed out that contemporary players have access to computers and game databases, so they know more than their predecessors did and are stronger for it. But a common argument in discussions of the greatest ever presupposes that, for a fair comparison, a player from another era would have to have the same resources as players of today. So, using Glickman’s example, if Capablanca were transported to modern times, rather than just dropping him into a tournament, he might first be given months, or even years, to catch up on what has happened since he died in 1942. Then again, the whole idea of computers might so amaze him that there is no telling how he would react.

Glickman said in the United States, instead of rating inflation, “there is more of a tendency for deflation.” He added, “A rating now connotes a better rating than 10 years ago.” The reason is that young players (who make up more than half of the chess federation’s membership) drag average ratings down. “You have these young players who are improving faster than their rating is moving,” said Glickman.

But what about the World Chess Federation’s ratings, which seem to be rising? Glickman said that the world federation used to have a rule that players needed to have performances of at least 2200 to be rated. In recent years, however, the federation has lowered the performance needed to earn a rating. That has allowed more players, particularly weaker players, to be rated. That may be contributing rating points to higher-rated players who feast on the lower-rated ones. “It permeates through the system,” said Glickman. He added, “The more people there are in the system, the more the ratings stretch out.”

As for who was the best player of all time, Glickman said the analysis of Garry Kasparov, the former world champion, was an acceptable one. Kasparov wrote in “Garry Kasparov on My Great Predecessors” (Everyman Chess) that the rating difference between the No. 1 player and the No. 2 player was a good determinant of who was the most dominant player ever. By that measure, since ratings began, Bobby Fischer (peak of 2785) was No. 1, and Kasparov (peak of 2851) was No. 2.

However, in a 2002 academic paper, “Parameter Estimation of Large Dynamic Paired Comparison Experiments,” published in Applied Statistics (Journal of the Royal Statistical Society, Series C), Glickman used statistical analysis to rank the best players of all time. His analysis found that Lasker was No. 1, with a peak year of 1916; Capablanca No. 2 (peak in 1921); Fischer No. 3 (peak in 1972); Alekhine No. 4 (peak in 1930); and Kasparov No. 5 (peak in 1991).

Thanks for this post– it asks (and tries to answer) some questions I’ve been thinking about for a while. I’m particularly curious about how we should regard the pre-WW II players in terms of greatness. We can’t quite apply Kasparov’s method since it was an era before ratings. And I also wonder if his idea always works. Is a player LESS great if he were slightly better than two others– all of whom are far and away better than their peers, when compared to a lone player who is much better than all the rest?

I had to smile when reading this, having recently read many articles about the college football BCS system. In spite of the fact that, based on their computer ratings, Florida and Oklahoma played in the “championship” game, there were a few (!) who saw it differently. Namely, Texas, USC and especially Utah with a perfect 13-0 record. Plus their legions of fans. The annual cry for a playoff system started again, and this is for current teams. Comparing teams from history, say Oklahoma in the 1950’s winning 47 straight, seems impossible.

I was surprised to learn that Glickman had done a statistical analysis and concluded that Lasker was the #1 chess player of alllllllll time (as Ali would have said). I reached the same conclusion last year from studying the statistical results on Chessmetrics and from reviewing the tournament crosstables on Lasker’s performances. He demolished the opposition from 1893 to 1924. It is nice to have some mathematical support for my non-scientific assessment. I am more surprised to see that Kasparov is only #5 on the list. I would have put him second. He also dominated the opposition for almost 20 years, which is a remarkable achievement.

There is no question that today one must improve steadily to maintain the same USCF rating. The proliferation of books, programs and databases has made the game more accessible, but at a price. As skills become more accessible they lose their scarcity value. Thus players get less credit from the rating system for the skills they do have. Three other factors also deflate ratings, although many claim that there is actually rating inflation (which may be the case for FIDE but not the USCF).

Most tournaments are club tournaments, so many players play the same people over and over in a small, relatively closed rating pool. If club members improve together but rarely play outsiders, their ratings may not reflect that improvement. It is quite common to gain points in a large open tournament and then give the points back at the chess club. The reverse, except at a few events like the World Open with its huge class prizes, is much less common.

The overwhelming majority of active USCF members are now children. This creates a “black hole” of rating points for established players. The Elo system originally envisioned a mean rating of 1500, but now two out of three members are rated under 1000. As they rapidly improve, kids suck rating points out of the pool of players with relatively stable playing skills and established ratings. When they used to start with ratings over 1000, this effect was much less significant.

The third and most significant deflator of ratings is the unspoken and officially denied USCF policy of keeping ratings low. Under the old system (say in the 1980s), someone might lose six out of six in their first tournament and get a rating of say 1350. That rating would drop to a rating floor of 1100 because the player was actually much weaker than 1350. In an era where most players are children, the USCF doesn’t want children’s ratings to drop like stones for fear they will get discouraged and drop out. So instead, children’s tournaments assign arbitrary low ratings like 150 or 270 to new players. Directors often deny doing this. But try playing blitz chess with some players who dominate children’s events but are rated well under 1000. You may be in for a surprise, because some of them play much better than a rating below 1000 suggests.

In the 1980s it was rare to even see a rating so low but today ratings under 1000 make up the majority of ratings. The idea is that if a child starts at say 200 he can “only go up” and will think he is doing “well” when he goes up to say 600. But it may take hundreds of games for such young players to get a rating reflecting the fact that they actually have considerable skill. Meanwhile these kids take tons of points from more established players.

Many active players are actually in favor of low ratings if they admit that ratings are lower at all. Club players imagine that they can more easily win class prizes because their ratings are low. Many masters also favor low ratings believing that deflation guarantees that a USCF master will be regarded as a “real” master internationally.

But the USCF does not talk to the adult players who have gotten disgusted by rating deflation. Many players are at least as good as they were 25 years ago, but their ratings are 200-300 points lower. It is easy to say that such players just aren’t aware of the erosion of their skills due to age. To test my theory, have one of these “old timers” show you a random sample of Expert or Class A players’ games from 25 years ago, then see if you think those players played anything like what their ratings would suggest today.

Rating deflation is gradual, and most players haven’t played long enough to notice the dramatic change since the 1980s. I would respectfully submit that adult rating deflation has cost the USCF tens of thousands of members over the years. Of course the USCF, by not asking people why they dropped out, would have no idea if this is so. The pity is that mathematics is flexible enough to permit the USCF to keep children’s initial ratings low while keeping adult ratings reasonably high.

Some say this would harm the “integrity” of the rating system. However, the same player, playing in two different rating pools might have dramatically different rating performances. As such there is really no such thing as “rating integrity” in that sense. In the 1980s I was almost always rated over 1900 in the US. When I played in seven Quebec Opens, however, I had difficulty maintaining a rating in Quebec over 1700. I was on vacation and more rested than I was at USCF tournaments so obviously the ratings were lower there. Nevertheless some of my chess friends insisted that there was no 200 point difference and I just psyched myself out in Quebec.

It is no answer to simply say that: (1) ratings measure relative strength between players and not absolute strength and (2) stop worrying about ratings. Rating policy is a bit like tax policy, with lower ratings being like higher taxes. Many people want other people to have higher taxes and lower ratings but seek the opposite for themselves. Just as the economic ramifications of tax policy are debated, a debate on rating policy should also be encouraged.

Was it really necessary to turn 1700 into the new 1900 in the last 20-25 years and has the USCF lost adult members as a result? The USCF says the answer is no. But on this subject the USCF policy board may be like members of an imaginary Democratic club who only discuss politics with other Democrats, not Republicans. Is the USCF interested in knowing how many people have decided to confine their chess to unrated or Internet play? Should ratings really “not matter” to players who have spent hundreds of hours studying and fighting for them?

For the past eight to ten years, the USCF has had a “K-factor” that is a function of rating. What that means is that lower-rated players see a higher variance in their ratings compared to higher-rated players. These kids with ratings of 200 and 300 need only win a half dozen games or so (more than they lose) to see their ratings increase by hundreds of points. Please see Mark Glickman’s home page or the USCF site for more details on the math behind this. In fact, Glickman even says that USCF ratings are “going back up” again relative to the late 90s and early 00s.
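The effect of a rating-dependent K-factor can be illustrated with a toy version. The thresholds and K values below are illustrative assumptions only, not the actual USCF formula, which is considerably more involved:

```python
def expected(rating_a, rating_b):
    """Standard Elo expected score on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def k_factor(rating):
    """Toy rating-dependent K: hypothetical breakpoints for illustration."""
    if rating < 1200:
        return 40   # low-rated (often young, fast-improving) players move quickly
    elif rating < 2100:
        return 32
    return 16       # established ratings near the top change slowly

# A 300-rated child upsetting a 900-rated opponent gains far more
# than a 2300 player beating a 2400 player, because K is larger.
gain_child = k_factor(300) * (1.0 - expected(300, 900))
gain_master = k_factor(2300) * (1.0 - expected(2300, 2400))
```

With these assumed values the child's single upset win is worth roughly four times the master's, which is the mechanism behind ratings "increasing by hundreds of points" after a handful of wins.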

Besides, rating deflation was never so severe as 200 points — 50, maybe, but not 200. Age, on the other hand, may affect a player by 200 points or more (go see Elo’s book, “The Rating of Chess Players, Past and Present”). Elo tracked a number of masters. Or go to Jeff Sonas’ Web site, where he tracks the performance of masters over their lifespan. A drop of 200 points (off peak) is not uncommon after one reaches 60.

I would like to second and expand Neil Wilson’s comment: why can’t rating be done by a computer? The computer program offers these advantages:
1. Consistent strength – no “off days”
2. Fairness – everyone faces the “same” opponent
3. No possibility of collusion
Very important exams are already administered at certified test centers, so it is not hard to imagine the USCF running a “rating fair” several times each year, where players play a series of games against different programs and, based on the results, are given a rating classification.
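One way such a rating fair could score its results is the common linear performance-rating approximation: the average rating of the opposition plus 400 times the win-loss margin per game. The program ratings below are made-up values for illustration:

```python
def performance_rating(opponent_ratings, total_score):
    """Linear performance-rating approximation:
    average opponent rating + 400 * (wins - losses) / games,
    where total_score counts a win as 1 and a draw as 0.5."""
    n = len(opponent_ratings)
    avg = sum(opponent_ratings) / n
    wins_minus_losses = 2 * total_score - n
    return avg + 400.0 * wins_minus_losses / n

# Scoring 4/6 against programs rated 1400-1900 (hypothetical values):
rp = performance_rating([1400, 1500, 1600, 1700, 1800, 1900], 4.0)
```

An even score against the field simply returns the field's average rating, so the fixed, known strengths of the programs would anchor the result in a way that a volatile human pool cannot.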

One problem with rating v. a computer is that the ‘computer pool’, although stable in rating, would be a small sample set. So if some players being evaluated had particular strengths or weaknesses that manifest only vs. computer play, then these particular players would have inaccurate ratings. The only way to avoid this is to rate vs. a large sample set.

Unrelated: Yes, it is possible that ratings are inflating naturally. If the high end of the curve is improving, then these players will perform better and their rating will go up. There is no rule that states that the mean and standard deviation needs to be constant over time. Certainly it’s possible that Movsesian is a better player than Spassky ever was.

One more thing: I guess what’s important is to look at how much a top player out-rates his peers at a given moment in time. Everyone ooohs and aaahs over Kasparov’s peak rating (what was it, 2850 or something?). But would that mean anything in a world of rating inflation? No. But Kasparov out-rated #2 (Karpov) by many points. Would Kasparov have outperformed Capa or Lasker if he lived in those eras? We’ll never know. And it is not important to know. –Howard

Responding to Howard Goldowsky: Mark Glickman’s lengthy article was prominently displayed in the August 2006 Chess Life, obviously with full USCF approval. See Bill Goichberg’s comments in the October 2006 Chess Life. While I stated that the USCF officially denies any substantial deflation of ratings since the 1980s, I simply cannot accept that denial after hundreds of tournament games. Moreover, it seems doubtful that Mr. Glickman can be credited with objectivity, since he was almost certainly speaking for the USCF in his Chess Life article.

Naturally I would like to believe that my 1901 rating reflects my current strength, as I regained a rating over 1900 in the December 2008 Polgar Chess Club Championship. During the 1980s I played hundreds of games, reaching a peak rating of 1970 in 1987, before going to law school later that year with a rating of 1912. I doubt I have gotten much weaker during the last 30 years (I’m 55). Yet I probably would not be able to maintain a 1900 rating for dozens of games today, much less for hundreds.

Let any expert or master look at a significant random sample of my games from the 1980s, when I was usually over 1900 and reached 1970, to test my deflation theory. According to the ideas set forth by Mr. Glickman and Mr. Goldowsky, my play during the 1980s should clearly reflect Class “A” strength. But I would expect a young master looking at those games now to conclude that my play then would barely be at the 1700 level today. The best way to test whether I am right would be to have a strong player look at enough of my games from the 1980s to give an objective opinion. If a master concluded that my play then was on the order of a 1700 player, that would support my idea. An objective conclusion that I played to roughly my rating at the time (1803-1970) would tend to undercut my assertion.

How far the top player was ahead of the second best player is misleading, as that depends as much on the strength of the second player as on that of the top player.

For example, Fischer was further ahead of Spassky than Kasparov was ahead of Karpov, but most people would agree that Karpov was a much stronger player than Spassky. Also, in the 100-plus games Kasparov and Karpov played against each other, they learned a lot from each other but could not gain Elo points from each other.
