Chessmetrics... that brings back memories of my first attempt at a serious DB. I used it for some time, but eventually left it behind. When you see players like Gustav Neumann ranked ahead of his contemporary Paul Morphy, you know something's seriously wrong.

The absolute value of ratings has undergone "inflation" over time... just like the "big bang"... and because of that it is difficult to compare old ratings to current ratings. No matter how good you are at reconstruction and estimation, it is a losing effort due to the continuing inflation. All ratings are really good for is comparing contemporary players... again, the absolute values are meaningless; only the comparison of contemporaries makes any sense, because Elo ratings represent an expected result (win/lose/draw) based on the difference between two players' ratings. If two contemporaries are rated 500 points apart, the better side has an expected score of about 0.95, which is close to a sure win. But if one rating is from 1950 and another from 1990, the comparison is meaningless due to the intervening inflation.
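
For anyone who wants to check the numbers, here is a minimal sketch of the usual Elo expected-score formula. The function name is my own, and the 400-point logistic scale is the textbook form; federations layer their own rules on top of it.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Expected score of the higher-rated player for a few rating gaps.
for gap in (0, 100, 200, 400, 500):
    print(gap, round(expected_score(1500 + gap, 1500), 3))
```

Only the difference between the two ratings enters the formula, which is why the absolute numbers carry no meaning on their own.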

This is true, but there is a legitimate question of whether the rating inflation is actually due to players being stronger now. If you could resurrect Steinitz in his prime, would he be able to score well against the current top 10 players in the world?

The United States Chess Federation has one hand fighting the other on this issue. They try to fight rating inflation by manipulating K values and bonus points, but also put floors on people's ratings to prevent sandbagging for money tournaments. These floors guarantee rating inflation in the case of people who actually get worse at chess because of inactivity, advanced age, alcoholism, or whatever else could cause it.

If I play against someone with a floor of 1800 whose actual playing strength is 1600, then my rating gets inflated: I gain more points or lose fewer points than are actually justified.
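
A rough sketch of that effect, using a plain Elo update. The fixed K of 32 and my own rating of 1700 are assumptions picked for illustration; the USCF's actual formula (variable K, bonus points) is more involved.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_change(own, opponent, score, k=32):
    """Rating change for one game; k=32 is an assumed K value."""
    return k * (score - expected_score(own, opponent))


me = 1700                          # hypothetical rating
floor, true_strength = 1800, 1600  # numbers from the post above

for score, label in ((1.0, "win"), (0.0, "loss")):
    listed = rating_change(me, floor, score)
    justified = rating_change(me, true_strength, score)
    print(f"{label}: {listed:+.1f} against the floor, {justified:+.1f} if he were rated 1600")
```

Whatever the result, the change computed against the floored rating comes out higher than the one computed against the true strength, and that gap is the inflation.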

Yep, the USCF is between a rock and a hard place. Sandbagging or rating floors? Which is better? Maybe they need a dual system, one for prize consideration and one for actually scoring a game.

The problem with pitting older generations against the newer crop of players is largely fashion. Many quite sound openings go out of fashion. Because of this, it is hard to know who would have the advantage. The style of a Morphy vs the current fashion of a Carlsen or Nakamura? I am sure that it would be a wonderful sight to see... but who would win? Both generations have their own strengths. And I am sure Morphy would like to use the modern tools for match prep... so he would play differently now too.

Also, I remember, years ago, when I played at the local Chess Club... the local favorite used to play rated games (not in any event) against club members so that he could pick up a point or 2. The games were legit, but the local favorite was a master level player and the average club player was around 400 points below. It does not really seem fair to inflate one's rating that way either. In fact, I think there was a player from New Orleans who had his rating "fixed" because of such things.

>The problem with pitting older generations against the newer crop of players is largely fashion. Many quite sound openings go out of fashion. Because of this, it is hard to know who would have the advantage. The style of a Morphy vs the current fashion of a Carlsen or Nakamura? I am sure that it would be a wonderful sight to see... but who would win? Both generations have their own strengths. And I am sure Morphy would like to use the modern tools for match prep... so he would play differently now too.

If older players used their "old fashioned" openings, they would surely get bad results; if they updated their repertoire and prep tools, they'd be different players than the ones who played in their own time. We're talking Elo ratings, not persons, so for a fair comparison you have to let them play in exactly the same manner they would have back in the day.

Obviously, if you denied the old players equal prep they would be at a huge disadvantage. In fact, I do not believe any player would accept such match terms. All players, old or new, would insist on equal conditions. To think otherwise is clearly just a thought experiment... not realistic.

I guess you are saying, in effect, the Elo ratings from before their creation are reconstructed from the quality of the games?? If so, then I would agree with you. But then I would have to disagree with how the ratings were reconstructed (or estimated). Elo ratings have never reflected the quality of play. Rather they are based on "expected" future performance, based on past performance.

In isolation there is no such thing as 2800 level play... nor 2400 play. Such a thing does not exist in a vacuum, but only in relation to one's contemporaries.

No, quality and strength are different measures, I'm talking about "reconstructing" the Elo values. You will allow me to keep a lid on the exact procedure, but this is something I did already 7-8 years ago, so that older games could have a home in my DB.

I am not sure I understand the difference between "quality" and "strength".

However, my background is mathematics... mostly statistics. And Elo is based on probability theory. And I am not sure how that equates to either "quality" or "strength". And not knowing how pre-1970 Elo has been estimated, I am left guessing.

So this discussion is rather pointless. As to this thread, we have come far afield. Personally, I think that assigning low Elo to players pre-1970 is a mistake, and not truly representative of their actual performance... but that is just my opinion, and you know what they say about opinions, everyone has one.

Yes, I agree... strong players have high Elo ratings. But it all depends on the opposition. Players can be quite strong at one venue and weak at another. For example, a club player can be quite strong when he plays at the local YMCA. And then he plays like a "baby" when he gets to the National Open (where all players are stronger than his previous opposition).

So generally I consider raw Elo numbers to be misleading - at the very least. Who a player has been playing tells the rest of the story. For example, when playing against opponents with very high Elo ratings, a player is likely to see a rating increase even without playing any better. This has recently happened to me. I have recently started playing against higher (than me) rated opponents and doing about as well as I did before... so my rating has gone up about 60 Elo points over that time period, plus I was able to earn 2 CCM norms. All due to my opposition being better, not my play, which has remained about the same as before (at 72 things no longer improve very fast).

The accuracy of the Elo system (as implemented by FIDE) leaves much to be desired, both in terms of the formula used (see this table) and the conditions as they've come to be over the years (entry rating of 1,000 points?!). That said, it's all we have to sort games in a DB, having "quality" in mind as the sorting criteria.

There is a way to quantify quality in a manner that goes beyond the advantage a modern player has in opening preparation. This same method could be used to more accurately approximate ratings of players from the past.

There was an article in Chess Life a while back that described a method of detecting people who cheat with computers in OTB tournaments for big cash prizes. A strong computer goes over the game and ranks possible moves according to the usual centipawn evaluations, then compares the player's moves to this list.

A GM will find the best move most of the time, but tends to make second or third best moves as the time control approaches. Weaker players will routinely play substandard moves. It is possible to create a graph of the player's moves compared to the computer's move ranking, and these graphs are characteristic of the player's strength. A blunder would show up as a downward spike in the graph, and so these graphs could also be used to determine the overall quality of play in a game.
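
To make the idea concrete, here is a small sketch of how such a profile could be tabulated once an engine has already ranked the moves. The data are made up, the 200-centipawn blunder threshold is an arbitrary assumption, and the function names are mine; this is not the procedure from the Chess Life article.

```python
from collections import Counter


def rank_profile(ranks):
    """Share of played moves at each engine rank (1 = engine's first choice)."""
    counts = Counter(ranks)
    total = len(ranks)
    return {rank: counts[rank] / total for rank in sorted(counts)}


def blunders(cp_losses, threshold=200):
    """1-based move numbers whose centipawn loss reaches the threshold."""
    return [i for i, loss in enumerate(cp_losses, start=1) if loss >= threshold]


# Made-up data for one game: engine rank and centipawn loss of each move played.
ranks = [1, 1, 2, 1, 1, 3, 1, 2, 1, 6]
cp_losses = [0, 0, 15, 0, 0, 40, 0, 20, 0, 350]

print(rank_profile(ranks))   # how often the player matched rank 1, 2, 3, ...
print(blunders(cp_losses))   # the "downward spikes"
```

Plotting the centipawn losses move by move would give the graph described above, with the last move standing out as the spike.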

Elo ratings are just an approximate way of gauging a chess game's quality, but they work most of the time, because strength and quality usually go together; a strong player will play better moves than a weak one.

If you want to know about a game's quality, the best thing would obviously be to analyse the moves, but that's not very practical when dealing with hundreds of thousands of pre-70's master games. Reconstructing Elo values is way faster (although not a trivial matter).

As for "going beyond the advantage a modern player has in opening preparation", how would you analyse opening moves? Automated backwards game analysis lets you stop at a given move, to avoid wasting time considering opening theory, but in the case of outdated openings, there might very well be a case for actually passing them through the grinder.

This may actually work in practice to find OTB cheats... but as a CC player who relies heavily on the computer for all sorts of assistance, I can tell you that this would not work for any serious analysis such as CC players do. We use multiple engines, multiple databases and, in serious games, multiple days to analyze a given position. My computer alone is currently working on 3 games, 2 of which it has been working on for nearly a week. And the favored moves jump all over the place. In fact, I personally use that jumpiness to tell when enough analysis has been done. Of course, move selection in some positions is a no-brainer, but for edgy positions in the middle game (sometimes in the opening too) it is really hard to get "finished" analysis.

The 2 games I am working so hard on are from the WCCC semi-final that I am currently participating in. The games are critical... I hope to at least qualify for another semi-final... it is, of course, very hard to qualify for the Candidates, but all semi-finalists have hopes <g>.

So back to the thread: as a practical matter, many "strong" or "quality" moves simply are no such thing. So I go back to my point, Elo is based on probability theory, not on undefinable things like "strong" or "quality". Those things, strength and quality, are ephemeral; they are extremely hard to pin down. Take Capa as an example: he was a GREAT chess mind, and I think he could hold his own against any modern player... perhaps even without modern prep.

Capablanca could play 50+ simultaneous blindfold games, so yeah, great chess mind seems like a pretty fair assessment.

I still think the methodology of comparing moves in a game to moves ranked by a 3400+ strength computer is a sensible way to assign retroactive Elo ratings, but of course it would be time consuming with hundreds of thousands of games. Maybe it would be reasonable to use that method on a subset of the players, then generate Elo ratings for the rest of the players based on their results against the subset.

I understand that it is based on probability theory, which actually makes things easier. You could read and understand Mark Glickman's extensive analysis on the subject, but it really isn't necessary. There is canned software that will generate performance ratings if a sufficient number of people in a tournament are already rated. Ozzymandias won't tell us what's under the hood of his retro rating machine, but that's how I would do it.
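
For the curious, a performance rating can be sketched without any canned software: find the rating whose total expected score against the rated opponents equals the score actually made. This is a generic bisection on the standard Elo curve, not the table-based method a federation would use, and the opponent ratings are invented.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def performance_rating(opponent_ratings, total_score, lo=0.0, hi=4000.0):
    """Rating whose summed expected score equals total_score (bisection).

    Undefined for a 0% or 100% score, so those cases raise.
    The search bounds lo and hi are arbitrary assumptions.
    """
    n = len(opponent_ratings)
    if not 0 < total_score < n:
        raise ValueError("score must be strictly between 0 and the number of games")
    for _ in range(60):
        mid = (lo + hi) / 2
        expected = sum(expected_score(mid, r) for r in opponent_ratings)
        if expected < total_score:
            lo = mid   # candidate rating too low to explain the score
        else:
            hi = mid
    return (lo + hi) / 2


# Invented example: 2.5/4 against four rated opponents.
print(round(performance_rating([2400, 2450, 2500, 2550], 2.5)))
```

The catch for old games is the one already raised in this thread: the opponents' ratings have to come from somewhere first.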

A simpler way would be to propagate backwards through time the results of players with official ratings against contemporary unrated opponents. Player A was rated 2500 in 1979 and had a 3:1 win ratio against unrated player B. Therefore we assign player B a 2300 rating, and then consider Player B's performance against player C.
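
The arithmetic behind that step can be sketched by inverting the Elo expected-score formula. I am treating the 3:1 ratio as a 75% score and ignoring draws, which is an assumption; the function names are mine.

```python
import math


def rating_gap(score_fraction):
    """Elo difference implied by a long-run score fraction (0 < p < 1)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))


def back_propagate(anchor_rating, anchor_score_fraction):
    """Estimate an unrated opponent's rating from the rated player's score."""
    return anchor_rating - rating_gap(anchor_score_fraction)


# Player A (2500) scores 3:1, i.e. 75%, against unrated player B.
player_b = back_propagate(2500, 0.75)
print(round(player_b))  # about 2309, close to the 2300 quoted above
```

Each hop adds its own estimation error, so the further back the chain runs from the officially rated anchor, the shakier the numbers become.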

Well, that would be pretty time consuming also...

I got around this problem by dumping every game played before 1978 into a database named "Classic Games".

I confess I don't know how the initial Elo system began. I have read a lot about how it works, but not the start-up.

However, I assume it has got to be something that would equate to the win/lose/draw percentages of the particular player and the relative Elo of the opponents. Maybe go back in the database to 1900 (or whenever), assign reasonable ratings to all master level players, and just rate the games in the database from that point. Probably easier said than done. But it would give you a result based on performance, i.e. a probability based Elo.
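
Something along these lines, as a bare-bones sketch: seed a few players with guessed ratings and replay the games in date order. The seed values, the K of 20 and the 2200 default for unseeded players are all assumptions made up for the example, and the player names are placeholders.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rate_games(seed_ratings, games, k=20, default=2200):
    """Replay games in date order, updating both players after each game.

    games: iterable of (white, black, white_score), white_score in {1, 0.5, 0}.
    """
    ratings = dict(seed_ratings)          # leave the caller's seeds untouched
    for white, black, white_score in games:
        rw = ratings.setdefault(white, default)
        rb = ratings.setdefault(black, default)
        delta = k * (white_score - expected_score(rw, rb))
        ratings[white] = rw + delta
        ratings[black] = rb - delta
    return ratings


seeds = {"Player A": 2500, "Player B": 2400}   # hypothetical starting guesses
games = [
    ("Player A", "Player B", 0.5),
    ("Player B", "Player C", 1.0),
    ("Player C", "Player A", 0.0),
]
print(rate_games(seeds, games))
```

The results depend heavily on the seed ratings, so this only moves the guessing to the starting point.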

I wonder if anyone knows how Elo got started? They surely must have assigned reasonable values to players based on some sort of ranking system. Digging out the historical data required to do that might be tough.

The technique you describe to catch OTB cheating certainly works to catch _some_ cheaters. I think, if you know that they are trying to use this method to catch cheaters, some cheaters will not be caught. The ones it does catch are probably cheating. But I have been working with computer engines long enough to know that they are not all producing the same moves. Zappa Mexico II produces quite different moves than Stockfish does. And Stockfish will produce different moves depending on how much time you give it. So I think a clever cheater could beat the system.

If the tournament director suspects you of cheating, they could ask you to play a few 5 minute games against a master, where your true strength would be revealed. They don't actually need to prove anything. The big money tournaments are run by the CCA, and they can assign ratings for use in their tournaments. They sent me the list recently and I noticed a player from Mexico who had a 1300 USCF rating and 2100 CCA rating. That list is in addition to the USCF floors.

This thread wasn't really about cheating though, and if you asked those different programs to rank moves from 1 to 10 they would likely contain most of the same moves in the top 2 or 3. The probabilistic nature of the method is the percentage chance that a particular human has to make computer strength moves, and I bet someone who scores 55% compared to Stockfish would have a very similar result compared to Komodo. You could narrow the difference further by comparing to the top 2 or 3 moves instead of only the best.
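
Here is a toy sketch of that comparison, assuming each engine has already produced a ranked candidate list for every position. The moves and rankings are invented purely to show the bookkeeping (they prove nothing about how Stockfish and Komodo really compare), and the function name is mine.

```python
def match_rate(played_moves, engine_rankings, top_n=1):
    """Fraction of played moves found among the engine's top_n choices."""
    hits = sum(
        move in ranking[:top_n]
        for move, ranking in zip(played_moves, engine_rankings)
    )
    return hits / len(played_moves)


# Invented example: four positions, two engines' ranked candidate lists.
played = ["Nf3", "d4", "Bg5", "h3"]
engine_x = [["Nf3", "c4"], ["c4", "d4"], ["Bg5", "Bf4"], ["Re1", "a4"]]
engine_y = [["Nf3", "d4"], ["d4", "c4"], ["Bf4", "Bg5"], ["a4", "Re1"]]

for n in (1, 2):
    print(n, match_rate(played, engine_x, n), match_rate(played, engine_y, n))
```

In this made-up data the two engines disagree on individual first choices yet give the same match rate at both settings, which is the kind of agreement I would expect to see with real engines over enough games.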