I have a set of questions about the calculation of FIDE ratings. The questions are about the sequence in which games are rated and how different tournaments and leagues decide which grading list to use when rating their events. The post is a bit long, but there is a summary at the end.

The questions arose when I realised that the FIDE-rated games I've been playing this year have been rated out of sequence with the order in which they were played, and that some of the rating changes due to league chess (4NCL) have been delayed by a considerable period, which doesn't make much sense (IMO) when FIDE issue rating lists every 2 months. But maybe this is an unavoidable consequence of the system.

This seems to have only happened recently because I've been playing in FIDE-rated events that take place during the 4NCL season. The games from those tournaments are rated in an earlier list than the one in which the 4NCL games get rated (at the end of the season). But I can't figure out how the system copes with this.

Anyway, games I've played have been rated in four of the last five rating periods:

Five games from the September 2010 Sunningdale Major were rated using the 1910 rating I had then, giving a new rating of 1933 in the November 2010 rating list. That's easy enough to understand. Similarly for the January 2011 list, where a game from the November 2010 Brighton Major got rated, giving rise to the new rating of 1937. Nothing in the March 2011 list. Then 3 games from the April 2011 Surrey Major brought the rating down to 1929 in the May 2011 list.

That's all fine so far, but then the four 4NCL games I played on the weekends of 15/16 January 2011 and 26/27 March 2011 appear in the rating summary for the July 2011 list:

From what I can tell, those games have been rated using my rating from the November 2010 list (1933), but the ratings given on the 4NCL website for the matches in question appear to be the rating from the list in force at the time the games were played (1937, from the January 2011 and March 2011 lists) - two examples linked below:

1) My first question is what is happening here? Are the 4NCL games rated using the grading from the list published at the start of the season? And is the rating on the 4NCL website just for "board order" purposes? (The ratings of my opponents can be different on the FIDE rating page when compared to the rating published on the 4NCL website - e.g. Jameson is 2088 on the 4NCL page and 2054 on the FIDE page).

2) My second question is how a change based on a previous rating (1933) can be applied to the current rating (1929) when the two are different. I suspect the answer is something fundamental to how the FIDE rating calculations work, as I had no idea the system could cope with this sort of retrospective rating. How does it work?

Also in the rating summary for July 2011 is the rating change from the May 2011 Sunningdale Major, which is based on the 1929 rating from the list current at the start of that event (i.e. the May 2011 list). The next set of rating calculations should be from an event from June 2011 (held in Liverpool), so if that gets onto the July list it will appear on that page I've linked to.

3) That leads to my third question. If an event you are playing in uses ratings from a previous list (March 2011) in its published documentation (quite common if an event receives entries over the course of several rating lists and doesn't update its records), is it possible that the rating calculation will use that earlier rating, or should it use the list in force at the time of the event (which would be the May 2011 list in this case)?

4) The fourth (and final) question is about chronological order. When games are submitted at the end of a league season (as for the 4NCL above), it seems the system is blind to the chronological position of these games compared to others from shorter events (the system doesn't actually know when they were played, and for the rating purposes it seems they are counted as being played at the time of the list from which the rating used for the calculations is taken). But can this introduce errors into the system? To take the examples from my FIDE rating history, the games from the Surrey Major (April 2011) were played after the games from the 4NCL (January 2011 and March 2011), but were rated before those games were rated. This seems a little bit confusing.

Summary: 4NCL games rated with rating from November 2010 list (1933), played during the period when the January 2011 and March 2011 lists were in force (rating of 1937), but rated for the July 2011 list (contributing to a provisional change from 1929 to 1924). So my actual rating of 1937 at the time of the 4NCL games doesn't seem to have been used at all in the calculations. Is that an accurate summary of how rating for league seasons works in an era of rating lists published every 2 months?

The critical rule is this: anything rated as a single event is rated using the ratings in force on the start date of that event. So, as the 4NCL season (which is treated as a single event) started in November 2010, the November 2010 ratings are what is used.

(The 4NCL treats the entire season as a single event so you can get a FIDE rating or a title norm out of it; its board order rules, however, are nothing to do with the rating system, and it chooses to use current ratings for those.)

Simply put, games are rated at the end of an event based on the ratings in force at the start of the event. Hence the 4NCL is rated in July based on last November's rating list. Most events, of course, start and end in the same period. Each event is essentially a self-contained unit, indifferent to rating changes from other events, whenever they occur.
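A sketch of the arithmetic may help (this is my own illustration, with hypothetical opponents and results, and FIDE's real calculation uses rating-difference lookup tables rather than the logistic formula below). The key point is that an event yields a rating delta computed entirely from the start-of-event rating, and that delta is simply added to whatever the current published rating happens to be:

```python
# Sketch only: FIDE uses published tables of expected scores, not this
# logistic formula, and the opponents/results below are hypothetical.

def event_delta(start_rating, opponents, scores, k=25):
    """Rating change for one event, computed entirely from the rating
    in force at the start of the event."""
    delta = 0.0
    for opp, score in zip(opponents, scores):
        expected = 1 / (1 + 10 ** ((opp - start_rating) / 400))
        delta += k * (score - expected)
    return delta

# 4NCL season delta, computed from the November 2010 rating (1933),
# regardless of any lists published while the season was running.
fourncl = event_delta(1933, opponents=[2054, 2088], scores=[0.5, 0.0])

# When the July list is compiled, the delta is added to the current
# rating (1929 from the May 2011 list), not recomputed against it.
current = 1929
print(round(current + fourncl))
```

Because only deltas are carried forward, the order in which events reach the published list changes the intermediate values but not the end result.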

Thanks (to both Sean and Jack)! That makes things a lot clearer (and I'm sure I could have asked the questions with less typing!). I do have one more question, which concerns whether someone with a part-rating at the start of an event, who completes his rating during the event, counts as a rated or unrated player in the calculations of the rating changes for those rated players he played?

Christopher Kreuzer wrote:Thanks (to both Sean and Jack)! That makes things a lot clearer (and I'm sure I could have asked the questions with less typing!). I do have one more question, which concerns whether someone with a part-rating at the start of an event, who completes his rating during the event, counts as a rated or unrated player in the calculations of the rating changes for those rated players he played?

He was unrated at the start of the event, so he counts as unrated for the entire event for his opponents. However, his own games will be rated as a rated player and he will gain or lose points based on his performance relative to his newly acquired rating.

OK, thanks. That doesn't help in my case, as I beat him. I'm actually slightly disillusioned with the way the FIDE rating system works: if you are playing at a consistent level it seems that there is not much change in the rating, and there are too many under-rated players around for the system to work properly.

In other words, I can sometimes do OK against high-rated players, but there are enough under-rated players floating around in the 1700-1900 range to ensure that I will lose to them every now and again and that undoes most of the rating gains made. Another thing I've noticed is that if you have two under-rated players of the same strength playing each other, and they get the draw you would expect, that merely reinforces the too-low rating they both have.

When I first talked to various people about the FIDE rating system (this was some 10 years ago, when it was much harder to get an initial rating), they all said that the initial rating you get is critical. Is that still the case with the rating floor having been lowered to 1000?

Christopher Kreuzer wrote:When I first talked to various people about the FIDE rating system (this was some 10 years ago, when it was much harder to get an initial rating), they all said that the initial rating you get is critical. Is that still the case with the rating floor having been lowered to 1000?

The absolute values shouldn't make much difference, except that ratings at the 1000 level aren't particularly reliable. At the risk of being accused of elitism, you could probably get from 1000 to 1200 just by reading a book.

The Elo system has a long memory, which means you don't plummet if you get a run of bad results. Equally, it can be hard to convert improvement, or a run of really good form, into rating points.

I've seen players objectively in the 1950 - 2150 standard come into the rating list at well above or well below their expected standard as measured by domestic ratings. The k=25 rule coupled with 4NCL board order rules and seeded Swiss pairing does in practice align the rating to their peers reasonably rapidly.

If you are targeting CM at 2200, FM at 2300 or IM (rating requirement) at 2400, the higher you start the better, provided these are realistic aims for your playing strength. There's still the hurdle of needing to score at least one point in your very first event; if you don't, you start again from scratch. This caught at least one player in this year's 4NCL...

Christopher Kreuzer wrote: if you are playing at a consistent level it seems that there is not much change in the rating

To be fair, I think that's the point.

I think what I was trying to say is that if you start off as under-rated and then play at a consistent level (or play up and down to an average that is consistent), it can take a long, long time for the rating to adjust to your playing strength. Say you normally play to 2000 strength, but get an initial rating of 1900, and then play at an average of 2000 strength for the next 30 games at a k-factor of 25. Doing the calculation in blocks of 5 games (per event or per rating list), you get the following changes over the first 30 games (rounding up and down for simplicity):

1900 + 3.5*5 = 1918
1918 + 2.75*5 = 1932

and so on, with the increments shrinking each block.

So 30 more games later, you are just about beginning to correct the initial rating. That is 60 games. This is OK if you play lots of FIDE rated games every year, but if you only play 10 or so rated games a year, it can take a long time to see any impact on the rating. Though to be fair, at the 1900-2000 level, being 100 points off in terms of rating is not the end of the world. At other levels it might be a different story.
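That block-by-block calculation can be sketched in code (my approximation only: FIDE uses lookup tables with their own rounding, whereas this uses the continuous logistic formula, so real published lists would differ slightly). A 1900-rated player performing at 2000 strength against 2000-rated opposition gains roughly 3.5 points per game at first, shrinking each block:

```python
# Sketch of rating convergence for an under-rated player at k=25.
# Approximation only: FIDE uses expected-score tables, not this formula,
# and applies its own rounding.

def expected_score(own, opp):
    """Logistic approximation to the Elo expected score."""
    return 1 / (1 + 10 ** ((opp - own) / 400))

def simulate(start, true_strength, k=25, games_per_block=5, blocks=6):
    """Play blocks of games against opposition rated `true_strength`,
    scoring 50% in each block (i.e. performing at that strength)."""
    rating = start
    history = [rating]
    for _ in range(blocks):
        gain = k * games_per_block * (0.5 - expected_score(rating, true_strength))
        rating += round(gain)
        history.append(rating)
    return history

print(simulate(1900, 2000))
```

Run as-is this gives 1900, 1918, 1932, 1944, 1954, 1962, 1969: after 30 games the player is still some 30 points short of 2000, which is why the full correction takes of the order of 60 games.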

And of course, it works the other way as well. If you have a bad run, the impact is not ruinous. And if you start off over-rated, it can take a long time for your grade to settle downwards to a more correct level. My conclusions are that I'd look at the activity of a player and the trend of their results to get an idea of whether their rating is accurate or not.

EDIT: It is worse, of course, when an initial rating is obtained against under-rated players. You may draw against someone of your own strength, but 'gain' their low rating. Drawing against an over-rated player of the same objective playing strength has the opposite effect. As long as you face an even mix it should balance out, but if you don't, it won't. I suspect that under-rated players may play slightly more than over-rated players (who may try to 'protect' their rating), so the under-rated players will 'breed' more under-rated players from the pool of unrated players, and you end up with an excess of under-rated players producing deflationary pressure.

Roger de Coverly wrote:I've seen players objectively in the 1950 - 2150 standard come into the rating list at well above or well below their expected standard as measured by domestic ratings. The k=25 rule coupled with 4NCL board order rules and seeded Swiss pairing does in practice align the rating to their peers reasonably rapidly.

k-factor=25 helping I can understand, but how do 4NCL board order rules and seeded Swiss pairings help?

One more question. Both Sean and Jack pointed out that league games are graded as a single event, with the 4NCL games "rated at the end of an event based on the ratings in play at the start of the event". My question is whether this is strictly valid for rapidly improving players, whose playing strength can change over the course of a league season. When you have rapidly improving players, don't you want their grades to be recalculated as much as possible to ensure accuracy (hence the rating lists being published every 2 months)? Or to put it another way, would those who devised the system have envisaged a set of games played over the course of 7-8 months being rated within a system where new grading lists are published every 2 months?

Christopher Kreuzer wrote:
k-factor=25 helping I can understand, but how do 4NCL board order rules and seeded Swiss pairings help?

The board order rules in the 4NCL require you to play below a player who out-rates you by 80 points. But if you are a hundred or more points under-rated you will be towards the tail of the team, and potentially facing easier opposition.

In a large seeded Swiss, you always seem to play players 100 or 200 points above or below you. So again you have the opportunity to gain or lose a big haul of points. If you are 200 points under-rated and you play someone 200 points above you, then even a draw (which is the expected result on strength) is worth 6.5 points at k=25.

There were a couple of players in the Torquay British of 2009 who illustrate this. One whose initial rating was about 200 points higher than his current one, and another whose rating was 200 points lower.

Christopher Kreuzer wrote:1900 + 3.5*5 = 1918
1918 + 2.75*5 = 1932

The trick is to play lots of games whilst rated as low as possible; that way you get 3.5*5, which maximises your take. It's increasingly difficult to do this as the frequency of rating publication increases. Allegedly, starting from a low point and playing lots of games whilst on good form was a method used to get the rating qualification for the IM and GM titles.

Christopher Kreuzer wrote:Or to put it another way, would those who devised the system have envisaged a set of games played over the course of 7-8 months being rated within a system where new grading lists are published every 2 months?

Elo designed the system to cope with conditions in the USA, where leagues don't exist. When the system went international in 1970, the updates were annual or semi-annual, so an entire league could fit into one list, or at most two.

Stewart Reuben has written elsewhere that the treatment of leagues when rating lists are frequent is still evolving. The basic problem is new players and the rules that qualify a first part-rating: at the moment you need a score of at least 1 from at least 3 games. You cannot manage that within a league where at most 2 games are played on 4 of the 5 weekends. Hence the need to hold back the 4NCL rating to the end of the season.

Other countries with leagues spread across the season handle this differently; some ignore the issue of players seeking part-ratings. The French rate by weekend, whilst the Germans rate in bulk at the end.

Roger de Coverly wrote:There were a couple of players in the Torquay British of 2009 who illustrate this. One whose initial rating was about 200 points higher than his current one, and another whose rating was 200 points lower.

Interesting! Thanks for this and the other reply.

Roger de Coverly wrote:The trick is to play lots of games whilst rated as low as possible; that way you get 3.5*5, which maximises your take. It's increasingly difficult to do this as the frequency of rating publication increases. Allegedly, starting from a low point and playing lots of games whilst on good form was a method used to get the rating qualification for the IM and GM titles.

I never knew that. Wonder if you can spot this sort of thing in the history of the rating lists?

Christopher Kreuzer wrote:I never knew that. Wonder if you can spot this sort of thing in the history of the rating lists?

For k=15, at around 50 games, you could overshoot.

Using the tables at http://www.fide.com/fide/handbook.html? ... ew=article , you can observe that for a rating difference of 80 the lower-rated player is expected to score 39%, and for a rating difference of 200 the expected score is 24%. So if the lower-rated player is in fact the equal of those above him, every game could gain the 80-under player 15 * 0.11 = 1.65 points and the 200-under player 15 * 0.26 = 3.9 points. Dividing these into 80 and 200 tells you how many games you need to catch up: 80/1.65 = 48 and 200/3.9 = 51.

At k=25, it would be 2.75 and 6.5 points per game, so 29 games and 31 games respectively.
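That catch-up arithmetic can be written as a short sketch (the expected scores 0.39 and 0.24 are the FIDE-table values for differences of 80 and 200 quoted above; a real calculation would re-read the table as the rating recovers, so these are first-order estimates):

```python
# Sketch: games needed for an under-rated player to catch up, assuming a
# constant expected score (in reality it rises as the rating recovers).

def games_to_catch_up(deficit, expected, k):
    gain_per_game = k * (0.5 - expected)  # points gained per game when scoring 50%
    return deficit / gain_per_game

print(round(games_to_catch_up(80, 0.39, k=15)))   # 48 games
print(round(games_to_catch_up(200, 0.24, k=15)))  # 51 games
print(round(games_to_catch_up(80, 0.39, k=25)))   # 29 games
print(round(games_to_catch_up(200, 0.24, k=25)))  # 31 games
```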

It's one of the mathematical properties of the theory that the number of games you need to reach your "proper" rating is a function of k rather than of the amount by which your rating is wrong. FIDE's implementation uses tables rather than formulae, so you get rounding effects.

The ECF system gets you there in thirty games and doesn't give you extra credit for more games at the same standard. So k=25 within a rating period corrects a rating using the same number of games as the ECF system would need. The ECF system suffers publication lags, of course: if the new grading list came out on the first of June each year, we wouldn't think of it as quite as out of date as when it appears on 1st August.