Updated every Monday, Wednesday and Friday ... and maybe other days too.

Friday, February 28, 2014

Elo DWP?

I played at Hampstead a couple of weeks ago. Although my score of +2 =0 -2 might not sound that bad, in truth I played extremely badly and quite deservedly failed to match better outcomes I’d achieved at similar events in January, November and October. Frankly, it was not a successful weekend for me at all.

It wasn’t as bad as FIDE would have you believe, mind.

There’s a standard narrative that you get when you raise the question of whether there’s systemic deflation in the Elo rating system. It’s the kids. They improve too fast and the system can’t keep up. Especially now Elo ratings go down to zero.

You play a junior, especially one of the very young ones, and since their rating doesn’t reflect their true playing strength you can lose a tonne of Elo points by losing to somebody who, objectively speaking, is about your strength. And even if you do happen to win you don’t get your just rewards.
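The arithmetic behind that complaint can be sketched with the standard Elo update formula. A minimal illustration - the K-factor of 20 and all the ratings here are made-up numbers, not anyone's actual figures:

```python
def expected_score(r_player, r_opponent):
    """Expected score from the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400))

def elo_update(rating, opponent_rating, score, k=20):
    """Rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent_rating))

# Losing to a nominal 1500 costs far more than losing to a 1900 peer,
# even if both opponents actually play at 1900 strength.
print(round(elo_update(1900, 1500, 0) - 1900, 1))  # -18.2
print(round(elo_update(1900, 1900, 0) - 1900, 1))  # -10.0

# And beating the underrated junior earns almost nothing:
print(round(elo_update(1900, 1500, 1) - 1900, 1))  # 1.8
```

So on these made-up figures a single loss to a badly underrated opponent costs nearly two wins' worth of points.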

So underrated juniors lead to underrated adults. Listen carefully and you can hear the low hiss of rating points leaking out of the system as the spiral continues ever downwards.

That, anyhoo, is the argument. And it’s not an unattractive thesis either, given that it’s not in the least bit difficult to think of examples of absurd grade-rating differentials - the kid who has an ECF in the 180s whilst retaining a FIDE in the 1880s, say.

It’s all down to the youth of today, then? It’s a reasonable argument, for sure, and yet I’m not convinced that’s all that’s going on.

I won two games at Hampstead and lost two. That’s according to the tournament. According to the rating system I only won one. My first round opponent didn’t have a rating so the game didn't trouble the raters.

Them’s the breaks? Perhaps, but the same thing happened in January. And November. And October. And two of my games at the Hampstead Open back in August didn’t count either. Which, as it happens, is the same number of games that didn’t count for rating from the nine I actually played in Penarth.

So in the past six months or so, I’ve played 31 games in rated tournaments of which eight didn’t count for the ratings calculations. And my score from those games? +7 =1 -0.

Little wonder that my Elo has been in freefall while my ECF has been on the up?

Maybe this would help

First, is it possible that I’ve simply had an unlucky run over the past couple of months? Perhaps, but I don’t think so. In fact my experience at the Hampstead tournaments matches that of the preceding couple of years. Since my first event in Sunningdale (May 2011) I’ve played a total of 92 games, but only 72 have counted for rating. My score from the missing games has now reached +14 =5 -1.

Secondly, there’s the fact that many of these games are against very inexperienced players who wouldn’t earn me many points even if the victories were counted. That’s true enough, but not invariably so. Amongst the missing I find games against players with long-established ECF grades in the 140s, 150s, 160s and even 170s. And these missing points add up over time.

Finally, it could just be me, couldn’t it? It could, by some statistical quirk, just be an outlier. That it’s happening to me doesn’t mean that it’s happening to everybody else. Missing games might be knackering my rating, in other words, but aren’t necessarily having an impact on the system as a whole. Well maybe not, and I’m open to somebody demonstrating that I’m wrong, but as things stand I remain suspicious.

Yes I know. Ultimately, we play because we love the game and, when it comes down to it, ratings don’t really matter that much.

But, if we’re going to have a rating system we might as well have one that kind of sort of reflects the playing strength of the chessers involved. So this latest missing game at Hampstead might not have cost me much in itself, but that’s not really the point.

How, I wonder, can the system work effectively if games routinely go missing? When this happens at every tournament, the answer, as far as I can see, is that it can’t.

19 comments:

Anonymous said...

The question to ask, which those with access to the detailed results could answer, is: "Is there any bias in results submitted for both ECF grading and FIDE rating where a FIDE-rated player plays one who isn't?" The answer might be different for juniors and adults.

I suspect the disparity for the blog author may be a fluke, but it might be that rated players did better than 50% against non-rated players to an extent not explained by their respective ECF grades. Possible explanations abound; for one, rated players are likely to have more experience of playing with 30-second increments.

With International titles being rating dependent, a set of robust rules is necessary. Whilst you could treat national ratings as an initial rating, there would always be the question as to whether these were universally trustworthy.

Extending the International scale all the way down to 1000 (not zero) has created a whole series of new problems, which were not present when only the elite or near elite could get International ratings.

Games not getting rated is not a problem with the rating system itself, of course. And you might equally well have lost all those games. If you want to show the Elo system doesn't work, complaining that not all national games are rated under it isn't going to do that. No system will work well if a player's best results aren't fed into it.

Well, I might, but I didn’t. And I suspect the same is true for most people in my situation.

It might be different for different situations - e.g. for those who play mainly in the 4NCL or perhaps tournaments in other parts of the country - but in London I’d be really surprised if results of rated v unrated were anything like 50:50.

>No system will work well if a player's best results aren't fed into it.

>but in London I’d be really surprised if results of rated v unrated were anything like 50:50.

Well, of course they're not. Rated players tend to be more experienced and thus tend to be stronger.

But if those unrated players were somehow miraculously granted a rating corresponding to their strength, you might have won points or you might have lost them.

I'm not understanding your point, to be honest. You seem to be suggesting that you're somehow suffering by some of your games not being rated, and that for some systemic reason you would otherwise have gained points in those games. That can't be true, but otherwise I don't see what it is you're trying to say.

I don't think anybody expects the scores in rated v unrated games to be 50-50. What might be more interesting to track is whether the rated players tend to do better in those games than the grading differential would suggest.

The key problem which you have identified is that in Swiss tournaments (unlike All-Play-Alls) games between rated and unrated players don't count for the former. Two or three years back FIDE proposed to address this, but the plan was dropped. Clearly this is a suggestion which needs to be revisited.

Most regular players in Opens and the 4NCL have had International ratings for many years. But below that level, perhaps particularly amongst London League players or SCCU county players, I suspect distribution of FIDE ratings is patchy. If you took a pool of adult players in the 120 to 170 range, entered in Majors and Intermediates, I wouldn't be sure that the average ECF grade of those with a FIDE rating would exceed that of those without. Hence the centralising assumption of a 50% score. Above that, there are relatively few players who don't have FIDE ratings; the British Championship usually has no more than two, and in some years none at all.

I'm still not understanding why, David. Indeed, I don't see, conceptually, how you could possibly rate a game between a rated and an unrated player (at least not until the unrated player has played a few more games).

I'm interested in this as someone who got a rating 3 years ago, only to gain 30 ECF and lose 15 Elo since then. However, looking through my FIDE games I don't have your problem - 49/56 have been against rated opponents. Of the other seven I scored +2 =5 -0 at a 164 performance: a little above my grade but not much. Most of my FIDE-rated games are in South London, or the odd e2e4/4NCL, so I'm not overly surprised most people are rated.

Instead I just seem to consistently underperform by 15-20 points in weekend events, where most of the games are. Not sure why - me trying too hard to correct the rating, congress opposition being more dedicated, or them making an extra effort when playing a 160 with a 1750 rating. Who knows? Two people isn't much of a sample, but one of us must have had some unusual luck...

Well, I wanted to write about the Elo system. Specifically, that - if only for me - it’s busted. I really don’t see me ever getting back to 2050. More likely I’ll get over 200 ECF.

Perhaps I didn’t do it very well, so I’ll try to give a bullet point summary of what I’m trying to get at here.

1. My opinion is that the Elo system is irretrievably broken.

2. People usually say that this is because of under-rated juniors, but my feeling is that games against unrated players are also a deflationary factor.

3. I wanted to show that a lot of my games had gone unrated and that my score from these games was enormous. (You might think this is obviously going to be the case - and I would agree - but I have had somebody argue this point.)

4. Yes it’s true that I only lost a point or two from my game in the first round and that’s hardly a massive difference. What I’m raising is the issue of this happening every tournament. And not just to me - to every (or at least many) rated adult.

So in effect we’re ALL losing about 20 points every six months, just by playing.

5. I wanted to write this all out partly because I wanted to see if anybody would say, ‘well I don’t think so because ...’ or ‘well I’ve played 2000 games since Christmas and every one has counted’. To see if other people’s experience matched mine in other words.

I do see that rating a game against an unrated opponent is problematic, but simply ignoring them is causing problems, at least in areas where there are a lot of unrated opponents to play.

It’s all very well for Roger to say "there are relatively few players who don't have FIDE ratings", but that’s just not true where I play.

If you look at the subset of FIDE-rated players in English chess and their games among themselves, it is obvious that they cannot all be underperforming. Precisely because no rating points "leave" this subgroup, on average people should perform at their FIDE rating. Of course there can be other issues, like different K-factors for different players and fast-improving juniors and whatever. But as I see it, games against unrated players not being rated cannot have any influence on inflation or deflation.

'Obvious', perhaps, but demonstrably an incorrect assumption. Come back next week.

As for unrated games not counting, isn’t it ‘obvious’ that a system that counts all of a player's losses but only a fraction of their wins (or none at all, as has been the case with me for many, many months) will end up under-rating them? When this happens on a systematic basis it leads to deflation.
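For what it's worth, that claim can be put on a back-of-the-envelope footing. The sketch below uses made-up numbers (a player of fixed true strength 1900 facing correctly rated 1900 peers, K = 20) and iterates the formula's average update per game when every loss is rated but only a share of the wins are:

```python
def expected(r_a, r_b):
    """Standard logistic Elo expectancy."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def equilibrium_rating(win_counted_share, true_strength=1900.0, k=20,
                       steps=5000):
    """Apply the *average* rated change per game, assuming the player
    scores 50% against equal opposition, every loss is rated, but only
    `win_counted_share` of the wins are."""
    rating = true_strength
    for _ in range(steps):
        e = expected(rating, true_strength)
        # half the games are wins (rated with probability
        # win_counted_share), half are losses (always rated)
        avg_change = 0.5 * win_counted_share * (1 - e) - 0.5 * e
        rating += k * avg_change
    return rating

print(round(equilibrium_rating(1.0)))   # 1900: all games rated, no drift
print(round(equilibrium_rating(0.75)))  # ~1850: a quarter of wins vanish
```

On these assumptions the rating settles where win_counted_share × (1 − E) = E, i.e. some 50 points below true strength when a quarter of the wins go unrated - drift produced by the selection of which games get rated, not by the opponents' strength.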

It's true that I've never played at Penarth or Hampstead, where over half of your missing games come from. My local events are at CCF which is almost all rated players (only 2 unrated games from 25) - take those out and I have 16.1% unrated.

As to why it could vary so much by pool, I'm not sure - maybe because the CCF tournaments are mostly the same people every time, so mostly got a rating long ago? Or maybe it's just in the right place geographically.

Alternatively, other events may be more successful in attracting people who don't play many congresses; in general are your unrated opponents inexperienced, mostly league-only, or something else?

Those losses of yours that were counted by the system were somebody else's wins. The Elo you lost was won by your opponents. If the unrated players are ignored by the rating system, then their games don't influence the rating system. If you look at games between rated players there will still be an equal number of wins and losses. That's just mathematics. Maybe not obvious, but unless there is some important fact that you left out about how the unrated players do affect the rating system, you definitely have to look somewhere else for a reason for deflation.
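That conservation point is easy to verify in a few lines. A sketch with made-up names and numbers, assuming for simplicity that both players share the same K-factor (real FIDE K-factors differ between players, which breaks the exact symmetry):

```python
def expected(r_a, r_b):
    """Standard logistic Elo expectancy."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def play(ratings, a, b, score_a, k=20):
    """Rate one game between pool members a and b; score_a is 1, 0.5 or 0."""
    delta = k * (score_a - expected(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # the opponent's change is the exact mirror image

# A closed pool of rated players:
pool = {"Ann": 2050.0, "Bob": 1900.0, "Cy": 1750.0}
total_before = sum(pool.values())
play(pool, "Ann", "Cy", 0)     # an upset
play(pool, "Bob", "Ann", 0.5)
play(pool, "Cy", "Bob", 1)
# Points have moved around, but the pool's total is unchanged:
print(abs(sum(pool.values()) - total_before) < 1e-9)  # True
```

With a common K, every rated game just transfers points within the pool, so unrated games - which transfer nothing - cannot by themselves inflate or deflate it.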

If you reduce the average FIDE rating by bringing in more weak players, that shouldn't affect the top players, because they don't play them anyway. Indirect effects would arise only if the low ratings were incorrect. That is the allegation of course. In the days when the ratings were cut off at 2000, a player of a 2100 standard couldn't play a rated game against a player of 1800 standard. Once you extend that range, rated games become possible, but the 2100 player maintains their rating provided they make the necessary expected score.