However commonsensical it might be, it is not in line with the
workings of the rating system. The system is designed to
guarantee that a difference of 200 points is a difference of 200
points - no matter the number of players. The numbers you give
for the internet servers are not really interesting, since the
lowest-ranked players are certainly (ill-programmed) bots. A more
interesting number would be the average deviation of the
rankings. Please note that an internet server, and probably also
a large national system like the Danish one, will attract more
weak players.
> For instance, oldtime FIBSters remember when 1800 really meant
> something :)
This has to do with inflation IMHO (the average rating rising),
not with the rating differences.

Well, I think you're both right here. Yes, the expected difference
between two players should (by and large) be equivalent regardless of
the number of players in the system, but it is perfectly normal to
find larger differences between the extremes with larger samples.
(After all, if you rate 10 randomly selected backgammon players,
chances are the best player of the 10 will be somewhat better than the
worst, but not by a huge amount. But if you rate every player in the
world, you'll be able to measure the difference between the world
champ and somebody who barely knows the rules, which we would expect
to be enormous.)
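The effect described above can be sketched with a quick simulation.
(Illustrative only: it assumes ratings are normally distributed, and
the mean of 1500 and standard deviation of 150 are made-up parameters,
not measured FIBS values.)

```python
import random

random.seed(0)

def expected_range(n, trials=2000, mu=1500, sigma=150):
    """Average (best - worst) rating among n randomly sampled players,
    assuming normally distributed ratings with illustrative parameters."""
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials

small = expected_range(10)                 # 10 randomly selected players
large = expected_range(7000, trials=50)    # "every player in the world"
print(small, large)  # the best-to-worst spread is far larger for 7000
```

The best-to-worst gap keeps growing as the pool grows, even though the
underlying distribution of playing strength is exactly the same.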
> The numbers you give for the internet servers are not really
> interesting - since the lowest ranked players are certainly
> (ill-programmed) bots. A more interesting number would be the average
> deviation of the rankings.
That's quite possibly true, although it depends what you mean by
"interesting" :-). A better metric for the "spread" of a distribution
would be the inter-quartile range (the difference between the 25th and
75th percentiles). If we are allowed to assume that the populations
we're sampling from are equivalent (e.g. that FIBS does not attract a
different type of player than those measured in the Norwegian
system), then the expected inter-quartile ranges between each rating
system ought to be the same. Of course this assumption is unlikely to
be reasonable in practice (FIBS attracts all kinds of players from the
casual to world-class professionals, whereas national ratings mostly
consist of regular tournament players; the tournament players are more
likely to be closely matched than the FIBS ones). But the important
thing about the inter-quartile range is that its expectation is
independent of sample size.
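A sketch of that last claim, again assuming normally distributed
ratings with illustrative parameters (for a normal distribution the
inter-quartile range works out to about 1.35 standard deviations,
regardless of how many players you sample):

```python
import random
import statistics

random.seed(1)

def expected_iqr(n, trials=200, mu=1500, sigma=150):
    """Average inter-quartile range of n samples from a normal
    distribution with illustrative rating-like parameters."""
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        q1, _, q3 = statistics.quantiles(sample, n=4)  # [Q1, Q2, Q3]
        total += q3 - q1
    return total / trials

iqr_small = expected_iqr(300)
iqr_large = expected_iqr(7000, trials=50)
print(iqr_small, iqr_large)  # both near 1.35 * 150 = ~202 points
```

Unlike the max-min range, the two figures come out essentially equal,
which is what makes the IQR a fair way to compare pools of different
sizes.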
> For instance, oldtime FIBSters remember when 1800 really meant
> something :)
Well, I think it still means more or less the same thing, depending on
how you interpret it; a rating of 1800 today means you are in the top
6% (or so) of FIBS players. When there were only 300 players, that would
put you in the top 20; now that there are going on 7000, it's only enough
to make it into the top 400.
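A quick check of that arithmetic, taking the "top 6%" figure at face
value:

```python
def rank_at_percentile(n_players, top_fraction=0.06):
    """Rank corresponding to being in the top `top_fraction` of
    n_players; the 6% figure is taken from the discussion above."""
    return round(n_players * top_fraction)

print(rank_at_percentile(300))   # 18 -- roughly the "top 20" of 300
print(rank_at_percentile(7000))  # 420 -- roughly the "top 400" of 7000
```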
> This has to do with inflation IMHO (the average rating rising),
> not with the rating differences.
Actually I believe the effect of inflation is rather small compared to the
other factors. The median FIBS rating at the moment is only 1528, after
8 years of FIBS -- inflation of 3 or 4 points a year doesn't seem like much
to me!
To get back to the original question ("How can this effect be
quantified?"), there has surely been plenty of work on the expected
maxima of samples of various distributions. I'm at work at the moment
and don't have any references handy, so I cheated and made a quick
simulation, which appears to show that the expected deviation of the
maximum of n samples from a normal distribution grows slightly less
than proportionally with log(n). I plotted a graph of
this expectation and superimposed Daniel's data on it; I had to assume
that backgammon ratings are normally distributed with std. dev. 150
points. I have no idea whether this assumption is reasonable or not;
in practice the FIBS/GG standard deviations are likely to be higher
than the national ratings, because they include a wider variety of
players, as described above. (Daniel, do you still have your original
samples available? It might be interesting to compute the inter-quartile
ranges and standard deviations to see how much they vary between pools
of players.)
The graph is available in PostScript form at:
http://www.cs.arizona.edu/~gary/backgammon/spread.ps
for those interested.
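For those curious, here is a sketch of the kind of quick simulation
described above (not the original code; the sample sizes and trial
counts are illustrative):

```python
import random

random.seed(2)

def expected_max(n, trials=500):
    """Mean of the maximum of n standard-normal samples."""
    return sum(max(random.gauss(0.0, 1.0) for _ in range(n))
               for _ in range(trials)) / trials

m10 = expected_max(10)
m100 = expected_max(100)
m1000 = expected_max(1000)

# The expected maximum rises with each tenfold increase in n, but by a
# shrinking amount -- i.e. slightly slower than proportionally to log(n).
print(m10, m100, m1000)
```

Multiply the standard-normal figures by an assumed rating standard
deviation (150 points, say) to get them back onto the rating scale.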

> It's been mentioned in other discussions that average, not median
> rating, is a better indication of ratings inflation.
True -- I tried searching for the articles about inflation that had
been posted here in the past, but unfortunately now that we have only
a "precision buying service" instead of Deja News, things like that
aren't easy to find.
Luckily we still have Tom Keith's r.g.b. archive -- one relevant article
is:
http://www.bkgm.com/rgb/rgb.cgi?view+416
which does seem to indicate that a FIBS rating of 1800 has been reasonably
consistent at marking the 95th percentile in 1995, 1997 and 2000.
One other snippet -- Michael Klein's latest FIBS Ratings Report shows
the mean FIBS rating to be 1534, which is surprisingly close to the median.
Thanks for those data! (I believe that the "-69.50" figure in the FIBS
25%-ile should be "-98.14".)
A few random observations:
- The inter-quartile ranges of the online servers do seem to be
significantly higher than the national ratings (~210 vs. ~170), which
supports the hypothesis that the Internet servers attract a more
varied range of players than real-life tournaments.
The Danish range is much smaller than the others, though; I have no
idea why this would be the case (perhaps the results include a large
number of relatively new players? The other descriptions make it
sound as if they do or might exclude inexperienced players.)
- The Danish, Swedish and British medians show virtually no sign of
inflation. I suspect this may be because they "include all rated
players": the main cause of inflation is that weak players are more
likely to leave the system than strong players, and so weak ratings
are gradually deleted over time which effectively raises whatever is
left behind. The Norwegian ratings (which require at least 1 match
played in the last year) show comparable inflation to FIBS.
GamesGrid shows the most inflation of all. This might well be because
the financial cost increases the tendency of weak players to leave. I
understand that GG have added points to all players' ratings in the
past when a server crash lost the results of some games (I'm not sure
which is more disturbing -- that somebody thought this was a good idea,
or that users were apparently pacified by it!) which would certainly
add to this effect.
- The results show that the distributions tend to be skewed slightly to
the right (the upper quartile is larger than the lower quartile). One
explanation for this might be that weak players tend to improve faster
than strong players (hopefully nobody's getting significantly worse!)
which could shrink the left-hand tail somewhat.
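One way to see that quartile signature of right skew in a simulation
(the mixture below is purely illustrative, not a fitted model of any
real rating pool):

```python
import random
import statistics

random.seed(3)

# Illustrative right-skewed pool: symmetric normal ratings plus an
# exponential bump that stretches the upper tail only.
pool = [1450 + random.gauss(0, 100) + random.expovariate(1 / 100)
        for _ in range(20000)]

q1, median, q3 = statistics.quantiles(pool, n=4)
upper = q3 - median   # distance from median up to the 75th percentile
lower = median - q1   # distance from the 25th percentile up to median
print(lower, upper)   # upper exceeds lower for a right-skewed pool
```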

The Danish rating list includes only current, paid-up members. Members
who neglect to renew their membership are dropped from the rankings.
Ditto for GamesGrid and NBgF and, I assume, for BIBA and SBgF -- not
only because seeing one's name in the ratings list is an incentive to
remain a member, but because (as is the case in Denmark) membership in
the national federation is mandatory for residents to participate in
the Open or Intermediate flights of almost all tournaments.
But several factors do limit inflation in the national ratings. No one
can drop out and then rejoin under a different identity. No one ever
gets his rating "re-set" to par. The system never awards all players X
points. At least in Denmark, everyone new to the system starts out at
par regardless of real or estimated ability. And my impression is that
in Denmark, for example, there's a small but steady outflow of
higher-ranked players every year, as people move or give up real life
play for whatever reason -- I imagine this effect isn't so notable on
the online servers. Nis mentions another reason -- unlike all the
online systems, the Danish system has no accelerated ratings boost for
low-experience players. I believe he's correct that the Norwegian
system has adopted the exact FIBS formula, including the "boost" for
players with fewer than 400 TMP.