The first attempt at a mathematical rating system for
chess has been credited to the Correspondence Chess League of America in
1939. With a world war intervening, the influential Ingo System of West Germany followed in
1948, named by its originator Anton Hoesslinger (1875-1959) for his home
town, Ingolstadt in Bavaria [HW, "rating"]. It establishes the basics of ratings
with a simple formula,

[1.1]
R = ERc - (Pct - 50),

where ERc is the arithmetic average of the opposition ratings and
Pct is the player's score expressed as a percentage of the points available (0 to 100). (A
peculiarity here, from the standpoint of subsequent systems, is that
lower ratings represent greater playing strength.) The Ingo formula
represents a notable advance in the rating of chess players, though it
appears to have sprung primarily from Hoesslinger's intuition. The formal development of rating theory
began about 1960 with the introduction of probability formulas.
The originator of this idea was Arpad E. Elo (1903-1992), one of the founders of the United States Chess Federation (USCF).
His system was adopted in 1970 by the International Chess Federation (FIDE).
The paradigm shift proved irresistible to mathematicians.
About the same time that Elo was developing his system, similar ideas
were afloat in Australia [E2, Part 1].
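
As a concrete illustration of formula [1.1], the following minimal Python sketch computes an Ingo-style rating from a list of opposition ratings and a points total. The function name ingo_rating and its parameters are hypothetical, introduced here for illustration only, and the sketch assumes the usual chess scoring of one point for a win and half a point for a draw, which the formula itself does not spell out.

```python
def ingo_rating(opponent_ratings, points):
    """Apply formula [1.1]: R = ERc - (Pct - 50).

    opponent_ratings: ratings of the opposition, one entry per game played.
    points: total points scored (assumed: win = 1, draw = 0.5, loss = 0).
    """
    games = len(opponent_ratings)
    if games == 0:
        raise ValueError("at least one game is required")
    if not 0 <= points <= games:
        raise ValueError("points must lie between 0 and the number of games")

    erc = sum(opponent_ratings) / games   # ERc: average opposition rating
    pct = 100.0 * points / games          # Pct: score as a percentage
    return erc - (pct - 50.0)


# An even score leaves the player at the opposition average.
print(ingo_rating([100, 120, 140], 1.5))  # 120.0

# A better score lowers the number, since lower means stronger here.
print(ingo_rating([100, 120, 140], 3))    # 70.0
```

Each percentage point above or below an even score moves the rating by exactly one point, which is the linearity that later criticism of the Ingo System would seize upon.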

The new formal theory of ratings was based on an implicit measurement of
playing strength, necessarily implicit since there
is no clear measurement of playing strength beyond the obvious facts
of winning, losing, and drawing. The measurement would
become explicit with the development of ratings, just as the notion of
gravity had become explicit with Newton's formulas. As
a professor of physics, Elo would no doubt have found this analogy
appealing. It will be the burden of this treatise to show that
ratings are measurements of playing strength in a figurative sense only. The simple fact is that ratings are statistics.
The information they convey is based solely on the data provided by pairings and outcomes.
To imagine that
they represent some other dimension of playing strength, if only hypothetically, is to invite premature speculations
about probability distributions, leading by a circular
route to arguments for probability treatments based on the same distributions.

On the strength of probability theory, Elo judged the Ingo and similar systems to be
deficient because, as a consequence of their linear formulas, they were unwittingly based on a rectangular (uniform) distribution.
The implication is that every rating system is based on a probability
distribution and that the accuracy of a system is to be judged by the
suitability of this distribution.
Elo offered two complete
systems, one based on the normal curve, the other on the logistic. Apologists are quick to point out that there is little practical difference between the two systems,
although the proposal of alternatives seems problematic by Elo's own
standard. On the view that ratings are statistics, we can hardly call any rating system invalid. We shall have
occasion to call the Elo System cumbersome and not entirely coherent,
but to judge it invalid would be to accept the mistaken standard that it
adopts.
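
The remark that the normal and logistic versions differ little in practice can be checked numerically. The sketch below is only an illustration under assumptions not stated in the text above: it takes the parameterization conventionally associated with the Elo System, namely a standard deviation of 200 rating points for an individual performance in the normal version and a scale of 400 points with base 10 in the logistic version. The function names are hypothetical.

```python
import math


def expected_score_normal(diff, sigma=200.0):
    """Expected score under the normal curve (assumed parameterization).

    The difference of two performances, each with standard deviation
    sigma, has standard deviation sigma * sqrt(2). The expected score is
    the normal CDF at diff / (sigma * sqrt(2)), which reduces to the
    erf expression below.
    """
    return 0.5 * (1.0 + math.erf(diff / (2.0 * sigma)))


def expected_score_logistic(diff, scale=400.0):
    """Expected score under the logistic curve (assumed base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


# Tabulate both curves over a range of rating differences.
for diff in range(0, 801, 100):
    normal = expected_score_normal(diff)
    logistic = expected_score_logistic(diff)
    print(f"{diff:4d}  normal={normal:.3f}  logistic={logistic:.3f}  "
          f"gap={normal - logistic:+.3f}")
```

Under these assumed parameters the two curves agree at a rating difference of zero and stay within about two percentage points of each other across the tabulated range, with the logistic giving the lower-rated player slightly more at large differences.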

By analogy with the commonly accepted scales of measurement, Elo distinguished three types of rating systems:
ordinal, interval, and ratio. Since game scores in aggregate
lend themselves to these scales of measurement, the classification is convenient for describing the
various statistical methods that arise from rating theory, and it will be used in the following
pages. But first the issue of probability will be revisited.