The Length of a Logit

In fitting data to the Rasch model in order to establish
measurement, our aim is to construct a system of invariant linear
measures, to estimate their precision (standard errors), and to
assess the degree to which these measures and their errors are
confirmed in the data (accuracy, i.e., fit statistics). This is quite
different from that "assignment of numbers to observed phenomena"
so often cited as a definition of measurement in the social
sciences. When "measurement" is by mere "assignment", the
resulting numerical labels do not maintain their implied
arithmetical meaning - the meaning necessary to use them to
calculate differences, means, variances or regressions. This
unnecessary numerical ambiguity causes social scientists a great
deal of uncertainty and confusion.

A more useful aim for the construction of measurement is to
enable the same kind of quantitative reasoning for the social
sciences that has been so productive in the evolution of the
physical sciences - namely, the careful construction and
maintenance of invariant linear measures. Physicists do not dwell
on linearity because they take for granted that their instruments
maintain it. Everyone expects that "labelling" a mountain as
"20,000 feet high" carries with it the well-known and universal
measurement properties of length. In contrast, "labelling" a
student as "3.0 in grade point average" carries with it no more
than an ordinal classification of some grade of "B" in some
particular context. This is clearly not a linear measure, and most
certainly not invariant over teachers, let alone schools.

Physical units are defined prior to the current experiment, and
then carefully implemented in the design of the instruments used to
make the observations, such as yardsticks for measuring length.
The resulting experimental observations are recorded as counts of
well-defined, carefully maintained, and entirely artificial
measurement units (such as inches) from arbitrary origins (such as
one end of a yardstick). Since a different measurement unit, such
as centimeters, produces a different count of length, even the
count is an abstraction.

Physicists are unequivocal as to the quantitative length of an
inch, but its substantive implication, its qualitative meaning,
depends on the context: one inch added to the height of a mole-hill
has different meaning than one inch added to the height of a
mountain.

Rasch measurement can lead to the same kind of arithmetical
numbers that physicists reason with. But to do this, we must
address explicitly the first step in the construction of
measurement - a step no longer explicit in most physical
measures.

The initial experimental observations in the construction of any
measurement system are counts of the occurrence of observable
events, such as the number of correct responses achieved on a test.
Once the indicative events have been defined, these counts are
based on concrete observations. The only way to change one of
these counts is to change the experiment, say, by dropping one item
from the test and then recounting the correct responses.

The mathematical unit of Rasch measurement, the log-odds unit or
"logit", is defined prior to the experiment. One logit is the
distance along the line of the variable that increases the odds of
observing the event specified in the measurement model by a factor
of 2.718..., the value of "e", the base of "natural" or Napierian
logarithms used for the calculation of "log-" odds. All logits are
the same length with respect to this change in the odds of
observing the indicative event.
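This defining property of the logit can be illustrated numerically. A minimal sketch in Python, using the dichotomous Rasch success probability (the ability and difficulty values below are illustrative, not taken from any actual test):

```python
import math

def prob_correct(ability, difficulty):
    """Rasch model: probability of success given person ability
    and item difficulty, both expressed in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def odds(p):
    """Odds of success corresponding to probability p."""
    return p / (1.0 - p)

# Moving one logit up the variable multiplies the odds of
# success by e = 2.718..., regardless of the starting point.
p0 = prob_correct(0.0, 0.0)   # ability equal to difficulty
p1 = prob_correct(1.0, 0.0)   # one logit higher
print(odds(p1) / odds(p0))    # e = 2.71828...
```

The ratio of odds is e for any one-logit step, which is what makes all logits the same mathematical length.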

As with an inch, the substantive length of a logit, i.e., what
a logit means in terms of the composition of the underlying
variable in any particular application, is not pre-determined.
When benchmark elements are chosen to give meaning to a variable,
the number of logits estimated between a pair of benchmarks depends
on the particular distribution of counts obtained in the current
experiment. The substantive length of the logit depends not only
on its numerical value, but also on the conceptual distance between
the benchmark elements. If a second experiment should lead to a
different distribution of counts, then the number of logits between
the pair of benchmarks will become different, even though their
conceptual distance might remain unaltered. As a result, it is
useful to represent the results of the measurement process in terms
of a linear transformation of the initial logits which preserves
the conceptual structure of the measurement system - the
differences between benchmarks. Considerations along these lines
are explained and applied by Wright & Stone (1979, Chap.8).

To see how the substantive length of a logit is affected by the
distribution of the observations, consider a judging situation.
The more discriminating the judges,
the more precisely and consistently will they assign ratings to
performances, and the more peaked will be the distribution of the
ratings given by each judge to each level of performance. The more
peaked the distribution of observations, the larger the number of
logits between levels of performance. This occurs irrespective of
the ability of the persons, difficulty of the items, severity of
the judges or construction of the rating scale.

The manner in which a particular rating scale works affects the
distribution of responses across the categories, and so also
affects the substantive length of the logit. The rating scale is
part of the instrumentation of the test. Changing the form of the
rating scale changes the experiment and changes the substantive
length of the logit. If observations of persons are made on a
three category scale, a particular set of logit measures will be
estimated. Then, if the top two categories are combined into one
category, making the test dichotomous, another set of logit
measures will be estimated. The relative utility of these
alternative sets of measures will depend on the fit and separation
statistics they produce and on the meaning and purpose of the test.
In general, the standard deviations of the two sets of measures for
the same persons will differ. Since it is not useful to think that
the person abilities themselves have changed, we must rescale our
measurement units accordingly.

This realization alerts us to a step we must take which precedes
those we see physicists taking. Since logit measures are estimated
from the counts observed in the current experiment, the meaning of
a logit in terms of the underlying variable need not be invariant
between experiments. Inches are implemented so that every inch has
the same length. Every logit has the same mathematical length in
terms of log-odds, but not necessarily the same substantive length
in terms of what it implies about the distances between the
defining benchmarks of the underlying variable. Any linear
transformation of the logit maintains its equal-interval status.
Often the strict probabilistic interpretation of measurement units
is of only incidental interest, particularly for rating scale data.
Then the analyst is free to choose the most useful linear rescaling
of the logit. A convenient transformation can be to rescale the
lowest observable person measure to 0, and the highest to 100, so
that reported measures can be interpreted as a kind of "percentage"
progress up the effective range of the measurement instrument.
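One such rescaling can be sketched in a few lines of Python. This is a hypothetical illustration: the lowest and highest observable person measures (here -4 and +4 logits) are assumed values, not results from any actual instrument:

```python
def rescale_to_percent(logit, lowest, highest):
    """Linearly rescale a logit measure so that the lowest
    observable measure maps to 0 and the highest to 100.
    Being linear, the transformation preserves the
    equal-interval structure of the logit scale."""
    return 100.0 * (logit - lowest) / (highest - lowest)

# Assumed effective range of the instrument, in logits.
low, high = -4.0, 4.0
print(rescale_to_percent(-4.0, low, high))  # 0.0
print(rescale_to_percent(0.0, low, high))   # 50.0
print(rescale_to_percent(4.0, low, high))   # 100.0
```

Differences between rescaled measures remain proportional to differences in logits, so the "percentage" units retain their interval meaning.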

Since our intention is to maintain, by choice of item and
response format, units of equal substantive length for tests
constructed to measure what we intend to be the same underlying
variable, the comparison of measures from tests intended to be
commensurate requires, through an equating step, adjustment not
only for differences in local origin, but also for variation
in the substantive length of the measurement unit we have
constructed for the underlying variable.

Equating of the interval scales constructed from two tests is
confirmed when plots of the measures of elements common to the
tests follow an identity line stochastically. When this
verification fails, a necessary step is to linearly adjust the
relative lengths of the logits constructed by the two tests (and
intended to be based on the same underlying variable) by the ratio
of the observed standard deviations of the measures common to those
tests, so that both tests measure in units with the same
substantive meaning.
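The adjustment described above can be sketched numerically. A hypothetical illustration, assuming we have logit measures of items common to both tests: Test B's measures are rescaled by the ratio of the observed standard deviations (matching logit length) and shifted to match means (matching local origin):

```python
import statistics

def equate_to_a(measures_b, common_a, common_b):
    """Re-express Test B measures on Test A's scale, using the
    elements common to both tests: rescale by the ratio of
    standard deviations (logit length), then shift to align
    the means (local origin)."""
    slope = statistics.stdev(common_a) / statistics.stdev(common_b)
    intercept = statistics.mean(common_a) - slope * statistics.mean(common_b)
    return [slope * m + intercept for m in measures_b]

# Hypothetical logit measures of items common to both tests.
common_a = [-1.0, 0.0, 1.0, 2.0]    # from Test A
common_b = [-2.0, -0.5, 1.0, 2.5]   # from Test B: longer logits
print(equate_to_a(common_b, common_a, common_b))
# ≈ [-1.0, 0.0, 1.0, 2.0], i.e. Test A's scale recovered
```

In this contrived case the common items line up exactly after equating; with real data they would follow the identity line only stochastically, as the text describes.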

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.