Reliability - separation - strata

(Separation) Reliability and Strata

These statistics report how "reliably different" the measures are. They are the opposite of inter-rater reliability statistics, which are intended to report "reliably the same."

The reported "Separation" Reliability is the Rasch equivalent of the KR-20 or Cronbach Alpha "test reliability" statistic, i.e., the ratio of "True variance" to "Observed variance" for the elements of the facet. It shows how reproducible the ordering of the measures is. This may or may not indicate how "good" the test is in other respects. High (near 1.0) person and item reliabilities are preferred. This "separation" reliability is somewhat the opposite of an inter-rater reliability, so low (near 0.0) judge and rater separation reliabilities are preferred.

Since the "true" variance of a sample can never be known, but only approximated, the "true" reliability can also only be approximated. All reported reliabilities, such as KR-20, Cronbach Alpha, and the Separation Reliability, are only approximations. These approximations are all attempts to compute:

"Separation" Reliability = True Variance / Observed Variance

Facets computes upper and lower boundary values for the region in which the true reliability lies. When SE=Model, the upper boundary, the "Model" reliability, is computed on the basis that all unexpectedness in the data is Rasch-predicted randomness.

When SE=Real, the lower boundary, the "Real" reliability, is computed on the basis that all unexpectedness in the data contradicts the Rasch model. The unknowable True reliability generally lies somewhere between these two. As contradictory sources of noise are removed from the data, the reported Model and Real reliabilities become closer, and the True Reliability approaches the Model Reliability.
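The ratio of True Variance to Observed Variance can be approximated from a facet's element measures and their standard errors. The following is a minimal sketch, not Facets' own code. It assumes that the observed variance is the population variance of the measures, that the error variance is the mean of the squared standard errors, and that True Variance = Observed Variance - Error Variance. The function name and the example values are hypothetical.

```python
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Approximate True Variance / Observed Variance for one facet.

    Assumptions (not taken from Facets itself): population variance
    for the observed variance, mean squared standard error for the
    error variance.
    """
    observed_variance = pvariance(measures)
    if observed_variance == 0:
        raise ValueError("all measures are identical; reliability is undefined")
    # Error variance: mean of the squared standard errors.
    error_variance = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # True variance cannot be negative, so floor it at zero.
    true_variance = max(observed_variance - error_variance, 0.0)
    return true_variance / observed_variance


# Hypothetical measures (logits) and standard errors for five elements.
measures = [-1.5, -0.5, 0.0, 0.5, 1.5]
standard_errors = [0.4, 0.3, 0.3, 0.3, 0.4]
reliability = separation_reliability(measures, standard_errors)
print(round(reliability, 3))  # 0.882
```

Passing model standard errors would give the upper ("Model") boundary under these assumptions, and passing the larger "Real" standard errors would give the lower boundary.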

The "model" reliability is based on the model standard errors, which are computed on the basis that all superfluous unexpectedness in the data is the randomness predicted by the Rasch model.

The "real" reliability is based on the hypothesis that superfluous randomness in the data contradicts the Rasch model:

Real S.E. = Model S.E. * sqrt(Max(INFIT MnSq, 1))
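The formula above can be sketched in a few lines. This is an illustration only; the function name and the example values are hypothetical.

```python
import math


def real_se(model_se, infit_mnsq):
    """Real S.E. = Model S.E. * sqrt(Max(INFIT MnSq, 1)).

    The standard error is inflated only when the infit mean-square
    exceeds 1; overfit (MnSq below 1) leaves it unchanged.
    """
    return model_se * math.sqrt(max(infit_mnsq, 1.0))


# Hypothetical elements: (model S.E., INFIT MnSq)
noisy = real_se(0.5, 1.44)    # inflated by sqrt(1.44) = 1.2
overfit = real_se(0.5, 0.80)  # unchanged, because Max(0.80, 1) = 1
print(round(noisy, 3), round(overfit, 3))  # 0.6 0.5
```

Because the Real S.E. is never smaller than the Model S.E., a reliability computed from Real standard errors is never higher than one computed from Model standard errors.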

Conventionally, only a Person Reliability is reported and called the "test reliability". Facets reports separation reliabilities for all facets. Separation reliability is estimated on the premise that the elements are locally independent, specifically, that raters are acting as "independent experts", not as "scoring machines". When raters act as "scoring machines", Facets overestimates reliability. It would be the same as running MCQ bubble sheets twice through an optical scanner, thereby doubling the number of "items" per person, and then claiming that we had increased test reliability! To assist in identifying this situation, Facets reports to what extent the raters are acting as "independent experts", an aspect of inter-rater reliability; see Table 7 Agreement Statistics.

Separation = True S.D. / Average measurement error

This estimates the number of statistically distinguishable levels of performance in a normally distributed sample with the same "true S.D." as the empirical sample, when the tails of the normal distribution are modeled as due to measurement error. www.rasch.org/rmt/rmt94n.htm
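The separation ratio can be sketched as follows. This is a hedged illustration: it assumes that the "average measurement error" is the root-mean-square of the standard errors, and that Observed Variance = True Variance + Error Variance. Under those assumptions, Reliability = Separation^2 / (1 + Separation^2) follows algebraically. All function names and values are hypothetical.

```python
import math


def rmse(standard_errors):
    """Root-mean-square standard error (assumed 'average measurement error')."""
    return math.sqrt(sum(se ** 2 for se in standard_errors) / len(standard_errors))


def separation(true_sd, average_error):
    """Separation = True S.D. / Average measurement error."""
    return true_sd / average_error


def reliability_from_separation(sep):
    """Reliability implied by a separation, given Observed = True + Error variance."""
    return sep ** 2 / (1 + sep ** 2)


# Hypothetical facet: true S.D. of 1.0 logit, every element with S.E. = 0.5.
sep = separation(1.0, rmse([0.5, 0.5, 0.5, 0.5]))
rel = reliability_from_separation(sep)
print(sep, rel)  # 2.0 0.8
```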

Strata = (4*Separation + 1)/3

This estimates the number of statistically distinguishable levels of performance in a normally distributed sample with the same "true S.D." as the empirical sample, when the tails of the normal distribution are modeled as extreme "true" levels of performance. www.rasch.org/rmt/rmt163f.htm

So, if the sample separation is 2, then the strata are (4*2+1)/3 = 3.

Separation = 2: The test is able to statistically distinguish between high and low performers.

Strata = 3: The test is able to statistically distinguish between very high, middle and very low performers.
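The conversion from separation to strata is simple arithmetic, sketched here with a hypothetical function name:

```python
def strata(separation):
    """Strata = (4 * Separation + 1) / 3."""
    return (4 * separation + 1) / 3


print(strata(2))            # 3.0, the worked example above
print(round(strata(3), 2))  # 4.33
```

For any separation above 1, the strata value is the larger of the two, which reflects its treatment of the distribution tails as "true" extreme performance rather than measurement error.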

Strata vs. Separation: this depends on the nature of the measure distribution.

Statistically:

If the distribution is hypothesized to be normal, then use separation.

If the distribution is hypothesized to be heavy-tailed, then use strata.

Substantively:

If very high and very low scores are probably due to accidental circumstances, then separation.

If very high and very low scores are probably due to very high and very low abilities, then strata.

If in doubt, assume that outliers are accidental, and use separation.

Help for Facets Rasch Measurement Software: www.winsteps.com Author: John Michael Linacre.