Unidimensionality Matters! (A Tale of Two Smiths?)

Editor's note: There are indications that the random-number generator malfunctioned during these simulations.
Please do not rely on these findings without verifying them for your own situation. See also
Rasch First or Factor First?

Introduction

The Rasch measurement model is a unidimensional measurement model, and this attribute has been the subject of much discussion in the Transactions (Stahl J 1991; Wright BD 1994; Linacre JM 1994; Fisher WP 2005). In an early article Wright and Linacre tell us that "whether a particular set of data can be used to initiate or to continue a unidimensional measuring system is an empirical question" (Wright BD, Linacre JM 1989). The only way it can be addressed, they argue, is to
"1) analyze the relevant data according to a unidimensional measurement model,
2) find out how well and in what parts these data do conform to our intentions to measure and,
3) study carefully those parts of the data which do not conform, and hence cannot be used for measuring, to see if we can learn from them how to improve our observations and so better achieve our intentions" (MESA Memo 44, reprinted from Wright BD, Linacre JM 1989).
Smith uses simulation to investigate which technique, factor analysis or Rasch analysis, is better at discovering dimensionality (Smith RM, 1996). A review of these findings in RMT (9:4, 1996) argues that the conclusions are simple: "When the data are dominated equally by uncorrelated factors, use factor analysis. When they are dominated by highly correlated factors, use Rasch. If one factor dominates, use Rasch."

Table 1. Details of Datasets

| Dataset | Structure | Contents |
|---|---|---|
| 1 | Unidimensional | 20 items. |
| 2 | Two orthogonal dimensions (r | 10 items in each dimension. Items generated in difficulty order (1 = easiest, 20 = hardest). Items interlaced, with item 1 assigned to dimension 1 and item 2 assigned to dimension 2, to ensure equal difficulty for each dimension. |

In Rasch analysis the understanding and detection of unidimensionality in the context of medical and psychological studies has developed and changed much in the past 15 years. Early published articles subscribed to the notion that fit to the model supported the unidimensionality of the scale, and little else was done to confirm that assumption (Tennant A, et al. 1996). In the 1990s Wright put forward a Unidimensionality Index (Wright BD, 1994), and gradually greater emphasis was placed on analysis of the residuals, particularly a Principal Component Analysis (PCA) of the residuals to detect second factors after the 'Rasch Factor' was removed. Originally interpretation of this was difficult because the proportion of variance attributable to the first residual factor was reported, but the total variation in the data was unknown. Subsequently Winsteps (Linacre JM, 2006) has incorporated the total variation into its reporting, so the magnitude of the first residual factor against the Rasch factor can be determined. In 2002 Smith reported an independent t-test approach to testing for unidimensionality (Smith EV, 2002, JAM), which is being incorporated into the latest RUMM2020 software (Andrich D, Lyne A, Sheridan B, Luo G, 2003). Elsewhere, others have used classical factor analytical approaches to testing for unidimensionality prior to fitting data to the Rasch model (Bjorner JB, Kosinski M, Ware JE Jr, 2003).

A review of the literature suggests that there are three main approaches to assessing dimensionality:

a) prior testing using classical approaches, such as factor analysis;

b) those which hold to the assumption of fit equals unidimensionality - a fit only approach;

c) those which involve post-hoc testing, having undertaken the Rasch analysis and supposing fit to the Rasch model (e.g., PCA of the residuals).

Thus it is possible to conceive of a broad selection of tests which may be undertaken for any given data set. For the everyday user of Rasch software working in the health and social sciences, how can they be sure that they are truly dealing with a unidimensional construct? How far do these various tests detect multidimensionality in the data?

Methods

The aim of the present study is to contrast commonly used techniques from each of the three main approaches identified above by applying them to a set of simulated datasets with known dimensionality characteristics. Each data set is based upon 20 polytomous items with 5 response options (0-4) and 400 cases. Details of the datasets are outlined in Table 1. A series of analyses was conducted on each of the 6 data files to assess dimensionality (Table 2). SPSS Version 14.0 was used to conduct factor analysis, and both Winsteps and RUMM2020 were used to conduct Rasch analysis. The data were simulated using SIMMsDepend (Marais I, 2006).

We have chosen procedures from SPSS because it is widely available and easy to use. Principal components analysis (PCA) was used to extract the factors, followed by oblique rotation of factors using Oblimin rotation (delta = 0). Kaiser's criterion, which retains eigenvalues above 1, was used in Procedure 1.1 to guide the identification of relevant factors. In Procedure 1.2 we used Horn's parallel analysis (Horn JL, 1965), which has been identified as one of the most accurate approaches to estimating the number of components (Zwick & Velicer, 1986). The eigenvalues obtained from PCA are compared with those obtained from a randomly generated data set of the same size. Only factors with eigenvalues exceeding the values obtained from the corresponding random data set are retained for further investigation. Parallel analysis was conducted using the software developed by Watkins (2000). Analyses were also conducted using a non-linear Factor Analysis (HOMALS) available in SPSS. Using curve estimation and a quadratic function, the values exported from the HOMALS procedure can be tested to determine the number of dimensions in the data.
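The comparison of observed and random eigenvalues described above can be sketched in a few lines. This is only an illustration, not the Watkins (2000) software used in the study: the function name `parallel_analysis`, the use of normally distributed random data, the number of simulated data sets and the 95th-percentile threshold are all assumptions made for the example (averaging the random eigenvalues is another common choice).

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Hypothetical helper: PCA-based parallel analysis on a cases x items array.

    Returns (n_retained, observed_eigenvalues, threshold_eigenvalues).
    """
    rng = np.random.default_rng(seed)
    n_cases, n_items = data.shape

    # Eigenvalues of the observed item correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random data sets of the same size (assumed normal).
    random_eigs = np.empty((n_sims, n_items))
    for i in range(n_sims):
        noise = rng.standard_normal((n_cases, n_items))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]

    # Assumed threshold: a percentile of the random eigenvalues per position.
    threshold = np.percentile(random_eigs, percentile, axis=0)

    # Retain leading components up to the first one that fails the threshold.
    exceeds = observed > threshold
    n_retained = n_items if exceeds.all() else int(np.argmin(exceeds))
    return n_retained, observed, threshold
```

Called on a 400 x 20 array of item responses, the first return value is the number of components whose eigenvalues exceed their random counterparts, which is the quantity reported for Procedure 1.2.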

For the Rasch procedures we set both Winsteps and RUMM2020 to have identical convergence criteria. As none of the data sets satisfied the assumptions of the rating scale model, we used the unrestricted (partial credit) polytomous model. A number of different fit statistics are reported. OUTFIT ZSTD in Winsteps and Residuals in RUMM are equivalent, with any variation reflecting the difference in the underlying estimation procedures. We use a value of 2.5 and above for both (~99% significance) to determine misfit to model expectation. Usually the two statistics provide similar magnitudes of fit to the model.

INFIT and OUTFIT MNSQ (Winsteps) are also reported with acceptable ranges of 0.9-1.1 and 0.7-1.3 respectively, following Smith's recommendations for sample size adjustment (Smith RM et al, 1998). RUMM Chi-Square probabilities are also reported, both Bonferroni adjusted to 0.0025 and unadjusted. We also report the RUMM Chi-Square Interaction Fit Statistic, which is a summary fit statistic widely used to indicate overall fit to the model. Finally, we report Wright's Unidimensionality Index, which is the person separation using model standard errors divided by the person separation using real (misfit inflated) standard errors (Wright BD, 1994). A value above 0.9 is indicative of unidimensionality; 0.5 and below of multidimensionality; and everything between is the usual grey area of uncertainty!

We report the usual Principal Component Analysis (PCA) of the residuals, including the percentage of variance attributable to the Rasch factor and the first residual factor (usually identical in Winsteps and RUMM), and the percentage of variance attributable to the first residual factor out of total variance (Winsteps).
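As a rough illustration of what a PCA of the residuals involves, the sketch below takes a persons x items array of standardized Rasch residuals and extracts the first component of the item residual correlation matrix. It is not the Winsteps or RUMM computation: the function name `first_residual_component` is hypothetical, the residuals are assumed to have been exported from the Rasch software beforehand, and the share returned here is relative to residual variance only, not to total variance.

```python
import numpy as np

def first_residual_component(residuals):
    """Hypothetical helper: first principal component of standardized residuals.

    residuals: persons x items array exported after a Rasch analysis (assumed).
    Returns (eigenvalue, share_of_residual_variance, item_loadings).
    """
    corr = np.corrcoef(residuals, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = int(np.argmax(eigvals))

    # Loadings are item-component correlations; the overall sign is arbitrary.
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

    # Share of residual variance carried by the first component.
    share = eigvals[top] / eigvals.sum()
    return float(eigvals[top]), float(share), loadings
```

The item loadings returned here are the kind of values that can be used to form contrasting subsets of items for a comparison of person estimates.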

Finally, we report on a comparison of person estimates based upon subsets of items. In practice, where there is a conceptual basis for multidimensionality, estimates are made from the a-priori dimensions. In the present case, with simulated data, we use the item loadings on the first factor of the PCA of the residuals. Person estimates derived from the highest positive set of items (correlated at 0.3 and above with the component) are contrasted against those derived from the highest negative set. A series of independent t-tests is undertaken to compare the estimates for each person, and the percentage of tests outside the range ±1.96 is computed, which follows Everett Smith's general approach (Smith EV, 2002). A Binomial Proportions Confidence Interval can be calculated for this percentage. The Binomial CI should overlap 5% for a non-significant test. The results of these analyses are reported in Table 3.
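The person-level comparison just described can be sketched as follows. The function names, the pooled-standard-error form of the t statistic and the normal-approximation confidence interval are assumptions made for illustration; RUMM2020 and Smith (2002) may differ in detail. The person estimates and standard errors for each item subset are assumed to come from separate Rasch calibrations.

```python
import math
import numpy as np

def split_by_loading(loadings, cutoff=0.3):
    """Indices of items loading at or above +cutoff and at or below -cutoff."""
    loadings = np.asarray(loadings, dtype=float)
    return np.flatnonzero(loadings >= cutoff), np.flatnonzero(loadings <= -cutoff)

def person_t_tests(est_a, se_a, est_b, se_b, critical=1.96):
    """Hypothetical helper: per-person t-tests between two sets of estimates.

    Returns (proportion_significant, (ci_lower, ci_upper), overlaps_5_percent).
    """
    est_a, se_a, est_b, se_b = (
        np.asarray(x, dtype=float) for x in (est_a, se_a, est_b, se_b)
    )
    # One t statistic per person, using the two standard errors (assumed form).
    t = (est_a - est_b) / np.sqrt(se_a**2 + se_b**2)
    proportion = float(np.mean(np.abs(t) > critical))

    # Normal-approximation binomial confidence interval for the proportion.
    half_width = critical * math.sqrt(proportion * (1 - proportion) / t.size)
    lower = max(0.0, proportion - half_width)
    upper = min(1.0, proportion + half_width)

    # Non-significant when the interval reaches down to 5% or below.
    return proportion, (lower, upper), lower <= 0.05
```

A proportion well above 5%, with a confidence interval that stays above 5%, would be read as evidence that the two item subsets yield different person estimates.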

The default factor analysis (1.1) failed to identify the single dimension, instead identifying two 'difficulty' dimensions. The HOMALS procedure failed to detect the situation (specified in Set 4) where only four items belonged to a second dimension, and consistently failed where the correlation between factors was ~0.7. The Rasch model fit statistics performed poorly where dimensions were interlaced and where the correlation between factors was ~0.7. Wright's Unidimensionality Index appeared insensitive to multidimensionality. Little can be gleaned from the percentage of variance attributable to the Rasch factor, as this seems consistently high, irrespective of the underlying dimensionality. In Set 1 the percentage of variance attributable to the first residual factor was substantially lower than in other sets, but the percentage of variance out of the total variance was low, except for the orthogonal data sets. The independent t-test approach consistently identified the unidimensional and multidimensional data sets.

These results have a number of implications for everyday practice of Rasch analysis. In the construction of a new polytomous scale where the intention is to create a unidimensional construct, Rasch fit statistics may mislead if there are two dimensions where the items are interlaced in difficulty. Supporting Richard Smith's (1996) recommendation, exploratory factor analysis should be undertaken at the outset to make sure that dimensionality is not going to be a problem, or to identify which items may be problematic so as to inform the iterative Rasch analysis procedure. As we cannot know in advance whether or not two interlaced dimensions may exist, this analysis should be undertaken as a matter of routine. The simplest way to undertake this is with the default factor analysis procedure using the parallel analysis to determine the number of significant eigenvalues.

Although the PCA of the residuals may give clues to multidimensionality in the data, its interpretation is not straightforward. The percent of variance of the first residual factor (out of total variance in the residuals) does show a clear increasing trend from the unidimensional data, through the correlated factors to the orthogonal factors. However, at what point does this figure shift from a unidimensional indicator to a multidimensional indicator?

The independent t-test approach proposed by Everett Smith seems the most robust in that it clearly identifies dimensionality. This test has importance not just for the interpretation of unidimensionality, but also for the meaning of multidimensionality in the data. Note that the proportion of t-tests outside the range is high across Sets 2-6, even when the factors are correlated at ~0.7. In practice this means that person estimates differ by between 1 and 2 logits, depending upon which set of items is being used for that estimate. This variability in person estimates is unsustainable when scales are to be used for individual clinical use, for example where cut points are often used to determine clinical pathology. The variability of person estimates where multidimensionality exists also raises fundamental questions about Computer Adaptive Testing approaches, which rely upon estimates based upon just a few variables. Clearly, the strictest form of unidimensionality is required to avoid significantly different person estimates driven by multidimensionality.

The analysis we have undertaken is only at the simplest level, reflecting what is most likely to be used in everyday research practice in the health and social sciences. We have, for example, not used Monte Carlo simulation or other methods to look at ranges of variance explained. Neither have we looked at different sample sizes or different test lengths. We have not addressed dichotomous items, which bring their own set of problems to factor analysis. Nevertheless, we believe that this simple analysis has shown that great care needs to be taken in confirming the assumption of unidimensionality of data when fitted to the Rasch model. Perhaps others may pursue some of the issues we have omitted.

Conclusion

When developing new polytomous scales, an exploratory factor analysis used a priori, with parallel analysis to indicate significant eigenvalues, should give early indications of any dimensionality issues prior to exporting data to Winsteps or RUMM [Editor: but see also Rasch Analysis First or Factor Analysis First?]. This should identify the situation of equal number of items on two factors which will not be detected by the Rasch analysis fit statistics and where the PCA of the residuals may be indeterminate. After fit of data to the Rasch model, careful examination of the PCA of the residuals should provide clues to any remaining multidimensionality. Comparison of person estimates derived from these subsets of items, using the independent t-test approach, should confirm or reject the unidimensionality of the scale.

Alan Tennant PhD, Academic Unit of Musculoskeletal & Rehabilitation Medicine, Faculty of Medicine and Health, The University of Leeds, UK.
Julie F. Pallant PhD, Faculty of Life and Social Sciences, Swinburne University of Technology, Hawthorn, Victoria 3122, Australia

Table 3. Summary of Results of Analyses

| | Test | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 | Dataset 6 |
|---|---|---|---|---|---|---|---|
| | Prior Tests - Number of Factors | | | | | | |
| 1.1 | EFA with eigenvalue > 1 (% variance 2nd factor) | 2 (6%) | 2 (30%) | 2 (31%) | 2 (14%) | 2 (63%) | 2 (63%) |
| 1.2 | EFA with parallel analysis | 1 | 2 | 2 | 2 | 2 | 2 |
| 1.3 | HOMALS - number of factors | 1 | 3 | 2 | 1 | 1 | 1 |
| | Rasch Fit | | | | | | |
| 2.1 | % OUTFIT ZSTD out of range | 0 | 0 | 0 | 100 | 5 | 0 |
| 2.2 | % Residuals outside range | 0 | 0 | 0 | 85 | 0 | 0 |
| 2.3 | % INFIT MNSQ out of range | 5 | 0 | 5 | 100 | 20 | 15 |
| 2.4 | % OUTFIT MNSQ out of range | 0 | 0 | 0 | 60 | 0 | 0 |
| 2.5 | % Chi-Square significant | 0 | 5 | 70 | 100 | 0 | 0 |
| 2.6 | % Chi-Square significant (Bonferroni adjusted) | 0 | 0 | 35 | 70 | 0 | 0 |
| 2.7 | Item-Trait Interaction Fit statistic | 0.74 | 0.09 | 0.00 | 0.00 | 0.97 | 0.12 |
| 2.8 | Wright's Unidimensionality Index | 1.08 | 1.11 | 1.11 | 1.12 | 1.07 | 1.08 |
| 2.9 | Person Separation Index (= Rasch Reliability) ~ α | 0.91 | 0.88 | 0.89 | 0.93 | 0.95 | 0.95 |
| 2.10 | (Real) Person Separation | 3.12 | 2.44 | 2.56 | 3.59 | 4.04 | 4.09 |
| | Post Hoc tests | | | | | | |
| 3.1 | % variance attributable to the Rasch factor | 82.0 | 70.0 | 70.6 | 76.9 | 85.2 | 84.8 |
| 3.2 | % variance attributable to first residual factor | 7.4 | 48.8 | 47.5 | 25.4 | 26.3 | 23.8 |
| 3.3 | % variance attributable to first residual factor out of total variance | | | | | | |

The Rasch Measurement SIG (AERA) thanks the Institute for Objective Measurement for inviting the publication of Rasch Measurement Transactions on the Institute's website, www.rasch.org.