Date of Completion

Embargo Period

Keywords

Major Advisor

Co-Major Advisor

Associate Advisor

Christopher Rhoads

Associate Advisor

D. Betsy McCoach

Associate Advisor

Gilbert Andrada

Field of Study

Educational Psychology

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

Evidence of student growth is a primary outcome of interest for educational accountability systems. When three or more years of student test data are available, it becomes possible to ask how students grow and what their predicted growth is. Because test scores contain measurement error, this error should be accounted for in growth and prediction models. As Fuller (1987) and others have indicated, ignoring or misspecifying measurement error can result in attenuation bias in parameter estimates, reduced power for testing hypotheses, and reduced accuracy of prediction. This study addresses these concerns, with a special focus on prediction accuracy.
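The attenuation bias mentioned above can be illustrated with a small simulation. The sketch below is not the study's method: it assumes classical measurement error (observed score = true score + independent normal error), a single predictor, and a known reliability, and all names and parameter values (`simulate`, `true_slope`, `error_sd`) are hypothetical. Under these assumptions the naive regression slope shrinks toward zero by roughly the reliability of the predictor, and dividing by the reliability recovers the true slope on average.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=0.8, error_sd=0.75, seed=1):
    """Return (naive slope, corrected slope, reliability) for simulated scores.

    Assumption: true scores have variance 1 and the error variance is known,
    so the reliability is computed from the design rather than estimated.
    """
    rng = random.Random(seed)
    true_score = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Observed prior-year score: true score plus measurement error.
    observed = [t + rng.gauss(0.0, error_sd) for t in true_score]
    # Next-year outcome depends on the TRUE score, not the observed one.
    outcome = [true_slope * t + rng.gauss(0.0, 0.5) for t in true_score]

    reliability = 1.0 / (1.0 + error_sd ** 2)  # var(true) / var(observed)
    naive = ols_slope(observed, outcome)       # attenuated toward zero
    corrected = naive / reliability            # classical disattenuation
    return naive, corrected, reliability


naive, corrected, reliability = simulate()
print(f"reliability={reliability:.2f} naive={naive:.3f} corrected={corrected:.3f}")
```

In practice the reliability is itself an estimate, so the corrected slope carries additional uncertainty that this sketch ignores.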

The purpose of this study is to perform a comprehensive investigation of the impact of test score measurement errors on growth prediction models. The primary research questions of this study are: (1) Does considering test score measurement error improve prediction of student growth and/or reduce the standard error of prediction in different regions of the proficiency continuum? and (2) Which of the procedures investigated is most effective in accounting for measurement error in the prediction of student growth in different regions of the proficiency continuum?

This study was conducted under a fully Bayesian framework. Two structural models for growth prediction were considered: a linear growth (LG) model and a two-cohort linear regression (LR) model. In addition, three approaches to modeling measurement error were investigated: correcting for test score unreliability; incorporating individual test score error variances; and modeling item-level responses directly.

Data were generated to resemble response data from Smarter Balanced Assessment Consortium (SBAC, 2016) assessments. These are fixed-length computerized adaptive tests that provide vertically scaled scores. A characteristic of the SBAC assessments relevant to this study is that they draw on difficult item pools, which results in much larger measurement error for lower-performing students.
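The link between a difficult item pool and larger measurement error at the low end of the scale can be sketched with a simple item response model. The example below is an illustration under stated assumptions, not a description of the SBAC scoring model: it assumes a Rasch (one-parameter logistic) model, a fixed set of items rather than adaptive selection, and a hypothetical 40-item pool with difficulties between 0.0 and 2.0 logits. The function names are illustrative. Under the Rasch model each item contributes information p(1 - p), and the standard error of the proficiency estimate is the inverse square root of the total information.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def standard_error(theta, difficulties):
    """Asymptotic standard error of theta: 1 / sqrt(test information)."""
    information = 0.0
    for b in difficulties:
        p = rasch_prob(theta, b)
        information += p * (1.0 - p)  # Rasch item information
    return 1.0 / math.sqrt(information)


# Hypothetical pool: 40 items, difficulties evenly spaced from 0.0 to 2.0.
difficult_pool = [2.0 * i / 39 for i in range(40)]

# Compare a low, a middle, and a high proficiency value.
for theta in (-2.0, 0.0, 1.0):
    se = standard_error(theta, difficult_pool)
    print(f"theta={theta:+.1f}  standard error={se:.3f}")
```

Because items are most informative near their own difficulty, a student far below every item in the pool is measured less precisely. An adaptive test can only select from the items available, so a pool with few easy items would be expected to limit precision for such students in a similar way.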

Results showed that the LR model correcting for score unreliability (LRreli) and the LR model with item-level responses (LRsirt) provided the most accurate predictions among the models considered. The LG model incorporating individual score error variances (LGtsme) and the LG model with item-level responses (LGsirt) improved the slope estimates when the item pool was not appropriate for lower proficiency students.