The dimensionality and validity of student ratings of instruction: two meta-analyses

Abstract

Many colleges and universities have adopted student ratings of instruction as one (often the most influential) measure of instructional effectiveness. Although some researchers claim that student rating forms are multidimensional, reliable, valid, and uncontaminated by biasing variables, other researchers and many instructors continue to express concerns that the validity of summative evaluations based on student ratings is threatened by inappropriate data collection, analysis, and interpretation. The most commonly used validation design for student ratings is the multisection validity design. Because this design has high internal validity and has been used extensively with many student rating forms under diverse conditions, it provides the most generalizable evidence for the validity of student ratings. However, researchers using this paradigm have reported widely divergent validity coefficients. Meta-analysis is a useful method both of integrating the findings of a large number of studies and of investigating the potential moderating effects of study features. Thus, I conducted two meta-analyses of the multisection validity literature. In the first meta-analysis, I addressed the question, "What is the structure of instructional effectiveness (as judged by students) across student rating forms?" I concluded that the forms (at least those used in the multisection validity studies) measure general instructional skill, a composite of three correlated factors: delivering instruction, facilitating interactions, and evaluating learning. In the second meta-analysis, I addressed three questions. The first question was, "Are there significant and practically important interactions between moderator variables and the factor structure of student ratings?" The second question was, "What is the overall validity of student ratings as measures of instructional effectiveness?"
The third question was, "To what extent is the multisection validity literature consistent, and if it is not, to what extent do study features explain the variability in reported validity coefficients?" The results indicate that there are few interactions between study features and the factor structure of student ratings. They also indicate a medium correlation (.33) between student ratings and student learning. However, methodological and publication features, quality-of-evaluation features, student rating form features, achievement measure features, and explanatory features (student, instructor, course, and institutional) moderate the validity of student ratings. The presence of these moderators suggests that student ratings should not be "overinterpreted"; that is, only crude judgements of instructional effectiveness (exceptional, adequate, and unacceptable) should be made on the basis of student ratings.