Assessing the Assessments

Submitted by Doug Lederman on November 5, 2009 - 3:00am

When the country's two major associations of public universities were trying to craft a new accountability system three years ago, they found that many of their member institutions (and especially their faculties) were dead set against the idea of choosing one measure of student learning outcomes.

"Their reaction was, we don't want a single test along the lines of No Child Left Behind -- we want multiple tests from which to choose," said David Shulenburger, vice president for academic affairs at the Association of Public and Land-Grant Universities, which designed the Voluntary System of Accountability along with its partner, the American Association of State Colleges and Universities.

In response, the groups settled on three possible options that institutions could use to fulfill the "student learning outcomes" portion of the VSA (the Council for Aid to Education's Collegiate Learning Assessment, the Educational Testing Service's Measure of Academic Proficiency and Progress, and ACT, Inc.'s Collegiate Assessment of Academic Proficiency), thereby avoiding the single test problem.

But it created another potential issue, Shulenburger says: uncertainty about whether the results on one test (chosen by one institution) would be comparable to the results for another institution that chose another of the three tests, and the possibility that institutions would try to game the system by seeking to use a test on which they thought they might perform better.

On Tuesday, the groups released a federally funded analysis[1] of a "test validity study"[2] conducted by the makers of the three tests showing that the three tests produced comparable outcomes at the institutional level, based on having been administered at a diverse range of 13 institutions, big and small, public and private.

In other words, a college that ranked in the 95th percentile for critical thinking using one of the tests would rank in roughly the same place using the critical thinking component of one of the other two tests, and vice versa.

The study, which was part of a larger $2.4 million grant financed by the Fund for the Improvement of Postsecondary Education and led by the Association of American Colleges and Universities, doesn't necessarily mean that the tests measure exactly the same thing, given their differences, only that an institution will fare (or "rank") essentially the same no matter which measure it uses.

The significance of that finding, in Shulenburger's view, is that "it means that within the VSA, we can offer some diversity in measurement" to satisfy faculty and other concerns about a one-size-fits-all approach "and still be able to say that we're using consistent measurement from school to school."

The study may have solved that political problem for VSA, but it did nothing to ease concerns among those (including some leading psychometricians and researchers) who question the accountability system's underlying dependence on tests that purport to measure student learning, and especially an institution's role in driving that improvement among its students.

"Even if the tests do measure the same thing, there is no evidence that they measure learning and, more specifically, learning that is the result of what the student has experienced in college," Victor Borden, associate vice president for university planning, institutional research, and accountability at Indiana University at Bloomington, said in an e-mail message.

While he acknowledged that the study released Tuesday was not intended to prove the tests' ability to measure student learning, and that the creators of the Voluntary System of Accountability cited previous validity studies in embracing the CLA, MAPP and CAAP three years ago, Borden is unpersuaded. "The research conducted to date does not demonstrate that these exams measure any aspect of college learning."

Shulenburger, in reply, said only that the public university groups had been satisfied by the existing evidence that the three tests can be used to measure the "value added" student learning that colleges and universities contribute.

That argument is unlikely to be settled for some time. But the release of the study on the three tests' comparability raises some other, more immediate issues.

With the tests' predictive powers eliminated as a reason for choosing one over another, since institutions would fare comparably whichever they chose, colleges can now focus on other factors in deciding which of the three exams to use, the researchers said. "[T]he decision about which measures to use will probably hinge on their acceptance by students, faculty, administrators and other policy makers. There also may be trade-offs in costs, ease of administration, and the utility of the different tests for other purposes, such as to support other campus activities and services," the VSA analysis says.

What that may mean for the three tests and their providers is unclear. Backers of the Collegiate Learning Assessment have been viewed (and resented) in some quarters as arguing, often none too subtly, that their test is better than the others at measuring value-added learning. With that advantage arguably wiped away by the comparability study, Steve Klein, director of research at the Council for Aid to Education, focused on what students might learn -- and what institutions might want them to gain -- from taking the CLA. Unlike the more standardized CAAP and MAPP tests, the CLA focuses on giving students problems to solve.

"The skills that you would need to do one are different from the skills you'd need to do the other," Klein said in a telephone interview Tuesday. "When we look at the mission statements of colleges, they emphasize the kinds of things we're testing. What message do you want to send to students and faculty about the skills you think are important? Is it about regurgitation, or the kinds of analysis you'd have to do to take a test like the CLA?"

But some testing experts speculated that the finding that the tests predict equivalently could hurt the CLA, which is significantly more time-consuming, and somewhat more costly, for colleges to administer (though its protocols call for fewer students to be tested than do those of its competitors).

Jim Sconing, who directs ACT's statistical research department and represented it on the FIPSE study, said there was no doubt that "some colleges prefer the open-ended type of questions" contained in the CLA, because they "think it has more face validity with their faculty." But "other people are drawn to the fact that multiple choice tests tend to have higher reliability," Sconing said -- an assertion challenged by the FIPSE validity study, Klein said.

And many colleges, Sconing added, are increasingly likely to "base their choice of test on other things, such as ease of use and, yes, cost."

The validity study also could end up opening the way for more competition for all three of the tests that are already considered VSA-worthy, Shulenburger noted.

"We now have a benchmark for considering the addition of other measures of value added learning outcomes," he said. "If folks come up with other value added measures that correlate highly with one or two of these, then perhaps we have a candidate for adding other measures."