Multiple measures in multiple venues

Paul Teske is Dean and University of Colorado Distinguished Professor at the School of Public Affairs at the University of Colorado Denver. These views represent the personal opinions of the author and may not reflect the position of the University of Colorado Denver or the University of Colorado system.

Several recent intersecting conversations lead me to this post: the North “credit recovery” issue, increasing discussions about using performance funding for Colorado higher ed and/or K12, evaluations of ProComp and other teacher incentive pay programs, and Alex Ooms’ valuable recent post.

If we want to incentivize or reward educational performance in some form (and we do), we need to pay careful attention to how we do that. Nearly any output or outcome measure can potentially be “gamed” or cheated. We see this with No Child Left Behind, where state tests are the key to school evaluation. As a result, states have produced considerable improvement on those tests, while not showing much improvement on NAEP, the national test that was not “dumbed down” to show greater proficiency of students.

It is also true that no single measure comes close to being perfect. In addition to inviting cheating or gaming, reliance upon a single measure (and test scores are the one that most of us would lean towards) assumes that the measure appropriately captures what we want to capture. Currently, for state tests like CSAP, this is not the case, and we clearly need to find more, better tests.

In some ways, this is an obvious point – who can oppose multiple measures of evaluation?

Well, some do. They argue that multiple measures make accountability more diffuse and difficult, as teachers or schools or districts can usually point to some improvement in something, and that is more likely with the more measures you have. ProComp has been criticized, in this space, for having too many measures by which teacher pay can be influenced.

So we see recently that high school graduation is potentially open to some manipulation, via easier-than-appropriate “credit recovery” programs, and Ooms shows that in DPS proficiency and graduation are not necessarily closely linked. That can seemingly reduce the value of a diploma.

But this measure, like many others, needs more careful scrutiny. High school graduates are much more successful in life than those who do not graduate – and doing so more or less on-time, rather than X years later via a GED program, is much more valuable (indeed, Nobel Prize winner James Heckman, known in education for his work on the value of quality pre-K education, has shown that GED graduates get almost zero economic value from earning the GED). So on-time graduation is important – and even if it might be “dumbed down” in some circumstances, it would also be a mistake to write it off as a key measure of a school’s success.

That is because part of the value of education is “signaling” to potential employers that you are reliable, hard-working, punctual, and, yes, somewhat smart (see this week’s interesting New Yorker article on the debate between signaling and real learning in the higher ed context).

Can there be too many multiple measures in evaluating schools, teachers, higher ed institutions? Probably. Some have criticized draft teacher effectiveness evaluation rules for looking at 26 different metrics – that might be too many. But on the other hand, one is almost certainly too few, unless we have great confidence in the reliability, validity, and integrity of that measure. In most educational contexts, we aren’t at that point, at least not yet.