Thursday, May 14, 2009

Now, the whole point of high-stakes testing is to provide us with hard, quantitative assessments of how our kids are doing. You simply can't be a believer in this stuff and not care about whether the tests are meaningful from place to place and year to year. And yet, as Bob says, the issue gets a passing mention each year, then drops down the memory hole until the next round of results comes out and someone happens to bring it up again. It's almost enough to make you think that a lot of these folks are more interested in using tests as a political cudgel than in whether kids are actually learning something. Almost.

This is such a huge issue. If you look into the literature on the subject, you'll see a lot about "objective norming" and the like. It all sounds very statistical and scientific, and it certainly generates no end of PhDs in education.

But creating tests is very hard, even in isolation. When you freight them with more information than they can easily carry, such as allowing comparisons across space and time, you create a near-insuperable problem. Think about it: How would you create two tests in any subject that are different enough to prevent cheating, but similar enough that two different groups of students will score in proportion to their "true" levels of ability?

Even if you could create tests far better than seems possible, how do you know the populations are really the same? There is an assumption that a 2008 group of 7th-graders is roughly the same as a 2009 group, but you don't know that, and you have no way to tell whether an increase in scores comes from: 1) a group that happens to be smarter; 2) a test that turned out to be easier; or 3) an actual improvement in knowledge that comes from education. (Again, the testing bodies will assert that they do have ways to correct for the first two, but I have little confidence in the fourth-decimal-place precision they claim.)

Yet, every year, the results roll out, and they're used to threaten or close schools, discipline teachers and administrators, allocate funds, and serve as a symbol of American pride or shame. That's putting a whole lot of weight on something with inherent statistical problems. It's a real shame we're so uncomfortable with plus/minus that we demand a false precision that alters lives.