What Test Scores Don't Tell Us: The Naked Emperor

Recently the Times carried a story saying that parents in New York City, fed up with the blanket of tests with which schools are smothering their children, are keeping their kids home on test days. This raises, yet again, the question of whether the value of standardized tests outweighs the burden they place on children, teachers, and to some extent, families.

Most of the public debate so far has focused on whether the tests are being used in unfair ways. Relatively few have publicly questioned an underlying and crucial assumption: that the tests measure something meaningful, or predict something significant, beyond themselves. I have just finished reviewing over 200 studies of K-12 standardized tests. What I have discovered is startling: most tests used to evaluate students, teachers, and school districts predict almost nothing except similar scores on subsequent tests. I have found virtually no research demonstrating a relationship between those tests and measures of thinking on the one hand, or life outcomes on the other. To grasp what we do and do not (yet) know about standardized tests, it’s worth considering a few essential puzzles: why we find individual differences in test scores (why one child does better or worse than others), what makes a child’s test scores go up, and what such improvement could possibly indicate.

Most researchers agree that several non-school factors have a big impact on children’s performance on academic tests. These non-school factors help explain why, overall, most children’s test scores are fairly stable. Children in poverty do less well, all other things being equal, than children from families with adequate incomes. Children who don’t hear much language at home are at an academic disadvantage, which is manifested, among other places, in their test performance. Children whose parents read a lot do better than children whose parents don’t read. And these factors all tend to be bundled together: middle class children are more likely to have educated parents, and to hear more language at home, than children who grow up in poverty. In other words, some children have a lot of educational advantage compared to others, and this is reflected in their test scores. If you hold all of these non-school features of the environment steady, for instance by comparing only children who come from the same economic background, some children will still do better than others. The remaining difference between children is, to some extent, a function of underlying intelligence. Both these influences, home environment and intelligence, are quite stable, which helps explain why children who get a higher than average test score in third grade are likely to get a higher than average test score in 9th grade.

But most of us believe that intelligence and family background do not seal a child’s fate. We believe that children can learn something in school which gives them knowledge and skills above and beyond what they can get on their own. Furthermore, the current faith in testing suggests that we believe that test scores are a good measure of whether children are learning something valuable at school. As we have seen in the news, some classrooms (or even whole schools) have succeeded in boosting children’s scores beyond what was predicted by their earlier scores. When scores go up, assuming no cheating is involved, people tend to think it means that a specific teacher or educational practice has helped children to know more and think better than they otherwise would have.

But do we have evidence of this? I haven’t seen any. To show that improved test scores actually indicate a more knowledgeable and skilled child, we need at least three kinds of evidence.

First, we need evidence that when a child scores better than she has in the past, her knowledge or skills extend beyond the specific items on the test. So far, the evidence has not shown this. In most states where scores have risen and children have been given a different type of test, the same children don’t show similar improvement. As one school principal I know said to me, “One of my teachers reported that her students had particular trouble with questions that involved reading a menu. Her solution was to include menu items in the weeks of school work leading up to the test.” Needless to say, familiarity with menus was not the real problem. What those children needed was not more time practicing menu questions, but more skill at reading unfamiliar material, understanding a new domain by reading about it, and navigating new literary formats. Her students may well have improved on the next round of tests, but that wouldn’t necessarily mean they had actually become better readers. Here’s another way to think of this issue. A child’s temperature is a pretty good measure of the absence or presence of the flu. It tells you that a child is sick, and/or it predicts that the kid with the fever will feel bad within hours. If you give aspirin to someone with a fever their temperature will go down, but you won’t actually do anything to change the virus within them. There are ways to raise a child’s test score that do little more than giving aspirin to a person with a fever does.

Second, it would be good to know that when children’s test scores improve, their academic performance in non-test settings also improves. In other words, we’d need evidence that when a teacher’s students regularly get better scores than predicted by their earlier scores, those students are also becoming better thinkers and learners more generally. For instance, we’d need to see that children who test better than we expected them to on reading comprehension items also choose more complex books, use books in a more sophisticated way to form opinions, and speak in more literate and authoritative ways. There are virtually no data to show this.

Third, even in the absence of these two kinds of research, it would be good to know that improving a child’s test score actually improved their life outcome. Research has established that good test scores can cause good things to happen: a good score might qualify a student for an enriched academic opportunity, or a scholarship at the state university (as it does in Massachusetts). For instance, if the children in, say, Ms. Good’s fourth grade class showed more improvement from 3rd grade than the students in Ms. Bad’s fourth grade class, it would be useful to know if, 15 years later, Ms. Good’s students had better jobs, did better at their jobs, found more life satisfaction, and were more conscientious voters than children in Ms. Bad’s class. It would be equally important to show that the children in Ms. Good’s class had better life outcomes than the students in Mr. Alsogood’s class, where children’s scores didn’t go up, but other good things were happening (for instance children were engaged, working hard, and reading a lot). That is, whatever a teacher does to improve students’ scores should also predict (not just cause) a better chance at a good life.

Until we have more data showing that improving test scores actually teaches students to think well, or that an improved test score predicts better life outcomes, we’re all willfully looking away from the Emperor’s nakedness.

While we try to come up with measures that tell us something about individual children, their teachers, or our schools, we’re better off using no tests than ones which have unintended bad effects, and haven’t yet been shown to measure anything meaningful.

What about the famous Chetty et al. study, featured on the front page of the NYT, which supposedly showed that kids assigned to teachers who raised their test scores more later had significantly higher incomes? This study has been used widely to support value-added evaluations of teachers, and policies that would weaken tenure, etc.

Readers who want more information about the limitations and flaws of standardized tests, the damage caused by high-stakes testing, alternatives to testing, and activism to reduce testing (such as opting out) might be interested in our material at http://fairtest.org, such as fact sheets, getting involved in changing the situation, etc.

I have also noted that children's scores are often stable, and that test prep accounts for only small changes. A child was given heavy-duty test prep, both in class and for homework, for nearly 3 months in a Chicago public school in 8th grade last year. Her scores went from 97 in Reading / 99 in Math to 98 in Reading / 98 in Math. And she hated the time wasted on prep.

The solution, then, would seem to be to design tests that actually measure the qualities we hope to cultivate, qualities that will be useful in life, whether at the next level of education or in the workforce: the ability to solve problems, think critically, and communicate effectively in writing and verbally. Such tests would free teachers from teaching to the test and make their jobs about skill development and making better people again. The reason this won't be done is that it would involve more spending: giving tests that require interviews with students and evaluating their writing and problem-solving processes rather than just feeding a Scantron. Never mind that we could create a cottage industry of test evaluators, the states could generate revenue through licensing fees and training courses, and some of those unemployed college graduates could find jobs, at least part time. It just won't happen, and we'll continue to stare at the emperor's bare heiny like it's the greatest garment we've ever seen, and I just can't understand why.

I can't agree that we need "better" tests, although I realize many tests are unfair and unhelpful. They are designed to sort and relegate kids and teachers into different tiers. They are high-stakes and punitive and they do real harm to our schools' curriculum because of the time wasted on test prep.

Our federal and state politicians have taken a lot of money from the corporations which benefit from this crazy increase in computer-based standardized testing for K-12.

We have so many standardized, computer-based tests now, beginning in preschool and going through 12th grade!

I can't agree that we need "better" tests, although I realize many are not diagnostic and helpful but high-stakes and punitive. Those are used to sort and relegate students and teachers into tiers.

The federal and state politicians have already taken a lot of money from the corporations and their lobbyists that develop these tests and the technology to run them: Microsoft, Apple, Amazon, Pearson, McGraw-Hill, and Amplify (formerly Murdoch's Wireless Generation).

If citizens could find a way to prevent corporations from profiting from excessive testing, our schools might be allowed to get back to helping children learn.

Better tests would measure students' acquisition of learning outcomes, those outcomes being real-world skills of communication and problem solving, and even some memory tasks. Of course these tests shouldn't be high stakes, and they shouldn't be used, as current tests are, as the sole measure to categorize students, evaluate teachers, or determine a school's funding. However, it's possible to create a standard that satisfies the demand by politicians, parents, other educators or trainers, and even students themselves for a test which provides a description of students' abilities and areas for improvement.