All About Assessment / Anchoring Down the Data

W. James Popham

"Water, water, everywhere, / Nor any drop to drink": These oft-quoted lines from Coleridge's "The Rime of the Ancient Mariner" remind us that seasoned sailors can find themselves floating smack in the middle of an ocean yet have no access to what they most desperately need, namely, drinkable
water. Today's educators, inundated by a profusion of students' test scores, face a similar dilemma. We can understand why some might, with good reason, cry out, "Data, data, everywhere. Too much to let us think!"

Most teachers and school administrators are now buffeted by students' scores on accountability tests, classroom tests, interim tests, college entrance exams, and even tests to see whether students are ready to take those college entrance exams. In addition, large numbers of students are often asked to do battle with the National Assessment of Educational Progress; some even tangle with an international test or two. With space travel apt to become more common in the future, imagine the potential for intergalactic test taking. Talk about data overload.

Staying on Course

Despite repeated calls for educators to get more instructional mileage out of the assessment data they have at hand, two deterrents typically stand in the way of most educators' effective use of test data. First, there's a missing realization, and second, there's a missing skill. Educators who possess both this realization and this skill will have greater success navigating these data-rich waters.

Let's look first at a key realization: Not all test scores are worth using. This is a profound—and too often overlooked—truth. Those zealots who worship daily at the altar of data usage sometimes try to stigmatize any colleague who fails to use every scrap of test-score data at hand. This, however, is misguided advocacy. Many test scores seem to have been collected almost mindlessly; even a careful analysis of those scores fails to reveal how an educator might use them to make an educationally defensible decision.

For example, in California (and many other states), students' annual test scores are returned to teachers in the form of "reporting clusters." These reporting clusters are collections of students' performances on many fundamentally dissimilar sorts of skills and knowledge—each of which is measured by a seemingly random number of items. For instance, in a 2006 language arts test, reporting clusters for grade 2 included Word Analysis and Vocabulary Development (22 questions); Reading Comprehension (15 questions); Literary Response and Analysis (6 questions); Written Conventions (14 questions); and Writing Strategies (8 questions). Teachers can make little instructional sense of such a hodgepodge. State officials urge teachers to "mine" the data or "drill deep" into those score reports. But the tests, and their resultant reports, were not conceptualized from the get-go to provide instructionally actionable data. And they don't.

Several of the so-called interim tests that testing firms are peddling to U.S. educators provide what they describe as instructionally illuminating, diagnostic score reports. However, this "diagnostic" evidence is actually reported as a student's item-by-item performance. To make sense out of such data, a teacher must go through the test's results—one item at a time—and then try to arrive at inferences about what skills and knowledge a student possesses (sometimes on the basis of a student's performance on a solitary item). How many teachers have time for this sort of analytic silliness?

Many of today's assessments were constructed to provide comparative
interpretations of students' performances—often in relation to the performances of students in a norm group of previous test takers. However, to make such norm-referenced comparisons work well, the test creators sometimes completely abandon any potential for the tests to provide useful diagnostic data—that is, any meaningful evidence about students' mastery of particular skills or knowledge. Merely wishing for diagnostic data won't make such data appear if a test isn't built with instructional diagnosis in mind.

The Skill of Reading Data

Once educators realize that not all test scores are worth serious analyses, they need an important follow-up skill: They need to be able to distinguish between data that inform educational decisions and data that don't. Realistically, an educator must be disposed to look at test scores from a totally utilitarian
perspective. A teacher might ask, for instance, "What instructional decision will I make on the basis of these test-score results that I would have made differently had the scores turned out otherwise?" To answer such a question, however, teachers need to see a student's scores on a sufficiently large collection of items so they can arrive at a reasonably accurate judgment about that student's mastery or nonmastery of what the test is measuring. If the test scores you're looking at don't provide answers to such questions, then perhaps you're dealing with the wrong data. And that may be one body of water that you need to steer clear of.

W. James Popham is Emeritus Professor in the UCLA Graduate School of Education and Information Studies;
wpopham@ucla.edu.