Teacher Evaluation: A Starting Point for Action

When a couple hundred educators, journalists, parents and researchers -- including the L.A. Times reporter who worked on the controversial database of teacher ratings -- gathered at UC Berkeley on Monday to tackle the thorny issue of teacher evaluation, the biggest news was probably that the discussion remained -- save a few catcalls and much grumbling -- mostly civil.

Emotions ran high and opinions differed starkly, but a couple of themes emerged. First, there's a lot on which we in the big, messy education community actually agree. Second, there's a whole lot we don't know - but need to figure out. And those are starting points we can work from.

Here are the key messages I took away, and on which (I can't say this enough) pretty much everyone agreed.

1. We don't have a clear, shared definition of what makes a good teacher - and we need one. If we're going to evaluate teachers, we need to know what we're looking for, right? And what student outcomes demonstrate that good learning has gone on? We all probably have a sense that we know good teaching when we see it. But we need to compare notes - and yes, probably data - to find some common threads. Kyla Johnson-Trammell, an administrator in the Oakland Unified School District, pointed out that great teaching may look different in the diverse, high-poverty communities she serves than in other settings.

I asked an array of panelists and audience members: what's one great measure of a good teacher? Their ideas overlapped in places and diverged in others, and together they reflect the heart and soul of teaching.

2. We desperately need better measures of teacher effectiveness. The old ways and the new ways, alike, generally stink.

Even though the teachers in the room probably wanted to hurl chalkboard erasers at Jason Felch, the L.A. Times reporter, he gave an impassioned and empathetic critique of the old model: the 15-minute "drive-by" observation by the principal in the back of the classroom. "The result is teachers get almost no support," Felch said. "They get very little feedback about their performance. The teachers who are doing incredible work against all odds get no recognition. Teachers who are struggling almost uniformly want to do better ... and they get no help."

At the same time, our standardized tests are...how shall I say this... A few tools shy of a full kit? Accuracy-challenged? I'll leave it to panelist Eric Hanushek of the conservative Hoover Institution at Stanford University: "We have some very bad tests." He added, aptly, "Teaching is much more complex than our analytical methods."

In truth, the tests aren't so terrible in and of themselves -- it's just that we're using them for purposes that far outstrip their narrow capabilities. A pair of UC Berkeley measurement experts on the panel laid out in dizzying detail the limitations of what test scores really tell us (for an explanation in plain English, see this story from Edutopia magazine). Even "value-added" measures (where we take students' test scores and calculate how much each teacher changed her students' performance, for better or worse, in a year), they said, are grossly imprecise. Scientifically speaking, the tests are "blunt instruments," said Professor Mark Wilson, but we're using them to assess individual people in fine detail. Plus, they gauge only one small slice of a student's preparation for life.
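To make the parenthetical above concrete, here is a deliberately toy sketch of the value-added idea: compare each student's year-end score with a naive prediction from the prior year, and average the differences for a teacher's class. The teacher names and numbers are hypothetical, and real value-added models are far more elaborate (they control for demographics, prior trajectories, and measurement error), which is exactly why the panelists called even those estimates imprecise.

```python
# Toy illustration of a "value-added" calculation -- NOT any district's actual model.
# For each student we have (prior_year_score, current_year_score).

def value_added(students):
    """Average gain of a teacher's students over a naive prediction.

    The naive prediction here is simply that each student repeats
    last year's score; real models predict far more carefully.
    """
    residuals = [current - prior for prior, current in students]
    return sum(residuals) / len(residuals)

# Hypothetical classes for two hypothetical teachers.
ms_a = [(70, 78), (55, 60), (80, 82)]
ms_b = [(70, 69), (55, 57), (80, 78)]

print(round(value_added(ms_a), 2))  # average gain above the naive prediction
print(round(value_added(ms_b), 2))
```

Even in this stripped-down form, the fragility is visible: with only a handful of students, one good or bad testing day swings the whole estimate, which is the "margin of error" problem the measurement experts raised.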

3. More rigorous observations and value-added scores from standardized tests both need to be part of teacher evaluations. No one in the room disputed this. Period.

Solution Seekers

The panelists suggested some promising new methods. Secondary science coach Anthony Cody touted the peer-assisted review program he has participated in within the Oakland schools. David Plank of Policy Analysis for California Education praised the portfolio evaluation used by the National Board for Professional Teaching Standards -- though doing that for every teacher in the country would take hundreds of millions of hours. Richard Rothstein of the Economic Policy Institute pointed out that no other profession -- not banking, not journalism -- has one person supervising 25 or 30 employees. So no single solution is straightforward.

The question no one answered is: in the meantime, as we work toward a more coherent system, how should we evaluate teachers? UC Berkeley Professor Sophia Rabe-Hesketh, the other measurement expert, cited the gaping margin of error in calculating value-added scores and said simply, "do not use teacher value-added measures for high-stakes decisions, or for naming and shaming." Rothstein cautioned, "There are serious consequences of using one measure when you know it's not the whole picture, because it distorts the institution of education." But Hanushek and Felch both countered, essentially: if not value-added scores, then what?

Here's my appeal: we agree on a lot. We have the shared will and momentum to change our archaic systems to make education a better experience for teachers and students alike. As Rothstein said, this debate tends to be "filled with caricature and oversimplification." So -- enough of that.

Let's seize this opportunity and run with it. Let's embrace our common goals and jointly appeal to policy makers to create conditions that enable us to develop creative solutions, rather than boxing us in. Let's focus on working together to untangle the puzzles of what we need to know about teachers and students, and how we can achieve that.

I think we are also forgetting that grading standardized tests has become very tedious and subjective. Trying to assess deeper critical-thinking skills means more open-ended questions and essays. The people grading them (mostly non-educators) are inadequately trained to assess such higher-level thinking and often don't have enough time to even think about it. Boom. In less than ten minutes they're putting a grade on an essay that could be the end of a job for a teacher. We do stand on common ground in some areas, and I think putting money into those areas will improve teacher evaluation.

Gaetan, good point. It reminds me of this column from last year by Todd Farley, a former test grader himself, about just the kind of unreliability you mention. The folks on the UC Berkeley panel didn't spend a lot of time addressing that. The best solutions seem to require more time and money than we have so far been willing to put into educational evaluation.

There might be some promising models in development, and I outlined a few in this package of stories on better assessment. One of the experts I quoted, Andreas Schleicher of the OECD, said something simple that sticks with me: "For any assessment, you have to make a trade-off between objectivity and relevance."