On Student Achievement and Teacher Evaluations

We're evidently headed for a lot of wrangling on this topic, given the focus on student-teacher data in the Race to the Top proposed criteria. So, once again, Teacher Beat provides you with a cheat sheet to help you make sense of it.

First off, we must start by assuming, as the federal government does, that it is appropriate to consider student achievement at least to some degree in evaluating teachers. (I fully realize there are people and groups out there who vociferously disagree. If you are one of them, I invite you to leave a comment below to tell us all why, but this would be a short blog item if we didn't start from that assumption.)

Next, how do we define student achievement? This is where things really start to get dicey, because most of the annual testing is done in math and language arts. But perhaps only a third of teachers explicitly teach those subjects. So how do we estimate student performance in non-tested grades and subjects?

The National Council on Teacher Quality, in a report on Colorado's bid for Race to the Top funding, lays out a few interesting alternatives. One is randomly sampling student work, as long as the samples are reviewed independently and audited centrally to ensure consistency.

As for test scores, probably the most promising option is to use "value-added" models that track growth over time rather than absolute proficiency levels, so that teachers aren't penalized from the outset for teaching low-performing students.
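To make the growth-versus-proficiency distinction concrete, here is a minimal sketch with invented scores and an assumed cut score of 300 (neither comes from any real state test). Real value-added models are statistical models that typically adjust for prior achievement and other factors. This sketch uses only a raw fall-to-spring gain to show how a status measure and a growth measure can rank the same two classes in opposite order.

```python
from statistics import mean

# ASSUMPTION: hypothetical scale scores and cut score, invented for illustration.
PROFICIENCY_CUTOFF = 300

# Each pair is (fall score, spring score) for one student.
class_a = [(240, 275), (250, 290), (260, 295)]  # starts low, grows a lot
class_b = [(310, 312), (320, 318), (330, 335)]  # starts high, grows little

def percent_proficient(pairs, cutoff=PROFICIENCY_CUTOFF):
    """Status measure: share of students at or above the cutoff in the spring."""
    return 100 * sum(spring >= cutoff for _, spring in pairs) / len(pairs)

def mean_gain(pairs):
    """Simple growth measure: average spring-minus-fall change per student.
    (A raw gain score, not a full value-added model.)"""
    return mean(spring - fall for fall, spring in pairs)

for name, pairs in [("A", class_a), ("B", class_b)]:
    print(name, percent_proficient(pairs), round(mean_gain(pairs), 1))
# A 0.0 36.7
# B 100.0 1.7
```

Judged on proficiency alone, class A's teacher looks like a failure; judged on growth, that teacher looks far stronger than class B's.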

Now, we've all heard that value-added estimates of teacher performance are problematic. Estimates of a teacher's effectiveness can vary from one year to the next. Sometimes the tests aren't scaled appropriately to support good estimates. And the models are typically better at identifying outliers (very strong or very weak teachers) than at making fine distinctions in the middle.

Still, it may be possible to reduce error by focusing only on the top and bottom teachers and comparing results over time (e.g., if you are a bottom-quartile teacher for three consecutive years, something's wrong).
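The "three consecutive years" rule could be sketched as follows. The quartile ratings and teacher names are hypothetical, and the function name is made up for illustration. The idea behind the rule is that a single bad year can be noise. Under the strong simplifying assumption that yearly ratings are independent, a teacher with a 25 percent chance of landing in the bottom quartile in any one year would land there three years running only about 1.6 percent of the time (0.25 × 0.25 × 0.25 ≈ 0.016).

```python
# ASSUMPTION: invented yearly quartile ratings, 1 = bottom quartile, 4 = top,
# listed in chronological order for consecutive school years.
ratings = {
    "teacher_1": [1, 1, 1],
    "teacher_2": [1, 3, 1],
    "teacher_3": [4, 4, 4],
    "teacher_4": [2, 1, 1],
}

def flag_persistent(quartiles_by_teacher, quartile=1, years=3):
    """Hypothetical helper: return teachers rated in `quartile` for at least
    `years` consecutive years, rather than flagging any single bad year."""
    flagged = []
    for teacher, history in quartiles_by_teacher.items():
        run = 0
        for q in history:
            run = run + 1 if q == quartile else 0  # reset the streak on any other rating
            if run >= years:
                flagged.append(teacher)
                break
    return flagged

print(flag_persistent(ratings))              # ['teacher_1']
print(flag_persistent(ratings, quartile=4))  # ['teacher_3']
```

Note that teacher_2 and teacher_4 each have two bottom-quartile years but are not flagged, because those years are not three in a row.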

Additionally, such scores could be compared with ratings from trained observers (principals and/or peer teachers) that capture, for instance, whether a teacher effectively engages students in content, makes the purpose of the lesson clear, and uses formative assessment to check that students have mastered concepts.

Finally, we have this important question: Just how reliable should we expect teacher-evaluation systems to be? What margin of error are we willing to accept? Right now, districts err toward one extreme, rating nearly all teachers as proficient, even very poor ones. Clearly we don't want to swing to the other extreme, either, and misidentify scores of good teachers.

But if we expect a system to be infallible we're probably going to be disappointed. As any good scientist will remind you, measurement comes with error. Are stakeholders, especially teachers and teachers' unions, willing to accept a system that is highly reliable but not perfect? (If 95 percent of judgments are accurate, is that high enough? What if 90 percent are accurate?)
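Some quick arithmetic shows what those accuracy figures could mean in practice. The district size of 1,000 teachers is invented, the five-year window is an arbitrary choice, and the "at least once" figure assumes each year's rating is independent of the others, which real ratings are not. This is a back-of-the-envelope sketch, not a model of any actual evaluation system.

```python
# ASSUMPTION: hypothetical district size; 95 and 90 percent are the
# accuracy levels asked about in the text.
NUM_TEACHERS = 1_000

def misjudged_per_year(num_teachers, accuracy_pct):
    """Teachers expected to be misrated in one year at a given accuracy (percent)."""
    return num_teachers * (100 - accuracy_pct) // 100

def chance_misjudged_at_least_once(accuracy_pct, years):
    """Probability that one teacher is misrated at least once over `years` ratings,
    ASSUMING each year's rating is independent (a strong simplification)."""
    p_correct = accuracy_pct / 100
    return 1 - p_correct ** years

for acc in (95, 90):
    print(acc, misjudged_per_year(NUM_TEACHERS, acc),
          round(chance_misjudged_at_least_once(acc, years=5), 2))
# 95 50 0.23
# 90 100 0.41
```

Even at 95 percent accuracy, about 50 of 1,000 teachers would be misrated in a given year. Under the independence assumption, roughly one teacher in four would be misrated at least once over five years.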

Now that I've put all that out there, let's hear your thoughts. Is this doable, or should we all give up and go home?

The most immediate problem is that Sec. Duncan and Race to the Top define "achievement" as scores on state standardized tests. This would certainly be true for math and reading. Given the absence of test scores in other subjects, states might indeed think about reviewing samples of student work. Done right, that could be useful, creating the irony that what NCLB views as the core of the core (math and reading) is reduced to scores on mostly multiple-choice standardized tests, while other subjects are not.

In theory, states could use RTTT funds to overhaul their assessment systems, including ways to look at actual student work (which will require trusting local teachers, among other things). (On this, see materials at www.fairtest.org and www.edaccountability.org.) But such systems exist in no state in the US, while RTTT expects states to rapidly implement systems that evaluate teachers on student test scores. Moreover, the RTTT draft guidelines on developing new assessments are far too vague and weak; they do not encourage (though they do not bar) the necessary transformation of assessment systems.

The "value-added" concept is rooted in the state tests, leading to something closer to valueless addition. States could design growth models using multiple sources of evidence of student learning, but none is doing so. Perhaps they will use RTTT funds or the money set aside for new assessments based on the new standards (which states are now supposed to commit to essentially sight unseen).

The upshot will be that elementary teachers in particular will teach to the tests even more intensively. The new test scores will inflate, as state scores already do. Kids' real learning will decline even as proponents tout the illusory gains.

It need not be that way, but the Department will have to redesign its plan.

For a large part of my 42 years as an elementary school teacher, I was a reading specialist who visited teachers' rooms daily. During this time I was able to observe the delivery of instruction and to evaluate the progress made by students.

It wasn't all that difficult to do. The students of some teachers made phenomenal progress in reading and writing, while the children in other classes made little or no progress. Almost all the adults in the school seemed to know who the really successful teachers were.

However, the standardized test scores often did not give an accurate picture, because principals would often place "low" children in these excellent teachers' classes. Special education teachers also often asked that their students be placed in these classes. In addition, the best teachers were often the very people who offered to take the English Language Learners. Many of these standardized tests were not even designed to measure student progress during an academic year; they were designed to compare populations. Principals often encouraged teachers to drill children on the test items, because that was the only way to raise scores. Teachers with low test scores were often the ones who refused to do this for ethical reasons.

That said, it is still possible to evaluate a teacher based on student achievement, but doing so would require the involvement of professionals and would therefore be expensive. It is very naive to believe that this complex task can be accomplished by a standardized test. To evaluate a teacher fairly and accurately, some of the following steps would have to be taken:

An objective evaluator would have to collect samples of student writing in the fall and the spring;

Each student would have to be given an individually administered reading test in the fall and again in the spring. This test would have to be geared to the child's level;

The teacher would have to be observed by several unbiased professionals, ideally a committee of peers and administrators;

Standardized tests would have to be "wide-range"; that is, they'd have to be able to measure students who are many years below or above grade level;

State and federal tests would have to be strictly administered. Teachers should not see the tests in advance, and the tests should be administered by someone other than the classroom teacher. Teaching exact test items should be prohibited, and the prohibition strictly enforced. Teachers should teach to the curriculum, but not to a specific test.

So, in conclusion: yes, a teacher can be evaluated according to student progress, but such an evaluation will be involved and expensive.

One short comment: the idea of measuring teacher performance by analyzing students' quartile scores may make some limited sense in a so-called regular ed classroom. It is much harder to apply to phys ed teachers, special ed teachers, Auto Body instructors, and so on.

I think the most troubling aspect of this emphasis on data is its continued reduction of students and teachers to numbers on a page. Maybe they're no longer "widgets," as Arne Duncan put it in his i3 speech, but now they're the modern equivalent: a bunch of zeros and ones. It's still dehumanizing and disrespectful of teachers as professionals and of students as individual learners.

I have always believed that the only true evaluation of a teacher should be a national test of people over the age of 25. Aren't we teaching for the future and not the short term? Don't we ignore much of what we know about child development and the acquisition of skills by arguing that teachers should be evaluated each year on classroom progress? I always hoped that what I taught would be reinforced by others, and that by the time a student finished his formal education, most of the information and procedures I taught would be remembered, and what was not remembered was not important for that individual. It takes all of us working together to educate a population. I think current models have us working against each other. What would happen if everybody's classes moved at the same rate? Would everyone get merit pay?