Flawed Evaluation Systems: How Should We Assess School/Teacher Performance? Who Will Have the Cojones to Admit Their Errors and Choose a Valid/Reliable/Stable System?

What if the educators making important decisions about schools and colleges are acting too much on their guts and not enough based on actual evidence? (Review of Howard Wainer, Uneducated Guesses: Using Evidence to Uncover Misguided Education Policies, 2011)

The list of scientists who have rejected Value-Added Modeling (VAM) is long and growing. Howard Wainer has been parsing numbers for decades, and getting angrier and angrier.

I don’t know whether it is the age we live in, or the age I have lived to, but whichever, I have lately found myself shouting at the TV screen disturbingly often. Part of the reason for this may be the unchecked growth of the crotchety side of my nature. But some of the blame for these untoward outbreaks can be traced directly to the unremarkable dopiness that substitutes for wisdom in modern society. Ideas whose worth diminishes with data and thought are too frequently offered as the only way to do things. Promulgators of these ideas either did not look for data to test their ideas, or worse, actively avoided considering evidence that might discredit them.

The science simply does not support the concept. The tragedy is that the feds decided to require VAM-based teacher evaluations without a “beta,” a testing phase; they not only jumped into the pool themselves, they required that every state that wanted the big bucks jump in with them.

In New York State only 20% of a teacher’s evaluation is based on student test scores; 60% of the assessment is based on supervisory observations. The state required school districts to select from a list of observation models: Danielson, Marzano, Marshall, and even a rubric developed by the state teacher union.
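As a back-of-the-envelope illustration of how such a weighted composite works (this is not the state’s actual formula; the third component and all the numbers here are hypothetical, assuming the remaining 20% comes from some locally chosen measure):

```python
# Hypothetical APPR-style composite. The 20% test-score / 60% observation
# split comes from the post; the "local_measures" slice and its weight
# are assumptions for illustration only.
WEIGHTS = {"state_tests": 0.20, "observations": 0.60, "local_measures": 0.20}

def composite_score(subscores):
    """Weighted composite of subscores, each on a 0-100 scale."""
    return sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# An invented teacher: weak test-score component, strong observations.
teacher = {"state_tests": 55.0, "observations": 85.0, "local_measures": 70.0}
print(composite_score(teacher))  # 0.2*55 + 0.6*85 + 0.2*70
```

The point of the arithmetic: with observations carrying three times the weight of test scores, any bias in the observation component dominates the final rating.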

Just as the VAM student test score methodology is fatally flawed, so are supervisory observation evaluations. The entire idea rests on a fallacy – that supervisors in school A would give the same score as supervisors in school B. As I wrote in a previous blog, some principals are reluctant to give lower evaluations, since low ratings reflect poorly on the principal, while others strictly apply the rubric.
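That fallacy can actually be measured. Inter-rater agreement is usually checked with a chance-corrected statistic such as Cohen’s kappa; a sketch follows, with two invented principals scoring the same ten lessons (the ratings are hypothetical, not data from any real school):

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for chance agreement."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of lessons scored identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if each rater scored independently according
    # to their own marginal distribution of ratings.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical principals rate the same 10 lessons on a 4-point
# Danielson-style scale (1 = ineffective ... 4 = highly effective).
principal_a = [3, 3, 2, 4, 3, 2, 3, 4, 2, 3]
principal_b = [4, 3, 3, 4, 4, 2, 4, 4, 3, 3]
print(round(cohens_kappa(principal_a, principal_b), 2))
```

Here the two raters agree on only half the lessons, and the chance-corrected kappa lands well below the thresholds usually treated as acceptable – exactly the school-A-versus-school-B problem, with careers riding on which rater a teacher happens to draw.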

A just-published study finds that higher-achieving students with better language skills are more likely to exhibit behaviors rewarded on the Danielson and other scales.

School principals—when conducting classroom observations—appear to give some teachers an unfair boost based on the students they’re assigned to teach, rather than based on their own instructional savvy.

* Under current teacher evaluation systems, it is hard for a teacher who doesn’t have top students to get a top rating. Teachers whose incoming students are at higher achievement levels receive classroom observation scores that are higher on average than those of teachers whose incoming students are at lower achievement levels, and districts do not have processes in place to address this bias.

* Observations conducted by outside observers are more valid than observations conducted by school administrators.

* The inclusion of a school value-added component in teachers’ evaluation scores negatively impacts good teachers in bad schools and positively impacts bad teachers in good schools.

We suspect that across New York State teachers in high-poverty, low-tax districts are receiving lower APPR scores than teachers in high-achievement, high-tax districts; the commissioner has failed to release this type of analysis.

These flawed scores are used to claim that teachers in high-poverty schools are less capable than teachers in high-wealth/high-achievement schools.

Teacher assessment is not a science – the skills and experience of school leaders vary. I have sat with groups of principals watching videos and assessing lessons. Not surprisingly, we disagreed.

Other countries use inspectorate systems – “inspectors” who visit schools and assess both teacher and school effectiveness.

A study by the well-respected Chicago Consortium on School Research conducted a detailed examination of an assessment system in which trained teams of supervisors and teachers observed teachers in selected schools.

… research-based evidence showing that new teacher observation tools, when accompanied by thoughtful evaluation systems and professional development, can effectively measure teacher effectiveness and provide teachers with feedback on the factors that matter for improving student learning.

Our problem is that Secretary Duncan is wedded to a system that is unsupported, a system that is deeply flawed, a system that is rejected by experts across the spectrum; it is unlikely that he will throw himself at the feet of Randi Weingarten pleading for forgiveness. State after state timidly saluted and implemented systems, each trying to outdo the next, some basing as much as 50% of a teacher’s assessment on student test scores.

“If they succeed in their destructive goal of crippling the landmark advancement—of 45 states committing to college and career ready expectations for all students—it will be a setback to the cause of greater equality in our schools,” King said. “And that would be a disgrace.”

Compare King’s recent “dig” to current headlines about re-segregation due to charters and testing, the ethnicity of teachers in poor areas, forcing charters to comply with civil rights laws, and the utterly awful Common Core tests of late. He is spouting inflammatory nonsense. And we know from a recent story about the revision of our state standards just prior to the adoption of the Common Core that Tisch is a real piece of work.