Since then, the Times has issued two responses. The first was a quickly published article, which claimed (including in the headline) that the LAT results were confirmed by Briggs/Domingue – even though the review reached the opposite conclusions. The basis for this claim, according to the piece, was that both analyses showed wide variation in teachers’ effects on test scores (see NEPC’s reply to this article).

Then, there was another response, this time on the Times’ ombudsman-style blog. This piece quotes the paper’s Assistant Managing Editor, David Lauter, who stands by the paper’s findings and the earlier article, arguing that the biggest question is:

"…whether teachers have a significant impact on what their students learn or whether student achievement is all about … factors outside of teachers’ control. … The Colorado study comes down on our side of that debate. … For parents and others concerned about this issue, that’s the most significant finding: the quality of teachers matters."

Saying “teachers matter” is roughly equivalent to saying that teacher effects vary widely – the more teachers vary in their effectiveness, controlling for other relevant factors, the more they can be said to “matter” as a factor explaining student outcomes. Since both analyses found such variation, the Times claims that the NEPC review confirms their “most significant finding.”

The review’s authors had a much different interpretation (see their second reply). This may seem frustrating. All the back and forth has mostly focused on somewhat technical issues, such as model selection, sample comparability, and research protocol (with some ethical charges thrown in for good measure). These are essential matters, but there is also an even simpler reason for the divergent interpretations, one that is critically important and arises constantly in our debates about value-added.

Here’s the first key point: The finding that teachers matter – that there is a significant difference overall between the most and least effective teachers – is not in dispute.

Indeed, the fact that there is wide variation in teacher “quality” has been acknowledged by students, parents, and pretty much everyone else for centuries – and has been studied empirically for decades (see here and here for older examples). The more recent line of value-added research has made enormous (and fascinating) contributions to this knowledge, using increasingly sophisticated methods (see here, here, here, and here for just a few influential examples).

Therefore, the Times’ claim that the NEPC analysis confirmed their findings because they too found wide variation in teacher effects is kind of missing the point. Teacher effects will vary overall with virtually any model specification that’s even remotely complex. The real issue, both in this case and in the larger debate over value-added, is whether we can measure the effectiveness of individual teachers.

Now, if the Times had simply published a few articles reporting their overall findings – for example, the size of the aggregate difference between the most and least effective teachers, and how it varies by school, student, and teacher characteristics – I suspect there would have been relatively little controversy. The core criticisms by Briggs and Domingue would still have been relevant and worth presenting, of course – their review is focused on the analysis, not how the Times used it. But the LAT technical paper (and articles based on it) would really have just been one of dozens reaching the same conclusion – albeit one presented more accessibly (in the articles), using a large new database in the newspaper’s home town.

Let’s say I was working for a private company, and I told my boss that I had an analysis showing that there was wide variation in productivity among the company’s employees. She probably already knew that, or at least suspected as much, but she might be interested to see the size of the differences between the most and least productive workers. The results might even lead her to implement particular policies – in hiring, mentoring, supervision, and the like. But this is still quite different from saying that I could use this information to accurately identify which specific employees are the most and least productive, both now and in the future.

The same goes for teachers, and that is the context in which the criticisms by Briggs and Domingue are most consequential. They address a set of important questions: How many teachers’ estimates change under a different model with different variables (and what does that mean when they do)? Did the model omit important variables that influenced individual teachers’ estimates? Were the estimates biased by school-based decisions such as classroom assignment? How many teachers were misclassified due to random error?
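The misclassification question in particular can be illustrated with a small simulation. The sketch below is purely hypothetical: the number of teachers, the spread of true effects, and the amount of estimation noise are all assumed values for illustration, not figures from the LAT analysis or the NEPC review. It shows how, even when teacher effects genuinely vary, a single noisy estimate can place many individual teachers in the wrong quintile.

```python
# Illustrative sketch (not the LAT model): how random error alone can
# misclassify teachers even when true effects genuinely vary.
# All parameter values below are assumptions chosen for illustration.
import random

random.seed(42)

N_TEACHERS = 1000
TRUE_SD = 0.15    # assumed spread of true teacher effects (in test-score SDs)
NOISE_SD = 0.15   # assumed estimation error for a single year of data

# Each teacher has a true effect; we only observe a noisy estimate of it.
true_effects = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]
estimates = [t + random.gauss(0, NOISE_SD) for t in true_effects]

def quintile(values):
    """Assign each value a quintile rank (0 = bottom fifth, 4 = top fifth)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos * 5 // len(values)
    return ranks

true_q = quintile(true_effects)
est_q = quintile(estimates)

misclassified = sum(1 for t, e in zip(true_q, est_q) if t != e)
print(f"Teachers placed in the wrong quintile: {misclassified / N_TEACHERS:.0%}")
```

In runs of this sketch, a substantial share of teachers land in a different quintile than their true effect would warrant; reducing the noise (for example, by pooling several years of data, i.e., lowering `NOISE_SD`) shrinks that share, which is the point of accounting for error margins.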

From this perspective, with an eye toward individual-level accuracy, the Times might have proceeded differently. They might have accounted for error margins in assigning teachers effectiveness ratings (as I have discussed before).

When confronted with the failure to replicate their results, they might have actually shown concern and taken steps to figure it out. And they might have reacted to the fact that their results vary by model specification and were likely biased by non-random classroom assignment (which will likely be made worse by the publication of the database) by, at the very least, agreeing to make public their sensitivity analyses and defending their choices.

Instead, they persisted in defending a conclusion that was never in question. They argued – twice – that the NEPC review also found variation in teacher effects, and therefore supported their “most significant” conclusion, even if it disagreed with their other findings.

On this basis, they downplayed the other issues raised by Briggs/Domingue (who are, by the way, reputable researchers pointing out inherent, universally accepted flaws in these methods). In other words, the Times seems to have conflated the importance of teacher quality with the ability to measure it at the individual level.

And, unfortunately, they are not alone. I hear people – including policymakers – advocate constantly for the use of value-added in teacher evaluations or other high-stakes decisions by saying that “research shows” that there are huge differences between “good” and “bad” teachers.

This overall variation is a very important finding, but for policy purposes, it doesn’t necessarily mean that we can differentiate between the good, the bad, and the average at the level of individual teachers. How we should do so is an open question.

Conflating the importance of teacher quality with the ability to measure it carries the risk of underemphasizing all the methodological and implementation details – such as random error, model selection, and data verification – that will determine whether value-added plays a productive role in education policy.

These details are critical, and way too many states and districts, like the Los Angeles Times, actually seem to be missing the trees for the forest.

This--"This overall variation is a very important finding, but for policy purposes, it doesn’t necessarily mean that we can differentiate between the good, the bad, and the average at the level of individual teachers"--is one of the most important sentences in the debate.

To ISOLATE direct and singular causation between ONE teacher and ONE student is impossible, given the time and money required to control for all the factors in the student-outcome mix. This is complicated and counter-intuitive, but it is simply a fact.

This does NOT discount that we should be addressing teacher quality and student outcomes--just not in the simplistic and mechanical ways we are hearing and have been doing for a century.

One reason why the LA Times, and the Los Angeles community, might have dismissed Briggs' critique without a second thought: the Los Angeles school district has spent $500,000 per teacher ($3.5 million total) trying to fire just SEVEN of the district's 33,000 teachers for poor classroom performance.

The LA community knows full well the extreme lengths the teachers union, and their allies, will go to protect incompetent and ineffective teachers.

"When we fought to change the seniority-based layoff system that was disproportionately hurting our neediest students, the teachers union fought back.

When we fought to empower parents to turn around failing schools and bring in outside school operators with proven records of success, the teachers union fought back.

And now, while we try to measure teacher effectiveness in order to reward the best teachers and replace the tiny portion who aren't helping our kids learn, the teachers union fights back.

It's not easy for me to say this. I started out as an organizer for UTLA (United Teachers Los Angeles), and I don't have an anti-union bone in my body. The teachers unions aren't the biggest or the only problem facing our schools, but for many years now, they have been the most consistent, most powerful defenders of the unacceptable status quo."

"It also is worth noting that the policy center is partly funded by the Great Lakes Center for Education Research and Practice, which is run by the top officials of several Midwestern teachers unions and supported by the National Education Assn., the largest teachers union in the country and a vociferous critic of value-added analysis."

Spinning one's wheels in an effort to discredit someone is not the same thing as successfully refuting an argument. Objecting to a flawed system of measuring employees' effectiveness with the attendant consequences is not the same as defending the status quo.

Seems like a certain FrankB is monopolizing the comment section by defending a study that has been proven statistically flawed not only by those mentioned in this article, but by others, including the people who warned NYC against using such measures to evaluate teachers.

Are there some bad apples? Yes, but as this study proves, a very effective teacher can falsely be labeled ineffective. The LA Times seems to be covering their butts rather than admit the truth.

And Frank, effectiveness should never be based on passing scores but on the amount of progress a student makes in that year. A student making a year or more's growth can still fall below the passing grade of a standardized test, yet that is significant growth. That is why other methods of assessing students should also be taken into account.

Progress should always be rewarded, and that student should be made to feel proud of that achievement and work towards reaching or exceeding grade level. People acquire learning at different rates, and their journeys are different. To judge a child or teacher by that one test is really an injustice.

"A Christie administration task force is recommending New Jersey teachers be judged half on student test scores and half on observations of teachers and other methods.

The evaluation system laid out in the group’s report, if approved by the state Legislature, would affect teachers’ pay and tenure. In what his administration is pitching as the “year of education reform,” Gov. Chris Christie is looking to make it easier to fire teachers, create charter schools and pay teachers based on job performance rather than seniority."

What about the fact that many students couldn't care less about their performance on the state standardized tests used for value-added measures of teacher effectiveness? Sure, elementary school students will toe the line, and some middle and high school students will accept the rhetoric of performing well for their school's sake, but many will get bored of reading long passages and doing monotonous calculations that don't figure into their grades or their chances of graduation or college acceptance, and will reduce their effort or, worse yet, start guessing randomly so they can sleep for an extra 35 minutes. I have to admit, if I were a student, I wouldn't take these tests very seriously, especially in subjects that I am disinterested in or perform poorly in. Never mind the fact that these scores aren't accurate reflections of teacher effectiveness; in many cases they're not even accurate reflections of students' mastery of the tested standards.

After reading a NYTimes article on how Bloomberg is buying off reporters, it didn't surprise me the LA Times won the award.

But I really don't think you understand that you cannot evaluate teachers based on one test. First, understand that no teacher wants an ineffective teacher in the mix. But the principals, not the union, do the evaluations. I wonder how many excellent teachers made the list as ineffective? And what other profession has a list published in the papers? All we want is a fair and balanced evaluation process. The LA Times changing their methods only proves they were wrong the first time. And will probably be wrong again.

What boggles the mind is that people still seem to think kids are vessels that information can just be poured into. Kids are kids. Some care about their work, some don't. Some try, some don't. Others are unable to focus for more than a minute or two. Oftentimes, the teacher is great, but disruptive students make for an impossible learning environment.

"I hear people – including policymakers – advocate constantly for the use of value-added in teacher evaluations or other high-stakes decisions." Another partial truth in the ed reform debate in an attempt to substantiate one's POV.

Most commentators, including Diane Ravitch, in any discussion regarding VAMs used in evaluating teachers, have advocated for using them only as part of a "mixed measures" approach, combined, of course, with subjective administrative evaluations. The degree to which VAM is used in a system is determined by the local collective bargaining agreement, not dictated by the administration. As well, many "experts" also advocate for examining these results over time (at least 3-5 years) for more validity, since one or two years of data has little validity. Did Mr. DiCarlo mention this in his piece?

It's also worth noting that random student placement is critical for any degree of effectiveness using a VAM. The "problem" students should not be placed with the same teacher year after year under the pretense they're more capable of handling these students. Does that teacher get more pay for any of this? Almost never.

Another aspect of VA measures almost never discussed is whether raw scores or percentage of growth from year to year is the accepted practice. For obvious reasons, percentage growth of a student is a more realistic view of a student's progress as opposed to their raw score, especially when dealing with youngsters from the lower learning cohort.
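The raw-score versus growth distinction in the comment above can be made concrete with toy numbers. Everything in this sketch is assumed for illustration (the passing cutoff, the two hypothetical students, and their scores); it simply shows that a student can post large growth while remaining below the passing bar, while another can pass comfortably with almost no growth at all.

```python
# Hypothetical numbers for illustration only: raw passing status and
# year-over-year growth can tell opposite stories about the same students.
PASSING = 70  # assumed passing cutoff on a 100-point scale

# (last year's score, this year's score) for two hypothetical students
students = {
    "A": (40, 60),  # below passing both years, but 20 points of growth
    "B": (85, 86),  # comfortably passing, but almost no growth
}

for name, (before, after) in students.items():
    growth = after - before
    status = "passing" if after >= PASSING else "below passing"
    print(f"Student {name}: {status} at {after}, growth {growth:+d}")
```

On a raw-score basis, only Student B looks successful; on a growth basis, Student A is the one making real progress, which is the commenter's point about judging youngsters from the lower end of the score distribution.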

If you're going to discuss an issue, an attempt needs to be made to give both sides of the question and then allow the reader to judge for themselves.

There is no test that can accurately measure the progress of each child in the class while evaluating the teacher at the same time. Yes, "value added" attempts to do this, but it is far from accurate at this time. But don't take my word for it: ask any testing expert.

It's not that difficult to evaluate a teacher, but the task requires time and expertise. That should be obvious to everyone.
