LA Times reporters respond to my questions on school ratings

I have been avidly following the Los Angeles Times' publication of value-added data from elementary schools in the L.A. district. After Sunday's latest installment, I posted an appreciation of what they were doing, but expressed frustration with some gaps in the story and what I considered at least a slight misrepresentation of some of their data. In particular, I thought they had failed to address the statistical distortions that can arise when measuring how much already high-performing schools raise student achievement beyond those students' previous levels.

The reporters, Jason Felch, Jason Song and Doug Smith, responded Tuesday. Here is what they sent me:

Thanks for these good questions. As you suggested, some of the broader context you’re wanting will be available when the school and teacher database goes up later this week. Here are some answers for the interim:

Are the average losses by students at Wilbur – 10 percentile points in math and 3 in English over three years – meaningful? We said in our first story that teachers matter most, and have three times more impact on student scores than schools. So the lower school effect numbers should not come as a surprise. They may strike you as small, but they paint a dramatically different picture than the school’s API score, and we think they will be very meaningful to parents. Also, keep in mind that school effects, while small, are far more stable than teacher effects because they’re based on hundreds and hundreds of individual students.

You’re right to point out that students at Wilbur still end their time at the school as high achievers, even after sliding while there. But don’t we expect all children at schools to grow, not only those who are below proficient? Value-added reveals what raw achievement tests obscure: many high achieving kids at this and other schools are not being challenged. The gains of low-achieving students are often ignored.

You have several questions relating to the so-called “ceiling effect.” Are kids at the top of the achievement spectrum topping out the test, and therefore unable to make significant gains? Like other research in this area using California data, we found no evidence of that, as the story says. Presumably, if there were a ceiling effect you would see mostly low-achievement schools at the top of the growth list, while high-achievement schools would be pushed down. We found the opposite: many of the top schools in the district by growth also topped the lists of API scores. If anything, schools with lots of high achievers appear to have an easier time showing growth. (This may be because gifted kids have a very small but statistically significant advantage on growth scores, as you can see in our methodology paper.) We focused our report on those schools where the API was misleading, and might have made this point more clearly in the story. As you hinted, space limitations didn’t allow for that elaboration. But it will be very clear when parents start using the data on our database.

As for outside research on ceiling effects with California’s state test, see Cory Koedel & Julian Betts, 2009. "Value-Added to What? How a Ceiling in the Testing Instrument Influences Value-Added Estimation," NBER Working Papers 14778, National Bureau of Economic Research, Inc.
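The mechanism the Koedel and Betts paper studies can be sketched in a toy simulation, with entirely made-up numbers (a hypothetical 600-point ceiling, invented classroom averages, and a uniform 20-point true gain; none of this is drawn from the actual California test):

```python
import random

random.seed(0)

CEILING = 600  # hypothetical maximum scale score; not the real test's ceiling

def observed(true_score):
    """A censored test cannot report achievement above its ceiling."""
    return min(true_score, CEILING)

def mean_observed_gain(prior_scores, true_gain=20):
    """Average measured gain when every student truly gains `true_gain` points."""
    gains = [observed(s + true_gain) - observed(s) for s in prior_scores]
    return sum(gains) / len(gains)

# Two classrooms with identical true growth but different starting points.
low_start = [random.gauss(400, 30) for _ in range(100)]   # far below the ceiling
high_start = [random.gauss(580, 30) for _ in range(100)]  # bunched near the ceiling

print(mean_observed_gain(low_start))   # essentially the full 20-point true gain
print(mean_observed_gain(high_start))  # compressed, because many scores are censored
```

If a ceiling effect were present, the second number would be pushed well below the first, making high-achieving classrooms look stagnant; the reporters' point is that the district's growth rankings showed no such pattern.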

You ask about the precision of the data in several places. We’re working hard to make sure the information we’re going to make public is understood in the proper context, and with the appropriate caveats in place. For example, teachers and schools will not be receiving numbered percentile rankings, which imply a level of precision that is not possible with these estimates. Rather, their relative position will be shown along a spectrum of effectiveness in a way that visually indicates the inherent imprecision of these estimates. A Q&A will also inform parents how they might balance this information with other indicators of teacher and school effectiveness. For more technical readers, our methods paper (www.latimes.com/teachermethod) gives the actual effect sizes found in our study, expressed in standard deviations from the mean.
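The display choice the reporters describe, a band rather than a single percentile rank, can be sketched like this; the 0.3 SD effect and 0.15 standard error are invented illustrative numbers, and the sketch assumes teacher effects are roughly normally distributed across the district:

```python
from statistics import NormalDist

def percentile_band(effect_sd, se, z=1.96):
    """Turn a value-added estimate (in SDs from the mean) and its standard
    error into an approximate 95% percentile range, not a single rank."""
    dist = NormalDist(0, 1)
    lo = dist.cdf(effect_sd - z * se) * 100
    hi = dist.cdf(effect_sd + z * se) * 100
    return round(lo), round(hi)

# A hypothetical teacher estimated at 0.3 SD above the district mean:
print(percentile_band(0.3, 0.15))  # a wide band straddling "above average"
```

Reporting the band rather than, say, "the 62nd percentile" is exactly the kind of visual hedge the reporters say their database will use.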

A few points we would like to make in addition to those you touched on:

There has been much debate about the reliability of value-added approaches, and there are ample ways to show it is a far from perfect measure. But shouldn’t we really be comparing it to the status quo for the vast majority of school districts – occasional, pre-announced and subjectively evaluated classroom visits, once every few years? And what of the reliability of some of the other “multiple measures” everyone agrees should be a component of teacher evaluations? What is the “error rate” of parent and student surveys, portfolio reviews, and the various observation rubrics? The answer: nobody knows. That’s why the Gates Foundation is spending millions to test them, using value-add as a baseline. When compared to these other largely unstudied measures of teacher effectiveness, experts tell us value-added vaults to the top of the class.

It is a random-assignment experimental validation of several value-added methodologies. The conclusion: “While all of the teacher effect estimates we considered were significant predictors of student achievement under random assignment, those that controlled for prior student test scores yielded unbiased predictions and those that further controlled for mean classroom characteristics yielded the best prediction accuracy.” Teacher effects, they also found, faded by 50 percent per year.
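Taken at face value, the fade-out finding quoted above implies simple geometric decay. Here is a back-of-the-envelope sketch; the 0.2 SD initial effect is an arbitrary illustrative number, not a figure from the study:

```python
def remaining_effect(initial_effect_sd, years, fade=0.5):
    """Portion of a teacher's effect still visible `years` later,
    assuming it fades by 50 percent per year as the study reports."""
    return initial_effect_sd * fade ** years

# A hypothetical 0.2 SD teacher effect shrinks quickly:
for t in range(4):
    print(t, remaining_effect(0.2, t))
```

One implication, worth keeping in mind when reading multi-year averages, is that most of a single year's teacher effect is gone within two to three years.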

Finally, value-added measures are being adopted by school districts across the country, and usually in secrecy. They are complex, and depend greatly on a variety of decisions made along the way. Yet both the results and the methodology have often been kept away from teachers, parents, experts, and others. As you know, the failure of DC schools to make their technical methodology public before using the scores to fire teachers led to quite a ruckus in your town.

Given that history, we thought it important that the approach be publicly vetted and discussed, even if it meant fielding technical challenges for months to come. We’ve gone to lengths to be transparent about our process and decision making along the way. For fairness, for months we’ve had a team of web and data people building a back-end verification system that allows us to give teachers an opportunity to view and comment on their data before it is made public. We’ve made our methodology public and welcomed the scrutiny of experts across the country. And we’ve gone to lengths to explain those methods in lay terms to our readers.

Thanks for your interest in the stories.

Me again: They also said they would try to get me some of the data I thought was needed on high-performing schools that did well on their measures, compared to Wilbur Elementary, the school their latest story focused on, which did not do so well. I will post that when I get it.

I find it interesting that the reporters have said nothing about the fact that there is usually little or no security around these tests. They are often the same, or nearly the same, from year to year. They often sit in school offices or classrooms for days before and/or after administration. As we can tell from Atlanta, DC, and many other cities, this has led to all kinds of test abuses and invalidation. The most common abuse is drilling the students on the exact test items from September to May. Of course, this invalidates the test. I might be wrong, but I have a strong suspicion that the reporters just assumed that the tests were administered correctly and just accepted the scores as "valid."

If the scores on these tests are to be made public, teachers and other citizens must insist that (1) the tests are designed to measure teacher effectiveness, (2) they are different from year to year, and (3) they are professionally proctored and administered.

Although teachers are usually very honest people, some are not. We need to know if "Miss Smith" is the best teacher in the school, or the one who drilled her students on the exact test items.

Today I made a donation to FairTest, an organization that "is dedicated to preventing the misuse of standardized tests." I'm also going to write to my representatives in California and Congress about the importance of ensuring the integrity of high-stakes tests. I hope others do the same.

Jay, I feel like the reporters played a little language game with you over the question of whether the losses at Wilbur were "significant" or not. I understood you to mean statistically (or at least analytically) significant, but their response uses the word in its less technical sense, as a synonym for "important."

Am I wrong about this? If the loss isn't statistically or analytically significant, then it shouldn't be important, either.

"But shouldn’t we really be comparing it to the status quo for the vast majority of school districts – occasional, pre-announced and subjectively evaluated classroom visits, once every few years?"

Of course not, and you know it. Unions are offering a variety of ways to fire ineffective teachers. The best would be peer review, which I support. I want my union to litigate to the end against VAMs in the hands of management, but I'd support the Grand Bargain where VAM data is controlled by peer review committees.

You repeatedly address that straw man, but you slide past your better questions:

"And what of the reliability of some of the other “multiple measures” everyone agrees should be a component of teacher evaluations? What is the “error rate” of parent and student surveys, portfolio reviews, and the various observation rubrics? The answer: nobody knows."

But then you assume that those methods, in the hands of the National Board or the judgments made at Wilbur, do not show that your VAMs are not ready for prime time. Where do you raise the possibility that Caruso and Wilbur could be taken as evidence that your VAM is often flat wrong?

That is a very real scenario but you can't raise it because you've already invested a year's worth of time and money on your theories. And the same would apply to school systems that invest RttT and other money. Once you've paid billions for the system, you have to fire people with it or admit you were wrong.

The better comparison would be the spending of millions - not billions - on diagnostic testing that is proven - not theorized - to be valuable in informing instruction. In fact, our current CRT tests are so primitive, I suspect that's a reason why this testing craze has failed so miserably.

Answer the question of the ceiling effect in conjunction with the admission you made that the LA tests focus on low achievers and are designed to help them.

I raised the question of the ceiling effect at Wilbur because common sense says the model overshoots. I contrasted it with two other measures of that school used over the last three years. Did you ask that question BEFORE going to print? Of course not. How many other thousands of questions have you not addressed?

I used this opportunity to reread the long pieces on firing teachers. I found one passage, as I recall, of administrators doing a poor job of making their case even though it should have been an easy case to make. Then you seem to take their word that they don't take on the harder task of firing teachers for ineffectiveness. If they can't make a case with these extreme cases, perhaps that speaks to the incompetence and dysfunctionality of the administrative systems - the systems that you want to handle the much more complex VAM data.

I've worked with my AFT for over a decade trying to get the central office to let us help them fire bad teachers.

It is a random-assignment experimental validation of several value-added methodologies.
..................................
One tires of this idiocy.

A random assignment would be to randomly assign every student in every classroom in every public school in an area.

Then one would have random assignment where each classroom mirrors the results of all students on a previous test.

If the failure rate for all students on the 4th grade test was 33 percent, this would mean that every 5th grade classroom in every school would have 33 percent of students who failed the previous 4th grade test.

This is not done, since students are not moved to classrooms on a random-assignment basis. None of the students in school A go to classrooms in school B.

The failure rates at different schools are usually different. The total failure rate for all students may be 33 percent on a previous test, but some schools will have higher failure rates while other schools will have lower ones.

Imagine comparing a classroom where 50% of students failed a previous test with a classroom where 25% did.

This is the usual for Jay Mathews where he simply presents fake data that is built on obvious mathematical flaws.

Random assignment would mean every classroom in every school having the same composition of students based on the results of previous tests.

You will not have random assignment in classrooms until teleporters are created to allow students to be randomly assigned to classes in each school.
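The commenter's objection, that classrooms starting with different shares of prior failers cannot be fairly compared on raw results, can be made concrete in a toy simulation. Every number below is invented: a pass mark of 50, prior failers centered at 30, passers at 70, and two equally effective hypothetical teachers who each add 10 points:

```python
import random

random.seed(1)

PASS_MARK = 50
TRUE_TEACHING_GAIN = 10  # both hypothetical teachers are equally effective

def make_class(n, share_prior_failers):
    """Prior-year scores: failers centered at 30, passers at 70."""
    scores = []
    for i in range(n):
        base = 30 if i < n * share_prior_failers else 70
        scores.append(random.gauss(base, 5))
    return scores

def end_of_year(prior_scores):
    """Both teachers add the same true gain to every student."""
    return [s + TRUE_TEACHING_GAIN for s in prior_scores]

def pass_rate(scores):
    return sum(s >= PASS_MARK for s in scores) / len(scores)

def avg_gain(prior_scores):
    gains = [e - p for e, p in zip(end_of_year(prior_scores), prior_scores)]
    return sum(gains) / len(gains)

class_a = make_class(30, 0.50)  # half the class failed the prior test
class_b = make_class(30, 0.25)  # a quarter of the class failed the prior test

# Raw end-of-year pass rates make teacher A look worse...
print(pass_rate(end_of_year(class_a)), pass_rate(end_of_year(class_b)))

# ...while growth shows both teachers added exactly the same amount.
print(avg_gain(class_a), avg_gain(class_b))
```

This is the confound the commenter is pointing at: with non-random assignment, raw comparisons are meaningless. The reporters would answer that conditioning on each student's prior score, as value-added models do, is precisely the correction for it.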

good comments. for richardcmiller I did mean statistical significance, but we will get to that issue. Once they release the database there will be a lot more on that. I also hope to put up a post next week of comments from people with expertise who have taken an interest in this, but are perhaps too shy to comment here.

Oh, for god's sake, Linda Retired Teacher, get over yourself. If schools were committing fraud on a massive basis, test scores would be much higher. Instead, it's extremely easy to identify test fraud, and most culprits are generally identified. Read up on the Atlanta scandal.

You're so big on declaring yourself a teacher, as if it's a great fraternity. Well, teachers have to sign a paper attesting that they didn't cheat, that they didn't look at student results, that they didn't alert the students to test questions, and so on.

The resistance we are seeing from teachers regarding value-added evaluations is similar to the resistance I've seen in the healthcare system as outside entities invade hospitals and clinics and begin to impose standards on providers.

Closer examination of teacher performance is coming. Teacher unions can either work with it to try to bring fairness and sanity to the evaluations, or they can oppose it and be left out of the process of developing new teacher evaluation methods.

Here is the question I have when discussing data and schools: so what? What does a student scoring a 600 on the Virginia 3rd grade science SOL tell me? Is this data telling me that the teacher is a good teacher, the student is a good memorizer, the student is a good test taker, the student enjoys science, or the student couldn't care less about science and finished the test to get to lunch? Analyzing and over-analyzing data is too much of a focus in education and may not really tell educators or administrators anything.

"We said in our first story that teachers matter most." That is a false statement, according to the research. Curriculum and effective preschool interventions matter most. See: http://www.brookings.edu/papers/2009/1014_curriculum_whitehurst.aspx

You are exactly right, and you're backed by an incontrovertibly large body of social and cognitive science. "Reformers" typically use a caveat or two when they repeat that sentence. But notice how the truth is overlooked. The sentence in this post that you cite is literally false.

I've seen the same pattern even in the NYT Magazine or the New Yorker. Their standard source, The New Teacher Project, always puts wiggle words into the statement that teachers have the largest effect we can influence, and I'd argue that even that is only technically accurate. The true statement is that the teacher effect is the strongest effect we can control given the way our schools are constituted, given our failure to invest in preschool and community schools and our insistence on remaining focused within the four walls of the classroom. So in its published work the TNTP is not inaccurate. Invariably, though, TNTP's Jacob Weisberg and/or Tim Daly are then quoted, and they make the manifestly false statement. So the reporter invites an inaccurate quote and lets it stand.

This gets back to the discussion of the word significant. Yes, teacher effects are significant. And they can be big. We can debate whether they are 6% or 16% or whatever. But they are still very small in comparison to other factors.

We should be seeing teaching and learning as a team effort, but this blame and shame game is killing that approach.

As I'm explaining in my book, the best way to increase student performance is to stop making stupid unforced errors. The football team that wins doesn't "lay the ball on the ground." The fear engendered by VAMs and naming names will be devastating.

Also, the "reformers'" presentations were heavily influenced by the McKinsey Group, the international consultants that gave us Enron and helped break unions in England. They claim that schools alone can close the achievement gap. Do you know what their sole evidence is? The only evidence I could find in their research was that Ohio Hispanics have higher test scores than Whites in my state of Oklahoma and several other Southern states. Ohio has few Hispanics, though, and fewer immigrants, and their family income is 50% greater (if I recall correctly) than Whites' income down here.

I am struck at the number of people who simultaneously believe that teachers don't really have much influence -- it is all socioeconomics -- but that as a society we need to drastically increase teacher salaries to get and keep better teachers in the profession. Why would we pay more if we believe that teachers really don't make a difference?

Teachers want it both ways -- they want to be treated as professionals with high pay, but they don't want to be held accountable for results. I don't think that is going to fly anymore.

Ten or fifteen years ago the teachers' unions had the chance to work together with districts and states to agree on methods to weed out bad teachers, and the unions fought tooth and nail to protect every incompetent teacher, throwing obstruction after obstruction in the way of getting rid of them. Well, the tide is finally turning, and it may not go as the unions like, but if they don't like the way it works they have only to look in the mirror to know who to blame.

good comments. for richardcmiller I did mean statistical significance, but we will get to that issue. Once they release the database there will be a lot more on that. I also hope to put up a post next week of comments from people with expertise who have taken an interest in this, but are perhaps too shy to comment here.

Posted by: Jay Mathews | August 25, 2010 7:10 PM
...................
The usual from Jay Mathews. Point out a glaring mathematical flaw in the data and Jay Mathews simply ignores the flaw and pretends we should simply accept the flawed data and the conclusions from the flawed data.

On this basis one could create articles on anything, with a pretense that the data means something when major flaws are simply ignored.

Investors lost about 4 trillion dollars on worthless bundles of mortgages with supposedly mathematical proof that these mortgages would be profitable.

Apparently the problem with math taught in the public schools has been with us for some time since Americans appear so willing to accept claims with obvious significant mathematical flaws.

"In other fields, we talk about success constantly, with statistics and other measures to prove it," Duncan said.
..............................
Yes, and where are all the statistics and other measures to prove the effectiveness of our policy in Afghanistan, or of our policies regarding massive unemployment?

Oh but I forgot we have no policy on massive unemployment.

In the past our President has only offered "jobs follow growth".

Any American in this country would be fired for such an ineffective attitude toward dealing with problems.

Yes, more flawed data regarding public education to simply cover up the problem of children who have great difficulty in learning. See, it's not your child or the school, it is the teacher. Just get the right teachers and the 56 percent of students who fail the national tests of 4th grade reading will miraculously learn to read in the fifth grade.

The value added evaluations seem to be based on a mistaken idea that good teachers get kids to score well. Only if the test is a good one. I am starting to despair about education and these quotes I am reading from Arne Duncan don't help. Why is he acting like kids are some sort of product, like automobiles?

To see just a little inkling of what bsallamack writes about, please read this article:
Challenges clear for School 61 kindergartners
http://www.indystar.com/article/20100824/NEWS1003/8220394/1013/NEWS04

Short excerpt:
One little boy needed to be taught how to hold a crayon.

One little girl -- when told to put her blocks back on the shelf -- responded with a blank stare. She had no idea what a shelf was.
One boy was given a box of little three- and four-piece puzzles that many toddlers can do. He held two pieces up in the air. But he had no idea what to do with them. He'd never seen a puzzle.
Other children arrived at Indianapolis Public School 61 for the first day of kindergarten ready to go. They knew how to color between the lines, how to write their own name, how to count to 10 or higher.

As much as kindergarten is about starting school, it is also a gauge of how we raise our babies. At kindergarten, you can easily tell which child has been read to, been talked to, been loved. You can tell which children get enough sleep, which are hungry and which have been to a doctor or a dentist.

On his first trip to the library, he wandered to the back of the room and started spinning in a chair. Unresponsive to the librarian, he earned a trip to the principal's office -- two hours into his school career.

It wouldn't be his last. By the end of his first week, the boy called a lunchroom worker a name that included the F-word and was reported by a bus driver for dangerously waving sticks around other kids at the bus stop.

But for most of the children, that's not where they are. And really, said Indiana University professor of early childhood education Mary Benson McMullen, that shouldn't matter to teachers.

"Children don't need to be ready for you," she said. "You need to be ready for each and every child who walks through your door."
...............................
The insanity of educators who simply believe normal children in a classroom are not affected when disruptive and/or violence-prone children are simply tolerated in classes of normal children.

When do the educators deal with the problem, instead of making normal children suffer from their neglect?

A parent would be neglectful if they allowed their child to stay with a child that was disruptive and/or prone to violence, but the public schools are doing their job when they simply ignore the problem.

During one play period, a dispute arose between a boy and a girl about some foam blocks.

The boy, who wanted them all to himself, picked up a hardcover copy of "The Cat in the Hat" and whacked the girl with the book -- twice, on her leg.

When she refused to give in, the boy thrust a clenched fist into her face.

Then, he grabbed her by the throat. His 5-year-old hands weren't big enough to cause real injury, but they were enough to force the girl to give up the blocks.

His triumph would be short-lived.

A second boy who had been playing with the girl earlier returned and found the blocks in new hands. He took them back.

The first boy couldn't allow this. He got up in the newcomer's face just as he had the girl's.

This time, though, he had messed with the wrong kindergartner.

The second boy punched him square in the jawbone, knocking him back into a bookshelf. Stunned, the reeling lad felt his reddening cheek and then gathered himself.

With both boys on the floor, he retaliated with a two-footed kick to his challenger's chest.

Before the two adults in the classroom knew what was going on, the boys separated themselves. The girl and her knight -- in white sneakers -- got their blocks back.

Kindergarten has many lessons to offer. Some are taught by teachers. Some are taught by bullies. Some are taught by little boys who won't be pushed around.
.....................................
At five years old there are already signs of a problem, and it is totally ignored.

bk0512 wrote: Ten or fifteen years ago the teachers' unions had the chance to work together with districts and states to agree on methods to weed out bad teachers, and the unions fought tooth and nail to protect every incompetent teacher, throwing obstruction after obstruction in the way of getting rid of them.
__________________________
The local teachers' association in my district has worked with the district to develop a comprehensive evaluation system. Contrary to what people think, effective teachers do not want to protect ineffective teachers. They do, however, expect that due process will be followed when dismissing a teacher.

what neither the LAT reporters nor you, jay, have addressed is whether rating individual teachers publicly was (a) necessary for the story to make its point and have its impact, or (b) part of the expectation of the researcher involved -- there are at least two places in which buddin has been quoted saying he had no idea it was going to happen.

check out my annotated version of the LAT webchat, which suggests more than a few holes in the thinking we're being handed (and a mysterious absence of researchers who were supposedly consulted about naming individual teachers publicly):

Posted by: celestun100
...........................
I am surprised that any teacher in this nation still does not understand that teachers are the scapegoats, and that bashing the teachers' unions helps make teachers the scapegoat.

I thought every teacher recognized that this was the policy of the President.