By the end of her second year at MacFarland Middle School, fifth-grade teacher Sarah Wysocki was coming into her own.

“It is a pleasure to visit a classroom in which the elements of sound teaching, motivated students and a positive learning environment are so effectively combined,” Assistant Principal Kennard Branch wrote in her May 2011 evaluation.

He urged Wysocki to share her methods with colleagues at the D.C. public school. Other observations of her classroom that year yielded good ratings.

Two months later, she was fired.

Wysocki, 31, was let go because the reading and math scores of her students didn’t grow as predicted. Her undoing was “value-added,” a complex statistical tool used to measure a teacher’s direct contribution to test results. The District and at least 25 states, under prodding from the Obama administration, have adopted or are developing value-added systems to assess teachers.

When her students fell short, her low value-added score outweighed her strengths in the classroom. Under the D.C. teacher evaluation system, called IMPACT, the measurement counted for 50 percent of her annual appraisal. Classroom observations, such as the one Branch conducted, represented 35 percent, and collaboration with the school community and schoolwide testing trends made up the remaining 15 percent.
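The following is a minimal Python sketch of why a 50 percent weight matters. It makes two assumptions. First, each component has already been converted to a 1-to-4 scale. Second, the overall score is a simple weighted average. The article gives only the weights. IMPACT's actual conversion rules and rating cutoffs are not described here, and the names `WEIGHTS` and `composite` are hypothetical.

```python
from typing import Mapping

# Weights as reported for IMPACT; the component names are hypothetical labels.
WEIGHTS = {
    "value_added": 0.50,   # individual value-added (student test growth)
    "observations": 0.35,  # classroom observations
    "community": 0.15,     # school community collaboration + schoolwide testing trends
}


def composite(scores: Mapping[str, float]) -> float:
    """Weighted average of component scores, each assumed to be on a 1-to-4 scale.

    Illustrative simplification only, not IMPACT's published formula.
    """
    for name, value in scores.items():
        if not 1.0 <= value <= 4.0:
            raise ValueError(f"{name} score {value} is outside the 1-to-4 range")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Lowest possible value-added, perfect marks on everything else.
print(composite({"value_added": 1.0, "observations": 4.0, "community": 4.0}))
```

Under these assumptions, take a teacher with the lowest possible value-added score and perfect marks on everything else. That teacher lands at 2.5, exactly midway on the scale, because half the weight is fixed by test data.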

Her story opens a rare window into the revolution in how teachers across the country are increasingly appraised — a mix of human observation and remorseless algorithm that is supposed to yield an authentic assessment of effectiveness. In the view of school officials, Wysocki, one of 206 D.C. teachers fired for poor performance in 2011, was appropriately judged by the same standards as her peers. Colleagues and friends say she was swept aside by a system that doesn’t always capture a teacher’s true value.

Proponents of value-added contend that it is a more meaningful yardstick of teacher effectiveness — growth over time — than a single year’s test scores. They also contend that classroom observations by school administrators can easily be colored by personal sentiments or grudges. Researchers for the Bill & Melinda Gates Foundation reported in 2010 that a teacher’s value-added track record is among the strongest predictors of student achievement gains.

That is why D.C. school officials have made it the largest component of their evaluation system for teachers in grades with standardized tests. The District aims to expand testing so that 75 percent of classroom teachers can be rated using value-added data. Now, only about 12 percent are eligible.

“We put a lot of stock in it,” said Jason Kamras, chief of human capital for D.C. schools.

Yet even researchers and educators who support value-added caution that it can, in essence, be overvalued. Test results are too vulnerable to conditions outside a teacher’s control, some experts say, to count so heavily in a high-stakes evaluation. Poverty, learning disabilities and random testing-day incidents such as illness, crime or a family emergency can skew scores.

The District attempts to compensate for some of these factors, weighing special education status, English proficiency, attendance, and eligibility for free or reduced-price lunch — a common proxy for poverty — in developing growth predictions for students.

But some experts say value-added should never be a decisive factor in a teacher’s future.

“It has a place, but I wouldn’t give it pride of place,” said Henry Braun, professor of education and public policy at Boston College. He contends that only random assignment of teachers and students — wholly impractical in big school systems — can eliminate enough bias and error to obtain a valid measure of how much teachers improve student performance.

Some states are taking a more conservative approach than the District. New York recently set value-added at 20 percent of annual evaluations. Tennessee and Minnesota have set the ceiling at 35 percent. Other states, such as Colorado and Ohio, require that 50 percent of teacher assessments be based on student growth data but leave it up to local school districts whether to use value-added or other measures.

“You can get me to walk down the road with you to say value-added is relevant, but 50 percent is too weighted,” said Washington Teachers’ Union President Nathan Saunders.

Kamras said the disconnect between the observations of Wysocki’s classroom and her value-added scores was “quite rare.” Most teachers with poor ratings in one area, he said, are also substandard in the other.

“It doesn’t necessarily suggest that anything wrong happened,” he said. “Sometimes it’s just not possible to know for sure.”

Wysocki said there is another possible explanation: Many students arrived at her class in August 2010 after receiving inflated test scores in fourth grade.

Fourteen of her 25 students had attended Barnard Elementary. The school is one of 41 in which publishers of the D.C. Comprehensive Assessment System tests found unusually high numbers of answer sheet erasures in spring 2010, with wrong answers changed to right. Twenty-nine percent of Barnard’s 2010 fourth-graders scored at the advanced level in reading, about five times the District average.

D.C. and federal investigators are examining whether there was cheating, but school officials stand by the city’s test scores.

Kamras acknowledged that the Barnard data are “suggestive” of a problem but said that without clear evidence, nothing could be done. Overall, he said that Wysocki was treated fairly and that her case does not reflect a deeper issue with IMPACT.

Wysocki was out of work for only a few days. She is teaching at Hybla Valley Elementary School in Fairfax County and came forward to tell her story because she believes it is one that D.C. teachers and parents should know.

“I think what it says is how flawed this system is.”

‘Needs to be clear’

Like many young educators, Wysocki struggled at first. The Chicago-born daughter of a physicist, she came to the District in 2009 from Washington state, where she was a teacher assistant in a private Waldorf school that minimized testing and focused on the emotional and ethical development of the whole child.

In D.C. schools, she found another culture entirely. IMPACT spans an exacting set of nine performance criteria covering virtually every aspect of pedagogy, including clear presentation, behavior management and skill at asking questions. Teachers are graded on a 1-to-4 scale (ineffective, minimally effective, effective and highly effective).

Wysocki’s 2009-10 evaluation was peppered with twos.

“Your instruction needs to be clear and differentiated to meet your students’ diverse needs,” Sean Precious, then MacFarland’s principal, wrote. “Instructional time should be maximized and student misbehavior should be minimized. Please review your IMPACT binder.”

For the year, classroom observers rated her just short of effective. Her value-added score was low. That left her with an overall rating of “minimally effective” for her rookie year. If it happened again, she would face dismissal.

MacFarland, on Iowa Avenue NW in the Petworth neighborhood, also was struggling. Four out of five students at the school come from families poor enough to qualify for meal subsidies. Fewer than three in 10 scored proficient on the 2010 city reading test.

But Wysocki got better in 2010-11, improving her ability to tailor lessons and gaining a reputation for her skill at managing multiple groups of children in various activity centers. She drew praise from Assistant Principal Branch for “new and innovative ways” of engaging parents, “dedicating a truly exceptional amount of time towards partnering with them,” through invitations to class events and by walking home students who live nearby.

“One of the best teachers I’ve ever come in contact with,” said Bryan Dorsey, head of the MacFarland PTA in 2010-11, who had a daughter in Wysocki’s class. “Every time I saw her, she was attentive to the children, went over their schoolwork, she took time with them and made sure.”

The twos from her first year’s classroom observations were replaced by threes and fours.

But Wysocki was worried. Some students who had scored advanced in fourth grade, she said, could barely read.

“I’m getting a little nervous about testing,” she wrote in an e-mail to Branch and new Principal Andre Samuels in February 2011.

Complicated system

The calculus that ended Wysocki’s career in D.C. schools started as a way of measuring the value of strawberries turned into jam. Value-added began in agriculture, where it was employed to establish the worth of farm products as they changed form. Statistician William Sanders pioneered its conversion to classroom use, starting in Tennessee in the early 1990s.

It’s complicated enough that D.C. schools hired Mathematica Policy Research of Princeton, N.J., to crunch the numbers for each of the 471 teachers in the District from fourth through eighth grades whose students took reading and math tests.

In Wysocki’s case, the firm took the fourth-grade scores of each student in her class and searched for all students in the city with the same scores. Then, after the students took the spring 2011 tests, Mathematica averaged the scores, weighting each student by the actual amount of time spent in her class and taking into account demographic variables.
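The matching procedure described above can be sketched in Python. This is a deliberately simplified illustration, not Mathematica's model. It assumes each student's prediction is the average spring score of every city student with the same prior-year score. It also ignores the demographic adjustments and weights each student by a hypothetical `share` of the year spent in the teacher's class. The names `Student`, `predictions_by_prior` and `value_added` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    prior: int          # last year's score (e.g., fourth grade)
    current: float      # this spring's score
    share: float = 1.0  # hypothetical: fraction of the year spent in this teacher's class


def predictions_by_prior(city: list[Student]) -> dict[int, float]:
    """Map each prior score to the mean current score of all city students who had it."""
    groups: dict[int, list[float]] = defaultdict(list)
    for s in city:
        groups[s.prior].append(s.current)
    return {prior: mean(scores) for prior, scores in groups.items()}


def value_added(roster: list[Student], city: list[Student]) -> tuple[float, float, float]:
    """Return (actual average, predicted average, gap), weighting students by time in class.

    `city` is assumed to include the teacher's own students, so every prior score has a match.
    """
    preds = predictions_by_prior(city)
    total = sum(s.share for s in roster)
    actual = sum(s.share * s.current for s in roster) / total
    predicted = sum(s.share * preds[s.prior] for s in roster) / total
    return actual, predicted, actual - predicted


# Toy data: four city students, two of whom are in this teacher's class.
city = [Student(40, 50.0), Student(40, 60.0), Student(60, 70.0), Student(60, 80.0)]
roster = [city[0], city[2]]
print(value_added(roster, city))  # (60.0, 65.0, -5.0)
```

A negative gap means the class scored below what the matched students predicted. Because each prediction rests on the prior-year score, an inflated fourth-grade score would raise the bar a student is expected to clear.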

Wysocki’s actual average reading score was 54.2 out of 99, less than Mathematica’s predicted average of 59. Her math score, 56.2, was more than 6 points shy of the forecast. Her classroom observation score was 3.2 out of a possible 4, but she was still rated minimally effective and fired in July.

Wysocki was furious. “I want to know how my IVA [Individual Value-Added] can be so OUTRAGEOUSLY different from ALL my other data,” she wrote to the central office on July 19.

School officials said that if she had concerns about cheating, she should have alerted the Office of Data and Accountability with specific information, including names of teachers and students from whom she heard allegations of cheating. They said that the office told her this in July 2011 but never heard back from her.

She appealed her dismissal in August to a three-member panel of central office staff, writing a detailed letter outlining concerns about possible cheating on the Barnard scores.

“I was under the impression that the letter with my appeal would be enough to prompt an investigation,” Wysocki said.

It was December before she learned that the firing was upheld. Panel members repeated that Wysocki should have gone to the accountability office sooner. But the panel added that it wouldn’t have mattered.

“The Board and the Chancellor note that investigations of cheating are outside the scope of the Chancellor’s appeals process,” the panel wrote. “As a result, the [value-added] score remains valid.”

Colleagues said they were stunned to hear of the firing. MacFarland’s other fifth-grade teacher, who was also highly regarded by administrators, was let go as well. Teachers said they were bewildered because Samuels and Branch had repeatedly pointed out the progress fifth-graders were making in reading.

“It was celebrated within the school,” said one of several former colleagues of Wysocki’s who spoke on the condition of anonymity to avoid reprisals.

Samuels, who did not respond to calls or e-mails for comment, left Wysocki a sterling recommendation. He endorsed her “without reservation” and described her as “enthusiastic, creative, visionary, flexible, motivating and encouraging.”

Wysocki said she is comfortable in Fairfax. Hybla Valley Elementary, in the Alexandria section of the county, is not a cushy suburban posting. Of her 18 fifth-graders, half are children of immigrants. She is taking courses toward a master’s degree in education at Trinity University.

“We feel fortunate to have Sarah supporting the students at Hybla Valley,” Principal Lauren Sheehy said in an e-mail. “She is a positive and valued team player. Sarah has created a positive learning environment allowing students to be successful academically and socially.”

Wysocki said she feels more at ease generally, especially in seeking help from Sheehy and other mentors. In the District, she said, she often felt that reaching out was considered a sign of weakness.

“Teaching is an art,” she said. “There are so many things to improve on.”