Decoding the matrix: How the grid works and why it's not all about percentages

By Cathy Woodruff, Senior Writer

On Board Online • June 8, 2015


State leaders who crafted the latest version of New York's teacher evaluation system created a new way to combine two key ingredients - classroom observations and student test performance - into a single rating. It's called a matrix.

A well-designed matrix, some experts say, can be a strong mechanism for translating evaluations into actions that can help teachers and administrators improve teaching and learning in their schools.

But so far, New York's evaluation matrix has been a lightning rod for criticism, and it is often incorrectly characterized as increasing the role of student test scores in evaluating teachers.

"It's frustrating," said Bob Lowry, deputy director of the New York State Council of School Superintendents. When state officials were looking for ideas in March, the superintendents were among those who noted that Massachusetts uses a matrix and offered some specific ideas on how New York could emulate that approach.

"One of the attractions was that a matrix gets away from putting a number on each teacher," Lowry said. That's desirable, he said, because it avoids "the foolish supposition that someone who gets an 89 is better than someone who gets an 85."

During budget negotiations, staffs for the governor and the Legislature came up with their own version of a matrix, which was embedded in the education policy portion of New York's 2015-16 state budget.

It is true that New York's matrix combines two elements - observations and student testing - in a square box. But it is not accurate to say that the system assigns equal weight to observations and to student testing. Nor is it correct to say that the weight given to test results (part of a 40 percent student performance component in the old APPR system) increases to 50 percent under the new matrix; at least, it does not do so uniformly.

Several experts who examined the system at On Board's request say it all comes down to understanding how the matrix works.

"It is not the case that such a matrix equally weights observation and student performance," said Hamp Lankford, an economist and research professor with the Department of Education Administration and Policy Studies at the University at Albany.

Rather, Lankford said, the relative weights of observations and student performance in any single evaluation will depend on the scoring ranges and other decisions made about the measurements underlying the two variables.

There is no consistent formula that governs the outcome when the component ratings from the two categories are combined through New York's evaluation matrix. A teacher's "composite" score is determined by the value written on the square that lies at the intersection of the two component scores on a 4-by-4, 16-box grid (Figure 1).

While the values assigned to eight of the squares could be reached with a 50 percent weight (or mathematical average) for each of the two components, the mix is different for other squares.

Technically, that lack of mathematical consistency could disqualify New York's evaluation grid from even being categorized as a matrix, said Bruce Piper, an associate professor of mathematics at Rensselaer Polytechnic Institute.

"I would call it a table," Piper said. "A matrix would imply more of a numerical or mathematical relationship."

The people who negotiated the APPR portion of the budget approved by the Legislature and the governor decided which of the four possible "HEDI" ratings - Highly effective, Effective, Developing or Ineffective - would go into each box but did not explain how those rating values were determined.

Eight of the 16 combinations would translate, logically, into equal weights for both components. In four of those cases, identical component ratings yield the same composite rating. (Example: a teacher with ratings of "effective" on both components is "effective" overall.) In the other four, the composite result lies halfway between the two components. (Example: a teacher with one "highly effective" component and one "developing" component receives a composite rating of "effective.")

But the other eight combinations cannot translate into equal weights for both components. Four are pulled up by the higher score. (Example: a teacher rated "developing" on one component and "effective" on the other gets a composite rating of "effective.") Four are pulled down by the lower score. (Example: a teacher rated "developing" on one component and "ineffective" on the other gets a composite rating of "ineffective.")
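The lookup logic described above can be sketched in a few lines of code. This is an illustration, not the official table: it encodes only the rules and examples quoted in this article (identical ratings carry over, ratings two steps apart average out, and the two quoted adjacent-rating boxes), and it returns nothing for combinations the article does not spell out. The function name and structure are the author's own.

```python
# Ratings in order from lowest to highest, so positions can be compared.
HEDI = ["Ineffective", "Developing", "Effective", "Highly effective"]

# The two adjacent-rating boxes quoted in the article. The remaining boxes
# are set in regulation and are not reproduced here.
QUOTED = {
    frozenset({"Developing", "Effective"}): "Effective",      # pulled up
    frozenset({"Developing", "Ineffective"}): "Ineffective",  # pulled down
}

def composite(observation, student_performance):
    """Composite HEDI rating for the combinations described in the article."""
    i = HEDI.index(observation)
    j = HEDI.index(student_performance)
    if i == j:
        # Identical component ratings yield the same composite rating.
        return HEDI[i]
    if abs(i - j) == 2:
        # Two steps apart: the composite lies halfway between them.
        return HEDI[min(i, j) + 1]
    # Adjacent ratings follow no single rule; return only the quoted examples.
    return QUOTED.get(frozenset({observation, student_performance}))
```

For instance, `composite("Effective", "Effective")` returns `"Effective"`, and `composite("Highly effective", "Developing")` also returns `"Effective"` - the halfway point. The point of the sketch is that no one weighting formula reproduces all 16 boxes: the adjacent-rating cases have to be looked up, not computed.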

The weight of observation versus student performance "literally depends on which box on the matrix you are talking about," Deputy Education Commissioner Ken Wagner explained during a recent Regents meeting.

A matrix-style educator evaluation framework used in Massachusetts (Figure 2) shows how a different sort of design can shift the focus away from scores and place more emphasis on professional improvement. Lowry said it's more like the model NYSCOSS had in mind when the superintendents' group contemplated a matrix.

The Massachusetts version, which also relies on components that reflect observations (summative performance) and student growth and achievement (student impact), produces no explicit scores or overall composite ratings.

"The entire educational community was thrown a curveball when the governor and Legislature came up with the matrix, because it is structurally different from the past APPR format," said NYSSBA Executive Director Timothy G. Kremer.

Also, the timeframe for implementation (June 30 for the Regents and Nov. 15 for school districts) is unrealistic, Kremer said, which is why NYSSBA has called for a delay.

"The next hurdle is to get the details right," he said. NYSSBA has made recommendations to State Education Department officials as they prepare regulations for review and adoption by the Board of Regents, he said.

"To me, focusing on what percent student test scores represent in a given evaluation score misses the point," Kremer said. "The issue is: how good is the evaluation system, and do local officials have what they need to do the job right? We need teacher and principal evaluations that help people improve and serve the goal of improving student performance."