Although value-added
assessments are defensible for evaluating teacher effectiveness, student test
scores need not be the only measure of teacher quality. Principal and vice
principal evaluations can also help pinpoint good teaching, and policymakers who
face resistance to value-added assessment may want to consider offering to
include supervisor evaluations as well. As a practical matter, however, many of
the same groups that unremittingly point out flaws of value-added measurements
also argue that supervisor evaluations are biased and capricious.

Yet principal or vice principal evaluations are superior to peer evaluations or parent evaluations, which are more likely to suffer from subjectivity.[*] Research findings also suggest that principals are capable of measuring teacher
effectiveness.[†]

A recent RAND Corp. working
paper on merit pay by Richard Buddin and colleagues lists some potential
limitations to supervisor evaluations of worker effectiveness.[134]
The researchers explain that it can be difficult to correct for the inherent
subjectivity of any performance evaluation that involves individual supervisor
judgment. They add that problems can also arise when workers perceive favoritism
and that a subordinate’s personality or demographics can interfere with
supervisor objectivity. They also note that supervisors may be hesitant to judge
performance accurately out of fear of reprisals from disgruntled workers.
Finally, they write, "Compression of scores or rankings towards the upper end of
the distribution is likely to occur when evaluations are used as part of a pay
setting."[135]
Buddin et al. also refer to a recent study of principals’ ability to evaluate
teacher performance by Brian Jacob of the University of Michigan and Lars
Lefgren of Brigham Young University.

Jacob and Lefgren asked principals in an unidentified Midwestern school district
to rate 202 teachers of core subjects in grades two through six during the
2002-2003 school year. The principals scored each teacher on a scale from one to
10 on a number of traits traditionally seen as related to teacher effectiveness,
such as classroom management skills.[136]
Jacob and Lefgren also calculated the student achievement test score gains for
each teacher. Then they compared principals’ ratings of effectiveness to actual
effectiveness as measured by student achievement gains. They found that
principal ratings and value-added calculations were roughly equal in identifying
the most and least effective teachers, but that principals were less able to
differentiate effectiveness in the middle of the teacher quality distribution.
They also examined the extent to which a teacher’s education and experience,
which are the basis of the single salary schedule, are good predictors of
student achievement growth. On this question, they found that education and
experience were inferior predictive measures of teacher quality.

Interestingly, Jacob and Lefgren found that principal evaluations were better
predictors of parent preferences for specific teachers than were the teachers’
value-added achievement measures, years of experience, education or
compensation. While this finding could be taken as a sign that principals and
parents are equally "wrong," the finding probably indicates that principals
perceive teacher characteristics that parents tend to value, even though these
characteristics may not be measured by standardized tests.

Although principal ratings are good indicators of teacher
effectiveness in the classroom, Jacob and Lefgren are careful about recommending
the use of this rating mechanism. They note that their experiment was carried
out in a setting in which principals did not face job pressure to identify
effective teachers. They explain that the effect of a higher-stakes environment
is unclear: While the increased importance of the evaluation might motivate
principals to be even more accurate, it might also make them reluctant to assess
teachers honestly for fear of reprisals.[137]
(Principals’ evaluations were kept confidential and not made available to the
teachers themselves.[138])

Jacob and Lefgren also found that principals, regardless of their own sex,
routinely discriminated against male and untenured faculty. They wrote:
"Specifically, principals rate both male and untenured teachers roughly 0.3 to
[0.5] standard deviations lower than their female and tenured colleagues with
the same actual proficiency."[139]
They offered a lengthy set of possible explanations for this discrimination
without any firm conclusion, but stated, "Regardless of the cause, however, this
discrimination may place male and untenured teachers at a disadvantage in a
system that relies more heavily on principal assessment."[140]
Ultimately, this and the study’s other findings indicate that although principal
evaluations may have drawbacks, they can help identify good teachers.

Recent research findings by Douglas Harris and Florida State University’s Tim
Sass also suggest that principal evaluations can help identify teacher quality.
In a 2007 study, Harris and Sass compared principals’ private ratings of
teachers in an anonymous Florida school district to value-added calculations of
teacher effectiveness.[141]
The 30 principals included in the study spanned elementary, middle and high
school grades. Harris and Sass wrote, "We find a positive and significant
correlation between teacher value-added and principals’ subjective ratings and
that principals’ evaluations are generally, though not always, better predictors
of a teacher’s value-added than traditional approaches to teacher compensation
that focus on experience and formal education."[142]
Like Jacob and Lefgren, Harris and Sass advised caution in using principal
evaluations in teacher accountability or reward systems; they did not
dismiss this possibility, however.

As
this research suggests, principals are generally capable of evaluating teacher
effectiveness. Principals’ input can be used as a supplement to value-added
assessment and to help address concerns over value-added measures of teacher
effectiveness.

[†] In a recent report on teacher evaluation systems, Thomas Toch and Robert Rothman of Education Sector, an education policy think tank in Washington, D.C., raise concerns about the current methods of measuring teacher quality (see Thomas Toch and Robert Rothman, “Rush to Judgment: Teacher Evaluation in Public Education” (Education Sector, 2008),
www.educationsector.org/usr_doc/RushToJudgment_ES_Jan08.pdf (accessed
June 26, 2008)). In particular, Toch and Rothman criticize the common practice
of having a single supervisor assess teacher performance through a single
classroom observation.

It is valid to criticize the practice of principals’
making uninformed personnel evaluations, and it is reasonable to encourage
principals to supplement the information gathered through their own observations
of teachers with input from lead teachers, parents and students through formal
and informal methods as appropriate. However, not all of Toch and Rothman’s
recommendations for fixing the problems inherent in conventional rating systems
are likely to bring about meaningful changes.

Toch and Rothman call for the use of multiple
measures and multiple evaluators. Regarding multiple measures, they write: “The
experiences of the leading comprehensive evaluation systems suggest that samples
of student work, teachers’ assignments, and other ‘artifacts’ of teaching are
valuable complements to classroom observations and should be included in
evaluations” (Page 19). Moreover, they write, “To get a fuller and fairer sense
of teachers’ performance, evaluations should focus on teachers’ instruction —
the way they plan, teach, test, manage, and motivate” (Page 18). As I argue
throughout this primer, teacher performance is best measured by student
outcomes. Including these varied measures of teacher inputs sounds compelling,
but confuses the central focus of teaching. Planning, teaching, testing,
managing and motivating can help a teacher to be successful, but success on
these tasks does not guarantee the desired outcome.
Thus, teacher evaluation should stay focused on the outcome — student
achievement — not the means of achieving that outcome.

Although they do not completely disregard the use of
standardized test scores for teacher evaluation, Toch and Rothman argue that
“test scores should have a minor role, accounting for under 50 percent of a
teacher’s evaluation” (Page 18). They refine this recommendation by stating that
test scores should not be used to measure individual teacher progress, only
schoolwide progress. Toch and Rothman support this claim by writing, “That’s
because many teachers don’t teach tested subjects, the small number of students
that many teachers teach skews the results, and using schoolwide scores
encourages school staffs to collaborate rather than compete” (Page 18).

The goal of value-added measurement is to improve
upon teacher evaluation by centering on the outcomes that matter most. The fact
that not all teachers teach currently tested subjects or large classes does not
preclude the use of the test scores to measure the performance of teachers for
whom we do have sufficient relevant data. Even so, some teachers will need to be
measured by schoolwide gains. Under a bonus system, teachers measured by
schoolwide gains could have lower potential rewards than teachers who are under
a higher level of scrutiny. Alternatively, schools can introduce new assessments
in a wider variety of subjects. The data from these additional tests could be
helpful for diagnosing student progress and for measuring teacher performance.
The common complaint that teachers will compete, rather than collaborate, under
evaluation systems that use test scores to measure individual teacher
performance can also be addressed. Including a schoolwide performance measure
for all teachers — including those who will be measured individually — will
ensure that teachers continue to collaborate. In fact, it may drive them to
collaborate more than before.

Concerning the use of multiple evaluators, Toch and
Rothman argue that principals often fail to differentiate levels of performance
when evaluating teachers. Toch and Rothman suggest that this phenomenon may be
due both to principals’ unwillingness and to their inability to measure
teachers accurately. To address these problems and principals’ subjectivity,
Toch and Rothman recommend the use of carefully trained peer evaluators
(typically senior teachers) whose perspectives can broaden the pool of
viewpoints.

Unfortunately, allowing teachers to evaluate one
another simply replaces one type of subjectivity with another. Teachers can use
evaluations of peers as a way to settle petty grievances and pursue vendettas. The work
of the University of Michigan's Brian Jacob and Brigham Young University's Lars Lefgren and of Douglas Harris and Florida State University's Tim Sass indicates that principals are
capable of evaluating teachers accurately. The problems with principal
evaluations arise under the current system of teacher tenure, in which the
process of removing a low-performing teacher is uncertain and can take several
years. Principals thus face real disincentives to giving negative performance
evaluations and thereby alienating teachers.