He's Got Your Number

These days, the hottest speaker on the education-policy circuit is a
soft-spoken 58-year-old professor of statistics who has spent much of
his career crunching numbers for agricultural researchers. "I turned
down four speaking requests yesterday," says William Sanders, a
professor at the University of Tennessee's school of agriculture and,
oddly enough, the man who has created a controversial method of judging
the effectiveness of teachers and schools. Last year, Sanders flew more
than 30,000 miles on Delta Air Lines alone, traveling, as he puts it,
"from sea to shining sea" to give his standard hour-and-a-half
presentation on the benefits of his evaluation system.

"The message is the same wherever I go and whoever I talk to," says the
Tennessee native, a courtly man with gray hair who wears wire-frame
glasses and speaks with a lilting Southern drawl. "I'm a spokesman for
the data." By that, he means that the art of teaching can be quantified
by taking students' test scores, plugging the numbers into a computer,
and measuring how those students improve from one academic year to the
next. Sanders claims his system is "more fair, more realistic, and more
reasonable" than any other method of evaluating schools and
teachers.

Not everyone agrees, but that hasn't stopped Sanders' system from
gaining acceptance. Tennessee has used it since 1992 as part of a
statewide school reform package. Florida is taking a close look at it,
as are a number of school districts across the country. Some, like
District 60 in Pueblo, Colorado, are already using it.

What policymakers find so appealing about the method is that it
seems to filter out external factors—in particular, the
socioeconomic levels of students—that make traditional
school-by-school comparisons so suspect. For years, teachers and
principals in poor inner-city schools have argued that because their
students come from disadvantaged backgrounds, it is unfair to compare
their scores on standardized tests with those of more affluent
children. Sanders agrees and has devised a system that he says measures
only the "value added" to a child's learning by a school or a teacher.
The advantage of such an approach, he believes, is that it focuses on
students' academic gains rather than their raw achievement
scores. And by doing that, one can draw some conclusions about how
schools, and even individual teachers, are performing.

"Society has a right to expect that schools will provide students
with the opportunity for academic gain regardless of the level at which
the students enter the educational venue," Sanders says. "In other
words, all students can and should learn commensurate with their
abilities."

The policy implications of Sanders' system are enormous. To
supporters, it promises—finally—an objective method of
teacher evaluation, one that is based on classroom results, not peer
review or credentials or observation. However, the value-added method
alarms many teachers—does good teaching always translate to
improved test scores?—and they worry that it could be used
against them unfairly.

Sanders insists that he is more interested in helping weak teachers
improve than getting rid of them. "We hope our research is used for
diagnostic purposes," he says, "so teachers can consider what they're
doing and how to improve it." In Tennessee, much to Sanders' delight,
some teachers are doing just that.

"I've told this story 8 million times," says Sanders, before
recounting yet again how he stumbled upon his method of evaluating
schools and teachers. He is sitting in his second-floor office at the
Statistical and Computing Consulting Services Department, in Morgan
Hall, a red-brick Gothic Revival building on the University of
Tennessee's Knoxville campus. This morning, Sanders is dressed casually
in a blue-and-red plaid shirt and khaki dress slacks, held up by both a
belt and suspenders. (You can never be too careful.) His office, with
vintage metal chairs and bookcases, wood-paneled walls, and pea-green
carpet, recalls another era, say, 1955. Sanders, too, seems out of the
past. He is friendly and gracious, with Southern manners and rural
charm. In another life, he might have been a railroad clerk or a
small-town postmaster. Every now and then, he takes a dip of Skoal
chewing tobacco and places it between his cheek and gums.

Sanders grew up on a small dairy farm in middle Tennessee. He
attended public schools and, upon graduating from high school, entered
the University of Tennessee, where he studied physics and animal
science. Statistics, however, became his primary focus. "I was
intrigued with how statistical methodology could be applied to so many
real-world situations," he says. After earning a doctorate in
biostatistics and quantitative genetics, he was hired in 1968 by the
Oak Ridge National Laboratory, which was studying the biological
effects of radiation. Four years later, however, he was asked to return
to UT to create the Statistical and Computing Consulting Services
Department, which conducts research for UT scientists from a wide
variety of academic disciplines.

In 1982, Sanders happened to read a newspaper article asserting that
teacher effectiveness could not be measured quantitatively by looking
at students' test scores. Then-governor Lamar Alexander was proposing a
plan that would identify Tennessee's best teachers and allow them to
receive higher salaries, greater status, and new roles. "The big
issue," Sanders says, "was this: If you're going to do something like
that, how are you going to measure the teachers? The article I read
cited two or three statistical reasons why you couldn't use student
achievement data to do that. And I thought, Well, there may be good
reasons for not doing it, but those are not good reasons." Sanders was
convinced it could be done, and he decided to try to prove it.

He and a colleague, Robert McLean, sent a letter ("More on a lark
than anything else," Sanders says) to the governor's office explaining
that they would like to use something called a "mixed model"
statistical methodology—an approach originally developed by a
renowned animal breeder at Cornell University—to show that
student achievement data could in fact be used for teacher assessment.
The letter got bounced to the state department of education, which
eventually furnished the researchers with data. Using three years'
worth of student test scores from Knox County, Tennessee, Sanders and
McLean proved their theory. The researchers wrote up their findings,
and then Sanders called his contact at the education department in
Nashville and said, "I'm through."

"With what?" came the reply.

"It's obvious that they had not taken it nearly as seriously as I
had," says the professor, smiling. Two later studies confirmed the
Knoxville data, but no one seemed particularly interested in Sanders'
research. "I thought the whole world was waiting on this!" he says.

Several years later, however, policymakers found new reason to look
at Sanders' numbers. Education reform was in the air, and Tennessee
legislators were looking for a fresh approach to school accountability.
In 1990, Sanders spent about four months in the state capitol meeting
with officials, including the new governor, Ned McWherter. Sanders
advocated that his system be used statewide, but when it began to look
as if that might actually happen, he was momentarily caught off guard.
"I was like the dog that chased the car," he recalls. "It looked like
the car was going to stop, and I didn't know what I was going to do
with it."

Eventually, legislators incorporated Sanders' methodology into the
1992 Education Improvement Act, signed into law by McWherter. Partly a
response to a court order to equalize funding between rural and urban
schools, the law established a statewide half-cent sales tax to boost
K-12 education spending. But it also created a number of reform
measures, including the groundbreaking school-accountability program
based on Sanders' work, dubbed the Tennessee Value-Added Assessment
System, or TVAAS. Sanders: "I told the state commissioner of education
that he was going to have to let me go home and assemble a team of
folks to build a software system to allow me to do the very things that
I had been advocating." The result was the Value-Added Research and
Assessment Center at the University of Tennessee, with Sanders at the
helm. (Sanders continues to run the university's Statistical and
Computing Consulting Services Department, but most of his time is now
devoted to education research.)

After the law passed, teachers in Tennessee were skeptical of the
value-added component. Many directed their hostilities toward Sanders,
an easy target given his affiliation with UT's school of agriculture.
How, they demanded to know, could you use a statistical method
developed for evaluating farm animals to measure the effectiveness of
teachers? But Sanders stood his ground, and he patiently explained his
complicated method to anyone who took the time to call or write.

Here's how it works: Each year, in late March or early April,
students in grades 3 through 8 take a battery of tests known as the
Tennessee Comprehensive Assessment Program, or TCAP, in five subjects:
reading, language, math, science, and social studies. Scores are sent
to Sanders and his staff, who plug the numbers into an IBM RS/6000
series computer, which merges new test data with scores from previous
years. This enables the statisticians to track student achievement over
time.

"We follow the progress of each child individually," Sanders says, "and
compare each child to his own past performance, not to test scores of
other kids." Of course, the growth rate of one student doesn't say much
about his or her teacher. But when you look at the data by classroom,
by school, and by district, patterns begin to emerge. "And if you find
that the majority of kids in a particular classroom have flat spots on
the growth curve," Sanders explains, "it becomes strong, powerful
evidence that something regarding instruction is not happening in that
classroom."
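The idea of comparing each child to his or her own past score, then looking for patterns by classroom, can be sketched in a few lines of Python. This is only an illustration of simple averaged gains, not the mixed-model methodology Sanders actually uses; the student records, scores, and room names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (student, classroom, last year's score, this year's score).
records = [
    ("A", "Room 1", 640, 665),
    ("B", "Room 1", 700, 722),
    ("C", "Room 2", 655, 656),
    ("D", "Room 2", 710, 709),
]

def mean_gain_by_classroom(rows):
    """Average each classroom's per-student gain (this year minus last year)."""
    gains = defaultdict(list)
    for _student, room, previous, current in rows:
        # Each child is compared only to his or her own earlier score.
        gains[room].append(current - previous)
    return {room: mean(values) for room, values in gains.items()}

classroom_gains = mean_gain_by_classroom(records)
```

In this made-up data, Room 2's students barely move from one year to the next, which is the kind of "flat spot" the classroom-level view is meant to surface.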

In the fall, schools receive their report cards for the previous
academic year. The summaries, which are made public and printed in
Tennessee newspapers, show how each grade has improved, or not, in
each subject, based on national norms. In other words, they reveal how
much the children have learned, or how much "value" has been added,
over the course of a year. For example, a report card might show that
an elementary school's 4th grade reading scores have gone from 85
percent of the national norm to 110 percent, a gain of 25 percentage
points. Or it might show that scores have flattened or even
dropped.
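The arithmetic in the report-card example above can be checked with a short sketch. The two helper functions are hypothetical names used only for illustration; they show that moving from 85 to 110 percent of the national norm is a difference of 25 points, which is not the same thing as a 25 percent relative increase.

```python
def point_change(before_pct, after_pct):
    """Difference between two percent-of-norm figures, in percentage points."""
    return after_pct - before_pct

def relative_change(before_pct, after_pct):
    """Change expressed as a percentage of the starting figure."""
    return 100.0 * (after_pct - before_pct) / before_pct

# The example from the text: 85 percent of the national norm rising to 110 percent.
points = point_change(85, 110)        # 25 percentage points
relative = relative_change(85, 110)   # roughly 29.4 percent of the starting figure
```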

In Tennessee, the scores have consequences. Once run through
Sanders' value-added system, they are combined with other indicators to
determine rewards for individual schools and sanctions for districts.
Schools whose cumulative gains in the value-added scores from each of
the five subjects at least match the gains in the national norm are
eligible for additional state funds. Meanwhile, districts that fail to
achieve average gains equal to at least 95 percent of the national norm
are subject to sanctions.
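The two thresholds described above might be expressed roughly as follows. This is a sketch under stated assumptions: it treats a school's "cumulative gains" as a simple sum across the five subjects, it ignores the other indicators the state combines with the value-added scores, and all function names and numbers are hypothetical.

```python
SUBJECTS = ("reading", "language", "math", "science", "social studies")

def school_eligible_for_funds(school_gains, norm_gains):
    """True if the school's summed gains at least match the summed national-norm gains."""
    school_total = sum(school_gains[s] for s in SUBJECTS)
    norm_total = sum(norm_gains[s] for s in SUBJECTS)
    return school_total >= norm_total

def district_subject_to_sanctions(district_avg_gain, norm_avg_gain, threshold=0.95):
    """True if the district's average gain falls below 95 percent of the norm gain."""
    return district_avg_gain < threshold * norm_avg_gain

# Hypothetical gains, in scale-score points, for one school and the national norm.
norm = {s: 20 for s in SUBJECTS}
school = {"reading": 22, "language": 18, "math": 25, "science": 19, "social studies": 21}

eligible = school_eligible_for_funds(school, norm)      # 105 vs. 100
sanctioned = district_subject_to_sanctions(18.0, 20.0)  # 18.0 is below 0.95 * 20.0
```

Note that under the summing assumption, a school can fall short in one subject (language and science here) and still qualify if gains elsewhere make up the difference.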

Each teacher also receives a report card, which reveals the academic
progress of his or her students. These reports are not made public, and
their contents are known only to the teacher and his or her principal.
(Of course, in schools where, say, there is only one 4th grade teacher,
that teacher's scores are effectively made public in the school
report.)

Sanders has always said that scores for individual teachers should
not be released publicly. "That would be totally inappropriate," he
says. "This is about trying to improve our schools, not embarrassing
teachers. If their scores were made available, it would create chaos
because most parents would be trying to get their kids into the same
classroom."

Still, Sanders says, it's critical that ineffective teachers be
identified. "The evidence is overwhelming," he says, "that if any child
catches two very weak teachers in a row, unless there is a major
intervention, that kid never recovers from it. And that's something
that as a society we can't ignore."

Ultimately, Sanders hopes his system will help struggling teachers
get support and assistance. But he adds, "After a reasonable period of
time, either if they don't try to improve or if they don't improve,
then they should be encouraged to seek employment elsewhere."

No teacher in Tennessee has lost his or her job due to poor TVAAS
scores, according to Al Mance, executive director of the Tennessee
Education Association. The union fought hard to keep the scores
confidential, and so far, they are only considered as one of many
factors in personnel evaluations. Test results are "only a small piece
of the pie," says Mance.

But that's not to say that others aren't eager to use the scores for
their own purposes. A few years ago, Dave Shearon, a school board
member in Nashville, proposed that the city's school system use the
TVAAS data to plot on a map where the most effective and least
effective teachers work, using a system of red and green dots—red
for the best teachers, green for the worst.

Shearon told a reporter for the Tennessean, "I am concerned
about whether or not our distribution of most and least effective
teachers is slanted one way or another." He suggested using the
information to reshuffle teaching assignments on a voluntary basis. But
he backed down after Nashville teachers began wearing red and green
stickers to school, mocking the idea.

Marsha Denton wasn't very happy the first time she saw her
value-added scores. A social studies teacher at Buena Vista Middle
School in Nashville, Denton had always considered herself a first-rate
teacher.

"When I looked at my data," she says, "I saw tremendous strength in
7th grade, but my data for 8th grade wasn't as good. It was OK, but not
nearly as strong." The numbers, she confesses, "messed with my
head."

"I bawled and screamed for a couple of days," she says. "But then I
realized it was just like anything else. You can sit around and whine
and cry about it all day, or you can say, 'What am I going to do about
it?' " Denton decided she would try to use the data to improve her
teaching. First, however, she set about learning as much as she could
about the value-added approach. And the more she learned, the more
convinced she became that it was a useful—and valid—tool
for measuring the strengths and weaknesses of teachers.

"It just made so much sense," says the 40-year-old teacher, who has
long been suspicious of the efficacy of standardized-test scores. "It
seemed to me to be the first reasonable method of evaluating students
because it wasn't biased about socioeconomic status. And it seemed like
something that I could use."

Denton looked at her scores and concluded that she was using
different teaching styles for her two grades. "In my 7th grade
classes," she says, "I was getting the students more involved in the
learning process. I was more of a facilitator. But in my 8th grade
classes, I was the one busting my chops. The kids were sitting there
listening and taking notes, and we were interacting and talking. But
they weren't doing the thinking—I was the one doing the
thinking." So the following year, Denton made some changes in her
classroom, and her scores went up. "I had improved in the areas that I
had hoped I would."

Word of Denton's expertise with the TVAAS scores spread to other
teachers, and it wasn't long before she was in demand throughout the
district. Eventually, she was getting so many phone calls that she
decided to take a two-year leave of absence from Buena Vista. Now, she
holds two jobs: one as a consultant for the Metropolitan Nashville
Public Schools and the other as an associate professor of education at
Trevecca Nazarene University. She spends much of her time on the road,
meeting with teachers, showing them how to use their value-added scores
to improve their teaching. Often, she admits, it's a hard sell. "Most
teachers are skeptical of value-added because they don't understand
it," she says. "I'm not there to change their minds or to tell them
what to do. I'm there to educate."

Franklin, Tennessee, is a small town about 15 miles south of
Nashville. Last summer, Denton spent three days meeting with a group of
teachers who work in the Franklin Special School District. Among them
was Jane Brown, a 4th grade teacher at Moore Elementary School.

"I wasn't buying the TVAAS data," Brown says. "I couldn't make it
work. But Marsha told us, 'It really doesn't matter if you buy it or
not. It's not going to go away. Deal with it.' And she was the first
person who said, 'I can give you a way to use this information to your
advantage.' "

By traditional measures, Moore Elementary, which serves 500 K-4
students, seems like it's doing a good job. The school consistently
scores above both state and national norms in all subject areas.
However, the school's value-added scores are consistently below
expected growth targets. "And that's been very frustrating," Brown
says.

"We used to just look at our test scores," says principal Patricia
Green, "and the majority of the students were doing very well. We
thought, We must be doing something right. But when we looked at the
value-added scores, some of the students weren't doing as well as they
should, in terms of growth. And these were some of the better kids.
Sure, they were scoring in the 99th percentile, but they weren't
gaining as much as they should from one year to the next."

With Denton's help, Brown and her colleagues concluded that they
were devoting more time and energy to the school's neediest
children—"which is the teacher's first impulse," Brown
notes—but not spending as much time working with the top
students. This was causing what Sanders calls a "shed" pattern, in
which academic gains drop off as achievement level rises, creating a
downward slope that resembles a slanting roof. (Indeed, Sanders'
research has shown that, in Tennessee, high-achieving
students—especially high-achieving minority students—make
the least academic progress from year to year.)
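One rough way to look for a "shed" pattern is to group students by prior achievement and compare average gains across the groups. The sketch below is an illustration only, not Sanders' procedure; the score cutoffs, band names, and student data are hypothetical.

```python
from statistics import mean

# Hypothetical (prior-year score, gain this year) pairs for six students.
students = [(600, 30), (620, 26), (680, 18), (700, 16), (760, 6), (780, 4)]

def mean_gain_by_band(rows, cutoffs=(650, 750)):
    """Split students into low/middle/high bands by prior score and average their gains.

    Assumes every band contains at least one student.
    """
    low_cut, high_cut = cutoffs
    bands = {"low": [], "middle": [], "high": []}
    for prior, gain in rows:
        if prior < low_cut:
            bands["low"].append(gain)
        elif prior < high_cut:
            bands["middle"].append(gain)
        else:
            bands["high"].append(gain)
    return {name: mean(values) for name, values in bands.items()}

def looks_like_shed(band_means):
    """True if average gains fall steadily as prior achievement rises."""
    return band_means["low"] > band_means["middle"] > band_means["high"]

band_means = mean_gain_by_band(students)
shed = looks_like_shed(band_means)
```

With this made-up data, the highest-scoring students gain the least, so the averages slope downward like the slanting roof described above.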

Moore's teachers are now trying to figure out what to change to reach
all students effectively. "But you don't want to throw out the baby
with the bath water," Brown says. "You can only change a certain number
of variables at a time and then wait and see what happens on the
test."

Still, Brown concedes that the TVAAS data is useful. "Now," she
says, "instead of change being based on gut feelings, it can be based
on quantitative information about the students. I'm looking for every
clue I can to find out how I'm doing as a teacher." And while she
praises her district for "not getting too out of whack" about the test
results, Brown notes that some teachers "stress out" when her school's
TVAAS scores are made public every year. "It can be demoralizing," she
says.

"There are some teachers here who think this will go away," she
adds, "but I don't think it will. And if I had the power to do away
with it, I probably wouldn't."

Sanders admits that some teachers "absolutely refuse to look at the
data." On that point, Paul Webb, a teacher and longtime critic of
Sanders, agrees. "The dirty little secret," he says, "is that most
teachers don't pay much attention to it."

Paul and Judy Webb both teach school in and around Newport,
Tennessee, about an hour from Knoxville. When it comes to Sanders and
his value-added approach, they don't mince words, calling it "an
unreliable, invalid, and often slanderous evaluation system." Several
years ago, they set up a Web site to disseminate their views. On it can
be found just about every charge that has been leveled against Sanders
and the TVAAS.

For one thing, they assert, it's too complicated. (Others have made
the same charge.) Teachers must take it on faith that Sanders' computer
methodology is fair and accurate. "That may be OK for religion," Paul
Webb says, "but not when you're talking about education." The Webbs
take particular offense at a handbook published by Sanders' office
titled, "Using and Interpreting Tennessee's Value-Added Assessment
System." The handbook concedes that while mixed-model statistics can be
learned, "the subject is complex, and without spending some time and
energy on it, one will probably have to go with faith. Incidentally,
there is nothing wrong with faith, and it is certainly preferable to
ignorance and prejudice."

The Webbs call such statements "arrogant." In short, they argue that
the art of teaching cannot be quantified. "Principals," they say, "need
only to know great teaching when they see it to ensure quality for
Tennessee's children."

To Monty Neill, executive director of the National Center for Fair
and Open Testing, or FairTest, Tennessee's value-added system is but
one more example of the nation's current obsession with testing and
assessment. He admits that "the concept behind Sanders' method is
reasonable—that is, kids begin school in different places and
grow at different rates. But the problem is in using standardized tests
to determine that." And, he adds, "testing all students with a
norm-referenced, multiple-choice test that will substantially control
curriculum and instruction is too high an educational price to pay for
the information gained."

Neill also argues that Sanders' approach "falsely assumes that kids
only learn academics at school." What about learning that takes place
at home? he asks.

This criticism echoes a 1995 review of Sanders' system by the state
comptroller's office. The review cites several failings of the TVAAS,
including large year-to-year swings in value-added scores that
administrators can't explain. It also faults the system for its
assumption that all learning takes place in the classroom. "The model
seems to assume that all gain (or lack thereof) is purely
teacher-related," the report reads, "while it has not provided adequate
evidence to support this contention."

Sanders has spent a lot of time answering the critics. Yes, he
agrees, his system is complex, and he doesn't expect teachers to
understand the mathematical analysis behind it. "I can use a cell phone
without knowing how it works," he says. "All I want to know is that I'm
talking to someone on the other end of the line."

As for the concerns about statistical variations, Sanders argues
that they are to be expected in the early stages of a new model, and
that they will decrease over time.

Regarding the fundamental question of whether teaching, in all its
messy and complicated glory, can be quantified, Sanders has this to
say: "There is no way you can measure all of the important things a
teacher does in the classroom. But that doesn't mean you shouldn't be
measuring the things that can be measured."

Sanders likes to say, "I'm the numbers guy, the measurement guy. I'm
not the policy guy." But that's misleading. In fact, Sanders has become
something of a guru, and wherever he goes, he uses his data to draw
some very specific conclusions about education, teachers, and exactly
what schools should be doing to make sure that all students are
achieving at higher levels. He isn't shy about voicing his opinions on
such matters, in person and in print.

Sanders, for example, says that teacher effectiveness is "the single
biggest factor influencing gains in achievement, an influence many
times greater than poverty or per-pupil expenditures"—a statement
that challenges long-held assumptions about the influence of a child's
socioeconomic background on his or her learning.

And Sanders has lots to say about the classroom and what makes for good
instruction. Effective teachers, he says, "get excellent gains across
the entire spectrum of kids in their classroom [because] they've got
kids working at different paces and at different places." Ineffective
teachers, on the other hand, "tend to focus on the lower-end kids. They
may be sincere and conscientious, but they're holding back the
others."

Despite his claims that he is an agnostic on policy questions,
Sanders argues, "It is imperative that we focus on bringing all our
energy and effort to try to shrink the variability of teacher
effectiveness, so that it doesn't make so much difference which
classroom a child walks into."

In a paper titled "Cumulative and Residual Effects of Teachers on
Future Student Academic Achievement," written with his wife, June
Rivers—a former teacher who holds a Ph.D. in K-12
administration—Sanders argues that "teacher assignment sequences
should be determined to [ensure] that no child is assigned to a teacher
sequence that will be unduly hurtful to his or her academic
achievement." Translation: Assignments should be reshuffled so that no
student has an ineffective instructor more than once.

Sanders also criticizes the standards movement embraced by the
nation's policymakers for setting "unrealistic goals" that not all
students can attain. "I have a problem with statements like, 'What
should 4th graders know and be able to do?' " he says. "Pray tell,
which 4th graders are we talking about?"

He adds, "I believe we should visualize the curriculum not as stair
steps, but rather as a ramp. I want all kids to go up the ramp, but I
recognize that not all kids are going to be at the same place at the
same time. What I want to hold educators accountable for is the
speed of movement up the ramp, not the position on the ramp."

But ultimately should all students get to the top of that ramp?

"I don't think that's realistic either," he says. "What I want to do
is to push all kids as far up that ramp as possible, and if we focus on
the gain rate, then achievement levels for all kids are going to be
higher than any of us can imagine."

Maybe a statistician has no business making these kinds of policy
recommendations. If Sanders is indeed "the measurement guy" and not
"the policy guy," why does he use his data to tell educators and
policymakers what they should be doing?

Still, it's hard to disagree with much of what he says. Teachers
do matter a great deal—everybody knows that—and the
best ones probably are the ones who make every effort to reach students
at all levels. But determining exactly who are the best and who are the
worst—can such judgments really be made based on the results of a
single, annual battery of standardized tests?

Though Sanders denies it, he seems to be on a mission to convince
anybody who'll listen that his system is by far the best way to measure
what goes on in the classroom. "No," he counters, "it's more that I
have a responsibility to explain and defend what I advocate." The
nonstop travel, he admits, is "physically draining." Indeed, he hopes
to cut back on his speaking engagements soon. This summer, he and five
members of his value-added research team, including his wife, will
leave the University of Tennessee to join SAS Institute, a research
firm based near Raleigh, North Carolina. Sanders will be head of the
firm's new Educational Value Added Assessment Services division. (He
and his staff will continue to crunch the TVAAS numbers for the state
of Tennessee.)

But keeping Sanders off the road may be difficult. Preaching seems
to be in his blood.

Not long ago, Sanders spoke to a group of teachers in a large urban
district in the Northeast. (He won't say where.) "And there's a very
strong union in this district," he says. "When I started to speak about
using test data as part of teacher evaluation—here I am, a small,
white Southerner with a strong Southern accent, in an area where folks
are not accustomed to hearing people with Southern accents—there
was this din in the audience, lots of people having conversations.
Attendance was required by the superintendent, and it was clear that
these teachers had just as soon not be there. Needless to say, it was
not the warmest reception that I've had in my life. As I started
speaking, I thought to myself, Sanders, what are you doing here?

"But I think it's fair to say that, for the last 15 minutes, you
could hear a pin drop. One of the union officials came up to me
afterwards and said, 'You didn't say a thing that I really disagreed
with.' "

Understanding and Using TVAAS is a booklet by Samuel E. Bratton, Jr.,
Sandra P. Horn, and S. Paul Wright. The booklet contains sections on
the evaluation and use of TVAAS results, basic principles, and use of
various assessments. It can be found at Shearon for Schools, a Web site
that contains a number of TVAAS resources such as research findings and
news articles.

Read a 1999 article, "Value-Added Assessment: An Accountability
Revolution," by J. E. Stone. In the article, Stone, an educational
psychologist and education professor, provides a detailed analysis of
TVAAS, emphasizing strengths and providing a list of resources for
further information.
