The Irascible Professor
Irreverent Commentary
on the State of Education in America Today

by Dr. Mark H. Shapiro

"Experience
is the worst teacher; it gives the test before presenting the lesson."....
...Vernon Law.

Commentary
of the Day - April 5, 2006: Stairway to Nowhere. Guest commentary
by Poor Elijah (Peter Berger).

There's an old
joke about English teachers who grade compositions by tossing them down
the stairs. Landing on the first step means an A, and so on down
the alphabet. This system offers advantages. First, it's fast.
Schools and students can have the results within minutes. Second,
it's cheap. All you need is a two-story building. Finally,
it's likely no less meaningful than your state's academic assessment system.

Before you get
the wrong idea, I'm not against tests. I disagree with The Washington
Post columnist who's "never given a test" because he "respects [his]
students too much to demean them with exercises in fake knowledge."
He boasts that he's spent his career "telling students not to worry about
answering the questions." Instead he wants them to "question the
answers." This, of course, is difficult to do if you don't know any
answers.

I give tests because
they help me determine whether my students have learned what I'm trying
to teach them, like how to read and why Americans fought the Civil War.
I don't consider these things "fake knowledge."

Standardized tests
can be helpful, too, provided you remember that questions frequently don't
match what a particular class has covered at a particular time. If
your class studies division in May, and the test is in April, it'll look
like your students can't divide when they just haven't learned how yet.
Standardized scores are also increasingly based on subjective judgments
by scorers with questionable training. As a result, scores have become
less reliable and therefore less meaningful.

Public schools
have suffered for decades under the lunatic rule of experts like the Post
columnist. Their disdain for content, knowledge, and tests has spawned
a backlash obsession with standardized testing.

The detected errors
have been alarming enough. According to The New York Times,
scoring errors have forced officials nationwide to recall and void results
involving "millions of students," including episodes here in Vermont and
a single glitch that affected "250,000 students in six states." In
2004 the Educational Testing Service mis-scored 27,000 teacher licensing tests.
Recently the College Board confessed that 5,000 students had received incorrect SAT scores
last October. These meltdowns meant that the wrong kids stayed back,
qualified freshmen didn't get into college, and 4,000 competent teachers
couldn't get jobs.

These are the
tests that determine whether your school has made the "adequate yearly
progress" mandated by No Child Left Behind. These are the data compiled
in the charts in your local newspaper, even though a study presented at
the Brookings Institution found that "between 50 and 80 percent of the
improvement in a school's average test scores from one year to the next
was caused by fluctuations that had nothing to do with long-term changes
in learning or productivity." A RAND
analyst concurred that all the touted testing is identifying "lucky and
unlucky schools," not "good” and "bad" schools. A Government
Accountability Office report concluded that all the "poor and unreliable"
data render comparisons "meaningless."

In Vermont we
recently received the results from last fall's statewide tests. My
students did pretty well, but the results are nonetheless pretty meaningless.
They'd be equally meaningless if my students had done poorly, but then
my objections might be considered "sour grapes."

Our state test,
the New England Common Assessment Program, is administered in Vermont, New
Hampshire, and Rhode Island. I don't doubt that the officials and
educators who designed NECAP deliberated long and hard and had good intentions.
But that doesn't make the process scientific, or the resulting scores valid
and reliable.

For example, students
took the tests in October after their teachers had known them for just
a few weeks. NECAP nevertheless required us to grade each student
based on our "perception" of his or her "demonstrated readiness."
After so little time our "perceptions" predictably were little more than
guesses. Yet our tentative speculations were a "critical piece" and
a "key part of the NECAP standard setting process."

As with most assessments,
the majority of NECAP scorers have never been teachers. Scoring was based on
ephemeral distinctions like the difference between details that "support
the purpose" and those that "mostly support the purpose." Raters
next selected pieces exhibiting the worst writing that in their subjective
judgment was still "proficient with distinction," "proficient," "partially
proficient," and "substantially below proficient." The "cut scores"
were based on these determinations, and those arbitrary numbers based on
this year's subjective judgments will be the permanent NECAP thresholds
for passing and failing in future years.

By the way, when
it comes to scoring writing portfolios, Vermont's Department of Education
standard for reliability is sixty percent. That means scorers can
be wrong almost half the time and still be considered accurate.

The NECAP writing
test required students to complete "planning boxes" for brainstorming their
"focus, details, and conclusion." Even if the finished essay itself
perfectly provided all three, if the boxes weren't filled in satisfactorily,
the student's score was automatically lowered by twenty percent.

Officials vetted
essay topics in an attempt to eliminate bias against any group. For
example, a writing task involving a fire was disqualified by the Bias/Sensitivity
Review on the grounds that Rhode Island students might have been traumatized
by a night club fire that occurred there a few years back. However,
this year's extended essay required writers to reflect on baby sitting,
a subject which is arguably more familiar territory for female students.
It might be that both topics are fine, but the process for making these
decisions is hardly scientific.

Meanwhile, on
the math exam some students could use calculators, while others could not.
Despite NECAP's repeated insistence on uniformity, on this critical point
each school gets to decide for itself. Even more bizarrely, there
is no place to indicate on the test whether calculators were used and no
allowance or distinction made when schools' scores are published and compared.

Most statisticians
consider groups smaller than thirty, or even sixty, too small for statistical
comparisons. NECAP reported scores from classes as small as ten.

District and school
scoring results typically fluctuate widely from year to year. In
our school, as in many, the teachers don't change, which means the annual
variations are due to other factors. Either the students are different
each year, which they are, rendering NCLB-mandated comparisons between
this year's eighth grade and next year's absurd, or the tests are shamelessly
invalid and unreliable. Either way, it makes no sense to rely on
those results to rate your school.

Schools typically
respond to low scores by conceding their disappointment and voicing clichés
about how standardized assessments provide only a piece of the picture.
We need to stop giving today's tests even that much credit.

Yes, some schools
have genuine problems. Whether or not you like the sound of the word,
some students are failing. And I need to be accountable for the job
I do in my classroom. But tossing kids' compositions down the stairs
wouldn't deserve much notice, analysis, or weight.

Neither do all
the charts and meaningless numbers for which we're paying so dearly in
time, money, and focus.

The
IP comments: Poor Elijah's concerns about the shortcomings of standardized
tests are well founded. However, The IP is of the opinion that standardized
tests do have some value. For all the reasons that Poor Elijah mentions,
they often don't measure what we would like to know about individual students
or teachers. And, in many cases they don't tell us much about individual
schools. However, they do have value for making more global comparisons.
In addition, many of the problems that Poor Elijah raises about the particular
test used in the Vermont schools seem to be ones that could be corrected
with a little effort.