Usefulness of Annual Testing Varies by State

President Bush has made annual testing of 3rd through 8th graders a
centerpiece of his education agenda, saying such reading and math tests
provide a "vital diagnostic tool for schools." But in states that
already engage in annual testing in those grades, educators are divided
over its usefulness.

"It's very powerful," said Rosaena Garza, the director of academics
for the 40,000-student district in Corpus Christi, Texas, referring to
her state's assessment system. "A principal has a very good picture of
his or her campus. If you're seeing a pattern in a particular teacher's
classroom, then you can begin to look at what's happening in that
classroom."

But Linda L. Clark, the director of instruction for the 25,000-student
Joint School District No. 2 in Meridian, Idaho, finds that the
test her state gives, the Iowa Tests of Basic Skills, is useful mainly
as an instrument for "ranking and sorting students."

"Our question of the state board was, 'How many times during a
student's career do they need to be ranked and sorted?' " Ms. Clark
said.

The different attitudes can be traced in large measure to the types
of tests states give, the details they provide schools about test
results, and the timeliness of the information. Mr. Bush has modeled
his proposal on the Texas system, which provides schools with detailed,
grade-by-grade information about how students perform according to the
state's standards. Yet few state assessment programs resemble the one
Texas built while Mr. Bush was governor.

The Texas Assessment of Academic Skills is a "criterion-referenced"
test, meaning it was designed specifically to measure achievement
against the state's standards in reading and mathematics in grades 3-8
and at the high school level. It also measures achievement in writing,
science, and social studies in some of those grades.

Texas schools receive information on whether each grade level and
each student has mastered a given objective, such as "summarization,"
including the number of test items answered correctly. The state holds
schools accountable for achieving minimum passing rates for all
students, poor and minority students included.
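
To make the passing-rate mechanics concrete, here is a minimal sketch in
Python; the data layout, subgroup flags, and 70-point cut score are
illustrative assumptions for the example, not details of the actual
Texas system.

```python
# Minimal sketch: computing passing rates by student subgroup, in the
# spirit of the accountability checks described above. The data layout,
# subgroup flags, and 70-point cut score are illustrative assumptions,
# not details of the actual Texas system.

PASSING_SCORE = 70  # hypothetical cut score

students = [
    # (score, economically_disadvantaged, minority)
    (82, True, False),
    (65, True, True),
    (91, False, False),
    (74, False, True),
]

def passing_rate(group):
    """Fraction of students in `group` scoring at or above the cut score."""
    if not group:
        return 0.0
    return sum(score >= PASSING_SCORE for score, *_ in group) / len(group)

subgroups = {
    "all students": students,
    "poor": [s for s in students if s[1]],
    "minority": [s for s in students if s[2]],
}

for label, group in subgroups.items():
    print(f"{label}: {passing_rate(group):.0%} passing")
```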

"You can look and see how the kids did before they went into a
teacher's classroom and how they did when they left, and it's all
broken down by objectives," said Deborah Scates, the principal of the
500-student Paul R. Haas Middle School in Corpus Christi.

"It points out strengths, too," she continued. "Last year, I had a
couple of 7th grade teachers who, I discovered, are wonderful at teaching
algebraic functions, which is a very difficult concept. That helps me
know where I need to place teachers in the school, so they can teach to
their strengths."

An Imperfect Match

Many educators elsewhere say that while their states' test results
are useful up to a point, they don't provide enough information to
guide instruction.

In Tennessee, for example, students in grades 3-8 take the
multiple-choice component of the TerraNova-2nd Edition, created by
commercial test-maker CTB/McGraw-Hill. But Earl H. Wiman, the principal
of the 522-student Alexander Elementary School in Jackson, said schools
typically get the data too late to be useful. "We took the test last
April. We got information in October."

In addition, Mr. Wiman contends, the test does not reflect the
state's academic standards closely enough to help focus
instruction.

"If we're going to hold schools accountable, we need to very clearly
identify for teachers and schools what needs to be taught, and we need
to very clearly identify for teachers how that's going to be tested,"
he said. "That link is just not there in so many different testing
programs."

TerraNova is a norm-referenced test, meaning it was designed
primarily to compare the performance of students with that of their
peers nationally, and not to measure how they perform based on a
state's own standards. States and districts can pay more to have the
exam customized.

California currently rates schools based chiefly on the Stanford
Achievement Test-9th Edition, produced by Harcourt Educational
Measurement, another norm-referenced test given every year in grades
2-11.

"If you take the standards that we have in this state and you align
them with that test, you're not going to get a perfect alignment," said
Charles G. Jackson, who directs instructional-support services for
District G, a 62,000-student subdistrict of the Los Angeles Unified
School District. "So even the specific information that we get back may
not match what the efforts need to be at the school level."

Ms. Clark's Idaho district has received a waiver to administer the
state's norm-referenced test less often because the district believes
its own testing program is more useful. Although the state administers
the ITBS, produced by Riverside Publishing Co., in grades 3-8, the
district administers the test only in grades 3 and 7.

"What we're interested in having is a comprehensive assessment
program, based on multiple measures, that measures a student's growth
toward the standards," Ms. Clark said, "and in that kind of a system,
there's a lot of power, in terms of being able to plan
instruction."

As the backbone of its testing system, Ms. Clark's district uses the
Achievement Level Tests produced by the Portland, Ore.-based Northwest
Evaluation Association, a nonprofit group. The district can create
tests that reflect its standards from a bank of more than 15,000 test
items and use the results to measure the growth of individual students
over time.
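
By way of illustration, a district assembling a test from such a bank
might filter items by the standard each one measures. The Python sketch
below assumes a simple tagged item list; the structure and field names
are invented for the example and are not the association's actual
format.

```python
# Illustrative sketch: assembling a test by drawing items, tagged by
# standard and grade, from a larger item bank. The bank's structure and
# field names are assumptions for the example.
import random

item_bank = [
    {"id": 101, "standard": "summarization", "grade": 5},
    {"id": 102, "standard": "fractions", "grade": 5},
    {"id": 103, "standard": "summarization", "grade": 5},
    {"id": 104, "standard": "fractions", "grade": 6},
]

def build_test(bank, standards, grade, per_standard=1, seed=0):
    """Pick up to `per_standard` items for each requested standard."""
    rng = random.Random(seed)
    test = []
    for std in standards:
        pool = [item for item in bank
                if item["standard"] == std and item["grade"] == grade]
        test.extend(rng.sample(pool, min(per_standard, len(pool))))
    return test

print(build_test(item_bank, ["summarization", "fractions"], grade=5))
```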

Homing In on Data

One of the primary arguments for testing each child each year is to
be able to track year-to-year growth. Some contend that such
information provides a fairer way to judge schools, based on how much
schools "add value" to a student's knowledge and skills.

Tennessee, for example, uses the results from the TerraNova to focus
on the gains students make over time. William L. Sanders, a research
fellow at the University of North Carolina, who helped develop the
approach, is a strong advocate of so-called value-added testing. If
states test in only a few grades, he argued, it's hard to pinpoint
where problems lie.

"In terms of thinking about getting positive diagnostic information,
I see no way to do it without having at least annual testing," said Mr.
Sanders, who also directs the value-added assessment and research
center at the Cary, N.C.-based SAS Institute, a for-profit software
company.

The 50,000-student Minneapolis school district also uses a
value-added approach. "We like the notion of annual testing," said
David Heistad, the director of research and evaluation for the
district. "We can home in on which grade levels at which schools are
producing the greatest gains and beating the odds, and we can also find
relative weaknesses within a school. If you test too infrequently, you
can never localize the information to a particular grade level or
classroom."

But, he added, "the quality of the assessment is the key for me. We
would continue to report out our local measures, even if the state went
to annual testing, simply because ours are so clearly aligned with our
standards."

'Too Much Hullabaloo'

Mr. Sanders maintains that if a test is highly—but not
perfectly—correlated with a state's curriculum objectives,
measures the progress of students at both the high end and the low end
of the performance scale, and is reliable, it doesn't matter whether
it's norm-referenced or criterion-referenced. One criticism of the
Texas tests is that they do not measure the progress of students at the
high end of the scale.

"There is too much hullabaloo made over distinctions among tests,"
Mr. Sanders said. "When you're using these things as indicators of
student progress over time, then a lot of these distinctions blur."

Florida is one of the few states that, beginning this year, will
give students in grades 3-8 both a norm-referenced test in reading and
math and a criterion-referenced test designed to measure its own
standards.

The results of the latter will be used to gauge whether students are
mastering a year's worth of learning in a year's worth of time, said
JoAnn Carrin, a spokeswoman for the Florida education department.
What's more, she said, the tests will be tied to the state's standards,
"which is the real important part of the whole program."

Mario J. Crocetti, the principal of the 1,500-student Wellington
Landing Community Middle School in Palm Beach County, Fla., said that,
until now, the usefulness of the state's testing system has been limited
because "all we've been able to do is compare this year's 8th grade
class to last year's 8th grade class. That's been a real stumbling
block."

Back to the Future?

Before 1994, the federal Title I program for disadvantaged students
(known at the time as Chapter 1) required districts to assess the
achievement of students in the program annually in grades 2-12, using a
national norm-referenced test. That year, Congress rewrote the law to
require states and districts to use tests that actually reflected a
state's standards and that were the same tests used to measure the
performance of other students in the state. The tests must be given at
least once in grades 3 through 5, 6 through 9, and 10 through 12.

In part, Congress was reacting to a 1993 evaluation of the program
that found norm-referenced tests provided objective, reliable
information for relatively little time and money, but were having some
undesirable consequences in schools.

In particular, noted the "Final Report of the National Assessment of
the Chapter 1 Program," because the tests were designed to be
independent of the local curriculum, they could not give teachers much
help in pinpointing the parts of the curriculum in which a student
needed more work. In addition, the report found, the reliance on
multiple-choice questions was encouraging teachers to spend too much
time on test-taking skills or low-level basic skills instead of on more
challenging academic content.

The report also raised concerns, at a time when states were
instituting challenging academic standards for all students, that the
Chapter 1 program was encouraging lower expectations for children in
high-poverty schools.

It wasn't until this year that the U.S. Department of Education
began holding states responsible for having in place new testing
programs that meet the Title I requirements.

A Bush adviser said last week that, ideally, the administration
wants states to use standards-based exams, but that they could use an
off-the-shelf, norm-referenced test if they chose. The same adviser
said the administration could propose as much as $100 million to help
states write, and perhaps administer, such tests in the first few
years.

Two central concerns are the costs of creating new exams and the
capacity of the testing industry to handle the demand. In the past five
years, state testing expenditures have almost tripled, from $141
million to $390 million, according to Achieve, a Cambridge, Mass.-based
nonprofit group that promotes standards-based initiatives. One study
estimated the average cost of multiple-choice tests at $17.50 per
student in 1998 dollars, while tests with any "performance" section,
such as essays or short-answer questions, averaged about $28 per
exam.

Many states test in only a few grades because of the cost and time
entailed in more extensive testing, said Brian Gong, the associate
director of the National Center for the Improvement of Educational
Assessment, a nonprofit organization in Dover, N.H. Officials in those
states also believe that less frequent testing is adequate to hold
schools accountable for whether students are meeting standards.

"Development is a small part of the investment," Mr. Gong added.
"It's the every-year operational cost that is driving most of these
decisions, and testing time. If you test in a number of subjects every
year, then what you end up having is machine-scoreable, multiple-choice
tests, and a lot of states say, philosophically, we don't think that
will influence instruction the way we want."

To lessen costs, Mr. Sanders, the expert on value-added testing,
suggested that states with similar standards form "buying cooperatives"
that would enable them to pool their resources for the construction and
maintenance of testing programs.

But he acknowledged that, in the short term, the three big testing
companies, which handle the bulk of state
contracts—CTB/McGraw-Hill, Riverside Publishing Co., and Harcourt
Educational Measurement—will have trouble keeping up with demand
if President Bush's proposal is enacted.

"There are definitely snafus because they're running at more than
full capacity," Mr. Sanders said. "That problem will get exacerbated
for a while, but I think it's like the utility problem on the West
Coast. At some point in time, you'll have more power generation, and at
some point in time, you will have more capacity to do this."

The real problem, he and others say, is that even with the best
assessments, too few schools use the test results for diagnostic
purposes or have staff members who are trained to do so. "If I knew how
to accelerate people learning to use the data in positive ways, I would
do it," Mr. Sanders said. "That's the frustrating part."

Meanwhile, even educators who make good use of assessment data worry
about more testing. "That's the biggest complaint I hear from
teachers," said Laurie B. Abeel, the assessment coordinator for the
9,600-student Fauquier County schools in Virginia. "I'm testing all the
time, and when do I teach?"
