Performance-Based Assessment Gains Prominent Place on Research Docket

Spurred in part by strong interest from educators and policymakers,
researchers have turned the study of alternatives to traditional
standardized tests into one of the hottest topics in their field, and
their work has begun to bear fruit.

Over the past few years, more and more states and districts have
added performance-based assessments and portfolios to their testing
repertoires amid criticism that such methods were untried and their
consequences uncertain.

But in recent months, a host of scientists--many assisted by the
federal government--has begun to develop an understanding of the new
instruments, researchers say. Although much of the work is preliminary,
the studies have found some promise--as well as some potential
pitfalls--in the alternative forms of assessment.

One much-anticipated study, slated to be released next week, is
expected to show both the benefits and drawbacks of the new methods. In
an analysis of Vermont's pioneering portfolio assessment, researchers
from the RAND Corporation are said to have found that the program has
improved instruction in writing and mathematics, but that the state may
not yet be able to use the data as a measure of student performance in
those subjects.

Performance assessment "is not a magic bullet that will solve all
the problems of schools,'' said Stephen P. Klein, a senior researcher
at RAND, who is not involved in the Vermont study. "But I wouldn't say
it's a false chase.''

Mr. Klein and others caution, however, that policymakers' interest
in the new tools continues to outpace the scholarship. And, they warn,
the movement may collapse without a firm research base.

"We're trying to fly the airplane and fix it at the same time,''
said Eva L. Baker, a co-director of the center for research on
evaluation, standards, and student testing at the University of
California at Los Angeles, which held a recent conference on the
topic.

"The real problem,'' she said, "is how to keep the thing alive long
enough so reasonable data can be obtained, and not have people jump to
conclusions that it's the greatest thing, or that it's been
oversold.''

Ideas From Research

Hailed as one of the most promising school-reform strategies,
performance-based and portfolio assessment has quickly risen to the top
of the educational agenda in the past few years.

In contrast to traditional tests, the alternatives measure students'
abilities to write essays, conduct science experiments, and construct
and solve math problems, rather than simply answer multiple-choice or
short-answer questions. Advocates of the assessments maintain that they
can improve instruction by encouraging teachers to focus on
higher-order skills, rather than memorization and rote drill.

At least 30 states have implemented such techniques as part of their
testing programs, according to a study by the Pittsburgh school
district, and others are developing them or are planning to do so. In
addition, the New Standards Project, a partnership involving 17 states
and six districts, has piloted a national examination system that is
completely performance-based.

But while policymakers have embraced the new instruments,
researchers have tried to raise caution flags. They have stressed that
performance-based assessments will cost considerably more than
conventional tests but have not proved themselves better methods of
measuring student performance. (See Education Week, Sept. 12,
1990.)

In 1990, in fact, Ms. Baker conducted an extensive search of the
literature on performance assessment and found almost no data.

Such findings are ironic, since the concept of measuring student
performance through alternative techniques grew out of research and
practice, not out of the policy community, Lorraine M. McDonnell, a
professor of political science at the University of California at Santa
Barbara, observed.

"It was not elected officials who dreamt up performance assessment
and portfolios,'' she said. "The ideas came from research and education
reform.''

Effects of Current Tests

Specifically, cognitive scientists seeking ways to change curricula
and instruction to enhance student learning embraced assessment reform
as a powerful lever, said John R. Frederiksen, the director of the
cognitive-sciences-research center in the Educational Testing Service's
San Francisco Bay-area office.

"A lot of people in cognitive science have come to the conclusion
that it is wonderful to work on pieces of cognition, but, at a certain
point, one believes enough in one's ideas that one wants to do
something to bring about change,'' he said. "There was a migration into
the notion that assessment is the key place to begin.''

At the same time, said George F. Madaus, the director of the center
for the study of testing, evaluation, and educational policy at Boston
College, researchers have amassed evidence of the harmful effects of
traditional tests.

In a recently released study funded by the National Science
Foundation, for example, Mr. Madaus found that the six major
standardized achievement tests, as well as the tests contained in
textbooks, drive instruction toward low-level knowledge and skills,
rather than the higher-order abilities advocated by reformers. (See
Education Week, Oct. 21, 1992.)

"We've got to get something better than what we've got,'' Mr. Madaus
said. "They certainly aren't measuring what the National Council of
Teachers of Mathematics is calling for, what the science community is
calling for.''

'In for the Long Haul'

Responding to such concerns, researchers have stepped up their
efforts to study the new forms of assessment.

Federal agencies have also pitched in to support such projects. The
U.C.L.A. research center on testing, which is sponsored by the U.S.
Education Department, has seen its budget increase by 2 1/2 times over
the past three years, according to Ms. Baker.

But one Education Department study of the topic was canceled last
month after its funding dried up. The department had awarded a grant
to Pelavin Associates, a Washington-based firm, to study the effects of
new assessments on schools as part of an evaluation of school reforms,
but was forced to cancel the program when money for it was not included
in the department's fiscal 1993 budget.

In addition to the Education Department projects, the N.S.F. last
year launched a $4 million project to fund studies on alternative
assessments in math and science.

Francis X. Sutman, a program director in the science foundation's
directorate for education and human resources, said the agency "is in
this for the long haul.''

"We are not going to be able to solve problems and give solutions
immediately,'' he said. "But there has to be a research base upon which
the movement can build.''

Close Touch With Teachers

In addition to the federally backed projects, the National Board for
Professional Teaching Standards has sponsored research on performance
assessment to be used in assessing teachers for its certificates.

And many states and districts that have put new assessments in place
have also conducted or sponsored evaluations and studies of them. For
example, researchers from the Pittsburgh public schools collected and
analyzed a wealth of data as part of the district's effort to evaluate
student writing on the basis of portfolios. (See Education Week, Aug.
5, 1992.)

Ms. Baker of U.C.L.A. noted that studies of performance assessment
have brought researchers in close touch with teachers and may help
close the traditional gap between research and practice.

"Because of the nature of the questions we are asking,'' she said,
"the research has to be embedded in practice--with teachers, rather
than exclusively using teachers as sites or subjects for us.''

"Years ago,'' she added, "education researchers used to work on
solutions to problems no one had. Now, problems are coming out of
classrooms, and people are ready for collaboration.''

Uses and Limitations

The work that has been conducted thus far has unearthed evidence
that alternative forms of assessment can be used to measure student and
teacher performance.

Studies conducted for the teaching-standards board, which have
examined the use of "video portfolios'' of teachers' work, have found
that such techniques "can be used reliably to score teacher performance
in a variety of settings,'' said Joan C. Baratz-Snowden, the board's
vice president for assessment development. "That's a major
breakthrough.''

But the RAND study of the Vermont portfolios is expected to show
that the problem of coming up with reliable scores has not been
completely solved.

And other studies have also uncovered limitations in the alternative
assessments.

Noreen Webb, a professor of educational psychology at the U.C.L.A.
graduate school of education, found that group assessments cannot be
used to judge individual students' performance.
Performance-assessment advocates have cited as one advantage the fact
that such methods, unlike traditional tests, could be used to gauge
students' abilities to work in groups.

Working with 7th graders from a Los Angeles middle school, Ms. Webb
asked students to work in groups to solve certain math problems, such
as calculating the cost of a long-distance telephone call. Weeks later,
when she and her colleagues tested the students individually on the
same problems, she found that many students who were able to solve the
problems as part of a group could not do so on their own.

"Should we throw group assessment out the window? No, of course
not,'' Ms. Webb said. "But we can't use group-assessment scores to shed
light on what students could accomplish by themselves.''

How Many Tasks?

In a separate set of studies, Richard C. Shavelson, the dean of the
school of education at the University of California at Santa Barbara,
pointed up a potentially more serious issue: the number of tasks
required to gauge a student's performance.

Examining middle school students who conducted a series of science
experiments, such as determining which paper towels are most absorbent,
Mr. Shavelson found that each task measured only a fraction of
students' abilities in science. Over all, he found, the assessment
needed to include as many as 10 tasks--a formidable number, given the
amount of time each task takes--in order to gauge science
performance.

But Ms. Baker of U.C.L.A. said she and her colleagues may have come
up with a solution to that problem.

Studying performance assessments in history, such as asking students
to write essays demonstrating their understanding of the
Lincoln-Douglas debates, Ms. Baker found that student performance could
be evaluated in as few as three tasks by specifying at the outset the
skills and knowledge that are to be assessed.

The problem that Mr. Shavelson and others found, she said, arose
because teachers scoring the assessments came up with the criteria for
evaluating performance after the tasks were administered.

Ms. Baratz-Snowden said researchers at the teacher-standards board
have come up with similar findings.

"If video is used to find out if accomplished teaching is going on,
you need a lot of video,'' she said. "But if you want to find out if
teachers can do certain things--and you specify up front the things you
are looking for--you don't need as much.''

Comparisons Difficult

In addition to examining the designs of the assessments, researchers
have begun to learn about the problems in implementing them.

Daniel M. Koretz, a senior social scientist at the RAND Corporation,
said that the study of Vermont's portfolio assessment found that it is
time-consuming and costly to put in place. However, teachers and
principals said they considered the program a "worthwhile burden.''
(See Education Week, Sept. 9, 1992.)

Mr. Madaus of Boston College, who has studied the
performance-based-assessment system in Britain, said that country's
system has pointed up the difficulty of comparing the results of the
new assessments.

The British education secretary, he noted, has said standards are
declining because the number of students passing the examinations has
increased over the past four years. The only way to determine if he is
correct, Mr. Madaus said, would be to equate the different years'
exams, which is "a tough thing to do.''

"One issue keeps coming up,'' Mr. Madaus said. "How do you build a
bunch of tests, all purporting to measure the same things, and equate
them? That hasn't yet been solved.''

Questions of Equity

Other researchers have pointed out that implementing performance
assessments and portfolios also raises questions of fairness and
equity.

Mr. Shavelson of the University of California at Santa Barbara, for
example, noted that his studies have found that the assessments are
"extraordinarily curriculum-sensitive,'' which puts students from
disadvantaged areas, who have had less access to hands-on science
instruction, at a disadvantage.

"If you had not had access to the curriculum, you're not going to
look pretty good'' on the assessment, Mr. Shavelson said.

Similarly, said Dennie Palmer Wolf, a senior research associate at
the Harvard University graduate school of education, portfolios tend to
offer an advantage to students who are relatively fluent in
English.

"Until we break the stranglehold of language on portfolios and open
them up,'' said Ms. Wolf, who is studying the implementation of
portfolios in middle schools in four cities, "we will again have in
portfolios just a different sort of sorting method.''

In fact, said H.D. Hoover, the director of the Iowa Basic Skills
Testing Program, the literature on performance assessment suggests that
the gaps between advantaged and disadvantaged students are larger on
such measures than they are for multiple-choice tests.

"People say about standardized tests, 'If different groups perform
differently, there is cultural bias,' '' Mr. Hoover said. "If you buy
that argument, most performance assessments to date are more biased
than standardized tests.''

Student Enthusiasm

But despite such problems, researchers have also found that
implementing performance assessments and portfolios has paid dividends
for schools.

For one thing, Ms. Wolf said, the new methods, as advocates had
expected, have indeed tapped knowledge and skills that traditional
tests--and most schools--often miss.

"Students are capable of extraordinary work,'' she said. "Schools do
not require extraordinary work.''

And Daniel P. Resnick, the director of research for the New
Standards Project, also noted that teachers and students in that
project have demonstrated tremendous enthusiasm for the new
instruments. In addition to conducting technical studies of validity,
reliability, and fairness, Mr. Resnick said, researchers should conduct
case studies that show the "pleasure and fun'' that teachers and
students experience.

"Research has to look at the enthusiasm, as well as the hard side,''
he said.

Classroom Assessments

In addition to studying the use of alternative assessments in
large-scale testing programs, researchers have explored improving
teachers' use of the tools in their own classrooms.

In many ways, said Thomas A. Romberg, the director of the national
center for research in mathematical-sciences education at the
University of Wisconsin at Madison, the classroom assessments may be
more important than the large-scale programs.

Although advocates of alternative assessments at the national level
argue that changing assessment will change instruction, Mr. Romberg
said, "That's wishful thinking.''

"Real changes are largely dependent on teachers,'' he said. "If they
don't believe in and understand the direction we are going, the rest of
it will not happen.''

In his work, Mr. Romberg is developing ways to help teachers make
judgments about student performance.

"The argument we make,'' he said, "is that teachers have been
de-skilled to such an extent that they don't believe their judgment has
value.''

In another set of studies, Maryl Gearhart, a project director at the
center for the study of evaluation at U.C.L.A., is working to improve
teachers' use of portfolios to gauge student performance in writing and
math.

In early tests of writing portfolios, she noted, teachers generally
failed to provide enough guidance to enable students to make genuine
assessments of their own work. As a result, she said, students seldom
made substantive revisions, or held their work to high standards.

Mr. Sutman of the N.S.F. also argued that more work is needed to
ensure that alternative assessments in math and science test students'
knowledge of content in those subjects. In preliminary efforts, he
said, teachers have developed items that were "content-void or -weak,
or have content errors in them.''

No Longer Seen as 'Difficult'

While most researchers acknowledge that there is much work left to
do, the progress so far has convinced at least one former skeptic of
the value of alternative assessments.

"When I started here,'' said Ms. Baratz-Snowden of the
teacher-standards board, "I said, 'I can't see myself with a terrarium
rotting on a radiator while there is a lawsuit [from a candidate denied
a certificate based on a portfolio assessment].' That was my view of
portfolio assessment.''

"But I no longer see it as difficult,'' she continued. "I see it, in
fact, as a critical element in helping professional-development
activities, and in substantiating the standards. I think it's
terrific.''
