Faculty Senate

Faculty Forum Papers

May 1978 - The Uses and Misuses of Student Evaluations
By

Charles F. Warnath
Psychology

April 26, 1978

Student evaluation of college courses has a long tradition.
In the 40's, the faculty of the university which I attended accepted such
evaluations as a routine part of each course. These evaluations were designed
and analyzed by students and published in a special issue of the school newspaper.
The main purpose of the evaluations was to tell students very specifically
what they might expect when they signed up for a course and, secondarily, to
tell instructors how students reacted to specific aspects of the course.
The items were developed by students to answer the questions which students
have about a course: quantity and quality of the reading; tests and papers;
value of lectures or discussions; willingness of instructor to meet with
individual students; and the like.

Now, I am not trying to sell this format as the best of all possible
approaches to student evaluations but rather to contrast this format
with the system which seems to predominate at OSU in order to raise
some questions about what we are doing. The most basic difference
between the two systems is that the one described was student
sponsored and operated. It had no official sanction except that
faculty members gave up one of their class periods. Although it was
assumed that faculty and administrators could read the results as well
as the students, there was no implication that administrative decisions
would be affected by those results. Students evaluated aspects of the
course, including the instructor's participation, primarily with the
intent of telling other students what they might anticipate in the course
and not with the intent of "sending a message" to the administration.
By including items on the reading, papers, homework assignments, and
tests, students were made aware that the course was a total learning
experience and not restricted to aspects of the course in which the
instructor was personally involved.

Secondly, the purpose of the evaluation was clear: students were evaluating
courses for the benefit of their fellow students. Evaluations which are
supplied through an administrative office can carry mixed messages to
students, particularly when no effort is made to feed back to students
any of the information which has been collected. This is particularly
true where, as in the case of the official OSU evaluation, all items
relate to some function or characteristic of the instructor. Perhaps
it is a fine point, but I believe there is a difference in the set with
which a student responds when the focus of the evaluation seems to be
entirely on the instructor, with that evaluation going to the administration,
and when the focus is on the course, with the feedback going to the instructor
or to other students.

Third, the specificity and baseline of the questions encourage a
significant difference in the types of responses which students will
make to the items. In the evaluation described, the items were designed
to elicit responses to specific aspects of the course based on individual
quantitative or qualitative judgments, not on a scale requiring comparisons
with other courses or other instructors. With generalized items and an
unspecified baseline for responses, the probability exists that students
will respond with different item interpretations and react with different
comparison scales. To test this possibility, I gave the items appearing
on the OSU punch card to two of my classes and asked them to respond in
terms of what the items meant to them and how they made decisions about
their ratings. It was obvious from their responses that there was little
agreement on the meaning of the items and that their judgements covered a
wide range of expectations and individual experiences. For instance,
"Clarity of Presentation" was responded to by some students in terms of
whether the instructor speaks in a loud, clear voice.

So far as the baseline for responses is concerned, the OSU punch card implies
some sort of comparison but the student is left with the task of deciding
the criteria by which he/she will judge "average" and the extremes.
As might be expected, students differed in terms of responding from their
personal experience with the particular instructor, comparing the instructor
to others (whether all, others in the department, or those most clearly
remembered is not clear), or making judgements against some sort of "ideal
expectation." For example, on the "Concern for the Student" item, some
apparently rate on the basis of such specific factors as whether the
instructor remains in the classroom at the end of class period to talk
with students, while others rate on some general concept such as instructor
is a "human person." Ironically, a characteristic of the instructor which
results in a high rating on one item can result in a low rating on another
item. "Acting as an authority" apparently impresses some students with the
instructor's lack of concern for them, while this same characteristic indicates
to other students the instructor's mastery of the subject. The baseline
used by students for judging an instructor's "Availability" shows almost no
consistency. While some students rate on the basis of the number of posted
office hours, others rate on the basis of whether or not the instructor was
in his/her office when they personally wanted to talk to the instructor.

Fourth, where the student is encouraged to select a rating for every item by
the omission of a "No Opinion" or "No Basis for Judgment" choice, some students
are obviously making judgments on the basis of second-hand information or a
"halo effect" carry over from other items. This is likely to occur when a
student has not had personal experience which he/she generally uses as the
basis for rating a particular item. The clearest example would be that of
the hours posted but be rated below average by a student who has never made
an attempt to meet with the instructor. Being forced to make some rating,
the student may well use one item to reinforce his/her feelings, good or
bad, about some other aspect of the course. One of my colleagues teaches
a large lecture course for lower division students during the same term
that he teaches a small, upper division class. While the lower division
students give him low ratings on "Availability," his upper division students
give him very high ratings on the same item. It seems logical that he is
equally available to both groups except for immediate after-class contact.

Moreover, in classes where attendance is not mandatory and test items are not
keyed specifically to class presentations, some students may spend few hours
in class and, yet, in the OSU evaluation, they are required to make judgments
about the instructor which they can only reasonably make through on-going
contact with the instructor in the classroom.

As reported in a recent issue (December, 1977) of Teaching of Psychology, numerous
studies indicate a consistency of ratings for an instructor teaching the same
course at different times; however, the prediction of the ratings for an
instructor from one course to another can drop almost to zero. The point
which the authors of this research project were attempting to make was that,
because of the lack of consistency of ratings for instructors teaching
different courses, students could not predict the quality of a course or the
characteristics of an instructor in another course. The results of this
research also indicate that only a fraction of the variance in the ratings
seems to be due to the characteristics of the instructor or to the usually
identifiable factors within the course. I have a feeling that all of us
know that there are some among us who are "stars" and scintillate in all
their classes, and some at the other end of the scale who mumble
along or confuse students in whatever class they appear; but most of
us have some good courses and some poor ones, and the "goodness" or
"badness" of a particular course is only partially within the control of the
instructor.
instructor. We like to think that every instructor could, with the proper
instruction, become the complete Mr. Chips, loved by all students, beautifully
organized in all class presentations and able to stimulate the most reluctant
student. But let's face it, Mr. Chips is a myth along with the idea that the
instructor is necessarily the cause of all the problems assigned to education
by politicians and students. As one faculty member has remarked, "Each of us
is the best instructor some students have had." Whether this is true or not,
I have serious doubts our impact as instructors can be neatly summarized in a
set of three-digit averages.

And this brings me to my final point. The effectiveness of a particular course
is not simply a matter of the instructor's "teaching well." What goes on in a
class is a complex transaction between the instructor and a number of individuals
with a variety of needs, expectations, and personal characteristics. The very
instructional style and class design which excite some students can (and do) turn
off other students completely. For those students who are passive listeners and
do only what is specifically outlined for them to do, the class which requires
their participation and initiative may be perceived as disorganized and the
instructor as not doing his/her job. This variation in response extends even
to the course details such as the source of reading materials. In classes
where I have no text, some students complain about having to spend time
withdrawing books from the Reserve Room while others are enthusiastic about
the choices they have in their reading.

From the comments made by student representatives on a committee to draw up
guidelines for student participation in administrative reviews of faculty as
well as comments I have heard from students in general, it would appear that
students assume that there exists a set of judgments about an instructor held
by all students and that the criteria for those judgments are the ones which
they personally apply.

The purpose of the above discussion is not to build a case for abolishing student
evaluations. That would be a futile gesture at this point since they have become
an integral part of consumer politics and a staple in the public relations "concern
for students" approach to potential students and their families. Student evaluation
of faculty are now too much identified with "accountability" to be eliminated.
Beyond that, I do feel that direct student input should be considered in administrative
decisions about faculty since the alternative is to rely on hearsay and the gripes of
disgruntled students who complain to chairmen, deans, and the president. Moreover,
instructors can learn from good student feedback about aspects of a course which would
improve the learning possibilities for some of the students.

My hope is that the points I have raised will sensitize faculty to some of the problems
involved in student evaluations and to the fact that poor evaluations with ambiguous
items can result only in misinformation. The purpose and goals of an evaluation must be
absolutely clear to the students; the items must be specific; and the baselines for making
ratings must be established. Otherwise, the evaluation becomes a projective
test which allows the student to respond from individual, often idiosyncratic, interpretations.
With ambiguous questions to which students respond with only their own personal baselines as a
guide, neither the instructor nor the administration can expect to receive helpful information.
Since everyone seems to be taking the results of our present evaluation forms seriously, it would
seem that faculty should become more concerned about the instruments which generate the
information which administrators are using to make judgements related to salary, tenure
and promotion.