An example of a performance evaluation form used by students to rate their courses and instructors.

In my prior post, Gender, Professor Evaluations, and Key Words Part 1, I described an interesting website that shows how evaluations of male and female professors differ by any key word across two dozen disciplines. That post ended with this cliffhanger question: Can we interpret all this information to tell us that male professors are more boring, but female professors are more frequently incompetent and beautiful?

When we attempt to understand differences along a key dimension (such as whether male and female teachers are rated as "boring" at different rates), we need to consider whether the relationship we are observing captures the "whole story," or whether some other confounding factor could explain what we are observing. In this example, to determine whether students rate professors differently depending on whether they are male or female, we need to make sure we are comparing professors who are identical along all other dimensions. Among economists and statisticians, the error that results from drawing a conclusion about the relationship between two factors while ignoring a key alternative factor is known as omitted variable bias.
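For readers who like to see the idea written down, here is a minimal sketch of omitted variable bias in regression form. The notation is my own assumption for illustration and does not come from the website's data: y is a student's rating, F is an indicator for a female professor, and D is some other factor, such as course difficulty.

```latex
% "Long" model: the rating depends on gender F and on another factor D
y_i = \beta_0 + \beta_1 F_i + \beta_2 D_i + \varepsilon_i

% How the other factor varies with gender
D_i = \delta_0 + \delta_1 F_i + u_i

% If D is left out, the estimated gender coefficient converges to
\operatorname{plim} \hat{\beta}_1^{\text{short}} = \beta_1 + \beta_2 \delta_1
```

The extra term \beta_2 \delta_1 is the bias. It disappears only if the omitted factor has no effect on ratings (\beta_2 = 0) or does not vary with gender (\delta_1 = 0).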

It won't surprise you that a range of factors could go into a student's subjective professor evaluations. Just off the top of my head, I came up with a few examples:

How difficult was the class and subject material?

Did the student get an A in the class?

How much homework did the student receive?

Was the class at a time that college students might not like (such as an 8 AM Friday class)?

How accessible was the professor?

Was the class an introductory course, a course required for a major, or an elective class?

Now, the fact that a number of factors could ultimately determine student ratings is still not enough to draw a conclusion. The important question is whether any of these "other factors" vary systematically by gender. Take an extreme example: Assume that in the Math department, female professors were always assigned to teach the much harder required Calculus class, whereas male professors were assigned to teach the very popular Statistics elective. If we observed significantly more negative ratings for female professors, the difference could be driven simply by the fact that female professors were disproportionately teaching the difficult, less popular class that students were more prone to rate negatively in end-of-course evaluations.
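To make the extreme example concrete, here is a minimal Python sketch using made-up numbers. Everything in it is an assumption for illustration: the two course ratings (3.0 and 4.0), the 80/20 split in teaching assignments, and the function names. In this toy data, ratings depend only on the course and never on the professor's gender.

```python
from statistics import mean

# Hypothetical assumption: the rating depends only on the course taught.
RATING_BY_COURSE = {"calculus": 3.0, "statistics": 4.0}

# Hypothetical assumption: teaching assignments are skewed by gender.
# Each tuple is (gender, course, number of student evaluations).
ASSIGNMENTS = [
    ("female", "calculus", 80),
    ("female", "statistics", 20),
    ("male", "calculus", 20),
    ("male", "statistics", 80),
]


def build_evaluations():
    """Expand the assignment counts into one row per student evaluation."""
    rows = []
    for gender, course, count in ASSIGNMENTS:
        for _ in range(count):
            rows.append(
                {"gender": gender, "course": course,
                 "rating": RATING_BY_COURSE[course]}
            )
    return rows


def mean_rating(rows, gender, course=None):
    """Average rating for one gender, optionally within a single course."""
    return mean(
        r["rating"]
        for r in rows
        if r["gender"] == gender and (course is None or r["course"] == course)
    )


rows = build_evaluations()

# Naive comparison that ignores which course was taught.
raw_gap = mean_rating(rows, "male") - mean_rating(rows, "female")

# Comparison that holds the course fixed.
gap_within = {
    course: mean_rating(rows, "male", course) - mean_rating(rows, "female", course)
    for course in RATING_BY_COURSE
}

print(f"Raw male-minus-female gap: {raw_gap:.2f}")
for course, gap in gap_within.items():
    print(f"Gap within {course}: {gap:.2f}")
```

In this toy data the raw comparison shows male professors rated about 0.6 points higher, yet the gap is zero within each course. The entire raw difference comes from who teaches which class, which is exactly the pattern omitted variable bias describes.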

This simple example does not rule out the possibility that students really do rate professors differently depending on gender. It only shows that, from a statistical perspective, we can't tell whether they do based on these simple observed differences alone.

And, as an aside, for an interesting view on how effective student evaluations are in assessing professor performance, I point you to this blog from Berkeley Statistics Professor Philip Stark.