grades: inflation, compression, systematic inequalities

I chair UNC’s Educational Policy Committee, and we are in the process of seeking some new policy initiatives to address grading. A reporter for the Daily Tar Heel asked me a while ago why I am such a grading “hawk”, meaning that I worry about grading problems (more on the identification of these problems below the break). The reason for his question is that I am a relatively humanities-oriented scholar in a department and discipline not exactly known for rigorous grading policies. Below the break I’ll discuss what I see as the problems, possible solutions, UNC’s current status with regard to these solutions, and why I care so much about them. Warning: this is a long and somewhat rambling post.

First, some definitions. I will refer to grade inflation as the general rise in grades over time, even though this leaves open the question of whether the “product” (quality of student work) is getting better, in which case the inflation metaphor is inexact at best. Grade compression, on the other hand, is a result of the combination of grade inflation raising the floor and the nature of the grading scale including a ceiling, such that all variation in student achievement must be represented in a narrowing window (since we don’t, as of yet, have AA, AAA, etc. grades). Finally, systematic grading inequality refers to the fact that grades received by students vary in far-from-random fashion across disciplines and (crucially) instructors within disciplines. Each of these is a distinct problem, even though they’re of course closely interrelated. Each raises a different set of thorny issues. And each is amenable to a different set of policy interventions, though again, there is much overlap.

What are the harms?

Lots of people agree that grades are rising, are unequal across departments and instructors, and are becoming more compressed, but ask who is harmed. After all, just about everyone is getting higher grades; professors are getting fewer complaints; graduate instructors and TAs have fewer grading hassles to worry about. It’s a win-win, right?

Wrong. Here’s a list of reasons to care about this set of problems:

Accuracy. We have a responsibility to provide accurate information to students and the public we serve.

Reduction in value. An A at Carolina (or wherever) means less if it is the most common grade.

Incentives. It is very difficult to recognize and reward outstanding performance.

Perverse incentives. Students have inappropriate incentives to select some courses or majors.

Unfairness. Cross-department comparisons and rankings are invalid and unfair.

The pervasive use of GPA as a mode of cross-class comparison and student ranking is a case of reactivity: students have altered behavior, in some cases radically, in the service of gaining or maintaining a GPA that reflects their aspirations. That means, ceteris paribus, that students are selecting into courses and potentially even majors that are not the best intellectual homes for them because of a statistical artifact.

At UNC, since 1995 the proportion of students eligible for the Dean’s List has climbed from 25% to 40%. It would likely climb further soon, but my committee has been asked to review the situation. We will be approaching the Faculty Council with the problem and a set of proposals for handling the Dean’s List specifically, but this is clearly a symptom that will not really be fixed unless and until we handle grades in general. Consider, for example, what would happen if we set 25% as a goal and simply set GPA targets to meet it (one of the proposals). This would magnify inequality based on instructor and department and, in turn, increase perverse incentives on students and faculty alike, to the extent that Dean’s List is a motivating distinction.

What to do?

But I digress. Here’s a (fairly) comprehensive list of possible policy responses to the problems:

Separate evaluation of student performance from teaching (e.g., Swarthmore’s honors program). Professors concentrate on teaching; a distinct mechanism is devised to assess achievement. If done right, this is an intellectually attractive but financially disastrous plan. If done wrong, it is financially feasible but intellectually disastrous (think No Child Left Behind at the college level).

Limit instructor and/or department grades through rationing (e.g., Princeton) or defined averaging (e.g., Wellesley). Recognize that grades are in some sense zero-sum, so treat them as scarce goods on the “front end.” This is appealing in that it actually aims to fix the ontological assessment problem instead of simply fiddling around. But it has serious implications for intellectual freedom and teaching philosophy that ought to be a major concern. Here at UNC, it was clear at my presentation this fall that this class of options was unacceptable to the Faculty Council.

Report context information for each grade on the transcript (e.g., Indiana, Cornell). Thus prospective employers, graduate schools, etc., could evaluate the relative difficulty involved in achieving each grade on the transcript. However, aggregate comparisons among students would remain invalid.

Provide an adjusted measure of accomplishment for use in reporting relative performance and rankings. This is what we tried to do a few years ago with the Achievement Index, which still leaves a bitter taste in the mouths of some around UNC. Ironically, though, it was probably the most popular option during the Faculty Council discussion in which I outlined these options. Essentially this would involve pulling in information about the relative difficulty of achieving a student’s grade mix to develop a valid aggregate comparison of student achievement for use in Dean’s List, Distinction, and other comparative tasks. (GPA would probably remain alongside it for comparability, even though it is essentially meaningless as a measure of overall performance.)

Prohibit comparison of students’ accomplishment across departments and instructors. Since we know unequivocally that grades without any adjustment are an invalid way of comparing students’ achievement, the University could choose to acknowledge this by simply saying we have no way of evaluating our students’ overall achievement. While of course we cannot keep third-party consumers from doing what they want with reported grades, we could abolish Dean’s List, Distinction, etc.; prohibit all University units from using grades to compare student achievement for fellowships, honors, etc.; and prohibit the Registrar from reporting any aggregate information such as GPA.

Ongoing, University-wide discussion and deliberation process on grading meaning and philosophy (e.g., Seton Hall). Hey, everybody likes the idea of ongoing deliberation. But we would need to do much, much more to establish a kind of Durkheimian moral order in which the grade is a sacred instrument not to be messed with.

Watch and wait (do nothing now). Always a popular one. The big question: if not now, when? What would constitute a crisis sufficient to warrant intervention if the situation we now find ourselves in isn’t it?
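To make the "adjusted measure of accomplishment" option above a bit more concrete, here is a minimal sketch of one very crude way to adjust for grading context: convert each grade to a z-score within its own section, then average a student's z-scores. This is emphatically not the Achievement Index, which is a more sophisticated model; the student names, section labels, and grades are all made up for illustration, and the function name `adjusted_scores` is my own.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (student, section, grade points on a 4.0 scale).
records = [
    ("ann", "SOCI101-1", 4.0), ("bob", "SOCI101-1", 4.0), ("cy", "SOCI101-1", 3.7),
    ("ann", "CHEM101-1", 3.0), ("bob", "CHEM101-1", 2.0), ("dee", "CHEM101-1", 2.3),
]

def adjusted_scores(records):
    """Average each student's within-section z-scores.

    A crude stand-in for an adjusted measure (an assumption for
    illustration), not the Achievement Index itself.
    """
    grades_by_section = defaultdict(list)
    for _, section, grade in records:
        grades_by_section[section].append(grade)
    stats = {s: (mean(g), pstdev(g)) for s, g in grades_by_section.items()}

    z_by_student = defaultdict(list)
    for student, section, grade in records:
        mu, sigma = stats[section]
        # If everyone in a section got the same grade, it carries no ranking information.
        z = 0.0 if sigma == 0 else (grade - mu) / sigma
        z_by_student[student].append(z)
    return {student: mean(zs) for student, zs in z_by_student.items()}

if __name__ == "__main__":
    for student, score in sorted(adjusted_scores(records).items()):
        print(f"{student}: {score:+.2f}")
```

In this toy data, cy has the highest raw GPA (3.7, from a single course where 4.0 was the most common grade) but ranks below ann once grades are read against their sections. A real measure would also have to deal with small sections, and with students who drop.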

An additional concern is that this is a national trend, not a single university’s problem. Furthermore, our graduates will find themselves in competition for graduate school admissions and jobs with graduates of other universities who are not, or may not be, changing practices. To mitigate this concern, the Faculty Council has asked us to build bridges with faculty senates at other universities interested in addressing these issues. While I agree in principle that this is a good idea, I am less concerned with students’ prospects because I think UNC making any bold move on this issue will make enough headlines that grade “consumers” will know what we’ve done.

UNC’s Status and Next Steps

At the January meeting of the Faculty Council, we have been asked to present the Dean’s List problem and possible solutions. The options we plan to present will be:

Each spring, set a GPA “bar” for the following academic year targeted at making roughly 25% of students eligible for Dean’s List.

Reduce the bar to a 2.0 GPA and publicize that fact, thereby emphasizing that the Dean’s List is meaningless; such a low GPA bar will also reduce the perverse incentives to instructors and students.

Abolish the Dean’s List altogether.
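For the first option, the arithmetic of setting a bar is simple enough to sketch. The following is only an illustration under my own assumptions: that the bar is computed from a list of the prior year's GPAs, and that everyone at or above the bar is eligible. The function name `gpa_bar` and the sample GPAs are hypothetical.

```python
import math

def gpa_bar(gpas, target_share=0.25):
    """Return (bar, actual_share) for a target share of eligible students.

    The bar is the GPA of the student ranked at the target cutoff.
    Students tied at the bar are all eligible, so the actual share
    can exceed the target.
    """
    ranked = sorted(gpas, reverse=True)
    k = max(1, math.floor(len(ranked) * target_share))
    bar = ranked[k - 1]
    eligible = sum(g >= bar for g in ranked)
    return bar, eligible / len(ranked)

if __name__ == "__main__":
    # Made-up GPAs for eight students.
    sample = [3.9, 3.8, 3.8, 3.6, 3.5, 3.4, 3.2, 3.0]
    print(gpa_bar(sample))
```

Note the tie at 3.8 in the sample: a 25% target yields a bar that 37.5% of these students clear. With compressed grades, ties near the top become more common, which is one more way compression complicates even this simple fix.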

In April we have been asked to present a comprehensive policy proposal on grading to Faculty Council. That will be the real “zinger.” I hope we can present a proposal for some sort of valid comparison statistic that will begin to reduce the inequality problems at least!

Why do I care so much?

Sociology is too often seen as an “Easy A” discipline; I’ve had far too many students tell me to my face that they’re taking my class in order to get an easy A. Two students asked me this semester if I would sign special forms to allow them to drop my class after the final exam because they didn’t like the grade they expected to get. I feel that the current grading regime is bad for the intellectual quality of the education we can offer our students. It’s bad for the “good” students because they can’t be rewarded for outstanding work; it’s bad for “bad” students because they aren’t motivated to do better. While this comes across as being a conservative hard-ass, and I’m the first to admit that I hate having to talk to students who are disappointed by their grades, I think it’s our responsibility as serious scholars and educators to maintain and increase the quality of our education–which includes seriously, carefully, and judiciously evaluating students’ real achievement in our classes.

I think you are right that this is a real problem that really should be addressed. Personally, I prefer the rationing solution. I think it is great you are taking this on. I hope your university appreciates it.
Here is another idea: Teaching evaluations for instructors could be reported as both raw numbers and grade-assigned-adjusted numbers. At most institutions, one reason grades are high is that there are disincentives, and no incentives, to be a hard grader. Hard grading can lower course evaluations at least a bit, but there is no downside to giving everyone grades of “A”. It would be easy to report a second set of numeric course evaluations for faculty–perhaps only for internal use–that are based on adjusting these evaluations statistically for the distribution of grades given in the course. This would reduce the problem that faculty who grade hard are penalized in their course evaluations. Have any institutions done anything like this?
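One way the statistical adjustment suggested above might look, as a rough sketch only: fit a straight line predicting each course's mean evaluation from its mean grade, then subtract the part of the evaluation that the line attributes to grades. The choice of a simple linear fit, the function name `grade_adjusted_evals`, and the numbers are all assumptions for illustration, not a description of any institution's practice.

```python
from statistics import mean

def grade_adjusted_evals(courses):
    """courses: list of (name, mean_grade, mean_eval) tuples.

    Returns {name: adjusted_eval}, where the adjustment removes the
    linear association between grades given and evaluations received.
    """
    grades = [g for _, g, _ in courses]
    evals = [e for _, _, e in courses]
    g_bar, e_bar = mean(grades), mean(evals)
    sxx = sum((g - g_bar) ** 2 for g in grades)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(grades, evals))
    # Least-squares slope; zero if every course gave the same mean grade.
    slope = sxy / sxx if sxx else 0.0
    return {name: e - slope * (g - g_bar) for name, g, e in courses}

if __name__ == "__main__":
    # Made-up courses: (name, mean grade given, mean evaluation on a 5-point scale).
    courses = [("easy", 3.8, 4.5), ("middle", 3.0, 4.1), ("hard", 2.6, 3.9)]
    for name, adj in grade_adjusted_evals(courses).items():
        print(f"{name}: {adj:.2f}")
```

In this contrived example the raw evaluations track grades exactly, so all three courses come out equal after adjustment. Real data would be noisier, and the adjustment cannot tell lenient grading apart from teaching that genuinely produced better work.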

I agree that evaluations are a big problem. I wish there were some way of evaluating the class based on what students actually learned. It seems like much of the effort professors put into making sure the content is relevant is invisible to students. Maybe separating testing from teaching would be an idea, as I understand they do in Britain.

Good for you for working on this. I think something along the lines of the achievement index is potentially even better than rationing. I also think that whenever a transcript is requested, extra details about the index, and about the grade distribution in different colleges might be included.

I was just asked to select students for a very prestigious fellowship. And what was VERY helpful in the process was that my college puts the % As next to each grade (an A or A-). At first I thought this was for the class in general. But then I noted that the same class had different percentages, and this made me realize that it was in the student’s class itself. Two things surprised me:

1.) The high percentage of A’s given out. By the time you reach junior and senior year at my college, more than 50% of the grades are A or A- in most classes. Also notable were the recommendations from one particular department, which shall remain nameless. In several of the recommendations grades were mentioned, in particular, “How challenging it is to have a 4.0 in our major, one known for not giving out A’s.” I found that in many of the upper level classes, upwards of 80% of the grades in this department were A’s. In fact, this department seemed to have some of the highest grades. But many letters boasted of how tough it was. Funny. (I don’t think this is disingenuous, but instead a myth they believe about themselves.)

2.) The enormous variability. Even within courses. Some “core” classes at Columbia had about 25% As. Others were up around 60-70%.

I think a “weighted GPA” would be useful. This might punish students for taking easy classes. But it would also deter faculty from being too easy (their courses would count for less, so fewer students would enroll).

First, let me say that I had considered myself an easy grader until I found out what some other people were doing. Most of my grades are Bs and ABs (we have AB and BC grades between the letter grades), and grades below B are rare, even though I try to hold As to 10-20% of the class. Our university actually puts grade distributions for all classes on the web, but they are “hidden” in a place where you would never think to look and are available only in one giant 700-page PDF document. Our department is wildly variable in its grade distributions from class to class. So I think this matters.

But there are some factors being left out of the discussion. (1) Late drop dates (our deadline is week 9) mean that the low end of the distribution exits the course. Depending on the student and the course, some students drop for any grade lower than an A. This implies two things. First, a lot of the “inflation” in grades over time is, in fact, due to the ability to drop courses; I believe someone published an analysis of this, possibly someone I know who is going to be mad at me for forgetting the citation. Second, “curving” the final grade distribution to some artificial standard without putting the drops into the bottom of the curve puts an unfair and impossible burden on the students who don’t drop. Similarly, any attempt to calculate standardized grades has to factor drops into them. (2) Anyone who has taught a lot knows that there is huge sampling variation from term to term in the performance level of students in a class, especially when section sizes are small. Expecting professors to generate standard distributions across multiple semesters and multiple sections is one thing; imposing such distributions on a section by section basis is statistically insane and grossly unfair. (3) My students’ performance is better when I teach better. I’m measuring myself, not just them. I know students wrongly want to put all of the blame on me, but they are not wrong that some of it belongs to me. When I teach better and they perform better, shouldn’t they get a better grade, if the grade indicates mastery? I have always preferred to grade to absolute standards of quality, not artificial distributions. In courses I’ve taught repeatedly, I believe I can do this.
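Point (2) is easy to check with a quick simulation. This is only a sketch under a strong simplifying assumption of mine: every student independently has the same fixed chance of doing A-level work (15% here, an arbitrary figure), so any term-to-term difference in a section's share of As is pure sampling noise. The function names are hypothetical.

```python
import random
from statistics import pstdev

def share_of_as(section_size, p_a, rng):
    """Fraction of one section earning an A, given independent students."""
    return sum(rng.random() < p_a for _ in range(section_size)) / section_size

def spread(section_size, p_a=0.15, terms=2000, seed=0):
    """Standard deviation of the share of As across many simulated terms."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    shares = [share_of_as(section_size, p_a, rng) for _ in range(terms)]
    return pstdev(shares)

if __name__ == "__main__":
    for size in (20, 200):
        print(f"section of {size}: spread in share of As = {spread(size):.3f}")
```

Even with identical students and identical standards, the share of As in a small section swings much more from term to term than in a large one, which is why forcing a fixed distribution on each individual section misreads noise as signal.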

My impression is that part of grade inflation, at least in my classes, is that the students are actually performing at a higher level relative to ten years ago. So the same absolute standards produce higher grades. This whole cranky “today’s youth are illiterate” stuff doesn’t actually fit the data, you know. Somehow grades need to be a combination of relative and absolute standards. Not easy stuff.

FYI I’ve seen a lot of Canadian transcripts that give the grade distributions of classes on the transcripts. Useful.