To which gender’s disadvantage are school grades biased – girls or boys?

School grades not only provide students with feedback on their current performance; they also affect the chances of admission to universities and the success of job applications. The question of whether teachers grade irrespective of gender is therefore very important. Traditionally, the debate on gender fairness has mostly focused on discrimination against girls. In recent decades, however, there has been much media debate about whether the school system also discriminates against boys. This article summarizes the current state of research on whether the same school performance by girls and boys is graded differently.

The battle for equal rights and equal treatment of women and men, and girls and boys, has traditionally been characterized by efforts to reduce discrimination against women and girls. Today, women still earn less on average and are less likely than men to hold a position of power in society (1,2). In line with traditional gender roles, men are considered to be more competent, whereas women are regarded as warmer in the social sense (3,4). If teachers hold such stereotypes (i.e., general assumptions about groups), these might inadvertently guide their perception and lead to systematically worse grades for the supposedly less competent girls.

Recently, however, newspapers have published articles (e.g., 5) suggesting that our current educational system in fact disadvantages boys rather than girls. A large international meta-analysis (a summary of numerous studies) confirms that girls receive slightly better grades on average (6). Besides the possibility that teachers grade with a general bias against girls or boys, there are reasons to assume that who is favored depends on the subject and on more specific stereotypes about skills. High skills are attributed to boys particularly in mathematics and science (7). Reading competence and other language skills, by contrast, are more likely to be expected in girls (8). If such domain-specific stereotypes affect teachers’ perceptions, this could result in boys being graded worse for the same performance in verbal domains and girls being graded worse for the same performance in mathematical domains.

In this article, I address the question of whether girls and boys receive different grades for identical performance. Whether boys and girls differ in their performance levels or possibly underlying abilities is a different research question. Research, however, suggests that there are only small gender differences in abilities and that boys and girls are more similar than they are different (9–11). The question to be answered here is: Are there scientific findings as to whether the same school performance is evaluated differently based on the gender of the student, and if so, are girls or boys, or both, disadvantaged by this?

What suggests that girls are being graded worse, and in what subjects?

Several similar experiments directly investigated whether girls and boys received different grades for identical performance. In these studies, teachers were always asked to correct one or more texts written by alleged students (12–14, and 15 with a math test instead of a text). The same mathematics or science-related texts were presented to half of the teachers with a male name and to the other half with a female name. The evaluations were then compared. Since the texts were of the same average performance level and differed only in the gender of the authors, systematic differences in the evaluation can only be attributed to gender stereotypes. Most of these studies (except 15) showed that the work of girls received lower grades than that of boys, especially if the teachers were not yet very experienced (12).

Similar studies have examined written texts in non-educational contexts. They, too, found that identical essays by female authors in areas regarded as typically male or as neutral were rated worse than those by male authors (16). However, it seems legitimate to question whether these findings are still relevant today. After all, most of these studies are several decades old. The study by Hofer (12) is the only experimental study suggesting worse grading of girls for the same performance as late as 2015. For other detrimental effects of gender stereotypes, e.g., on girls’ and women’s feelings in math and science, there is more and clearer recent evidence (e.g., 17,18).

It is possible that the performance of boys is graded worse in stereotypically female domains (e.g., in the verbal domain); however, there is a lack of experimental studies on this question. Experiments have the advantage that they can directly show the causal effect of gender while carefully excluding other influences on the grade. However, they mostly concern specific and smaller samples of individual evaluations. The evidence on bias against boys mostly stems from another type of study, which uses actual student data from representative data sets. There, the grades given by teachers at school are linked to standardized (and thus more objective) school performance tests.

Since such performance tests are evaluated in a standardized way irrespective of gender (e.g., based on instructions that clearly define how many points are given for each answer), they can be used to check whether students of the same objective performance level receive the same grades from teachers. Although it is not possible to directly test the causal effect of gender, indirect conclusions about the role of students’ gender for grades – beyond other important impact factors – can be drawn.

Typically, studies of this type imply that the greater success of girls in school (6) is not consistently reflected in a comparably greater success in standardized performance tests such as PISA (i.e., Program for International Student Assessment, 19). Analyses of several large data sets from various Western countries show that girls received better grades than boys at the same performance level in a standardized test in mathematics and, in some cases, in other subjects as well (e.g., 20,21). This seems to imply that boys are being disadvantaged when they are evaluated by their teacher instead of a standardized test. At first glance, this possible grading bias against boys is surprising, since mathematics is regarded as a stereotypically male domain. The differences in grading, however, appear to be explained mainly by the finding that teachers also include behavior (e.g., participation in class, learning strategies) in their grading, and that they perceive girls – in line with gender stereotypes – as more industrious, more motivated and better-behaved (20).

The ambivalent bonus for girls’ classroom behavior

The argument that girls are graded more favorably does not hold up on closer inspection, though: A new analysis of the data set from Cornwell et al. (20) confirmed that teachers include classroom behavior in the evaluation of performance in mathematics. As they perceive girls as better-behaved, they give them better grades compared to boys at the same objective level of performance. However, if girls and boys with the same standardized test performance and the same behavior were compared, the girls received worse math grades than the boys (22,23). In addition, teachers considered girls to be less mathematically competent, despite the relatively good grades they gave them (22). The less competence the teachers attributed to the girls, the more the girls’ performance fell behind that of boys. The authors argue that the achievements of girls are often attributed to their behavior, but not to their abilities (see also 24). In the long run, this often subtle bias may lead girls to have less confidence in themselves (22).
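
The apparent contradiction in this paragraph can be made concrete with a toy calculation. The following Python sketch uses invented numbers and a hypothetical grading rule: the function `grade`, its two weights, and the convention that a higher grade is better are all assumptions made for illustration and are not taken from the cited studies. It shows how a bonus for perceived behavior can outweigh a penalty tied to gender, so that girls appear favored until behavior is held constant.

```python
# Toy illustration with made-up numbers; NOT data from any cited study.

def grade(test_score, behavior, is_girl,
          behavior_weight=0.5, bias_against_girls=0.3):
    """Hypothetical teacher grade (assumption: higher is better).

    The teacher adds a bonus for perceived classroom behavior and,
    in this toy model, subtracts a fixed penalty for girls.
    """
    penalty = bias_against_girls if is_girl else 0.0
    return test_score + behavior_weight * behavior - penalty

# Same standardized test score; the girl is perceived as better-behaved.
girl_typical = grade(test_score=3.0, behavior=2.0, is_girl=True)
boy_typical = grade(test_score=3.0, behavior=1.0, is_girl=False)

# Same test score AND same perceived behavior.
girl_matched = grade(test_score=3.0, behavior=1.0, is_girl=True)
boy_matched = grade(test_score=3.0, behavior=1.0, is_girl=False)

# Gap when only the test score is matched (behavior differs).
gap_same_score = girl_typical - boy_typical
# Gap when test score and behavior are both matched.
gap_same_score_and_behavior = girl_matched - boy_matched

print("girl minus boy, same test score:", round(gap_same_score, 2))
print("girl minus boy, same score and behavior:",
      round(gap_same_score_and_behavior, 2))
```

With these assumed values, the gap is positive when only test scores are matched and negative once behavior is matched as well. This mirrors the pattern described in (22,23) without reproducing their data or their statistical method.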

In the domain of verbal competence, which is regarded as stereotypically female, girls receive better grades than boys at the same level of performance; however, the grades do not differ if boys’ and girls’ behavior is perceived as similar by the teachers (23). In this context, it could be that the perception of learning behavior is also influenced by stereotypes: The stereotypical expectation of lazy boys and industrious girls could lead to perceiving boys as lazier than they really are (25). In any case, there is reason to believe that subject-related or behavioral stereotypes of teachers can be harmful to both boys and girls. For example, gendered role expectations of teachers can affect boys' self-assessment in reading and their later performance (26).

Judging by different standards, with best intentions?

Stereotypical expectations, for that matter, may not always function as a pre-judgement, but sometimes also as a reference standard (27). For example, the ability of a girl with a mediocre math performance or that of a boy with a mediocre reading performance could be assessed as "good". This happens when the evaluators compare the individual’s performance to a within-category standard, where girls are compared to the performance level expected from girls and boys are compared to the performance level expected from boys.

In one experiment on this shifting-standards effect in the school context, future teachers were asked to estimate how well a fictitious student had scored on a test (28). They only knew the student's gender and that he or she had been placed at an intermediate competence level based on a prior test (e.g., using criteria such as "can usually identify missing numbers in rows"). The participants were then asked to estimate the score that had led to this placement. If the test was described as a standardized performance test, girls were assumed to have achieved a lower score than boys. The participants therefore assumed retrospectively that the same competence placement reflected a different objective performance for girls than for boys. If, on the other hand, the placement test consisted of a more subjective collection of schoolwork (a learning portfolio), the participants estimated that girls and boys had performed equally well. The assumed gender differences in the objective test thus disappeared in the more discretionary subjective test. Pre-service teachers therefore assume that discretionary leeway would be used in a test to evaluate girls in math by a more generous reference standard than boys.

It is reasonable to assume that teachers want to evaluate fairly and are mostly motivated to reduce gender inequality. If there is room for discretion, they could – consciously or unconsciously – use lower assessment standards for the supposedly less able gender. This interpretation may explain why girls' proficiency in mathematics is estimated as lower, but they get better grades than boys anyway (22). By relying partially on subjective perceptions of classroom behavior when grading, teachers could compensate for supposed inequalities within the resulting wider margin of discretion. For example, suppose a teacher considers girls to be less competent in mathematics but perceives them as more motivated and hardworking in class. If the teacher now has to grade a test without knowledge about the classroom behavior, they might unconsciously rate the girl's test worse than the boy's (as found in experimental studies). When assigning grades at school, the teacher subjectively weighs performance and behavior. This could result in better grades for girls than boys with the same objective performance (as found in the large data set analyses). This explanation, however, is speculative until confirmed by further studies.

Conclusion

Overall, the results indicate that girls are still considered less competent in mathematics than boys by their teachers, which sometimes leads to unfairly negative and sometimes to unfairly positive grades for them. They are evaluated better than boys if their classroom behavior is taken into account, which of course initially harms the boys. If, however, girls and boys with the same behavior are compared, the same performance of girls in mathematics and science is evaluated worse than that of boys, which in the long term harms girls and exacerbates gender differences.

In stereotypically female subjects such as the verbal domain, boys get worse grades than girls with the same performance. More studies are needed to examine the role of stereotypical perceptions of ability as well as behavior in this gender bias against boys. Overall, the stereotype that girls are more industrious and more motivated in school may be problematic for both genders. In addition to the hidden costs for girls described above, it can raise insecurities in boys and limit their ability to perform (Hartley & Sutton, 2013). There is also evidence that teachers’ stereotypical expectations reduce boys’ later reading motivation (Wolter et al., 2015). All biases – whether they disadvantage girls or boys – are based on classic gender stereotypes about abilities and behaviors.

How can biased evaluation be prevented?

A general conclusion from all studies is that gender-biased evaluations occur when there is a considerable margin for discretion. This can be the case if, when evaluating an essay, it is not clearly defined which criteria are to be evaluated and how much they are weighted. It can also stem from the unsystematic consideration of aspects beyond the actual achievement, for example learning behavior. If, on the other hand, there are precisely defined and clearly weighted evaluation criteria from the very beginning, there is little room for any potential source of bias, one of which is gender.
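
What "precisely defined and clearly weighted evaluation criteria" could look like in practice is sketched below in Python. The criteria names, the weights in `RUBRIC`, the 0 to 10 point scale, and the function `rubric_score` are hypothetical choices for illustration only; real rubrics would be set by the teacher or school. The point of the sketch is that the score depends only on the points awarded per criterion, so nothing else (such as the author's name) can enter the calculation.

```python
# Hypothetical essay rubric; criteria and weights are illustrative assumptions.
RUBRIC = {"argument": 0.5, "structure": 0.3, "language": 0.2}  # weights sum to 1


def rubric_score(points):
    """Return the weighted score from per-criterion points (0 to 10 each).

    Every criterion in RUBRIC must be scored exactly once, and no extra
    criteria (e.g., an impression of classroom behavior) are accepted.
    """
    if set(points) != set(RUBRIC):
        raise ValueError("score every rubric criterion exactly once")
    for criterion, value in points.items():
        if not 0 <= value <= 10:
            raise ValueError(f"points for {criterion!r} must be between 0 and 10")
    return sum(RUBRIC[c] * points[c] for c in RUBRIC)


# Example: points awarded to one essay, fixed before the total is computed.
essay_points = {"argument": 8, "structure": 6, "language": 7}
score = rubric_score(essay_points)
print("weighted score:", round(score, 2))
```

A rubric like this does not remove bias in how the points for each criterion are awarded, but it narrows the margin of discretion in how they are combined.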

Another approach could be to evaluate without knowing the gender of the evaluated person. The practicality of this approach may, however, be limited in the classroom, where the handwriting and expression tend to be recognizable to the teacher.

In addition, it is important to train teachers to pass on less of their stereotypical expectations to students and to not let them guide their evaluations. This in no way contradicts thinking about which teaching forms and materials correspond to the average preferences and characteristics of boys and girls and expanding the repertoire accordingly. But when interacting with individuals, stereotypical expectations – of the difficult boys and hard-working girls, the unequally distributed mathematical and verbal abilities – can be harmful for both boys and girls.