Character assessment: a middle-class ramp?

My last two posts (here and here) have looked at how teacher assessments can be biased, and how tests can help to offset some of these biases. I’ve been quite sceptical of the possibility of improving teacher assessment so that it can become less biased: the more you try to reduce the bias in teacher assessment, the less it looks like teacher assessment. Still, that’s not to say I am against all alternative forms of assessment. I think exams have many strengths, and are often unfairly maligned, but they have weaknesses too and we should always be looking to innovate to try and address such weaknesses. In this post and the next, I will look at two recent innovations in educational assessment: one which I think is hugely promising, and one which is less so. First, the less promising method.

Assessing character
Teaching character, or non-cognitive skills, is very popular at the moment, and for good reason. Children don’t just need academic skills to succeed in life; they need good character too. As E.D. Hirsch says here, character development has rightly been one of the major focuses of education from classical times.

Whilst we can probably all agree on the importance of teaching character in some form, assessing it is far more fraught. Angela Duckworth, whose research focusses on ‘grit’, or perseverance for long-term goals, has created very simple 12-item and 8-item ‘grit scales’ which ask you to respond to a series of statements like this one:

Setbacks don’t discourage me.
a) Very much like me
b) Mostly like me
c) Somewhat like me
d) Not much like me
e) Not like me at all

Duckworth et al discuss the development, validation and limitations of the grit scale here. It’s obviously a self-report scale, with all of the problems such scales entail, but despite this limitation it can tell us some useful information about how ‘gritty’ individuals are, and the impact this will have on their success in other areas.

However, a self-report scale like this one is very obviously going to be of much less use in any more sophisticated or high-stakes assessments. For example, if you wanted to measure the success of a particular ‘character’ intervention, this scale is not going to allow you to measure whether a cohort’s grit has increased over time. Similarly, if anyone wanted to use a measure of grit or character for accountability purposes, the grit scale is not going to be able to do that either. As the teaching of character has become more popular, more people clearly want a grit scale that is capable of carrying out these kinds of functions. As a result, Duckworth has actually written a paper here outlining in detail why the grit scale is not capable of these kinds of functions, and why it shouldn’t be used like this.

Of course, this doesn’t mean we should stop teaching character. And nor does it mean we have no way of measuring how effective our character education is. As Dan Willingham says here, we could always use measures of academic achievement to see how effective our character education is. The disadvantage is that of course other things will impact on academic achievement, not just the character education, but the advantage is that we are actually measuring one of the things we care about: ‘Indeed, there would not be much point in crowing about my ability to improve my psychometrically sound measure of resilience if such improvement meant nothing to education.’

However, whilst we shouldn’t stop teaching character, I think Duckworth’s paper and the problems surrounding the measurement of character mean we do have to be careful about how we assess it. To return to the theme of my previous posts, we know that teacher assessment is biased against pupils with poor behaviour, with SEN, from low-income backgrounds, and from ethnic minorities. There is every risk that subjective assessments of character might suffer from the same flaws. In fact, I would argue that there is more of a risk. School-level maths is a fairly well-defined concept, and yet teacher assessments of it are biased. I don’t think ‘character’ or ‘grit’ are nearly as well-defined as school maths, so the risk of bias is even greater. Whilst Duckworth’s work on ‘grit’ is clearly defined, in general the recent interest in character education has served to expand the concept rather than define it more precisely. I am often struck by the number of different meanings ‘character’ seems to have, and how often people seem to use the term to mean ‘personal qualities that I have and/or approve of’. Given this, there is a real risk that subjective assessments of character would inadvertently tend to reinforce stereotypes about social classes, gender and ethnic groups, and end up disadvantaging pupils who are already disadvantaged.

Not only that, but if we loaded high-stakes outcomes onto character assessments – for example, giving assessments of character weight in university admissions – then there would be an incentive to game such assessments. Again, it is not too far a stretch to think that middle-class parents would be the ones most adept at gaming them for their children, and that students from less privileged backgrounds might be disadvantaged as a result. To put it bluntly, I’d worry that character assessments would become a middle-class ramp, a way for underachieving middle-class children to use their nice manners to compensate for their poor grades. Character assessments need a lot of improvement before they can be relied on in the same way as traditional academic exams.

One of the difficulties in math with all this grit stuff is that one of the most useful “qualities” is knowing when to abandon a particular unproductive line of inquiry and stop for a period of reflection, or sleep.

A connected problem which is very widespread is the requirement to provide effort grades. When we do this, we are, in a sense, assessing character. These grades can have quite a knock-on effect, and they are extremely subjective.