The fatal flaw in attempts to assess soft skills

Imagine I gave you a test to assess your maths skills. It starts like this:

Be honest – there are no right or wrong answers

1. I enjoy working at maths problems

❑ Very much like me
❑ Mostly like me
❑ Somewhat like me
❑ Not much like me
❑ Not like me at all

2. I worry that I’m not good at maths

❑ Very much like me
❑ Mostly like me
❑ Somewhat like me
❑ Not much like me
❑ Not like me at all …

I haven’t tested these particular questions with real people but, hopefully, you will agree that giving something like this to a group of adults could potentially give a good indication of who was better or worse at maths. So why don’t maths tests tend to look like this? After all, isn’t this what they are for?

First we need to consider the difference between this kind of survey and a traditional maths test. The survey asks participants to report a view of themselves on maths-related criteria. A traditional test assesses maths by getting students to do the maths. It asks them to prove their ability.

The reason we have developed such an approach is that self-reports can be gamed. If there is nothing at stake then there is no reason to game a survey, but what if your term grade depends on it? And teachers aren’t dispassionate observers; they are actors who are seeking to cause change. This creates both conscious and unconscious pressure to give the ‘right’ answers.

A survey like the one above is fairly transparent. We know the answers that the better mathematicians are likely to give and so it’s tempting to choose those. If there is any doubt then teachers could instruct students in what the best answers are: “Good mathematicians enjoy solving maths problems.” Maths teaching is therefore displaced by meta-maths teaching.

What’s the problem? Nobody is suggesting such an eccentric approach to assessing maths, so what am I going on about?

The issue is that self-reports like this survey are the only means by which we can reliably assess many of the soft skills that have become all the rage in schools in recent years. The maths survey above is based upon Angela Duckworth’s survey for assessing grit. Once soft skills inveigle their way into the curriculum, people want to assess and report on them. We risk a whole lot of wasted effort where teachers effectively instruct students in the correct answers to survey questions. Nobody will think that’s what they’re doing but it’s what it will reduce to.

If we tried to assess these soft skills directly then we would need to set up situations where a performance depended upon the use of the desired skill, just like in the traditional maths test. But what would that look like?

Almost every test I can imagine would conflate a soft skill with the application of a more conventional skill or body of knowledge. For instance, resilient cyclists may cycle further but cycling further is also easier if you are a skilled cyclist and have good aerobic fitness. It’s hard to think of tasks that depend only on resilience and nothing else.

And when you think about it further, doesn’t a traditional maths test also depend on resilience as well as maths? Perhaps we are directly measuring some of these soft skills already.

10 Comments on “The fatal flaw in attempts to assess soft skills”

Of course you are right in this, which is why when Carol Dweck’s mindset assessment questionnaire moved out of the laboratory into the classroom (or indeed teachers’ CPD) it became unreliable. She herself has talked about the emergence of ‘false growth mindset’, arguably an inevitable consequence of the test gaining in significance for those taking it.

However, your final paragraph actually highlights, perhaps unintentionally, why it is still vital that we seek ways to independently assess soft skills. You wrote: “And when you think about it further, doesn’t a traditional maths test also depend on resilience as well as maths? Perhaps we are directly measuring some of these soft skills already.” And it is exactly because the traditional assessments, perhaps especially high stakes assessments, are conflating the measurement of subject skills and ‘soft’ skills that it is so important that we seek ways to distinguish between them. Without an accurate diagnosis we may end up prescribing the wrong treatment.

I can’t imagine how you would teach resilience directly, so how could you assess it directly? Resilience is a desirable character trait, but it is best fostered by encouraging students to persist at some task in the face of difficulty until, eventually, they are rewarded by success. If you teach the subject matter effectively, you are automatically teaching resilience. The expectation that the teacher is there to entertain you — that everything will be fun — is what undermines resilience. There is no need to either teach or assess resilience directly.

In an earlier life, I was a senior research director at the Educational Testing Service in Princeton, NJ. One of the research centers in the cluster that I led was the “Centre for New Constructs” and we had some of the world’s leading researchers in this field working on how to avoid the problem that Greg identifies (which researchers in this area usually describe as “gaming the test”: students giving socially desirable rather than honest answers). We never did figure out a way of getting round this problem, and I, for one, am fairly confident that we never will. The reason that maths tests work is that if you do not know how to solve the problem, you can’t fake it. Faking will always be a problem in the assessment of “non-cognitive” aspects of human performance.
In this context, it is also worth noting that some of Angela Duckworth’s best known studies suffer from what statisticians call restriction of range. If you look at students at highly-selective institutions like West Point, then of course non-cognitive measures will account for more of the variation in final performance, because you have ensured that, on cognitive measures, all the candidates are almost identical. With the range of these variables found in the whole population, however, cognitive variables almost always have higher correlations with performance outcomes than do non-cognitive measures, even when students aren’t gaming the test. If students are gaming the test, then of course those who score higher on cognitive tests will be better at it, so much of the variation you think is due to the non-cognitive factors is actually just a difference in how smart test-takers are in figuring out what you want.
Conversely, about one-third of the variation in IQ test scores is actually just perseverance. Some test-takers give up as soon as they hit a string of questions they can’t answer, while others persist. You might think that this would provide a way of assessing non-cognitive skills reliably: let’s just look at which students reach the end of the test, and which students give up. But as soon as students know this is what you are going to look at, they’ll make sure that every question has an answer.
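The restriction-of-range point in the comment above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the weights (0.6 for the cognitive score, 0.4 for the non-cognitive score), the noise level, the selection cutoff and the function names are all hypothetical and are not taken from Duckworth’s studies or from any real data set. It simply shows that selecting on one variable can shrink that variable’s correlation with an outcome until a weaker predictor appears stronger.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_restriction_of_range(n=50_000, cutoff=1.5, seed=0):
    """Compare predictor/outcome correlations before and after selection.

    Assumed (hypothetical) model: the outcome depends more heavily on the
    cognitive score than on the non-cognitive score, plus random noise.
    """
    rng = random.Random(seed)
    cognitive = [rng.gauss(0, 1) for _ in range(n)]
    non_cognitive = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [
        0.6 * c + 0.4 * g + rng.gauss(0, 0.5)
        for c, g in zip(cognitive, non_cognitive)
    ]

    # A "highly selective institution": admit only high cognitive scorers.
    admitted = [i for i in range(n) if cognitive[i] > cutoff]

    def correlations(indices):
        out = [outcome[i] for i in indices]
        return {
            "cognitive": pearson([cognitive[i] for i in indices], out),
            "non_cognitive": pearson([non_cognitive[i] for i in indices], out),
        }

    return {
        "full": correlations(range(n)),
        "selected": correlations(admitted),
    }


if __name__ == "__main__":
    result = simulate_restriction_of_range()
    for group, corrs in result.items():
        print(group, {name: round(r, 2) for name, r in corrs.items()})
```

In the whole simulated population the cognitive score has the higher correlation with the outcome. Among the admitted group the ordering flips, even though the underlying model has not changed, because almost all of the variation in the cognitive score has been selected away.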

This may be a stupid or flippant thought, but wouldn’t banging your head against a brick wall provide a reasonable test of resilience?
Just thinking about Dylan Wiliam’s IQ test example above: what about a test that was effectively infinite in length (both in time allowed and number of questions) and high-stakes for both resilience and IQ, where you measured the number of questions answered before performance dropped off, and the IQ score up to that point? Would that not give a possible measure of resilience? You would be measuring how long someone was prepared to keep trying hard. It would still need the IQ score to be more important to the candidate than the resilience score but, if so, it might be hard to game.
What about some kind of infinite maths problem? I’ve always thought Andrew Wiles must have a pretty high maths resilience score. What about something like Sweller’s ‘find as many angles as you can’ problem?
Maybe making the stakes similarly high for everyone is impossible, though. Then again, maybe the different value of the rewards is at the heart of the thing anyway.
Best wishes

No, banging your head against a wall wouldn’t work. People could game this by building up a thicker skin. To avoid this you need a wide range of tests that require skills most people don’t train for and randomly assign several of them to each student. That would take up a lot of time.

We have plenty of what amount to infinite problems out there. They are called computer games.

The thing is that the kids who supposedly have to learn “resilience” are extremely resilient at playing them. They don’t, generally, lack resilience as such. What kids lack, but many develop as they get older, is resilience at tasks they don’t like.

So infinite Maths problems will just test how much they like Maths, combined with any actual resilience, I suppose. As a kid I could sit at Maths problems for ages, even if really difficult and not making much progress, but I would give up on musical problems in seconds, even if really easy. Was I both resilient and not-resilient at the same time?