Everyone likes to imagine they are rational, fair, and free from prejudice. But how easily are we misled by appearances? Noola Griffiths studies the psychology of music, and she's published a cracking paper on how what women wear affects your judgment of their performance. The results are predictable but the context is interesting.

Four female musicians were filmed playing in three different outfits: a concert dress, jeans, and a nightclubbing dress. They were also all filmed as points of light, wearing a black tracksuit in the dark, so that the only thing to be seen – once the images had been treated – was the movement of some bright white tape attached to their joints.

All these violinists were music students, from the top 10% of their year, and they were vetted to ensure comparability: they were all white Europeans, size 10 dress, size 4 or 5 shoe, and aged between 20 and 22.

They were even equivalently attractive, according to their score on the MBA California facial mask, which seems to be some kind of effort to derive a numerical hotness quotient from the best fit of a geometric mask over someone's face. I'm not saying that's not ridiculous, I'm just saying they tried.

In fact they did better. All the performances were also standardised at 104 beats per minute, so the audio tracks from each musician could be replaced with a recording of a single performance, recorded by someone who was never filmed, for each of the various pieces in the study.

This meant there was no room for anyone to argue that the clothes made the musicians perform differently, and when the researchers checked in a pilot study, nobody watching the clips had spotted the switch.

Then they got 30 different musicians – a mixture of music students and members of the Sheffield Philharmonic – to watch video clips with various permutations of clothing, player and piece. All were invited to give each performance a score out of six for technical proficiency and musicality, and the results were inevitable.

For technical proficiency, performers in a concert dress were rated higher than if they were in jeans or a clubbing dress, even though the actual audio performance was exactly the same every time (and played by a separate musician who was never filmed). The results for musicality were similar: musicians in a clubbing dress were rated worst.

Experiments offer small constricted worlds, which we hope act as models for wider phenomena. How far can you apply this to wider society? Women are still discriminated against in the workplace, but each situation has so many variables it can be difficult to assess.

In the world of music, assessment of performance goals can be restricted to make individuals broadly comparable, and so there's a reasonably long tradition of the field being used as a test tube for bigotry. In the 1970s and 1980s, in an attempt to overcome biases in hiring, most orchestras changed their audition policy, and began using screens to conceal the identity of the candidate.

The proportion of female musicians in the top five US symphony orchestras rose from 5% in the 1970s to around 25%. This could have been due to wider societal shifts, so Goldin and Rouse conducted a very elegant study, Orchestrating Impartiality: they compared the number of women being hired at auditions with and without screens, and found women were several times more likely to be hired when nobody could see that they were women.

What's more, using data on the changing gender makeup of orchestras over time, they were able to estimate that from the 1970s to 2000 – the era in which popular culture shifted from casual racism and sexism to more covert forms – the trend towards greater equality was driven simply by selectors being forced not to see who they were selecting. I don't know how you'd apply the same tools to every workplace. But I'd like to see someone try.