One eyewitness to a robbery reports that the culprit was a male in his 40s with brown hair, wearing a light-coloured T-shirt. Another describes a blond man in his early 30s wearing a denim shirt. If you’re a police officer investigating the crime, whose memory do you trust?

Identifying which of two apparently credible but conflicting eyewitness statements to trust is a big problem for law enforcement agencies (as is deciding, in the case of a witness who has no incentive to lie, which of their memories are accurate). Now a new paper, reported in the Journal of Experimental Psychology, provides initial evidence for a new, objectively verifiable method for doing this. This work has, as the researchers write, “potentially far-reaching significance, not the least in the legal context.”

Torun Lindholm at Stockholm University and her fellow researchers report two studies. For the first, they recruited 34 men, who separately watched a video of a staged abduction of a woman by two men. Afterwards, the men were interviewed individually. They were first asked to recall all they could about the crime, and were then given 12 specific prompts – to describe the clothing of the men who waited outside the victim’s house, for instance.

These interviews were recorded, and raters later identified and pooled all the “accurate” and “inaccurate” responses. Two more raters then coded each of these individual responses, this time looking for cues that indicated how hard the participant had to work to produce their response (so-called “effort cues”). These included pauses in speech; filler non-words (like “um” or “uh”); filler words (like “you know” or “well”); and hedges (words that reduced the force of an assertion, such as “maybe” or “I guess”).
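To make the four cue categories concrete, here is a minimal sketch of how they might be tallied automatically in a transcribed response. It is only an illustration: in the study the coding was done by human raters, the word lists below contain just the examples quoted above, and the `[pause]` marker, the function names and the sample sentence are all assumptions invented for this sketch.

```python
import re

# Illustrative cue lists built only from the examples quoted in the article.
# They are assumptions for this sketch, not the raters' actual coding scheme.
FILLER_NON_WORDS = ("um", "uh")
FILLER_WORDS = ("you know", "well")
HEDGES = ("maybe", "i guess")
PAUSE_MARKER = "[pause]"  # hypothetical convention for a transcribed pause


def _count_phrases(text, phrases):
    """Count whole-word occurrences of each phrase in already-lowercased text."""
    total = 0
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        total += len(re.findall(pattern, text))
    return total


def count_effort_cues(response):
    """Return a tally of each effort-cue category for one transcribed response."""
    text = response.lower()
    return {
        "pauses": text.count(PAUSE_MARKER),
        "filler_non_words": _count_phrases(text, FILLER_NON_WORDS),
        "filler_words": _count_phrases(text, FILLER_WORDS),
        "hedges": _count_phrases(text, HEDGES),
    }


# Invented sample response, not taken from the study.
example = "Um, I guess he wore, uh, [pause] a denim shirt, you know, maybe blue."
print(count_effort_cues(example))
```

A simple word match like this cannot tell a filler “well” from a meaningful one (“he was well dressed”), which is one reason a human rater, or a much more careful tool, would be needed in practice.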

Earlier work has found that correct memories are recalled faster than incorrect memories, which led the researchers to suspect that signs of slow, more effortful recall – like the use of lots of filler words – might indicate that a memory was faulty. As expected, the analysis showed that inaccurate responses included significantly more hedges, filler non-words and filler words than accurate responses. This result raises the possibility that these cues could in future be used to differentiate accurate from mistaken memories in an objective way.

A second study of 10 men and 10 women mostly replicated these findings, but also found that, for this group, delays while reporting a memory – another effort cue – were associated with inaccuracy. (This study also found that asking a participant how confident they were in the accuracy of a particular memory provided no additional useful information, on top of the hedge, delay and filler data.)

These studies do have some limitations. The number of participants was small. And they were interviewed immediately after viewing the events they were asked to describe, which would rarely happen in real-life eyewitness situations. Might delays in interviewing make it harder to use effort cues to predict statement accuracy? Only further research will tell.

Also, while greater use of filler words and hedges was strongly associated with inaccuracy, that doesn’t mean that participants reporting accurate memories never used them. Will the total number of effort cues observed in a witness’s testimony allow police officers to distinguish easily between accurate and faulty memories? As yet, no one knows. Still, this work does reveal some objectively verifiable markers, and that’s an important first step. As the researchers point out, “Given research showing that most people have vast difficulties in judging the quality of others’ memories, combined with the scarcity of research on genuinely reported memories, these initial findings suggest unexplored alternatives that may prove highly useful for improving accuracy judgements.”

Whilst the basic premise is solid, the problem with using this method as a general guide is the variability between individuals. Within a given individual, more delays, hedges etc. may indicate poorer recall, but the usual rate of these cues may well vary so much between individuals that equally accurate recall across a number of people could span the full range of effort cues.

The other variable is age, with older people typically slower to recall even when recalling accurately. SSRIs and other psychoactive medications may also alter recollection, with slow recall being a hallmark of some of them, for instance anti-anxiety medications.

As with a polygraph, for this metric to be useful a baseline needs to be established for a given individual, so that the recollection in question can be evaluated relative to that baseline.

What would be the effect if testimony is given in an individual’s second or third language? Might it then be that the non-words and filler words are used while the individual is “searching” for the correct words or translating in his or her mind? So it seems that there might be many nuisance variables at play here.