Speak up! Why some TV dialogue is so hard to understand

Author

Lauren Ward, Doctoral researcher in Audio Engineering and General Sir John Monash Scholar, University of Salford

Disclosure statement

Lauren Ward does not work for, consult, own shares in or receive funding from any company or organization that would benefit from this article, and has disclosed no relevant affiliations beyond their academic appointment.

So is making television sound understandable as simple as asking actors to speak up? The short answer is: no. Clean recordings and well-enunciated speech will always make dialogue easier to understand. However, the relationship between the audio from our television and what we understand as speech is much more complex.

There is some evidence to support this idea. A recent study investigating how television sets affect speech intelligibility showed that the frequency responses (how loud different frequencies are, relative to each other) of different television sets differed by 10 to 20 decibels. This means the low-pitched, rumbling background sounds might be made louder than intended, while the higher-pitched voices stay at the same volume. The problem is made worse when the speakers in a television set are positioned to point downwards or even backwards.
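To give a sense of scale for a 10 to 20 decibel difference, here is a minimal sketch of the standard decibel conversions. The function names are illustrative only and are not taken from the study; the figures are general acoustics arithmetic, not measurements of any particular television.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to a sound-pressure (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in decibels to a power (intensity) ratio."""
    return 10 ** (db / 10)


# The range of differences between television sets mentioned in the text.
for db in (10, 20):
    amplitude = db_to_amplitude_ratio(db)
    power = db_to_power_ratio(db)
    print(f"{db} dB -> amplitude x{amplitude:.2f}, power x{power:.0f}")
```

In other words, a 20 decibel difference corresponds to a tenfold difference in sound pressure, which is why a boosted low-frequency rumble can easily swamp a voice that has stayed at its original level.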

Speaker quality is likely a contributing factor, but not all television programmes have attracted the same complaints as SS-GB. Assuming that viewers did not exclusively watch SS-GB with poor quality television speakers, this means there are other factors at play.

Have I heard this before?

Humans are quite good at understanding speech in challenging or noisy situations. Research indicates personal and psychological factors play a role in how well we are able to do this. Similarly, these factors may affect how we hear dialogue on television.

For example, you might find it easy to understand Bart and Homer’s banter in your 500th episode of The Simpsons while multitasking on Twitter and making a cuppa. But when the first episode of the newest crime drama comes on, you may find that you have to sit down and pay full attention to understand the speech. How well we understand speech is affected by whether we have previously heard the talker, their accent or the topic they are talking about.

The effect of a familiar speaker on how well we understand speech is termed the “Familiar Talker Advantage”. Studies have shown that we are able to understand our spouse’s voice (a highly familiar voice) better than unfamiliar voices. Even voices we have only recently heard are easier to understand than those we are completely unfamiliar with.

The predictability of the speech content also affects how easily we understand it. It has been well established that when we have language or content cues in the speech, we recognise speech twice as accurately, even in the most challenging of listening situations. If we hear Homer Simpson’s brazen American voice exclaiming “Who ate all the …”, our brains are likely to insert the missing word as “doughnuts”, not “bell peppers”. And we probably wouldn’t even notice we were doing it.

Familiarity with an actor’s voice, their accent and what they may be speaking about changes our perception of the clarity of dialogue. This does not, however, solve the more general issue of audibility.

I’m no expert, but I know what I like

Part of what makes the problem of audible speech on television difficult to solve is that there is no consensus on what “good sound” sounds like. Even among the barrage of complaints about SS-GB, some found no issue with the dialogue.

Similar patterns have been seen in previous research by the BBC. An experimental football broadcast by the BBC in 2013 allowed viewers to adjust the volume of the crowd compared with the commentary. While most users (77%) agreed that they liked the personalised broadcast, they differed in their preferences. Some balanced commentary and crowd noise while others preferred all crowd noise or all commentary.

The technology that allowed the user to alter the sound mix in the 2013 experiment is called object-based broadcasting. In the future, this may allow viewers to alter the levels of different segments of the broadcast on their own televisions, based on their preferences or their needs. Studies have shown that using the technology in this way can improve speech intelligibility. It has also been proposed by the BBC as a way forward for improving television sound for the hard of hearing.
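The idea behind a personalised mix can be sketched in a few lines of code. This is a simplified illustration only, not the BBC's implementation: the object names, the sample values and the `mix_objects` function are all hypothetical, and real object-based systems carry far richer metadata than a single gain per object.

```python
from typing import Dict, List


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude gain."""
    return 10 ** (db / 20)


def mix_objects(objects: Dict[str, List[float]],
                gains_db: Dict[str, float]) -> List[float]:
    """Sum separately delivered audio objects using listener-chosen gains.

    Objects without a listed gain are mixed at 0 dB (unchanged).
    """
    length = max(len(samples) for samples in objects.values())
    mix = [0.0] * length
    for name, samples in objects.items():
        gain = db_to_gain(gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


# Hypothetical audio objects: a few samples each of commentary and crowd noise.
objects = {
    "commentary": [0.5, -0.5, 0.25],
    "crowd": [0.2, 0.2, -0.2],
}

# One viewer turns the crowd down by 20 dB; another turns the commentary down.
commentary_focus = mix_objects(objects, {"commentary": 0.0, "crowd": -20.0})
crowd_focus = mix_objects(objects, {"commentary": -20.0, "crowd": 0.0})
```

The key point is that the broadcaster sends the commentary and the crowd as separate objects, and the final balance is set on the viewer's own device rather than fixed in the studio.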

The many factors affecting speech intelligibility mean that one particular sound mix will rarely make everyone happy. The provision of “personalisable” broadcast mixes, using object-based broadcasting, may be the solution.