Labiodentals /r/ here to stay: Deep learning explains why

Articulatory variation is well documented in approximant realisations of English /r/. Despite the diversity of tongue shapes [1–3], the acoustic profile of /r/ is relatively stable [4], characterised by a very low F3 [1,5,6] that lies close to F2 [7,8]. However, the production of /r/ remains enigmatic, especially with regard to non-rhotic Englishes and the accompanying labial gesture. The lips are particularly pertinent to Anglo-English /r/ because high-F3 labiodental variants are rapidly gaining currency [9,10]. Labiodentalization may be due to speakers retaining the labial gesture at the expense of the lingual one [9,11,12], which implies that /r/ is always labiodental, even in lingual productions. We test this assumption by comparing the labial postures of /r/ and /w/ in Anglo-English speakers who still present a lingual component. If /r/ is labiodental, its labial gesture should differ considerably from that of /w/, which is unequivocally considered rounded.

We recorded 23 native speakers from England (21 female) reading /r-w/ minimal pairs, using ultrasound tongue imaging and a front-facing lip camera. Ultrasound confirmed that subjects produced a lingual gesture for /r/, with patterns of variation similar to the continuum of tongue shapes reported in rhotic varieties of English, i.e. from curled-up retroflex to tip-down bunched [1–3,13]. Among the women, F3 was on average around 800 Hz lower, and F2 500 Hz higher, for /r/ than for /w/. The image corresponding to maximal labial constriction was manually selected from each of 414 lip videos. A deep convolutional neural network was trained on these images to automatically learn the difference between /r/ and /w/. The very high accuracy of the model (near 100% for most subjects) indicated that lip configuration alone is sufficient to discriminate the two sounds, and occlusion analysis [14] confirmed that the model relied on the lips. To gain a more detailed understanding in articulatory terms, another deep neural network [15] was trained to automatically segment the lips from the rest of each image. This allowed us to obtain consistent measurements of lip width and vertical position.
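The two image-analysis steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the pixel-based definitions of lip width (horizontal extent of the mask) and vertical position (mean row of the mask), the patch size and the generic score_fn standing in for a trained classifier are all assumptions made for illustration.

```python
import numpy as np


def lip_measurements(mask):
    """Return (width, vertical_position) in pixels for a binary lip mask.

    Assumed definitions (hypothetical, for illustration only):
    width is the horizontal extent of the lip pixels, and vertical
    position is their mean row index (row 0 is the top of the image).
    """
    rows, cols = np.nonzero(mask)
    if cols.size == 0:
        raise ValueError("mask contains no lip pixels")
    width = int(cols.max() - cols.min() + 1)
    vertical_position = float(rows.mean())
    return width, vertical_position


def occlusion_map(image, score_fn, patch=4, fill=0.0):
    """Occlusion analysis: hide one patch at a time and record the score drop.

    score_fn is a hypothetical stand-in for a trained classifier; it must
    map an image array to a single number (e.g. the probability of /r/).
    Large values in the returned map mark regions the score depends on.
    """
    base = score_fn(image)
    height, width = image.shape[:2]
    heat = np.zeros((height, width), dtype=float)
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - score_fn(occluded)
            heat[top:top + patch, left:left + patch] = drop
    return heat


# Toy data: a rectangular "lip" region in an 8 x 8 mask.
toy_mask = np.zeros((8, 8), dtype=int)
toy_mask[3:5, 2:7] = 1
toy_width, toy_position = lip_measurements(toy_mask)

# Toy scorer that only looks at a small central region of the image.
toy_image = np.ones((8, 8), dtype=float)
toy_heat = occlusion_map(toy_image, lambda img: float(img[2:4, 2:6].sum()))
```

With a real classifier, score_fn would wrap the model's forward pass, and the resulting map could be overlaid on the lip image to check that the highest values fall on the lips rather than on the surrounding face.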

Results indicate that lip posture differs significantly between /r/ and /w/. The lip corners are brought together at the centre for /w/, whereas for /r/ the lips are protruded upwards, presumably bringing the bottom lip towards the upper teeth, which provides a phonetic account for labiodentalization. The question remains why the labial postures for /r/ and /w/ differ. It has been suggested that rounding differs between front and back vowels in order to enhance the perceptual contrast between them [16]: front vowels are produced with less lip corner contraction to avoid over-lowering F2 [17]. The same strategy may be used to enhance the /r-w/ contrast: /r/ has significantly less horizontal contraction than /w/, ensuring that F2 remains close to F3. Our future research will assess whether these different labial gestures are perceptually salient.