Affiliation: MARCS Institute, University of Western Sydney, Penrith, New South Wales, Australia; School of Humanities and Communication Arts, University of Western Sydney, Penrith, New South Wales, Australia.

ABSTRACT
Drawing on phonology research within the generative linguistics tradition, stochastic methods, and notions from complex systems, we develop a modelling paradigm linking phonological structure, expressed in terms of syllables, to speech movement data acquired with 3D electromagnetic articulography and X-ray microbeam methods. The essential variable in the models is syllable structure. When mapped to discrete coordination topologies, syllabic organization imposes systematic patterns of variability on the temporal dynamics of speech articulation. We simulated these dynamics under different syllabic parses and evaluated the simulations against experimental data from Arabic and English, two languages claimed to parse similar strings of segments into different syllabic structures. Model simulations replicated several key experimental results, including the fallibility of past phonetic heuristics for syllable structure, and exposed the range of conditions under which such heuristics remain valid. More importantly, the modelling approach consistently diagnosed syllable structure, proving resilient to multiple sources of variability in the experimental data, including measurement, speaker, and contextual variability. Prospects for extending our modelling paradigm to acoustic data are also discussed.

pone.0124714.g002: Illustration of temporal alignment in Arabic. Positional signals in the y-dimension for three receivers (tongue tip, lower lip, and tongue back) for 10 repetitions each of bulha, sbulha, and ksbulha. The leftmost vertical line (grey) marks the center of the initial consonant cluster (or of the single consonant, as in bulha). The middle vertical line marks the release of the prevocalic consonant [b]. The rightmost vertical line marks the point of inferred maximum constriction in the postvocalic consonant [l]. Reprinted from [13] under a CC BY license, with permission from Cambridge University Press (S2 File), original copyright 2011.

Mentions:
Extracting landmarks from speech movements using this procedure yields a series of timestamps, which are used to quantify the patterns of temporal organization that correspond to distinct syllable parses. Fig 2 illustrates how temporal landmarks parsed from the velocity signal define structurally relevant temporal intervals (as schematized in Fig 1) for three Arabic words: bulha, sbulha, and ksbulha (‘her urine’, ‘her ear of grain’, ‘they owned it for her’). For each word, the positional signal in the y-dimension (up-down movement) is shown. Only the y-dimension is plotted, for simplicity of presentation; the actual data analysis is based on both the vertical and the horizontal (front-back) movements of the articulators. Each panel of Fig 2 shows ten trajectories (grey lines), one per repetition of the word, along with a highlighted ensemble average (black line). Three vertical lines are drawn for each word. The rightmost line corresponds to the anchor, the timestamp that right-delimits the temporal intervals of interest. In this example, the anchor is the maximal vertical displacement of the tongue tip movement for the segment [l]. That segment was chosen because it lies at the end of the hypothesised syllabic unit and is shared across all stimuli (it follows the vowel in each stimulus word). We refer to this point as CMax. In the analyses that follow, we use either CMax or VEnd, the offset of the vowel, as the anchor point. Changing the anchor point from CMax to VEnd increases the amount of variability in the intervals. Later on, we show that our models can capture how such increases in variability influence phonetic heuristics for syllables.

All thirty data tokens in Fig 2 (three words, ten repetitions each) are aligned at the CMax anchor timestamp.
A second vertical black line is drawn at the mean of the Release timestamps of the lower lip constriction for [b] (bRelease) in bulha, sbulha, and ksbulha. A third vertical line is drawn at the center of the word-initial consonant cluster. The center of a cluster is the mean of the midpoints of the consonants in the cluster, where a consonant's midpoint is the point equidistant from its Target and Release landmarks. As Fig 2 shows, the interval between bRelease and the anchor point changes little across bulha, sbulha, and ksbulha. In contrast, the center of the consonant cluster moves farther away from the anchor point with each added consonant, as indicated by the progressive leftward shift of the grey vertical lines from bulha to sbulha to ksbulha. In these Arabic datasets, then, as consonants are added at the beginning of a word, the local timing relation between [b] and the following vowel does not change much, as schematized in the left panel of Fig 1. This Arabic temporal organization contrasts with the English one schematized in the right panel of Fig 1. In English, as consonants are added at the beginning of the word, the local timing between the prevocalic consonant and the vowel has been reported to change; the right panel of Fig 1 shows this as a progressive rightward shift of the prevocalic consonant.
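The interval definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the landmark names (Target, Release, CMax, VEnd, bRelease) follow the text, but the data layout, the function names (`midpoint`, `cluster_center`, `intervals`, `align_at_anchor`), and all timestamps are hypothetical. The values are invented, one token per word, to mimic the qualitative Arabic pattern described above (a stable bRelease-to-anchor interval and a growing center-to-anchor interval). Landmark extraction from the velocity signal is assumed to have already been done.

```python
from statistics import mean

# HYPOTHETICAL landmark timestamps (ms from recording onset), one token per
# word. Values are invented for illustration and are not data from the study.
# Each consonant of the word-initial cluster is a (Target, Release) pair,
# ordered left to right, so the prevocalic [b] is always the last pair.
WORDS = {
    "bulha": {"cluster": [(500.0, 540.0)], "CMax": 720.0, "VEnd": 702.0},
    "sbulha": {
        "cluster": [(430.0, 470.0), (505.0, 545.0)],
        "CMax": 725.0,
        "VEnd": 704.0,
    },
    "ksbulha": {
        "cluster": [(360.0, 400.0), (430.0, 470.0), (502.0, 542.0)],
        "CMax": 722.0,
        "VEnd": 708.0,
    },
}


def midpoint(target, release):
    """Point equidistant from a consonant's Target and Release landmarks."""
    return (target + release) / 2.0


def cluster_center(cluster):
    """Mean of the consonant midpoints; a lone consonant is its own center."""
    return mean(midpoint(t, r) for t, r in cluster)


def intervals(token, anchor="CMax"):
    """Intervals (ms) from two left landmarks to the chosen right anchor."""
    right = token[anchor]
    b_release = token["cluster"][-1][1]  # Release of the prevocalic [b]
    return {
        "center_to_anchor": right - cluster_center(token["cluster"]),
        "bRelease_to_anchor": right - b_release,
    }


def align_at_anchor(token, anchor="CMax"):
    """Re-express a token's landmarks relative to the anchor (anchor -> 0 ms)."""
    zero = token[anchor]
    return {
        "center": cluster_center(token["cluster"]) - zero,
        "bRelease": token["cluster"][-1][1] - zero,
        anchor: 0.0,
    }


if __name__ == "__main__":
    for anchor in ("CMax", "VEnd"):
        for word, token in WORDS.items():
            iv = intervals(token, anchor)
            print(
                f"{anchor:4} {word:8} "
                f"center->anchor={iv['center_to_anchor']:6.1f} "
                f"bRelease->anchor={iv['bRelease_to_anchor']:6.1f}"
            )
```

Passing `anchor="VEnd"` recomputes the same intervals against the vowel offset instead of CMax. With real data, each word would contribute ten aligned repetitions, and the spread of each interval across tokens could then be compared between the two anchors.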
