Forthcoming

Towards a comparative history of tonal text-setting practices in Southeast Asia. James Kirby.
In Reinhard Strohm (ed.), Transcultural music history. VWB-Verlag. In press.

In this chapter, I present the beginnings of a systematic investigation into the extent to which different tone languages, and different genres, adhere to principles of alignment between tone and melody. Particular emphasis is placed on attempting to determine the degree to which adherence to principles of text-setting, which govern how a composer may assign notes to words, is a function of a particular linguistic (as opposed to musical) tradition. The question is not simply if there are different text-setting principles active in different languages, which much previous research has shown to be the case. Rather, I seek to determine if there is evidence that principles active in a given language persist across time and genre.

Singing in tone languages, a perennial source of mystery to speakers of non-tonal languages, has been the subject of a good deal of research since the turn of the century. This research shows that text-setting constraints are the heart of the solution to respecting both the linguistic and the musical functions of pitch. Specifically, in most of the 15 or 20 Asian and African tone languages where the question has been studied, the most important principle in maintaining intelligibility of song texts seems to be the avoidance of what we call contrary settings: musical pitch movement up or down from one syllable to the next should not be the opposite of the linguistically specified pitch direction. We review the variations on this theme that have been described in the recent literature, including differences between languages and musical genres in how strictly the constraint is observed, and other phonetic resources used to signal tonal distinctions in singing. We briefly consider two more general issues: (1) how tonal text-setting might be incorporated into a general theory that includes traditional European metrics, and (2) what the avoidance of contrary settings tells us about the phonological essence of tonal contrasts.

Mainland Southeast Asia is often viewed as a linguistic area where five different language phyla – Austroasiatic, Austronesian, Hmong-Mien, Sino-Tibetan and Kra-Dai – have converged typologically. This chapter illustrates areal features found in their prosodic systems, but also emphasizes their oft-understated diversity.

The first part of the chapter describes word level prosodic properties. A typology of word shapes and stress is first established: we revisit the concept of monosyllabicity, go over the notion of sesquisyllabicity (as typified by languages like Mon or Burmese) and discuss the realization of alternating stress in languages with polysyllabic words (such as Thai and Khmer). Special attention is then paid to tonation. Although many well-known languages of the area have sizeable inventories of complex tone contours, languages with few or no tones are common (20% being atonal). Importantly, the phonetic realization of tone frequently involves more than simply pitch: properties like phonation and duration often play a role in signaling tonal contrasts, along with less expected properties like onset voicing and vowel quality. We also show that complex tone alternations (spreading, neutralization and sandhi processes), although not typical, are well-attested.

The second part of the chapter addresses the less well-understood topic of phrasal prosody: prosodic phrasing and intonation. We reconsider the question of the amount of conventionalized intonation in languages with complex tone paradigms and pervasive final particles. We also show that information structure is often conveyed by means of overt markers and syntactic restructuring, but that it can also be marked by means of intonational strategies.

We consider how lexical stress and phrasal accent influence the acoustic realization of cues to phonological voicing in German plosives. 22 native speakers of Standard German were recorded producing a total of 3168 utterances in both strong (stressed/focused) and weak (unstressed/unfocused) prosodic contexts, while holding prosodic domain constant. Both Voice Onset Time (VOT) and obstruent-intrinsic F0 (CF0) were analyzed. We found that differences in the magnitude of CF0 between voiced and voiceless plosives were greatest in the strong prosodic context, but were not always obliterated in the weak prosodic context. However, individual differences were also observed, with speakers broadly patterning into four groups with respect to the interaction of micro- and macroprosody. VOT differences were also more pronounced in strong prosodic contexts. We consider the implications of our findings for sound changes involving the reanalysis of obstruent-intrinsic F0.

Madurese, a Malayo-Polynesian language of Indonesia, is of interest both areally and typologically: it is described as having a three-way laryngeal contrast between voiced, voiceless unaspirated, and voiceless aspirated plosives, along with a strict phonotactic restriction on consonant voicing-vowel height sequences. We present an acoustic analysis of Madurese consonants and vowels obtained from recordings of fifteen speakers, to assess whether its voiced and aspirated plosives might share acoustic properties indicative of a shared articulatory gesture. Although we find that voiced and voiceless aspirated plosives in word-initial position pattern together in terms of several spectral balance measures, these are most likely due to the following vowel quality, rather than aspects of a shared laryngeal configuration. Conversely, the voiceless (aspirated and unaspirated) plosives share multiple acoustic properties, including F0 trajectories and overlapping voicing lag time distributions, suggesting that they share a glottal aperture target. We discuss the implications of these findings for the typology of laryngeal contrasts and the historical evolution of the Madurese consonant-vowel co-occurrence restriction.

It is sometimes argued that languages with two-way laryngeal contrasts can be classified according to whether one series is realized canonically with voicing lead or the other with voicing lag. In languages of the first type, such as French, the phonologically relevant feature is argued to be [voice], while in languages of the second type, such as German, the relevant feature is argued to be [spread glottis]. A crucial assumption of this position is that the presence of certain contextually stable phonetic cues, namely voicing lead or lag, can be used to diagnose which feature is phonologically active.

In this paper, we present data on obstruent-intrinsic F0 perturbations (CF0) in two [voice] languages, French and Italian. Voiceless obstruents in both languages are found to raise F0, while F0 following (pre)voiced obstruents patterns together with sonorants, similar to the voiceless unaspirated stops of [spread glottis] languages like German and English. The contextual stability of this cue implies that an active devoicing gesture is common to languages of both the [voice] and [spread glottis] types, and undermines the idea that a strict binary dichotomy between true voicing and aspirating languages can be reliably inferred based on properties of the surface phonetics.

We describe the register system of Chru, a Chamic language of Vietnam. In Chru, a historical contrast between prevoiced and voiceless stops is now a system of two registers signalled by differences in f0, voice quality, and F1 in addition to closure voicing. However, closure voicing is in a state of flux: while older men maintain closure voicing in the onsets of low-register items, younger speakers and some older women frequently have no (or only weak) closure voicing in this context. In addition, the distribution of VOT in low register onsets is bimodal, realized either with strong closure voicing or greater VOT than voiceless stops. Interestingly, f0, F1 and voice quality cues are not enhanced after devoiced low-register stops, but instead are more pronounced after stops realized with closure voicing. We argue this indicates that enhancement of cues in phonologization must in some sense be complete before neutralization takes place.

We investigate native speaker perception of cues to voiceless plosives in the Malayo-Polynesian language Madurese. Madurese is described as having a three-way laryngeal contrast between voiced, voiceless aspirated, and voiceless unaspirated plosives. However, voiceless aspirated and unaspirated plosives are always followed by vowels of different but predictable height, and their VOT distributions overlap heavily, raising the question of whether VOT or F1 is the primary perceptual cue to this contrast. The trading relation between VOT and F1 in Madurese was investigated using 2AFC identification and AXB discrimination paradigms. Results indicate that the VOT differences between voiceless plosives which exist in production are not exploited in perception, suggesting that Madurese speakers may not have distinct phonetic targets for aspirated and unaspirated plosives. The surface VOT distributions may instead be a result of differences in following vowel height.

We present an acoustic analysis of cues to onset voicing in Dzongkha, the national language of Bhutan. Dzongkha is typically described as having a four-way laryngeal contrast between aspirated, unaspirated, prevoiced and devoiced obstruents. Previous descriptions suggest that this system may be changing, with the devoiced series either merging with the voiced series, or losing closure voicing but retaining contrastive pitch and/or voice quality. Based on data from 12 speakers, we find voiced and devoiced plosives are realised both with and without voicing lead. Tokens realized as phonetically voiced can be redundantly breathy; however, a low register tone always occurs on syllables headed by both voiced and devoiced obstruents, regardless of presence or absence of voicing lead. We discuss the implications of these findings for models of tonogenesis and historical sound change in the Tibeto-Burman context.

This paper presents the first phonetic study of the tone system of Zhajin Gan. While important for our understanding of tonogenesis in general and Chinese historical phonology in particular, the tone systems of Gan dialects have not been analysed acoustically. Zhajin Gan can be analysed as having 6 or 7 tones in non-checked syllables, with most having two allotones conditioned by historical differences in onset type. However, as synchronic cues to the laryngeal contrast have weakened or disappeared, the previously redundant onset f0 differences are being phonologized, setting the stage for additional tone splits. In Zhajin Gan, plosives and affricates from MC Ciqing and Quanzhuo series are synchronically realized as lax voiced stops, correlating with lower onset f0, but without any evidence for synchronic aspiration. We discuss the possible role that non-modal phonation may have played in the evolution of this complex tone system.

Sociolinguistics is often concerned with how variants of a linguistic item (e.g., nothing vs. nothin’) are used by different groups or in different situations. We introduce the task of inducing lexical variables from code-mixed text: that is, identifying equivalence pairs such as (football, fitba) along with their linguistic code (football→British, fitba→Scottish). We adapt a framework for identifying gender-biased word pairs to this new task, and present results on three different pairs of English dialects, using tweets as the code-mixed text. Our system achieves precision of over 70% for two of these three datasets, and produces useful results even without extensive parameter tuning. Our success in adapting this framework from gender to language variety suggests that it could be used to discover other types of analogous pairs as well.

This paper investigates the relationship between Voice Onset Time (VOT) and onset f0 perturbations in three languages with a three-way laryngeal contrast between prevoiced, short-lag, and long-lag stops. To assess the relative contributions of aspiration and tonality to the realization of onset f0, a non-tonal language (Khmer) is compared to two tonal languages (Central Thai and Northern Vietnamese) using a common set of methods and materials. While the VOT distributions of the three languages are extremely similar, they differ in terms of their onset f0 behavior. Aspirated stops in general condition higher f0 on the following vowel, but this effect is mediated by tonal and sentential context: it is more prominent in citation forms than in connected speech, and for the tone languages, it is more visible with higher as opposed to lower tones. Examination of individual differences suggests that speakers may differ systematically in terms of their laryngeal adjustments for expressing voicelessness even while maintaining similar timing relations as indicated by VOT. Onset f0 differences may serve as a useful complement to VOT, particularly when reasoning about the cross-linguistic implementation of voicing.

It is common practice in the statistical analysis of phonetic data to draw conclusions on the basis of statistical significance. While p-values reflect the probability of incorrectly concluding a null effect is real, they do not provide information about other types of error that are also important for interpreting statistical results. In this paper, we focus on three measures related to these errors. The first, power, reflects the likelihood of detecting an effect that in fact exists. The second and third, Type M and Type S errors, measure the extent to which estimates of the magnitude and direction of an effect are inaccurate. We then provide an example of design analysis (Gelman & Carlin, 2014), using data from an experimental study on German incomplete neutralization, to illustrate how power, magnitude, and sign errors vary with sample and effect size. This case study shows how the informativity of research findings can vary substantially in ways that are not always, or even usually, apparent on the basis of a p-value alone. We conclude by repeating three recommendations for good statistical practice in phonetics from best practices widely recommended for the social and behavioral sciences: report all results; design studies which will produce high-precision estimates; and conduct direct replications of previous findings.
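The three quantities named in the abstract above can be illustrated with a short calculation. The following is a minimal sketch of a design analysis under a normal sampling model, loosely following the logic described by Gelman & Carlin (2014); it is not the analysis reported in the paper. The function name `design_analysis`, its parameters, and the example effect sizes are assumptions chosen for illustration only.

```python
import random
from statistics import NormalDist, mean


def design_analysis(true_effect, std_error, alpha=0.05, n_sims=20_000, seed=1):
    """Return (power, type_s_rate, type_m_ratio) for an assumed positive true effect.

    Assumes the estimate is normally distributed around true_effect
    with standard deviation std_error (an illustrative simplification).
    """
    if true_effect <= 0 or std_error <= 0:
        raise ValueError("true_effect and std_error must be positive")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ratio = true_effect / std_error
    # Probability of a significant estimate with the correct (positive) sign.
    p_right_sign = 1 - std_normal.cdf(z_crit - ratio)
    # Probability of a significant estimate with the wrong (negative) sign.
    p_wrong_sign = std_normal.cdf(-z_crit - ratio)
    power = p_right_sign + p_wrong_sign
    type_s_rate = p_wrong_sign / power
    # Type M: average size of significant estimates relative to the true effect,
    # approximated by simulating hypothetical replications.
    rng = random.Random(seed)
    estimates = (rng.gauss(true_effect, std_error) for _ in range(n_sims))
    significant = [abs(e) for e in estimates if abs(e) > z_crit * std_error]
    type_m_ratio = mean(significant) / true_effect
    return power, type_s_rate, type_m_ratio


# Hypothetical numbers: a small effect measured with a large standard error.
low_power = design_analysis(true_effect=0.5, std_error=1.0)
# Hypothetical numbers: the same standard error with a much larger effect.
high_power = design_analysis(true_effect=2.8, std_error=1.0)
```

In the low-power setting, significant estimates greatly overstate the assumed true effect and a non-trivial share have the wrong sign, which is the pattern a p-value alone does not reveal.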

Statistical and empirical methods are in widespread use in present-day phonological research. Researchers are often interested in the problem of model selection, or determining whether or not a particular term in a model is statistically significant, in order to make a judgement about whether or not that term is theoretically significant. If a term is not significant, it is often tempting to conclude that it is not relevant. However, such inferences require an assessment of statistical power, a dimension independent from significance. Assessing power is more difficult than assessing significance because it depends on factors including the true (or expected) effect size, sample size, and degree of noise. In this paper, we provide a non-technical introduction to the issue of power, using simulations based on experimental investigations of incomplete neutralization to show that not all null results are equally informative. In particular, depending on the statistical power, a non-significant result can either be uninformative, or reasonably interpreted as providing evidence consistent with a small or zero effect.

This paper presents an acoustic and perceptual study of the r>h shift in the variety of Khmer spoken in Giồng Riềng district, Kiên Giang province, Vietnam. In Phnom Penh Khmer, /r/ is realized as [h] in syllable onsets and onset clusters, and accompanied by lowered pitch, breathiness, and in some cases a change in the quality of the following vowel. In Kiên Giang Khmer, the r>h shift is accompanied by pitch lowering, but without changes in aspiration or vowel quality, and spectral measures did not indicate substantial differences in voice quality. Consistent with their productions, users of this dialect appear to rely solely on differences in pitch to identify these lexical items. We discuss the implications of our findings for Khmer dialectology, mechanisms of sound change, and variation in the realization of rhotics more generally.

Sociolinguistic research suggests that speakers modulate their language style in response to their audience. Similar effects have recently been claimed to occur in the informal written context of Twitter, with users choosing less region-specific and non-standard vocabulary when addressing larger audiences. However, these studies have not carefully controlled for the possible confound of topic: that is, tweets addressed to a broad audience might also tend towards topics that engender a more formal style. In addition, it is not clear to what extent previous results generalize to different samples of users. Using mixed-effects models, we show that audience and topic have independent effects on the rate of distinctively Scottish usage in two demographically distinct Twitter user samples. However, not all effects are consistent between the two groups, underscoring the importance of replicating studies on distinct user samples before drawing strong conclusions from social media data.

In this chapter, we address the role of contact in the evolution of tone in mainland Southeast Asia (MSEA). We present an overview of the phonetic, phonological, and genetic characteristics of MSEA tone systems, emphasizing the rich variability of tonal realization found in the region. Next, we discuss the ways in which languages can become tonal, reviewing evidence for the spread of tone through contact as well as for the idea that much of the observed tonality on the ground in modern MSEA might be traced to a small number of ‘tonogenetic events’ rather than a large number of borrowings. In light of this discussion, we consider whether a re-evaluation of the notion of tone as a canonical indicator of ‘linguistic area’ more generally is warranted.

The Tai dialect spoken in Cao Bằng province, Vietnam, is at an intermediate stage between tonal register split and the accompanying transphonologization of a voicing contrast into a dual-register tone system. While the initial sonorants have completely lost their historical voicing distinction and developed a six-way tonal contrast, the obstruent series still preserves the original voicing contrast, leaving the tonal split incomplete. This paper presents the first acoustic study of tones and onsets in Cao Bằng Tai. Although f0, VOT, and voice quality were all found to play a role in the system of laryngeal contrasts, the three speakers considered varied in terms of the patterns of acoustic cues used to distinguish between onset types, particularly the breathy voiced onset /b̤/. From the diachronic perspective, our findings may help to explain why the reflex of modal prevoiced stops (*b) can be either aspirated or unaspirated voiceless stops.

This study investigates consonant-related F0 perturbations (“CF0”) in French and Italian by comparing the effects of voiced and voiceless obstruents on F0 to those of voiced sonorants. The voiceless obstruents /p f/ in both languages are found to have F0-raising properties similar to American English voiceless obstruents, while F0 following the (pre)voiced obstruents /b v/ in French and Italian patterns together with /m/, again similar to English [Hanson (2009). J. Acoust. Soc. Am. 125(1), 425–441]. In both languages, F0 is significantly depressed, relative to sonorants, during the closure for voiced obstruents, but cannot be differentiated from sonorants following the release of oral constriction. These findings are taken as support for a model on which F0 perturbations are fundamentally the result of laryngeal maneuvers initiated to sustain or inhibit phonation, regardless of other language-particular aspects of phonetic realization.

We examine the degree of correspondence between musical and linguistic tone sequences in a corpus of 20 Vietnamese popular songs. Our data suggest that text-setting constraints in Vietnamese, as in some other Asian tone languages, are based primarily on the direction of the pitch transition from one syllable/note to the next. Borrowing some musical terminology, we may say that ‘similar motion’ is favoured, ‘oblique motion’ is allowed in certain cases, and ‘contrary motion’ is disfavoured. As with other Asian tone languages, the definition of these three types of motion depends on the categorisation of the linguistic tones; our current best model achieves a rate of 77% similar motion. We hypothesize that avoidance of contrary motion may be as or more important than achieving correspondence between tonal and melodic transitions in Vietnamese popular song.
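The three types of motion described above can be made concrete with a small classifier. This is only a sketch: it assumes each syllable's tone has already been assigned a hypothetical numeric level (the abstract notes that the real definition depends on how the linguistic tones are categorised), and the names `direction` and `classify_motion` are illustrative rather than taken from the paper.

```python
def direction(a, b):
    """Return +1 if b is higher than a, -1 if lower, 0 if equal."""
    return (b > a) - (b < a)


def classify_motion(tone_levels, note_pitches):
    """Label each syllable-to-syllable transition as similar, oblique, or contrary.

    tone_levels: one assumed numeric tone level per syllable.
    note_pitches: one musical pitch per syllable (e.g. MIDI note numbers).
    """
    if len(tone_levels) != len(note_pitches):
        raise ValueError("need exactly one note per syllable")
    labels = []
    for i in range(len(tone_levels) - 1):
        tone_dir = direction(tone_levels[i], tone_levels[i + 1])
        note_dir = direction(note_pitches[i], note_pitches[i + 1])
        if tone_dir == note_dir:
            labels.append("similar")   # tone and melody move the same way
        elif tone_dir == 0 or note_dir == 0:
            labels.append("oblique")   # one stays level while the other moves
        else:
            labels.append("contrary")  # tone and melody move in opposite directions
    return labels


# Hypothetical four-syllable line: tone levels and MIDI pitches are made up.
labels = classify_motion([1, 2, 2, 1], [60, 62, 64, 62])
similar_rate = labels.count("similar") / len(labels)
```

A rate such as the 77% similar motion reported above would correspond to `similar_rate` computed over all transitions in a corpus, under whatever tone categorisation is adopted.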

Recent work has proposed using network science to analyse the structure of the mental lexicon by viewing words as nodes in a phonological network, with edges connecting words that differ by a single phoneme. Comparing the structure of phonological networks across different languages could provide insights into linguistic typology and the cognitive pressures that shape language acquisition, evolution, and processing. However, previous studies have not considered how statistics gathered from these networks are affected by factors such as lexicon size and the distribution of word lengths. We show that these factors can substantially affect the statistics of a phonological network and propose a new method for making more robust comparisons. We then analyse eight languages, finding many commonalities but also some qualitative differences in their lexicon structure.
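A phonological network of the kind described above can be sketched in a few lines. The example assumes that "differ by a single phoneme" means one substitution, insertion, or deletion, which is a common convention but is not stated in the abstract; the toy lexicon, its transcriptions, and the function names are likewise hypothetical.

```python
from itertools import combinations


def differ_by_one(a, b):
    """True if phoneme tuples a and b are one substitution, insertion, or deletion apart."""
    if a == b:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Deleting one phoneme from the longer word must yield the shorter one.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def build_network(lexicon):
    """Map each word to the set of words that are its phonological neighbours."""
    network = {word: set() for word in lexicon}
    for a, b in combinations(lexicon, 2):
        if differ_by_one(lexicon[a], lexicon[b]):
            network[a].add(b)
            network[b].add(a)
    return network


# Toy lexicon with made-up broad transcriptions, one tuple of phonemes per word.
toy_lexicon = {
    "cat": ("k", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "at": ("æ", "t"),
    "dog": ("d", "ɒ", "g"),
}
network = build_network(toy_lexicon)
degrees = {word: len(neighbours) for word, neighbours in network.items()}
```

Statistics such as the degree counts in `degrees` depend on how many words the lexicon contains and how long they are, which is the sensitivity the abstract draws attention to.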

Southeast Asia is often considered a quintessential Sprachbund where languages from five different language phyla have been converging typologically for millennia. One of the common features shared by many languages of the area is tone: several major national languages of the region have large tone inventories and complex tone contours. In this paper, we suggest a more fine-grained view. We show that in addition to a large number of atonal languages, the tone languages of the region are actually far more diverse than usually assumed, and employ phonation type contrasts at least as often as pitch. Along the same lines, we argue that concepts such as tone and register, while descriptively useful, can obscure important underlying similarities and impede our understanding of the behavior of phonetic properties, typological regularities and diachrony. We finally draw the reader’s attention to some issues of current interest in the study of tone and phonation in Southeast Asia and describe some technical developments that are likely to allow researchers to address new lines of research in years to come.

We report new experimental evidence on consonant-induced F0 perturbations in two languages with prevoiced stops, French and Italian. A positive correlation between duration of voicing lead and F0 at the onset of post-release voicing is observed, consistent with the predictions of an automatic or biomechanical account of the source of this effect. While the findings do not strictly rule out a role for onset F0 as a controlled enhancement, they support the proposal that, if anything, the enhancement is of [-voice] or [stiff] rather than [+voice].

@incollection{ kirby2015stop,
author = {James Kirby and D. Robert Ladd},
year = {2015},
title = {Stop voicing and F0 perturbations: evidence from French and Italian},
booktitle = {Proceedings of the 18th International Congress of Phonetic Sciences},
address = {Glasgow}
}

Madurese is a language with a three-way laryngeal contrast and an unusual consonant-vowel co-occurrence restriction. We provide new data on the phonetic realisation of Madurese stops from a sample of 15 native speakers by examining VOT, f0 and two acoustic correlates of voice quality, H1*-H2* and H1*-A3*. Our data indicate that while f0 distinguishes voiced from voiceless (aspirated and unaspirated) stops, at least one voice quality measure contrasts voiced and voiceless aspirated stops with voiceless unaspirated stops, suggesting that the relationship between these features may be more complex than has previously been assumed. Madurese appears to be best described as a 'register system' of the Mon-Khmer type, albeit one in which pitch and voice quality are dissociated.

Why do human languages change at some times, and not others? We address this longstanding question from a computational perspective, focusing on the case of sound change. Sound change arises from the pronunciation variability ubiquitous in every speech community, but most such variability does not lead to change. Hence, an adequate model must allow for stability as well as change. Existing theories of sound change tend to emphasize factors at the level of individual learners promoting one outcome or the other, such as channel bias (which favors change) or inductive bias (which favors stability). Here, we consider how the interaction of these biases can lead to both stability and change in a population setting. We find that population structure itself can act as a source of stability, but that both stability and change are possible only when both types of bias are active, suggesting that it is possible to understand why sound change occurs at some times and not others as the population-level result of the interplay between forces promoting each outcome in individual speakers. In addition, if it is assumed that learners learn from two or more teachers, the transition from stability to change is marked by a phase transition, consistent with the abrupt transitions seen in many empirical cases of sound change. The predictions of multiple-teacher models thus match empirical cases of sound change better than the predictions of single-teacher models, underscoring the importance of modeling language change in a population setting.

Mainland Southeast Asia (MSEA) is often described as the quintessential Sprachbund in which languages belonging to different language families converge as a result of contact. In this paper, we look in detail at the evidence for convergence of a specific phonological feature, tone, as expressed by two of its phonetic correlates, pitch and voice quality. Based on a database of 197 languages and dialects, we assess the extent of tonal diversity in MSEA languages and construct a statistical model of the degree to which tonal inventories can be predicted on the basis of geographic proximity, genealogical relatedness and population size. We find that the most robust predictors of tonality in MSEA languages are family and word shape.

Onset clusters in Khmer (Cambodian) often appear with an acoustic transition between consonants, but the phonological status of these elements is indeterminate. If transitions result from gestural separation, they may disappear in fast speech. Acoustic analysis of data from 10 speakers shows that vocalic transitions in Khmer are found in a largely predictable set of consonantal contexts. While their presence is modulated by speech rate, they never disappear completely, in some cases becoming more rather than less frequent in fast speech. Clusters containing transitions are generally longer in duration than those that do not, and are also longer than monosyllables containing a lexical schwa, but the transitions do not show any spectral evidence of a distinct gestural target. The possible interpretations of these findings are discussed in the context of the range of articulatory variation known to occur in the implementation of speech rate.

Unlike many languages of Southeast Asia, Khmer (Cambodian) is not a tone language. However, in the colloquial speech of the capital Phnom Penh, /r/ is lost in onsets, reportedly supplanted by a range of other acoustic cues such as aspiration, a falling- or low-rising f0 contour, breathy voice quality, and in some cases diphthongization, e.g. /krɑː/ ‘poor’ > [kɔ̀ɑ], [kʰɔ̌ɑ], [kɔ̤ɑ̤], /kru:/ ‘teacher’ > [khùː], [kʰǔː], [kṳː]. This paper presents the results of production and perception studies designed to shed light on this unusual sound change. Acoustic evidence shows that colloquial /CrV/ forms differ from reading pronunciation forms in terms of VOT, f0, and spectral balance measures, while a pair of perceptual studies demonstrate that f0 is a sufficient cue for listeners to distinguish underlying /CrV/-initial from /CV/-initial forms, but that F1 is not. I suggest that this sound change may have arisen via the perceptual reanalysis of changes in spectral balance, coupled with the coarticulatory influence of the dorsal gesture for /r/.

In the colloquial Phnom Penh dialect of Khmer (Cambodian), lexical use of F0 is emerging together with an intermediate VOT category and breathy phonation following the loss of /r/ in onsets (e.g. /kruː/ ‘teacher’ > [khṳ̀ː]). I show how this incipient tonogenesis might arise in a series of computational simulations tracing the evolution of multivariate phonetic category distributions in a population of ideal observers. Acoustic production data from a fieldwork study conducted in Phnom Penh was used as the starting point for the simulations. After establishing that the basic framework predicted relative stability over time, two possible responses to a phonetic production bias were considered: one in which agents correctly identified the source of (and thereby compensated for) the effects of the bias, and one in which agents misattributed the acoustic effects of the bias as a property of the onset. Good qualitative fits to the empirical production data were found for the latter group of learners, while the outcome for compensating learners resembled production data from a related dialect. These results are consistent with the sudden and discontinuous nature of many sound changes, and suggest that what appear to be enhancement effects may also emerge under different assumptions about the number of cue dimensions accessible to or deemed relevant by the learner.

It has been claimed that the long established neutralization of the voicing distinction in domain final position in German is phonetically incomplete. However, many studies that have advanced this claim have subsequently been criticized on methodological grounds, calling incomplete neutralization into question. In three production experiments and one perception experiment we address these methodological criticisms. In the first production study, we address the role of orthography. In a large scale auditory task using pseudowords, we confirm that neutralization is indeed incomplete and suggest that previous null results may simply be due to lack of statistical power. In two follow-up production studies (Experiments 2 and 3), we rule out a potential confound of Experiment 1, namely that the effect might be due to accommodation to the presented auditory stimuli, by manipulating the duration of the preceding vowel. While the between-items design (Experiment 2) replicated the findings of Experiment 1, the between-subjects version (Experiment 3) failed to find a statistically significant incomplete neutralization effect, although we found numerical tendencies in the expected direction. Finally, in a perception study (Experiment 4), we demonstrate that the subphonemic differences between final voiceless and “devoiced” stops are audible, but only barely so. Even though the present findings provide evidence for the robustness of incomplete neutralization in German, the small effect sizes highlight the challenges of investigating this phenomenon. We argue that without necessarily postulating functional relevance, incomplete neutralization can be accounted for by recent models of lexical organization.

This paper explores the learnability of covert contrasts (impressionistically homophonous categories that can be reliably distinguished at the phonetic level) through a series of model-based clustering simulations. Allowing the models to learn both the number and parameters of those categories provides a way to explore the potential stability of category structures. The results indicate that while a statistical learner can be quite effective at inducing covert contrasts, success depends crucially on the number and distributional characteristics of the relevant cue dimensions.

We consider the problem of language evolution in a population setting, focusing on the case of continuous parameter learning. While theories of phonetic change tend to emphasize the types of transmission errors that could give rise to a shift in pronunciation norms, it is challenging to develop a model that allows for both stability and change. We model the acquisition of vowel-to-vowel coarticulation in both single- and multiple-teacher settings, considering progressively more restrictive prior learning biases. We demonstrate that both stability and change are possible at the population level, but only under fairly strong assumptions about the nature of learning and production biases.

This chapter argues for the role of probabilistic enhancement in phonologization through computational simulation of an ongoing sound change in Seoul Korean. Two challenges faced by a phonologization model of sound change are addressed: explaining which cues are selected for phonologization, and explaining why phonologization is often accompanied by dephonologization. It is proposed that cues are targeted for enhancement as a probabilistic function of their statistical reliability in signaling a contrast. Simulation results using empirically derived cue values are taken to support the idea that loss of contrast precision may drive the phonologization process.

2012

This study investigated the persistence of phonetic cue restructuring in a naturalistic learning environment. Seventeen native English-speaking L2 learners of Korean were tracked over an 8-week period to explore the time course of acquisition of novel phonological contrasts signaled by VOT and f0. Production and perception results suggest that learners can quickly learn to direct attention to a novel dimension even in the absence of explicit feedback, and that continued exposure has a small but significant impact on performance: participants were able to exert more accurate control over L2 phonetic dimensions over the course of the experiment.

2011

This paper explores the learnability of covert contrasts (impressionistically homophonous categories that can be reliably distinguished at the phonetic level) through a series of model-based clustering simulations using human production data. Allowing the models to learn both the number and parameters of those categories provides a way to explore the potential stability of category structures. The results indicate that while a statistical learner can be quite effective at inducing covert contrasts, success depends crucially on the number and distributional characteristics of the relevant cue dimensions.

The computational task of language learning has long been a central issue in theoretical linguistics, and most work has focused on its monolingual formulation, in which the learner's sample is drawn from a single target language. This paper considers a minimal extension of the usual monolingual formulation to accommodate the multilingual setting, and presents a novel strategy for discriminating and learning languages within it by clustering grammatical properties according to their co-occurrence in the sample. The heuristic that we propose is generic in the sense that it is applicable within any parameterized linguistic theory for which it is feasible to compute the possible parameter-settings implied by observing a single input-output mapping; for purposes of concreteness and evaluation, we present the algorithm within the framework of Optimality Theory, using syllable structure grammars as a case study.

Evans (2001a, 2001b) argues that modern Southern Qiang (SQ) developed tones through a somewhat typologically unusual pathway: after developing pitch accent from earlier lexical stress, the languages became increasingly ‘tone-prone’ following phonological reduction of syllables and the segmental inventory (Matisoff, 1998), developing tonal systems after heavy borrowing from Mandarin. Here, I suggest that otherwise phonologically conservative Taoping Qiang also shows evidence of more ‘traditional’ tonogenetic mechanisms, which may have conditioned a tone split from the original *H reflex.

Vietnamese, the official language of Vietnam, is spoken natively by over seventy-five million people in Vietnam and greater Southeast Asia as well as by some two million overseas, predominantly in France, Australia, and the United States. This IPA illustration gives an overview of the phonetics and phonology of the Hanoi dialect.

2010

Changes to the realization of phonetic cues, such as vowel length or voice onset time, can have differential effects on the system of phonological categories. In some cases, variability or bias in phonetic realization may cause a contrast between categories to collapse, while in other cases, the contrast may persist through the phonologization of a redundant cue (Hyman, 1976). The goals of this dissertation are to better understand the subphonemic conditions under which a contrast is likely to survive and when it is likely to collapse, as well as to understand why certain cues are more likely to be phonologized than others.

I explore these questions by considering the transmission of speech sounds over a noisy channel (Shannon and Weaver, 1948), hypothesizing that when the precision of a contrast along one acoustic dimension is reduced, other dimensions may be enhanced to compensate (the probabilistic enhancement hypothesis). Whether this results in phonologization or neutralization depends on both the degree to which the contrast is threatened and the informativeness of the cues that signal it.

To explore this hypothesis, phonological categories are modeled as finite mixtures, which provide a natural way to generate, classify, and cluster objects in a multivariate setting. These mixtures are then embedded in an agent-based simulation framework and used to simulate the ongoing process of phonologization of pitch in Seoul Korean (Silva, 2006a,b; Kang and Guion, 2008). The results demonstrate that adaptive enhancement can account for both cue selection and the appearance of cue trading in phonologization. Additional data from the incomplete neutralization of final voicing in Dutch (Warner, Jongman, Sereno and Kemps, 2004) are then used to show how variation in phonetic realization can influence the loss or maintenance of phonological categories. Together, these case studies illustrate how variation in production and perception of subphonemic cues can impact the system of phonological contrasts.

This study investigated the perceptual dimensions of tone in Vietnamese and the effect of dialect experience on listeners’ prelinguistic perception of tone. While Northern Vietnamese tones are cued by a combination of pitch and voice quality, Southern Vietnamese tones are purely pitch based. Thirty listeners from two Vietnamese dialects (10 Northern, 20 Southern) participated in a speeded AX discrimination task using Northern stimuli. The resulting reaction times were used to compute an INDSCAL multidimensional scaling solution and were submitted to hierarchical clustering analysis. While the analysis revealed a similar three-dimensional perceptual space structure for both listener groups, corresponding roughly to f0 offset, voice quality, and contour type, the relative salience of these dimensions varied by dialect: Southern listeners were more likely to confuse tones produced with nonmodal voice quality, whereas Northern listeners found tones with similar pitch excursions to be more confusable. The results of hierarchical clustering of the stimuli further support an analysis where low-level perceptual similarity is influenced by primary dialect experience.

2009

In Vietnamese quantity comparison structures, differentials are prohibited from appearing phrase-internally. I argue this is because they are athematic measure phrases. However, this leads to a semantic type clash given the meaning of the comparative. I propose to resolve this by means of a comparative-induced event measure relation which type-shifts the predicate in the appropriate context. This relation is also shown to be active in English, suggesting that it may be a more general property of predicates cross-linguistically.

Previous studies have shown phonetic variation can be lexically conditioned (Wright, 1997; Munson and Solomon, 2004; Munson, 2007; Scarborough, 2006). Morphological paradigms have also been implicated in phonetic variation (Steriade, 2000; Kuperman et al., 2007). This paper investigates the nature of morphological paradigm effects on vowel production in German verbs. We report the results of a production experiment showing that, while paradigmatic complexity affects vowel dispersion, the effect is mediated by word frequency.

This paper reports the results of a wordlikeness task designed to investigate Cantonese speakers’ gradient phonotactic knowledge of systematic versus accidental phonotactic gaps. Regression analyses found that wordlikeness judgments correlate with token frequency-weighted neighborhood density and transitional (bigram) probability. This is suggested to be an effect of the relative phonological densities of the Cantonese and English lexica.