SRPP of Friday, September 29, 2017

Abstract:
Phonetic analyses often involve studies of contrastiveness, the degree to which two elements (such as atomic sounds or words) differ along some perceptually relevant dimension(s). To take just one example, when studying sound changes in progress, the diachronic linguist observes that a section of the population ceases to use an acoustic dimension, which may or may not reduce the discriminability of a sound, leading to (never-ending) debates regarding the completeness of neutralization. When seeking to measure contrastiveness, phoneticians today typically have only two approaches within reach. One is to make an informed decision regarding which phonetic dimension may be involved in the contrast at hand, and then hand-annotate or find an automatic way of measuring this dimension in all the tokens under study. The other is to fall back on naïve human listeners' judgments, setting up a discrimination or classification experiment. There are pros and cons to both alternatives, including a common disadvantage: both are fairly resource-intensive, either in terms of trained annotators or in terms of the effort required to collect experimental judgments, a disadvantage that is particularly salient for large corpora.

I will present a third method that, like human judgments, can provide global contrastiveness estimates, but at a much lower cost, and in a manner that facilitates replication and extension. Specifically, we developed a machine-based ABX task (Schatz et al. 2013; Schatz 2016), which works as follows: a first stage identifies all possible ABX triplets in a corpus, where A and B are tokens from two different categories and X is another token of one of the two categories (e.g., /ta1-ti-ta2/). Each token is represented by a set of acoustic or articulatory dimensions. The algorithm then compares the representation of X against those of A and B, and returns an “A” response if X is closer in this (multidimensional) space to the A token than to the B token, and “B” otherwise. This response is evaluated against the true category membership (in the example, the correct response is indeed A); the procedure is repeated for all possible triplets, and the results are averaged into an accuracy score.

The crucial innovative aspect of this task is that it is completely agnostic to the choice of input representation, the only requirement being that the user provide a reasonable way of measuring similarity between tokens. The task can thus be run on phonetically defined dimensions (such as VOT and f0, extracted by hand) as well as on more holistic acoustic measures (such as mel-based spectral representations). It is agnostic not only because it provides a contrastiveness measure that is mathematically well-defined for essentially any input format, but also because, in practice, given a finite sample of speech stimuli, our ability to reliably estimate this ideal measure is not affected by the particular choice of input format. To be more precise, we exhibited a computationally tractable estimator of our contrastiveness measure whose form and rate of convergence do not depend on the choice of representation and dissimilarity function, and which is unbiased with minimal variance among all unbiased estimators (Schatz, 2016). I will provide examples of application to the study of variability factors in speech, such as speaker, phonetic context, or speech register (Martin et al. 2015; Bergmann et al. 2016; Schatz et al. 2017), and to the comparison of speech representations (Schatz et al. 2013; Schatz et al. 2014; Schatz et al., in preparation).
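The triplet loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation: the function name, the feature values, and the 2-D (VOT, f0) feature space are invented for the example, and a plain Euclidean distance between fixed-length vectors stands in for the dissimilarities between token representations (real systems typically compare variable-length sequences of spectral frames):

```python
import math

def abx_score(cat_a, cat_b, dist):
    """Fraction of ABX triplets answered correctly.

    cat_a, cat_b: lists of token representations (here, feature tuples).
    dist: any user-supplied dissimilarity between two representations.
    """
    correct, total = 0, 0
    # Triplets where X belongs to category A (A and X are distinct tokens).
    for i, a in enumerate(cat_a):
        for j, x in enumerate(cat_a):
            if i == j:
                continue
            for b in cat_b:
                correct += dist(x, a) < dist(x, b)  # True counts as 1
                total += 1
    # Symmetric triplets where X belongs to category B.
    for i, b in enumerate(cat_b):
        for j, x in enumerate(cat_b):
            if i == j:
                continue
            for a in cat_a:
                correct += dist(x, b) < dist(x, a)
                total += 1
    return correct / total

# Toy example: two /ta/-like and two /ti/-like tokens in an invented
# 2-D (VOT in ms, f0 in Hz) space, compared with Euclidean distance.
ta = [(30.0, 120.0), (32.0, 118.0)]
ti = [(28.0, 180.0), (31.0, 176.0)]
print(abx_score(ta, ti, math.dist))  # prints 1.0: fully discriminable here
```

An accuracy of 0.5 would indicate chance-level discriminability of the two categories in the chosen representation, and 1.0 perfect separation; the key design point, as the abstract notes, is that nothing in the loop depends on what the representations or the distance function are.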