University of South Carolina, Communication Sciences and Disorders, Arnold School of Public Health, United States

Introduction. A speaker’s acoustic profile carries significant linguistic and non-linguistic information. Employed in clinical practice, it can provide behavioral markers for rapid assessment of primary progressive aphasia (PPA). PPA is a complex language syndrome in which different speech and language properties, such as prosody, lexical retrieval, and motor speech functioning, may be affected. It is classified into three main variants: nonfluent (nfvPPA), semantic (svPPA), and logopenic (lvPPA). Primary progressive apraxia of speech (PPAOS) is also distinguished (Duffy et al., 2017) but may fall under nfvPPA (Gorno-Tempini et al., 2011). The present study aims to determine the contribution of the acoustic properties of vowels, prosody, and voice quality to the classification of PPA variants using machine learning models.
Methods. Oral samples from picture description tasks of 50 individuals with PPA (lvPPA: 17, svPPA: 14, nfvPPA: 11, PPAOS: 8) were automatically transcribed and segmented into vowels and consonants using the new acoustic analysis platform THEMIS. From the segmented vowels, we measured: i. vowel formants (F1–F5) (den Ouden et al., 2017); ii. vowel duration (Duffy et al., 2017); iii. mean, minimum, and maximum fundamental frequency (F0) (Hillis, 2014); iv. pause duration (Mack et al., 2015); and v. the voice quality measures H1–H2, H1–A1, H1–A2, and H1–A3. We compared three machine learning models: support vector machines (SVM) (Cortes and Vapnik, 1995), random forests (RF) (Breiman, 2001), and decision trees (DT) (Hastie et al., 2009) in a one-against-all strategy, where each variant was tested against all others. We ran all models with 3-fold grouped cross-validation to ensure that the speakers in the training and evaluation sets were different. The models were implemented in Python (Pedregosa et al., 2011).
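The one-against-all evaluation with speaker-grouped folds can be sketched in scikit-learn as below. This is a minimal illustration, not the study's code: the feature matrix, variant labels, and speaker IDs are synthetic stand-ins for the acoustic measurements described above.

```python
# Sketch: one-against-all classification with 3-fold grouped cross-validation,
# so that no speaker appears in both the training and evaluation sets.
# Data here are synthetic placeholders for the real acoustic features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 12))           # stand-in acoustic features (formants, F0, pauses, H1-H2, ...)
variants = rng.choice(["lvPPA", "svPPA", "nfvPPA", "PPAOS"], size=n_samples)
groups = rng.integers(0, 50, size=n_samples)   # speaker IDs used to group the folds

models = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
}

cv = GroupKFold(n_splits=3)                    # 3-fold grouped cross-validation
for variant in ["lvPPA", "svPPA", "nfvPPA", "PPAOS"]:
    y = (variants == variant).astype(int)      # one-against-all: this variant vs. all others
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, groups=groups)
        print(f"{variant} vs. rest, {name}: mean acc {scores.mean():.2f} (SD {scores.std():.2f})")
```

Grouping by speaker is the key design choice: without it, samples from one speaker could leak into both splits and inflate accuracy estimates.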
Results. We report the mean cross-validated accuracy of the best-performing model from the model comparison: i. RF provided the highest classification accuracy for nfvPPA [mean: 82%, SD: 9%]; ii. SVM had the highest accuracy for svPPA [mean: 66%, SD: 8%]; iii. RF had the highest accuracy for lvPPA [mean: 57%, SD: 15%]; and iv. RF provided the highest classification accuracy for PPAOS [mean: 80%, SD: 8%] (Figure 1). In all models, pause duration and F0 measures were ranked higher than most other features (Figure 2).
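Feature rankings of the kind reported here can be read off a fitted random forest via its impurity-based importances. The sketch below uses toy data in which a pause-duration column is made informative by construction; the feature names are illustrative, not the study's exact feature set.

```python
# Sketch: ranking acoustic features by random-forest importance.
# Synthetic data; the "pause_dur" column is deliberately made predictive.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
feature_names = ["mean_F0", "min_F0", "max_F0", "pause_dur", "vowel_dur",
                 "F1", "F2", "H1-H2", "H1-A1"]
X = rng.normal(size=(150, len(feature_names)))
X[:, 3] += 2 * rng.integers(0, 2, size=150)    # inject a toy signal into pause_dur
y = (X[:, 3] > 1).astype(int)                  # labels depend only on pause_dur

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(rf.feature_importances_, feature_names), reverse=True)
for imp, name in ranking[:3]:
    print(f"{name}: {imp:.2f}")
```

Impurity-based importances sum to 1 across features; permutation importance on held-out data is a common alternative when features are correlated, as acoustic measures often are.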
Discussion. This study employed an innovative method for the classification of PPA variants, combining automated speech transcription, segmentation, feature extraction, and modeling. Using acoustic features alone, the best model classified nfvPPA, svPPA, and PPAOS with high accuracy. However, acoustic features alone could not classify lvPPA with comparable accuracy; additional linguistic markers might be needed for a more accurate classification of lvPPA. Furthermore, we showed that prosody, as measured by fundamental frequency and pause duration, contributes more than any other factor to the classification of PPA variants, as suggested by previous research from our group and others (Hillis, 2014; Patel et al., 2018; Mack et al., 2015). Finally, the findings demonstrate the potential benefit of using machine learning models in clinical practice for the subtyping of PPA variants.