This study investigated changes in vowel production and perception among university students from the north of England, as individuals adapt their accent from regional to educated norms. Subjects were tested in their production and perception at regular intervals over a period of : before beginning university, later, and at the end of their first and second years at university. At each testing session, subjects were recorded reading a set of experimental words and a short passage. Subjects also completed two perceptual tasks; they chose best exemplar locations for vowels embedded in either northern or southern English accented carrier sentences and identified words in noise spoken with either a northern or southern English accent. The results demonstrated that subjects at a late stage in their language development, early adulthood, changed their spoken accent after attending university. There were no reliable changes in perception over time, but there was evidence for a between-subjects link between production and perception; subjects chose similar vowels to the ones they produced, and subjects who had a more southern English accent were better at identifying southern English speech in noise.

Listeners’ ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333–1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423–424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769–3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580–1585 (1998)]. The experiments in this study used a correlational method to measure normal-hearing listeners’ spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562–1113 and ), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences differed from those reported for nonsense syllables.

Native English speakers were trained to identify Japanese vowel length under three types of training that differed in sentential speaking rate: slow-only, fast-only, and slow-fast. Following Pisoni and Lively’s high phonetic variability hypothesis [Pisoni, D. B., and Lively, S. E., Speech Perception and Linguistic Experience, 433–459 (1995)], higher stimulus variability by means of training with two rates was hypothesized to aid learners in adapting to speech rate variation more effectively than training with only one rate. Trained participants identified the length of the second vowel of disyllables, short or long, embedded in a sentence of the respective rate, and received immediate feedback. The three trained groups’ abilities before and after training were examined with tests containing sentences of slow, normal, and fast rates, and were compared with those of a control group that was not trained. The overall test scores showed a robust effect of slow-fast training, a marginal effect of slow-only training, and no significant effect of fast-only training. Slow-fast and slow-only training showed small advantages over fast-only training on the fast-rate test scores, while effects for all three training types were found on the slow- and normal-rate test scores. The degree to which the results support the high phonetic variability hypothesis is discussed.

Humans were trained to categorize problem non-native phonemes using an animal psychoacoustic procedure that trains monkeys to greater than 90% correct in phoneme identification [Sinnott and Gilmore, Percept. Psychophys. 66, 1341–1350 (2004)]. This procedure uses a manual left versus right response on a lever, a continuously repeated stimulus on each trial, extensive feedback for errors in the form of a repeated correction procedure, and training until asymptotic levels of performance. Here, Japanese listeners categorized the English liquid contrast /r-l/, and English listeners categorized the Middle Eastern dental-retroflex contrast /d-D/. Consonant-vowel stimuli were constructed using four talkers and four vowels. Native listeners and phoneme contrasts familiar to all listeners were included as controls. Responses were analyzed using percent correct, response time, and vowel context effects as measures. All measures indicated nativelike Japanese perception of /r-l/ after 32 daily training sessions, but this was not the case for English perception of /d-D/. Results are related to the concept of “robust” (more easily recovered) versus “fragile” (more easily lost) phonetic contrasts [Burnham, Appl. Psycholing. 7, 207–240 (1986)].