Abstract

The research deals with the natural perception of word boundaries by native speakers of Standard Russian. A specific feature of Russian word rhythmic structure is the so-called “prosodic core”: not only stressed but also first pre-stressed vowels differ in duration and quality from vowels in other positions, a phenomenon commonly described as two degrees of reduction. The purpose of this study is to find out whether native Russian speakers are able to use the acoustic differences between the vowels [ɐ] (Degree 1 reduction) and [ə] (Degree 2 reduction) to recognize word boundaries correctly. The stimuli for the experiment were nonce words, five-syllable sequences including two stressed vowels; they were presented to the participants in the form of fictional foreign names. The listeners were asked to choose between two possible segmentations of these five-syllable sequences into a first name and a second name of a person. The results show that native Russian speakers used the acoustic differences between vowels for segmentation, but the results were statistically significant only for some of the stimuli: for half of the stimuli the listeners’ rate of correct segmentation did not differ from chance level. In addition, the duration of the first pre-stressed vowel was artificially modified for some of the stimuli; the participants’ responses show that vowel duration influences the degree of success in the segmentation task.

Keywords

Introduction

The successful perception of speech in a native or foreign language consists of several stages, one of which is the ability to identify word boundaries within a phrase. The importance of this ability follows from the hypothesis that “phonetic word is the basic unit of the listener’s vocabulary, in particular with regard to Russian speakers” (Riekhakainen, 2016: 56); “we do not have the infinite storage space which would be required to contain a representation of every utterance with which we might possibly be presented. Therefore segmentation is a necessary operation…” (Cutler & Butterfield, 1992: 232). The sources of information about word boundaries in a native language (in the absence of pauses, which are themselves relatively unreliable) are rhythmic structure, phonotactic constraints, lexical knowledge and phonetic detail (Weber & Broersma, 2012: 6-7).

One of the basic examples showing the role of lexical knowledge in segmentation is the case when a listener presented with a long sequence of sounds unambiguously recognizes an embedded word of their native language, which in turn helps them identify the boundaries of the two adjacent words. More complex mechanisms based on the mental lexicon are contextual predictability and word frequency.

Phonotactic information can be involved when the listener identifies sequences of two or more phonetic segments that cannot occur within a phonetic word in their language. For example, in Standard Russian sequences of two consonants that differ only in the feature hardness/softness are allowed only at word boundaries and at the boundary between a clitic and its host (Panov, 1979: 170). Thus, a native speaker of Russian presented with the sound sequence [ˈtɨllʲʊ…] might be able to detect the boundary without first accessing the rather infrequent word /ˈtɨl/ ‘backing, support’ in the mental lexicon.
Yet the question remains whether this also applies to Russian consonant sequences that are not prohibited within a word but occur only in rare words, such as the sequence [t͡sf] suggested in (Panov, 1979), which can occur at word boundaries but within a word is found only in proper nouns such as the surname Tsfasman.

The phonetic details mentioned in (Weber & Broersma, 2012) are differences between allophones of the same phoneme. The traditional example of this kind is the English minimal pair keeps parking and keep sparking, where the presence or absence of aspiration in the bilabial plosive marks the place of the word boundary. The experiments reported in (Altenberg, 2006) and (Ito & Strange, 2009) have shown that English speakers use this phonetic detail for segmentation with a significantly higher rate of success than Spanish and Japanese learners of English. Russian provides comparable cases, for example, the phonetic realization of the word-initial open vowel /a/ after a preceding word ending in a soft consonant (in the absence of a pause). Formant measurements of vowels in this position showed that “minimal differences in vowel quality… are sufficient for marking word boundaries” (Moiseeva, 2015: 6).

Rhythmic structure is the most reliable source of information for segmentation, in particular in the case of noise-masked speech (Cutler & Butterfield, 1992: 226). Its delimitative role is preserved even in languages with lexical stress. The experiments conducted in (Cutler & Norris, 1988; Cutler & Butterfield, 1992) show that for native speakers of English “strong syllables (containing full vowels) are most likely to be the initial syllables of lexical words, whereas weak syllables (containing central, or reduced, vowels) are nonword-initial, or, if word-initial, are grammatical words”.
This segmentation strategy is explained by the frequency of different rhythmic patterns of English words: an analysis of a corpus of spontaneous British English in (Cutler & Carter, 1987: 133) showed that “in this corpus, 90% of lexical words were found to begin with strong syllables”. Russian, like English, has variable stress, but there is no indication of a comparable level of predictability. Thus, the delimitative function of stress is not as strong as in English.

On the other hand, with regard to word rhythmic structure, Modern Standard Russian has a specific feature: a “disyllabic prosodic core, where stressed and first pre-stressed syllables are contrasted with all other syllables” (Knyazev & Pozharitskaya, 2005: 123). This rhythmic structure imposes restrictions on the repertory of vowels that can be present in certain syllables. We hypothesized that these restrictions can have a delimitative function in Russian: if a sequence of two stressed vowels indicates the presence of a word boundary, then a sequence of a syllable with Degree 2 reduction and a stressed syllable can hypothetically indicate a boundary as well, because such disyllabic sequences are prohibited within a word by Russian phonotactics. It should be noted that the subsystem of Russian compound words such as стоп-кран ‘emergency brake’ is left aside here. In these lexemes, according to (Knyazev, 2015: 277), “in position without phrasal accent and in pronunciation without the stress on the first syllable, phonemes /o/ and /e/ in first pre-stressed syllable have phonetic realization [ə]”; these units, however, should be regarded as two different phonetic words.

Purpose

The purpose of the experiment was to find out whether speakers of Standard Russian can utilize the information about word boundaries that is contained in Russian word rhythmic structure when all other sources of information for segmentation (including lexical ones) are absent from the signal.
In particular, the question was whether the acoustic differences between the sequences “a syllable with [ə] + a syllable with a stressed vowel” (təˈta, prohibited by Russian phonotactics within a word) and “a syllable with [ɐ]¹ + a syllable with a stressed vowel” (tɐˈta, allowed by Russian phonotactics within a word) can be perceived by Russian speakers and used for segmentation purposes. To answer this question, a perception experiment was conducted.

Methods and materials

The methods of experimental phonetics were used in the present research. The experiment included two stages: production and perception. During the first stage, phrases containing the stimuli were recorded from one native speaker. The acoustic features of the vowels in his pronunciation were analyzed by means of Praat software (Boersma & Weenink, 2017). During the second stage a perception experiment was conducted: 30 participants were instructed to segment the stimuli. The acoustic differences between the vowels and the listeners’ responses were analyzed with statistical methods.

Results

The research confirmed that in the recorded pronunciation of the speaker Degree 1 and Degree 2 reduction vowels differ significantly in duration and F1 frequency. The perception experiment has shown that listeners used these acoustic cues inconsistently when they were asked to segment nonce words. For half of the natural experimental stimuli (4 out of 8) the number of responses containing the correct segmentation did not differ significantly from chance level. On the other hand, for 3 other natural stimuli (and also for 2 out of 3 stimuli containing artificially modified vowels) the result was statistically significant.

Discussion

Because the influence of lexical knowledge on the listeners performing the segmentation task had to be excluded, nonce words were chosen as stimuli². The stimuli in question were sequences of five open syllables, first presented in orthography with stress marked by an acute accent.
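How the expected pronunciation of such a five-syllable sequence depends on the position of the word boundary can be sketched in a few lines of Python. This is only an illustration of the two-degree reduction rule described above for /a/ after hard consonants; the function names and the input format are hypothetical and were not part of the study.

```python
# Simplified rule (an assumption for illustration): within a phonetic word,
# the stressed vowel is [a], the first pre-stressed vowel is [ɐ] (Degree 1)
# and every other unstressed vowel is [ə] (Degree 2).
STRESSED, DEGREE_1, DEGREE_2 = "a", "ɐ", "ə"


def transcribe_word(onsets, stressed_index):
    """Transcribe one phonetic word made of open syllables.

    onsets: consonants of the syllables, e.g. ["s", "k", "t"]
    stressed_index: 0-based index of the stressed syllable
    """
    parts = []
    for i, onset in enumerate(onsets):
        if i == stressed_index:
            parts.append("ˈ" + onset + STRESSED)
        elif i == stressed_index - 1:
            parts.append(onset + DEGREE_1)  # first pre-stressed syllable
        else:
            parts.append(onset + DEGREE_2)  # all other unstressed syllables
    return "".join(parts)


def transcribe_name(onsets, stresses, boundary):
    """Split the syllables into two words after `boundary` syllables."""
    first = transcribe_word(onsets[:boundary], stresses[0])
    second = transcribe_word(onsets[boundary:], stresses[1] - boundary)
    return first + "#" + second


onsets = ["t", "k", "s", "k", "t"]
print(transcribe_name(onsets, (0, 3), 2))  # ˈtakə#sɐˈkatə
print(transcribe_name(onsets, (0, 3), 3))  # ˈtakəsə#ˈkatə
```

The two outputs differ only in the third vowel: [ɐ] when the syllable opens the second word, [ə] when it closes the first one. This single vowel is the cue whose perceptual relevance the experiment tests.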
Every sequence included two stressed syllables; therefore, all of them could be identified as two phonetic words of different length. Every syllable included a hard consonant т [t], к [k] or с [s] and a vowel а or о (о was used in order to make the nonce words less monotonous and was only present in final unstressed syllables). The choice of syllable structure and obstruent consonants was intended to facilitate subsequent sound segmentation and duration measurements (Kuznetsov & Ott, 1989). Four minimal pairs of nonce words were formed based on these criteria; in Standard Russian they are expected to be pronounced as [ˈtakə#sɐˈkatə] and [ˈtakəsə#ˈkatə]; [ˈkasə#tɐˈsakə] and [ˈkasətə#ˈsakə]; [tɐˈka#sɐˈkatə] and [tɐˈkasə#ˈkatə]; [kɐˈsa#tɐˈkatə] and [kɐˈsatə#ˈkatə]. To make both the production and the perception tasks easier for the participants, all these nonce words were presented to them in Russian orthography as the first names and surnames of people from a fictional country: Та́ка Сака́то and Та́каса Ка́то (Taka Sakato vs. Takasa Kato), Каса́ Така́то and Каса́та Ка́то (Kasa Takato vs. Kasata Kato), etc.

¹ The symbol [ɐ], which refers to the open vowel in the first pre-stressed syllable after a hard consonant, is used in the present paper following (Kasatkina, 2005).

² Homophonic phrases, such as это ж[ə] над[ə] было (possible translation ‘it was necessary’) and это ж[ɨ]на д[ɐ]бы́ла ‘wife has gained it’, could have served as another possible source of stimuli, but the number of such phrases was considered insufficient for the present experiment.

The experimental material was further expanded by adding two groups of fillers. Firstly, in order to control the listeners’ involvement, eight stimuli with only one acceptable segmentation were added (for example, sequences containing two stressed syllables in a row): [sɐˈta#ˈkatəkə] (Сата́ Ка́тако).
Secondly, several pairs of “ambiguous” stimuli were added: sequences in which no delimitative information is present, so that both segmentation answers had equal chances of being chosen: [ˈkatə#kəsɐˈka] and [ˈkatəkə#sɐˈka] (Ка́та Касака́ and Ка́така Сака́).

In total, 20 stimuli were selected; every “fictional name” was embedded in the final part of a carrier phrase. The phrasal position of the stimuli had to be controlled because it influences the acoustic characteristics of vowels (Knyazev, 2006). The phrases were arranged in a pseudo-random order to produce a text. The beginning of this text is presented below:

Здравствуйте, меня зовут Та́са Ка́сата (Hello, my name is Tasa Kasata)
Моего друга зовут Ка́са Таса́ка (My friend’s name is Kasa Tasaka)
Вас ждет господин Сатака́ Сата́ (You are expected by Mr. Sataka Sata), etc.

The text was read by speaker D.B., a 25-year-old native Russian speaker, a Muscovite with a degree in Russian philology but no professional interest in phonetics or linguistics. Four recordings were made in total: two sessions of two recordings each, with an interval of two weeks between the sessions. The first two correct pronunciations of every phrase were used for the analyses. The text was presented in the form of PowerPoint slides; every slide contained one phrase in orthography with stress marked. The informant reported no difficulties in reading the nonce words.

The acoustic characteristics of the vowels in 40 phrases were analyzed in Praat (Boersma & Weenink, 2017). Vowel durations and the frequencies of F1 and F2 at the center of the vowels were measured. Only the first four vowels in the stimuli were analyzed; the last vowels in all phrases were not included in the experiment due to the possible effects of prosodic domain strengthening (Fougeron & Keating, 1997). For example, it was previously shown that in Russian “stressed and post-stressed word-final vowels lengthen in words at the end of phrases” (Kachkovskaya, 2014: 68).
The results of these measurements for all 160 vowels are presented in Table 1.

Table 1. Mean duration, F1 and F2 frequencies for vowels in nonce words pronounced by speaker D.B. (standard deviations in parentheses)

Vowel                     Number of measurements   Duration, ms   F1, Hz     F2, Hz
stressed [á]              68                       106 (18)       704 (56)   1316 (38)
first pre-stressed [ɐ]    44                       60 (7)         573 (60)   1421 (62)
other unstressed [ə]      48                       37 (8)         454 (39)   1452 (88)

The qualitative and quantitative differences between stressed, first pre-stressed and other unstressed vowels have been studied experimentally before. In particular, common acoustic features of stressed [a] and Degree 1 [ɐ] were shown. An experiment in (Knyazev, 2006) showed that the acoustic contrast between the prosodic core vowels in Standard Russian is significant only under phrasal accent and only in terms of duration. Significant durational differences (and no differences in F1 frequency) between the prosodic core vowels were reported in (Barnes, 2006: 65). In the same study, a strong correlation between the duration and F1 of unstressed vowels led to the conclusion that Russian has “one phonological process (Degree 1 reduction) and one phonetic reduction process (Degree 2 reduction)”. Different results are reported in (Padgett & Tabain, 2005), where first pre-stressed vowels were shown to be significantly shorter than stressed vowels in all positions after hard consonants, while other unstressed vowels (Degree 2 reduction [ə]) in the same contexts were significantly shorter than [ɐ] for 7 out of 9 speakers. As for the qualitative characteristics, significant differences in F1 and F2 frequency were shown for all degrees of reduction.

The purpose of the present experiment was mostly to study perception, so the experimental material was restricted to vowels in open syllables pronounced by only one speaker of Standard Russian. The results of the acoustic analysis resemble the findings of (Padgett & Tabain, 2005).
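The summary statistics in Table 1 are sufficient to recompute an approximate single-factor analysis of variance for any pair of vowel groups. The sketch below is not the authors' procedure: it assumes a classical equal-variance ANOVA, works from the rounded means and standard deviations in the table, and obtains the p-value by numerically integrating the Student t density (valid here because a two-group F test with one numerator degree of freedom is equivalent to a t test). All function names are illustrative.

```python
import math

# (n, mean, sd) for F2 in Hz, copied from Table 1 (rounded values).
F2_DEGREE_1 = (44, 1421, 62)  # first pre-stressed [ɐ]
F2_DEGREE_2 = (48, 1452, 88)  # other unstressed [ə]


def anova_from_summary(groups):
    """One-way ANOVA F statistic from per-group (n, mean, sd) tuples."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * (total * h / 3.0)


f_stat, df1, df2 = anova_from_summary([F2_DEGREE_1, F2_DEGREE_2])
p_value = t_two_sided_p(math.sqrt(f_stat), df2)  # valid because df1 == 1
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.3f}")
```

With the rounded values from Table 1, the F2 comparison between [ɐ] and [ə] gives F(1, 90) of about 3.75 and a p-value just above .05, in line with the p = .056 reported in the text for this comparison.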
Single-factor analysis of variance has shown significant differences for all pairwise comparisons (duration, F1 and F2 frequencies for all three vowels [á], [ɐ] and [ə]), with the exception of the difference in F2 frequency between the Degree 1 and Degree 2 reduction vowels [ɐ] and [ə], which fell just short of significance (p = .056). Therefore, for speaker D.B. the three studied groups of vowels differ in duration and F1 frequency; stressed vowels differ in F2 frequency from unstressed vowels. The differences in F1 and F2 measurements are illustrated schematically in the Figure.

Figure. Formant frequencies (Hz) of the studied vowels measured in the central stable part. Horizontal axis: F2 frequency; vertical axis: F1 frequency.

The analysis of the vowels pronounced by the speaker has shown that in his pronunciation Degree 1 and Degree 2 reduction vowels differ in duration and F1 frequency. A perception experiment was conducted in order to find out whether native Russian speakers can utilize these acoustic differences while performing a segmentation task.

For each of the 20 experimental phrases one pronunciation was chosen as a stimulus for the perception experiment. While the choice of filler phrases was random, the choice of stimuli that included the difference between [ɐ] and [ə] was based on the acoustic characteristics of the studied vowels. The tokens chosen for this part of the experiment were those in which the vowels presumably carrying the cues for segmentation had “typical” duration and F1 frequency for [ɐ] and [ə], that is, values nearest to the means shown in Table 1. In addition, three stimuli with artificially modified vowels were included in the perception experiment: 1) [ɐ] with the duration increased to 86 ms; 2) [ə] with the duration increased to 74 ms; 3) [ɐ] with the duration reduced to 40 ms. The duration was manipulated in Praat by copying whole periods of oscillation in the central part of the vowel.
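The idea of lengthening a vowel by copying whole periods can be illustrated outside Praat with a short Python sketch. It is not the procedure used in the study, which was carried out by hand in Praat: the signal here is a synthetic sine wave, the fundamental frequency of 125 Hz and the sampling rate of 16 kHz are assumptions (the paper does not report the speaker's F0), and the function names are hypothetical.

```python
import math


def lengthen_by_period_copy(samples, period_start, period_len, n_copies):
    """Insert n_copies extra copies of one whole period right after that period."""
    period = samples[period_start:period_start + period_len]
    if len(period) != period_len:
        raise ValueError("period extends past the end of the signal")
    insert_at = period_start + period_len
    return samples[:insert_at] + period * n_copies + samples[insert_at:]


def periods_to_add(current_ms, target_ms, f0_hz):
    """Whole number of periods that best approximates the duration change."""
    period_ms = 1000.0 / f0_hz
    return round((target_ms - current_ms) / period_ms)


SAMPLE_RATE = 16000                # assumed sampling rate, Hz
F0_HZ = 125                        # assumed F0, not reported in the paper
PERIOD_LEN = SAMPLE_RATE // F0_HZ  # 128 samples per period

# Synthetic stand-in for a 64 ms vowel: 8 periods of a sine wave.
vowel = [math.sin(2 * math.pi * i / PERIOD_LEN) for i in range(1024)]

# Lengthening from 64 ms towards the 86 ms target by copying a central period.
n_extra = periods_to_add(64, 86, F0_HZ)
longer = lengthen_by_period_copy(vowel, 512, PERIOD_LEN, n_extra)
print(len(vowel), "->", len(longer), "samples")
```

Because only whole periods are copied, the result is periodic across the splice points, but the target duration is reached only approximately: under the assumed F0, three extra periods give 88 ms rather than 86 ms. Shortening works analogously by deleting whole periods.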
The modified stimuli sounded natural; informants who were tested were unable to distinguish them from the non-modified stimuli. In total, the material of the perception experiment included 23 phrases, each of them containing one stimulus (“name of a foreigner”): 12 filler phrases, 8 non-modified target phrases and 3 target phrases with modified vowel duration. The phrases were presented to the participants twice with five-second intervals. The informants were asked to answer the question “What were the name and the surname of the person that you have heard?”. They had to choose one of two possible segmentations for each phrase, e.g. Така Саката (Taka Sakata) or Такаса Ката (Takasa Kata). Stress was not marked in the forms that the participants were asked to fill in.

Thirty native Russian speakers took part in the experiment (16 females and 14 males), aged 21-31 (mean age 26.5 years). All of them either were born in Moscow or obtained their higher education in Moscow. Every informant had an individual experimental session using headphones; 12 of them filled in the answer forms on paper and 18 in electronic format.

In 240 presentations of fillers with only one possible segmentation the participants made only 4 “mistakes” (2 “mistakes” in each of 2 different stimuli, made by 4 different informants). The reaction to 120 presentations of the 4 fillers with two equally possible correct responses revealed two strategies. Six out of 30 informants, when faced with this uncertainty, always chose the first or always the second answer in the form. The other 24 informants were apparently trying to find acoustic cues for segmentation, but nevertheless their success rate (the number of answers in which they chose the segmentation that had initially been presented to speaker D.B.) was at chance level (48 out of 96; none of the participants gave 4 “correct” answers out of 4).
These results support the assumption that no acoustic cues for segmentation (including pauses and pitch contour features) were present in this group of fillers.

The results of the perception experiment for the test stimuli are presented in Table 2; the stimuli are given in the order of presentation to the listeners. Statistical significance was assessed with a binomial test (α = 0.05). The test shows that the null hypothesis that the informants performed segmentation at chance level should be rejected for 4 out of 8 test stimuli with non-modified vowels and for 2 out of 3 stimuli with artificially modified vowels.

Table 2. Perception experiment results for the test stimuli. Statistically significant results are marked in the last column

No.   Vowel   Duration, ms   F1, Hz   Correct responses (out of 30)   Significant
1     [ɐ]     64             542      14                              no
2     [ə]     35             430      24                              yes
3     [ə]     33             454      26                              yes
4     [ɐ]     62             580      29                              yes
5     [ɐ]     62             661      8                               yes
6     [ə]     31             458      10                              no
7     [ɐ]     52             593      17                              no
8     [ə]     36             476      12                              no
Stimuli with artificially modified vowels
1*    [ɐ]*    64→86          542      21                              yes
6*    [ə]*    31→74          458      12                              no
4*    [ɐ]*    62→40          580      21                              yes

While the response to the fillers described above has a trivial explanation, the experiment with the test stimuli showed partly unexpected and inconsistent results. Overall, 140 out of 240 presentations of test stimuli with non-modified vowels (58.3%) were segmented correctly. The informants’ success rate varied (3 to 6 correct responses out of 8), but none of them segmented all the nonce words correctly.

Table 2 shows that the success rate varies considerably between individual stimuli. At first glance, the extremely high success rate for stimuli 2, 3 and 4 might mean either that the informants used the acoustic differences between the unstressed vowels or that they utilized some other acoustic cues, for example, prosodic features or pauses. However, the results for the fillers tend to support the first hypothesis.
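The binomial test applied to Table 2 can be reproduced with the Python standard library alone. The sketch below assumes an exact two-sided test against a chance probability of 0.5; the paper does not state whether the test was one- or two-sided, so this is an assumption, and the function name is illustrative.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for float comparison
    return min(1.0, sum(q for q in probs if q <= threshold))


# Number of correct responses out of 30, copied from Table 2.
correct = {
    "1": 14, "2": 24, "3": 26, "4": 29,
    "5": 8, "6": 10, "7": 17, "8": 12,
    "1*": 21, "6*": 12, "4*": 21,
}

ALPHA = 0.05
N_LISTENERS = 30
significant = [
    stimulus for stimulus, k in correct.items()
    if binomial_two_sided_p(k, N_LISTENERS) < ALPHA
]
print(significant)  # ['2', '3', '4', '5', '1*', '4*']
```

Under this assumption the sketch rejects the chance-level hypothesis for four natural stimuli and two modified ones, which matches the counts reported above. Note that stimulus 5 is significant in the opposite direction: only 8 of 30 listeners chose the intended segmentation.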
Neither hypothesis explains the results for stimuli 1 and 6-8, where the informants performed at chance level, or for stimulus 5, where the majority of listeners perceived [ɐ] as a word-final vowel.

The results for the stimuli with artificially modified vowels should be considered separately. Table 2 shows that the modification of vowel duration worked as predicted in 2 out of 3 cases. The lengthening of [ɐ] from 64 to 86 ms increased the number of correct answers (from 14 to 21), and the shortening of a similar vowel from 62 to 40 ms led to the opposite result (the number of correct answers changed from 29 to 21). However, the artificial lengthening of [ə] from 31 to 74 ms had no noticeable effect (10 vs. 12 correct answers).

Thus, the perception experiment has shown that the listeners were unable to use the acoustic cues of the unstressed vowels [ɐ] and [ə] for segmentation consistently. However, some individual stimuli were divided into words correctly at a high success rate. These findings might support the hypothesis that Russian speakers can use these acoustic cues. Further evidence in favor of this assumption comes from the results for artificially modified vowels, which show that lengthening and shortening of [ɐ] changed the listeners’ segmentation success rate in the predicted direction.

Conclusion

The paper is an experimental verification of the hypothesis that the features of Russian word rhythmic structure, namely the acoustic differences between Degree 1 and Degree 2 reduction vowels, can be used by native speakers for segmenting natural speech. The research can be continued by further modification of the experimental design. Specifically, the production experiment included recordings of only one speaker. Further experiments should include analyses of the pronunciation of other groups of native Russian speakers, for example, female speakers (gender-based features of Modern Standard Russian phonetics have been mentioned in (Kasatkina, 2005)).
Moreover, further extension of the experimental material could include other Russian vowels in different positions (including the position after a palatalized consonant). Furthermore, although using nonce words can be considered a robust way to observe pronunciation and perception, using real Russian words (provided that a sufficient number of minimal pairs can be found) would make the experimental design less complicated for the informants.

Finally, the present experiment was initially planned for L2 research purposes. Language-specific strategies for segmentation have been observed in a number of experimental papers (e.g., Cutler et al., 1986; Sanders & Neville, 2000; Hay & Diehl, 2007; Kabak, Maniwa & Kazanina, 2010). However, there have been no studies to date of the rhythmic structure of the Russian word in this respect. A pilot experiment was conducted based on the material of the present paper. It showed that English-speaking students of Russian used their own specific strategy for segmenting Russian nonce words. Presumably, this strategy could be based on the features of English word rhythmic structure. Therefore, it could be of interest to examine the transfer of segmentation strategies into L2 for learners of Russian.

Pavel Vasilievich Duryagin

Candidate of Sciences in Philology, Lecturer at Faculty of Humanities, School of Linguistics, National Research University Higher School of Economics. Research interests: Russian dialectology, second language acquisition: Russian as L2, experimental phonetics