The current research themes include (1) the perception of locally time-reversed speech, (2) the perception of interrupted speech, (3) the irrelevant sound effect on short-term memory, (4) multivariate analyses of speech, (5) the perception of noise-vocoded speech, and (6) multivariate analyses of choral music. Some of these are collaborations with Technische Universitaet Darmstadt, Germany, and the University of British Columbia, Canada. From 2003, he was a member of the Perceptual Psychology Unit of the governmental "Center of Excellence" (COE) program entitled "Design of Artificial Environments on the Basis of Human Sensibility," Kyushu University. He has been a member of the "Center for Applied Perceptual Research" since its foundation in 2010; the Center developed into the "Research Center for Applied Perceptual Science" in 2013. He supervises undergraduate and graduate students. He teaches Psychology of Hearing, Auditory Perception and Cognition, Perceptual Psychology, Science of Auditory and Visual Perception, and other courses. He previously taught Psychology, Perceptual Psychology, Experimental Design, and other courses at Kyoto Prefectural University. He has held management roles for publications and research meetings of the Acoustical Society of Japan and the Japanese Society for Music Perception and Cognition. Since 2018, he has been one of the Vice Presidents of the International Society for Psychophysics.

This project investigates the perception of locally time-reversed speech. A paper on this work has been published (Ueda et al., 2017).

A series of experiments is planned that uses results from Ueda & Nakajima (2008) to produce speech degraded in systematic ways. Specifically, noise-vocoded speech will be produced that lacks the spectral fine structure of the original recording and that allows the number of frequency channels used in the synthesis to be varied systematically. Furthermore, conventional critical-band-based synthesis methods will be compared with methods that partition the audible frequency range into 'meaningful' units as determined by Ueda, Nakajima & Araki (2009). Finally, input signals other than noise may be used for the vocoder. The stimuli thus generated may serve as interfering background in 'irrelevant speech' paradigms such as those studied by Ellermeier & Zimmer (1997) or Zimmer, Ghani & Ellermeier (2008). The results may elucidate what makes sounds 'speech-like' and which acoustic properties produce the greatest degree of memory impairment in the irrelevant sound paradigm. At the same time, they serve to validate the concept of speech-based auditory universals proposed by Ueda et al. (2009). Several oral and poster presentations have been given (e.g., Ueda, Nakajima, Doumoto, Ellermeier, and Kattner, 2011; Ellermeier, Kattner, Ueda, Nakajima, and Doumoto, 2012). The first output of the collaboration in a refereed journal was published in the Journal of the Acoustical Society of America (Ellermeier, Kattner, Ueda, Doumoto, and Nakajima, 2015).

Yoshitaka Nakajima, Mizuki Matsuda, Kazuo Ueda, and Gerard B. Remijn, Temporal Resolution Needed for Auditory Communication: Measurement with Mosaic Speech, Frontiers in Human Neuroscience, 10.3389/fnhum.2018.00149, 12, 149, 2018.04, Temporal resolution needed for Japanese speech communication was measured. A new experimental paradigm that can reflect the spectro-temporal resolution necessary for healthy listeners to perceive speech is introduced. As a first step, we report listeners' intelligibility scores of Japanese speech with a systematically degraded temporal resolution, so-called "mosaic speech": speech mosaicized in the coordinates of time and frequency. The results of two experiments show that mosaic speech cut into short static segments was almost perfectly intelligible with a temporal resolution of 40 ms or finer. Intelligibility dropped for a temporal resolution of 80 ms, but was still around the 50%-correct level. The data are in line with previous results showing that speech signals separated into short temporal segments of <100 ms can be remarkably robust in terms of linguistic-content perception against drastic manipulations in each segment, such as partial signal omission or temporal reversal. The human perceptual system thus can extract meaning from unexpectedly rough temporal information in speech. The process resembles that of the visual system stringing together static movie frames of ~40 ms into vivid motion.
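
The temporal side of the mosaic manipulation, in which each frequency band is held static within short segments, can be pictured as block-averaging band energies over time. The following is a minimal Python sketch, not the authors' procedure: the input layout (a hypothetical list of per-band, per-frame energy values, e.g. from a filter bank), non-overlapping segments, and averaging as the way each static value is formed are all assumptions, and the filter-bank and resynthesis stages are omitted.

```python
def mosaicize(band_energies, frames_per_block):
    """Hold each band's energy constant within consecutive temporal blocks.

    Hypothetical helper for illustration only. band_energies is a list of
    frequency bands, each a list of per-frame energy values; every value in
    a block of frames_per_block frames is replaced by the block mean. A final
    block shorter than frames_per_block is averaged over its own length.
    """
    if frames_per_block < 1:
        raise ValueError("frames_per_block must be at least 1")
    mosaic = []
    for band in band_energies:
        row = []
        for start in range(0, len(band), frames_per_block):
            block = band[start:start + frames_per_block]
            mean = sum(block) / len(block)
            row.extend([mean] * len(block))
        mosaic.append(row)
    return mosaic


if __name__ == "__main__":
    # Two bands, four frames each, averaged in blocks of two frames.
    print(mosaicize([[1, 3, 5, 7], [2, 2, 4, 4]], 2))
    # [[2.0, 2.0, 6.0, 6.0], [2.0, 2.0, 4.0, 4.0]]
```

Under an assumed 10 ms frame step, frames_per_block=4 would correspond to the 40 ms temporal resolution and frames_per_block=8 to the 80 ms resolution mentioned in the abstract.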

Kazuo UEDA, Yoshitaka NAKAJIMA, Wolfgang ELLERMEIER, Florian KATTNER, Intelligibility of locally time-reversed speech: A multilingual comparison, Scientific Reports, 10.1038/s41598-017-01831-z, 7, 2017.05, A set of experiments was performed to make a cross-language comparison of the intelligibility of locally time-reversed speech, employing a total of 117 native listeners of English, German, Japanese, and Mandarin Chinese. The experiments made it possible to examine whether languages of three timing types (stress-, syllable-, and mora-timed languages) exhibit different trends in intelligibility, depending on the duration of the segments that were temporally reversed. The results showed a strikingly similar trend across languages, especially when the time axis of segment duration was normalised with respect to the deviation of a talker's speech rate from the average in each language. This similarity is somewhat surprising given the systematic differences in vocalic proportions characterising the languages studied, which had been shown in previous research and were largely replicated with the present speech material. These findings suggest that a universal temporal window shorter than 20–40 ms plays a crucial role in perceiving locally time-reversed speech by working as a buffer in which temporal reorganisation can take place with regard to lexical and semantic processing.
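
Local time reversal, the manipulation compared across languages here, cuts the signal into consecutive segments of fixed duration and plays each segment backwards while keeping the segments in their original order. Below is a minimal Python sketch on a plain list of samples; the helper names are hypothetical, and it assumes rectangular segments with no cross-fading at the boundaries, a detail the abstract does not specify.

```python
def segment_samples(duration_ms, sample_rate_hz):
    """Convert a segment duration in milliseconds to a whole number of samples."""
    return max(1, round(duration_ms * sample_rate_hz / 1000))


def locally_time_reverse(samples, segment_len):
    """Reverse the sample order within each consecutive segment.

    Segments are rectangular and do not overlap; segment order is kept.
    A final segment shorter than segment_len is reversed as it stands.
    """
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    out = []
    for start in range(0, len(samples), segment_len):
        out.extend(reversed(samples[start:start + segment_len]))
    return out


if __name__ == "__main__":
    print(locally_time_reverse([1, 2, 3, 4, 5, 6, 7], 3))  # [3, 2, 1, 6, 5, 4, 7]
    print(segment_samples(40, 16000))  # 640
```

At a hypothetical sampling rate of 16 kHz, a 40 ms segment corresponds to 640 samples. Because segment boundaries stay fixed, applying the reversal twice with the same segment length restores the original signal.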

Yoshitaka NAKAJIMA, Kazuo UEDA, Shota FUJIMARU, Hirotoshi MOTOMURA, Yuki OHSAKA, English phonology and an acoustic language universal, Scientific Reports, 10.1038/srep46049, 7, 46049, 1-6, 2017.04, Acoustic analyses of eight different languages/dialects had revealed a language universal: Three spectral factors consistently appeared in analyses of power fluctuations of spoken sentences divided by critical-band filters into narrow frequency bands. Examining linguistic implications of these factors seems important to understand how speech sounds carry linguistic information. Here we show the three general categories of the English phonemes, i.e., vowels, sonorant consonants, and obstruents, to be discriminable in the Cartesian space constructed by these factors: A factor related to frequency components above 3,300 Hz was associated only with obstruents (e.g., /k/ or /z/), and another factor related to frequency components around 1,100 Hz only with vowels (e.g., /a/ or /i/) and sonorant consonants (e.g., /w/, /r/, or /m/). The latter factor was highly correlated with the hypothetical concept of sonority or aperture in phonology. These factors turned out to connect the linguistic and acoustic aspects of speech sounds systematically.

Kazuo UEDA, Yoshitaka NAKAJIMA, An acoustic key to eight languages/dialects: Factor analyses of critical-band-filtered speech, Scientific Reports, 10.1038/srep42468, 7, 42468, 1-4, 2017.02, The peripheral auditory system functions like a frequency analyser, often modelled as a bank of non-overlapping band-pass filters called critical bands; 20 bands are necessary for simulating the frequency resolution of the ear within an ordinary frequency range of speech (up to 7,000 Hz). A far smaller number of filters seemed sufficient, however, to re-synthesise intelligible speech sentences with power fluctuations of the speech signals passing through them; nevertheless, the number and frequency ranges of the frequency bands for efficient speech communication are still unknown. We derived four common frequency bands (approximately 50–540, 540–1,700, 1,700–3,300, and above 3,300 Hz) from factor analyses of spectral fluctuations in eight different spoken languages/dialects. The analyses robustly led to three factors common to all the languages/dialects investigated: the low & mid-high factor, related to the two separate frequency ranges of 50–540 and 1,700–3,300 Hz; the mid-low factor, related to the range of 540–1,700 Hz; and the high factor, related to the range above 3,300 Hz. This suggests a language universal.
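
The mapping from the four frequency bands to the three factors, in which the low & mid-high factor spans two non-adjacent ranges, can be written out as a simple lookup. This minimal Python sketch uses only the approximate band edges quoted in the abstract; the function name is hypothetical, and the boundary handling (lower edge inclusive, upper edge exclusive, no upper limit for the high band) is an assumption, since the abstract gives only approximate ranges.

```python
# Approximate band edges in Hz, as quoted in the abstract, with the factor
# each band was associated with. The upper edge of the last band is left open.
BANDS = [
    (50, 540, "low & mid-high"),
    (540, 1700, "mid-low"),
    (1700, 3300, "low & mid-high"),
    (3300, float("inf"), "high"),
]


def factor_for_frequency(freq_hz):
    """Return the factor label whose band contains freq_hz, or None below 50 Hz.

    Lower edges are inclusive and upper edges exclusive (an arbitrary choice
    for this sketch).
    """
    for low, high, label in BANDS:
        if low <= freq_hz < high:
            return label
    return None


if __name__ == "__main__":
    print(factor_for_frequency(1100))  # mid-low
```

The 1,100 Hz component that the phonology paper above associates with vowels and sonorant consonants falls in the mid-low band under this mapping.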

Wolfgang Ellermeier, Florian Kattner, Kazuo UEDA, Kana Doumoto, Yoshitaka NAKAJIMA, Memory disruption by irrelevant noise-vocoded speech: Effects of native language and the number of frequency bands, The Journal of the Acoustical Society of America, http://dx.doi.org/10.1121/1.4928954, 138, 3, 1561-1569, 2015.09, To investigate the mechanisms by which unattended speech impairs short-term memory performance, speech samples were systematically degraded by means of a noise vocoder. For experiment 1, recordings of German and Japanese sentences were passed through a filter bank dividing the spectrum between 50 and 7000 Hz into 20 critical-band channels or combinations of those, yielding 20, 4, 2, or just 1 channel(s) of noise-vocoded speech. Listening tests conducted with native speakers of both languages showed a monotonic decrease in speech intelligibility as the number of frequency channels was reduced. For experiment 2, 40 native German and 40 native Japanese participants were exposed to speech processed in the same manner while trying to memorize visually presented sequences of digits in the correct order. Half of each participant sample received the German speech samples, and the other half received the Japanese speech samples. The results show large irrelevant-speech effects increasing in magnitude with the number of frequency channels. The effects are slightly larger when subjects are exposed to their own native language. The results are predicted very well neither by the speech transmission index nor by psychoacoustical fluctuation strength, most likely because both metrics fail to disentangle amplitude and frequency modulations in the signals. (C) 2015 Acoustical Society of America.
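
The abstract describes deriving 4-, 2-, and 1-channel stimuli from combinations of the 20 critical-band channels. Only that grouping step is sketched here, in Python, under the assumption that combining means summing the per-frame power of adjacent channels; the function name, the input layout, and any particular grouping are hypothetical, since the exact channel boundaries are not given in the abstract. The rest of a noise vocoder (envelope extraction, noise carriers, band-limited resynthesis) is omitted.

```python
def combine_channels(channel_power, groups):
    """Merge adjacent filter-bank channels by summing their per-frame power.

    channel_power: list of channels (low to high frequency), each a list of
        per-frame power values of equal length.
    groups: list of (first, last) channel indices, inclusive; for example a
        hypothetical [(0, 19)] collapses 20 channels into a single channel.
    """
    combined = []
    for first, last in groups:
        if not 0 <= first <= last < len(channel_power):
            raise ValueError(f"invalid channel group ({first}, {last})")
        n_frames = len(channel_power[first])
        combined.append([
            sum(channel_power[ch][t] for ch in range(first, last + 1))
            for t in range(n_frames)
        ])
    return combined


if __name__ == "__main__":
    power = [[1, 1], [2, 2], [3, 3], [4, 4]]
    print(combine_channels(power, [(0, 1), (2, 3)]))  # [[3, 3], [7, 7]]
```

A 4-channel condition would be expressed as four (first, last) pairs covering channels 0 to 19; which pairs the study used is not stated here, so any specific split remains an assumption.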

A consistent clustering of power fluctuations in British English, French, German, and Japanese.

Critical-band-filter analyses of speech sentences: Common factors across Japanese, British English, French, and German.

A critical-band-filtered analysis of Japanese speech sentences.

Factor analyses of critical-band-filtered speech of British English and Japanese.

Principal component analyses of critical-band-filtered speech.

Critical-band-filter analysis of speech sentences: A case of British English.

English /r/ and /l/ identification by native and non-native listeners in noise: applying screening text, signal-to-noise ratio variation, and training.

Membership in Academic Societies

The Society for Bioacoustics

The Japanese Psychonomic Society

The International Society for Psychophysics

The Acoustical Society of America

The Japanese Psychological Association

The Acoustical Society of Japan

The Japanese Society for Music Perception and Cognition

Awards

Twenty-five-year award, the Acoustical Society of America, 5 June 2013.

Educational

Educational Activities

He is in charge of Perceptual Psychology, Psychometrics, Auditory Physiology, Auditory Psychology, Auditory Cognition, Acoustic Experiments I and II, and Human Science A. In addition, the Consortium of Auditory Research Laboratories in O'hashi, which consists of the Nakajima, Remijn, and Ueda Laboratories, has held regular Joint Seminars since 2001. All members of these laboratories attend the Joint Seminar. Special Lectures on specific topics are occasionally included in the Seminar, and the professors of all three laboratories give lectures.

Social

Professional and Outreach Activities

He has held management roles for publications and research meetings of the Acoustical Society of Japan and the Japanese Society for Music Perception and Cognition. In 2017, he helped organize Fechner Day 2017, the 33rd Annual Meeting of the International Society for Psychophysics, held in Fukuoka.