
Modern hearing aids use a variety of advanced digital signal processing methods to improve speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well as psychoacoustics. We present a novel observation based on the fact that acoustic prominence is not equal to information prominence for time intervals at the syllabic and sub-syllabic levels. The idea is that speech elements with a high degree of information can be robustly identified from basic acoustic properties. We evaluated the correlation of (information-rich) content words in the DanPASS corpus with fundamental frequency (F0) and spectral tilt across four frequency bands. Our results show a correlation between certain band-level differences and the presence of content words. Similarly, but to a lesser extent, a correlation between F0 and the presence of content words was found. The principle described here has the potential to improve the “information-to-noise” ratio in hearing aids. In addition, this concept may also be applicable in automatic speech recognition systems.
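A simple way to picture the band-level measure is spectral tilt computed as the difference between low- and high-band energy. The sketch below is illustrative only: the band edges, the FFT-based level estimate, and the synthetic test tones are assumptions for demonstration, not the four bands or the method used in the study.

```python
import numpy as np

def band_level_db(signal, fs, band):
    """Mean power level (dB) of `signal` inside a frequency band (Hz)."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return 10 * np.log10(spec[mask].mean() + 1e-12)

def spectral_tilt_db(signal, fs, low=(100, 1000), high=(1000, 4000)):
    """Spectral tilt proxy: low-band level minus high-band level (dB).
    Positive values mean the energy sits mostly at low frequencies."""
    return band_level_db(signal, fs, low) - band_level_db(signal, fs, high)

# Illustration with synthetic tones (not speech):
fs = 16000
t = np.arange(fs) / fs
low_heavy = np.sin(2 * np.pi * 200 * t)    # energy in the low band
high_heavy = np.sin(2 * np.pi * 3000 * t)  # energy in the high band
```

On real speech, such a tilt value would be computed per word or syllable and then compared across word classes, e.g. content versus function words.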

Today's synthetic voices are largely based on diphone synthesis (DiSyn) and unit selection synthesis (UnitSyn). In most
DiSyn systems, prosodic envelopes are generated with formal models while UnitSyn systems refer to extensive, highly
indexed sound databases. Each approach has its drawbacks, such as low naturalness (DiSyn) and dependence on huge
amounts of background data (UnitSyn). We present a hybrid model based on high-level speech data. As preliminary
tests show, prosodic models combining DiSyn style at the phone level with UnitSyn style at the supra-segmental levels
may approach UnitSyn quality on a DiSyn footprint. Our test data are Danish, but our algorithm is language neutral.

Nonsense syllable speech materials are often used when investigating speech perception in quiet and
under adverse conditions. The main advantage of using nonsense syllables over words and sentences
is that the acoustic as well as the linguistic context is minimal. This paper presents three anechoic
recordings of 13 male and 13 female native talkers of Danish each speaking 65 nonsense syllables
repeated three times with the neutral intonation contour for Danish (in total 15210 syllables). The
authors compared and ranked groups of three recordings that had the same talker and identical
phonetic content. The syllables were ranked according to their general “appropriateness”
and consistency, i.e., prototypical production of the consonant-vowel (CV) with respect to
applicability in speech perceptual studies. The results were compared to results of an automatic
method based on acoustic measures. The two novel ideas are 1) to devise an automated method for
evaluating “appropriateness” of CVs and 2) to develop a Danish CV-material annotated with an objective
measure of “appropriateness” for each recorded CV. The latter would potentially render more
CVs appropriate for perceptual studies. Moreover, objective evaluation would make it possible to
examine any perceptual effects of variability in CV production (for example, how susceptible different
renderings of CVs by the same talker are to background noise). To the knowledge of the authors, no
such material has yet been published for any language.
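One way to sketch the automatic ranking idea: for each talker and CV, score every repetition by how close its acoustic measures lie to that talker's mean production, so the most prototypical token ranks first. The feature values and the z-scored Euclidean distance below are hypothetical stand-ins, not the acoustic measures or scoring actually used by the authors.

```python
import numpy as np

def rank_by_prototypicality(features):
    """Rank repeated productions of one CV by one talker.

    features: (n_tokens, n_features) array of per-token acoustic
    measures (e.g. duration, mean F0 -- hypothetical choices here).
    Returns token indices ordered most- to least-prototypical, i.e. by
    Euclidean distance from the talker's mean in z-scored feature space.
    """
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-9)
    distances = np.linalg.norm(z, axis=1)
    return np.argsort(distances)

# Three repetitions, two measures; the third token is an outlier.
tokens = np.array([[0.20, 120.0],
                   [0.21, 122.0],
                   [0.35, 160.0]])
order = rank_by_prototypicality(tokens)
```

Under this scheme the outlying third token is ranked last, mirroring how inconsistent productions would receive a low “appropriateness” score.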

In this paper, we present the newly established Danish speech corpus PiTu. The corpus consists of recordings of 28 native
Danish talkers (14 female and 14 male) each reproducing (i) a series of nonsense syllables, and (ii) a set of authentic
natural language sentences. The speech corpus is tailored for investigating the relationship between early stages of the
speech perceptual process and later stages. We present our considerations involved in preparing the experimental set-up,
producing the anechoic recordings, compiling the data, and exploring the materials in linguistic research. We report on
a small pilot experiment demonstrating how PiTu and similar speech corpora can be used in studies of prosody as a
function of semantic content. The experiment addresses the issue of whether the governing principles of Danish prosody
assignment are mainly talker-specific or mainly content-typical (under the specific experimental conditions). The corpus is
available at http://amtoolbox.sourceforge.net/pitu/.

We present the speech corpus SMALLWorlds (Spoken Multi-lingual Accounts of Logically Limited Worlds), newly established and still
growing. SMALLWorlds contains monologic descriptions of scenes or worlds which are simple enough to be formally describable. The
descriptions are instances of content-controlled monologue: semantically “pre-specified” but still bearing most hallmarks of spontaneous
speech (hesitations and filled pauses, relaxed syntax, repetitions, self-corrections, incomplete constituents, irrelevant or redundant
information, etc.) as well as idiosyncratic speaker traits. In the paper, we discuss the pros and cons of data so elicited. Following that,
we present a typical SMALLWorlds task: the description of a simple drawing with differently coloured circles, squares, and triangles,
with no hints given as to which description strategy or language style to use. We conclude with an example of how SMALLWorlds may
be used: unsupervised lexical learning from phonetic transcription. At the time of writing, SMALLWorlds consists of more than 250
recordings in a wide range of typologically diverse languages from many parts of the world, some unwritten and endangered.

Digital hearing aids use a variety of advanced digital signal processing methods in order to improve
speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well
as psychoacoustics. This paper investigates the recent observation that speech elements with a high
degree of information can be robustly identified based on basic acoustic properties, i.e., function words
have greater spectral tilt than content words for each of the 18 Danish talkers investigated. In this paper
we examine these spectral tilt differences as a function of time, based on speech material six times the
duration of that used in previous investigations. Our results show that the correlation of spectral tilt with information
content is relatively constant over time, even when averaged across talkers. This indicates that it is possible
to devise a robust method for estimating information density in the speech signal based on
computationally simple short-term band-level differences. The principle described here has the potential
to improve speech transduction in hearing aids and cochlear implants. In addition, the concept of
information-based speech transduction may also be applicable in automatic speech recognition systems.
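As a rough illustration of “computationally simple short-term band-level differences”, the sketch below computes a frame-wise low-minus-high band level in dB over a signal. Frame size, hop, and band edges are illustrative assumptions, not parameters from the paper, and the synthetic two-tone signal merely stands in for speech.

```python
import numpy as np

def tilt_track(signal, fs, frame_ms=25, hop_ms=10,
               low=(100, 1000), high=(1000, 4000)):
    """Short-term spectral tilt: per-frame low-band minus high-band
    level (dB), a cheap running estimate of where the energy sits."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    low_mask = (freqs >= low[0]) & (freqs < low[1])
    high_mask = (freqs >= high[0]) & (freqs < high[1])
    track = []
    for start in range(0, len(signal) - frame + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame] * window)) ** 2
        low_db = 10 * np.log10(spec[low_mask].mean() + 1e-12)
        high_db = 10 * np.log10(spec[high_mask].mean() + 1e-12)
        track.append(low_db - high_db)
    return np.array(track)

# Synthetic check: low-frequency tone followed by a high-frequency tone.
fs = 16000
t = np.arange(fs) / fs
signal = np.concatenate([np.sin(2 * np.pi * 200 * t),
                         np.sin(2 * np.pi * 3000 * t)])
track = tilt_track(signal, fs)
```

Because each frame needs only two band-level estimates, such a track could in principle run in real time on a hearing-aid processor, which is the appeal of the approach described above.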