Makuhari, Chiba, Japan
September 26-30, 2010

Acoustic Correlates of Meaning Structure in Conversational Speech

(1) Università di Trento, Italy
(2) FBK-irst, Italy

We are interested in the problem of extracting meaning structures from spoken utterances
in human communication. In spoken language understanding (SLU) systems, parsing of meaning
structures is carried out over the word hypotheses generated by the automatic speech
recognizer (ASR). This approach suffers from high word error rates and
ad-hoc conceptual representations. In contrast, in this paper we aim at discovering
meaning components from direct measurements of acoustic and non-verbal linguistic
features. The meaning structures are taken from the frame semantics model proposed in
FrameNet. We give a quantitative analysis of meaning structures in terms of speech
features across human-human dialogs from the manually annotated LUNA corpus. We show that
the correlations between acoustic features (pitch, formant trajectories, intensity and
harmonicity) and meaning features are statistically significant over the whole corpus and
are relevant for classifying the target words evoked by a semantic frame.