Makuhari, Chiba, Japan
September 26-30, 2010

Incremental Acoustic Valence Recognition: An Inter-Corpus Perspective on Features, Matching, and Performance in a Gating Paradigm

Björn Schuller, Laurence Devillers

LIMSI, France

It is not fully known how long it takes a human to reliably recognize emotion in speech
from the beginning of a phrase. However, many technical applications demand very quick
system responses, e.g., to prepare different feedback alternatives before the end of a
speaker turn in a dialog system. We therefore investigate incremental recognition in a
‘gating paradigm’, employing two spoken language resources in cross-corpus and combined
settings with a focus on valence. We determine how quickly a reliable estimate can be
obtained, and whether matched conditions prevail, i.e., whether models trained on the same
length of speech as the test segments perform better. In addition, we analyze how individual
feature groups, by feature type and by derived functional, respond, and find considerably
different behavior. The language resources were chosen to cover both manually and
automatically segmented speech. The results show that one second of speech is sufficient
on the datasets considered.
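In a gating paradigm, each utterance is cut at a series of increasing durations measured from its onset, and features are extracted from each prefix ("gate"). Below is a minimal Python sketch of that idea. It assumes a frame-level low-level descriptor (LLD) contour with a 10 ms frame shift and uses an illustrative set of five functionals: mean, standard deviation, minimum, maximum and range. The names `gate_prefix`, `functionals` and `gated_features` are hypothetical. The sketch does not reproduce the authors' feature set or toolchain.

```python
import statistics
from typing import Dict, List, Sequence

# Assumption: the LLD contour (e.g., frame energy or pitch) has a 10 ms frame shift.
FRAME_SHIFT_S = 0.01


def gate_prefix(contour: Sequence[float], gate_s: float) -> List[float]:
    """Return the frames from utterance onset up to gate_s seconds (one 'gate')."""
    n_frames = int(round(gate_s / FRAME_SHIFT_S))
    # Slicing past the end keeps all frames, so short utterances use their full length.
    return list(contour[:n_frames])


def functionals(frames: Sequence[float]) -> Dict[str, float]:
    """Map a variable-length contour to a fixed-length set of statistics (illustrative set)."""
    if not frames:
        raise ValueError("gate contains no frames")
    return {
        "mean": statistics.fmean(frames),
        "stddev": statistics.pstdev(frames),
        "min": min(frames),
        "max": max(frames),
        "range": max(frames) - min(frames),
    }


def gated_features(contour: Sequence[float],
                   gates_s: Sequence[float]) -> Dict[float, Dict[str, float]]:
    """Compute one functional vector per gate length for a single utterance."""
    return {g: functionals(gate_prefix(contour, g)) for g in gates_s}


if __name__ == "__main__":
    # Synthetic 2.5 s contour (250 frames), used only to show the data flow.
    contour = [float(i) for i in range(250)]
    for gate, feats in gated_features(contour, [0.5, 1.0, 2.0]).items():
        print(gate, feats)
```

Under these assumptions, a matched setting would train a classifier for, say, the 1 s gate on the `1.0` entries of the training utterances and test it on the `1.0` entries of the test utterances. A mismatched setting would instead apply a model trained on full utterances to these shorter prefixes.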