
MULTIMODAL ANALYSIS OF EXPRESSIVE HUMAN
COMMUNICATION:
SPEECH AND GESTURE INTERPLAY
by
Carlos Busso
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2008
Copyright 2008 Carlos Busso

The verbal and non-verbal channels of human communication are internally and intricately connected. As a result, gestures and speech exhibit high levels of correlation and coordination. This relationship is strongly affected by the linguistic and emotional content of the message being communicated. The interplay is observed across the different communication channels, such as various aspects of speech, facial expressions, and movements of the hands, head and body. For example, facial expressions and speech prosody tend to show stronger emotional modulation when the vocal tract is physically constrained by articulation to convey other linguistic communicative goals. Building on this analysis, applications in the recognition and synthesis of expressive communication are presented.

From an emotion recognition perspective, we propose to build acoustically neutral models, which are used to measure the degree of similarity between the input speech and neutral speech. This fitness measure is then used as a feature for classification, achieving better performance than conventional classification schemes in terms of accuracy and robustness. In addition to detecting users' emotions, we analyze how to use these ideas for meta-analysis of user behavior, such as automatically monitoring and tracking the behaviors, strategies and engagement of the participants in multiperson interactions. We describe a case study of an intelligent meeting environment equipped with audio-visual sensors. We accurately estimate in real time not only the flow of the interaction, but also how dominant and engaged each participant was during the discussion.

Finally, we show examples of how to synthesize expressive behavior by exploiting the interrelation between speech and gestures. We propose to synthesize natural head motion sequences from acoustic prosodic features by sampling from trained Hidden Markov Models (HMMs). Our comparison experiments show that the synthesized head motions are perceived to be as natural as the captured head motion sequences.
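The neutral-model idea can be illustrated with a minimal sketch: fit a reference model to acoustic features of neutral speech, then score incoming frames by their log-likelihood under that model, using the score as a classification feature. This toy version uses a single Gaussian and synthetic data; the function names and feature dimensions are hypothetical, and the dissertation's actual acoustic models are more elaborate.

```python
import numpy as np

def fit_neutral_model(neutral_features):
    """Fit a single Gaussian (mean, covariance) to neutral-speech features."""
    mu = neutral_features.mean(axis=0)
    # Small diagonal term keeps the covariance invertible.
    cov = np.cov(neutral_features, rowvar=False) + 1e-6 * np.eye(neutral_features.shape[1])
    return mu, cov

def fitness(x, mu, cov):
    """Log-likelihood of one feature vector under the neutral model."""
    d = x - mu
    inv = np.linalg.inv(cov)
    logdet = np.linalg.slogdet(cov)[1]
    return -0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi))

# Synthetic illustration: frames resembling neutral speech score higher
# than frames that deviate strongly from the neutral model.
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=(500, 4))   # hypothetical neutral corpus
mu, cov = fit_neutral_model(neutral)

neutral_like = fitness(np.zeros(4), mu, cov)     # near the neutral mean
emotional_like = fitness(np.full(4, 4.0), mu, cov)  # far from neutral
assert neutral_like > emotional_like
```

The scalar fitness score (or a vector of per-model scores) would then feed a downstream classifier, rather than classifying the raw acoustic features directly.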
