Multimodal Recognition of Socio-emotional Signals

Facial expressions, head gestures, and prosody are important non-verbal cues for intelligent systems. Interpreting these cues enables such a system to gain information about the user's mental state and the quality of an interaction.

Research Questions

As a basis for interpreting a human face, we use facial point extraction methods. By tracking these points over time, we can compute features relevant for classifying the user's internal state [1] or facial communicative signals [2]. In contrast to many other groups, we do not aim to recognise the seven basic emotions of Ekman but concentrate on socio-emotional signals such as smiling, agreement, or confusion.
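The tracking-to-features step can be sketched as follows. The concrete feature set here (per-point speed statistics and motion range, computed with NumPy) is purely illustrative and is not the feature set used in [1] or [2]:

```python
import numpy as np

def motion_features(trajectory):
    """Simple motion statistics from a sequence of tracked facial
    points, shaped (T frames, N points, 2 coordinates).

    Illustrative features only: mean and standard deviation of the
    per-frame point speed, plus the range of motion per coordinate.
    """
    # frame-to-frame displacement of every point: (T-1, N, 2)
    deltas = np.diff(trajectory, axis=0)
    # Euclidean speed of each point in each frame: (T-1, N)
    speeds = np.linalg.norm(deltas, axis=2)
    return np.concatenate([
        speeds.mean(axis=0),                    # average speed per point
        speeds.std(axis=0),                     # motion variability per point
        np.ptp(trajectory, axis=0).reshape(-1), # motion range per coordinate
    ])

# Example: 30 frames of 5 tracked points following a random walk
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(0, 0.5, size=(30, 5, 2)), axis=0)
feats = motion_features(traj)
print(feats.shape)  # (5 + 5 + 10,) = (20,)
```

A fixed-length vector like this can then be fed to any standard classifier (e.g. an SVM) to predict the signal class for the whole sequence.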

In addition to the visual cues, we analyse prosody. We extract features such as pitch, energy, and MFCCs from the speech signal to detect socio-emotional signals and to recognise user states such as hesitation and uncertainty. Further cues to the user state can be gained from filled-pause analysis. We combine these prosodic features with the visual cues to obtain better classification results for the user's mental state.
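The prosodic analysis and the audio-visual combination can be sketched as below. This is a minimal feature-level fusion example under assumed parameter values (frame length, hop size, F0 search range); MFCC extraction is omitted for brevity, and the crude autocorrelation pitch tracker stands in for whatever pitch extractor the actual system uses:

```python
import numpy as np

def prosodic_features(signal, sr=16000, frame_len=400, hop=160):
    """Framewise short-time energy and a crude autocorrelation pitch
    estimate. Parameter values are illustrative, not from the system."""
    feats = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # autocorrelation pitch: strongest lag within a 60-400 Hz range
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sr // 400, sr // 60
        lag = lo + int(np.argmax(ac[lo:hi]))
        pitch = sr / lag if ac[lag] > 0 else 0.0  # 0.0 marks "unvoiced"
        feats.append((energy, pitch))
    return np.array(feats)

def fuse(audio_feats, visual_feats):
    """Feature-level fusion: summarise each stream over time and
    concatenate into one vector for a downstream classifier."""
    return np.concatenate([
        audio_feats.mean(axis=0), audio_feats.std(axis=0),
        visual_feats.mean(axis=0), visual_feats.std(axis=0),
    ])

# Example: a 200 Hz tone as stand-in speech, random stand-in visual features
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 200 * t)
af = prosodic_features(audio, sr)
vf = np.random.default_rng(1).normal(size=(30, 4))  # e.g. facial features
fused = fuse(af, vf)
print(fused.shape)  # (2 + 2 + 4 + 4,) = (12,)
```

The alternative to this early (feature-level) fusion is late fusion, where separate audio and visual classifiers are trained and only their decisions or scores are combined.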