Emotion (WP2)

Overview

One of the pillars of human-human communication is the ability to perceive, understand and respond to social interactions, which are usually conveyed through affective expression[1]. Applying emotion expression recognition in robots can therefore drastically change how we interact with them[2]. A robot capable of understanding emotion expressions can improve its own problem-solving ability by using these expressions in its decision-making process, much as humans do[3]. In particular, the use of emotional information in interactions between robots and older adults can substantially improve the interaction[4].

Research has indicated major differences in emotional reactions and regulation in older adults[5]. The projects in WP2 will therefore collaborate closely to investigate emotions in the aging society and to explore synergies between different modalities. The final goal is integrated emotion recognition that takes into account age-related variations of Interaction Quality. Such variations may be caused by physical or cognitive fatigue, as well as by large differences between individuals. A common possible application for the work in WP2 is social companion robots such as the PARO therapeutic robot, made available via our partner ADELE. PARO has been found to reduce patient stress and to stimulate interaction between older adults and caregivers, and is used in nursing homes. Equipping such robots with a multi-modal system for emotion recognition has the potential to improve interaction.

Tasks and Deliverables

T2.1 Development of a deep neural architecture incorporating unsupervised approaches to learn and recognise visual emotional states (ESR1)
T2.2 Auditory and language features for emotion recognition and development of deep neural architecture for classification (ESR2)
T2.3 Development of models for emotion recognition and generation based on gait (ESR3)
T2.4 Investigation of how Interaction Quality depends on fusing modalities for emotion recognition (ESR1, ESR2, ESR3)

Involved ESRs

ESR1 (HAM) Learning face and upper-body emotion recognition

ESR1 will focus on face and upper-body emotion recognition, since psychological studies[6] have shown that, in non-verbal communication, facial expressions and body motion complement each other, are perceived differently when shown individually, and together lead to more robust recognition of emotional states. Building on our previous work[7], where emotions were recognised from visual features, ESR1 will develop an architecture that combines unsupervised learning techniques with the dimensional model of emotions[8]. In this model, emotional states are represented by values in two universal dimensions: pleasure-displeasure (valence) and activation-deactivation (arousal). This continuous representation is well suited to the fact that even the same person can express the same emotion in different ways. Using this model, the aim is to develop a deep neural system capable of learning facial and upper-body features, creating a new intensity grid of emotion expressions, and clustering them with self-organising layers to distinguish different and so far unknown emotional states. This is especially needed to compensate for varying Interaction Quality when interacting with older adults, whose emotional reactions show a different variance, e.g. less distinct facial expressions or more aroused reactions. Several categories of complementary datasets will be used: the Cohn-Kanade dataset[9] contains acted emotion expressions, and CAM3D[10] contains spontaneous emotion expressions. The network will also be trained with data from the Emotions in the wild dataset[11]. During a secondment to ESR3@BGU, general visual features will be investigated. During an industrial secondment to FHG, the model will be integrated on the Care-O-bot platform and evaluated (this is a tentative plan that may be adjusted to best fit actual research).
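The clustering idea behind the self-organising layers can be illustrated with a minimal sketch: a tiny self-organising map that groups points in the valence-arousal plane. The grid size, learning schedule, and sample values below are illustrative assumptions only; the planned architecture would operate on learned deep facial and upper-body features rather than raw (valence, arousal) pairs.

```python
import math
import random

def train_som(samples, grid_w=3, grid_h=3, epochs=50, lr=0.5, seed=0):
    """Train a tiny self-organising map on 2-D (valence, arousal) points.

    Returns a grid of weight vectors; after training, nearby grid nodes
    represent similar regions of the valence-arousal plane, so clusters
    of expressions map to distinct nodes (illustrative sketch only).
    """
    rng = random.Random(seed)
    # initialise node weights randomly in [-1, 1]^2 (valence, arousal)
    nodes = [[[rng.uniform(-1, 1), rng.uniform(-1, 1)]
              for _ in range(grid_w)] for _ in range(grid_h)]
    radius0 = max(grid_w, grid_h) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        radius = radius0 * (1 - frac) + 0.5   # shrinking neighbourhood
        alpha = lr * (1 - frac) + 0.01        # decaying learning rate
        for v, a in samples:
            # find the best-matching unit for this sample
            bi, bj = min(((i, j) for i in range(grid_h) for j in range(grid_w)),
                         key=lambda ij: (nodes[ij[0]][ij[1]][0] - v) ** 2 +
                                        (nodes[ij[0]][ij[1]][1] - a) ** 2)
            # pull the winner and its grid neighbours toward the sample
            for i in range(grid_h):
                for j in range(grid_w):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-d2 / (2 * radius ** 2))
                    nodes[i][j][0] += alpha * h * (v - nodes[i][j][0])
                    nodes[i][j][1] += alpha * h * (a - nodes[i][j][1])
    return nodes

def bmu(nodes, point):
    """Grid index of the node closest to a (valence, arousal) point."""
    v, a = point
    return min(((i, j) for i in range(len(nodes)) for j in range(len(nodes[0]))),
               key=lambda ij: (nodes[ij[0]][ij[1]][0] - v) ** 2 +
                              (nodes[ij[0]][ij[1]][1] - a) ** 2)

# illustrative samples: two loose clusters of expressions
samples = [(0.8, 0.2), (0.7, 0.1), (0.9, 0.3),      # positive valence, low arousal
           (-0.8, 0.9), (-0.7, 0.8), (-0.9, 0.95)]  # negative valence, high arousal
som = train_som(samples)
```

After training, `bmu` assigns new expressions to grid nodes, so the two clusters land on different nodes without any labels being provided, which is the unsupervised behaviour the architecture relies on.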

ESR2 (HAM) Auditory and language-based emotion recognition

Compared to facial expressions, emotional sound and language detection depends more on temporal features. In speech processing, deep learning has emerged as a prominent and successful research direction over the past few years[12],[13]. ESR2 will aim at a neural architecture that learns to incorporate auditory features to detect emotional states[14], from low-level prosodic features to higher-level cues at the word or sentence level (grammar, sentence structure, specific word combinations signalling anger, etc.), exploring recurrent architectures on top of several different feature extractors working at different time scales. By combining several features for more robust detection, the chosen architecture can improve auditory communication quality with older adults who exhibit less distinct auditory features. The architecture will substantially extend previous work on multi-modal feature extraction[15] and will be evaluated on databases for auditory affect recognition[16], e.g. the VAM database[17]. During a secondment to ESR12@UWE, data will be recorded and Wizard-of-Oz experiments will be conducted. Possibilities to apply the results on a social companion robot will be investigated during a secondment to ADELE using their PARO robot (this is a tentative plan that may be adjusted to best fit actual research).
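The idea of a recurrent architecture fusing features at different time scales can be sketched as follows. The Elman-style recurrence, the feature choices (per-frame pitch and energy as the fast prosodic stream, plus a slower word-level cue such as an anger-word flag), and the random untrained weights are illustrative assumptions, not the final architecture.

```python
import math
import random

def elman_step(h, x, W_h, W_x, b):
    """One step of a simple Elman recurrence: h' = tanh(W_h h + W_x x + b)."""
    n = len(h)
    return [math.tanh(sum(W_h[i][j] * h[j] for j in range(n)) +
                      sum(W_x[i][k] * x[k] for k in range(len(x))) + b[i])
            for i in range(n)]

def run_recurrent_fusion(prosody_frames, word_cues, hidden=4, seed=0):
    """Run an (untrained) recurrence over per-frame prosodic features,
    injecting slower word-level cues at the frames where they appear.

    prosody_frames: list of [pitch, energy] per frame (fast time scale)
    word_cues: dict mapping frame index -> [cue] (slow, word-level stream)
    Returns the final hidden state, a fused utterance-level representation.
    """
    rng = random.Random(seed)
    in_dim = 3  # pitch, energy, word-level cue
    W_h = [[rng.uniform(-0.3, 0.3) for _ in range(hidden)] for _ in range(hidden)]
    W_x = [[rng.uniform(-0.3, 0.3) for _ in range(in_dim)] for _ in range(hidden)]
    b = [0.0] * hidden
    h = [0.0] * hidden
    for t, frame in enumerate(prosody_frames):
        cue = word_cues.get(t, [0.0])  # slow stream defaults to "no cue yet"
        h = elman_step(h, frame + cue, W_h, W_x, b)
    return h

# illustrative input: rising pitch/energy with an anger-word cue at frame 2
frames = [[0.2, 0.1], [0.5, 0.4], [0.9, 0.8]]
state = run_recurrent_fusion(frames, word_cues={2: [1.0]})
```

In a trained system, the final hidden state would feed a classifier over emotional states; the point of the sketch is only how two streams operating at different time scales can be merged inside a single recurrence.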

ESR3 (BGU) Emotion recognition and expression based on Human Motion

ESR3 will examine the effect of people’s emotional experiences on their gait. Recent research has investigated the association between emotional state and human body motion[18], but quantitative assessment of the effect of emotions on body motion, i.e. gait, is still lacking. The complexity and versatility of the musculoskeletal motor system, and its intricate connections with the emotional system, require multidisciplinary efforts, from the performing arts to neuroscience and human-computer interaction. Building on our earlier work[19],[20], the methodology will combine practices from psychology research and biomechanics. Emotional states will be manipulated in a within-subjects design with four conditions: happiness, relaxation, sadness, and fear. The specific emotions to be analysed will be based on focus groups in elderly homes, in collaboration with ESR15@BGU. Since human mobility varies significantly among older adults, this variation will be taken into account as part of the quality of interaction. The results will be used to develop models and algorithms for human-robot interaction such that the robot can both express emotions through body motions and understand the human’s emotional state. Regression and decision-tree classification methods will be developed to relate motion and posture characteristics to different emotions. Since research has indicated that emotion expression and regulation differ with age, comparisons with other populations will be made throughout the research to confirm that the findings are valid for other applications. During a secondment to ESR2@HAM, the focus will be on developing recurrent algorithms for classification and combining them with the work on auditory cues. During a secondment to PAL-R, the algorithms for generating robot motions that express emotions will be implemented and evaluated on their robot platform TIAGo with the support of PAL engineers (this is a tentative plan that may be adjusted to best fit actual research).
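As a minimal sketch of the decision-tree idea, the following depth-1 tree (a decision stump) separates two emotional states from hypothetical gait features. The feature names (cadence, stride length), the numeric values, and the restriction to two of the four conditions are all assumptions for illustration; the actual work would use richer motion descriptors, deeper trees, and regression models.

```python
def fit_stump(X, y):
    """Fit a depth-1 decision tree: pick the (feature, threshold) split
    that minimises misclassifications on the training set, and return
    a classifier function for new feature vectors."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(x[f] for x in X))
        for i in range(len(values) - 1):
            thr = (values[i] + values[i + 1]) / 2
            left = [y[j] for j, x in enumerate(X) if x[f] <= thr]
            right = [y[j] for j, x in enumerate(X) if x[f] > thr]
            l_lab = max(set(left), key=left.count)   # majority label per side
            r_lab = max(set(right), key=right.count)
            err = sum(l != l_lab for l in left) + sum(r != r_lab for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, l_lab, r_lab)
    _, f, thr, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= thr else r_lab

# hypothetical gait features: [cadence (steps/min), stride length (m)]
X = [[118, 1.45], [122, 1.50], [125, 1.55],   # brisker gait, labelled "happiness"
     [96, 1.10], [92, 1.05], [99, 1.15]]      # slower, shorter strides, "sadness"
y = ["happiness"] * 3 + ["sadness"] * 3
classify = fit_stump(X, y)
```

The stump learns a single threshold (here on cadence or stride length) that separates the two illustrative gait profiles; full decision trees simply recurse this split on each side.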