Abstract

A growing number of affective computing studies have developed computer systems that recognize the emotional state of a human user in order to establish affective human-computer interaction. Various measures have been used to estimate emotional states, including self-report, startle response, behavioral response, autonomic measurement, and neurophysiologic measurement. Among them, inferring emotional states from electroencephalography (EEG) has received considerable attention because EEG can reflect emotional states directly, at relatively low cost and with simple equipment. Yet EEG-based emotional state estimation requires well-designed computational methods to extract information from complex and noisy multichannel EEG data. In this paper, we review the computational methods that have been developed to derive EEG indices of emotion, to extract emotion-related features, or to classify EEG signals into one of many emotional states. We also propose using sequential Bayesian inference to estimate the continuous emotional state in real time. We present current challenges for building an EEG-based emotion recognition system and suggest some future directions.

1. Introduction

An emotional state refers to a psychological and physiological state in which emotions and behaviors are interrelated and appraised within a context [1]. From a psychological perspective, the space of the emotional state can be built from the discrete model or the dimensional model. In the discrete model, an emotional state is defined as one of a finite number of discrete states, each corresponding to a core emotion (anger, fear, disgust, surprise, happiness, or sadness) or to a combination of them [2]. The dimensional model defines an emotional state spatially along basic dimensions of emotion, such as valence and arousal, and interprets an emotion through the level of each dimension [3]. These emotion models have been used for systematic and multifaceted analyses of emotion [3]. Based on the emotion models, the neurophysiologic mechanisms underlying the emotional state have been vigorously investigated. Broadly, it has been documented that the emotional processes performed by the ventral and dorsal systems of the human brain are functionally different [4]. The ventral system, including the ventral anterior cingulate gyrus and some ventral areas of the prefrontal cortex (ventromedial prefrontal cortex and medial orbitofrontal cortex), is involved in the production of emotional states and the regulation of affective responses, whereas the dorsal system, including the dorsal anterior cingulate gyrus, some dorsal areas of the prefrontal cortex (dorsolateral, posterior dorsolateral, and mid-dorsolateral prefrontal cortex), and the hippocampus, is involved in effortful emotion regulation and subsequent behavior [4, 5].

Recently, affective computing (AC) has emerged as a converging technology that blends emotion into human-computer interaction (HCI) [6]. AC, often called emotion aware computing, builds emotional interactions between a human and a computer by measuring the emotional state through behavioral and physiological signals and by developing computational models of the emotional state [6, 7]. One of the key elements in AC is emotion recognition, which estimates the emotional state of users from their behavioral and physiological responses [7]. Emotion recognition aims to advance the intelligence of computers for creating affective user interfaces and to enhance the quality of psychiatric health care.

A variety of measures have been used for emotion recognition, including self-report, startle response, behavioral response, autonomic measurement, and neurophysiologic measurement [3]. Self-report readily acquires emotional responses according to the emotion modeling framework, but it cannot easily track rapid affective changes and relies on the subjects' own estimation of their emotional state [3, 8]. The startle response magnitude, measured with electromyography (EMG), captures unconscious myoneural responses but assesses only partial aspects of emotion (e.g., arousal level) [3, 9]. Behavioral measurement detects changes in facial and/or whole-body behavior using EMG or video images but requires the assumption that EMG signals correspond directly to a specific emotional state [3, 10]. Autonomic measurement can objectively detect emotion-related physiological responses of the autonomic nervous system (ANS), such as skin conductance responses (SCRs) and heart rate variability (HRV), but accesses only subspaces of the emotional state [3, 11]. Neurophysiologic measurement based on electrophysiological and neuroimaging techniques can detect a wide range of dynamics of the emotional state by directly accessing the fundamental structures in the brain from which an emotional state emerges [3, 12]. Hence, neurophysiologic measurements provide the most direct and comprehensive means for emotion recognition.

A large body of research has investigated neural correlates of emotion in humans using many noninvasive sensor modalities, each with its own characteristics in terms of spatiotemporal resolution and mobility. Functional magnetic resonance imaging (fMRI) has been used to find cortical and subcortical structures implicated in emotional states [13]. Magnetoencephalography (MEG) has also been used to find emotion-related neural signals from specific sources with fine spatial and temporal resolution [14]. However, the cost and immobility of fMRI and MEG prevent these modalities from being used in practical emotion recognition systems [15, 16]. EEG, although it suffers from poor spatial resolution and high susceptibility to noise, has been widely used to investigate the brain dynamics related to emotion because it enables the detection of immediate responses to emotional stimuli with excellent temporal resolution [17–21]. As EEG systems become more cost-effective and mobile, with increased practicability and fewer physical restrictions [22], EEG, though not without its downsides, carries critical advantages in practical usage and has therefore been a primary option for the development of online emotion recognition systems. In fact, there have been a growing number of efforts to recognize a person's emotion in real time using EEG. For example, EmoRate, developed as a commercial product (Emotiv Corp., CA, USA), detects the flow of the emotional state while a user is watching a film [23]. Brown et al. proposed an EEG-based affective computer system that measures the state of valence and transmits it via a wireless link [24].

The development of an EEG-based emotion recognition system requires computational models that describe how the emotional state is represented in EEG signals and how one can estimate an emotional state from EEG signals. Despite a long history of searching for EEG indices of emotion, less attention has been paid to the computational models for emotional state estimation. Hence, we see a need for a review of the state-of-the-art computational models for emotional state estimation to support the development of advanced emotion recognition methods. This paper reviews the current computational methods of emotional state estimation from the human EEG and discusses challenges and some future directions.

This paper will particularly focus on the following aspects of EEG-based emotional state estimation models. First, it will start with a quick review on EEG correlates of emotion, including definition of the emotional state space, the design of emotional stimuli, and the EEG indices of emotion. Then, it will revisit the computational methods to extract EEG features relative to emotional states and to estimate emotional states from EEG. We will also propose a mathematical approach to the estimation of continuous emotional state based on Bayesian inference.

2. EEG Correlates of Emotion

Finding EEG correlates of emotional states should begin with a definition of the emotional state space. The emotional state space can be largely categorized into a discrete space and a continuous space. The discrete state space draws upon the discrete emotion model and contains a set of discrete experiential emotional states. The discrete emotional state comprises seven to ten core emotions such as happiness, surprise, sadness, anger, disgust, contempt, and fear [2, 25] and is sometimes expanded to contain a large number of emotions that are synonyms of these core emotions [25]. The continuous state space is built from the dimensional emotion model and represents an emotional state as a vector in a multidimensional space. This vector space of the continuous emotional state depends on the definition of a basis. For instance, the circumplex model, developed by Russell, describes an emotional state in a two-dimensional circular space with the arousal and valence dimensions [26]. Various psychological models define emotional dimensions that subsequently constitute the basis for the emotional state space [25, 27–30].

Based on the construction of the emotional state space, the investigation of EEG correlates of emotion should also address how to choose the experimental stimuli that induce emotions. Typically, emotional stimuli are selected to cover the desired arousal levels and valence states and are presented in different modalities, including visual, auditory, tactile, and olfactory stimulation. The ground truth of the emotional state induced by a stimulus is secured by using the self-ratings of subjects or by using standard stimulus sets such as the international affective picture system (IAPS) or the international affective digitized sound system (IADS). The IAPS provides a set of normative pictures for emotional stimuli to induce emotional changes and attention levels [31]. The IADS embodies acoustic stimuli to induce emotions, sometimes together with the IAPS [32]. These international affective systems are known to be independent of culture, sex, and age [33].

A number of neuropsychological studies have reported EEG correlates of emotion. These EEG features can be broadly placed in one of two domains: the time domain and the frequency domain. In the time domain, several components of event-related potentials (ERPs) reflect underlying emotional states [34]. These ERP components can be listed in chronological order: the P1 and N1 components generated at a short latency from stimulus onset, N2 and P2 at a middle latency, and P3 and the slow cortical potential (SCP) at a long latency. The ERP components of short to middle latencies have been shown to correlate with valence [34–37], whereas the ERP components of middle to long latencies have been shown to correlate with arousal [38–41]. The computation of ERPs basically requires averaging EEG signals over multiple trials, which makes ERP features ill-suited to online computing. However, recent developments in single-trial ERP computation methods raise the possibility of using ERP features for online emotional state estimation [42–46].

In the frequency domain, the spectral power in various frequency bands has been implicated in the emotional state. The alpha power varies with the valence state [47] or with discrete emotions such as happiness, sadness, and fear [18]. Specifically, the frontal asymmetry of the alpha power has been repeatedly reported as a steady correlate of valence [48]. Subsequent studies have suggested that the frontal alpha asymmetry may reflect the approach/avoidance aspects of emotion rather than valence per se [49]. The event-related synchronization (ERS) and desynchronization (ERD) of the gamma power have been related to some emotions such as happiness and sadness [50–52]. The ERS of the theta power is also modulated during transitions in the emotional state [18, 53–55].

Besides the waveforms and the spectral power, the interactive properties between a pair of EEG oscillations, such as phase synchronization and coherence, have also been implicated in emotional processes. For instance, the phase synchronization level between the frontal and right temporoparietal areas varied with the emotional states of energetic, tension, and hedonic arousal [56]. The EEG coherence between prefrontal and posterior beta oscillations was increased by viewing high arousal images [57]. Also, increases in the gamma phase synchronization index were induced by unpleasant visual stimuli [58]. As the emotional process engages a large-scale network of neural structures in the brain, these multichannel analyses of EEG across the brain are likely to reveal more signatures of emotion, as they have for other cognitive functions [59–64]. A brief summary of the EEG correlates of emotion is presented in Table 1.

Table 1: EEG correlates of emotion.

3. Computational Methods to Estimate Emotional States

The computational methods to estimate the emotional state have been designed based on various EEG features related to emotional processes. Like most EEG analysis methods, emotional state estimation methods are preceded by preprocessing to reduce artifacts. Figure 1 illustrates the overall processing steps to estimate the emotional state from EEG signals. The EEG signals recorded in response to affective stimuli pass through the preprocessing step, in which noise reduction algorithms and spatiotemporal filtering methods are employed to enhance the signal-to-noise ratio (SNR). Then, the feature extraction step determines specific band powers, ERPs, and phase coupling indices correlated with the target emotional states. Usually, this feature selection process is optimized by mathematical methods to achieve maximum estimation accuracy. The classification step estimates the most probable emotional state from the selected EEG features. The number of classes depends on the definition of the emotional state space, such as the continuous state of arousal and valence or the discrete states.

Figure 1: Overall emotional state estimation process. EEG signals are recorded during emotional situations and passed through the preprocessing step, which includes noise reduction and spatial and temporal filtering. The features related to the emotional states, such as spectral power, ERP, and phase synchronization, are extracted from the preprocessed EEG signals. These features are used to estimate emotional states by classification methods.

As the preprocessing methods are common to a variety of EEG signal processing applications, here we focus on the feature extraction and emotion classification methods. We first review the computational methods to extract emotion-related features from EEG, followed by the classification algorithms used to estimate the emotional state from the EEG features. The feature extraction methods usually build a computational model to find emotion-related features based on neurophysiologic and neuropsychological knowledge. Unlike the feature extraction methods, the classification methods draw more upon signal processing theories such as machine learning and statistical signal processing. How each of these two steps affects estimation accuracy has been a matter of interest. On one hand, feature extraction seems to be more closely tied to estimation performance, since a correct model cannot plausibly be built without identifying the very features correlated with emotion. On the other hand, the classification algorithms should also be carefully designed to fit the characteristics of the feature space; for instance, using a linear classifier for highly nonlinear feature structures would not make much sense. In general, one should weigh how well a feature space and a classifier match each other to increase estimation accuracy.

3.1. Feature Extraction Methods

As for valence-related features, it has been shown that positive and negative emotions induce asymmetric modulations in the frontal alpha power of EEG, leading to a relative decrease in the left frontal alpha power for positive emotions and a decrease in the right for negative emotions [65]. This frontal alpha asymmetry provides an effective index for valence by computing the difference between the left and right alpha powers, here denoted as $P_L$ and $P_R$, respectively, divided by the sum of both:

$$A = \frac{P_L - P_R}{P_L + P_R}.$$

The computation of the spectral power in the alpha band has been executed by a number of methods, including the squares of the EEG amplitude filtered through an alpha bandpass filter [53], Fourier transform [66], power spectral density [18, 21], and wavelet transform [7, 67, 68]. Most of these methods are well established and can readily be implemented in real time.
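
As an illustration of the index described above, the following is a minimal sketch rather than a reference implementation. It assumes a naive discrete Fourier transform for the alpha band power, an 8–13 Hz alpha band, and synthetic sinusoidal signals; the function names `band_power` and `alpha_asymmetry` are hypothetical and chosen only for this example.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Average DFT power over the bins whose frequency lies in [f_lo, f_hi] Hz."""
    n = len(signal)
    power, count = 0.0, 0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            power += abs(coeff) ** 2 / n ** 2
            count += 1
    return power / count if count else 0.0

def alpha_asymmetry(left, right, fs, band=(8.0, 13.0)):
    """(left - right) / (left + right) alpha power; the band limits are an assumption."""
    p_left = band_power(left, fs, *band)
    p_right = band_power(right, fs, *band)
    return (p_left - p_right) / (p_left + p_right)

# Synthetic 1-second signals sampled at 128 Hz: a 10 Hz rhythm that is
# weaker over the left channel than over the right channel.
fs = 128
times = [i / fs for i in range(fs)]
left = [0.5 * math.sin(2 * math.pi * 10 * t) for t in times]
right = [1.0 * math.sin(2 * math.pi * 10 * t) for t in times]
asym = alpha_asymmetry(left, right, fs)  # negative: less alpha power on the left
```

In practice the band power would more likely come from one of the estimators cited above (bandpass filtering, power spectral density, or wavelet transform); the naive transform is used here only to keep the sketch free of external dependencies.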

As for arousal-related features, one can extract spectral power features such as the frontal midline theta power in the same way as the alpha power. Recently, more advanced computational methods have been proposed to evaluate emotional arousal. For instance, the asymmetry index (AsI) assesses emotion elicitation by computing the multidimensional directed information (MDI) between EEG channels [69].
The AsI is built from four quantities: the total amount of information flowing from left hemisphere signals to right hemisphere signals when the subject has emotional feelings; the total bidirectional information with emotion; the same directional information from the left to the right hemisphere when the subject does not have emotional feelings; and the total bidirectional information without emotion. AsI can effectively indicate whether an emotional state is elicited or not [69]. Besides AsI, the variance of potentials from a specific channel over different EEG channels has been used as an emotion-related feature [68]. Also, the entropy of EEG signals has been used to separate emotion-related information from intrusive noise [68].

As for individual discrete emotions, a typical approach is to search through all the possible EEG channels, spectral bands, and time segments for a set of features that maximizes the accuracy of emotional state estimation. This approach adopts a greedy search method with supervised learning, often resulting in different optimal feature sets for each individual. To overcome this issue of subject-by-subject variability, a higher order crossing (HOC) analysis was developed to implement a user-independent emotion recognition system [70]. The HOC analysis aims to find EEG features with respect to six affective traits, including surprise, disgust, anger, fear, happiness, and sadness [70]. The HOC model is given as:

$$D_k = \sum_{n=2}^{N} \left[ X_n(k) - X_{n-1}(k) \right]^2.$$

$D_k$ is the simplified version of the HOC feature (the simple HOC) of order $k$, which counts the number of zero-crossings in a high-pass filtered, standardized EEG time series. A zero-crossing is an event at which the signal amplitude passes through the zero line with a change of polarity. The zero-crossing counts often represent oscillation properties more robustly than the spectral power. A vector of the simple HOCs is constructed to contain the features related to emotion. A higher value of $k$ decreases the discrimination power of the simple HOC because different processes can yield almost the same $D_k$. $X_n(k)$ indicates a binary time series of zeros and ones: at time instant $n$, $X_n(k) = 0$ if the amplitude of the filtered signal is negative and $X_n(k) = 1$ otherwise. $N$ indicates the length of the time series $X_n(k)$. The EEG feature vector is defined as $[D_1, D_2, \ldots, D_K]$, which consists of multiple simple HOCs [70]. The computational methods to extract emotional features from EEG are summarized in Table 2.
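
The zero-crossing count described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of [70]: it assumes that the high-pass filter of order k is the backward difference applied k-1 times, it only removes the mean (scaling by the standard deviation does not change zero-crossings), and the names `zero_crossings` and `simple_hoc` are hypothetical.

```python
def zero_crossings(series):
    """Count polarity changes via a binary indicator (0 if negative, 1 otherwise)."""
    binary = [0 if value < 0 else 1 for value in series]
    return sum((binary[n] - binary[n - 1]) ** 2 for n in range(1, len(binary)))

def simple_hoc(series, max_order):
    """Return the zero-crossing counts for orders 1..max_order.

    Assumption: order k applies the backward difference k-1 times to the
    zero-mean series before counting zero-crossings.
    """
    mean = sum(series) / len(series)
    current = [value - mean for value in series]
    features = []
    for _ in range(max_order):
        features.append(zero_crossings(current))
        # One more pass of the backward difference acts as a stronger high-pass filter.
        current = [current[n] - current[n - 1] for n in range(1, len(current))]
    return features

# Toy zero-mean series: one slow oscillation with small fast fluctuations.
z = [0.0, 1.0, 2.0, 1.5, 2.5, 1.0, 0.0, -1.0, -2.0, -1.5, -2.5, -1.0]
hoc = simple_hoc(z, 3)
```

For this toy series the first-order count reflects only the slow oscillation, while the higher orders pick up the fast fluctuations, which is why a vector of several orders is used as the feature.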

Table 2: Emotional state estimation model.

3.2. Emotion Classification Methods

The EEG feature vector provides observations from which an emotional state can be inferred. Commonly, a classifier is used to decode the feature vector into one of the possible emotional states. A number of classification methods have been used for emotional state estimation, including discriminant analysis (DA), support vector machine (SVM), k-nearest neighbor (k-NN), and the Mahalanobis distance (MD) based method. DA projects a high-dimensional feature space onto a low-dimensional space so as to maximize the Fisher discriminant ratio, $J(\mathbf{w})$, of between-class scatter, $\mathbf{S}_B$, to within-class scatter, $\mathbf{S}_W$ [42, 71–76]:

$$J(\mathbf{w}) = \frac{\mathbf{w}^{T} \mathbf{S}_B \mathbf{w}}{\mathbf{w}^{T} \mathbf{S}_W \mathbf{w}},$$

where $\mathbf{w}$ is the projection vector. A larger value of $J$ indicates greater separation between classes. The dimensionality of the low-dimensional space varies from one up to the number of classes minus one.
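
To make the ratio concrete, the sketch below evaluates it for a given projection direction by computing the between-class and within-class scatter of the projected samples. It is an illustrative sketch only: the function name `fisher_ratio` and the toy two-class data are assumptions, and no search for the optimal direction is performed.

```python
def fisher_ratio(classes, w):
    """Between-class over within-class scatter of the 1-D projections onto w."""
    projected = [[sum(wi * xi for wi, xi in zip(w, x)) for x in points]
                 for points in classes]
    all_values = [y for ys in projected for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    between, within = 0.0, 0.0
    for ys in projected:
        class_mean = sum(ys) / len(ys)
        between += len(ys) * (class_mean - grand_mean) ** 2
        within += sum((y - class_mean) ** 2 for y in ys)
    return between / within

# Hypothetical 2-D feature vectors for two emotional classes.
positive = [(1.0, 2.0), (2.0, 3.0), (3.0, 2.5)]
negative = [(5.0, 2.0), (6.0, 3.0), (7.0, 2.5)]
j_first = fisher_ratio([positive, negative], (1.0, 0.0))   # project onto feature 1
j_second = fisher_ratio([positive, negative], (0.0, 1.0))  # project onto feature 2
```

Here the classes differ only along the first feature, so projecting onto it gives a large ratio while projecting onto the second feature gives zero.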

SVM, like DA, seeks a boundary that separates the classes, but it determines the decision boundary in a kernel space instead of the original feature space. SVM finds an optimal hyperplane, $f(\mathbf{x})$, and the margin of the decision boundary in the feature space using a supervised learning method. The classifier classifies a new input feature vector $\mathbf{x}$ using a classification rule given by

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i \in SV} \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b \right).$$

Here, $SV$ indicates the set of support vectors that are used to determine the maximum margin hyperplane, and $K$ denotes the kernel function of the SVM classifier. $b$ denotes an offset parameter, $\mathbf{x}_i$ the training input vectors, and $\alpha_i$ the nonzero weights on each support vector [7, 77–80]. Various kernel functions have been proposed, such as the Gaussian function or polynomials. SVM offers the advantage of good generalizability for nonlinear feature spaces.
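
The classification rule can be illustrated once the support vectors, weights, and offset are known. The sketch below covers only the decision step with a Gaussian kernel; it does not train an SVM, and the support vectors, signed weights, offset, and kernel width are hypothetical values chosen for the example.

```python
import math

def gaussian_kernel(u, v, gamma=0.5):
    """Gaussian kernel; the width parameter gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decide(x, support_vectors, weights, offset, kernel=gaussian_kernel):
    """Sign of the weighted kernel sum over the support vectors plus the offset."""
    score = sum(alpha * kernel(sv, x)
                for sv, alpha in zip(support_vectors, weights)) + offset
    return 1 if score >= 0 else -1

# Hypothetical trained parameters: one support vector per class, with the
# class label folded into the sign of each weight.
support_vectors = [(0.0, 0.0), (4.0, 4.0)]
weights = [1.0, -1.0]
offset = 0.0

label_near_first = svm_decide((0.5, 0.2), support_vectors, weights, offset)
label_near_second = svm_decide((3.8, 4.1), support_vectors, weights, offset)
```

A new feature vector is assigned to the class whose support vectors dominate the kernel sum, so points near the first support vector receive +1 and points near the second receive -1.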

The k-NN algorithm determines the class of a new feature vector by a majority vote among the k nearest vectors in the training set [73, 81]. The parameter k sets the number of neighbors considered and thus the size of the neighborhood around the new feature vector. The k-NN algorithm depends on how the distance between feature vectors is defined, and this distance is susceptible to the curse of dimensionality [81, 82].
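
A minimal sketch of the k-NN rule follows, assuming the Euclidean distance and a simple majority vote. The function name `knn_classify`, the labels, and the toy training set are hypothetical.

```python
import math
from collections import Counter

def knn_classify(x, training, k=3):
    """Majority vote among the k training vectors closest to x.

    training is a list of (feature_vector, label) pairs; the Euclidean
    distance is an assumed choice of metric.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D EEG feature vectors labeled by arousal level.
training = [
    ((0.1, 0.2), "low arousal"), ((0.2, 0.1), "low arousal"),
    ((0.3, 0.3), "low arousal"), ((0.9, 0.8), "high arousal"),
    ((0.8, 0.9), "high arousal"), ((1.0, 1.0), "high arousal"),
]
label_a = knn_classify((0.25, 0.2), training)
label_b = knn_classify((0.85, 0.95), training)
```

With many EEG features the distances between vectors become less informative, which is the curse of dimensionality mentioned above; feature selection before the vote is one common way to limit that effect.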

The MD-based method has been widely used in clustering analysis; it accounts not only for the distance between vectors but also for the correlation coefficients and standard deviations of the features [83, 84]:

$$d_M(\mathbf{x}, c) = \sqrt{(\mathbf{x} - \boldsymbol{\mu}_c)^{T} \boldsymbol{\Sigma}_c^{-1} (\mathbf{x} - \boldsymbol{\mu}_c)}.$$

$\boldsymbol{\Sigma}_c^{-1}$ and $\boldsymbol{\mu}_c$ indicate the inverse of the covariance matrix and the mean vector of a class $c$, respectively, and $\mathbf{x}$ is the feature vector. MD reduces to the Euclidean distance when the covariance matrix of the feature vectors is the identity matrix [84]. When a new feature vector arrives, the MD-based classifier computes the MD from the vector to each class and chooses the class with the smallest distance. The classification methods that have been used for emotional state estimation are summarized in Table 2.
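
The minimum-distance rule can be sketched for two-dimensional feature vectors, where the covariance matrix can be inverted in closed form. This is an illustrative sketch under that restriction; the function names `class_stats`, `mahalanobis`, and `md_classify`, the class labels, and the toy data are assumptions.

```python
import math

def class_stats(points):
    """Mean vector and inverse sample covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inverse

def mahalanobis(x, mean, inverse):
    """Square root of the quadratic form (x - mean)^T inverse (x - mean)."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = (dx * (inverse[0][0] * dx + inverse[0][1] * dy)
            + dy * (inverse[1][0] * dx + inverse[1][1] * dy))
    return math.sqrt(quad)

def md_classify(x, class_points):
    """Choose the class whose mean is closest to x in Mahalanobis distance."""
    stats = {label: class_stats(points) for label, points in class_points.items()}
    return min(stats, key=lambda label: mahalanobis(x, *stats[label]))

# Hypothetical training vectors for two emotional classes.
class_points = {
    "calm": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "excited": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
predicted = md_classify((1.5, 1.0), class_points)

# With an identity covariance the distance equals the Euclidean distance.
euclid_case = mahalanobis((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
```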

4. A Generative Model for Online Tracking of Emotional States

As described earlier, most computational models estimating emotional states have focused on the discrete state space and classified EEG features into one of a finite number of emotional states. This approach is generally well suited to a static determination of which emotion is induced by a given stimulus. Yet, for the development of an online emotion recognition system, where continuous tracking of the emotional state may play an important role, the current approach might be suboptimal because it does not take the temporal dynamics of the emotional state into account. Another downside of the current approach originates from its direct modeling framework. A model in this framework builds a direct input-output mapping from the observed EEG signal to the emotional state. Although this framework may provide a reasonable solution when the only purpose is to improve classification accuracy, it exploits neither prior information about the emotional state nor the dynamics of the emotional state. These shortcomings make it difficult to gain useful insights into the neural mechanism of emotion. Also, it is often desirable to incorporate prior information about the dynamics of the emotional state within a model, especially for tracking the emotional state continuously over time.

To address these issues, we propose a computational modeling approach based on the generative modeling framework [85–87]. Our approach focuses on tracking the change of the emotional state over time from the EEG signal. In this approach, a generative model depicts how the EEG signal is generated from a hidden emotional state. Also, a prior model explains how the emotional state changes over time. Integrating these two models, we infer the most likely emotional state from an observed EEG signal. Differences between the generative and direct models can be illustrated in a probabilistic view where the goal is to estimate the conditional probability of emotional state variables given EEG observations as accurately as possible. Suppose that a random vector $\mathbf{x}$ denotes hidden emotional states and a random vector $\mathbf{y}$ denotes observed EEG data (e.g., an EEG feature vector). An estimation model aims to optimize a parameter set, $\theta$, for the following conditional probability:

$$P(\mathbf{x} \mid \mathbf{y}; \theta).$$

A direct model forms a functional relationship from $\mathbf{y}$ to $\mathbf{x}$ with $\theta$, the parameter set of a function $f$,

$$\mathbf{x} = f(\mathbf{y}; \theta) + \boldsymbol{\varepsilon},$$

where $\boldsymbol{\varepsilon}$ is a residual vector. In many cases, the residual vector is assumed to follow the Gaussian distribution. Parameter estimation of $\theta$ can be accomplished by many standard solutions such as maximum likelihood [88]. On the other hand, a generative model uses maximum a posteriori (MAP) estimation or Bayesian inference to estimate the conditional probability,

$$P(\mathbf{x} \mid \mathbf{y}; \theta) = \frac{P(\mathbf{y} \mid \mathbf{x}; \theta)\, P(\mathbf{x})}{C},$$

where $C$ is a normalizing constant given by the integral of $P(\mathbf{y} \mid \mathbf{x}; \theta)\, P(\mathbf{x})$ over $\mathbf{x}$. The posterior $P(\mathbf{x} \mid \mathbf{y}; \theta)$ is estimated by the product of $P(\mathbf{y} \mid \mathbf{x}; \theta)$, the likelihood of observation $\mathbf{y}$ given a state $\mathbf{x}$, and $P(\mathbf{x})$, the prior of the state $\mathbf{x}$. The parameter set $\theta$ is used to model a generative relation from $\mathbf{x}$ to $\mathbf{y}$. In terms of the EEG correlate of the emotional state, the likelihood describes how the observed EEG signal is generated from an emotional state, the prior describes the probability of each emotional state, and the posterior describes which emotional state most likely elicits the observed EEG signal.
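
The roles of the likelihood, the prior, and the normalizing constant can be seen in a small numerical sketch. It simplifies the setting to two discrete emotional states and a scalar EEG feature, so the integral becomes a sum; the Gaussian likelihood, its parameters, and the names `gaussian_pdf` and `posterior` are assumptions made only for illustration.

```python
import math

def gaussian_pdf(y, mean, std):
    """Gaussian density, used here as an assumed likelihood of the EEG feature."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior(y, likelihood_params, prior):
    """Posterior over states: likelihood times prior, divided by the normalizer."""
    unnormalized = {state: gaussian_pdf(y, *likelihood_params[state]) * prior[state]
                    for state in prior}
    normalizer = sum(unnormalized.values())  # sum replaces the integral for discrete states
    return {state: value / normalizer for state, value in unnormalized.items()}

# Hypothetical generative parameters: mean and standard deviation of a
# frontal asymmetry feature under each valence state.
likelihood_params = {"negative": (-0.3, 0.2), "positive": (0.3, 0.2)}
prior = {"negative": 0.5, "positive": 0.5}

post = posterior(0.1, likelihood_params, prior)
map_state = max(post, key=post.get)  # maximum a posteriori estimate
```

Changing the prior, for example to favor the negative state, shifts the posterior without touching the likelihood, which is the separation of concerns that a direct input-output mapping does not offer.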

Here, we extend this generative approach to take into account the temporal dynamics of the emotional state. We use sequential Bayesian inference to track a time-varying emotional state from EEG signal [89]. To this end, we first assume that the emotional state is defined in a continuous space. An example of a continuous state space consists of two emotional dimensions, such as valence and arousal. The valence dimension ranges from negative values to positive values. The arousal dimension ranges from low to high arousal levels. A key point is that an emotional state varies over a continuous space, instead of altering between discrete values. This does not mean that we need to assign an explicit emotion to every possible point in the emotional state space. A specific area or volume in the state space can represent a single emotion.

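The generative model is then formulated as follows. Let $\mathbf{x}_t$ be an emotional state vector and $\mathbf{y}_t$ an EEG signal vector at time instant $t$. $\mathbf{x}_t$ contains a set of emotional state variables (e.g., $\mathbf{x}_t = [v_t, a_t, d_t]^{T}$, where $v_t$ is the valence dimension, $a_t$ is the arousal dimension, and $d_t$ is the dominance dimension). $\mathbf{y}_t$ contains a set of EEG features selected to be related to emotion (e.g., the power of a certain frequency band at a selected channel). The goal of the model is to find the most probable emotional state given the series of observations from the beginning, $\mathbf{y}_{1:t}$ (assuming observation begins at $t = 1$). The posterior is formed as

$$P(\mathbf{x}_t \mid \mathbf{y}_{1:t}) = \frac{P(\mathbf{y}_{1:t} \mid \mathbf{x}_t)\, P(\mathbf{x}_t)}{P(\mathbf{y}_{1:t})}.$$

The posterior can be rewritten as a recursive equation,

$$P(\mathbf{x}_t \mid \mathbf{y}_{1:t}) \propto P(\mathbf{y}_t \mid \mathbf{x}_t) \int P(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, P(\mathbf{x}_{t-1} \mid \mathbf{y}_{1:t-1})\, d\mathbf{x}_{t-1}.$$

Note that the likelihood, $P(\mathbf{y}_t \mid \mathbf{x}_t)$, depends only on the current time $t$. The prior, $P(\mathbf{x}_t \mid \mathbf{x}_{t-1})$, represents the state transition from $t-1$ to $t$, assuming a first-order Markov process. The dynamics of the emotional state are embedded in the prior, whereas the generative process of the EEG features from an emotional state is modeled by the likelihood. The integral can be approximately computed by a number of methods with different model assumptions [89].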
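
One simple way to approximate the integral is to discretize the state space on a grid. The sketch below tracks a one-dimensional valence state under assumptions that are not part of the proposed model: a Gaussian random-walk state transition, a Gaussian likelihood in which the EEG feature equals the valence plus noise, and hypothetical noise levels and observations. The function names are likewise hypothetical.

```python
import math

def gaussian(u, std):
    """Unnormalized Gaussian kernel; constant factors cancel when normalizing."""
    return math.exp(-0.5 * (u / std) ** 2)

def bayes_filter_step(grid, belief, y, trans_std=0.1, obs_std=0.3):
    """One recursion of the sequential Bayesian update on a fixed grid."""
    # Prediction: sum the state transition times the previous posterior over the grid.
    # Truncating the state space at the grid edges is an approximation.
    predicted = [sum(gaussian(x - x_prev, trans_std) * b
                     for x_prev, b in zip(grid, belief))
                 for x in grid]
    # Update: weight the prediction by the likelihood of the new observation.
    unnormalized = [gaussian(y - x, obs_std) * p for x, p in zip(grid, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Valence grid on [-1, 1] and a flat initial belief.
grid = [-1.0 + 0.05 * i for i in range(41)]
belief = [1.0 / len(grid)] * len(grid)

# Hypothetical sequence of EEG-derived valence features.
observations = [0.4, 0.5, 0.45, 0.55, 0.5]
estimates = []
for y in observations:
    belief = bayes_filter_step(grid, belief, y)
    estimates.append(sum(x * b for x, b in zip(grid, belief)))  # posterior mean
```

Each new observation requires only the previous posterior, so the estimate can be refreshed as EEG features arrive. For linear Gaussian models the same recursion has a closed form (the Kalman filter), and sampling-based approximations are available for higher-dimensional state spaces.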

As this approximation relies on the recursion of the posterior, inference of an emotional state from the EEG signal operates sequentially over time. This property makes our model well suited to tracking emotional states continuously. In fact, the sequential Bayesian inference model (also called a Bayesian filter) has been widely adopted in neuroengineering studies (e.g., see [90–94]). Our model may provide an effective approach to online emotion aware computing, especially when changes in the emotional state need to be tracked continuously over time from EEG measurements, for instance, while a subject is watching movies [95].

5. Discussion

In this paper, we reviewed the computational methods used for emotional state estimation. We first gave a brief overview of the EEG correlates of emotion. Then, we revisited the computational methods to extract EEG features correlated with the continuous and discrete emotional states. We also described the classification methods to discriminate a particular emotional state from EEG features. Finally, we proposed a computational approach based on the generative modeling framework that may be well suited to tracking the emotional state over time. These computational methods for emotional state estimation will serve as a key element of practical online emotion recognition systems for affective computing.

While affective computing has attracted attention in the HCI field with the promise of novel user interfaces, the development of computational methods to estimate the emotional state still requires further understanding of emotion processes and their neurophysiologic substrates [96]. In particular, the estimation of emotional states from the human EEG has so far been posed only as a relatively simple classification problem with a few discrete emotions. The development of a real-time emotional state tracking system would require a more rigorous definition of the emotional state space suitable for estimation models.

Exploration of the EEG signatures of emotion that can span a broad area of the emotional state space or represent a number of different discrete emotions should continue. Such investigations may need to overcome many existing challenges. In particular, finding EEG signatures of emotion that are invariant across individuals will be important for general emotion recognition systems [69]. As the emotion-related features have mostly been found in the frontal EEG signal, online algorithms to overcome the eye movement artifact should continue to be developed [97–99]. Also, bringing EEG-based emotion recognition systems to everyday users would require a simple yet efficient EEG sensor. A new EEG sensor should meet criteria such as a stable signal-to-noise ratio (SNR), reduction of noise caused by hair, optimization of active dry electrodes, multichannel wireless communication, and sustained EEG signal quality over a long period [100–103]. Many previous studies have estimated the emotional state by analyzing the EEG in response to specific emotional stimuli. However, this emotion-induction paradigm has the limitation that the EEG signals can be modulated by stimulus properties irrelevant to emotion [21]. Hence, a computational model that can predict the emotional state across various stimuli may be required for real-world applications.

The computational methods to estimate the emotional state may improve further with several advances in computational models. First, a model that can associate the dynamics of the EEG signal with the dynamics of the cognitive emotional process will provide a basis for constructing a novel emotional state estimation method. The current methods capture only the static properties of the EEG pattern in response to emotional stimuli. If a new model can embrace the temporal dynamics of emotional information processing in the human cognitive system and find EEG correlates of those dynamical properties, it will estimate the emotional state more precisely. Second, the quest for novel EEG signatures of the emotional state should be pursued. In particular, interactive properties between EEG signals, such as cross-frequency coupling and effective connectivity patterns, may be worth exploring to find novel EEG correlates of emotion. Third, the inference of emotion-related information from high-dimensional and nonlinear EEG data poses an interesting problem for developing and applying state-of-the-art machine learning algorithms. So far, only a few basic learning algorithms have been applied to emotional state estimation, but it is likely that emotion recognition would benefit from more advanced statistical learning and pattern recognition algorithms. With these advances, we foresee that computational models of emotional state estimation will play a key role in future consumer devices. Before long, they may bring serendipity to device users by estimating emotional states in a natural and nonintrusive way.

Conflict of Interests

The authors declare that there is no conflict of interests.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) Grants funded by the Korean government (2012047239 and 20120006588) and was funded by the Samsung Electronics Grant (R1210241).

F. Babiloni, L. Bianchi, F. Semeraro et al., “Mahalanobis distance-based classifiers are able to recognize EEG patterns by using few EEG electrodes,” in Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 651–654, October 2001.