Perception of facial expressions of emotion is generally assumed to correspond to underlying muscle movement. However, it is often observed that some individuals appear sadder or angrier than others, even when their faces are neutral and motionless. Here, we report on one such effect caused by simple static configural changes. In particular, we show four variations in the relative vertical position of the nose, mouth, eyes, and eyebrows that affect the perception of emotion in neutral faces. The first two configurations make the vertical distance between the eyes and mouth shorter than average, resulting in the perception of an angrier face. The other two configurations make this distance larger than average, resulting in the perception of a sadder face. These perceptions increase with the amount of configural change, suggesting a representation based on variations from a norm (prototypical) face.

Introduction

Understanding emotion is a central problem of cognitive science and is fundamental to the study of human behavior (Darwin, 1872; Russell, 2003). To properly study this, it is essential that we identify how emotions are represented in the brain—their psychological space. Here, we are concerned with the underlying dimensions of the face space used to represent and recognize sadness and anger from pictures. Ekman and Friesen (1978) and Ekman and Oster (1979), following in the footsteps of Darwin (1872), identified six universal emotional expressions (anger, sadness, fear, surprise, happiness, and disgust) assumed to have some role within the theory of evolution (Schmidt & Cohn, 2001). According to this hypothesis, all people should be equipped with the capacity to express and perceive these emotions and their associated facial expressions. Ekman and Friesen's (1978) fundamental work demonstrated how each of these emotional expressions is created by moving particular facial muscles to certain positions, and several authors have shown their universality (Ekman & Oster, 1979; Izard, 1994). Under this view, motion primitives constitute the dimensions of the underlying computational face space.

However, everyday experience suggests that other elements may be employed to classify faces into one group or another (Zebrowitz, 1997). For example, Montepare and Dobish (2003) showed that subjects agree with one another when judging emotions and social cues in faces with no particular emotional expression. Zebrowitz, Kikuchi, and Fellous (2007) found that impressions of emotion expression on faces were partially mediated by babyfacedness. Lyons et al. (2000) found that an upward tilt of the head produces a two-dimensional retinal image with facial features associated with negative affect. Similarly, a downward tilt produces a two-dimensional image associated with positive affect.

One possibility is that neutral faces resembling configurations of a given expression elicit perception of such emotion (Montepare & Dobish, 2003; Zebrowitz, 1997; Zebrowitz et al., 2007). This effect is known as the emotion overgeneralization hypothesis. Under this view, the major question remains as to which facial features are used to represent emotions and thus which ones are associated with these overgeneralizations. This is important, because these features are the ones used to code the psychological (face) space.

Using Ekman and Friesen's (1978) collection of face images, where subjects were instructed to move their face muscles to display each of the six universal emotions, we observed that the distance between the inner corner of the two eyebrows and the mouth increases when one expresses sadness (from an average of 233.12 pixels in neutral expressions to 248.8 pixels for sadness). The same distance decreases to 220.12 pixels when one expresses anger. Hence, one hypothesis is that individuals with an “elongated” face (i.e., a larger vertical distance between internal features) will be perceived as sadder than average, while people with a reduced vertical distance between facial features will be perceived as angrier.

In the present paper, we show results in favor of this hypothesis. In Experiment 1, we show that the vertical distance between the eyes, nose, and mouth of an individual face is correlated with the perception of anger and sadness in face images displaying a neutral expression. When the vertical distance between eyes and mouth is made larger than that of the average face (i.e., the average distance within a population), the perception of sadness increases. A decrease of the same distance from that of the average face increases the perception of anger. This result suggests that (second-order) configural cues are used by the cognitive system to represent and recognize anger and sadness. Our results further demonstrate that the perception of anger and sadness increases linearly with the change in eye–mouth distance, suggesting the existence of a norm-based face space where each face is represented as a variation of a norm (sometimes called mean or prototype) face (Valentine, 1991). We also show that this effect is asymmetric, because the linear increase only occurs when the second image has a larger displacement of the facial features than the first one. When the larger displacement is in the first image, this effect is diminished. This is because when the second image is closer to the mean face, the density of the face space increases, making it more difficult to distinguish between categories.

If the face space used to represent facial expressions of emotion is indeed norm-based, then those images with a shorter than average eye–mouth distance should not be perceived as sadder, while those with a larger than average distance should not be seen as angrier. That is to say, the perception of sadness and anger should not traverse to the other side of the “mean” face, since these correspond to two distinct psychological categories. This is the same effect seen in face recognition, where moving along a single dimension of the face space results in the observation of two distinct identities after crossing over to opposite sides of the mean face. This effect is verified in Experiment 2. In this second experiment, subjects were shown the images with shorter and larger eye-to-mouth distances within a single session. Their task was now to determine whether the second image of each pair appeared more angry or more sad than the first. Subjects consistently classified the images with a larger vertical distance between internal facial features as sadder and the images with a shorter distance as angrier.

The results reported in this paper have implications for the interpretation and design of psychophysical experiments and for social behavior and political studies, as well as for defining computational models and developing human–computer interaction systems. Furthermore, our results are important for the analysis of art, where artists may have used such configurations to emphasize expressions, as seems to be the case in Grant Wood's “American Gothic” portrait. The perceptual biases described here are also relevant to the interpretation of some social interactions (Hassin & Trope, 2000) and seem to affect the perception of attractiveness (Paunonen, Ewan, Earthy, Lefave, & Goldberg, 1999; Thornhill & Gangestad, 1999) and identity (Ackerman et al., 2006; Martinez, 2003). These results also call for additional research to help determine to what extent configural, shape, and motion cues are combined with one another to influence the perception of facial expressions of emotion.

Experiment 1

Methods

Procedure

To assess the role of static configural differences in the perception of emotion, face images were modified and presented on a computer screen. The relationships among internal features of the face were manipulated without affecting the overall shape of the face or head, resulting in a configural change. The placement of the eyes, brows, nose, and mouth was varied along the vertical axis. Four types of configural change were applied to each of 12 individual face images taken from the AR Database (Martinez & Benavente, 1998). For each type of configural change, individual images were created at the following percentages of displacement: 0% (i.e., original image), 25% (5-pixel displacement), 50% (10 pixels), 75% (15 pixels), and 100% (20 pixels). One of the unchanged originals is shown in Figure 1A. Examples of each type of configural change are shown at 100% displacement in Figures 1B–1E. The images are 768 × 576 pixels.
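The mapping from displacement level to pixels can be sketched as follows. This is a minimal illustration, assuming the mapping is linear with 100% equal to 20 pixels, as the listed values suggest. The names `MAX_DISPLACEMENT_PX`, `CONFIGURATIONS`, and `displacement_pixels` are hypothetical and do not come from the original study.

```python
# Illustrative sketch only: all names are hypothetical; values are from the text.
MAX_DISPLACEMENT_PX = 20       # 100% displacement, as stated in the Procedure
LEVELS = (0, 25, 50, 75, 100)  # percentages of displacement

# The four configural changes and the group each one belongs to.
CONFIGURATIONS = {
    "eyes_brows_down": "angry",
    "mouth_up": "angry",
    "eyes_brows_up": "sad",
    "mouth_down": "sad",
}

def displacement_pixels(level_percent):
    """Convert a displacement level (in %) to a vertical shift in pixels."""
    if level_percent not in LEVELS:
        raise ValueError(f"unsupported level: {level_percent}")
    return MAX_DISPLACEMENT_PX * level_percent / 100

# Every (configuration, level) combination for a single face image.
# The four 0% entries all refer to the same unmodified original.
stimuli = [(name, level, displacement_pixels(level))
           for name in CONFIGURATIONS for level in LEVELS]
```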

Figure 1

Facial configurations. (A) The original face with a neutral expression. (B, C) The angry group comprises face images in which the distance between the eyes and mouth has been decreased. In (B), this is accomplished by displacing the eyes and brows downward. In (C), the mouth is displaced upward. (D, E) The sad group comprises faces in which the distance between the eyes and mouth is increased. This is done by displacing the eyes and brows upward (D) or by displacing the mouth downward (E). The examples shown here correspond to a 20-pixel displacement of the facial features (100%).

Image sets from the four configurations were divided into two groups. The angry group included face images in which the mouth was displaced upward and face images in which the eyes and brows were displaced downward. The sad group included face images in which the mouth was displaced downward and face images in which the eyes and brows were displaced upward. Pairs of these face images were displayed sequentially (Figure 2). Prior to the display of the first image of each pair, an initial visual cue (crosshair) was shown for 600 ms in a randomly selected location to alert the subject as to where the first image would be displayed. The first image was then displayed for 600 ms. This was followed by a mask with a duration of 500 ms. A second visual cue was then displayed in another randomly selected location for 600 ms, followed by the display of the second image for another 600 ms. This presentation sequence prevented the perception of implied motion.

Figure 2

Stimulus time-line. An initial visual cue is shown in a randomly selected location to alert the subject as to where the first image will be displayed. After 600 ms, the first face image is displayed for a period of 600 ms. A mask (blank screen) is then presented for 500 ms. A second visual cue is displayed in another randomly selected location for 600 ms. The second face image is then displayed for 600 ms. Finally, a blank screen with a large question mark (“?”) at the center is displayed until the subject responds with a key-press.
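The time-line can be written down as a short sequence of timed events. The sketch below is only an illustration of the durations given above; the event names and the helper `onset_times` are hypothetical.

```python
# Hypothetical encoding of one trial; durations (ms) are those given in the text.
TRIAL_EVENTS = [
    ("cue_1", 600),
    ("image_1", 600),
    ("mask", 500),
    ("cue_2", 600),
    ("image_2", 600),
]

def onset_times(events):
    """Return (name, onset_ms) pairs, with the first event starting at 0 ms."""
    onsets, t = [], 0
    for name, duration in events:
        onsets.append((name, t))
        t += duration
    return onsets

# Time elapsed before the response prompt ("?") appears.
total_ms = sum(duration for _, duration in TRIAL_EVENTS)
```

Under these durations the response prompt appears 2900 ms after the first cue and stays on screen until the key-press.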

Subjects were asked to indicate whether the second image seemed more, the same, or less angry (or sad) relative to the first image displayed. Subjects were required to respond with a key-press for more, same, or less angry (during the session where the angry image group was used) and sad (for the sad group set). Subject responses and reaction times were recorded. Responses with reaction times greater than two times the standard deviation of all reaction times were considered outliers and were eliminated from subsequent analysis.
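The outlier rule can be read in more than one way. The sketch below assumes the cutoff is two sample standard deviations above the mean reaction time, which is one plausible reading and not necessarily the procedure used in the study. The function name and the example values are hypothetical.

```python
from statistics import mean, stdev

def remove_slow_outliers(reaction_times, n_sd=2.0):
    """Drop reaction times more than n_sd sample standard deviations above the mean.

    Assumption: the text does not say whether the cutoff is measured from the
    mean, so this is one possible reading, not the authors' procedure.
    """
    cutoff = mean(reaction_times) + n_sd * stdev(reaction_times)
    return [rt for rt in reaction_times if rt <= cutoff]

# Made-up reaction times in ms, for illustration only.
example_rts = [500] * 9 + [5000]
kept = remove_slow_outliers(example_rts)
```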

Subjects

Twenty subjects with normal or corrected-to-normal vision were seated at a personal computer with a 19-inch LCD monitor. Subjects were drawn from the population of faculty, staff, and students at The Ohio State University. The typical viewing distance was 50 cm, providing a visual angle of approximately 15 degrees vertically and 8 degrees horizontally. Each session lasted between 30 and 40 minutes. Subjects were given the opportunity to take a brief rest at the 25%, 50%, and 75% completion points.
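For readers who want the on-screen size implied by these angles, the standard visual-angle relation, size = 2 d tan(θ/2), can be applied. This is a derived illustration, not a measurement reported in the study; the function name `extent_cm` is hypothetical.

```python
import math

def extent_cm(angle_deg, distance_cm):
    """Physical extent that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

VIEWING_DISTANCE_CM = 50  # typical viewing distance given in the text
height_cm = extent_cm(15, VIEWING_DISTANCE_CM)  # roughly 13 cm
width_cm = extent_cm(8, VIEWING_DISTANCE_CM)    # roughly 7 cm
```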

Results

Responses for the changes in each configural position within the sad group and within the angry group are shown in Figures 3A and 3B, respectively, with significant differences (p < 0.01) denoted by an asterisk. The combined responses for the sad group and for the angry group are shown in Figure 3C. The abscissa reflects the difference in positive or negative change level between the first and second image in each presentation. For example, if the first image represents a configural change of 75% and the second image is at a level of 25%, the subject's response would fall into the −50% grouping (negative change). Likewise, if the first image is at 50% and the second image is at 75%, the response would fall into the 25% grouping (positive change). The ordinate reflects the percentage of less, same, and more responses made at each grouping.
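The grouping on the abscissa amounts to subtracting the first image's displacement level from the second's and tallying responses per difference. A minimal sketch, with hypothetical function names and a made-up trial format:

```python
from collections import Counter

def change_level(first_percent, second_percent):
    """Signed difference in displacement: second image minus first image."""
    return second_percent - first_percent

def response_percentages(trials):
    """Group trials by change level and return the share of each response.

    `trials` is an iterable of (first_percent, second_percent, response)
    tuples, where response is "less", "same", or "more".
    """
    counts = {}
    for first, second, response in trials:
        counts.setdefault(change_level(first, second), Counter())[response] += 1
    return {
        level: {r: 100 * c[r] / sum(c.values()) for r in ("less", "same", "more")}
        for level, c in counts.items()
    }
```

With this convention, `change_level(75, 25)` gives −50 and `change_level(50, 75)` gives 25, matching the two examples in the text.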

Figure 3

Subject responses. (A) Responses for Sad Group: Relative percentages for less–same–more responses at each level of change between the pairs of faces for both downward displacement of the mouth and upward displacement of the eyes and brows. (B) Responses for Angry Group: Relative percentages for less–same–more responses at each level of change between the pairs of faces for both upward displacement of the mouth and downward displacement of the eyes and brows. (C) Combined responses for Sad Group and Angry Group: Cumulative responses for the sad group (nose/mouth down and eyes/brows up) and the angry group (nose/mouth up and eyes/brows down). Significant differences (p < 0.01) among less–same–more responses at each level are marked with an asterisk. Bars denote standard error.

The patterns observed in the more responses in Figure 3 are sigmoidal as expected, since they approximate zero below 0% and will saturate beyond 100%. This response type is common in many perceptual studies. Nonetheless, we note that the percentage of more responses increases linearly with the amount of positive change (i.e., from 0% to 100%). This is because the sigmoidal response approximates a linear function with very low slope in this interval. This means that, in the sad group, if the distance between the baseline of the eyes and the mouth increases by x, the percentage of more responses increases by f_s(x), where f_s(.) is a linear function (r² = 0.981). For the angry group, when the distance between the eyes and the mouth decreases by x, the percentage of more responses increases by f_a(x), with f_a(.) also linear (r² = 0.986). Similarly, the percentage of less responses increases linearly with the amount of negative change. In this case, for the sad group, the r² value is 0.995, while for the angry group r² = 0.998. The percentage of same responses (i.e., identical perception of sadness or anger in the first and second image) also decreases linearly. For the sad group, the percentage of same responses for negative change has an r² value of 0.988, while the percentage of same responses for positive change has an r² value of 0.966. For the angry group, these values are r² = 0.991 and 0.971, respectively.
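The r² values above measure how well a straight line describes the response percentages. The sketch below shows an ordinary least-squares fit and its coefficient of determination. The function name is hypothetical and the data are made up for illustration; they are not the values from the experiment.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line y = a + b*x and its coefficient of determination."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up percentages of "more" responses, for illustration only.
change = [0, 25, 50, 75, 100]
more_pct = [12.0, 30.0, 52.0, 70.0, 91.0]
slope, intercept, r2 = linear_fit_r2(change, more_pct)
```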

These results suggest an underlying dimension in a multidimensional face space where sadness and anger are represented as variations from a norm (prototypical or mean) face. This norm-based face space is well documented for the task of identification (Leopold, Bondar, & Giese, 2006; Rhodes & Jeffery, 2006; Valentine & Bruce, 1986), but has not been previously shown for the representation and recognition of expressions. This is a surprising finding, because the perception of facial expressions of emotion is purported to be categorical (Beale & Keil, 1995; Calder, Young, Perrett, Etcoff, & Rowland, 1996; Young et al., 1997). Nonetheless, our results suggest that although the perception of emotion may be categorical, the underlying representation is (at least in part) continuous, in the sense that the perception of an emotion is made clearer as we move away from the norm face (center of the psychological space) and a perception of neutral expression is obtained at the “center” of this face space (Russell, 1980).

A χ² goodness-of-fit test was applied to determine whether the responses at different levels of displacement are indeed statistically different from those at 0% change. To properly perform this test, the cumulative responses for the sad group and for the angry group were examined (Figure 3C). For each of the sad and angry groups, the responses for the 0% change in displacement were used as the expected values. Therefore, the null hypothesis (H0) states that there is no difference between the response profile for any given level of change in facial feature displacement and the response profile for no change in displacement. All comparisons for both the sad group and the angry group yielded a significant χ² value at p = 0.01 with two degrees of freedom (critical value χ² = 9.21; df = 2). For both the angry group and the sad group, the residuals indicate that the less responses are the major contributors to the significant differences for the negative changes. Similarly, the more responses are the major contributors to the significant differences for the positive changes. These results are again consistent with a norm-based representation.
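The test can be sketched as follows, assuming the expected counts are the 0% response proportions scaled to the number of observed responses (the text does not spell out this scaling). The function name and the counts are hypothetical. The critical value uses the fact that, for two degrees of freedom, the χ² survival function is exp(−x/2).

```python
import math

def chi_square_gof(observed, reference):
    """Chi-square goodness-of-fit statistic.

    `observed` and `reference` are counts of (less, same, more) responses.
    Expected counts are the reference proportions scaled to the observed total.
    """
    total_obs, total_ref = sum(observed), sum(reference)
    expected = [total_obs * r / total_ref for r in reference]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 2 the chi-square survival function is exp(-x / 2),
# so the critical value at alpha is -2 * ln(alpha).
ALPHA = 0.01
critical_value = -2 * math.log(ALPHA)  # about 9.21, matching the value in the text

# Made-up counts (less, same, more), for illustration only.
no_change = [20, 160, 20]
change_50 = [10, 70, 120]
statistic = chi_square_gof(change_50, no_change)
reject_h0 = statistic > critical_value
```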

It is well known that in many perceptual and psychophysical studies, the responses will be stronger at the extremes (Attneave, 1950), which is generally where the density of the underlying cognitive multidimensional space is lower (Krumhansl, 1978). This effect has been observed in the recognition of identity in face images (Benson & Perrett, 1994; Valentine & Ferrara, 1991). In these studies, faces closer to the mean face (i.e., the center of the face space) are more difficult to identify, while faces far from the mean are recognized more easily. Therefore, we can predict that if the representation of expression is indeed norm-based, then a similar pattern should be observed in our results. To test this hypothesis, we analyze the different responses to each of the configural changes from 25% to 75%. Note that each configural change includes several possible pairs. For example, the 25% change includes eight possible scenarios, because the variation between the image with 0% displacement and that with 25% is the same as the variations from 25% to 50%, 50% to 75%, and 75% to 100%, and each of these four pairs also has a mirror pair. We show the responses to a difference of 25% change for the sad and angry groups in Figure 4A. In this plot, the abscissa values (n%–m%) give the percentage of facial feature displacement in the first image (n%) and that in the second image (m%).
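The count of eight scenarios for a 25% change can be checked by enumerating ordered pairs of displacement levels. The helper name below is hypothetical.

```python
from itertools import permutations

LEVELS = (0, 25, 50, 75, 100)  # displacement levels used in Experiment 1

def pairs_with_difference(diff_percent):
    """Ordered (first, second) pairs whose displacement differs by diff_percent."""
    return [(n, m) for n, m in permutations(LEVELS, 2) if abs(n - m) == diff_percent]

# Positive changes only (n < m) for a 25% difference.
positive_25 = [(n, m) for n, m in pairs_with_difference(25) if n < m]
```

The same enumeration gives six ordered pairs for a 50% difference and four for a 75% difference.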

Figure 4

Combined responses for each level of displacement in Sad and Angry Groups: (A) 25% displacement; (B) 50% displacement; (C) 75% displacement. These differences are specified on the x-axis of each plot as n%–m%, where n% corresponds to the percentage of displacement in the first image and m% to that in the second. For the positive changes (n < m) we see the familiar linear pattern, while for negative changes (n > m) this pattern changes (flattens). This asymmetry is further illustrated by the statistically significant differences (p < .001) marked with asterisks.

Our results are again consistent with a norm-based model. For positive changes (i.e., when n < m), we see a linear increment in the percentage of responses. That is, the more we move away from the center of the face space, the more apparent the percept becomes. When the same difference of change is applied to the face stimuli farthest from the mean face, the perception of sadness and anger is maximized. We also note, however, that the plot is asymmetric, because the same does not apply to the negative changes (i.e., n > m). We specifically see this by looking at the difference in same responses. Note that while on the positive side of the plot the percentage of same responses decreases linearly, on the negative side these percentages are practically identical, i.e., the responses have flattened. A quantitative comparison of the (negative) same responses yields differences of less than 5% in all conditions. This means the direction of change is also important, since the perception of anger/sadness is most visible when the change is positive (n < m). The same is true for the plots at 50% and 75% change given in Figures 4B and 4C. This suggests the face space is warped, a known phenomenon in psychology (Tversky, 1977) and in some face recognition tasks (Rotshtein, Henson, Treves, Driver, & Dolan, 2005), but one not previously reported in the perception of facial expressions.

Asymmetries like the one observed in these results arise when the more salient of the two objects shown to the subject is placed in a different location in the psychological space and, hence, plays a distinct role. We note that when we go from a large transformation (e.g., the extreme at 100%) to a smaller one (e.g., 25%), the density of points in the face space increases. This is because the second image is closer to the mean face, and the density of points representing faces increases as we approach the mean face. As we get closer to the dense areas, it becomes harder to distinguish between percepts, and the perception of sadness and anger diminishes. The opposite is true when we go from a denser to a less dense region.

We also performed an analysis to assess any learning that may have taken place over the course of an experimental session and may be responsible for the increased perception of sadness and anger. To assess the influence of learning on the subjects' responses, the percentages of correct responses were examined across the sequence of trials. Responses that reflected the actual change in deformation were considered correct; i.e., less responses were considered correct when the changes were negative, and more responses were considered correct for positive changes. The data revealed a moderate improvement of around 5% from the first trial to the final one. To see whether this improvement could have affected the results reported above, we divided the data into two halves to see if this improvement was prominent in one of the conditions. We observed that all conditions had an almost identical increase, and thus the pattern of the data remained unaltered; i.e., a plot of the first half of the data and a plot of the second half have the same (linear) pattern, with the first-half plot being about 5% below that of the second half. The reported data lie between these two plots and show the same observable pattern. Hence, although there was a small learning effect, this did not change the pattern of responses described in this section.

Experiment 2

Although the results reported above are in agreement with our hypothesis that a shorter than average eye-to-mouth distance increases the perception of anger and a larger than average distance increases the perception of sadness, other explanations may be possible. In particular, it has been argued that as a face deviates from the norm face, negative affect toward it increases (Zebrowitz, 1997). It is possible that when subjects noticed a deviation from the mean face in the stimuli presented above, they tended to respond toward the negative category rather than the actual impression of sadness and anger. If this alternate hypothesis were true, subjects would not be able to distinguish between the sad and angry stimuli defined above when presented in the same experiment. To counter this alternative explanation of our results, we prepared a second experiment where subjects were forced to make decisions of sadness versus anger.

Methods

The same stimuli defined in Experiment 1 were used in this second experiment. However, the images were now used in a single session. Face images were presented sequentially in 500 pairs, with the identity, type, and magnitude of displacement randomly selected for the initial image (Figure 2). The identity and type of configural change applied to the second image remained the same as in the first image. No image pair was used more than once.

Unlike in Experiment 1, the magnitude of configural change in the second image was randomly selected from all cases representing no change or an increase in displacement relative to the first image, i.e., a positive change as defined in Experiment 1. For example, if the initial image displayed the nose–mouth down condition at 50% displacement, the second image could only be chosen from the nose–mouth down condition at 50%, 75%, or 100% displacement. Images were shown in sequences using the same procedure described in Experiment 1. Seventeen subjects, with normal or corrected-to-normal vision, were asked to indicate whether the second image seemed more angry, the same, or more sad relative to the first image displayed. None of these subjects had participated in the previous study.
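The constraint on the second image can be sketched as below. The function names are hypothetical, and uniform random sampling is an assumption, since the text only says the selection was random. The sketch also does not enforce the rule that no image pair is used more than once.

```python
import random

LEVELS = (0, 25, 50, 75, 100)  # displacement levels, as in Experiment 1

def second_image_levels(first_level):
    """Levels allowed for the second image: no change or an increase."""
    return [level for level in LEVELS if level >= first_level]

def draw_pair(rng):
    """Draw one (first_level, second_level) pair; rng is a random.Random.

    Assumption: both draws are uniform over the allowed levels.
    """
    first = rng.choice(LEVELS)
    return first, rng.choice(second_image_levels(first))
```

For a first image at 50%, `second_image_levels(50)` returns the levels 50, 75, and 100, as in the example given in the text.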

Results

Subject responses are summarized in Figure 5. The abscissa reflects the magnitude and direction of change from the first image to the second image in a pair. The 0% condition represents all cases in which the first image and second image were identical, regardless of the magnitude of displacement. Positive percentages to the right of the 0% condition represent cases in which the displacement of nose–mouth down and eyes–brows up was increased. Positive percentages to the left of the 0% condition represent cases in which the displacement of nose–mouth up and eyes–brows down was increased. Therefore, the first group (i.e., nose–mouth down and eyes–brows up) corresponds to a set of stimuli with a larger than average eye-to-mouth distance that is increased further in the second image relative to the first. The second group (i.e., nose–mouth up and eyes–brows down) includes the stimuli with a smaller than average eye-to-mouth distance and a smaller distance in the second image relative to the first.

Figure 5

Cumulative responses for Experiment 2. Significant differences at p < 0.01 are denoted by an asterisk. On the left-hand side of the 0% condition, we show the responses obtained when subjects see the angry condition stimuli. On the right-hand side of the 0% condition, we have subjects' responses to the sad stimuli.

If our hypothesis were true, then subjects should select the more sad option when they see an image of the first group and the more angry option when they observe the images in the second group. If subjects were responding to random negative labels when the second image deviates more from the mean face than the first image, then they would equally respond to more sad and more angry in the two change conditions (i.e., right and left of the 0% condition). In Figure 5, we see that subjects' selections clearly agree with our hypothesis. This result further demonstrates that the perception of sadness and anger increases as the difference between the first and second image grows. Most importantly, this result substantiates that there is no perception of anger on the side of the face space representing sadness, and no perception of sadness on the side of the space describing anger. These results provide further proof for a norm-based coding, with distinct emotion perception at each side of the normative face.

A χ² goodness-of-fit test was used to compare the more angry, same, and more sad response profiles at different levels of configural change to the response profile for 0% change. The χ² test was applied to frequency data in this case. For the cumulative responses, the χ² test shows a significant difference for all levels of change relative to no change (0%). The percentage of times subjects selected the more sad response increased linearly (r² = 0.978) with the difference in distance between eyes and mouth, as in Experiment 1. Similarly, more angry responses increased linearly (r² = 0.994) as the eye–mouth distance decreased.

An analysis to assess the existence of any learning curve was also conducted. As we did in Experiment 1, the responses were examined across the sequence of trials by plotting the percentage of correct responses over the trial number. No difference between the trials was observed. A comparison of the first and second halves of the responses showed differences of less than 1%. Hence, no significant learning that could affect the results reported above was observed in the second experiment.

Discussion

Identifying the dimensions of the computational space underlying the representation of faces and emotions has proven to be a challenging problem, yet it is key to the understanding of the processes associated with these tasks and, ultimately, cognition.

It seems evident from the results reported here that configural effects (i.e., second-order relations) can influence the perception of emotion in an otherwise neutral face. The changes in configuration studied in this paper were made without adding any of the distortions associated with activation of the facial musculature. This implies that face appearance (which is a direct consequence of the physical shape of the skull and jaw and the muscles overlying these) strongly influences people's perception. Furthermore, this perceptual bias is predictable, since the number of times the perception of anger and sadness is observed over many trials increases linearly with the increase/decrease of the distance between the baseline of the eyes and the mouth. Hence, these results suggest that motion primitives are not the sole features used by the cognitive system when representing facial expressions of emotion.

A similar effect may be expected in other facial expressions of emotion. However, not all facial expressions carry a vertical configural change and, hence, not all expressions may in fact be affected by configural cues. Moreover, it is possible that shape and motion cues still carry a larger weight when dealing with general expressions of emotion. In a recent paper, Nusseck, Cunningham, Wallraven, and Bülthoff (2008) studied the different contributions of motion cues to several facial expressions. The results reported in the present paper suggest that additional studies that include configural and shape features in addition to motion features should also be conducted.

The results reported in this paper also help explain the general observation that some faces seem sadder or angrier than average even when no facial movements are present. This is also clear from some face composites constructed by people. For example, young children tend to drop their jaws as much as possible when they want to show they are sad. This creates the impression that the baseline of the mouth is more separated from the baseline of the eyes than usual, resulting in the perception of sadness. A similar effect seems to be responsible for an increased perception of sadness in the male subject of Wood's “American Gothic” (Figure 6A). It is clear from this picture that, although there is no strong facial expression in the male character, the distance between eyes and mouth is uncannily large. These physical characteristics create the illusory percept of sadness. To demonstrate this effect, we can reduce the distance between eyes and mouth in the male character (Figure 6B). Most viewers experience a reduction in the level of sadness evoked by the male's face and an increase in the perception of anger. We also showed a selection of the original neutral faces and their angry and sad sets (Figure 1) in isolation. For the modified images, we used the set with 75% of the deformation for nose and mouth down (sad) and nose and mouth up (angry). Viewers consistently classify the angry set as angry, the sad set as sad, and the neutral set as neutral.

These results may reflect an overgeneralization of the perception of actual sad and angry facial expressions. Evidence for this was given earlier when we illustrated the difference in brow–mouth distances in angry, sad, and neutral expressions. Furthermore, the percentage of times that subjects select the “more” category increases as the distance from the norm face grows, which suggests a norm-based coding. In this sense, as one increases the distance from the norm, the percept becomes clearer, i.e., more noticeable.

These results also suggest that an increase in the eye–mouth distance will increase the perception of sadness in sad expressions, while a decrease in this distance will make angry expressions look angrier. This effect is illustrated in Figure 7. In Figure 7A, we show the original image of a sad expression and the same image with a larger eye–mouth distance. In Figure 7B, we show the original image of an angry expression and the modified image in which the eye–mouth distance has been decreased. When shown to human subjects, the modified images are generally perceived as sadder and angrier than their original counterparts. In some instances, configural changes may lead to the perception of a distinct expression. Therefore, even if the configural effect described in this paper were found to have a small influence on other expressions, it needs to be considered in studies on anger and sadness.

An immediate consequence of these results concerns studies of facial expressions of emotion. Such studies may be biased toward one result or another depending on the configural information of the particular faces used. According to the results reported in this paper, one would need to use images of faces close to the mean (prototype) face, because these are the ones least affected by the configural bias. However, due to the increased density near the center of the face space, certain discrimination tasks would be made more difficult. Moreover, the cognitive face space for expression shares a number of aspects with the face space for identity. The similar nature of both face spaces suggests that there is some direct or indirect interaction between expression and identity (Martinez, 2003). Also, deficits such as those observed in schizophrenia may be partly rooted in an impairment of facial expression recognition. It has recently been proposed (Chambon, Baudouin, & Franck, 2006) that configural information may be related to such deficits. A better understanding of the underlying face space will facilitate studies in these directions.

Acknowledgments

We thank the reviewers for their constructive comments. This research was supported in part by the National Institutes of Health grant R01 DC 005241 and the National Science Foundation grant IIS 0713055. DN was supported in part by a fellowship from Ohio State's Center for Cognitive Sciences.

Facial configurations. (A) The original face with a neutral expression. (B, C) The angry group comprises face images in which the distance between the eyes and mouth has been decreased. In (B), this is accomplished by displacing the eyes and brows downward. In (C), the mouth is displaced upward. (D, E) The sad group comprises faces in which the distance between the eyes and mouth is increased. This is done by displacing the eyes and brows upward (D) or by displacing the mouth downward (E). The examples shown here correspond to a 20-pixel displacement of the facial features (100%).

Figure 1

Stimulus time-line. An initial visual cue is shown in a randomly selected location to alert the subject as to where the first image will be displayed. After 600 ms, the first face image is displayed for a period of 600 ms. A mask (blank screen) is then presented for 500 ms. A second visual cue is displayed in another randomly selected location for 600 ms. The second face image is then displayed for 600 ms. Finally, a blank screen with a large question mark (“?”) at the center is displayed until the subject responds with a key-press.

Figure 2
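The trial structure in the stimulus time-line can be written down as a simple event schedule. The sketch below is an illustration only: the event names and the helper `event_onsets` are hypothetical, and it assumes that each visual cue stays on screen for its full 600 ms and that events follow one another without gaps. The durations themselves are those stated in the caption.

```python
# Durations in milliseconds, taken from the stimulus time-line caption.
# Event names are hypothetical labels introduced for this sketch.
TRIAL_EVENTS = [
    ("cue_1", 600),   # cue at the location of the first image
    ("face_1", 600),  # first face image
    ("mask", 500),    # blank-screen mask
    ("cue_2", 600),   # cue at the location of the second image
    ("face_2", 600),  # second face image
]


def event_onsets(events):
    """Return a list of (name, onset_ms, offset_ms) for back-to-back events."""
    schedule = []
    t = 0
    for name, duration in events:
        schedule.append((name, t, t + duration))
        t += duration
    return schedule


schedule = event_onsets(TRIAL_EVENTS)
# The "?" response screen begins when the last event ends and stays up until a key-press.
response_prompt_onset = schedule[-1][2]

for name, onset, offset in schedule:
    print(f"{name:>7}: {onset:4d} to {offset:4d} ms")
print(f"response prompt onset: {response_prompt_onset} ms")
```

Under these assumptions the response prompt appears 2900 ms after the onset of the first cue.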

Subject responses. (A) Responses for Sad Group: Relative percentages for less–same–more responses at each level of change between the pairs of faces for both downward displacement of the mouth and upward displacement of the eyes and brows. (B) Responses for Angry Group: Relative percentages for less–same–more responses at each level of change between the pairs of faces for both upward displacement of the mouth and downward displacement of the eyes and brows. (C) Combined responses for Sad Group and Angry Group: Cumulative responses for the sad group (nose/mouth down and eyes/brows up) and the angry group (nose/mouth up and eyes/brows down). Significant differences (p < 0.01) among less–same–more responses at each level are marked with an asterisk. Bars denote standard error.

Figure 3

Combined responses for each level of displacement in Sad and Angry Groups: (A) 25% displacement; (B) 50% displacement; (C) 75% displacement. These differences are specified on the x-axis of each plot as n%–m%, where n% corresponds to the percentage of displacement in the first image and m% to that of the second. For the positive changes (n < m) we see the familiar linear pattern, while for negative changes (n > m) this pattern changes (flattens). This asymmetry is further illustrated by the statistically significant values (p < .001) marked with asterisks.

Figure 4

Cumulative responses for Experiment 2. Significant differences at p < 0.01 are denoted by an asterisk. On the left-hand side of the 0% condition, we show the responses obtained when subjects see the angry condition stimuli. On the right-hand side of the 0% condition, we have subjects' responses to the sad stimuli.

Figure 5
