Human observers are experts at face recognition, yet a simple 180° rotation of a face photograph substantially decreases recognition performance. A full understanding of this phenomenon, which is believed to be important for clarifying the nature of our expertise in face recognition, is still lacking. According to a long-standing and influential hypothesis, an inverted face cannot be perceived as holistically as an upright face and has to be analyzed feature by feature. Here, we tested this holistic perception hypothesis of the face inversion effect by means of gaze-contingent stimulus presentation. When observers' perception was restricted to one fixated feature at a time by a gaze-contingent window, performance in an individual face matching task was almost unaffected by inversion. However, when a mask covered the fixated feature, preventing the use of local information at high resolution, the decrement of performance with inversion was even larger than in a normal, full-view condition. These observations provide evidence that the face inversion effect is caused by an inability to perceive the individual face as a whole rather than as a collection of specific features, and thus support the view that observers' expertise at upright face recognition is due to the ability to perceive an individual face holistically.

Introduction

Humans are remarkably good at recognizing people from their faces, even across dramatic changes of viewpoint, facial expression, or aging. However, it has long been known that turning a face upside down makes it particularly hard to recognize (Goldstein, 1965; Hochberg & Galper, 1967; Rock, 1974; Yin, 1969). This effect of inversion on recognition performance is much larger for faces than for other visual object categories (e.g., Diamond & Carey, 1986; Leder & Carbon, 2006; Robbins & McKone, 2007; Scapinello & Yarmey, 1970). Since simple inversion of a face photograph disrupts human expertise in face recognition while preserving the low-level visual properties of the stimulus, inverted faces have been widely used as control stimuli in behavioral (for reviews, see Rossion, 2008; Rossion & Gauthier, 2002; Valentine, 1988) as well as neural studies (e.g., Haxby et al., 1999; Jacques, d'Arripe, & Rossion, 2007; Mazard, Schiltz, & Rossion, 2006; Perrett, Oram, & Ashbridge, 1998; Yovel & Kanwisher, 2005). More importantly, because inversion disrupts this expertise, clarifying the nature of the face inversion effect should go a long way towards a better understanding of human face recognition in general.

According to a long-standing and influential hypothesis of the face inversion effect, while an upright face is processed at a global level (i.e., that of the whole face), an inverted face has to be processed at a more local level, element by element, or analytically (Farah, Tanaka, & Drain, 1995; McKone, 2004; McKone, Martini, & Nakayama, 2003; Rossion, 2008, 2009; Yin, 1969). This idea is well known among artists, who are taught to draw a face upside down when they have to render each feature of a portrait realistically, without being influenced by the expression of the face as a whole (Edwards, 2002). However, previous experimental research has provided only indirect evidence for this holistic/upright vs. analytic/inverted distinction, showing that when observers have to recognize a particular feature of a face (e.g., the mouth), the position and identity of the other features influence their judgment for upright but not for inverted faces (e.g., Rhodes, Brake, & Atkinson, 1993; Sergent, 1984; Tanaka & Farah, 1993; Tanaka & Sengco, 1997; Young, Hellawell, & Hay, 1987). While this suggests that inversion disrupts the interactive processing of facial features, it remains unclear whether the spatial window from which diagnostic information can be extracted encompasses the whole face when it is upright but is limited to one feature at a time (i.e., is spatially constricted) when it is inverted (Rossion, 2008, 2009).

Here we took a novel approach to test more directly the hypothesis of a holistic/analytic dichotomy in upright/inverted face recognition. We used a gaze-contingent stimulus presentation method in which the window of vision is physically restricted to roughly one face feature, centered on the observer's current fixation (van Diepen, De Graef, & Van Rensbergen, 1994; for the origin of the method in reading research, see Rayner, 1975, 1998; Figure 1A). We reasoned that under this viewing condition, the decrement in performance for recognizing inverted faces should be substantially reduced, given that inversion effects on isolated features are rather small and cannot predict the magnitude of the effect for the whole face (Bartlett, Searcy, & Abdi, 2003; Rakover & Teucher, 1997).

Figure 1

Gaze-contingent stimulus presentation: (A) window condition: here the observer is fixating the mouth, which is then the only feature available. (B) Mask condition: here the observer is fixating the left eye, which is covered.

Importantly for testing our hypothesis, we contrasted this viewing condition with the opposite situation, in which the observer's fixated feature was covered by a gaze-contingent mask. Here, the observer could not rely on the fixated feature in central vision but could benefit from the rest of the face (Figure 1B). Even though the observer could still process one element at a time (e.g., the element next to the mask border), the most adaptive strategy in this condition is to rely on the whole face, visible around the mask. Our prediction, derived from the holistic account of the face inversion effect, was therefore that the inversion effect should be substantially increased under this central feature masking condition. This is because the mask would prevent the optimal featural strategy for inverted faces, while at this orientation the observer would be unable to extract information from the whole face.

We tested 16 observers, who had to encode a face shown in full view and then match it to one of two face photographs presented side by side, either in full view (baseline condition), with only the central window of vision revealed (window condition), or with it masked (mask condition). Faces were presented either upright or inverted.

Methods

Participants

Sixteen naïve participants took part in the experiment. All had normal or corrected-to-normal visual acuity and were tested individually in the presence of the experimenter. Each participant also completed the computerized version of the Benton Face Recognition task (Benton & Van Allen, 1972) to ensure normal face recognition skills. All participants performed within the normal range (all scores above 75%).

Procedure/experimental setup

A delayed matching task was conducted in which a 1000-ms presentation of an unknown adult reference face was followed by two faces presented side by side, one of which corresponded to the target face. The participant's task was to indicate the target. The eye movements of the participant were continuously monitored.

The course of a trial is presented in Figure 2. Each trial started with a standard drift correction using a central fixation cross, which corrected small deviations from the original calibration due to, for instance, small head movements. This was followed by a fixation cross to the left of a gray-scale blurred face, consisting of the average of all faces, which indicated the position and orientation of the first face. The participant was instructed to fixate the fixation cross, which caused it to disappear, and then to saccade to the face. As soon as the participant fixated the blurred face, it changed into the target face that had to be encoded. After 1000 ms, this face was replaced by two faces in the same orientation as the first face, one on each side of the screen. The participant could freely explore both faces for an unlimited time. In one third of the trials, the faces were completely visible (full view condition); in another third, a gaze-contingent mask covered the central part of the visual field (mask condition); and in the remaining third, only the fixated feature was visible through a window (window condition). The position of the mask and window was continuously adjusted to the participant's gaze position. During the exploration of the faces, the face that was not fixated was replaced by the average face, in order to provide a reference frame for planning saccades to the non-fixated face in all viewing conditions. This also ensured that the amount of information available from the non-fixated face while exploring the fixated one was similar in all three viewing conditions. Responses were given by pressing the left or right response key on the keyboard.
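The display rule described above can be summarized, for one gaze sample, as a per-pixel decision. The Python sketch below is illustrative only and is not the software used in the study (stimuli were presented with Presentation): the names `Aperture`, `fixated_side`, and `pixel_source` are hypothetical, the fixated face is assumed to be the one on the same side of the screen midline as the gaze, and occluded regions are assumed to show a uniform background.

```python
from dataclasses import dataclass
from typing import Literal, Tuple

# What to draw at a pixel: the real face image, the average face, or the
# uniform background (what fills occluded regions is an assumption here).
Source = Literal["face", "average", "background"]
Condition = Literal["full", "window", "mask"]


@dataclass(frozen=True)
class Aperture:
    """Elliptical gaze-contingent aperture with semi-axes in pixels."""
    semi_x: float
    semi_y: float

    def contains(self, px: float, py: float, gx: float, gy: float) -> bool:
        # Point-in-ellipse test, with the ellipse centered on the gaze position.
        return ((px - gx) / self.semi_x) ** 2 + ((py - gy) / self.semi_y) ** 2 <= 1.0


def fixated_side(gaze_x: float, screen_center_x: float) -> str:
    """Hypothetical rule: the fixated face is the one on the gaze's side of the midline."""
    return "left" if gaze_x < screen_center_x else "right"


def pixel_source(px: float, py: float, in_fixated_face: bool,
                 gaze: Tuple[float, float], condition: Condition,
                 aperture: Aperture) -> Source:
    """Decide what to draw at a pixel lying inside one of the two face regions."""
    if not in_fixated_face:
        # The non-fixated face is always replaced by the average face.
        return "average"
    inside = aperture.contains(px, py, gaze[0], gaze[1])
    if condition == "full":
        return "face"
    if condition == "window":
        # Only the region around the gaze position shows the face.
        return "face" if inside else "background"
    if condition == "mask":
        # Everything except the region around the gaze position shows the face.
        return "background" if inside else "face"
    raise ValueError(f"unknown viewing condition: {condition!r}")
```

In an actual display loop, the same rule would be applied to whole images on each screen refresh, using the most recent gaze sample.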

Figure 2

Course of a trial for the upright and the inverted condition. An average non-diagnostic face was presented (in gray levels) and participants had to fixate the cross outside of the face area to get the first face stimulus to encode for 1 s. When the face disappeared, it was replaced by two different individual faces presented side by side. The individual face (target or distractor) was displayed (in full, window or mask) only when the participant fixated it, while the non-fixated face was replaced temporarily by the average face. Above: window condition, upright orientation. Below: mask condition, inverted orientation.

The stimulus set contained 10 male and 10 female faces (KDEF database; Lundquist, Flykt, & Öhman, 1998) from which the external features were removed while the individual head shape of each face was preserved. For each face, there were two pictures, taken at different moments and therefore differing slightly from each other, including in lighting conditions. The faces were randomly combined in pairs of two males or two females photographed under the same lighting condition, which differed from the lighting condition of the face to be encoded, in order to prevent participants from using image-based color and pigmentation cues to select the target face.
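The pairing constraints can be expressed as a small trial generator. This is a sketch under stated assumptions, not the authors' randomization code: the lighting labels `"A"`/`"B"`, the random choice of which photograph is encoded, the random side of the target, and the names `Photo`, `Trial`, and `make_trial` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict

LIGHTINGS = ("A", "B")  # hypothetical labels for the two photographs of each identity


@dataclass(frozen=True)
class Photo:
    identity: str
    sex: str       # "M" or "F"
    lighting: str  # one of LIGHTINGS


@dataclass(frozen=True)
class Trial:
    encode: Photo
    left: Photo
    right: Photo
    target_side: str  # "left" or "right"


def make_trial(sex_of: Dict[str, str], target: str, rng: random.Random) -> Trial:
    """Build one matching trial for `target`.

    sex_of maps each identity to its sex. The distractor has the same sex as the
    target, and both test photos share one lighting that differs from the
    lighting of the encoded photo.
    """
    sex = sex_of[target]
    encode_lighting = rng.choice(LIGHTINGS)
    test_lighting = LIGHTINGS[1] if encode_lighting == LIGHTINGS[0] else LIGHTINGS[0]
    distractors = [i for i, s in sex_of.items() if s == sex and i != target]
    distractor = rng.choice(distractors)
    encoded = Photo(target, sex, encode_lighting)
    target_photo = Photo(target, sex, test_lighting)
    distractor_photo = Photo(distractor, sex, test_lighting)
    # Randomize the screen side of the target (assumption).
    if rng.random() < 0.5:
        return Trial(encoded, target_photo, distractor_photo, "left")
    return Trial(encoded, distractor_photo, target_photo, "right")
```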

Apparatus

The stimuli were displayed using Presentation software on a 22-in. Sony Trinitron monitor at a viewing distance of 58 cm, with a spatial resolution of 1600 × 1200 pixels and a refresh rate of 85 Hz. The faces were 11 deg in height, the distance between the inner borders of the two faces was approximately 6 deg, and the elliptical window and mask subtended 4 deg horizontally by 3 deg vertically. Both stimulus display and response registration were handled by an Intel Centrino vPro computer. Eye movements were recorded with an SR Research EyeLink 1000 remote eye tracker at a sampling rate of 250 Hz, with a gaze position error smaller than 0.5°. Head movements were restricted by a chin and head rest.
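Because all sizes are reported in degrees of visual angle, driving a gaze-contingent aperture requires converting them to centimeters and then to pixels, using size = 2 × distance × tan(angle / 2). The helpers below are a sketch; the visible width of the display is not reported, so `screen_width_cm` is a placeholder that would have to be measured, and the function names are hypothetical.

```python
import math


def deg_to_cm(size_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) of a centered stimulus subtending size_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(size_deg) / 2.0)


def cm_to_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (deg) subtended by size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def deg_to_px(size_deg: float, distance_cm: float,
              screen_width_cm: float, screen_width_px: int) -> float:
    """Convert a visual angle to pixels.

    screen_width_cm is an assumption: the visible width of the display,
    which must be measured on the actual monitor.
    """
    px_per_cm = screen_width_px / screen_width_cm
    return deg_to_cm(size_deg, distance_cm) * px_per_cm
```

For example, `deg_to_cm(11, 58)` gives about 11.2 cm for the face height. The same physical size subtends a smaller angle at a larger viewing distance, which is how the replication experiment reduced the retinal size of the stimuli without changing the display.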

The experiment was subdivided into 8 blocks of 36 trials, alternating between blocks with upright and blocks with inverted faces. Within each block, each of the three viewing conditions occurred 12 times in a random and unpredictable order. In this way, the viewing condition following the exploration of the first face was unpredictable and could therefore not influence the perceptual strategies used to encode the first face. Participants were instructed to respond as quickly and accurately as possible. Only response times from trials with a correct response and within 3 standard deviations of each participant's mean were included in the response time analysis.
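The block structure and the RT exclusion rule can be sketched as follows. The functions are hypothetical illustrations; the orientation of the first block and the exact set of trials over which the mean and SD were computed (here, each participant's correct trials) are assumptions, since they are not reported.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

CONDITIONS = ("full", "window", "mask")


def make_schedule(n_blocks: int = 8, per_condition: int = 12,
                  first_orientation: str = "upright",
                  rng: Optional[random.Random] = None) -> List[Tuple[int, str, str]]:
    """Return (block, orientation, viewing condition) for every trial.

    Blocks alternate between orientations; within a block, each viewing
    condition appears `per_condition` times in shuffled order.
    """
    if rng is None:
        rng = random.Random()
    second = "inverted" if first_orientation == "upright" else "upright"
    schedule: List[Tuple[int, str, str]] = []
    for block in range(n_blocks):
        orientation = first_orientation if block % 2 == 0 else second
        trials = [c for c in CONDITIONS for _ in range(per_condition)]
        rng.shuffle(trials)
        schedule.extend((block, orientation, c) for c in trials)
    return schedule


def trimmed_correct_rts(rts: Sequence[float], correct: Sequence[bool],
                        n_sd: float = 3.0) -> List[float]:
    """Keep one participant's correct-trial RTs lying within n_sd SDs of their mean.

    Assumption: the mean and SD are computed over that participant's correct trials.
    """
    kept = [rt for rt, ok in zip(rts, correct) if ok]
    mean, sd = statistics.mean(kept), statistics.stdev(kept)
    return [rt for rt in kept if abs(rt - mean) <= n_sd * sd]
```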

Results

Performance

The average accuracy (proportion correct) and response times (i.e., time between stimulus appearance and response) for each viewing condition and orientation are shown in Figures 3 and 4. All accuracy scores were well above chance. For faces presented in full view, accuracy decreased by about 5% and response times increased by about 350 ms (about 30%) when the faces were inverted. In the window condition, performance appeared to decrease much less (a 1.5% decrease in accuracy and an increase of about 250 ms in response times). In the mask condition, the decrease in accuracy almost doubled (9.5%), and the increase in RTs was also more pronounced (about 500 ms). Statistical analyses largely confirmed these observations.

Figure 3

Accuracy in proportion correct responses for each combination of viewing condition and orientation. Compared to the full view condition, the drop in accuracy with inversion was increased in the mask condition and decreased in the window condition. The error bars represent standard errors.

Figure 4

Response times in milliseconds for each combination of viewing condition and orientation. Compared to the full view condition, the increase in RT with inversion was further increased in the mask condition. The error bars represent standard errors.

A within-subjects analysis of variance (ANOVA) on accuracy rates showed a main effect of orientation, with higher accuracy for upright than for inverted faces, F(1, 15) = 12.89, p = 0.003, and a main effect of viewing condition, F(2, 30) = 28.09, p < 0.001. These effects were qualified by a significant interaction between viewing condition and orientation, F(2, 30) = 3.87, p = 0.032. Tukey-adjusted pairwise comparisons showed that, when the faces were upright, accuracy did not differ between the full view and mask conditions (p = 0.882), and both the full view and the mask conditions led to better performance than the window condition (p < 0.001 for both). However, and most importantly for our hypothesis, when the face was inverted, participants performed worse in the mask condition than in the full view condition (p = 0.027). For inverted faces, accuracy was higher in the full view than in the window condition (p < 0.001) but did not differ between the mask and window conditions (p = 0.121). While there were large significant effects of inversion in both the full view and mask conditions (p < 0.001), there was no significant effect in the window condition (p = 0.328).
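All analyses above are 3 (viewing condition) × 2 (orientation) within-subjects ANOVAs. As a transparent sketch of what such an analysis computes, the F ratios can be obtained from cell means, testing each effect against its own effect-by-subject interaction. This is the textbook univariate approach; whether a sphericity correction was applied in the reported analyses is not stated, and none is applied here. p-values and Tukey-adjusted comparisons are omitted, and the function name `rm_anova_2way` is hypothetical. With 16 participants, the degrees of freedom match those reported: (2, 30) for viewing condition and the interaction, (1, 15) for orientation.

```python
from typing import Dict, Sequence, Tuple

Cube = Sequence[Sequence[Sequence[float]]]  # y[subject][level of A][level of B]


def rm_anova_2way(y: Cube) -> Dict[str, Tuple[float, int, int]]:
    """F ratios (F, df_effect, df_error) for a two-way within-subjects design.

    One observation per subject and cell (e.g., a participant's mean in that
    cell). Each effect is tested against its effect-by-subject interaction.
    """
    n, a, b = len(y), len(y[0]), len(y[0][0])
    S, A, B = range(n), range(a), range(b)
    grand = sum(y[s][i][j] for s in S for i in A for j in B) / (n * a * b)
    m_s = [sum(y[s][i][j] for i in A for j in B) / (a * b) for s in S]
    m_a = [sum(y[s][i][j] for s in S for j in B) / (n * b) for i in A]
    m_b = [sum(y[s][i][j] for s in S for i in A) / (n * a) for j in B]
    m_sa = [[sum(y[s][i][j] for j in B) / b for i in A] for s in S]
    m_sb = [[sum(y[s][i][j] for i in A) / a for j in B] for s in S]
    m_ab = [[sum(y[s][i][j] for s in S) / n for j in B] for i in A]

    # Sums of squares for effects and their subject-interaction error terms.
    ss_a = n * b * sum((m_a[i] - grand) ** 2 for i in A)
    ss_b = n * a * sum((m_b[j] - grand) ** 2 for j in B)
    ss_as = b * sum((m_sa[s][i] - m_s[s] - m_a[i] + grand) ** 2 for s in S for i in A)
    ss_bs = a * sum((m_sb[s][j] - m_s[s] - m_b[j] + grand) ** 2 for s in S for j in B)
    ss_ab = n * sum((m_ab[i][j] - m_a[i] - m_b[j] + grand) ** 2 for i in A for j in B)
    ss_abs = sum((y[s][i][j] - m_sa[s][i] - m_sb[s][j] - m_ab[i][j]
                  + m_s[s] + m_a[i] + m_b[j] - grand) ** 2
                 for s in S for i in A for j in B)

    def f_ratio(ss_eff: float, df_eff: int, ss_err: float, df_err: int) -> Tuple[float, int, int]:
        return (ss_eff / df_eff) / (ss_err / df_err), df_eff, df_err

    return {
        "A": f_ratio(ss_a, a - 1, ss_as, (a - 1) * (n - 1)),
        "B": f_ratio(ss_b, b - 1, ss_bs, (b - 1) * (n - 1)),
        "AxB": f_ratio(ss_ab, (a - 1) * (b - 1), ss_abs, (a - 1) * (b - 1) * (n - 1)),
    }
```

With factor A as viewing condition (3 levels) and B as orientation (2 levels), the "AxB" entry corresponds to the interaction tested above.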

The inversion effect, defined as the difference in accuracy between upright and inverted faces (upright minus inverted), was significantly larger in the mask than in the window condition (p = 0.026). The inversion effect with full view did not significantly differ from the effect with a window (p = 0.408) or with a mask (p = 0.348). In summary, the accuracy data are in line with the hypothesis of an increase in the magnitude of the inversion effect in the central mask condition and a decrease of the effect in the window condition, the difference between these two conditions being significant.
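Under this definition, the inversion effect is a per-participant difference score computed for each viewing condition; comparing it between two conditions (e.g., mask vs. window) amounts to testing a 2 × 2 interaction contrast. A minimal sketch, with a hypothetical function name and data layout:

```python
from statistics import mean
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (viewing condition, orientation)


def inversion_effects(acc_by_participant: List[Dict[Cell, float]]) -> Dict[str, float]:
    """Mean inversion effect (upright minus inverted accuracy) per viewing condition.

    Each dict holds one participant's accuracy per (condition, orientation) cell.
    Positive values mean better performance for upright faces.
    """
    conditions = sorted({cond for cond, _ in acc_by_participant[0]})
    return {
        cond: mean(p[(cond, "upright")] - p[(cond, "inverted")] for p in acc_by_participant)
        for cond in conditions
    }
```

For response times and inverse efficiency, where higher values mean worse performance, the sign would be reversed (inverted minus upright) so that a positive value still denotes an inversion cost.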

A within-subjects ANOVA on correct RTs showed a main effect of orientation, with faster responses for upright than for inverted faces, F(1, 15) = 23.16, p < 0.001, and a main effect of viewing condition, F(2, 30) = 51.48, p < 0.001. These effects were qualified by a significant interaction between viewing condition and orientation, F(2, 30) = 3.87, p < 0.023. Pairwise comparisons with Tukey adjustment showed that RTs at upright orientation were faster in full view than in the mask and window conditions (p < 0.001) and faster in the mask than in the window condition (p < 0.001). These differences were also found at inverted orientation (p < 0.001), and there were significant effects of inversion in all conditions (p < 0.001 for the full view and mask conditions, p < 0.049 for the window condition). The magnitude of the inversion effect on response speed was larger for the mask condition than for the full view condition (p = 0.037) and marginally larger for the mask than for the window condition (p = 0.068). The full view and window conditions did not significantly differ from each other (p = 0.959). The data on correct RTs were thus consistent with the accuracy rates.

Since accuracy and RT each carry part of the effects of interest, we also computed inverse efficiency scores (response time in seconds divided by accuracy as proportion correct; Townsend & Ashby, 1983), which provide a global measure of performance in the task for the different conditions. Lower inverse efficiency scores correspond to better performance. A within-subjects ANOVA on the inverse efficiency scores showed a main effect of orientation, with better performance for upright than for inverted faces, F(1, 15) = 14.87, p = 0.002, and a main effect of viewing condition, F(2, 30) = 52.16, p < 0.001. These effects were qualified by a significant interaction between viewing condition and orientation, F(2, 30) = 4.74, p < 0.016. Tukey-adjusted pairwise comparisons showed that performance at upright orientation was better both in full view and with a mask than in the window condition (p < 0.001). Performance with the mask did not significantly differ from that in full view (p = 0.159). At inverted orientation, performance in full view was better than with a mask (p < 0.001) or a window (p < 0.001). Performance with a mask was also better than with a window (p < 0.001). Performance was better for upright than for inverted faces in both the full view and the mask conditions (p < 0.001 for both), but not in the window condition (p = 0.198). The inversion effect was larger for the mask condition than for the full view condition (p = 0.033) and marginally larger for the mask condition than for the window condition (p = 0.063). The inversion effects in the full view and window conditions did not differ from each other (p = 0.958).
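The inverse efficiency score is a simple ratio. The helper below is a sketch assuming it is computed per participant and cell from the mean correct (trimmed) RT and the proportion correct; the function name is hypothetical.

```python
def inverse_efficiency(mean_correct_rt_ms: float, proportion_correct: float) -> float:
    """Inverse efficiency score: mean correct RT (s) / proportion correct; lower is better."""
    if not 0.0 < proportion_correct <= 1.0:
        raise ValueError("proportion_correct must be in (0, 1]")
    return (mean_correct_rt_ms / 1000.0) / proportion_correct
```

For example, a mean correct RT of 1200 ms at 80% correct gives 1.2 / 0.8 = 1.5 s, the same score as 1500 ms with perfect accuracy; this is how the measure folds speed and accuracy into one number.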

In summary, the performance data showed that, compared to the full view (i.e., baseline) condition, observers' performance for upright faces was more affected in the window condition than in the mask condition. However, the inversion effect was almost non-existent in the window condition, while it was even increased in the mask condition, supporting the holistic hypothesis of the face inversion effect.

Eye movements

Fixation patterns for the different conditions are shown in Figure 5. There were no clear and systematic differences between conditions and orientations in the location of fixations, with participants mostly fixating the eye region (e.g., Althoff & Cohen, 1999; Henderson, Williams, & Falk, 2005; Yarbus, 1967). Interestingly, however, observers also seemed to fixate the center of the face, in between features, slightly more often in the full view and mask conditions, at least for the face displayed on the left, and only at upright orientation (Figure 5).

Figure 5

Heat maps with the Z-score values of the relative number of fixations per trial on a given screen position on the stimuli for each condition separately. Fixation positions were smoothed using a Gaussian filter with a sigma of 30 pixels in order to account for the fixation position variability when fixating a certain point. The heat maps for the inverted faces were horizontally flipped in order to facilitate comparison with the upright faces. Only trials resulting in a correct response were included.
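The heat map computation described in the caption can be sketched in plain Python. Assumptions: fixation coordinates are already in pixel units of the map, the "relative number of fixations per trial" is implemented as counts divided by the number of included trials, and z-scoring is done across all pixels of a map. Mirroring the inverted-face maps and restricting the input to correct trials are left to the caller, and the function names are hypothetical. This version is only practical for small grids; at full screen resolution a vectorized filter such as `scipy.ndimage.gaussian_filter` would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_1d(values: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Same-size convolution with a symmetric kernel; samples outside the grid count as zero."""
    r, n = len(kernel) // 2, len(values)
    return [sum(w * values[x + k - r] for k, w in enumerate(kernel) if 0 <= x + k - r < n)
            for x in range(n)]


def fixation_heatmap(fixations: Sequence[Tuple[float, float]], n_trials: int,
                     width: int, height: int, sigma: float = 30.0) -> Grid:
    """Z-scored, Gaussian-smoothed map of fixations per trial, indexed [y][x]."""
    counts = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        x, y = int(round(fx)), int(round(fy))
        if 0 <= x < width and 0 <= y < height:
            counts[y][x] += 1.0 / n_trials  # relative number of fixations per trial
    kernel = gaussian_kernel(sigma)
    rows = [convolve_1d(row, kernel) for row in counts]  # smooth along x
    cols = [convolve_1d([rows[y][x] for y in range(height)], kernel)  # then along y
            for x in range(width)]
    smoothed = [[cols[x][y] for x in range(width)] for y in range(height)]
    flat = [v for row in smoothed for v in row]
    mu = sum(flat) / len(flat)
    sd = math.sqrt(sum((v - mu) ** 2 for v in flat) / len(flat))
    if sd == 0.0:
        raise ValueError("no fixations inside the map")
    return [[(v - mu) / sd for v in row] for row in smoothed]
```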

Given the small differences in fixation location between conditions, the quantitative analysis of eye movements was restricted to fixation duration and number of fixations.

For fixation duration, only the main effect of viewing condition was significant, F(2, 30) = 51.36, p < 0.001, with no reliable interaction between condition and orientation (p > 0.05). Fixations were significantly shorter with a mask (M = 268.95 ms) than with full view (M = 301.41 ms, p < 0.007), and shorter with full view than with a window (M = 363.34 ms, p < 0.001).

There were main effects of viewing condition, F(2, 30) = 27.83, p < 0.001, and orientation, F(1, 15) = 17.07, p < 0.001, on the number of fixations, as well as an interaction between the two factors, F(2, 30) = 9.64, p < 0.001. With a mask, participants made more fixations when faces were inverted (M = 8.71) than when they were upright (M = 5.99, p < 0.001). For the window (upright: M = 7.14; inverted: M = 8.03) and full view conditions (upright: M = 4.13; inverted: M = 5.07), the effect of inversion was not significant (p > 0.05). The fact that participants made more fixations on inverted than on upright faces only in the mask condition further supports the behavioral data, indicating that the information used with a mask for upright faces cannot easily be extracted from inverted faces.

Replication: Smaller retinal size

As predicted by the holistic account of the face inversion effect, the data showed a larger inversion effect with a mask covering the fixated feature of the face than with a window revealing only that feature.

In order to strengthen and generalize these observations, we conducted an exact replication of this experiment with a new group of participants, with faces presented at a much smaller retinal size. To do so, we simply increased the distance between the participant and the display. Consequently, the absolute (angular) size of the window/mask was much smaller, but its size relative to the face stimulus was identical (i.e., it still revealed or covered one feature at a time). The rationale behind this experiment was that, if the decrease and increase of the inversion effect in the window and mask conditions, respectively, are face-based phenomena, they should depend on the size of the window/mask relative to the face rather than on the absolute extent of the visual field that is revealed or masked.

The experimental procedure and stimuli were identical to the original experiment. The distance between the monitor and the participant was increased to 95 cm. The height of the faces then subtended 6.5 deg, and the size of the window and mask was 2.3 by 1.8 deg.

Ten new participants (mean age 21 years, 8 females) each completed four blocks of 48 trials (two with upright and two with inverted faces), with 15 trials of each viewing condition in a random and therefore unpredictable order. All participants had normal or corrected-to-normal visual acuity.

As illustrated in Figure 6, the results of the experiment with a smaller retinal size of the mask/window largely replicated the findings of the original study.

Figure 6

Accuracy in proportion correct responses and response times in milliseconds for each combination of viewing condition and orientation in the replication experiment with smaller retinal size of the stimuli. Note the similar pattern of performance as for the larger stimuli, in particular the increased/decreased inversion effect in the mask/window conditions respectively. The error bars represent standard errors.

An ANOVA on accuracy scores showed main effects of viewing condition, F(2, 18) = 26.56, p < 0.001, and orientation, F(1, 9) = 6.90, p = 0.028, and a significant interaction between the two factors, F(2, 18) = 3.84, p = 0.041. When the face was upright, participants made more mistakes with a window than in full view (p < 0.001) and with a mask (p < 0.001). The difference between the full view and mask conditions was not significant (p = 0.229). For inverted faces, performance was better in full view than in the window (p = 0.007) and mask (p = 0.018) conditions. Performance in the latter two conditions did not differ significantly (p > 0.05).

The inversion effect was significant only in full view, F(1, 18) = 9.13, p = 0.007, and with a mask, F(1, 18) = 13.20, p = 0.002. In contrast, with a window, performance did not significantly decrease when faces were inverted, F(1, 18) = 0.35, p = 0.561. The inversion effect was larger in the mask condition than in the full view (p = 0.018) and window (p = 0.036) conditions. The difference between the full view and window conditions was not significant (p = 0.947).

An ANOVA on correct response times revealed significant main effects of viewing condition, F(2, 18) = 12.42, p < 0.001, and orientation, F(1, 9) = 15.58, p = 0.003, as well as a significant interaction between the two factors, F(2, 18) = 5.22, p = 0.016. Participants were slower with a window than in full view for both upright and inverted faces (p = 0.002 and p < 0.001, respectively) and slower with a window than with a mask for upright faces only (p = 0.013; for inverted faces p = 0.829). Response times were longer with a window than with a mask for upright (p = 0.001) but not for inverted faces (p = 0.755). The increase in response times with inversion was significant only with a mask, F(1, 18) = 34.88, p < 0.001, and not with full view, F(1, 18) = 1.10, p = 0.309, or with a window, F(1, 18) = 0.78, p = 0.388. The inversion effect was smaller with a window than with full view (p < 0.001) or with a mask (p = 0.005). The difference between the latter two conditions was not significant (p = 0.535).

An ANOVA on inverse efficiency scores showed significant main effects of viewing condition, F(2, 18) = 53.17, p < 0.001, and orientation, F(1, 9) = 12.12, p = 0.007, and a significant interaction between the two factors, F(2, 18) = 11.37, p < 0.001. The inversion effect was significant in full view, F(1, 18) = 6.92, p = 0.017, and with a mask, F(1, 18) = 33.88, p < 0.001, but not with a window, F(1, 18) = 0.59, p = 0.451. Furthermore, the inversion effect with a window was significantly smaller than with full view (p = 0.032) or with a mask (p < 0.001). The inversion effect was largest in the mask condition, but it was not significantly larger than in full view (p = 0.113).

In summary, the data of this second experiment, in which the absolute size of the stimulus (and of the window/mask) was smaller than in the main experiment, again supported the hypothesis by showing an increased inversion effect in the mask condition and an almost non-existent inversion effect with a window. This phenomenon is thus largely independent of the absolute size of the window/mask and depends instead on their size relative to the face stimulus (i.e., the diagnostic area covered on the face).

Discussion

Viewing one feature at a time reduces the inversion effect

With faces presented in full view, we replicated in our task the well-known face inversion effect observed in several behavioral studies: with the exact same stimuli to discriminate, observers performed better and faster with upright than with inverted faces. This inversion effect was observed despite the fact that the task was quite simple (forced-choice discrimination of two individual faces, without time limit). That the inversion effect in the full view condition amounted to a drop of only a few percent in accuracy is most likely related to the relative ease of the task. This was done on purpose, in order to obtain a high level of performance at upright orientation in all conditions. Nevertheless, the inversion effect on accuracy was highly significant, and inversion also significantly slowed down participants in full view (an RT increase of about 30%).

When observers' vision was constrained so that they could see only through a small central window, there was almost no inversion effect: there was no significant effect on accuracy and only a small increase in response times. A similar reduction of the inversion effect with restricted windows of vision was reported in a previous study (Endo, 1986), but not with gaze-contingent stimulation allowing the observer to extract the most diagnostic local information online. This observation indicates that the inversion effect is not caused primarily by a difficulty in perceiving local, detailed facial features (Sekuler, Gaspar, Gold, & Bennett, 2004), which could be seen at the highest (foveal) resolution in the window condition. In this condition, the observer can extract local information from one feature at a time but is unable to see two or more features (e.g., an eye and the mouth, or the two eyes) simultaneously. Thus, the near absence of an inversion cost with a gaze-contingent window supports the long-standing hypothesis that inverted face recognition is difficult because observers are unable to extract diagnostic information simultaneously from different locations on an inverted face (Endo, 1986; Farah et al., 1995; Rossion, 2008, 2009; Sergent, 1984; Yin, 1969). Consequently, we argue that this observation demonstrates that holistic face perception is impaired for inverted faces. Indeed, our experimental manipulation reproduced exactly what the holistic account of the face inversion effect hypothesizes: when dealing with a full inverted face, the observer processes one feature of the face at a time. In other words, inversion appears to constrict the functional visual field, or perceptual field, which can be defined here as the area of vision from which the observer can extract diagnostic information for individual face recognition (Rossion, 2008, 2009).
Importantly, this reduction of the perceptual field is relative to the face stimulus rather than being absolute, as indicated by the replication of the present study with a smaller retinal size of stimulation.

Previous studies showed inversion costs for “isolated” features (e.g., Rakover & Teucher, 1997), but unless such a “feature” encompasses several elements of the face in a local configuration (e.g., the two eyes and eyebrows; Leder & Bruce, 2000; Leder, Candrian, Huber, & Bruce, 2001), these effects are relatively small. Most importantly, inversion effects for isolated features cannot explain the full face inversion effect for the whole face (Bartlett et al., 2003). The present observations, made with a novel gaze-contingent method, go beyond these previous studies because, in the window condition, the observer was not constrained in terms of the local information that he/she could use: participants could focus on a single feature, such as the mouth, one eye, or the nose, or on many features in succession to recognize the target face. The only constraint was that a single feature, selected by the observers themselves, was available at a time on a potential target face.

The present observations also explain why patterns of gaze fixations on full faces may be highly similar for upright and inverted faces despite lower performance for inverted faces (Barton, Radcliffe, Cherkasova, Edelman, & Intriligator, 2006; Williams & Henderson, 2007; Figure 5 in the present study). With upright faces, the observer can fixate the eyes, the most salient and socially relevant area of the face, and still extract diagnostic information from the mouth at the same time. When the face is upside down, roughly the same dominant gaze fixation on the eyes may be observed (Barton et al., 2006; Williams & Henderson, 2007; Figure 5 here), yet our data suggest that under these conditions the observer can extract diagnostic information only from a small area around each eye, so that his/her performance is greatly affected.

Interestingly, we also noted that in the full view and mask conditions, observers in the present study fixated towards the center of the face, in between the eyes, rather than on each eye per se. This was not the case in the window condition (Figure 5). This difference between conditions disappeared for inverted faces, for which observers fixated on the eyes in all conditions (Figure 5). This observation supports the recent proposal that such central fixations, slightly below and in between the eyes, may be optimal for fast recognition based on an overall appreciation of the whole face (i.e., the “center of mass of the face”; Hsiao & Cottrell, 2008; Orban de Xivry, Ramon, Lefèvre, & Rossion, 2008). It can be taken as a further indication that such holistic processing of the individual face was impaired in the window condition, and in all conditions when faces were presented upside down.

Increase of the face inversion effect when the fixated feature is invisible

As observed previously without gaze-contingent stimulation (Endo, 1986; Inui & Miyamoto, 1984), observers' performance for upright faces was already lower with a small window than in full view. One could therefore argue that performance in the window condition simply could not be reduced further by inversion (i.e., a floor effect). However, the task was quite easy and, given the long (first) and unlimited (second) stimulus presentation durations, participants could perform quite well even in the window condition, leaving ample room for a further decrease in performance with inversion. Had we used only this comparison (full view vs. window), another possible objection to our interpretation would be that less information is available in the window condition, causing a smaller inversion cost. However, the comparison of the full view with the mask condition rules out this alternative explanation. In the mask condition, the observer could not rely on the fixated feature of the face, so less information was also available than in full view. Nevertheless, for upright faces, performance was much less affected with the mask than with the window. Most importantly, the inversion effect was significantly larger in the mask condition than in full view. To our knowledge, this is the first report of an increased face inversion effect obtained without altering the properties of the face itself (i.e., shape, spatial frequencies, diagnosticity of certain features or their relationships, etc.). Our manipulation only changed the way in which this information could be sampled by the observer. This observation indicates that what accounts for the inversion effect is not the amount of available information or its resolution, but how this information is sampled.
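The comparison in this paragraph rests on a difference of differences: the inversion cost in one viewing condition minus the inversion cost in full view. The sketch below makes that arithmetic explicit. It is a minimal illustration under stated assumptions: the condition labels ("full", "window", "mask") and the dictionary layout are hypothetical, and the code only computes descriptive contrasts; it does not reproduce the statistical test behind the reported significant increase.

```python
from typing import Dict, Tuple

# Hypothetical data layout: (viewing condition, orientation) -> condition mean.
Scores = Dict[Tuple[str, str], float]


def inversion_effect(scores: Scores, condition: str, higher_is_better: bool = True) -> float:
    """Inversion cost for one viewing condition.

    For accuracy (higher is better) the cost is upright minus inverted;
    for response times (lower is better) it is inverted minus upright.
    A positive value therefore always means worse performance for inverted faces.
    """
    upright = scores[(condition, "upright")]
    inverted = scores[(condition, "inverted")]
    return upright - inverted if higher_is_better else inverted - upright


def inversion_effect_change(scores: Scores, condition: str, baseline: str = "full",
                            higher_is_better: bool = True) -> float:
    """Difference of differences: how much larger the inversion cost is in
    `condition` than in `baseline` (the condition x orientation interaction)."""
    return (inversion_effect(scores, condition, higher_is_better)
            - inversion_effect(scores, baseline, higher_is_better))
```

With condition means such as those plotted in Figure 3, a positive `inversion_effect_change(scores, "mask")` together with a negative value for `"window"` would correspond to the pattern described above.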

The increased inversion effect in the mask condition provides additional, complementary support for the holistic account of the face inversion effect. When central vision is masked, the optimal strategy for the observer is to extract information from the rest of the face, which remains visible outside the mask (Figure 1B). We cannot exclude that the observer instead concentrated on a single feature at the border of the mask (e.g., using the mouth only, not the eyes, when the mask is on the nose). However, the data suggest that participants did not rely on this strategy for upright faces. Such a piecemeal strategy in the mask condition would allow them to perform at the level of the window condition at best, but not better. In fact, if participants used only one feature at a time in the mask condition, they should have performed worse than in the window condition, because the single feature they attended would be seen at lower resolution in the mask condition (outside the fovea centralis) than in the window condition. Instead, the higher performance in the mask than in the window condition for upright faces suggests that, in the mask condition, observers extracted information from several features available simultaneously over the whole face. Consequently, the largest decrease in performance for inverted faces in the mask condition fully supports the holistic account of the face inversion effect: participants were unable to rely on multiple features simultaneously for inverted faces.

The consequences of holistic perception disruption by inversion

A large number of behavioral studies have reported that, when observers have to discriminate between photographs of individual faces, inversion effects are more pronounced when the faces differ in the relative distances between their features (e.g., nose-mouth distance, interocular distance) than when they differ in the shape, color, or texture of local features (e.g., a blue round eye vs. a brown oval eye; Barton, Keenan, & Bass, 2001; Boutet, Collin, & Faubert, 2003; Cabeza & Kato, 2000; Collishaw & Hole, 2000; Freire, Lee, & Symons, 2000; Goffaux, 2008; Goffaux & Rossion, 2007; Leder & Bruce, 2000; Leder et al., 2001; Le Grand, Maurer, Mondloch, & Brent, 2001; Malcom, Leung, & Barton, 2005; Rhodes, Hayward, & Winkler, 2006; Searcy & Bartlett, 1996; but see Riesenhuber, Jarudi, Gilad, & Sinha, 2004; Yovel & Kanwisher, 2004). These observations are easier to understand if we assume that inverting the face constricts the perceptual field of the observer. Indeed, differences in the relative distances between features, which require considering multiple elements simultaneously (e.g., the two eyes for interocular distance; both the nose and the mouth for nose-mouth distance), should be particularly difficult to perceive with a small spatial window of analysis. This difficulty should be most pronounced when the diagnostic cues for individual face recognition concern long-range distances between features (e.g., eyes–mouth distance; Goffaux & Rossion, 2007; Sekunova & Barton, 2008) or are located far away from the observer's usual point of gaze (e.g., the mouth when fixating the eyes; Malcom et al., 2005; Sekunova & Barton, 2008).
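The perceptual-field argument can be illustrated with a toy geometric model: a relational cue such as interocular distance can only be judged within one fixation if both features fall inside the region from which diagnostic information is extracted. The sketch below is purely illustrative; the feature coordinates, the circular field, and all names are hypothetical and are not measurements from the study.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]

# Hypothetical feature centres in arbitrary face-relative units
# (x grows to the right, y grows downward; origin between the eyes).
TOY_FEATURES: Dict[str, Point] = {
    "left_eye": (-1.0, 0.0),
    "right_eye": (1.0, 0.0),
    "nose": (0.0, 1.0),
    "mouth": (0.0, 2.0),
}


def visible_features(fixation: Point, radius: float,
                     features: Dict[str, Point] = TOY_FEATURES) -> Set[str]:
    """Names of features whose centre lies within `radius` of the fixation point."""
    fx, fy = fixation
    return {name for name, (x, y) in features.items()
            if math.hypot(x - fx, y - fy) <= radius}


def relation_visible(a: str, b: str, fixation: Point, radius: float,
                     features: Dict[str, Point] = TOY_FEATURES) -> bool:
    """A relational cue between features a and b (e.g. their distance) can only be
    judged within a single fixation if both features fall inside the field."""
    seen = visible_features(fixation, radius, features)
    return a in seen and b in seen
```

From the midpoint between the eyes, the interocular relation becomes available at radius 1 in these toy units, whereas the long-range eyes–mouth relation requires radius 2, so shrinking the field removes long-range relations first.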
Other observations, such as the large effects of inversion for faces differing in their global shape (as opposed to differences in local features, Van Belle, De Smet, De Graef, Van Gool, & Verfaillie, 2009; or as opposed to surface reflectance, Jiang, Blanz, & Rossion, 2009) or for face photographs from which local information has been filtered out (low-spatial-frequency faces; Boutet et al., 2003; Collishaw & Hole, 2000; Goffaux & Rossion, 2006; but see Gaspar, Bennett, & Sekuler, 2008; Goffaux, 2008), could also easily be accounted for by a disruption of holistic perception following inversion, i.e., a reduction of the perceptual field (Rossion, 2008, 2009).

A high-level vision account of the inversion effect

Why are human observers able to perceive an upright face more holistically than an inverted face, even though the two visual stimuli are virtually identical except for their picture-plane orientation? The holistic view of the face inversion effect is not only a qualitative account, assuming that different perceptual processes (i.e., holistic vs. analytic) are involved to a different extent for upright and inverted faces, respectively (see Rossion, 2008; Rossion & Boremanse, 2008). It is also a high-level vision account: human observers are capable of perceiving the face at a global scale when it is upright, yet appear to have to rely on a smaller scale of analysis when the exact same visual face stimulus is presented upside down. Importantly for a high-level vision account, the scale of analysis is not absolute but relative to the face stimulus. Here, the size of the central window (and mask) was chosen to reveal (or cover) roughly a single major internal feature of the face at a time,¹ irrespective of the absolute size of the face stimulus. Hence, when the window and mask were much smaller in absolute size (replication experiment), we observed the same pattern of results, because the faces were also made smaller in the same proportion.
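The distinction between absolute (retinal) and relative (face-based) scale can be made concrete with the standard visual-angle formula. In the sketch below, the sizes and viewing distance are hypothetical values chosen for illustration, not the parameters of the present experiments, and the function names are our own.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of `size_cm`
    centred on the line of sight at `distance_cm`: 2 * atan(size / (2 * distance))."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def window_to_face_ratio(window_cm: float, face_cm: float, distance_cm: float) -> float:
    """Window size expressed as a fraction of face size, both in degrees of visual angle."""
    return visual_angle_deg(window_cm, distance_cm) / visual_angle_deg(face_cm, distance_cm)
```

Halving both a hypothetical 4 cm window and a 16 cm face viewed from 60 cm roughly halves the window's visual angle, while its size relative to the face changes by well under 1%, because the arctangent is nearly linear for small angles. In this sense, the window still covers about one feature even though its retinal size is much smaller.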

Although the holistic view of the face inversion effect is a high-level vision account, we acknowledge that the face inversion effect is also constrained by low-level visual properties of the stimulus, such as spatial frequencies (see Sergent, 1986). As such, we consider it not incompatible with, but rather complementary to, the recent low-level visual processing proposal that the face inversion effect in individual face recognition may be related to the degradation of the vertically ordered sequence of horizontal dark and light bands, or “bar codes,” of information making up a face (Dakin & Watt, 2009). That is, years of visual experience with upright faces, i.e., with a systematic organization of horizontal bands co-aligned vertically over the whole face (Dakin & Watt, 2009), could have tuned the face recognition system so that it developed a generic global representation of a face (i.e., a template). Matching this global template to the incoming visual stimulus would form the basis of the global percept of the face. In other words, in this framework, it is the internal representation of the global structure of a face that allows observers to see an upright face holistically (Rossion, 2008, 2009; Rossion & Boremanse, 2008). Following inversion, the vertical ordering of the bar codes is disrupted (Dakin & Watt, 2009; see also Goffaux & Rossion, 2007), so that the visual face stimulus cannot be matched to an experience-derived template. Consequently, the face has to be analyzed sequentially at the level of local elements. Because our data show that the inversion effect can almost be cancelled out when perception is limited to a single feature, they suggest that local, finer-scale bar codes (e.g., vertical stripes within the mouth or eye region of the face; see Dakin & Watt, 2009) do not contribute much to the inversion effect.
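The claim that inversion disrupts the vertical ordering of horizontal bands can be illustrated with a simplified computation. Dakin and Watt (2009) derived their bar codes from orientation-filtered face images; the sketch below does not reproduce that analysis and instead uses row-averaged luminance as a deliberately crude proxy, an assumption made only to show what a 180° rotation does to the top-to-bottom band sequence.

```python
import numpy as np


def row_profile(image: np.ndarray) -> np.ndarray:
    """Crude stand-in for a face 'bar code': mean luminance of each row, top to bottom.

    Averaging along each row keeps only the top-to-bottom sequence of dark and
    light horizontal bands and discards structure within a row (e.g. vertical stripes).
    """
    return image.mean(axis=1)


def invert(image: np.ndarray) -> np.ndarray:
    """Picture-plane inversion, i.e. a 180-degree rotation."""
    return np.rot90(image, 2)
```

Because inversion reverses the row order, the profile of an inverted face is the upright profile read backwards: the same bands appear, but in the wrong vertical sequence for a template built from upright faces.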

Finally, the original gaze-contingent method used here, and our findings with it, have wider implications for understanding the nature of face recognition, both in normal observers and in populations with face recognition difficulties. It has long been proposed that patients with prosopagnosia, the inability to recognize faces following brain damage (acquired prosopagnosia; Bodamer, 1947; Quaglino & Borelli, 1867), have difficulties with holistic face processing (Levine & Calvanio, 1989; Sergent & Villemure, 1989). Yet, as for the face inversion effect, direct evidence supporting this view is lacking. If this view is correct, acquired prosopagnosic patients should show the reverse of the normal pattern of performance for upright faces, i.e., they should be relatively unimpaired in the central window condition but affected mainly when their central vision is masked. Preliminary work in our laboratory supports this view (Van Belle, De Graef, Verfaillie, Busigny, & Rossion, 2009). Similarly, the gaze-contingent technique used here could be used to investigate cases of congenital prosopagnosia or of autism spectrum disorder (ASD), human populations who suffer from a lifelong impairment in face recognition without visible brain damage (e.g., Behrmann & Avidan, 2005; Klin, Jones, Schultz, Volkmar, & Cohen, 2002, respectively), and for whom the nature of the face recognition difficulties remains largely unclear.

Acknowledgments

This research was supported by an ARC grant 07/12-007 (Communauté Française de Belgique-Actions de Recherche Concertées), FWO (Fonds voor Wetenschappelijk Onderzoek-Vlaanderen) project G.0583.05 and project 033816/EU-KP6-IST from the European Community. Bruno Rossion is supported by the Belgian National Fund for Scientific Research (Fonds de la Recherche Scientifique; FNRS). This article presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office. The scientific responsibility rests with its authors. PL was supported by the Fonds National de la Recherche Scientifique, the Fondation pour la Recherche Scientifique Médicale, the European Space Agency (ESA, European Union), and Prodex grant C90232 from Belspo (Belgian Science Policy).

We thank Jessica Tauber and an anonymous reviewer for helpful comments on a previous version of this manuscript.

¹ Note, however, that the entire eye and eyebrow combination could be seen in the window condition, which may explain why participants were still faster for upright than for inverted faces in this condition.

Sekunova, A., & Barton, J. (2008). Long-range and short-range relations in the perception of the vertical position of the eyes in inverted faces. Journal of Experimental Psychology: Human Perception and Performance, 34, 1129–1135.

Figure 1

Gaze-contingent stimulus presentation: (A) window condition: here the observer is fixating the mouth, which is then the only feature available. (B) Mask condition: here the observer is fixating the left eye, which is covered.
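The display logic of Figure 1 can be sketched as a per-frame image operation. This is a minimal illustration, not the software used in the study: the circular hard-edged aperture, the grey fill value, and the function and parameter names are assumptions, and a real implementation must redraw the frame from each new eye-tracker sample with minimal latency.

```python
import numpy as np


def gaze_contingent_frame(image: np.ndarray, gaze_xy: tuple, radius_px: float,
                          mode: str, fill: float = 0.5) -> np.ndarray:
    """Frame to display for one gaze sample (hypothetical helper).

    mode "full":   the face is shown unchanged
    mode "window": only pixels within `radius_px` of the gaze position are visible
    mode "mask":   pixels within `radius_px` of the gaze position are hidden
    Hidden pixels are set to `fill` (assumed uniform grey on a 0-1 scale).
    """
    out = image.astype(float)  # astype returns a copy, so `image` is never modified
    if mode == "full":
        return out
    h, w = image.shape[:2]
    ys, xs = np.ogrid[:h, :w]  # column and row index grids that broadcast to (h, w)
    gx, gy = gaze_xy
    inside = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius_px ** 2
    if mode == "window":
        out[~inside] = fill
    elif mode == "mask":
        out[inside] = fill
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out
```

Window and mask are complementary in this sketch: for the same gaze position and radius, every pixel hidden by one is shown by the other. The radius would be chosen so that the aperture covers roughly one major internal feature of the face.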

Figure 2

Course of a trial for the upright and inverted conditions. An average, non-diagnostic face was presented (in gray levels), and participants had to fixate the cross outside the face area to make the first face stimulus appear; this face was shown for 1 s for encoding. When it disappeared, it was replaced by two different individual faces presented side by side. An individual face (target or distractor) was displayed (in the full view, window, or mask condition) only while the participant fixated it; the non-fixated face was temporarily replaced by the average face. Above: window condition, upright orientation. Below: mask condition, inverted orientation.
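The choice display in Figure 2 amounts to a simple rule applied to each gaze sample: only the fixated slot shows its individual face, and the other slot shows the average face. The sketch below states that rule explicitly. The rectangular face regions, the function names, and the behaviour when gaze falls between the two faces (both slots show the average face) are assumptions made for this illustration.

```python
from typing import Dict, Optional, Tuple

# (left, top, right, bottom) in screen pixels; hypothetical layout.
Box = Tuple[float, float, float, float]


def fixated_face(gaze_xy: Tuple[float, float], left_box: Box, right_box: Box) -> Optional[str]:
    """Return "left" or "right" if the gaze falls on that face, else None."""
    x, y = gaze_xy
    for side, (l, t, r, b) in (("left", left_box), ("right", right_box)):
        if l <= x <= r and t <= y <= b:
            return side
    return None


def faces_to_draw(gaze_xy: Tuple[float, float], left_box: Box, right_box: Box) -> Dict[str, str]:
    """What each slot shows during the choice display: the individual face only
    while it is fixated, otherwise the average face."""
    target = fixated_face(gaze_xy, left_box, right_box)
    return {side: ("individual" if side == target else "average")
            for side in ("left", "right")}
```

The slot marked "individual" would then be rendered in the current viewing condition (full view, window, or mask).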

Figure 3

Accuracy in proportion correct responses for each combination of viewing condition and orientation. Compared to the full view condition, the drop in accuracy with inversion was increased in the mask condition and decreased in the window condition. The error bars represent standard errors.

Figure 4

Response times in milliseconds for each combination of viewing condition and orientation. Compared to the full view condition, the increase in RT with inversion was larger in the mask condition. The error bars represent standard errors.

Figure 5

Heat maps with the Z-score values of the relative number of fixations per trial on a given screen position on the stimuli for each condition separately. Fixation positions were smoothed using a Gaussian filter with a sigma of 30 pixels in order to account for the fixation position variability when fixating a certain point. The heat maps for the inverted faces were horizontally flipped in order to facilitate comparison with the upright faces. Only trials resulting in a correct response were included.
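The map construction described in this caption can be sketched as follows. Only the 30-pixel sigma comes from the caption; the function and parameter names are hypothetical, and the caller is assumed to pass fixations from correct trials only. We read "horizontally flipped" as mirroring about the horizontal axis (`np.flipud`), which brings the eyes of an inverted face back to the top of the map; this is an interpretation, and `np.rot90(m, 2)` would instead undo the full 180° rotation.

```python
import numpy as np


def fixation_heatmap(fixations, shape, n_trials, sigma_px=30.0, inverted=False):
    """Z-scored, Gaussian-smoothed map of the relative number of fixations per trial.

    fixations: iterable of (x, y) screen positions pooled over the included trials
    shape:     (height, width) of the map in pixels
    inverted:  if True, mirror the map about its horizontal axis (np.flipud)
    """
    h, w = shape
    ys, xs = np.mgrid[:h, :w]
    density = np.zeros(shape, dtype=float)
    for fx, fy in fixations:
        # Adding one Gaussian per fixation is equivalent to convolving a map of
        # fixation positions with an (unnormalised) Gaussian kernel.
        density += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2.0 * sigma_px ** 2))
    density /= n_trials  # fixations per trial
    if inverted:
        density = np.flipud(density)
    sd = density.std()
    if sd == 0:
        raise ValueError("no fixations to map")
    # Standardising within one map removes the per-trial scaling above; whether the
    # published maps were standardised per map or across conditions is not stated.
    return (density - density.mean()) / sd
```

If maps from different conditions are to be compared on a common scale, one option is to standardize them with a mean and standard deviation pooled across conditions rather than within each map.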

Figure 6

Accuracy in proportion correct responses and response times in milliseconds for each combination of viewing condition and orientation in the replication experiment with smaller retinal size of the stimuli. Note the similar pattern of performance as for the larger stimuli, in particular the increased/decreased inversion effect in the mask/window conditions respectively. The error bars represent standard errors.