Abstract

The ability to recognize faces is an important socio-cognitive skill that is associated with a number of cognitive specializations in humans. While numerous studies have examined the presence of these specializations in non-human primates, species where face recognition would confer distinct advantages in social situations, results have been mixed. The majority of studies in chimpanzees support homologous face-processing mechanisms with humans, but results from monkey studies appear largely dependent on the type of testing methods used. Studies that employ passive viewing paradigms, like the visual paired comparison task, report evidence of similarities between monkeys and humans, but studies that use more stringent, operant response tasks, like the matching-to-sample task, often report species differences. Moreover, the data suggest that monkeys may be less sensitive than chimpanzees and humans to the precise spacing of facial features, in addition to the surface-based cues reflected in those features, information that is critical for the representation of individual identity. The aim of this paper is to provide a comprehensive review of the available data from face-processing tasks in non-human primates with the goal of understanding the evolution of this complex cognitive skill.

1. Introduction

Face recognition is one of the most important skills in primate social cognition, enabling the formation of long-lasting, inter-individual relationships with multiple group members. Among humans, face recognition is associated with a variety of cognitive and neural specializations, suggesting that it has played an important role in shaping human societies. Humans, for example, are face experts. They are able to individuate faces, or recognize and remember many different individuals over a lifetime, often with only mere exposure. This is achieved using a holistic processing strategy where it is not just the presence of specific features, such as the eyes, nose and mouth, but the relative spatial arrangement of these facial features that becomes integrated into a single perceptual whole [1,2]. Moreover, these processes are orientation-dependent such that inverting faces interferes with the ability to process faces holistically, leading to impairments in the ability to detect subtle changes in the spacing of features [3]. Although they will not be covered in this review, neural specializations for face processing are also present in humans and consist of a network of distributed regions that show face-selective activity. These regions include the fusiform gyrus, a region of the ventromedial temporal cortex, the lateral occipital cortex and the superior temporal sulcus ([4–6], but see [7]).

From a comparative perspective, very little is known about the ability of non-human primates to process faces using cognitive and neural processes that are similar to humans. Therefore, it remains unclear whether these processes represent unique human specializations or whether they are present in some form in non-human primates. This review summarizes the existing behavioural and cognitive research on face processing in non-human primates with the goal of understanding the evolution of this important socio-cognitive skill. First, it will describe the importance of holistic processing for the rapid visual detection of faces, the influence of expertise on holistic processing, and review relevant data from non-human primates using the well-known face inversion effect. Second, it will review data pertaining to the ability of non-human primates to individuate faces across different viewpoints and the importance of second-order configural cues, such as the spacing of facial features and surface-based cues, in the representation of face identity. Table 1 provides a list of face-processing studies in non-human primates to give readers an accessible reference that notes the species tested, subject numbers, the basic testing paradigm, task question, dependent variable measured and type of stimuli used.

2. Part 1

(a) Configural information in faces

Recognizing faces is a particularly difficult cognitive skill as all faces have the same basic features arranged in the same general configuration. Eyes are above the nose, which is above the mouth, etc. This basic arrangement is referred to as the first-order configuration and is important for discriminating faces from other visual objects, e.g. faces versus non-faces [51]. Attraction to the first-order face configuration appears to be strongly innate in primates. Among humans, newborn babies are more attracted to face-like patterns, e.g. three dots in an inverted triangular orientation, than non-face-like patterns [52,53]. Similarly, newborn Japanese macaques (Macaca fuscata) show spontaneous gaze preferences to a face-like pattern of dots compared with a linear arrangement [30]. A 13-day-old infant gibbon (Hylobates agilis) oriented more to face-like compared with non-face-like drawings, and by four weeks of age he oriented more towards familiar compared with unfamiliar conspecifics' faces [26]. The innate preference of monkeys for looking at faces compared with other visual objects has been strongly supported in a recent study by Sugita [41]. He raised Japanese macaques in face-isolation by having human caregivers wear hoods to conceal their faces while the monkeys' environment was visually enriched. Without having ever seen a face, these monkeys preferred to look at faces, both conspecifics and humans, compared with other visual objects [41].

The second type of configural information described by Diamond & Carey [51] is the second-order configuration. This refers to the relative spatial arrangement of facial features with regard to one another, which is unique in every face [2]. Second-order information is also present in the form of surface-based cues, such as shading and pigmentation, that are unique to each face [54]. There is a general agreement that second-order configural cues provide the information needed to discriminate between individuals [55]. Therefore, while the first-order configuration enables the identification of faces at a basic categorical level, e.g. face versus non-face, the second-order configuration provides the information necessary to individuate faces at the subordinate categorical level, e.g. discriminate Mary from Jane [56].

The configural information present in faces is integrated into a single perceptual whole through a fast-acting and relatively automatic process referred to as holistic processing [3,57–59]. Tasks used to demonstrate holistic processing typically show that it is harder to identify individual facial features, such as the eyes, when they are presented in isolation rather than within a whole face (parts-to-wholes task, [59]), when they appear in an inverted face (inversion effect, [60]), or when they are presented within the context of an unnaturalistic face shape (composite task, [61]). This is because humans have such a strong tendency to process the whole face holistically that it interferes with the ability to extract information about its parts. It has been argued that holistic processing may be unique to faces as a category of stimulus as its direct markers, e.g. inversion, the composite effect and parts-to-wholes task, are more robust for faces than non-face stimuli [1,62].

(b) Expertise and perceptual tuning

Sensitivity to configural information is strongly influenced by an individual's expertise. Because humans have extensive experience with faces from birth, faces represent one of the few stimulus categories for which people are natural experts. In a clever study, Pascalis et al. [63] demonstrated perceptual tuning for faces in human babies. At six months of age, human infants showed no viewing preference for human or monkey faces, but by nine months of age they selectively attended to the human versus other species' face [63]. One explanation for this effect is that holistic processing, present early in development (between six and eight months [64]), operates generally on all first-order face-like configurations, but this becomes more selective as the faces become more familiar. Although early rearing studies cannot be performed in humans, face-deprivation studies in monkeys addressed the sensitivity of these early periods. Sugita [41] showed that after six months of face-deprivation, the viewing preference and discrimination performance of the monkeys became biased towards the faces of species to which they were first exposed. If they were first shown conspecifics' faces, they showed better discrimination and greater viewing preference for monkey faces compared with human faces or objects. Their bias was towards human faces if these were the first faces they were exposed to after the period of face-deprivation. Moreover, these preferences persisted up to a year later even after the monkeys had been given visual exposure to both species [41]. The preference of infant chimpanzees (Pan troglodytes) for their mother's face was present at two months of age, but at one month, the infants showed no preference for their mother's face compared with unrelated individuals [65]. Thus, young infant humans and non-human primates prefer to look at faces compared with objects, but only show specific preferences for one species over another after they have established some expertise.

Humans also demonstrate sensitivity to expertise in the form of the other race effect, whereby discrimination and recognition memory for same race faces are better than for other race faces, presumably owing to the fact that we have more experience with own race faces during development [66,67]. Similarly, a same-species preference has been shown in New and Old World monkeys [31]. Although the exact mechanism responsible for the perceptual tuning and other race effects for faces is unknown, the period of early exposure to faces appears to be critical for the development of holistic processing. This is nowhere better evidenced than in studies of human infants born with congenital cataracts, making them blind at birth. Their condition was surgically corrected in the first six months of life, but when tested years later, they showed impaired holistic processing of faces (tested with the composite effect) despite years of normal visual input post-surgery [68].

The influence of expertise on the categorical perception of faces has been studied in chimpanzees [35]. This study found that chimpanzees housed at a centre where they saw few chimpanzees but many humans showed better discrimination and categorical perception of human compared with chimpanzee faces. By contrast, chimpanzees housed at a different facility where they were familiar with many chimpanzees and humans did not show any species-biases for discrimination performance or categorical perception [35]. Thus, in both human and non-human primates, attraction to faces is present from birth and undergoes similar perceptual narrowing during development, and early visual exposure to faces appears necessary for the normal development of holistic processing.

(c) The face inversion effect

By far the most widely used paradigm in studies of face processing in both human and non-human primates is the inversion effect, the phenomenon in which rotating faces 180° makes them difficult to recognize [60]. Among humans, the inversion effect is face specific, producing greater deficits for faces than non-face images [60,69]. The results from studies of non-human primates, particularly monkeys, however, have been largely inconsistent, with some studies reporting evidence of an inversion effect for faces and others not (table 1). In all but one study, chimpanzees have shown clear face-specific inversion effects that are stronger for faces with which subjects are familiar, such as conspecifics or human faces. In the one discrepant study, Tomonaga et al. [15] tested the ability of one female chimpanzee to name human faces, individuals with whom she was familiar, using symbols. Perhaps because of the personal familiarity of these individuals to the subject, or the way in which she learned to associate their faces with specific lexigrams, this individual did not appear to use a holistic-processing strategy when presented with inverted faces. Parr et al. [20] tested the inversion effect in five chimpanzees using a computerized, matching-to-sample (MTS) task (figure 1). In this study, chimpanzees used a joystick-controlled cursor to select one of two inverted faces that matched an upright sample face on a computer monitor. Significant inversion effects were found for unfamiliar chimpanzee and human faces, the two species for which subjects had expertise, but not capuchin monkey faces or automobiles, faces and objects for which subjects were naive [20].

An illustration of the simultaneous matching-to-sample (MTS) task used to study face processing in chimpanzees. (a) Subjects are first presented with a single sample face and a cross-shaped cursor on the computer monitor. (b) After contacting the sample image with the joystick-controlled cursor, two comparison images are added to the display. One of these images matches the sample (identical photograph on the bottom left), while the other does not (bottom right shows a different individual). Subjects must move the cursor to contact the matching image in order to receive a food reward (see [25]).

Studies of chimpanzees from other laboratories also support the inversion effect using a variety of stimuli and methods. Tomonaga [24] found the inversion effect in one chimpanzee for unfamiliar human faces compared with houses (this was a different subject than in Tomonaga et al. [15]) using a MTS task. Similar results were found using a visual search task where the same chimpanzee was significantly faster to identify an upright human face and a human caricature face among four or 10 differently oriented distracters [37]. No differences were found in the time required to find the target when the stimuli were upright chairs or hands presented against inverted distracters from the same stimulus categories. A follow-up study presented different facial features and combinations of features and showed an upright superiority effect for the eyes and eyebrows, the eyes and nose, and the eyes and mouth, but not the nose and mouth, or nose or mouth alone. A final study presented the inner features of the face, and the external contour, where faster identification times were found only for upright inner features when presented against inverted distracters [37].

As mentioned, the evidence for a face inversion effect in monkeys is less clear than in chimpanzees. Some studies (see table 1 for details) have reported evidence of a face inversion effect [11,17,22,34,36], while others have failed to find evidence of orientation-specific processing exclusive to faces [8,9,13,23,28,29]. Using a similar MTS testing method as described above for chimpanzees, Parr et al. [23] examined the effect of stimulus expertise on the inversion effect in four adult male rhesus monkeys (Macaca mulatta). The only difference in the training and testing of these procedures was that the chimpanzees were reinforced with juice by the experimenter while the monkeys preferred to test alone and were reinforced with an automatic feeder using sugar treats. These monkeys showed significant impairments discriminating inverted compared with upright images of conspecific faces, capuchin monkey faces and automobiles, but not the familiar category of human faces [23]. A more recent replication of the inversion effect in a different group of rhesus monkeys (n = 7, five females) that performed the MTS task using a touchscreen interface found significant inversion effects for all face categories (conspecifics, humans and chimpanzees), but not for houses or clip art [39]. The main difference between the two studies was the lack of a significant inversion effect for human faces in Parr et al. [23] (p = 0.07), compared with a significant inversion effect for human faces (p = 0.05) in the more recent study [39]. In both cases, the statistical value describing the difference between performances on upright compared with inverted human faces was borderline. Even though these subjects were raised in a laboratory environment and had a lifetime of expertise with human faces, the majority of their contact with humans occurred when the humans were wearing protective equipment such as masks or were at considerable distance. Thus, it is difficult to quantify their actual experience with human faces.

In both Old and New World monkey species, some authors have suggested a phylogenetic shift in the type of configural information present in human and ape faces compared with monkey faces, perhaps explaining why monkeys show inconsistent inversion effects for their own faces. Two squirrel monkeys (Saimiri sciureus) showed significant inversion effects for human and ape faces, but not for monkey faces or scenery [16]. Similarly, significant inversion effects were found in three rhesus monkeys using a same–different paradigm when discriminating inverted compared with upright human faces, but no differences were found for monkey faces or scenes [19]. Human subjects with no chimpanzee expertise have recently been shown to use holistic processing (composite effect) for chimpanzee faces, compared with the faces of more phylogenetically distant species including gorillas [70]. However, there is little evidence beyond these data for any type of configural superiority for human or chimpanzee faces and the basic first-order configuration is similar in all primate faces [32,36,39].

In summary, Parr et al. [39] have argued for species differences in the inversion effect between chimpanzees and rhesus monkeys. Whereas chimpanzees show face-selective inversion effects for categories of faces with which they are familiar, rhesus monkeys have not reliably shown face-selective inversion effects [23]. This can be interpreted as evidence for differences in the selectivity of holistic processing for faces in these species. Other researchers have argued that the use of trained, operant response tasks, like the MTS, compared with free-viewing paradigms, such as the visual paired comparison (VPC) or adaptation tasks, produces idiosyncratic response biases leading monkeys to perform the tasks using unnatural strategies [31,34,44]. While the nature of these reported biases has never been described in any detail, inconsistencies in the type of stimuli presented as well as differences in testing methods should not be ruled out as factors contributing to reported species differences and/or inconsistent inversion effects in monkeys.
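The notion of a face-selective inversion effect can be made concrete with a small worked example. The sketch below is only illustrative: it assumes that the inversion effect is scored as the drop in matching accuracy from upright to inverted trials, and that selectivity is the difference between that drop for faces and for a control category. The function names and all trial counts are hypothetical and are not taken from any of the studies reviewed.

```python
def accuracy(correct, total):
    """Proportion of correct trials."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def inversion_effect(upright, inverted):
    """Accuracy drop from upright to inverted trials.

    Each argument is a (correct, total) pair. Positive values mean
    that performance was worse for inverted stimuli.
    """
    return accuracy(*upright) - accuracy(*inverted)


# Hypothetical trial counts for illustration only (not real data).
counts = {
    "conspecific faces": {"upright": (85, 100), "inverted": (65, 100)},
    "automobiles": {"upright": (80, 100), "inverted": (78, 100)},
}

effects = {
    category: inversion_effect(c["upright"], c["inverted"])
    for category, c in counts.items()
}

# A face-selective effect requires a larger drop for faces than for
# the control category, not merely a drop for faces.
face_selectivity = effects["conspecific faces"] - effects["automobiles"]

for category, effect in effects.items():
    print(f"{category}: inversion effect = {effect:.2f}")
print(f"face selectivity = {face_selectivity:.2f}")
```

Under this scoring, an inversion deficit that is equally large for faces and objects yields a selectivity near zero, which is the pattern that the argument for species differences rests on.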

(d) The face composite task

One of the most well-known tests of holistic processing is the face composite task developed by Young et al. [61]. These researchers deconstructed faces into a top and bottom half, and then combined these so that the same top face part was combined with two different bottom face parts. The faces they used were all of famous individuals who could easily be named by subjects. Each composite face was presented to subjects so that the two parts were either aligned directly on top of one another, or misaligned, such that the top and bottom face parts were askew (figure 2). Subjects were then asked to name the individual represented by the top face part. Subjects were faster and more accurate naming this individual in the misaligned compared with aligned condition because, in the aligned condition, people's holistic-processing mechanism automatically integrated the features into a new perceptual whole, making it difficult to deconstruct the identities represented by each face part.

An illustration of an (a) aligned and (b) misaligned face composite used to test holistic processing. Human subjects are faster and more accurate naming the top face part (George Clooney) when it is misaligned from the bottom face part (Harrison Ford) compared with when it is aligned, as the latter condition produces holistic interference.

The face composite effect in non-human primates has been addressed by only a handful of studies, and their methods required considerable modification from the original task, in which subjects were asked to give a verbal response indicating the identity of the individual in the top face part [61]. Parr et al. [32] were the first to adapt the composite task for use with non-verbal organisms in a study with chimpanzees. Using the MTS task already very familiar to the subjects, they presented either the aligned or misaligned composite as the sample image, and the two comparison images showed the whole face of each individual represented in the composite (figure 3). Subjects were allowed to match the composite by selecting either of these two individuals, top or bottom, under non-differential reinforcement, so they were rewarded for either answer. In effect, the task asked, ‘Does the face composite look more like the top face individual or the bottom face individual?’ It was hypothesized that in matching the misaligned composite, subjects would match the top or bottom face individual equally often, approximately 50 per cent of the time. However, if holistic processing integrates the aligned composite into a ‘new’ perceptual whole, then individuals would spontaneously switch and choose the top face individual more often as the top part of the face contains the information more diagnostic for individual identity [25,42,71,72]. Figure 4 shows an example of an aligned face composite trial and the percentage of trials in which subjects spontaneously matched the composites to the top face individual [32]. This shows that subjects spontaneously shifted towards the top face part individual in the aligned compared with misaligned trials. Human faces were also tested but did not elicit a strong composite effect.

An illustration of the MTS format used to test the face composite effect in chimpanzees. The sample image shows an aligned face composite while the two comparison images show the individuals represented by the top and bottom face parts. Subjects were rewarded for choosing either of these images. Our hypothesis was that holistic processing would encourage matching the individual represented by the top face part (outline) in aligned trials, since the eyes provide the most salient information for individual recognition, but subjects would choose either the top or bottom face individual equally as often during the misaligned trials. This was confirmed by a spontaneous increase in the selection of top face individuals (approx. 67%) for aligned compared with misaligned trials (approx. 57%) [32].
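Because both choices were rewarded, the dependent measure in this design is a choice proportion rather than accuracy. The following sketch shows one way such data could be summarized. The percentages (approx. 67% and 57%) are those reported above, but the number of trials per condition (100) and all names in the code are assumptions made purely for illustration.

```python
def top_choice_rate(choices):
    """Fraction of trials on which the top-face individual was chosen."""
    if not choices:
        raise ValueError("need at least one trial")
    return sum(1 for c in choices if c == "top") / len(choices)


# Assumed trial counts (100 per condition); only the approximate
# percentages come from the text.
aligned = ["top"] * 67 + ["bottom"] * 33
misaligned = ["top"] * 57 + ["bottom"] * 43

aligned_rate = top_choice_rate(aligned)
misaligned_rate = top_choice_rate(misaligned)

# Shift towards the top-face individual when the halves are aligned.
composite_shift = aligned_rate - misaligned_rate

# Departure of the misaligned baseline from the predicted 50 per cent.
baseline_bias = misaligned_rate - 0.5

print(f"aligned: {aligned_rate:.2f}, misaligned: {misaligned_rate:.2f}")
print(f"composite shift: {composite_shift:.2f}")
print(f"baseline bias from 0.5: {baseline_bias:.2f}")
```

Reporting the baseline bias separately makes it clear that the misaligned condition did not sit exactly at the predicted 50 per cent, so the composite effect is best read from the difference between conditions.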

In a second study, Taubert & Parr [47] examined the composite effect in rhesus monkeys as well as the New World spider monkeys (Ateles geoffroyi), a species that shares a similar fission–fusion social organization with chimpanzees and humans. They expanded on the methods used in the previous study [32] to better match the instructions given to humans [61] by specifically training subjects to match the information in the top face part. They did this by creating a pool of training stimuli that consisted of schematic face ovals each divided into a uniquely coloured top and bottom half. These coloured composites were presented as the sample stimuli and the correct choice was the oval that matched the top part colour of the sample. The non-matching oval was a different colour not present in the sample. After reaching 75 per cent correct on these training sessions, subjects were presented with composites of a variety of faces and objects, e.g. conspecifics' faces (spider monkeys only), human faces, chimpanzee faces, gorilla faces, sheep faces and sticks. The spider monkeys showed the composite effect, i.e. better performance matching the top face part for the misaligned compared with aligned trials, for conspecific and human faces, the species of their greatest expertise. This was the case only for upright composites, reinforcing holistic processing as the mechanism involved [48]. The rhesus monkeys, in contrast, showed a composite effect for the chimpanzee faces, a species for which they had no expertise before the experiment [47].

Dahl et al. [34] used an adaptation paradigm to examine the face composite effect in five rhesus monkeys. The monkeys were first presented with the aligned or misaligned composites as the adaptation trials. After this, the dishabituation image presented either the same composite as in the adaptation phase, or a composite with a new bottom face part. The authors reported greater rebound for aligned compared with misaligned composites when the bottom face part had changed, suggesting sensitivity to holistic processing in that the aligned trials were seen as ‘new’ individuals owing to the holistic integration of the new facial feature. In explaining their results, however, the authors raise a potential confound in their methodology. They suggested that, because monkeys have a robust preference for looking at the eyes of conspecifics' faces [14,18,29,42], greater rebound to the aligned composites may have occurred because the new bottom face part was closer in proximity to the eye region in the aligned compared with misaligned trials. To address this potential confound, the authors presented the scanning patterns obtained from eye-tracking while the monkeys performed this task. Two hypotheses were presented. First, attention to the eyes, as opposed to holistic processing, was predicted to draw the monkeys' attention to the new bottom face part in the aligned compared with misaligned conditions. Alternatively, if engaged in holistic processing, the monkeys should show renewed interest in the eye region of the novel aligned compared with misaligned composites because the holistic integration of the new bottom face part provides the appearance of a ‘new’ individual. Thus, monkeys should return to fixating on their preferred face part, the eyes, an argument not unlike that made for the chimpanzees' performance noted by Parr et al. [32] above. Dahl et al. [34] reported that monkeys looked more at the eye region of the aligned compared with misaligned composites, supporting holistic processing.

It is not clear, however, that eye tracking alone has the power to address specific face-processing mechanisms. The first hypothesis proposed by Dahl et al. [34], for example, appears to be grounded in an unconfirmed assumption that eye fixations would habituate to repeated presentations of the same stimulus, leading the monkeys in their experiment to switch from fixating on their preferred feature (eyes) to the next closest interesting feature (the changed bottom face part). Moreover, from the figure provided (fig. 4, [34]), it appears that the majority of fixations in the misaligned condition occurred in the central region of the image, where the two face parts were offset. This is a region of high contrast and a previous study showed that monkeys were extremely attracted to regions of high contrast for scenery, novel objects and faces [18]. Thus, replicating this finding and testing a control stimulus, such as the faces of at least one other species, would help to validate whether rhesus monkeys show face-selective holistic processing.

(e) Salience of facial features

A number of studies in non-human primates have addressed which facial feature or combination of features is the most salient for both the passive viewing of faces and for the recognition of specific faces (table 1). Kyes & Candland [12] used a viewing preference paradigm in which baboons were trained to press a lever to control the duration that a specific slide would be visible. Then, they presented subjects with either the whole face, a covered face, or isolated facial features, all derived from the dominant male monkey in their social group. Subjects spent more time, as measured by the duration of lever pressing, viewing the whole face compared with a covered face, and preferred to look at facial feature combinations where the eyes and particularly the eyes and nose were present.

Direct gaze is an aversive, threatening signal for rhesus monkeys [73,74], so in order to study the development of gaze avoidance, Mendelson [10] implanted newborn rhesus monkeys with scleral coils that provide the ability to accurately measure viewing duration. At one, three and seven weeks of age, monkeys were shown conspecifics' faces with their gaze either slightly averted, or direct. At one week of age, monkeys spent equivalent amounts of time looking at the direct and averted gaze faces, but at three and seven weeks of age, the monkeys looked less at the direct gaze faces. At three weeks of age, the monkeys also began to show emotional behaviours, perhaps indicating the time at which they begin to perceive direct gaze as aversive. Opposite scan patterns were found in adult rhesus monkeys viewing human faces [27]. In this study, subjects made more fixations for longer total durations on a human face with direct compared with averted gaze. This suggests that the ecological salience of human and monkey faces may be different, although little detail is provided about the subjects' rearing history, only that they were experimentally naive before the start of the study [27].

Keating & Keating [14] also used the scleral coil technique to track the viewing patterns of rhesus monkeys to a familiar human face. Subjects were first trained to press a lever when a particular stimulus face appeared (positive stimulus = go response), but not press the lever (negative stimulus = no go) to a variety of other faces. Then, the positive stimulus face was altered by removing, replacing or scrambling the facial features, and by inverting the face, in order to see which, if any, of these manipulations would significantly impair recognition, i.e. reduce the number of ‘go’ responses. Subjects' recognition of the stimulus face was significantly reduced when the manipulation was to substitute the eyes or brow region, remove the chin, and for any of the inverted or scrambled feature manipulations, but regardless of the type of manipulation, the majority of fixations occurred to the eyes. Interestingly, features were also graded by making them larger or smaller than the standard stimulus face and this had mixed results, with only deviations of eye shape reducing recognition [14]. As stated earlier, this suggests that monkeys are highly attracted to eyes and less sensitive to the overall configuration of features.

Some similarities, but also notable differences, have been found in another eye-tracking study. Wilson & Goldman-Rakic [18] showed rhesus monkeys photographs of conspecific faces, human faces, interesting pictures including common household items and scenery, and colour fields. Monkeys looked longer at the faces and interesting pictures compared with colour fields, but no differences were found between viewing time for faces and pictures. Moreover, this study examined recognition memory for specific faces by measuring changes in viewing preference for the images after repeated exposure. Monkeys spent less time looking at familiar faces and pictures compared with novel ones, but no differences were found for colour fields. An important and previously unconsidered finding, revealed by examining the specific scan patterns, was that the monkeys were most attracted to regions of contour as delineated by specific facial features, such as eyes, ears, hairline, nose, mouth and jawline. Many of the studies already discussed did not use control stimuli, so preference for faces over other interesting stimuli, or differences in scan patterns to faces compared with other objects could not be addressed. The results of Wilson & Goldman-Rakic [18] failed to support any particular salience for faces over other interesting pictures and, importantly, showed that monkeys may be most attracted to regions of high contrast, rather than to facial features per se. The lack of appropriate control stimuli still persists in studies of face recognition in non-human primates and needs to be addressed before any major conclusions can be made about facial salience and feature salience, in particular. Putting aside this important methodological issue, the majority of the studies reviewed confirm that the eyes are the most important and frequently scanned features in non-human primate face processing.

3. Part 2

(a) Individuating faces

Although the ability of humans to recognize and remember a large number of faces in a lifetime is well documented, it is unclear whether non-human primates are able to represent individual identity from faces and whether this is achieved using similar configural information and holistic-processing strategies. One of the most direct ways to assess the ability of non-human primates to individuate faces is to present them with a task in which they must recognize the face of the same individual presented across different facial viewpoints. In this way, performance cannot be affected by pictorial cues, those specific to each photograph, because individuals are represented by different photographs. Instead, subjects must focus on the features of the faces themselves. Only a handful of studies have examined the ability of non-human primates to individuate faces using this type of methodology. In one of the first studies of its kind, Rosenfeld & Van Hoesen [8] trained rhesus monkeys to respond with a lever press to the face of a particular monkey and withhold the lever press if the face showed a different individual. Then, they presented images in which size, illumination, colour and facial viewpoint of these individuals were altered. While learning the initial full-face discrimination was by far the most difficult condition for the monkeys, requiring between 300 and 405 trials, they generalized their performance more or less uniformly to the altered face stimuli, including transformations across viewpoint. Performance on these required between 60 and 150 trials [8].

Many years later, Parr et al. [25] examined the ability of chimpanzees and rhesus monkeys to match two different photographs of the same unfamiliar conspecific using an MTS task. Facial viewpoint was not systematically controlled in these studies but it did differ across the correct pair of images. The chimpanzees matched faces according to identity significantly above chance at the group level after only the second presentation of each trial (14 face pairs in total). Monkeys took much longer to learn the task and thus required a generalization phase where new photographs were presented. In the generalization phase, two monkeys performed significantly above chance after the second presentation, one after four presentations and one after 14 presentations [25]. In a different group of monkeys, individual recognition was revisited using all new stimuli consisting mostly of female conspecifics' faces (20 pairs in total). Performance of the six subjects was highly inconsistent across different trials/face pairs, where some pairs were discriminated above chance after only two sessions (where each stimulus was repeated four times in a session), but others required over 17 sessions [39]. There were no visible explanations for the variability across trial type, e.g. physical similarity between the pair of faces, quality of the individual stimuli. In an unrelated study, these same subjects performed much better matching two different pictures of male monkey faces compared with female monkeys [75]. The gender of faces presented as stimuli is rarely reported in similar studies but may play an important role in influencing face recognition performance.
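To make the phrase 'significantly above chance' concrete, the following is a minimal sketch of a one-tailed exact binomial test of the kind that could be applied to matching performance. It is an illustration only, not the analysis reported in [25] or [39]: the function name, the trial counts and the chance level of 0.5 (which assumes a two-choice task) are all assumptions introduced here.

```python
from math import comb


def binomial_p_above_chance(correct, trials, chance=0.5):
    """One-tailed exact binomial probability of observing at least
    `correct` successes in `trials` attempts if responding were at chance."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must lie strictly between 0 and 1")
    return sum(
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical example: 21 correct choices out of 28 two-choice trials.
p_value = binomial_p_above_chance(correct=21, trials=28)
print(f"p = {p_value:.4f}")
```

The `chance` argument would need to change with the task format; for a four-image oddity array, for example, chance responding would correspond to 0.25 rather than 0.5.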

Studies of individual recognition in New World monkeys are rare (table 1, [16,28,31,36,45–48]) and it is even more unusual that a study compares face processing for familiar and unfamiliar individuals. Pokorny & de Waal [45] tested the ability of capuchin monkeys (Cebus apella) to discriminate familiar (in-group) and unfamiliar (out-group) conspecifics' faces using an oddity paradigm in which subjects were rewarded for choosing the ‘odd’ image in a four-image array. During an initial training phase, the monkeys were shown three identical photographs of a monkey and one ‘odd’ photograph of a different monkey. Subjects were significantly faster to learn discriminations involving the unfamiliar out-group individuals compared with the familiar in-group individuals, which is in contrast to the trouble that humans have when required to discriminate or remember unfamiliar faces, particularly across a change in viewpoint [76–78]. After training, the test phase presented subjects with an array of four different photographs: three photographs showed the same individual with different facial viewpoints, while the fourth image showed a different monkey. In this phase, subjects showed no differences in performance between in-group versus out-group individuals, suggesting that individual recognition was not influenced by familiarity. Subjects went on to generalize their performance to two separate transfer tasks involving all new photographs. Thus, this study showed that capuchin monkeys are very capable of individuating conspecifics' faces across facial viewpoints although, unlike humans, no strong advantages were found for familiar versus unfamiliar individuals [45]. The study did not test specifically whether the individual recognition involved reliance on configural cues or holistic processing per se. 
In a follow-up study using the same oddity paradigm, three of the same subjects were presented with an array of four different monkeys, three of which were familiar (in-group) and one unfamiliar (out-group), or vice versa. The capuchin monkeys were able to categorize the individuals and respond according to their familiarity, suggesting that the pictures could be interpreted as representing the real individuals [46].

Gothard et al. [29] used a non-invasive, corneal reflection eye-tracking paradigm in conjunction with the well-known VPC task to examine how monkeys scan conspecifics' faces. This study also presented facial expressions, but these data will not be covered here. After habituating subjects to a pair of identical photographs showing an unfamiliar conspecific's face, one of these faces was replaced with a photograph of a novel individual, so that the test pair showed photographs of one old (previously viewed) and one new monkey. Subjects spent more time viewing the novel individual than the familiarized one, indicating that they could discriminate between the two faces. In addition, similar to previous studies, the majority of the fixations were directed towards the eyes. These free viewing tasks are often used to demonstrate individual recognition, as defined above; however, because the test pair contained one previously viewed photograph and one novel photograph, subjects could simply have responded with increased attention to the novel photograph, not selective attention to the novel individual (see also [36]).

This problem was corrected in a second study in which two different viewpoints of the same monkey or human face were presented as the familiarization pair, and then the test pair of images showed a third photograph of the familiarization individual plus a novel individual of the same species [42]. Therefore, in each stage of this VPC task, no individual photograph was repeated. Eye tracking was used to measure where the monkeys fixated and overall whether they preferred to look at the familiar or novel individual. Two monkeys were tested with conspecifics' faces and both fixated for a longer duration on the eye region of the novel individual's face. Three monkeys were tested with human faces and two of the three also preferred looking at the eyes of the novel individual. These data showed that, indeed, subjects were able to detect similar individuals across different viewpoints, preferring to look at the novel individual while controlling for novelty of the photograph [42].
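Looking preferences in VPC tasks are often summarized as the proportion of total looking time directed at the novel stimulus. The sketch below illustrates that common convention; it is not necessarily the measure used in [42], and the function name and the durations are hypothetical.

```python
def novelty_preference(novel_ms, familiar_ms):
    """Proportion of total looking time spent on the novel stimulus.

    Values above 0.5 indicate a preference for the novel stimulus,
    values below 0.5 a preference for the familiar one.
    """
    if novel_ms < 0 or familiar_ms < 0:
        raise ValueError("looking times must be non-negative")
    total = novel_ms + familiar_ms
    if total == 0:
        raise ValueError("no looking time recorded for either stimulus")
    return novel_ms / total


# Hypothetical fixation durations (in milliseconds) for one test trial.
score = novelty_preference(novel_ms=600.0, familiar_ms=400.0)
print(f"novelty preference = {score:.2f}")
```

Because no photograph is repeated between familiarization and test in this design, a score reliably above 0.5 would reflect a preference for the novel individual rather than for a novel image.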

Similar to the studies described above that focused on facial features, the monkeys in this study overwhelmingly preferred to look at the eyes compared with other facial features. Therefore, one might conclude that these monkeys were using a feature-based strategy, detecting differences in the eyes and eye region in order to individuate the faces. The authors attempted to address the specific mechanism in a second study by filtering the face stimuli to influence the visual-processing strategies [42]. When the faces were subjected to a high-pass filter, effectively removing the low-spatial frequency information important for holistic processing [79], the monkeys attended to the novel face and spent the most time scanning the eye region. Interestingly, when the images were blurred using a low-pass filter, effectively removing the high-spatial frequency information important for feature processing, subjects also attended to the eye region of the novel face. Although the authors conclude that monkeys show stronger configural (holistic) processing for conspecifics' faces compared with human faces, there were no overall differences in the gaze patterns between the two filter conditions.
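The logic of the two filter conditions can be illustrated with a minimal sketch. It assumes a greyscale image stored as a list of rows of pixel intensities, uses a Gaussian blur as the low-pass filter and defines the high-pass image as the original minus its blurred version. The function names and the `sigma` parameter are illustrative assumptions; the filters actually applied in [42] may have differed.

```python
from math import exp


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    last = len(values) - 1
    blurred = []
    for i in range(len(values)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)
            acc += weight * values[idx]
        blurred.append(acc)
    return blurred


def low_pass(image, sigma):
    """Blur rows and then columns, keeping only coarse (low-frequency) structure."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def high_pass(image, sigma):
    """Subtract the blurred image, keeping only fine (high-frequency) detail."""
    blurred = low_pass(image, sigma)
    return [[p - b for p, b in zip(row, brow)] for row, brow in zip(image, blurred)]


# Hypothetical 4 x 4 greyscale patch containing a sharp vertical edge.
patch = [[0.0, 0.0, 255.0, 255.0] for _ in range(4)]
coarse = low_pass(patch, sigma=1.0)
fine = high_pass(patch, sigma=1.0)
```

In this scheme the low-pass and high-pass images sum back to the original, so the two conditions divide the same stimulus into complementary coarse and fine components.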

Although this is one of the most well controlled and detailed face-processing studies performed in monkeys to date, it highlights potential limitations in the ability of the eye-tracking technology to provide the data necessary to address basic face-processing mechanisms (see discussion in §2(d) of Dahl et al. [34]). A recent study in humans, for example, presented subjects with a composite face task and, despite the subjects performing better in the misaligned compared with aligned conditions, corresponding eye-tracking data failed to provide any evidence for differential scanning patterns for the aligned and misaligned composites [80]. Thus, even when the behavioural data support holistic processing, concomitant eye-tracking data failed to show significant differences in scan patterns. Consequently, although monkeys show overwhelming preferences to fixate on the eye region, these patterns should not be interpreted as providing evidence against holistic processing; rather, eye tracking may be an insensitive dependent variable for measuring the perceptual mechanisms underlying face processing. That said, Gothard et al. published a recent study that used eye tracking to specifically address individual differences in face scanning patterns [81]. Using some of the same subjects as in their previous studies, this team presented elegant data showing individual differences in the scan paths made by three monkeys to conspecifics' faces, which correlated with basic behavioural temperament style and serotonin transporter gene polymorphisms [81]. Thus, eye tracking is an extremely rich methodology for studying social cognition and understanding individual differences, but may not be able to differentiate face-processing mechanisms per se.

Individuation and subordinate-level processing of faces were also addressed in the study by Dahl et al. [34]. Monkeys were presented with three types of trials using an adaptation paradigm. ‘Subordinate’ trials presented a photograph that was a novel exemplar of the same class as presented in the adaptation phase (either monkey face or dog). ‘Same’ trials showed the exact same photograph as in the adaptation phase only rotated in plane 30° (monkey face), or presented as its mirror image (dogs). To support subordinate-level processing for conspecifics' faces, the authors compared the amount of rebound with ‘subordinate’ versus ‘same’ trials for these two stimulus types and reported greater rebound for the monkey face but not dog trials. However, as discussed above, the appropriate comparison requires that ‘subordinate’ trials showing two different monkeys (and thus two different photographs) be compared with two different photographs of the same monkey, not simply the same photograph rotated 30°. Moreover, the ‘subordinate’ dog trials included as control stimuli did not present photographs of different individuals, but showed two different dog breeds. Thus, they were not comparable to the ‘subordinate’ monkey trials. Therefore, the evidence supporting the individuation of faces by monkeys in this study is quite weak when compared with the results obtained by Gothard et al. [42].

(b) The importance of second-order configural cues

Very few studies have attempted to measure the importance of second-order configural information in non-human primate faces. In studies of both chimpanzees and rhesus monkeys, Parr et al. manipulated conspecifics' faces by fracturing/spacing the features apart (altering second-order configural cues), fracturing and rearranging facial features (altering first- and second-order configural cues), or showing only inner facial features (preserving both first- and second-order configural cues) (chimpanzees [32]; rhesus [39]). Chimpanzees showed impairments matching the fractured and rearranged trials compared with unaltered faces, but no difference matching inner features to the unaltered faces, suggesting discrimination deficits when both first- and second-order configural information is altered. Rhesus monkeys showed significant impairments for all trials. These two studies used rather crude manipulations of second-order configural cues, whereas the more traditional approach in the human literature is to contrast performance on trials in which specific facial features are replaced with performance on trials in which the spacing of features has been altered. Children over 8 years of age are able to detect changes at the feature level, but even 10-year-old children have difficulty detecting changes in the spacing of features [55,82].

A similar manipulation contrasting a change in features versus the spacing of features was performed with the face-deprived monkeys described above [41]. With no visual experience with faces, these monkeys were able to detect changes in both the identity of features and the spacing of features, a finding that is in contrast to the 10 or more years of experience required by humans to detect similar manipulations [55,82]. Dahl et al. [34] also examined whether monkeys could detect changes in the spacing of facial features, e.g. a small change in interocular distance. Using the same adaptation paradigm described above, Dahl et al. contrasted monkeys' rebound to ‘same’ trials, showing the same face rotated 30°, with ‘configural’ trials in which the interocular spacing had been altered. Rebound was greater for the ‘configural’ compared with ‘same’ trials, suggesting that monkeys are sensitive to second-order configural cues. Three published studies have examined the Thatcher illusion to demonstrate the importance of second-order configurations in faces. The Thatcher illusion occurs when the eyes and mouth are inverted in an upright face, creating a grotesque appearance that disappears when the face is inverted. Two studies using a viewing preference task reported evidence of the Thatcher illusion in rhesus monkeys [44,49], while one study failed to support the Thatcher illusion in baboons using a matching task [83]. The Thatcher illusion examines the importance of second-order configural cues, since inverting internal features also affects the spacing of these features. However, because the effect disappears when the faces are inverted, it is also a test of holistic processing.

Finally, the effect of surface-based cues was examined in one chimpanzee that performed an oddity task to identify human faces with direct gaze among faces with averted gaze, and vice versa [50]. The subject had much greater difficulty accomplishing this when the faces, including the eyes, were reversed in polarity and shown in their photographic negative (negative faces and eyes), or when the contrast reversal was applied only to the eyes (positive faces and negative eyes), compared with contrast-reversed faces in which the eyes were left unaltered (negative faces and positive eyes).

4. Conclusion

Faces are one of the most important and salient stimulus categories for primates, providing information about individual identity, age, gender and emotion. Despite a rapidly growing body of literature in humans that supports numerous cognitive and neural specializations involved in recognizing and discriminating faces, data from non-human primates are less clear. While many studies report similar face-processing strategies in monkeys and humans, methodological issues such as the inclusion of appropriate control stimuli often obscure clear conclusions. This is not a trivial oversight because many of the face-processing specializations observed in humans are specific for faces, meaning that they are not shown for stimuli other than faces, or at the very least they are quantitatively greater for faces compared with other visual stimuli. Without the inclusion of appropriate control stimuli, similar conclusions about face selectivity cannot be drawn from studies in other species. Moreover, there are notable reported species differences in the ability of chimpanzees and monkeys to individuate faces, their reliance on second-order configural information, and the role of expertise in face-selective cognitive processes. The majority of the data reviewed supports an innate attraction to first-order face-like configurations in monkeys, including strong preferences for scanning the eyes, and some sensitivity to face identity although it is not clear whether this involves a dedicated holistic-processing mechanism or simply learning specific facial features. Collectively, the picture that has emerged from chimpanzee studies conducted in several different laboratories provides good evidence to support the majority of face-specific specializations characteristic of human face processing, including inversion effects for expert face categories, the composite face effect and sensitivity to second-order configural cues including surface-based cues. 
The data from monkeys, however, are less clear and may be strongly influenced by differences in the methodologies used to study face processing in these species, such as free-viewing tasks and eye tracking versus operant MTS or oddity tasks.

Some discussion of why monkeys, chimpanzees and humans may have evolved different face-processing strategies is required. Although this discussion is tentative, an examination of social organization might shed some light on reported differences. Chimpanzees, humans and spider monkeys all live in fission–fusion societies where group composition is flexible and changing. This fission–fusion dynamic means that individual group members are not always in visual contact, creating a scenario where robust cognitive mechanisms for representing individual identity would be highly advantageous. This is additionally supported by many behaviours shown by chimpanzees that require recognition and long-term memory for specific individuals, including reconciliation and consolation, and specific patterns of reciprocity and cooperation [84,85]. Recent data suggest that the flexibility inherent in a fission–fusion society may be associated with the coevolution of unique cognitive strategies, including response inhibition that would enable individuals to respond flexibly to a changing social dynamic [86]. Under such a system, chimpanzees, humans and spider monkeys may have evolved similar cognitive specializations for extracting information from faces that provides the most salient cues for identifying specific individuals. The visual information that is present in faces, but not redundant across individuals, is reflected in second-order configural cues. This information refers to the precise spacing of facial features, in addition to the surface-based cues present in a face such as skin texture and pigmentation [54]. The majority of the data suggests that chimpanzees and humans (and spider monkeys [47,70]) are both highly sensitive to second-order cues.

Rhesus monkeys, in contrast, live in large social groups characterized by strict, linear dominance hierarchies. Social recognition in this situation could involve a number of different visual strategies, including the presence of distinct body cues and patterns of association, such as an individual's matriline. Rhesus monkeys also have low rates of reconciliation compared with chimpanzees, and an absence of consolation and cooperative behaviours [84]. From the data reviewed here, there is insufficient evidence to suggest that monkeys and chimpanzees use qualitatively different face-processing strategies, e.g. feature-based compared with configural-processing strategy. Rather, the data seem to suggest that the differences are more quantitative, not unlike the developmental differences that have been reported for human infants [87]. A parsimonious conclusion might be that rhesus monkeys process faces as a unique category of visual stimuli using a combination of configural- and holistic-processing strategies, but have not evolved as robust a mechanism for representing individual identity as chimpanzees and humans. Unfortunately, a clear answer to the question of how face processing evolved in primates, including the role of specific cognitive and social factors, may be long in coming as only a few laboratories are engaged in these studies and even fewer have conducted comparative studies in two or more species using similar methods. Moreover, if methodological issues are a contributing or causal factor for discrepancies in the literature, then between-subject methodological comparisons need to be made in order to understand whether different task paradigms assess similar cognitive mechanisms.

Acknowledgements

This investigation was supported by RR-00165 from the NIH/NCRR to the Yerkes National Primate Research Center, and R01-MH068791 to L.A.P. The Yerkes National Primate Research Center is fully accredited by the American Association for Accreditation of Laboratory Animal Care. Special thanks to Tom Heitz for assistance producing table 1.

2007 Visual search for orientation of faces by a chimpanzee (Pan troglodytes): face-specific upright superiority and the role of facial configural properties. Primates 48, 1–12. (doi:10.1007/s10329-006-0011-4)