The ability to perceive and recognize objects is essential to many animals, including humans. Until recently, models of object recognition have primarily focused on static cues, such as shape, but more recent research is beginning to show that motion plays an important role in object perception. Most studies have focused on rigid motion, a type of motion most often associated with inanimate objects. In contrast, nonrigid motion is often associated with biological motion and is therefore ecologically important to visually dependent animals. In this study, we examined the relative contribution of nonrigid motion and shape to object perception in humans and pigeons, two species that rely extensively on vision. Using a parametric morphing technique to systematically vary nonrigid motion and three-dimensional shape information, we found that both humans and pigeons were able to rely solely on either shape or nonrigid motion information to identify complex objects when one of the two cues was degraded. Humans and pigeons also showed similar 80% accuracy thresholds when the information from both shape and motion cues was degraded. We argue that the use of nonrigid motion for object perception is evolutionarily important and should be considered in general theories of vision, at least with respect to visually sophisticated animals.

Introduction

For most animals, the ability to maneuver within and interact with their environment is critical for survival. Fundamental to this is the ability to perceive and recognize objects, including other animals (e.g., conspecifics, prey, and predators). Indeed, visually dependent animals, such as humans, categorize and recognize objects in complex scenes with ease, within seconds, and even from very brief exposures (e.g., Potter, 1976; Thorpe, Fize, & Marlot, 1996). The ability to perceive and recognize objects was, until recently, thought to be based predominantly on static properties of the objects, such as shape. Consequently, prevalent models of object perception primarily describe how these static properties contribute to object recognition (Biederman, 1987; Bülthoff & Edelman, 1992; Edelman & Bülthoff, 1992; Lawson & Humphreys, 1996; Marr, 1982; Tarr & Bülthoff, 1995). Yet the retinal projection of objects is rarely completely static; whether because of the movement of the observer or the movement of the object being observed, retinal motion is often seen in conjunction with static properties of objects. Not surprisingly, researchers have begun to investigate the role of motion in object perception. In particular, research has now focused on the contribution of different types of motion (Aggarwal, Cai, Liao, & Sabata, 1998), such as rigid (e.g., translation) and nonrigid object motion that is displayed by moving objects and organisms (e.g., Friedman, Vuong, & Spetch, 2009; Liu & Cooper, 2003; Newell, Wallraven, & Huber, 2004; Setti & Newell, 2010; Stone, 1998; Vuong, Friedman, & Plante, 2009; Vuong, Friedman, & Read, 2012; Vuong & Tarr, 2004, 2006).

Contrary to the prevailing assumption that motion serves only to aid the recovery of shape for object perception (e.g., structure from motion; Marr & Nishihara, 1978; Ullman, 1984), recent evidence indicates that motion contributes to object recognition independently from static cues, such as shape. This was found to be true for both humans and nonhuman animals (see Cook & Murphy, 2012, and Spetch & Friedman, 2006, for reviews). For instance, Spetch, Friedman, and Vuong (2006) investigated the role of motion in object recognition by training both humans and pigeons to respond to three-dimensional (3-D) objects rotating in depth. The main characteristic of the rigid rotation used by Spetch et al. is that there was no deformation of the 3-D shape; rigid rotation is thus most often associated with inanimate objects.

Spetch et al. (2006) found that recognition accuracy for both humans and pigeons decreased when the rigid rotation trajectory of a target object was reversed from the learned motion—even when the shape of the object and the resulting retinal images remained the same (see also Liu & Cooper, 2003; Stone, 1998; Vuong & Tarr, 2004, 2006). It is, however, worth noting several key behavioral differences between the two species. First, the reduction in accuracy by changes in motion was more pronounced for pigeons than for humans. Second, pigeons showed more reliance on shape than motion for the decomposable objects (i.e., objects that had distinct parts), but they relied more on motion than shape for the nondecomposable objects (i.e., objects that were amoeba-like). Humans, by comparison, relied on shape more than motion regardless of whether the objects were decomposable or nondecomposable. Last, with both types of objects, pigeons, but not humans, were able to transfer the discrimination of rigid motion when new objects were presented in the learned motion trajectories. Thus, these findings suggest that pigeons are more sensitive to rigid motion than humans and that their reliance on shape or motion was modulated by the geometric structure of the objects (i.e., decomposable into parts or not). Heavy reliance on motion in pigeons is congruent with reports that, at the lower levels of visual processing, the avian visual system is less sensitive to shape information than to motion (Nankoo, Madan, Spetch, & Wylie, 2014; Nankoo, Madan, Wylie, & Spetch, 2015a). Even though there appears to be a difference between humans and pigeons with respect to reliance on motion, the results of Spetch et al. show that motion affected recognition in both species (see Spetch & Friedman, 2006, for review). 
Given that organisms with distinct ecological and biological constraints, such as humans and pigeons, make use of rigid motion to identify objects, this finding suggests that motion may be a universally important cue for solving the problem of object recognition in visually dependent organisms.

Although comparative studies on rigid motion, such as Spetch et al. (2006), have been informative with respect to the involvement of motion in object recognition, objects encountered in nature often move in nonrigid ways. This is especially true for biological movement. The movement of body parts (i.e., articulated or semirigid motion) has been shown to carry identity information that can readily be extracted by humans. Research using point-light displays (PLDs) that mimic joint movements (i.e., biological motion; Johansson, 1973) during locomotion shows that humans can extract information such as gender, emotion, and the identity of human walkers (see Troje & Chang, 2013, for review) or novel objects (Jastorff, Kourtzi, & Giese, 2006; Pyles, Garcia, Hoffman, & Grossman, 2007). These displays are often used because they degrade the shape information available (Beintema & Lappe, 2002). In other words, research using PLDs shows that articulated motion alone provides a multitude of information about objects, including identity.

Evidence on the contribution of articulated motion to object recognition with nonhuman animals is scarcer but nonetheless shows that nonhuman animals can also extract important information from the movement of body parts (Dittrich, Lea, Barrett, & Gurr, 1998). For instance, studies show that several species of nonhuman animals can discriminate between coherent PLDs and noncoherent ones (e.g., cats, Blake, 1993; chicks, Vallortigara, Regolin, & Marconato, 2005). There is also evidence that for nonhuman animals articulated motion in naturalistic stimuli can facilitate object recognition relative to static images. For example, Qadri, Sayde, and Cook (2014) presented pigeons with video sequences of human models engaging in a dancing action or a martial arts action. They noted that articulated motion of the human models facilitated discrimination (which they termed the dynamic superiority effect) when compared to discrimination of static single frames randomly selected from the videos (see also Asen & Cook, 2012).

In spite of the prevalence of nonrigid motion in nature, relatively little is known about how nonrigid motion is utilized in conjunction with shape information for object recognition in general. Recently, Vuong et al. (2012) investigated the relative contribution of nonrigid motion and shape information for object recognition in human observers. Participants were asked to determine whether two objects were the same or different. Using a parameter-based morphing technique, the shape and motion differences between the objects were systematically varied and the participants were told to use the shape cue (shape-only condition), the motion cue (motion-only condition), or both cues (shape + motion condition) to distinguish between the objects. In the single cue conditions, participants were instructed to ignore the irrelevant cue (e.g., motion in the shape-only condition). In contrast, in the shape + motion condition, the participants were required to base their decision on both cues; for instance, only when both shape and motion were different between the objects were they to respond “different.” In the shape-only and motion-only conditions, participants were able to distinguish between the objects although shape was more difficult to ignore (i.e., in the motion-only condition). When both shape and motion were used, participants weighted shape more heavily than motion. That is, Vuong et al. (2012) showed that humans are able to use either shape or nonrigid motion to differentiate between objects, but they show a shape bias when both cues were available.

Until now, research has shown that pigeons are more likely than humans to exhibit a motion bias when shape and motion information are available for object recognition in rigidly moving objects (Spetch et al., 2006). Even in the one case in which pigeons showed a shape bias for rigidly moving decomposable objects, they showed more sensitivity to motion than did humans (Spetch et al., 2006). It is currently unknown how nonhuman animals use nonrigid motion for object recognition. Furthermore, it is not known whether the biases reported by Vuong et al. (2012) and Spetch et al. (2006) are representative of a fundamental difference in the way the mammalian and avian brains process shape and motion for object recognition. That is, when nonrigid motion is used in conjunction with shape to identify an object, do humans still show greater sensitivity to shape information and pigeons to motion information? To this end, the present study builds on the findings of Spetch et al. with rigid motion and Asen and Cook (2012) and Qadri et al. (2014) with semirigid motion to investigate the role of nonrigid motion for object recognition in pigeons and humans.

Comparative studies of visual processes between humans and pigeons provide important information on the general principles required for object perception. As stated previously, both humans and pigeons rely extensively on vision for their survival. However, whereas humans process visual information mostly along the thalamofugal pathway (i.e., geniculate pathway), visual information in the avian brain is primarily processed through the tectofugal pathway (i.e., pulvinar pathway; Butler & Hodos, 2005). Thus, an examination of perceptual processes using similar procedures in these distantly related species can inform the functional significance of the difference between the tectofugal and thalamofugal pathways. Ultimately, we may learn how these two organisms with distant neural architecture are able to solve the problem of perceiving dynamic objects.

In the present study, we investigated the contribution of nonrigid motion and shape information to object perception in pigeon (Experiment 1) and human (Experiment 2) observers. We employed the same stimuli as were used in Spetch et al. (2006) and the morphing technique used by Vuong et al. (2012), but we modified the procedure used by Vuong et al. so that the participants had to discriminate a “correct” object (i.e., the S+ stimulus) from an “incorrect” object (i.e., the S− stimulus) to facilitate testing with the pigeons. By varying the values of the S+ stimulus on only the shape continuum, on only the nonrigid motion continuum, and on both continua at the same time, we could ascertain the degree to which pigeons and humans rely on shape and nonrigid motion to discriminate one object from another. Because we did not give instructions regarding which cue to rely on, if pigeons and humans use nonrigid motion and shape cues independently to identify objects, there should be very little decrement in performance across the different morph pairs in the motion- and shape-only conditions because the other cue can be used for the discrimination. In contrast, when both shape and motion vary, there should be a systematic decrement in performance as the S+ stimulus becomes more similar to the S− stimulus.

Experiment 1

Method

Participants

Eight pigeons with previous unrelated touchscreen experience served as subjects. Based on previous research, this number of pigeons is sufficient to detect significant within-subject differences in discrimination tasks that involve multiple trials per condition (e.g., Asen & Cook, 2012; Nankoo et al., 2015a). Four birds were housed in individual cages, and four were housed in group cages under a 12-hr light/dark cycle (light onset at 6:00 a.m.). All birds were maintained at approximately 85% of their free-feeding weights. Water and grit were available ad lib in the home cages.

Apparatus

Stimuli were displayed on a 22-in. Viewsonic VX2268wm FuHzion LCD computer monitor (resolution: 1680 × 1050 pixels; refresh rate: 120 Hz). The experiment was conducted in touch-screen operant chambers. The monitor in each chamber was equipped with a 17-in. Carroll Touch infrared touch frame. Each chamber contained two solenoid-type bird feeders on the side walls of the chamber. Lamps located within each feeder illuminated feeder presentations, and photocells measured the duration of head entries into the hoppers to limit feeding durations to 1 s per food presentation. The chambers were connected to computers located in an adjacent room. These computers controlled all of the experimental contingencies and recorded the responses.

Stimuli

Figure 1A illustrates the dynamic novel objects used in the present study. The stimuli consisted of 3-D objects rendered with a gray surface on a yellow background. The objects were sampled from a shape and motion stimulus space (see Vuong et al., 2012, for details). In short, the stimuli consisted of multipart 3-D models that were deformed based on four parameters to create the nonrigid motion: bend angle, bend direction, twist angle, and twist bias. The bend direction and twist bias affect the initial direction of bending or twisting relative to some arbitrary starting position. On the shape dimension, the prototypes at each end of the shape space were “brick” and “pyramid” shapes. Within the motion space, the motion prototypes were “bending” and “twisting” motion.

Figure 1

(A) All combinations of S+ and S− prototypes used in this study. Two shape prototypes were used: pyramid and brick. Two motion prototypes were used: twisting and bending. (B) The shape and motion stimulus space from which the prototypes and intermediate morphs were derived. The shape dimension is on the y-axis, and the motion dimension is on the x-axis. The intermediate shapes were derived by linear combination of the two shape prototypes, and the intermediate motions were derived by linear combination of the two motion prototypes. In the shape-only conditions, only the values on the shape dimension were varied whereas the value on the motion dimension remained the same as the S+ motion learned during training. Likewise, in the motion-only conditions, the value on the shape dimension remained the same as the S+ shape learned in training. In the shape + motion conditions (diagonal line), values on both the shape and motion dimensions were varied simultaneously.

Each stimulus consisted of pairwise combinations of the shape prototypes and the motion prototypes with different proportions of each morphed dimension, ranging from 0% to 100%. For example, a twisting brick prototype consisted of 100% brick prototype and 0% pyramid prototype on the shape space. Similarly, on the motion space, this prototype stimulus consisted of 100% twisting motion and 0% bending motion. Intermediate morphs were derived by linear combinations of the prototypes.
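The linear combination described above can be sketched in a few lines. This is a minimal illustration, not the rendering pipeline used in the study: the `morph` helper and the three-element parameter vectors are hypothetical, and only the blending rule (a weighted sum whose weights total 100%) is taken from the text.

```python
from typing import List, Sequence


def morph(proto_a: Sequence[float], proto_b: Sequence[float], pct_b: float) -> List[float]:
    """Blend two prototype parameter vectors linearly.

    pct_b is the percentage of prototype B in the blend (0 = pure A, 100 = pure B).
    """
    if not 0.0 <= pct_b <= 100.0:
        raise ValueError("pct_b must lie between 0 and 100")
    if len(proto_a) != len(proto_b):
        raise ValueError("prototypes must have the same number of parameters")
    w = pct_b / 100.0
    return [(1.0 - w) * a + w * b for a, b in zip(proto_a, proto_b)]


# Hypothetical parameter vectors, for illustration only.
brick = [1.0, 0.0, 2.0]
pyramid = [0.0, 1.0, 4.0]

halfway = morph(brick, pyramid, 50.0)  # 50% brick, 50% pyramid
```

The same helper would apply unchanged to the motion dimension, with bending and twisting parameter vectors in place of the shape prototypes.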

Similar to our previous study (Vuong et al., 2012), we independently manipulated the shape, the motion, or both the shape and motion of the S+ stimulus. For example, as illustrated in Figure 1B, suppose that the S+ is a bending brick (bottom left corner) and the S− stimulus is a twisting pyramid (top right corner). The shape of the S+ stimulus (brick) could then be systematically “morphed” toward the shape of the S− stimulus (pyramid) while leaving the motion constant (bending). Similarly, the motion of the S+ stimulus could be “morphed” from bending to twisting while keeping the shape constant. Last, both the shape and motion of the S+ stimulus could be equally “morphed” toward the S− stimulus.

Each stimulus consisted of 100 frames that were presented at 60 Hz. The frames were looped until a response was made or for a maximum duration of 2 min. Based on an estimated average viewing distance of 9 cm (Bischof, Reid, Wylie, & Spetch, 1999), the stimuli subtended an estimated 21.19° (160 pixels) × 34.36° (190 pixels) of visual angle. The experiment and stimulus presentation were controlled by a Windows PC running E-Prime (PST Software, Pittsburgh, PA).

Procedure and design

Training consisted of two phases. In both phases, a trial began with a start stimulus consisting of a gray circle in the center of the screen. Once the birds pecked the start stimulus, it disappeared, and the training trial began. If the birds did not peck at the start stimulus within 1 min, it disappeared and reappeared after an intertrial interval of 10 s. In phase 1, only the S+ stimulus was shown on the display, and birds received a 1-s reward via a food dispenser if they pecked at the stimulus. Phase 2 was the discrimination learning phase, during which the birds learned to discriminate one prototype (S+) from its opposite counterpart on the shape–motion space continuum (S−; Figure 1). Two randomly selected birds were assigned to each of the four shape–motion prototype combinations (Figure 1A). Using a simultaneous two-alternative, forced-choice paradigm, the birds were presented with both an S+ and an S− stimulus on the screen, with the position of the stimuli (left or right) counterbalanced across trials. The stimuli remained on the screen for a maximum of 2 min or until a response was made. The pigeons responded by pecking at one of the stimuli. A peck to the S+ stimulus resulted in a 1-s reward via a food dispenser. A peck to the S− stimulus, or a failure to respond, resulted in no reward and a time-out of 10 s. During the time-out, the screen was blank. The birds were moved to the testing phase after they achieved an accuracy of 85% over three consecutive sessions.
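The training criterion can be expressed as a simple check over per-session accuracies. The sketch below assumes that each of the three most recent sessions must individually reach 85%; the text does not say whether accuracy was evaluated per session or pooled across the three, so that reading, along with the function name, is an assumption.

```python
from typing import Sequence


def reached_criterion(session_accuracies: Sequence[float],
                      threshold: float = 85.0,
                      run_length: int = 3) -> bool:
    """Return True if the last `run_length` sessions all meet the threshold.

    Assumption: the criterion is applied to each session separately.
    """
    if len(session_accuracies) < run_length:
        return False
    return all(acc >= threshold for acc in session_accuracies[-run_length:])
```

For example, a bird with session accuracies of 70%, 86%, 90%, and 88% would move on to testing, whereas one whose latest session fell to 80% would not.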

Testing consisted of three conditions: a shape-only condition in which the shape was manipulated while the motion remained the same as the S+ stimulus, a motion-only condition in which the motion was manipulated while the shape remained the same as the S+ stimulus, and a shape + motion condition in which both the shape and motion were changed to the same degree (see Figure 1B). As in training phase 2, two stimuli were presented simultaneously on the screen. One of the stimuli was always the S− stimulus (i.e., the stimulus that the birds learned not to peck in training), and the other stimulus was the S+ stimulus or morphed versions of the S+ stimulus (see below). The position of the S+ and S− stimuli on the left and right sides of the screen was randomized across trials. Each test trial began with a start stimulus as was done in training. If the bird did not complete a trial within 2 min, the trial ended and was scored as an incorrect response. Trials from all three conditions were presented randomly in each testing session.

Testing was carried out by changing the shape, the motion, or both the shape and motion of the S+ stimulus by 0%, 30%, 60%, 70%, 80%, 90%, 95%, and 100%. A change of 0% meant the stimulus remained identical to the original S+ stimulus, and a change of 100% meant that it became identical to the S− stimulus on the manipulated dimension. These values were chosen based on pilot data collected prior to this study. A peck to the S+ stimulus, even when its shape and/or motion dimensions were changed from the prototype, resulted in the food reward. If the birds' performance on baseline trials (i.e., the 0% morph) in the testing phase was below the training criterion for 2 days in a row, they were put back on training until they reached the training criterion. Thereafter, they resumed testing. In both training and testing sessions, birds were allowed to complete as many trials as possible for a duration of 45 min. Testing was continued until the birds had completed at least 120 trials for each level of each condition.
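The test design can be laid out as a grid of condition × percentage-change cells, each mapping to a position in the shape and motion stimulus space. The following sketch uses hypothetical names (`morph_coordinates`, `cells`); the condition labels, morph levels, and the minimum of 120 trials per cell come from the description above.

```python
from itertools import product
from typing import Tuple

CONDITIONS = ("shape-only", "motion-only", "shape+motion")
LEVELS = (0, 30, 60, 70, 80, 90, 95, 100)  # percentage change from S+
MIN_TRIALS_PER_CELL = 120


def morph_coordinates(condition: str, pct: int) -> Tuple[int, int]:
    """Return (shape change, motion change) of the S+ stimulus toward S-, in percent."""
    if condition == "shape-only":
        return (pct, 0)      # motion stays at the trained S+ value
    if condition == "motion-only":
        return (0, pct)      # shape stays at the trained S+ value
    if condition == "shape+motion":
        return (pct, pct)    # both dimensions change to the same degree
    raise ValueError(f"unknown condition: {condition}")


# One entry per condition x level cell of the test design.
cells = [(c, p, morph_coordinates(c, p)) for c, p in product(CONDITIONS, LEVELS)]
min_total_trials = len(cells) * MIN_TRIALS_PER_CELL
```

Only in the shape + motion condition does the morphed S+ ever coincide with the S− stimulus on both dimensions, which is why chance-level accuracy is expected there at 100% change.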

Results

The training data showed that the birds learned the task relatively quickly, with the fastest bird surpassing the criterion within four sessions and the slowest bird taking 20 sessions. During testing, the birds rarely failed to complete a trial within the 2-min time limit; this occurred on fewer than 1% of the trials overall and fewer than 2% of all trials in any level of percentage change from S+. A condition × percentage change from S+ ANOVA was conducted on the percentage correct during test trials. There was a main effect of condition, F(2, 14) = 50.06, MSE = 109.58, \(\eta_p^2\) = 0.88, p < 0.001. Mean accuracies for the shape-only, motion-only, and shape + motion conditions were 91.78%, 91.08%, and 75.41%, respectively, 95% CI = ±7.94%. There was also a main effect of percentage change from S+, F(7, 49) = 86.79, MSE = 16.28, \(\eta_p^2\) = 0.93, p < 0.001, and an interaction between condition and percentage change, F(14, 98) = 19.68, MSE = 24.37, \(\eta_p^2\) = 0.74, p < 0.001. The left graph in Figure 2 plots this interaction. To explore this interaction further, we compared the accuracy when the S+ stimulus was presented (i.e., 0% change) to the accuracy when the S− stimulus was presented (i.e., 100% change) for each condition. There was a small but significant drop in accuracy for the shape-only condition, t(7) = 3.34, SDdiff = 6.46, p < 0.02; the means for 0% versus 100% change from S+ were 95.75% and 88.13%, respectively. The birds also showed a small but significant decline for the motion-only condition, t(7) = 3.16, SDdiff = 8.17, p < 0.02; the means were 95.75% and 86.63% for 0% versus 100% change from S+, respectively. Critically, there was a substantial drop in accuracy for the shape + motion condition, t(7) = 27.80, SDdiff = 4.72, p < 0.0001. The means for 0% versus 100% change from S+ were 95.75% and 49.38%. Although both the shape-only and motion-only conditions showed significant (but small) differences in accuracy between the 0% and 100% change from S+, there was no significant difference in accuracy between the shape-only and motion-only conditions at 100% change from S+, t(7) = 0.38, SDdiff = 11.08, p > 0.5. This finding suggests that pigeons could use either learned shape or motion cues with similar accuracy levels.
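As a worked check, the paired t statistics for the pigeons can be recomputed from the reported means, the standard deviation of the differences (SDdiff), and the sample size of eight. The sketch below is illustrative only; the function name is hypothetical, and because the inputs are rounded summary values, the results agree with the reported statistics only to within rounding.

```python
import math


def paired_t(mean_a: float, mean_b: float, sd_diff: float, n: int) -> float:
    """Paired-samples t: mean difference divided by its standard error."""
    return (mean_a - mean_b) / (sd_diff / math.sqrt(n))


N_BIRDS = 8  # t has n - 1 = 7 degrees of freedom

# 0% versus 100% change from S+, using the summary values reported in the text.
t_shape = paired_t(95.75, 88.13, 6.46, N_BIRDS)
t_motion = paired_t(95.75, 86.63, 8.17, N_BIRDS)
t_both = paired_t(95.75, 49.38, 4.72, N_BIRDS)

print(round(t_shape, 2), round(t_motion, 2), round(t_both, 2))
```

The much larger statistic in the shape + motion condition reflects a mean drop of about 46 percentage points against a standard error of under 2 points.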

Figure 2

The accuracy (percentage correct) of the pigeons in the shape-only, motion-only and shape + motion conditions as a function of the percentage change from the S+ stimulus (i.e., percentage change from the S+ shape and/or motion toward the S− shape and/or motion). The left graph shows the results across all test sessions, and the right graph shows the results for the first five test sessions. Error bars represent within-subject 95% confidence interval (Loftus & Masson, 1994) computed from the interaction error term.

To examine whether the results were influenced by the amount of testing that was done, we also analyzed performance from only the first five sessions. The results were indistinguishable from the analysis that included all of the sessions (Figure 2, right graph). There were again significant main effects of condition and percentage change from S+, F(2, 14) = 44.64, MSE = 120.98, \(\eta_p^2\) = 0.86, p < 0.001; F(7, 49) = 70.80, MSE = 21.90, \(\eta_p^2\) = 0.91, p < 0.001, respectively, and an interaction between condition and percentage change from S+, F(14, 98) = 7.95, MSE = 55.09, \(\eta_p^2\) = 0.53, p < 0.001. The means for the shape-only, motion-only, and shape + motion conditions were 88.69%, 88.22%, and 72.55%, respectively, 95% CI = ±8.34%.
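The reported confidence intervals for the condition means are consistent with a within-subject interval of the Loftus and Masson (1994) form, in which the half-width is the critical t value times the square root of MSE divided by the number of subjects. The sketch below assumes that the MSE of the condition effect was the error term used for these intervals; that choice is an inference from the numbers, not something stated in the text. The critical value is hard-coded from a standard t table rather than computed.

```python
import math

# Two-tailed 95% critical t for 14 degrees of freedom, from a standard t table.
T_CRIT_DF14 = 2.145


def within_subject_ci(mse: float, n_subjects: int, t_crit: float) -> float:
    """Half-width of a within-subject confidence interval from an ANOVA error term."""
    return t_crit * math.sqrt(mse / n_subjects)


# MSE of the condition effect: all sessions, then the first five sessions.
ci_all_sessions = within_subject_ci(109.58, 8, T_CRIT_DF14)
ci_first_five = within_subject_ci(120.98, 8, T_CRIT_DF14)

print(round(ci_all_sessions, 2), round(ci_first_five, 2))
```

Under that assumption, both half-widths come out within rounding of the reported ±7.94% and ±8.34%.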

Experiment 2

Method

Participants

Fourteen adults with normal or corrected-to-normal vision participated in the experiment. The participants were undergraduate students (aged between 18 and 25 years old) from the University of Alberta's subject pool and were naive as to the purpose of the experiment. All participants provided informed consent. One participant's data were eliminated due to experimenter error, and a second participant's data were eliminated because he did not reach the training criterion. Based on previous research, 12 participants is sufficient to detect significant within-subject differences in discrimination tasks that involve multiple trials per condition (e.g., Nankoo, Madan, Spetch, & Wylie, 2015b).

Stimuli, procedure, and design

Stimuli were displayed on a computer with the same specifications as in Experiment 1. However, no touchscreen was used. Instead, participants responded by clicking the mouse cursor on one of the stimuli. A chin rest was used to maintain the distance of the participant to the monitor at 47 cm.

The stimuli were the same as those used in Experiment 1 except that the shape, the motion, or both the shape and motion of the S+ stimulus were changed by 0% (i.e., the S+ stimulus), 80%, 90%, 95%, and 100% (i.e., the shape or motion of the S− stimulus). The stimuli subtended approximately 5.96° (160 pixels) × 7.08° (190 pixels) of visual angle.

Three participants were assigned to each of the four shape–motion prototype combinations (Figure 1). Prior to testing, the participants were given 10 training trials to learn to discriminate between the S+ and S− stimuli. The training phase was similar to training phase 2 in Experiment 1. A correct response resulted in visual feedback (i.e., the word “correct” appeared on the screen), whereas an incorrect response resulted in no feedback, and the trial ended. The training criterion was eight correct responses out of 10 trials. Only one participant failed to achieve this level of accuracy and was thus dropped from the experiment. After the training trials, the participants moved to the testing phase and completed 30 trials per morph percentage difference. In testing, visual feedback was provided for correct responses, as in training. An incorrect response resulted in no feedback. The design was otherwise similar to Experiment 1, with all three conditions and morph percentage differences randomly intermixed.

Results

Overall, the pattern of results was similar to that for the pigeons. There was a main effect of condition, F(2, 22) = 79.65, MSE = 134.87, \(\eta_p^2\) = 0.88, p < 0.001. The means for the shape-only, motion-only, and shape + motion conditions were 98.47%, 98.25%, and 75.18%, respectively, 95% CI = ±7.00%. As with the birds, there was also a main effect of percentage change from S+, F(4, 44) = 24.78, MSE = 51.75, \(\eta_p^2\) = 0.69, p < 0.001, and a condition × percentage change interaction, F(8, 88) = 27.09, MSE = 48.30, \(\eta_p^2\) = 0.71, p < 0.001. The interaction is shown in the left graph of Figure 3. To explore this interaction further, we compared the accuracy when the S+ stimulus (0% change) or S− stimulus (100% change) was presented for each condition. In contrast to the pigeons, there were no significant differences for either the shape-only or motion-only conditions, t(11) = −0.70 and t(11) = −0.24, respectively. However, like the pigeons, the difference between 0% and 100% change from S+ was significant for the shape + motion condition, t(11) = 28.74, SDdiff = 5.92, p < 0.001. The means for the latter condition were 98.00% and 48.92%, respectively, for 0% versus 100% change from S+.
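The partial eta squared values can be recovered from each F ratio and its degrees of freedom, because \(\eta_p^2\) equals the effect sum of squares divided by the sum of the effect and error sums of squares. The short sketch below applies that identity to the three human ANOVA effects reported above; the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Human data: condition, percentage change from S+, and their interaction.
eta_condition = partial_eta_squared(79.65, 2, 22)
eta_change = partial_eta_squared(24.78, 4, 44)
eta_interaction = partial_eta_squared(27.09, 8, 88)

print(round(eta_condition, 2), round(eta_change, 2), round(eta_interaction, 2))
```

The same function reproduces the pigeon effect sizes in Experiment 1 when given the corresponding F ratios and degrees of freedom.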

The accuracy (percentage correct) of human observers in the shape-only, motion-only, and shape + motion conditions as a function of the percentage change from the S+ stimulus (i.e., percentage change from the S+ shape and/or motion toward the S− shape and/or motion). The left graph shows the results for all test trials, and the right graph shows the results for just the first third of the test trials. Error bars represent within-subject 95% confidence intervals (Loftus & Masson, 1994) computed from the interaction error term.

Figure 3

For comparison with the pigeons, we also analyzed the data for the first third of the human participants' session (10 trials for each of the 3 conditions × 5 changes from S+ per condition). Overall, as with the pigeons, the pattern of results was similar to that for the complete set of trials. There was a main effect of condition, F(2, 22) = 53.45, MSE = 299.24, \(\eta_p^2\) = 0.829, p < 0.001. The means for the shape-only, motion-only, and shape + motion conditions were 98.17%, 96.67%, and 69.17%, respectively, 95% CI = ±10.36%. As with the birds, there was also a main effect of percentage change from S+, F(4, 44) = 17.52, MSE = 103.76, \(\eta_p^2\) = 0.614, p < 0.001, and a condition × percentage change interaction, F(8, 88) = 12.16, MSE = 118.50, \(\eta_p^2\) = 0.525, p < 0.001. The interaction is shown in the right graph of Figure 3.
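As a consistency check on the interval reported for these condition means, the sketch below computes a within-subject confidence half-width in the style of Loftus and Masson (1994), \(t_{crit} \cdot \sqrt{MSE / n}\). It rests on assumptions not stated in the text: that the interval was derived from the condition error term (MSE = 299.24, df = 22) and that n = 12 participants contributed. The critical value 2.074 is the two-tailed 5% point of t with 22 degrees of freedom.

```python
import math

def within_subject_ci_halfwidth(mse, n, t_crit):
    """Half-width of a within-subject CI: t_crit * sqrt(MSE / n)."""
    return t_crit * math.sqrt(mse / n)

T_CRIT_DF22 = 2.074  # two-tailed .05 critical t for df = 22

# Assumed inputs: condition error term from the first-third analysis, n = 12
half_width = within_subject_ci_halfwidth(mse=299.24, n=12, t_crit=T_CRIT_DF22)
print(round(half_width, 2))
```

Under these assumptions the half-width agrees with the reported ±10.36% to two decimal places.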

To further quantify any differences or similarities between pigeons and humans in the shape + motion condition, we estimated the 80% threshold (i.e., the percentage change from S+ that gives rise to 80% accuracy) for each subject in that condition by fitting an exponential function to the accuracy data for each individual subject. We used the equation

where f(x) is the proportion correct, and x is the percentage change from S+. Table 1 shows the individual thresholds and the parameters of the exponential fit for both species as well as the means across individuals. There are two observations to note. First, the threshold range was wide for both species: The thresholds for the birds ranged from 48.4% to 88.5%, and those for the humans ranged from 44.46% to 98.81%. Second, four of the human participants performed at ceiling from 0% to 95% change but showed a drastic drop in performance at 100% change, which gave rise to A values close to zero and to large B values (and to the larger threshold range).

Estimated threshold values (percentage change from S+ at which accuracy was 80%) and parameters of the exponential fits for individual humans and birds.

Table 1

We found that the 80% threshold for humans (M = 79.7% change from S+, 95% CI = 10.31) was not significantly different from the threshold for pigeons (M = 69.0% change from S+, 95% CI = 10.81) according to a Mann-Whitney U test (Z = 1.39, p = 0.18). For both species, the exponential function was a good fit to the data (average root mean square error, RMSE = 0.076 for humans and RMSE = 0.050 for pigeons).
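The two quantities used here, the 80% threshold and the RMSE of a fit, can be sketched independently of the particular fitted function. The code below is illustrative only: it finds the threshold by bisection for any fitted curve that decreases monotonically with percentage change, and it computes RMSE between observed and predicted proportions. The example curve and its parameter values (0.01 and 0.04) are hypothetical, chosen for demonstration; they are not the equation or the fitted parameters from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted proportions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def threshold(curve, target=0.80, lo=0.0, hi=100.0, tol=1e-9):
    """Percentage change at which a decreasing curve crosses the target accuracy.

    Assumes curve(x) is monotonically decreasing on [lo, hi].
    """
    if not (curve(lo) >= target >= curve(hi)):
        raise ValueError("curve does not cross the target accuracy on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if curve(mid) > target:
            lo = mid   # still above target: threshold lies to the right
        else:
            hi = mid   # at or below target: threshold lies to the left
    return (lo + hi) / 2

# Hypothetical fitted curve (NOT the study's equation or parameters)
def example_curve(x):
    return 1 - 0.01 * math.exp(0.04 * x)

example_threshold = threshold(example_curve)
print(round(example_threshold, 2))
```

Bisection is used so that the same routine applies to whatever functional form was fitted, provided it decreases over the tested range.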

Taken together, these results show that human participants' accuracy systematically decreased as a function of percentage change from S+ when both shape and motion were changed, as was the case for pigeons. Like the pigeons, humans maintained high accuracy when only one of the cues was changed. Unlike the pigeons, which showed a very small but significant decrease in accuracy when one of the cues was changed, humans were unaffected by a change in only one of the cues. Furthermore, both species tolerated reasonably large changes from the S+ before accuracy dropped below 80%.

Discussion

The results from both experiments demonstrate the important role of nonrigid motion for object perception across two different species. Specifically, we found that both pigeons and humans readily used nonrigid motion and shape information and were able to rely on either cue alone to recognize novel, dynamic objects. We used our morphing procedure to independently make shape or motion cues less informative relative to the shape and motion information available in the trained, rewarded stimulus (i.e., the S+ stimulus). The cues were rendered less informative by making the stimulus ambiguous (i.e., by systematically morphing the S+ stimulus into the S− stimulus). Thus, when shape information was rendered less informative, both species were able to reliably discriminate between the objects based on the learned nonrigid motion information. The opposite was also true when motion was made less informative for discrimination; in this case, both species relied on the shape information. In other words, both humans and pigeons were able to rely on either of the learned cues for highly accurate object recognition. Only when both shape and motion cues were simultaneously changed from the trained stimulus (i.e., both were made less informative) did we observe a substantial decline in performance as a function of the degree of change from the S+ stimulus. Furthermore, we argue that because we did not provide the participants with instructions about which cue to attend to, our results may be a more accurate reflection of the implicit strategy used by humans and pigeons for object perception. Finally, unlike some previous comparative studies, we used stimuli that were unfamiliar to both humans and pigeons. For example, Qadri et al. (2014) used human actions, which are more familiar to humans than to pigeons.

Our results, together with previous results, suggest that motion may play an important role in higher-level object perception in pigeons. Previous studies have shown that birds tend to be more sensitive to motion information than to shape information and more biased to rely on motion information when discriminating patterns of disconnected dots (e.g., Nankoo et al., 2015a) or when shapes were visually similar, as with amoeba-like objects with no clear part structures (Spetch et al., 2006). With those stimuli, pigeons appear to differ from humans. With rigidly moving decomposable objects, however, in which the shapes are more visually distinct than the amoeba-like objects, pigeons instead showed a shape bias similar to that of humans. Even so, they still showed stronger sensitivity to motion than did humans (Spetch et al., 2006).

In the current study, which used decomposable objects similar to those in Spetch et al. (2006) but with nonrigid motion, pigeons, like humans, learned both the shape and the motion properties of the objects extremely well. There are two key findings supporting this conclusion. First, pigeons could independently use either shape or nonrigid motion with a high degree of accuracy: Pigeons' performance remained extremely high (above 85% accuracy) even when shape or nonrigid motion was completely degraded. The finding that pigeons could independently use either shape or nonrigid motion with high levels of accuracy is an important contribution. Second, when both shape and motion were degraded together, the 80% accuracy threshold was similar for both species and required approximately 70% degradation.

One possibility that should be investigated in future research is that the presence of biological-like motion may facilitate shape perception in birds. In other words, because of the importance of identifying biological agents, motion, and in particular nonrigid motion, may signal the birds to also use the shape information, or it may facilitate processing the shape information. This hypothesis is supported by the pigeons' rapid learning and high performance throughout the experiment. It is possible that in the presence of a biological stimulus, birds use all the available cues to identify the object. It would be important to test this hypothesis with multiple cues to identity, such as color, shape, and rigid and nonrigid motion.

The ability to rely on either shape or motion, found for both humans and pigeons, is consistent with the functional organization of the visual system in both species. In mammals, it is well known that motion and shape are processed primarily through different pathways (Braddick, O'Brien, Wattam-Bell, Atkinson, & Turner, 2000; Livingstone & Hubel, 1988; Milner & Goodale, 1995; Ungerleider & Mishkin, 1982; Van Essen & Gallant, 1994) although there is some interaction between these pathways. This is also congruent with a biologically grounded computational model for object processing (Giese & Poggio, 2003). Similarly, the avian visual system also processes shape and motion primarily through parallel pathways. For example, Nguyen et al. (2004) showed, with a lesion study, that the entopallium (putatively equivalent to the mammalian extrastriate cortex) is divided into several functional units that include the caudal entopallium for motion processing and the rostral portion for shape processing. Yet, in spite of the apparent similarities in the functional organization of the visual system, a major difference between the avian and the mammalian visual systems is that the primary route for visual information in the avian brain is along the tectofugal pathway whereas in the mammalian brain it is the thalamofugal pathway.

Another potential difference between primate and avian neuroanatomy is that there are known regions in the primate brain that respond when both shape and motion are presented together (e.g., Jastorff et al., 2006; Jastorff & Orban, 2009; Jastorff, Popivanov, Vogels, Vanduffel, & Orban, 2012; see Kourtzi, Krekelberg, & van Wezel, 2008, and Mather, Pavan, Marotti, Campana, & Casco, 2013, for reviews). It is not currently known whether homologous brain structures exist in the avian brain. In addition, humans and pigeons have different biological and ecological constraints, such as the fact that the pigeon's visual system has to deal with the unique challenges of flight. Given these differences in neuroanatomy and ecological constraints, our results, in conjunction with neuroanatomical and neurophysiological data, suggest that, although some regions may allow for interactions between shape and motion information in both species, the parallel processing of shape and nonrigid motion is a general principle of object recognition, one that allows an animal to use learned shape or motion cues when the other cue is severely degraded by environmental conditions.

Acknowledgments

This research was supported by grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada to A. F., M. L. S., and D. R. W., and by an NSERC Alexander Graham Bell Canada Graduate Scholarship (doctoral-level) to C. R. M. All research was conducted in accordance with Canadian Council on Animal Care guidelines and with approval from the University of Alberta Animal Welfare Policy Committee. These data formed a portion of the dissertation research of Dr. J.-F. Nankoo.

Commercial relationships: none.

Corresponding author: Marcia L. Spetch.

Address: Department of Psychology, University of Alberta, Edmonton, AB, Canada.

(A) All combinations of S+ and S− prototypes used in this study. Two shape prototypes were used: pyramid and brick. Two motion prototypes were used: twisting and bending. (B) The shape and motion stimulus space from which the prototypes and intermediate morphs were derived. The shape dimension is on the y-axis, and the motion dimension is on the x-axis. The intermediate shapes were derived by linear combination of the two shape prototypes, and the intermediate motions were derived by linear combination of the two motion prototypes. In the shape-only conditions, only the values on the shape dimension were varied whereas the value on the motion dimension remained the same as the S+ motion learned during training. Likewise, in the motion-only conditions, the value on the shape dimension remained the same as the S+ shape learned in training. In the shape + motion conditions (diagonal line), values on both the shape and motion dimensions were varied simultaneously.

Figure 1
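The stimulus space described in the Figure 1 caption can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' stimulus-generation code: prototypes are represented as flat lists of numbers (hypothetical stand-ins for vertex coordinates or motion parameters), and intermediate morphs are linear combinations weighted by the percentage change from S+. The function names and condition labels are chosen here for demonstration.

```python
def morph(s_plus, s_minus, percent_change):
    """Linear combination of two prototypes; 0 gives S+, 100 gives S-."""
    w = percent_change / 100.0
    return [(1 - w) * a + w * b for a, b in zip(s_plus, s_minus)]

def stimulus_coordinates(condition, percent_change):
    """Return (shape % change, motion % change) from S+ for a test condition."""
    if condition == "shape-only":
        return (percent_change, 0)      # motion stays at the trained S+ value
    if condition == "motion-only":
        return (0, percent_change)      # shape stays at the trained S+ value
    if condition == "shape + motion":
        return (percent_change, percent_change)  # diagonal of the space
    raise ValueError(f"unknown condition: {condition}")

# Hypothetical two-parameter prototypes, for illustration only
halfway = morph([0.0, 0.0], [2.0, 4.0], 50)
print(halfway)
print(stimulus_coordinates("shape + motion", 70))
```

Separating the morph weight for each dimension makes the three test conditions differ only in which coordinate moves away from the trained S+ value.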

The accuracy (percentage correct) of the pigeons in the shape-only, motion-only and shape + motion conditions as a function of the percentage change from the S+ stimulus (i.e., percentage change from the S+ shape and/or motion toward the S− shape and/or motion). The left graph shows the results across all test sessions, and the right graph shows the results for the first five test sessions. Error bars represent within-subject 95% confidence interval (Loftus & Masson, 1994) computed from the interaction error term.

Figure 2
