Binocular rivalry occurs when the images presented to the two eyes do not match. Instead of fusing into a stable percept, perception during rivalry alternates between images over time. However, during rivalry, perception can also resemble a patchwork of parts of both eyes' images. Such integration of image parts across eyes is relatively rare compared to integration of image parts presented to the same eye, suggesting that integration across space during rivalry is primarily rooted at the early monocular level of processing. However, recent evidence suggests that rivalry, and potentially also integration across space during rivalry, has its basis at multiple stages of processing, including stages at which monocular signals are minimal. As such, integration and competition at these later stages would be driven more by image-based factors, such as continuity and color, than by eye of origin. Because “higher” visual areas also have increasingly larger receptive fields, image-based integration may occur over a larger spatial extent than monocular, eye-based integration. We therefore used rival images containing two separate image parts and varied the interimage-part distance (IIPD) to assess the relative contributions of eye of origin and image features to integration across space at increasing IIPDs. Our hypothesis was that the balance between these contributions would shift toward image features as IIPD increased. Instead, results show that the relative contributions of both factors to grouping remain constant as a function of IIPD. This indicates that image-based grouping is subject to similar spatial constraints as monocular, eye-based grouping, suggesting both kinds of grouping rely on similarly sized receptive fields.

Introduction

Spatial integration of local visual elements is a fundamental property of our visual system (Wertheimer, 1923). It allows for the experience of meaningful objects rather than collections of basic visual features and is thought to rely on Gestalt perceptual grouping principles (Wagemans et al., 2012). Although perceptual grouping into a coherent percept may appear to occur rapidly and automatically under normal viewing conditions (Hochstein & Ahissar, 2002), this process can be hindered during dichoptic stimulation. Specifically, when two incompatible images are presented to corresponding locations on the retinas, the images will fail to combine into a stable percept, and perception goes into an ever-changing cycle instead. This phenomenon is known as binocular rivalry (Wheatstone, 1838). In most cases during binocular rivalry, perception alternates between the two monocular images, but patchwork combinations of the two images can also be perceived (Meenes, 1930). Thus, as perception alternates between various combinations of the monocular input during binocular rivalry, these dynamics provide a glimpse into interocular combination as well as into grouping across space (Blake, Brascamp, & Heeger, 2014; Stuit, Paffen, van der Smagt, & Verstraten, 2011). Accordingly, here we use the term “grouping” to refer to the process that brings about simultaneous dominance of different parts of the images engaged in rivalry.

Previous results show that spatial coherence of image features is an important factor driving this grouping process (e.g., Alais & Blake, 1998, 1999; Kovács, Papathomas, Yang, & Fehér, 1996; Whittle, Bloor, & Pocock, 1968). Diaz-Caneja (1928, cited in Alais, O'Shea, Mesana-Alais, & Wilson, 2000) was the first to show that two spatially incoherent complementary rival images can lead to alternations between spatially coherent percepts, each a patchwork of both eyes' images. This shows that, when dichoptic images fail to fuse, corresponding features from those images, presented to different eyes, can still be grouped together over space. It thus demonstrates that perception during binocular rivalry does not depend only on local resolution of interocular conflict (Blake, O'Shea, & Mueller, 1992), but that spatial dependencies between neighboring regions influence perception during rivalry, too. Indeed, subsequent work showed such image-based influences on perceptual grouping during rivalry for features such as motion, orientation, and color (Alais & Blake, 1998, 1999; Kovács et al., 1996; van Lier & de Weert, 2003; Whittle et al., 1968).

In two earlier studies, we compared the influence of image-based factors to the influence of eye of origin: the tendency for the percept during rivalry to encompass a single eye's image across an extended region of space (Stuit et al., 2011; Stuit, Paffen, van der Smagt, & Verstraten, 2014). In these studies, two image parts were presented to each eye, similar to the design we used in the current experiment. Each image part was always in conflict with an image part at the corresponding location in the other eye. However, across different conditions, each image part was coherent with either the image part in the other location in the same eye (within-eye coherence) or with the image part in the other location in the other eye (across-eyes coherence). Observers' perceptual reports in these conditions allowed us to separately assess the influence of eye-based and image-based factors on perceptual grouping during rivalry. Specifically, eye-based factors would be expected to lead to simultaneous perceptual dominance of both image parts presented to a given eye regardless of image coherence whereas image-based factors should lead to simultaneous dominance of two mutually coherent image parts, either both presented to the same eye or each presented to a different eye. In these studies, we consistently found that grouping during rivalry is most strongly driven by eye of origin. Although other work has suggested that the content of the images engaged in rivalry (e.g., static gratings, moving patterns, faces) can influence the level at which rivalry is resolved (e.g., Alais & Parker, 2006) and the degree of grouping across eyes (e.g., Tong, Meng, & Blake, 2006), we found eye of origin to be the main determinant of grouping regardless of image content (Stuit et al., 2014). 
This suggested to us that rivalry, or at least grouping during rivalry, primarily relies on a level of processing at which eye-of-origin information is still encoded, as opposed to a level of processing at which competition would primarily be between incompatible image features.

Although our findings on grouping during rivalry suggest an emphasis on a relatively low level of visual processing (also see Quinn & Arnold, 2010), there is considerable evidence that binocular rivalry depends on processing at multiple levels of the visual processing hierarchy simultaneously (Blake, 1989; Logothetis, Leopold, & Sheinberg, 1996; Silver & Logothetis, 2007; Tong & Engel, 2001; Tong et al., 2006; Wilson, 2003). Likewise, spatial integration without interocular conflict has also been argued to occur at both early (Palmer, 2003; Schulz & Sanocki, 2003) and later stages of visual processing (Palmer, Neff, & Beck, 1996; Palmer & Nelson, 2000; Rock & Brosgole, 1964; Rock, Nijhawan, Palmer, & Tudor, 1992). In the present study, we aim to investigate whether one can specifically reveal the influence of higher processing levels on perceptual grouping during rivalry by using stimulus settings that should emphasize those processing levels. Our experimental design capitalizes on two neural properties that show a predictable change as one ascends the visual processing hierarchy: receptive field size and the prominence of monocular signals. Specifically, evidence shows that the average receptive field size increases for later visual areas (Amano, Wandell, & Dumoulin, 2009; Dumoulin & Wandell, 2008; Harvey & Dumoulin, 2011; Smith, Singh, Williams, & Greenlee, 2001). This is important because the size of receptive fields limits the extent of spatial integration during rivalry (Blake et al., 1992). The prominence of monocular signals, in contrast, decreases for later visual areas (Barendregt, Harvey, Rokers, & Dumoulin, 2015; Hubel & Wiesel, 1962, 1974). Taken together, this suggests that monocular, eye-based influences on grouping during rivalry should have a smaller spatial extent than purely image-based influences and, therefore, that image-based influences may be relatively strong at stimulus settings that require grouping across larger spatial distances.
If this prediction is confirmed, then this would provide a direct indication that multiple processing levels provide contributions to perceptual grouping during binocular rivalry. Alternatively, eye-based and image-based influences on grouping during rivalry may originate from a similar, lower level of processing, in which case the spatial extent of both kinds of grouping should be similar.

In our design (Figure 1), each monocular image is built up of two image parts, and we systematically vary the distance between those image parts (interimage-part distance, IIPD) to assess whether this influences the relative impact of eye of origin and image coherence on perceptual grouping during binocular rivalry. In two different conditions, there is figural correspondence between image parts placed in the same eye (Figure 1A) or between image parts placed in opposite eyes (Figure 1B). Importantly, in the former condition, eye of origin and figural coherence both work in concert to promote simultaneous dominance of image parts that are shown to the same eye and that also have corresponding figural content whereas in the latter condition eye-based grouping and image-based grouping are pitted against each other. Assuming that eye-based factors and image-based factors can, indeed, combine to determine perceptual outcome (Knapen, Paffen, Kanai, & van Ee, 2007; Vergeer & van Lier, 2010) and furthermore assuming that the impact of both these kinds of factors eventually disappears when image parts are placed far enough apart (Alais & Blake, 1999; Blake et al., 1992), we can qualitatively sketch out predictions for our experiment.

Schematic representation of the stimulus arrangement in our two conditions. To separate the influence of the eye of origin and image content on the perceptual grouping of image parts during binocular rivalry, matching image parts were presented either to the same eye (A) or to different eyes (B). The image parts were always presented at 3° from fixation although IIPD varied. IIPD refers to the distance between image parts from edge to edge. The IIPD was either 0°, 1°, 2°, 3°, 5°, or 6°. Furthermore, the image parts could be presented to either the right or the left of fixation.

Figure 1

Specifically, regardless of whether the mechanisms that promote, respectively, eye-based grouping and image-based grouping differ in their spatial extent (Figure 2A) or not (Figure 2C), their combined effect should be larger when image parts are closely spaced (small IIPD) than when they are spaced far apart (large IIPD). Accordingly, in the condition in which matching image parts are presented to the same eye (Figures 1A, 2A, and 2C), we expect the predominance of percepts built up of image parts with corresponding figural content (y-axis) to be high for small IIPDs and to decrease with increasing IIPD (x-axis). Note that the relative importance of eye- and image-based factors in the current experiment cannot be estimated a priori, so each panel shows predictions for three scenarios, depending on which type of grouping is stronger when image parts are directly abutting (i.e., at an IIPD of 0°): At this minimal distance, image-based factors can be either stronger than, equal to, or weaker than eye-based grouping at the same IIPD. Note that this question of overall balance does not affect our qualitative predictions in Figure 2A and 2C and that our focus is not on this overall balance across IIPDs, but on changes in this balance with changing IIPD. This becomes clear from our predictions for the second condition (Figure 2B and 2D), in which image-based factors and eye-based factors act against each other (cf. Figure 1B). In this condition, the combined effect of the two kinds of factors is expected to qualitatively depend on whether image-based mechanisms have a larger spatial extent than eye-based mechanisms (Figure 2B) or not (Figure 2D). 
In particular, if image-based mechanisms have a larger spatial extent, then the predominance of percepts built up of image parts with corresponding figural content should peak at an intermediate IIPD, at which image-based factors still have considerable influence, whereas the eye-based factors that now oppose this percept have little impact. This prediction holds regardless of the relative strengths of the two types of grouping for an IIPD of 0° (i.e., for each of the three curves). Importantly, this presence of a peak at intermediate IIPD is diagnostic for the hypothesized difference in spatial extent because the intermediate peak is predicted to be absent if the spatial extents of image-based grouping and eye-based grouping do not differ (Figure 2D). In the current experiment, we thus aim to investigate whether this intermediate peak is present or not.
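The qualitative logic of these predictions can be illustrated with a toy model. The sketch below is not the model behind Figure 2. It assumes, purely for illustration, that each type of grouping falls off exponentially with IIPD, that the two influences simply subtract when they oppose each other, and that the amplitudes and space constants (all hypothetical values) are as given in the code.

```python
import math

IIPDS = [0, 1, 2, 3, 5, 6]  # degrees, the separations used in the experiment


def strength(amp, extent, iipd):
    """Hypothetical grouping strength: exponential falloff with distance."""
    return amp * math.exp(-iipd / extent)


def opposed_balance(amp_img, extent_img, amp_eye, extent_eye, iipds=IIPDS):
    """Image-based minus eye-based strength (cues in conflict, cf. Figure 1B)."""
    return [strength(amp_img, extent_img, d) - strength(amp_eye, extent_eye, d)
            for d in iipds]


def has_intermediate_peak(curve):
    """True if the maximum lies strictly between the first and last IIPD."""
    peak = curve.index(max(curve))
    return 0 < peak < len(curve) - 1


# Three starting balances at 0 deg: image-based stronger, equal, or weaker.
for amp_img in (1.2, 1.0, 0.8):
    larger = opposed_balance(amp_img, 3.0, 1.0, 1.0)   # image extent > eye extent
    similar = opposed_balance(amp_img, 1.5, 1.0, 1.5)  # equal extents
    print(amp_img, has_intermediate_peak(larger), has_intermediate_peak(similar))
```

With these assumed parameters, an intermediate peak appears only when the image-based extent is larger. Other parameter choices (for example, a much larger image-based amplitude) can move the peak to 0°, so the sketch illustrates the reasoning rather than proving it.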

Visual representations of hypotheses. Figure 2 shows hypothesized outcomes of our experiment, depending on whether image-based grouping has a larger spatial extent than eye-based grouping (A and B) or whether both types of grouping have a similar spatial extent (C and D). Regardless of spatial extent, the relative strengths of eye- and image-based grouping for immediately abutting image parts are unknown, so each panel shows predictions for three scenarios: image-based grouping at an IIPD of 0° is either stronger than (dotted line), equal to (dashed line), or weaker than (solid line) eye-based grouping at 0° IIPD. See the last paragraph of the Introduction for details concerning the different hypotheses.

Figure 2

Methods

Observers

A total of nine observers (five male and four female, mean age: 23.4 years, SD = 3.1) participated in the experiment. All observers had normal or corrected-to-normal vision and experienced binocular rivalry switches as measured during a pretest with dichoptic gratings (see Pretest: Binocular rivalry alternations under Stimuli and procedure). The observers were naive to the purpose of the study and received either a monetary reward or course credits. All observers gave informed consent before taking part in the experiment. This research was conducted in line with the principles expressed in the Declaration of Helsinki.

Apparatus

Stimuli were created on an Apple Mac Pro computer running OS X and Matlab 2013a with the Psychophysics Toolbox extensions (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997). The stimuli were presented on a linearized LaCie III 22-in. CRT at 120 Hz. Observers viewed the stimuli through a mirror stereoscope. The total viewing distance was 57 cm.
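For reference, at a viewing distance of 57 cm, 1 cm on the screen subtends close to 1° of visual angle. The sketch below is not part of the original methods. It converts stimulus sizes in degrees to on-screen centimeters. The function name and the simplified geometry (a flat screen with the stimulus centered on the line of sight) are assumptions made for illustration.

```python
import math

VIEWING_DISTANCE_CM = 57.0  # total viewing distance reported in the text


def degrees_to_cm(size_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen extent (cm) of a centered stimulus subtending size_deg."""
    return 2.0 * distance_cm * math.tan(math.radians(size_deg) / 2.0)


# One degree of visual angle is close to one centimeter at this distance.
print(round(degrees_to_cm(1.0), 3))
```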

Stimuli and procedure

Pretest: Binocular rivalry alternations

Prior to the main experiment, observers were tested for binocular rivalry alternations using a binocular rivalry tracking paradigm. The stimuli consisted of two dichoptically presented, orthogonal, diagonal sine-wave gratings (1.6 c/°, radius: 0.95°, 0.75% Michelson contrast). The gratings were presented within a gray square (1.19° by 1.19°, luminance: 24.83 cd/m²) surrounded by a white outline (line width: 0.12°, luminance: 49.46 cd/m²). Observers performed two 60-s trials such that each orientation was presented once to each eye. Observers indicated the orientation of the dominant grating by continuously pressing either the left or the right arrow key. The observers were instructed to refrain from pressing a key when they experienced an unclear (or piecemeal) percept. The tracking data were used to calculate the percentage of time spent in exclusive dominance for each eye. To be included in the experiment, an observer's percentages of exclusive dominance for each eye needed to fall between 30% and 70%. No observers were excluded from the experiment.
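The inclusion criterion can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that key states were sampled at a constant rate, that presses have already been recoded to the eye that received the reported orientation (`"L"`, `"R"`, or `None` for no key), and that percentages are expressed relative to total exclusive-dominance time. The text does not specify the denominator, so that last choice is an assumption.

```python
def dominance_percentages(samples):
    """Share of exclusive-dominance time for the left and right eye.

    samples: sequence of "L", "R", or None (no key: unclear percept),
    assumed to be taken at a constant sampling rate.
    """
    left = sum(1 for s in samples if s == "L")
    right = sum(1 for s in samples if s == "R")
    exclusive = left + right
    if exclusive == 0:
        raise ValueError("no exclusive dominance reported")
    return 100.0 * left / exclusive, 100.0 * right / exclusive


def passes_pretest(percentages, low=30.0, high=70.0):
    """Inclusion criterion from the text: each eye between 30% and 70%."""
    return all(low <= p <= high for p in percentages)


# Hypothetical tracking record pooled over the two 60-s trials.
samples = ["L"] * 40 + [None] * 10 + ["R"] * 50
print(passes_pretest(dominance_percentages(samples)))
```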

Pretest: Flicker fusion

Because image-based grouping tends to be relatively weak (Stuit et al., 2011, 2014) and color may strengthen image-based grouping during rivalry (Knapen et al., 2007; van Lier & de Weert, 2003; Vergeer & van Lier, 2010), we decided to use color conflict in addition to orientation conflict in our main experiment. To reduce potential biases based on color, observers performed a flicker-fusion task to set two colors to perceived equiluminance. The stimulus was a colored disk (4° by 4°) whose color alternated between cyan (CIExy = 0.3204, 0.5955) and magenta (CIExy = 0.6126, 0.3116) at a rate of 60 Hz. The magenta disk was held constant at 12.72 cd/m². Observers adjusted the luminance of the cyan disk using the up and down arrow keys until the flicker between the two colors appeared minimal. This procedure was repeated five times for each observer, and each observer's average luminance value for cyan was used in the main experiment.

Main experiment

The rival images for the main experiment (Figure 1) each contained two image parts. The image parts consisted of sine-wave gratings (1.6 c/°). To initiate rivalry, the corresponding image parts had different orientations and colors. Horizontal gratings had a cyan color (CIExy = 0.3204, 0.5955; luminance was observer-specific), and vertical gratings had a magenta color (CIExy = 0.6126, 0.3116; space-average luminance: 6.25 cd/m²). The rivaling gratings were presented in circular apertures (0.72° radius) and were displayed on a gray background (luminance: 24.83 cd/m²) with a central fixation point (0.44° diameter). The small aperture radius was chosen to minimize piecemeal rivalry within an aperture (Blake et al., 1992). The image parts were presented to the left of fixation in one half of the trials and to the right of fixation in the other half of the trials. The apertures were displayed within an 8.7° by 8.7° gray square surrounded by a white frame (luminance: 49.46 cd/m², line width 0.2°). To aid fusion, the frames were surrounded by a white noise edge (1° thick, 98% Michelson contrast, space-average luminance: 24.83 cd/m²) that was identical for both eyes.

Of critical importance to our design, matching image parts were either presented to the same eye (Figure 1A) or to different eyes (Figure 1B). The former of these conditions allowed us to assess the balance between percepts boosted by both image-based grouping as well as eye-based grouping (simultaneous dominance of matching image parts, each presented to the same eye) and percepts supported by neither (simultaneous dominance of nonmatching image parts, presented to different eyes). The latter condition allowed us to assess the balance between percepts that benefited from image-based grouping (simultaneous dominance of matching image parts, albeit presented to different eyes) and percepts that benefited from eye-based grouping (simultaneous dominance of image parts, presented to the same eye but not matching in image content).

The IIPDs used in the experiment were 0°, 1°, 2°, 3°, 5°, or 6°. All combinations of conditions (IIPD; within eye vs. between eyes matching image parts; left vs. right hemifield) occurred equally often and were presented in a randomized order per block of 48 1-min trials. Each participant performed two blocks of trials, each on a different day, resulting in a total of 96 trials. The experiment was self-paced, and observers initiated trials by pressing the space bar. Observers were allowed to take short breaks between trials. Each block lasted between 60 and 75 min. Observers tracked the dominance of both the upper and lower image part simultaneously in a two-alternative, forced-choice rivalry tracking paradigm. Specifically, observers pressed the up arrow key when the upper image part was perceived as completely or mostly magenta and the down arrow key when the lower image part was perceived as completely or mostly magenta. The absence of a key press was interpreted as a completely or mostly dominant cyan image part. These response possibilities were chosen to keep the task relatively easy.
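The design and response scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' experiment code. The condition labels are hypothetical, each combination appears twice per block (which is what equal frequency across 48 trials implies for 6 × 2 × 2 = 24 combinations), and any further counterbalancing, such as of eye assignment, is not modelled. The percept classification follows our reading of the task: a pressed key means magenta, a released key means cyan, and matching image parts share the same color.

```python
import itertools
import random

IIPDS = [0, 1, 2, 3, 5, 6]                 # degrees
MATCHING = ["same_eye", "different_eyes"]  # cf. Figure 1A vs. Figure 1B
HEMIFIELDS = ["left", "right"]


def make_block(repeats=2, rng=None):
    """One randomized block: every combination of conditions, `repeats` times."""
    rng = rng or random.Random()
    combos = list(itertools.product(IIPDS, MATCHING, HEMIFIELDS))
    trials = combos * repeats
    rng.shuffle(trials)
    return trials


def classify_percept(up_pressed, down_pressed):
    """Map the key state to a percept category.

    Both keys down (both parts magenta) or both up (both parts cyan) means
    the two image parts look alike; exactly one key down means they differ.
    """
    return "matching" if up_pressed == down_pressed else "mismatching"


block = make_block(rng=random.Random(1))
print(len(block), classify_percept(True, False))
```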

Results

Before analyzing the effect of IIPD on binocular rivalry grouping, we tested if epoch durations were affected by factors beyond our main interest, namely color and hemifield. Note that the hemifield of presentation may affect the temporal dynamics of binocular rivalry (Chen & He, 2003; Piazza & Silver, 2014). To test for such influences, we compared the median epoch durations, pooled over all IIPDs, using a color by hemifield repeated-measures ANOVA. Results show no evidence for a main effect of color, F(1, 8) = 3.851, p = 0.085, ηp² = 0.325, or hemifield, F(1, 8) = 0.036, p = 0.854, ηp² = 0.004, nor an interaction between the two, F(1, 8) = 1.848, p = 0.211, ηp² = 0.188. Because basic temporal dynamics showed no evidence of being affected by color or hemifield of presentation, we pooled the tracking results for different colors and hemifields. Next, for each observer, we extracted the median epoch durations for perceiving matching and mismatching image parts for both conditions (i.e., the one in which matching image parts were presented to the same eye and the one in which they were not). Our main interest, then, is comparing the median duration of periods in which matching image parts are perceived and the median duration of periods in which mismatching image parts are perceived. Note that, in the first condition (Figure 1A), perception of matching image parts is facilitated by image-based grouping as well as eye-based grouping whereas in the second condition (Figure 1B) image-based grouping supports perception of matching image parts and eye-based grouping supports perception of mismatching image parts.
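The extraction of median epoch durations can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study. It assumes that the two key states were sampled at a constant interval `dt`, that an epoch is an uninterrupted run of one key-state combination, and that truncated epochs at the start and end of a trial are kept. None of these details are specified in the text.

```python
from itertools import groupby
from statistics import median


def median_epoch_durations(key_states, dt):
    """Median epoch duration (s) for matching and mismatching percepts.

    key_states: per-sample (up_pressed, down_pressed) tuples.
    dt: assumed constant sampling interval in seconds.
    """
    durations = {"matching": [], "mismatching": []}
    for (up, down), run in groupby(key_states):
        # Equal key states mean both image parts have the same color.
        label = "matching" if up == down else "mismatching"
        durations[label].append(sum(1 for _ in run) * dt)
    return {label: median(d) for label, d in durations.items() if d}


# Hypothetical record: four epochs sampled every 0.5 s.
record = ([(True, True)] * 3 + [(True, False)] * 2
          + [(False, False)] * 5 + [(False, True)] * 4)
print(median_epoch_durations(record, dt=0.5))
```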

For the sake of completeness, Figure 3 shows the median durations of the matching and mismatching percepts separately, averaged across observers. Subsequently, in Figure 4A and 4B, we evaluate the hypotheses sketched in Figure 2 by examining the difference between those two median durations, which is a measure of the relative predominance of the matching percept.

Average median grouping durations as a function of IIPD. (A) The average median grouping durations (y-axis) over IIPD (x-axis) for perceiving matched (solid curve, combined effect of image- and eye-based grouping) and mismatched (dashed curve, neither eye- nor image-based grouping) image parts when matching image parts are presented to the same eye. Note that the separation between the curves decreases with IIPD, suggesting a decrease in the effectiveness of grouping over larger separations between image parts. (B) The average median grouping durations of matched (solid curve, image-based grouping) and mismatched (dashed curve, eye-based grouping) image parts (y-axis) over IIPD (x-axis) when matching image parts are presented to different eyes. Error bars for both panels indicate standard errors of the mean.

Figure 3

Average duration differences as a function of IIPD. To differentiate between the predictions made in the Introduction, we converted the median epoch durations to difference scores. The black curves show the averaged results, when matching image parts were presented to the same eye (A) and when they were presented to different eyes (B). For illustrative purposes, the best corresponding hypothesized curves (orange) based on the outcomes of the statistical analyses from Figure 2C and 2D were scaled and added to the figure.

Figure 4

Figure 3A shows the average median grouping durations when matching image parts are presented to the same eye. In this condition, eye- and image-based grouping facilitate the same dominant percepts. The data are shown separately for percepts of matching (solid curve) and mismatching image parts (dashed curve). The median grouping durations for perceiving matching and mismatching image parts in our second condition, in which matching image parts were presented to different eyes, are shown in Figure 3B (solid curve and dashed curve, respectively). Note that in this condition eye- and image-based grouping facilitate opposite percepts.

Figure 3A clearly suggests a decline over IIPD of the joint impact of both eye-based and image-based grouping cues working together (see below for a rigorous test of this suggestion). In particular, percepts consisting of matching image parts, which are facilitated by both types of grouping, appear to last particularly long for closely spaced image parts, but this preference for perceiving matching image parts declines with IIPD. Conversely, the alternative percept of nonmatching image parts, supported by neither type of grouping in this condition, appears to become more prominent as IIPD increases. This is consistent with the prediction that grouping cues lose their influence at large spatial separations because such a declining influence would benefit this percept as it is counteracted by both grouping cues. Figure 3B, on the other hand, shows no indication that the balance between eye-based and image-based factors shifts with IIPD when the two factors are pitted against each other as was predicted in Figure 2B. To evaluate these impressions more formally and to facilitate comparison to Figure 2, Figure 4A and 4B show, for both panels of Figure 3, the difference between the two curves shown in the panel. In other words, for both presentation conditions, we calculated the balance between both types of percepts by subtracting the median durations for perceiving mismatching image parts from the median durations for perceiving matching image parts. In the presentation condition in which matching image parts were presented to the same eye, this resulted in a measure of the combined strength of image-based grouping and eye-based grouping (Figure 4A; cf. Figure 2A and 2C); in the presentation condition in which matching image parts were shown to different eyes, this resulted in a measure of the balance between image-based grouping and eye-based grouping with positive values corresponding to stronger image-based grouping (Figure 4B; cf. Figure 2B and 2D).
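The difference scores described above amount to a per-observer subtraction followed by an average across observers. The sketch below illustrates that arithmetic. The numbers are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
from statistics import mean


def difference_scores(matching, mismatching):
    """Per-IIPD difference: median matching minus median mismatching duration."""
    return [m - mm for m, mm in zip(matching, mismatching)]


def average_over_observers(per_observer_scores):
    """Average the difference scores across observers, separately per IIPD."""
    return [mean(column) for column in zip(*per_observer_scores)]


# Invented median durations (s) for two observers at three IIPDs.
observer_1 = difference_scores([4.0, 3.0, 2.5], [2.0, 2.0, 2.5])
observer_2 = difference_scores([5.0, 3.5, 2.0], [2.0, 2.5, 2.0])
print(average_over_observers([observer_1, observer_2]))
```

Positive values indicate that the matching percept predominates. In the condition with matching image parts in different eyes, they therefore indicate stronger image-based than eye-based grouping.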

In our statistical analysis of the data of Figure 4A and 4B, we first tested the basic assumption that the combined benefit of eye- and image-based grouping cues, as compared to no grouping cues at all, decreases with increasing IIPD. Note that this should be the case irrespective of the relative spatial extents of eye- and image-based grouping (cf. Figure 2). Indeed, there was a significant negative correlation between the median duration difference depicted in Figure 4A and IIPD (r = −0.9211, p = 0.0091). This confirms that, generally, grouping cues are more effective across shorter spatial separations. We next tested our critical prediction regarding the condition in which matching image parts were presented to different eyes. There, the influence of IIPD on the balance between image-based grouping and eye-based grouping is indicative of the relative spatial extents of the two kinds of grouping. In particular, as noted in the Introduction, if image-based grouping operates over a larger spatial extent than eye-based grouping, then this should result in a peak at an intermediate IIPD in Figure 4B. This would mean that observers are particularly inclined to report the percept that is facilitated by image-based grouping at an intermediate IIPD, relative to 0° IIPD at which both image- and eye-based grouping should be at their strongest (Blake et al., 1992) even though it is counteracted by eye-based grouping.
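As a check on the reported correlation, the p value implied by a Pearson r can be computed from the t distribution. The sketch below assumes that the correlation was computed across the six IIPDs (n = 6, hence 4 degrees of freedom). The text does not state n, so this is an assumption. For 4 degrees of freedom the two-tailed p value has the closed form 1 − 1.5|r| + 0.5|r|³, which avoids the need for a statistics library. The function names are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r based on n points."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))


def two_tailed_p_df4(r):
    """Two-tailed p value for a Pearson r with exactly 4 degrees of freedom.

    Closed form of the Student t tail probability for df = 4 (n = 6 points),
    written in terms of r; it is not valid for other sample sizes.
    """
    a = abs(r)
    return 1.0 - 1.5 * a + 0.5 * a ** 3


r = -0.9211  # correlation reported in the text
print(round(t_from_r(r, 6), 2), round(two_tailed_p_df4(r), 4))
```

Under the n = 6 assumption, the implied two-tailed p value is about 0.0091.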

Discussion

We investigated whether image-based grouping during binocular rivalry shows parametric properties that suggest a neural substrate later in the processing hierarchy compared to eye-based grouping. Specifically, we tested if image-based grouping during binocular rivalry is possible at greater IIPDs than eye-based grouping. If binocular rivalry competition, including its effects on grouping, occurs at multiple levels of the visual processing hierarchy (Blake, 1989; Logothetis et al., 1996; Silver & Logothetis, 2007; Tong & Engel, 2001; Tong et al., 2006; Wilson, 2003) yet monocular information is mostly lost after early visual processing (Barendregt et al., 2015; Hubel & Wiesel, 1962, 1974), then image-based grouping is expected to be relatively stronger at larger IIPDs due to the increase of receptive field size throughout the visual hierarchy (Amano et al., 2009; Dumoulin & Wandell, 2008; Harvey & Dumoulin, 2011; Smith et al., 2001). In contrast, our results provide no evidence that these contributors to binocular rivalry grouping differ in their relationship with IIPD and generally support the null hypothesis that such a difference is absent.

Out of our Bayesian t tests, the only one that did not provide substantial evidence for the null hypothesis was the one that compared an IIPD of 0° to the largest IIPD measured: 6°. In principle, this leaves open the possibility that image-based grouping specifically outweighs eye-based grouping at particularly large IIPDs of 6° and, perhaps, beyond. Although we cannot definitively rule out this possibility, it should be noted that the results from our condition in which image-based grouping and eye-based grouping work together (Figure 4A) indicate that both types of grouping have only a very weak influence at such large spatial separations to begin with (i.e., even when they work together, they hardly bring about a preference for the percept they both support). In other words, the inconclusive result for an IIPD of 6° hinges on a comparison between two grouping effects that, at that IIPD, are both close to absent themselves, suggesting that we should not base strong conclusions on this inconclusive result. Having said that, a conclusive answer with regard to this point may be provided by future work that extends the range of IIPDs to larger values than the ones considered here.

The reason our stimuli included colors that could match or mismatch, in addition to orientations that could match or mismatch, is that we required a substantial influence of image-based grouping as a precondition to compare the influences of IIPD between eye- and image-based grouping effects. In existing work, eye-based grouping effects have generally been strong (Stuit et al., 2011, 2014), but image-based grouping can be weak when based on orientation alone (Stuit et al., 2011). We therefore added color because this is known to further promote image-based grouping (van Lier & de Weert, 2003; Vergeer & van Lier, 2010), similar to the effect of adding other image cues, such as motion coherence (Holten, Stuit, Verstraten, & van der Smagt, 2016). To keep our behavioral task feasible, observers reported only one of the feature dimensions, namely color, while leaving orientation unreported. Given that color and form can sometimes alternate independently during binocular rivalry between stimuli that differ in both these feature dimensions (Breese, 1909; Creed, 1935), one might be concerned that our results are mainly informative regarding rivalry between different colors and that perceived orientations were not what we inferred them to be on the basis of the color reports. However, such perceptual dissociations of color and form during rivalry are rare (Hollins & Leung, 1978) and appear to require specific paradigms to become apparent (Holmes, Hancock, & Andrews, 2006). To our knowledge, they have never been reported for standard rivalry designs such as ours, making us confident that the color reports employed in our paradigm are also a reliable proxy for perceived orientation.

Although our main finding is concerned with the influence of IIPD on perceptual grouping, the contributions of eye- and image-based grouping, irrespective of IIPD, were roughly equal in our experiment. In particular, our data are most similar to the prediction shown by the dashed curve of Figure 2D, by which the balance between image-based factors and eye-based factors is not only constant with IIPD, but also close to perfect. In previous work, eye-based grouping was much stronger compared to image-based grouping (Holten et al., 2016; Stuit et al., 2011, 2014). The difference can plausibly be explained by the fact that our stimuli included color to facilitate image-based grouping as discussed in the previous paragraph. In hindsight, it is not unreasonable that this addition of color would bring image-based grouping on par with eye-based grouping in our study. In particular, previous findings suggest that the effect sizes of color-based intraocular grouping and cardinal orientation-based intraocular grouping are similar (see figure 3 in Vergeer & van Lier, 2010). If we combine this fact with the observation that the relative strength of cardinal orientation-based grouping is roughly half the strength of eye-based grouping (see figure 6 in Stuit et al., 2011), then it is not unreasonable that orientation-based grouping and color-based grouping combined may add up to roughly the same strength as eye-based grouping in our study.
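
The arithmetic behind this argument can be written out as a rough sketch. The symbols G_eye, G_ori, and G_col are hypothetical labels introduced here for the strengths of eye-based, orientation-based, and color-based grouping; they do not appear in the cited studies. The sketch also assumes that the two image-based cues add linearly, which the argument above suggests but does not establish.

```latex
% Hypothetical symbols: G_eye, G_ori, G_col denote grouping strengths.
% Assumption: the strengths of the image-based cues combine additively.
\begin{align}
  G_{\mathrm{ori}} &\approx \tfrac{1}{2}\, G_{\mathrm{eye}}
    && \text{(Stuit et al., 2011, figure 6)} \\
  G_{\mathrm{col}} &\approx G_{\mathrm{ori}}
    && \text{(Vergeer \& van Lier, 2010, figure 3)} \\
  G_{\mathrm{image}} &= G_{\mathrm{ori}} + G_{\mathrm{col}}
    \approx \tfrac{1}{2}\, G_{\mathrm{eye}} + \tfrac{1}{2}\, G_{\mathrm{eye}}
    = G_{\mathrm{eye}}
\end{align}
```

Under these assumptions, the combined image-based strength roughly equals the eye-based strength, in line with the near-balanced contributions observed here.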

The current results fit reasonably well with the interactions seen in the hybrid rivalry model proposed by Tong et al. (2006). They stated that the neural bases of rivalry involve a hierarchical network of excitatory and inhibitory mechanisms that extends across both monocular and binocular processing stages. Note that the model is based on binocular rivalry competition for two isolated dichoptic images. Still, the same model can be applied to the dynamics of binocular rivalry grouping. The only aspect of the model that may not fit well with binocular rivalry grouping results is the proposed feedback from higher levels of processing (also see de Weert, Snoeren, & Koning, 2005). For one, if image-based grouping as observed in our experiment were a result of feedback from higher levels of processing, it would appear that the same predictions would hold as illustrated in Figure 2B, in which image-based grouping outweighs eye-based grouping at intermediate IIPDs. This is expected because receptive field sizes become larger for later visual areas (Amano et al., 2009; Dumoulin & Wandell, 2008; Harvey & Dumoulin, 2011; Smith et al., 2001) and larger receptive fields allow for more spatial integration during rivalry (Alais & Blake, 1999; Blake et al., 1992). Another relevant observation is that previous results show no evidence that grouping during rivalry is better for stimuli depicting face parts than for gratings (Stuit et al., 2014). In the case of such grouping between face parts, feedback from face-processing areas, which are thought to respond to faces holistically (Kanwisher, Tong, & Nakayama, 1998), may be expected to result in stronger grouping dominance for face parts compared to gratings. Work on grouping during rivalry, then, does not seem to provide evidence for a critical influence of feedback. This does not, of course, mean that feedback is altogether unimportant in binocular rivalry. For example, the influence of attention on binocular rivalry dynamics is well established (Brascamp & Blake, 2012; Lack, 1978; Ooi & He, 1999; Paffen & van der Stigchel, 2010). Regarding the current results, we argue that an influence of higher processing levels should be evidenced in a broader spatial profile regardless of whether these higher levels exert their influence via feedback.

It remains difficult to conclusively determine whether binocular rivalry grouping, like binocular rivalry competition, involves multiple levels of the visual processing hierarchy. Part of the difficulty may lie in the lack of specificity that this statement entails. In particular, a blanket statement that binocular rivalry involves multiple levels is consistent with many possible observations because it can imply several different implementations. For instance, distinct proposals that incorporate the idea of multiple levels being involved differ regarding whether different levels would be engaged by different stimuli (Alais & Parker, 2006; Wilson, 2003) or whether a given stimulus would engage several levels at the same time (Tong et al., 2006). Nevertheless, our present results add to a body of studies (Dong, Holm, & Bao, 2017; Holten et al., 2016; Stuit et al., 2011, 2014) that have separately assessed grouping based on eye-based cues and grouping based on cues that go beyond eye of origin (in these cases, image-based cues) and that all point to monocular information as critical (also see Quinn & Arnold, 2010). Perhaps, then, the multiple levels at which binocular rivalry grouping and, potentially, binocular rivalry competition more generally, can occur, are restricted to areas at which monocular information is available.

Schematic representation of the stimulus arrangement in our two conditions. To separate the influence of the eye of origin and image content on the perceptual grouping of image parts during binocular rivalry, matching image parts were either (a) presented to the same eye (A) or (b) presented to different eyes (B). The image parts were always presented at 3° from fixation although IIPD varied. IIPD refers to the distance between image parts from edge to edge. The IIPD was either 0°, 1°, 2°, 3°, 5°, or 6°. Furthermore, the image parts could be presented to either the right or the left of fixation.

Figure 1

Visual representations of hypotheses. Figure 2 shows hypothesized outcomes of our experiment, depending on the question of whether image-based grouping has a larger spatial extent than eye-based grouping (A and B) or whether both types of grouping have a similar spatial extent (C and D). Regardless of spatial extent, the relative strengths of eye- and image-based grouping for immediately abutting image parts are unknown, so each panel shows predictions for three scenarios: image-based grouping at an IIPD of 0° is either stronger than (dotted line), equal to (dashed line), or weaker than (solid line) eye-based grouping at 0° IIPD. See the last paragraph of the Introduction for details concerning the different hypotheses.

Figure 2

Average median grouping durations as a function of IIPD. (A) The average median grouping durations (y-axis) over IIPD (x-axis) for perceiving matched (solid curve, combined effect of image- and eye-based grouping) and mismatched (dashed curve, neither eye- nor image-based grouping) image parts when matching image parts are presented to the same eye. Note that the separation between the curves decreases with IIPD, suggesting a decrease in the effectiveness of grouping over larger separations between image parts. (B) The average median grouping durations of matched (solid curve, image-based grouping) and mismatched (dashed curve, eye-based grouping) image parts (y-axis) over IIPD (x-axis) when matching image parts are presented to different eyes. Error bars for both panels indicate standard errors of the mean.

Figure 3

Average duration differences as a function of IIPD. To differentiate between the predictions made in the Introduction, we converted the median epoch durations to difference scores. The black curves show the averaged results when matching image parts were presented to the same eye (A) and when they were presented to different eyes (B). For illustrative purposes, the hypothesized curves from Figure 2C and 2D that best correspond to the outcomes of the statistical analyses (orange) were scaled and added to the figure.

Figure 4
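
The conversion from median epoch durations to difference scores described in the Figure 4 caption could be sketched as follows. This is a minimal illustration, not the analysis code used in the study. It assumes that a difference score is the median duration for matched image parts minus the median duration for mismatched image parts at each IIPD, computed per observer and then averaged. The function names and the example durations are hypothetical.

```python
from statistics import mean, median


def difference_scores(matched, mismatched):
    """Return one observer's difference score per IIPD.

    matched, mismatched: dicts mapping IIPD (deg) to a list of epoch
    durations (s). Assumption: score = median(matched) - median(mismatched).
    """
    return {
        iipd: median(matched[iipd]) - median(mismatched[iipd])
        for iipd in matched
    }


def average_across_observers(per_observer_scores):
    """Average the per-observer difference scores at each IIPD."""
    iipds = per_observer_scores[0].keys()
    return {
        iipd: mean(scores[iipd] for scores in per_observer_scores)
        for iipd in iipds
    }


if __name__ == "__main__":
    # Hypothetical durations for two observers at two IIPDs; these numbers
    # are made up for illustration and are not data from the study.
    observer_a = difference_scores(
        matched={0: [1.8, 2.4, 2.1], 6: [1.2, 1.0, 1.1]},
        mismatched={0: [1.0, 1.3, 0.9], 6: [1.1, 1.0, 1.2]},
    )
    observer_b = difference_scores(
        matched={0: [2.0, 2.6, 2.2], 6: [1.0, 1.3, 1.1]},
        mismatched={0: [1.1, 1.2, 1.4], 6: [1.2, 1.0, 1.1]},
    )
    print(average_across_observers([observer_a, observer_b]))
```

A positive score at a given IIPD would indicate longer dominance of the matched percept than of the mismatched percept. In the same-eye condition this reflects the combined effect of eye- and image-based grouping, and in the different-eye condition it reflects the balance between the two.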
