Abstract

We study the dynamics of perceptual switching in ambiguous visual scenes that admit more than two interpretations/percepts to gain insight into the dynamics of perceptual multistability and its underlying neural mechanisms. We focus on visual plaids that are tristable and we present both experimental and computational results. We develop a firing-rate model based on mutual inhibition and adaptation that involves stochastic dynamics of multiple-attractor systems. The model can account for the dynamic properties (transition probabilities, distributions of percept durations, etc.) observed in the experiments. Noise and adaptation have both been shown to play roles in the dynamics of bistable perception. Here, tristable perception allows us to specify the roles of noise and adaptation in our model. Noise is critical in considering the time of a switch. On the other hand, adaptation mechanisms are critical in considering perceptual choice (in tristable perception, each time a percept ends, there is a possible choice between two new percepts).

Introduction

When observers view an ambiguous visual scene (one admitting two or more different interpretations) for an extended time, they report spontaneous switching between the different percepts. Typical examples of perceptual bistability (two interpretations) and spontaneous switching in visual perception include binocular rivalry (alternation of two different images, one presented to each eye), ambiguous geometric figures such as the Necker cube (alternation of two depth organizations), ambiguous figure–ground segregation such as Rubin's vase-face (alternation of two figure–ground organizations), ambiguous motion displays (alternation of two arrangements of moving objects), and more (for reviews, see Leopold & Logothetis, 1999; Blake & Logothetis, 2002; Long & Toppino, 2004).

Experimental results (imaging in humans and electrophysiology in monkeys) have revealed neuronal activity that correlates with the subject's perception in the visual cortex as well as the parietal and frontal cortex (Leopold & Logothetis, 1999; Tong & Engel, 2001; Sterzer & Kleinschmidt, 2007). Although the neuronal bases of perception for ambiguous stimuli are still controversial (see Sterzer, Kleinschmidt, & Rees, 2009, for a review), those observations inspired the current formal models of multistability. The models try to explain as much as possible of the variability of perceptual dynamics with simple mechanisms that could be implemented in relatively low-level sensory systems (like the visual cortex for visual multistability).

The existing models for perceptual multistability focus on bistable rivalry (Lago-Fernández & Deco, 2002; Laing & Chow, 2002; Wilson, 2003; Moreno-Bote, Rinzel, & Rubin, 2007). In these models, the mechanism underlying the alternating rhythmic behavior involves competition between two neuronal populations (the activity of each being correlated with a particular percept) via reciprocal inhibition and some form of slow adaptation or negative feedback acting on the dominant population (spike-frequency adaptation and/or synaptic depression). Noise is added to the system to account for the irregularity of the oscillations and, in some models, it is the essential driving force of the switching mechanism (Moreno-Bote et al., 2007; Shpiro, Moreno-Bote, Rubin, & Rinzel, 2009).

The roles of slow adaptation and neuronal noise in bistable rivalry have been extensively studied. It is widely accepted that both elements are involved in rivalry; the discussion focuses on the balance between the two (Brascamp, van Ee, Noest, Jacobs, & van den Berg, 2006; Moreno-Bote et al., 2007; Shpiro et al., 2009). In oscillatory models, slow adaptation is ultimately responsible for alternations, while in noise-driven attractor models, noise drives switching in a winner-take-all framework. To assess the roles of noise and adaptation, the models try to conform to experimental data on dominance durations (averages, histogram shapes, correlations between successive durations, etc.) and some well-known principles of binocular rivalry known as Levelt's propositions, especially Propositions 2 and 4 (Levelt, 1968). For those models, the system should operate near the boundary between being adaptation driven and noise driven (Shpiro et al., 2009; Pastukhov et al., 2013). Here we study tristability to further constrain these models, and we find that adaptation and noise not only are both important but also play different roles.

Though the dynamics of bistable rivalry have been extensively studied, attempts to generalize these models to stimuli with more than two competing percepts are scarce in the literature. However, unlike in bistable rivalry, where only temporal patterns are informative, in multistable rivalry with more than two percepts, differential transition patterns provide more insight into the plausible mechanisms that generate perceptual multistability (Burton, 2002; Suzuki & Grabowecky, 2002; Naber, Gruenhage, & Einhauser, 2010; Wallis & Ringelhan, 2013). Moreover, given that sensory information can have multiple interpretations, moving from bistability to multistability is a necessary step in achieving understanding about how the brain deals with ambiguity.

To study multistable rivalry, we focus on a classical paradigmatic stimulus, called visual plaids, consisting of two superimposed drifting gratings (Wallach, 1935; Hupé & Rubin, 2003; for a demonstration, visit http://cerco.ups-tlse.fr/~hupe/plaid_demo/demo_plaids.html). With visual plaids, tristable perception is experienced (see Figure 1): one coherent or integrated percept (the gratings moving together as a single pattern) and two transparent or segregated percepts (the gratings sliding across one another) with alternating depth order (which grating is perceived as foreground and which as background; Rubin & Hupé, 2005; Moreno-Bote, Shpiro, Rinzel, & Rubin, 2008; Naber et al., 2010; Hupé & Pressnitzer, 2012).

Figure 1. (A) Visual plaids consist of two superimposed gratings whose normal vectors differ by an angle α (VP). Representation of different interpretations for visual plaids: coherent motion (C) and transparent motion (T). Transparent motion is ambiguous with respect to depth ordering and admits two different interpretations: with the grating moving to the left perceived on top (TL) and with the grating moving to the right perceived on top (TR). (B) Tristability refers to coherent (C), transparent left (TL), and transparent right (TR) percepts, which we identify with the colors red, blue, and green, respectively.

Here we present experimental data from psychophysics on tristable plaids and a computational model that specifies quantitatively current hypotheses of perceptual switching to reproduce experimental observations.

For bistable stimuli, the effects of varying parameters of the stimulus are typically described by the changes on the dominance durations (the period of time a percept stays active). But for tristable stimuli, one can also look at the effect on percept probabilities (fraction of percept occurrences) and their relation to dominance durations (Naber et al., 2010). In the case of visual plaids, there are several parameters of the stimulus that can be modified: speed, spatial frequency, contrast, directions of motion, and so on (Adelson & Movshon, 1982; Hupé & Rubin, 2003; Moreno-Bote, Shpiro, Rinzel, & Rubin, 2008; Hedges, Stocker, & Simoncelli, 2011). In this study, we investigated a highly constrained set of parameter conditions—only three stimuli (corresponding to three different values of the angle α between the normal vectors to the gratings) and a single motion direction (see Figure 1)—but repeated many times, in order to gather for each subject enough perceptual sequences collected within the exact same conditions. In agreement with previous experiments (Naber et al., 2010), we observe that changes in α produce changes in both dominance durations and percept probabilities.

By examining triplets consisting of two transparent percepts interleaved with a coherent one, we find that the next percept probability depends on the durations of the current and the previous percept. These relationships are newfound with respect to bistable stimuli, where correlations could be measured only between dominance durations. These correlations were reported as absent or insignificant for bistability (although see, for example, van Ee, 2009; Pastukhov & Braun, 2011), as we also find for our tristable stimuli. Thus, our results showing that percept choice but not percept duration depends on recent perceptual history suggest that adaptation and noise are involved in different aspects of perceptual switching.

We test the roles of adaptation and noise in a computational model that is based on the firing-rate models for alternations in perceptual bistability (Laing & Chow, 2002; Moreno-Bote et al., 2007). Our model consists of three mutually coupled populations of cells, each one encoding a different percept. We propose inhibition-based competition along with adaptation and noise as plausible mechanisms for the dynamics of perceptual switching. Importantly, optimal parameters are obtained for a noise-driven regime, suggesting that noise is the ultimate cause of perceptual switching. However, slow adaptation, in particular subtractive adaptation, is essential in accounting for the decrease in the probability of a switch back after a short duration, suggesting that adaptation is important for perceptual choice.

Methods

Psychophysical experiment

Observers

Nine observers participated in the experiment. They had normal or corrected-to-normal eyesight and gave informed consent for their participation. Data are presented for eight subjects (see “Data analysis” later; average age = 26, range = [20, 46]; four women and four men).

Apparatus

We presented stimuli on a 21-in. Sony Trinitron GDM-F520 monitor (30.4 cm vertical viewable screen size) at a frame rate of 85 Hz. The screen resolution was 1600×1200 pixels. Subjects were comfortably seated 57 cm in front of the screen in a dimly lit room, with their chin and forehead resting on a chinrest (University of Houston College of Optometry, Houston, TX). Two small cameras were attached to the chinrest just above the eyes and looked at the eyes through semitransparent mirrors. Eye position (difference between the pupil and corneal reflection centers) and pupil diameter were recorded binocularly at 240 Hz by using an ISCAN ETL-200 system (Burlington, MA). The experimenter (Marie Fellmann, as training for her first-year master's thesis) was present in the room to verify the quality of the eye signals. Off-line visual inspection of eye positions revealed that all subjects were maintaining accurate fixation.

Stimuli

The stimuli comprised two rectangular-wave gratings presented through a circular aperture 6° in radius. The luminance of the gray surround was 24 cd/m² (20% of the maximal luminance of the screen). The gratings comprised thin dark stripes (14 cd/m², duty cycle = 0.3, spatial frequency = 0.3 c/°) on a lighter background (28 cd/m²) and appeared as figures moving over the background. The intersecting regions were darker than the gratings (11 cd/m², in the middle of the transparency range). Which grating was in front was ambiguous. Gratings moved at 1.5°/s (measured in the direction normal to their orientation) in directions 80°, 100°, and 120° apart (angle α hereafter). The pattern was moving upwards when perceived as coherent. A red fixation point over a circular gray mask with a radius of 1° was added in the middle of the circular aperture to minimize optokinetic nystagmus, and subjects were instructed to fixate this point throughout the stimulus presentation.

Experimental procedure

Subjects were first familiarized with the stimuli and procedure. They had to continuously report their percept with a three-button mouse, indicating whether they perceived coherent upward motion (middle mouse button) or transparent motion with the rightward (right button) or the leftward (left button) grating moving in front. They were instructed to passively report the percepts, without trying to influence them. If they were unsure about the percept, they were asked to press no button. They did not use this option at all (except one subject, who pressed no button for less than 5% of the time on average). There were 10 repetitions of each stimulus (three possible angle values). Presentation time was 180 s. Each subject viewed a total of 30 stimuli, distributed in three sessions of 10 stimuli (with counterbalanced angle values, same order for all subjects) performed on different days. Each stimulus was separated by a 30-s series of 15 plaids moving for 2 s in different directions (to counteract adaptation effects): Plaid directions were either downwards or oblique, angle α was 100° or 160°, and grating speed was either 1°/s or 3°/s (other parameters were the same as for the main stimuli); so subjects experienced both coherent and transparent motion in varied directions. The order of presentation was random.

We chose the parameters in attempting to collect data within critical ranges where percept proportions (fraction of time a percept was reported) are similar, considering either bistability between coherence and transparency, or tristability. These equidominance conditions were approximately obtained across subjects (Table 1) for α = 100° (about 50% coherence) and α = 120° (where the three percepts were reported about a third of the time each).

Table 1. Average percentage of the time each percept was reported. The percentages were computed in each 3-min trial (N = number of trials), starting from the first report of a transparent percept as in the study by Hupé and Rubin (2003). Numbers represent the average (and the range of values obtained across subjects) for the nine subjects (no trial or subject was excluded; “missing” trials were the few trials interrupted by the participants). Average of “no response” was −0.5% (range: −2.3% to 4.2%), negative sign corresponding mostly to button-press overlap.

| Condition | C (%) | TL (%) | TR (%) |
|---|---|---|---|
| Average (nine subjects, N = 265) | 50 (42–59) | 26 (20–31) | 24 (18–30) |
| α = 80° (N = 87) | 68 (60–83) | 17 (6–23) | 16 (10–21) |
| α = 100° (N = 88) | 53 (43–61) | 24 (18–30) | 23 (16–30) |
| α = 120° (N = 90) | 30 (13–44) | 37 (29–45) | 34 (25–39) |

No other parameter was manipulated, in order to collect as much empirical data as possible for a single condition. Such a constraint is paramount in obtaining reliable estimates to which the model can be fit. Multistable perception is highly stochastic even though global statistics are constant with everything else being equal, requiring the collection of many data points for each subject. Any parametric change, even one as subtle as the motion direction of the stimulus, does change the balance between the different percepts (Hupé & Rubin, 2004), which would translate in the model to a change of inputs.

Input level, strength of inhibition and adaptation, and noise level are all arbitrary parameter values in the model that are meaningful only relative to each other. In order to measure within the model the relative roles of adaptation and noise, it is necessary to keep the input constant and fit the model to empirical data obtained with that constant input. Otherwise, we would have one degree of freedom too many. Although such a strong constraint could limit the generalization of our model, we emphasize here that a primary goal is to identify how each element of the model accounts for the statistical features of the experimental data. We will address this question in the Discussion.

Data analysis

The dominance durations were measured between successive presses and releases of the mouse buttons. We also computed the durations between successive presses of different mouse buttons (unless no button was pressed during more than 500 ms). Both methods gave very similar results. The latter method had the advantage of avoiding overlap between percepts (perceptual transitions were so fast most of the time that subjects often pressed a button a few tens of milliseconds before releasing the other button). This procedure considered successive presses of the same button as indicating a single percept, as long as the interruption was less than 500 ms (only one subject had longer interruptions, in 18 cases and for a maximum 2.3 s, average = 940 ms). The duration of the last interrupted percept was not computed. The first percept was coherent in all but four trials and lasted longer than successive coherent percepts, as expected (Hupé & Rubin, 2003). It was not included in the analyses. Percept durations were stable over time, as observed in previous studies (Rubin & Hupé, 2005). For each trial, the proportion of coherent percept was computed from the first report of a transparent percept to the end of the trial, as in work by Hupé and Rubin (2003). A trial was considered as truly multistable if this proportion was between 20% and 80% of the time. Out of 235 trials (eight subjects, see later), 208 met this arbitrary, conservative criterion and were included in the analyses (respectively 63, 77, and 68 trials for α = 80°, 100°, and 120°).
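The second measurement method described above (durations between successive presses of different buttons, with brief re-presses of the same button merged) can be sketched in Python as follows. This is a minimal illustration rather than the analysis code used in the study: the function name, the `(press_ms, release_ms, label)` event format, and the handling of long silent gaps (the percept is ended at button release) are assumptions made for the example.

```python
def dominance_durations(presses, max_gap_ms=500):
    """Estimate dominance durations from button events.

    presses: list of (press_ms, release_ms, label) tuples sorted by press time
             (hypothetical event format).
    Returns a list of (label, duration_ms); the last, interrupted percept
    is not included.
    """
    percepts = []  # each entry: [label, onset_ms, last_release_ms]
    for press_ms, release_ms, label in presses:
        same_percept = (
            percepts
            and percepts[-1][0] == label
            and press_ms - percepts[-1][2] < max_gap_ms
        )
        if same_percept:
            # Same button pressed again after a short interruption:
            # treat it as the continuation of a single percept.
            percepts[-1][2] = release_ms
        else:
            percepts.append([label, press_ms, release_ms])

    durations = []
    for (label, onset, release), nxt in zip(percepts, percepts[1:]):
        next_onset = nxt[1]
        # A percept ends when the next button is pressed, which avoids
        # overlap when the new button is pressed before the old one is
        # released. Assumption: if no button was pressed for max_gap_ms
        # or more, the percept ends at release instead.
        end = next_onset if next_onset - release < max_gap_ms else release
        durations.append((label, end - onset))
    return durations
```

For example, with events `[(0, 1000, "C"), (980, 3000, "TL"), (3200, 4000, "TL"), (4010, 6000, "C"), (6000, 7000, "TR")]` the two TL presses are merged into one percept and the final TR percept is dropped.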

In order to estimate the dominance duration of each percept for each stimulus, we considered three sources of variability: variability within trials and variability between trials (both reflecting stochastic variability as well as fluctuations of attention and fatigue), and variability between subjects. Within- and between-trials variabilities were either pooled or computed separately. In the first case, the dependent variable was the log-transform of each individual percept duration expressed in milliseconds. Independent variables were percept type, α, and subject (considered as a random variable). The analysis of residuals of the ANOVA confirmed that the residuals were normally distributed, validating the log transformation (Hupé & Rubin, 2003). Eleven percepts lasting less than 200 ms were strong outliers and were removed. These very short button presses were likely due to errors. In the second case, the dependent variable was the median duration of each percept computed for each trial. Results were very similar with both analyses. In order to compare precisely the duration distributions of the data and model (see Results), dominance durations were divided by the median duration of each percept type for each subject and α value (across-trials median). Such normalization is similar to what has been performed classically at least since work by Logothetis, Leopold, and Sheinberg (1996), except that we used the median rather than the mean duration, because the mean is an unreliable summary statistic for highly skewed distributions. All analyses presented here were also computed independently for each subject.
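The normalization step can be illustrated with a short Python sketch. It groups durations by subject, α, and percept type, and divides each duration by the median of its group. The record layout and function name are hypothetical, and applying a fixed 200 ms cutoff before computing the medians is an assumption made for the example (the text reports only that eleven such percepts were removed as outliers).

```python
from collections import defaultdict
from statistics import median


def normalize_durations(records, min_ms=200.0):
    """Return normalized dominance durations (NDDs).

    records: list of (subject, alpha, percept, duration_ms) tuples
             (hypothetical layout).
    Each duration is divided by the median duration of its
    (subject, alpha, percept) group, so every group has a median NDD of 1.
    """
    # Assumption: very short button presses are discarded up front.
    kept = [r for r in records if r[3] >= min_ms]

    groups = defaultdict(list)
    for subject, alpha, percept, duration in kept:
        groups[(subject, alpha, percept)].append(duration)
    medians = {key: median(values) for key, values in groups.items()}

    return [
        (subject, alpha, percept, duration / medians[(subject, alpha, percept)])
        for subject, alpha, percept, duration in kept
    ]
```

Using the median as the divisor keeps the normalization robust to the long right tail of the duration distributions.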

The patterns of results were very similar, unless otherwise indicated, except for one subject. For this subject, the relationship between intermediate coherent percept duration and switch-back proportion (see Results) showed an opposite trend. Moreover, this subject had a high probability of consecutive transparent percepts and did not show an above-chance probability of transition to a coherent percept like all other subjects (see Results); the average duration of his transparent percepts was especially short (1.5 s, while it was between 3 and 6 s for the other subjects). His data were therefore excluded from all analyses, since we do not know if he truly experienced higher-than-average depth-ordering switches or if he had, for example, some difficulty reporting which grating was in front. His quality of fixation was as good as that of the other subjects. It should therefore be kept in mind that the present model only accounts for the data of the eight typical subjects. If fast alternations of depth ordering should be observed in other subjects, together with other characteristics similar to those of this atypical subject and different from those of the majority of subjects, the perceptual dynamics for these subjects should be estimated and accounted for by the model.

Model formulation and simulation

Neuronal model with adaptation

In this section we present a rate-based model (Laing & Chow, 2002; Wilson, 2003) for the architecture in Figure 2. The model consists of three populations that encode the three different percepts: coherent (C), transparent with the left grating on top (TL), and transparent with the right grating on top (TR); see Figure 1. The activity of each population is described by its mean firing rate r_i, for i = C, TL, TR. For simplicity, the firing rates are dimensionless, normalized by their maximum firing rate, so that 0 ≤ r_i ≤ 1. The three populations compete through direct cross-inhibition (each population inhibits the other two through direct connections). We use different inhibition strengths between the transparent and the coherent percepts than between the two transparent percepts. Moreover, firing-rate adaptation is used as a slow negative feedback. The evolution of the population firing rates r_i is determined by the following system of differential equations:

Figure 2. Network architecture for the neuronal competition model with direct mutual inhibition. Each population activity is correlated to a different percept: coherent (C), transparent right (TR), or transparent left (TL). Each population receives an excitatory deterministic input of strength I_i and independent noise n_i. Spike-frequency adaptation is present in each population. Lines with circles represent inhibitory connections of strength β_i between the three competing populations.

where τ_s = 200 ms, σ = 0.08, and ξ(t) is a white-noise process with zero mean and 〈ξ(t)ξ(t′)〉 = δ(t − t′), that is, no temporal correlations. The first and second terms in Equation 4 correspond to the drift and diffusion functions, respectively. The noise n_i(t) has a Gaussian stationary probability distribution with zero mean and standard deviation σ and is temporally correlated (with decorrelation time τ_s). The noise terms n_i(t) for i = C, TR, TL are taken to be independent.
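As an illustration of a noise process with these properties, the following Python sketch integrates an Ornstein–Uhlenbeck process with the Euler–Maruyama scheme. It assumes the standard form dn = −(n/τ_s) dt + σ √(2/τ_s) dW, which yields a zero-mean Gaussian stationary distribution with standard deviation σ and decorrelation time τ_s; whether this matches the exact form of the noise equation in the model is an assumption, and the function name and arguments are hypothetical. The default time step of 1 ms is the one reported for the numerical procedures.

```python
import math
import random


def simulate_ou_noise(n_steps, dt=1.0, tau_s=200.0, sigma=0.08, seed=0):
    """Euler-Maruyama integration of an Ornstein-Uhlenbeck process.

    Assumed form: dn = -(n / tau_s) dt + sigma * sqrt(2 / tau_s) dW.
    Times are in milliseconds. Returns the list of sampled values.
    """
    rng = random.Random(seed)
    # Standard deviation of the stochastic increment over one time step.
    amplitude = sigma * math.sqrt(2.0 * dt / tau_s)
    n = 0.0
    trace = []
    for _ in range(n_steps):
        drift = -n / tau_s * dt                   # relaxation toward zero
        diffusion = amplitude * rng.gauss(0.0, 1.0)
        n += drift + diffusion
        trace.append(n)
    return trace
```

Over a long run, the sample standard deviation of the trace should approach σ, and independent noise sources for the three populations can be obtained by calling the function with different seeds.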

The set of parameters has been chosen so that the deterministic system (σ = 0 in Equation 4) is in a winner-take-all regime. That is, one population has higher activity than the other two. The adaptation is too weak to make the system oscillate. Once noise is added to the system (σ ≠ 0 in Equation 4), the dynamics produce transitions between different states. The noisy fluctuations can induce switching, even in the absence of adaptation.

Neuronal model with synaptic depression

Synaptic depression can be used as an alternative mechanism to spike-frequency adaptation for the slow negative feedback. The formulation of the model is described by the following system of differential equations:

where s_i is the variable controlling synaptic depression of the inhibitory synapse from population r_i. We model it as

where τ_s = 2500 ms and φ is a parameter controlling the strength of s_i that we vary in our simulations. The terms I_i and n_i are, as in the adaptation case, the external input and noise for population i, respectively.

Numerical procedures

The differential equations are integrated using the Euler–Maruyama method for stochastic differential equations (Higham, 2001) with time step Δt = 1 ms. The simulation for each condition involved integration of the model (Equations 1–4) for 4 × 10⁴ s, generating around 10⁴ durations (comparable to the number of experimental durations).

A transition occurs when the firing rate of the population that becomes dominant exceeds the firing rate of the other two by 0.5. We introduced this strategy to avoid counting as a transition the situation when two populations activate simultaneously and are, for a very short time, both active.
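The transition criterion can be sketched in Python as below. The sketch assumes the simulated firing rates are available as a sequence of `(r_C, r_TL, r_TR)` tuples sampled every `dt_ms` milliseconds; the function name, this data layout, and the use of a strict inequality for "exceeds by 0.5" are assumptions made for the example.

```python
def detect_percepts(rates, margin=0.5, dt_ms=1.0):
    """Label the dominant percept over time from firing-rate samples.

    rates: sequence of (r_C, r_TL, r_TR) tuples (hypothetical layout).
    A population becomes dominant only when its rate exceeds both other
    rates by more than `margin`, so brief episodes in which two
    populations are active together do not count as transitions.
    Returns a list of (label, onset_ms) pairs.
    """
    labels = ("C", "TL", "TR")
    current = None
    sequence = []
    for step, r in enumerate(rates):
        order = sorted(range(3), key=lambda i: r[i], reverse=True)
        top, runner_up = order[0], order[1]
        if r[top] - r[runner_up] > margin and labels[top] != current:
            current = labels[top]
            sequence.append((current, step * dt_ms))
    return sequence
```

Dominance durations in the simulation then follow from the differences between successive onset times.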

The programs were coded in C and run in a Linux environment. Python was used to analyze and plot data. The random generator for white noise that generated long nonrepetitive series was taken from GNU libraries.

Results

Percept probabilities and dominance durations depend on the angle between gratings

For bistable stimuli, the probability of percept occurrence in a sequence—that is, the fraction of occurrences irrespective of durations—is always 0.5 for both percepts. However, the mean dominance durations or fraction of time dominant for these percepts may change with the stimulus parameters, such as the angle α between gratings. On the other hand, for tristable stimuli the probability of percept occurrence is not necessarily fixed and may range from almost 0 (the percept is almost never observed) to 0.5 (the percept is observed every other switch). Here, we contrast the fraction of percept occurrences with the mean dominance durations and how these two quantities change as the angle α changes.
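The distinction between the fraction of percept occurrences and the fraction of time dominant can be made concrete with a small Python sketch. The function name and the `(label, duration)` input format are hypothetical.

```python
from collections import Counter


def percept_statistics(percepts):
    """Compute occurrence probabilities and fractions of time dominant.

    percepts: list of (label, duration) pairs in order of occurrence
              (hypothetical layout).
    Returns (occurrence, time_fraction), two dicts keyed by label.
    """
    counts = Counter()
    time = Counter()
    for label, duration in percepts:
        counts[label] += 1       # one more occurrence of this percept
        time[label] += duration  # accumulated dominance time
    n_percepts = sum(counts.values())
    total_time = sum(time.values())
    occurrence = {label: counts[label] / n_percepts for label in counts}
    time_fraction = {label: time[label] / total_time for label in time}
    return occurrence, time_fraction
```

For the sequence `[("C", 1.0), ("TL", 4.0), ("C", 1.0), ("TR", 4.0)]`, the coherent percept accounts for half of the occurrences but only a fifth of the viewing time, which shows that the two quantities need not vary together in the tristable case.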

Figure 3 shows that percept probabilities and dominance durations both change with the angle α. Indeed, as α increases, coherence occurrences decrease along with their mean dominance durations, while transparent percept occurrences and durations increase (Figure 3A, B). Moreover, as was observed by Naber and colleagues (2010), the changes in dominance durations are not proportional to the changes in percept probability. Consider the case of α = 120 in Figure 3; the coherent percept occurs more often, but its mean dominance durations are shorter than the mean dominance durations of the other two transparent percepts.

Figure 3. Statistics of switching: dependence on parameter α for psychophysics experiments (top) and for model simulations (bottom). (A) Mean of the natural logarithm of the dominance durations expressed in milliseconds (seconds in parentheses; for the experimental data, N = 6,516 durations, some epochs were removed, see Methods). (B) Percept probabilities in each trial (proportion of the number of occurrences; for the experimental data, N = 6,817 percepts). (C) Probability to switch to the coherent percept after a transparent percept (for the experimental data, N = 3,752 sequences). Bars represent the means, and error bars are plus and minus one standard error estimated by ANOVA models including the variable subject as a random factor (here and in all the subsequent figures). Parameter values for the model are given in Methods. We used the same parameter values throughout the article, unless stated otherwise.

Figure 3C illustrates in a different way the phenomenon shown in Figure 3B: It shows that the probability to switch to the coherent percept after a transparent percept is higher than 0.5 not just for α = 80 and 100 but also for α = 120. This phenomenon indicates an asymmetry in the system, showing a clear tendency to visit the coherent percept more often.
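The quantity discussed here can be computed from a sequence of percept labels as in the following Python sketch (function and argument names are hypothetical). After a transparent percept there are two possible next percepts, so a value of 0.5 corresponds to no preference.

```python
def prob_coherent_after_transparent(labels, coherent="C",
                                    transparent=("TL", "TR")):
    """Fraction of transitions out of a transparent percept that lead
    to the coherent percept.

    labels: list of percept labels in order of occurrence.
    Returns NaN if the sequence contains no such transition.
    """
    to_coherent = 0
    total = 0
    for current, following in zip(labels, labels[1:]):
        if current in transparent and following != current:
            total += 1
            if following == coherent:
                to_coherent += 1
    return to_coherent / total if total else float("nan")
```

For the sequence `["TL", "C", "TR", "TL", "C"]`, two of the three transitions out of a transparent percept lead to the coherent percept.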

Model reproduces experimental data

Using the model described earlier, we identified inhibition and input strength as playing essential roles in determining the mean dominance durations and percept probabilities. As the angle α increases, we increase the input strength to the transparent percepts (I_TL and I_TR) and reduce the input strength to the coherent percept (I_C). These changes in the input strength increase both the mean dominance durations and percept probabilities of the transparent percepts, while decreasing those of the coherent one. As a result, the percept with longer dominance durations also shows a higher probability of occurring.

In order to account for the effect observed for α = 120, where the mean dominance durations for coherence are shorter than those for transparent percepts, while coherence occurs more often, we propose unbalanced inhibition: The two transparent populations inhibit each other more strongly than they inhibit the coherent one (β_2 > β_1 in Equation 1; see also Figure 2), making the latter more dominant and more likely to occur. We refer the reader to the “Dynamics of the model” subsection later and “Unbalanced inhibition and input strength” in Appendix 1 for more details. Our simulated results agree with those obtained in experiments (see Figure 3; compare top with bottom plots).

Histograms of dominance durations are well approximated by log-normal or gamma distributions

In bistable rivalry, the histograms of dominance durations are unimodal and skewed, with a long tail at long durations. Typically, they are well approximated by log-normal or gamma distributions (Levelt, 1968; Lehky, 1995; Hupé & Rubin, 2003). We explored whether dominance durations for tristable visual plaids had the same distributions as the ones observed for bistable stimuli. Moreover, since visual plaids can be also interpreted as bistable when the observer is asked to report only whether the plaid was perceived as coherent or transparent (without taking into account the depth reversals—see Figure 1), we also looked at the distributions of dominance durations for aggregated transparent percepts.

Figure 4 shows the distributions of the normalized dominance durations (NDDs) for the coherent percept (A), for aggregated consecutive transparent percepts (B), and for depth-segregated transparent percepts (C). To compute the normalized dominance durations, we divided the durations by the median duration of each percept type for each subject (experiments) and α value (experiments and model). Histograms for the model were normalized to have an area of 1.

Histograms of dominance durations can be approximated by a log-normal or gamma distribution, as in the bistable case. Following Moreno-Bote and colleagues (2007), these distributions suggest that the noise in the system is ultimately responsible for switching. In that study, it was shown that when switching is dominated by adaptation, histograms have a normal distribution, but when adaptation is weakened, effectively giving more weight to noise, the histogram gradually evolves into a skewed one (log-normal/gamma).

Thus, in order to reproduce the experimental results, we have chosen a set of parameters for which the deterministic system (Equation 1) with ni = 0 (noise term) operates in a winner-take-all regime. Indeed, the system has three stable fixed points. Without noise, the trajectories remain in one of these three attractors (each one corresponding to a different percept) and no switching occurs. When noise is restored, the trajectories start to switch between these three states. See “Dynamics of the model” and Balance between adaptation, noise, and input strength in Appendix 1 for more details.
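The winner-take-all behavior without noise can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of Equation 1: the sigmoid slope, time constants, input, inhibition, and adaptation strengths are hypothetical values chosen for the example, inhibition is taken as balanced for simplicity, and the noise is assumed to be an Ornstein-Uhlenbeck process. Only the threshold θ = 0.2 comes from the text (Figure 7).

```python
import math
import random

def simulate(sigma=0.0, t_end=20.0, dt=0.001, seed=0):
    """Euler integration of a three-population rate model (illustrative only).

    Returns the index of the most active population at every time step.
    """
    rng = random.Random(seed)
    theta, k = 0.2, 0.05                 # threshold (from Figure 7), assumed slope
    tau, tau_a, tau_n = 0.01, 2.0, 0.2   # assumed rate, adaptation, noise timescales (s)
    beta, g = 1.0, 0.3                   # assumed inhibition and adaptation strengths
    inputs = [0.6, 0.6, 0.6]             # assumed equal input to the three populations
    r = [0.5, 0.0, 0.0]                  # population 0 starts ahead
    a = [0.0, 0.0, 0.0]
    n = [0.0, 0.0, 0.0]
    dominant = []
    for _ in range(int(t_end / dt)):
        total = sum(r)
        new_r = []
        for i in range(3):
            # total input: external input - cross-inhibition - adaptation + noise
            drive = inputs[i] - beta * (total - r[i]) - g * a[i] + n[i]
            x = max(-50.0, min(50.0, (drive - theta) / k))
            s = 1.0 / (1.0 + math.exp(-x))
            new_r.append(r[i] + dt * (-r[i] + s) / tau)
        for i in range(3):
            a[i] += dt * (-a[i] + r[i]) / tau_a
            # Ornstein-Uhlenbeck noise; stays exactly 0 when sigma == 0
            n[i] += -n[i] * dt / tau_n + sigma * math.sqrt(2 * dt / tau_n) * rng.gauss(0, 1)
        r = new_r
        dominant.append(max(range(3), key=lambda i: r[i]))
    return dominant

def count_switches(dominant):
    """Number of changes of the dominant population along the run."""
    return sum(1 for p, q in zip(dominant, dominant[1:]) if p != q)
```

With `sigma=0.0` the initially favored population stays dominant for the whole run, because with these assumed values adaptation lowers its total input but never brings it below threshold. Passing a positive `sigma` adds fluctuations that may push a suppressed population above threshold; how often that happens depends on the assumed parameters.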

Figure 4 shows the best fits by a log-normal distribution (red) and by a gamma distribution (green). The quality of these fits is very similar. Notice that the model can reproduce the histograms in both cases, when the stimulus is treated as bistable and when it is treated as tristable.
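As an illustration of how the two candidate distributions can be compared, the sketch below fits a log-normal by maximum likelihood and a gamma by the method of moments, then evaluates the log-likelihood of each. The fitting procedure used for Figure 4 is not specified in this section, so the estimators and function names here are assumptions.

```python
import math
from statistics import fmean, pvariance

def fit_lognormal(x):
    """Maximum-likelihood log-normal parameters (mu, sigma) of the log durations."""
    logs = [math.log(v) for v in x]
    mu = fmean(logs)
    return mu, math.sqrt(pvariance(logs, mu))

def fit_gamma_moments(x):
    """Method-of-moments gamma parameters (shape, scale)."""
    m = fmean(x)
    v = pvariance(x, m)
    return m * m / v, v / m

def loglik_lognormal(x, mu, sigma):
    return sum(-math.log(v * sigma * math.sqrt(2 * math.pi))
               - (math.log(v) - mu) ** 2 / (2 * sigma ** 2) for v in x)

def loglik_gamma(x, shape, scale):
    return sum((shape - 1) * math.log(v) - v / scale
               - math.lgamma(shape) - shape * math.log(scale) for v in x)
```

Comparing `loglik_lognormal` and `loglik_gamma` on the same normalized durations gives one simple measure of which family describes the histogram better. Similar values would be consistent with the observation that the two fits are of comparable quality.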

Coherent percept duration affects the probability of a switch back and provides evidence for adaptation

As opposed to bistable percepts, where the only possibility is alternation between the two percepts, in the tristable case we can look at the probability of the next percept's being a switch back—the same percept as the previous one—or a switch forward—a different percept from the previous one (we adopt the terminology from Naber et al., 2010). Moreover, we can ask whether this probability depends on the dominance durations (Figure 5A).

The probability that two transparent percepts when interleaved with a coherent percept have the same depth pattern increases as the duration of the coherent percept lengthens and decreases as the duration of the preceding transparent percept lengthens. (A) Given a triplet of the form T1CT2 in the perceptual sequence for α = 100 (T1 and T2 stand for both TL and TR), we show the probability of a switch back, that is T1 = T2, as a function of the duration of the coherent percept (B) and the first transparent percept (C). Triplets are ordered according to the dominance durations of the intermediate coherent percept (B) or the first transparent percept (C) in the triplet and then grouped in 10 bins of equal size (for experiments, n = 100, except for the last bin, n = 107). The coordinates of each dot are the middle point of each bin and the proportion of triplets in that bin that are a switch back. Notice that the probability of a switch back is on average below the chance level of 0.5. The red line is the linear regression and the blue line is the sigmoid fit; r is the correlation coefficient, and p-values are below or about 0.01 for both experiments and model. Fits for experimental data were obtained by excluding the first data point (see text).

Figure 5


For symmetry reasons, we focus here on triplets consisting of two transparent percepts interleaved with a coherent one. We denote them by T1CT2, where T1 and T2 each stand for either TL or TR. Indeed, we observed that percept probabilities as well as mean dominance durations for TL and TR are the same (see Figure 3A, B). So the next-percept probability when the current percept is coherent is 0.5 for both TL and TR. Hence, when we look at the next-percept probability as a function of the previous percept (when the current percept is coherent), we know that any change in the probability is due to perceptual history dependence and not any other intrinsic asymmetry in the mechanism. Notice that this is not the case when the current percept is transparent, because results show (see Figure 3C) that it is more likely to switch to a coherent percept than to a transparent one. So there is already an intrinsic bias towards coherence, meaning that when the current percept is transparent and the previous one has been coherent, the probability of switching back is biased by this intrinsic predominance.

Figure 5B shows the probability that the two transparent percepts in the triplet T1CT2, Ti ∈ {TL, TR}, have the same depth pattern (i.e., T1 = T2), which we call a switch back, as a function of the duration of the intermediate coherent percept C. It clearly shows that the probability of a switch back increases as the duration of the coherent percept increases, and saturates at 0.5 (chance level).
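The triplet analysis can be sketched as follows. This is an illustrative reconstruction from the description in the Figure 5 caption, not the original analysis code; the sequence layout, the function name, and the reading of "middle point of each bin" as the midpoint between the shortest and longest coherent duration in the bin are assumptions.

```python
def switch_back_by_bin(sequence, n_bins=10):
    """Proportion of switch-back triplets as a function of coherent duration.

    sequence: list of (percept, duration_s) with percept in {"C", "TL", "TR"},
    taken from a single continuous trial (hypothetical layout).
    Returns one (mid_duration, proportion_switch_back) pair per bin.
    """
    triplets = []
    for (p1, _), (p2, d2), (p3, _) in zip(sequence, sequence[1:], sequence[2:]):
        if p2 == "C" and p1 != "C" and p3 != "C":
            triplets.append((d2, p1 == p3))   # True marks a switch back (T1 = T2)
    if len(triplets) < n_bins:
        raise ValueError("not enough T1-C-T2 triplets for the requested bins")
    triplets.sort(key=lambda t: t[0])         # order by coherent duration
    size = len(triplets) // n_bins
    bins = []
    for b in range(n_bins):
        start = b * size
        # the last bin absorbs the remainder, as in "n = 100 except the last bin"
        stop = (b + 1) * size if b < n_bins - 1 else len(triplets)
        chunk = triplets[start:stop]
        mid = (chunk[0][0] + chunk[-1][0]) / 2
        bins.append((mid, sum(back for _, back in chunk) / len(chunk)))
    return bins
```

Replacing `d2` with the duration of the first percept in the triplet gives the corresponding analysis as a function of the T1 duration.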

Notice, however, that the experimental data show that for very short coherent durations, the probability to switch back is well above zero (open circle in Figure 5B). We excluded this point (considered as an outlier) to fit the functions. Without excluding it, fits were of course not as good, but the increase of switches back as a function of coherent duration was always highly significant (and present in every subject). So including this point in the data analysis or not would not change the main conclusions of this study.

Further exploration of the effect of short coherent durations on switch-back probabilities would be required, but it might prove difficult because these events are rare for conditions of equiprobability between percepts. In addition, short percepts may not be reported accurately. Indeed, the data for the first bin correspond to coherent durations that are between 200 ms and 1 s. They may therefore include some errors in the button presses. On the other hand, some subjects may not report percepts that are too short. Individual data were indeed quite variable (as expected, given the limited power for such analysis), with two subjects having a clearly higher switch-back proportion for very short durations and three subjects clearly not showing that phenomenon. We should keep in mind that this high probability of switch back for short durations may correspond to a real mechanism not included in the model if, for example, it corresponds to some priming effect, like the tendency to report the same percept for brief interruptions of the stimulus (Leopold, Wilke, Maier, & Logothetis, 2002; Maier, Wilke, Logothetis, & Leopold, 2003). The possible addition of this mechanism should not, however, affect the mechanisms that we reveal here.

We also explored whether the durations of the first transparent percept in the triplet influence switching probabilities. Figure 5C shows the probability of switching back as a function of the mean dominance durations for the first transparent percept T1 in the triplet T1CT2. Although the dependence is less strong than for the coherent durations, results show that the probability of a switch back is higher if T1 is short.

We chose α = 100 for Figure 5B and C in order to explore the relationship between coherent duration and probability of switching back with everything else being equal (i.e., independent of any other features in the response properties). Moreover, we found the largest variability of coherent durations for α = 100. We observed similar trends for α = 80 and α = 120 (results not shown). Further, a similar and even stronger relationship between coherent-percept duration and switch-back probability was observed in two other independent data sets collected over many subjects and pooled across different plaid parameters (Hupé, 2010; these data were presented in Hupé & Pressnitzer, 2012; see also Appendix 2).

We interpret this result as showing evidence for a negative feedback mechanism that acts at a slower timescale than the firing-rate variable r. We found that a subtractive negative feedback opposing the excitatory input, typically referred to as spike-frequency adaptation and represented by the variable a in Equation 1, is essential to explain the mentioned effect. Thus, in our model (Equation 1), when a percept becomes dominant, it starts to recruit some adaptation. The adaptation recruited will prevent this percept from becoming dominant again immediately after being suppressed, making it very unlikely for this percept to recur after a short duration. Moreover, the less time a percept has been active, the less adaptation this percept has recruited while active and the more likely a reappearance of this percept. In “Dynamics of the model” we discuss this mechanism in more detail. The simulations agree with the experimental results (Figure 5).

Other possible mechanisms for negative feedback include synaptic depression—a divisive mechanism acting directly on the inhibitory input; the depression variable multiplies the term that models inhibition strength during prolonged firing (see Equations 5 and 6). When we implemented synaptic depression in our model, we were unable to reproduce the probability dependence on durations. We refer the reader to Different roles for spike-frequency adaptation and synaptic depression in Appendix 1 for a more detailed mathematical discussion on the two negative feedback mechanisms.

Noise-driven switching removes correlations in dominance durations

Results in Figure 5 might suggest that T1 durations are negatively correlated with C durations: Switch-back triplets (T1 = T2) are more likely for short T1 and long C, while switch-forward triplets (T1 ≠ T2) are more likely for long T1 and short C. Figure 6A shows C durations plotted against T1 durations for T1CT2 triplets. T1 and C percept durations were normalized, independently, by their median durations for each subject. We included T1 and C of T1CT2 sequences only for α = 100, in order to allow the comparison with the analysis of percept choice in Figure 5. The lack of any strong correlation is clearly seen, in agreement with previous observations (see Rubin and Hupé, 2005, for example). We interpret this result as evidence that the duration of T1 and the duration of C each contribute to the probability of a switch back, even though the two durations are not correlated with each other. In the model, such an absence of relationship is captured by having a high level of noise (noise-driven attractor model; see Figure 6B). Thus, when considering percept duration independently of percept choice, there is no evidence of adaptation.
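The coefficient of determination reported in Figure 6 can be computed from paired durations as in the sketch below. The function name and the use of a plain Pearson correlation are assumptions made for illustration; the inputs are assumed to be already normalized by the per-subject medians.

```python
from statistics import fmean

def r_squared(x, y):
    """Squared Pearson correlation between paired samples x and y."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Here `x` would hold the T1 durations and `y` the C durations of the same triplets. A value near 0 corresponds to the absence of correlation described in the text.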

Durations of T1 and C for T1CT2 sequences for experiments (A) and model (B). For experimental data, plot durations were normalized, independently, by the median durations for each subject. Correlations are absent or small. Here, N is the number of dots, R2 is the coefficient of determination, and p is the p-value.

Figure 6


Dynamics of the model

In the previous sections, we have described several features of the system (dominance durations, percept probabilities, distributions, and switch-back probabilities) and discussed how the external input, inhibition, adaptation, and noise affect them. We offer here a mechanistic explanation via a simple schematic for the dynamics of switching in the model that combines these elements (Figure 7A). A population becomes active when the total input (Figure 7Ba) to the input–output function S given in Equation 2 (Figure 7A, inset) is above the threshold θ. The model parameters are chosen so that when a population becomes active, it suppresses the other two, ensuring that only one population is active at a time. The total input of the active population (Figure 7Ba, time t1) decreases over time due to the adaptation current (Figure 7Bc, time t1), while the suppressed populations recover from adaptation, bringing their total input closer to the threshold θ. Since adaptation and input strength have been chosen so that the total input never crosses the threshold, the system would never show alternations without the presence of noise. Indeed, the noise-driven fluctuations may bring the total input of the suppressed populations above threshold, causing the switching.

Dynamical properties of the model. (A) Schematic representation of the mechanism underlying transitions between suppressed and dominant states. The height of the bars indicates the total input to the input–output function for each population at three different times indicated in (B). (B) Time courses of the total input minus the noise term ni (Ba), activity (Bb), and adaptation (Bc) of the three populations: coherent (red), transparent right (green), and transparent left (blue). The horizontal line in (A) and (Ba) corresponds to the threshold (θ = 0.2) of the input–output function [inset, (A)]. When the bar is above the threshold, the population is active (activity near 1), and when it is below the threshold, the population is suppressed (activity near 0)—see the corresponding times in (Ba) and (Bb). The arrow indicates the effect of adaptation on the total input. A downward arrow indicates that adaptation is increasing for the active population, reducing the total input for that population [see the corresponding times in (Ba) and (Bc)]. An upward arrow indicates that adaptation is decreasing for the suppressed population, increasing the total input for that population [see the corresponding times in (Ba) and (Bc)]. Notice that adaptation drives the input level closer to the transition threshold but still well above or below it.

Figure 7


Notice that in this regime of parameters, the total input for the suppressed percepts is closer to the threshold than is the input for the active population, suggesting that the transitions occur due to an escape mechanism (the total input to one of the suppressed populations crosses threshold, causing the suppression of the active one). We could have also considered the case where the active population is closer to the threshold and the transition occurs due to a release mechanism (the total input to the active population falls below the threshold, allowing the suppressed populations to take over; see Shpiro, Curtu, Rinzel, & Rubin, 2007). In our simulations, an escape mechanism introduces more variability to the dominance durations, which fits better with experimental observations (see Balance between adaptation, noise, and input strength in Appendix 1) than a release mechanism, but we did not observe any other remarkable differences that are relevant for visual plaids (results not shown).

Just after coherence is suppressed, its total input is below the total input for the other suppressed transparent population (Figure 7Ba, time t2). However, if the active population TR remains active for long enough (based on the timescale of adaptation), coherence recovers from adaptation and its total input approaches threshold. Moreover, since coherence receives less inhibition than the other suppressed transparent population TL from the active TR (unbalanced inhibition), its total input overtakes that of the other suppressed percept, making it more likely that coherence will reappear.

Thus, for equal inputs to the three populations, coherence will be more likely to appear, while its durations will be shorter compared to the ones for the other transparent percepts. Indeed, when a population becomes active, it can be overtaken by two populations. If one of these two suppressed populations is pushed further down from the threshold, the chances of its being overtaken are reduced and the dominance durations of the active population lengthen. That is the case when transparent populations are active (see Unbalanced inhibition and input strength in Appendix 1).

We next examine Figure 7 to gain a further understanding of the dependence of switch-back probabilities on adaptation strength. Indeed, when the coherent population is active, the two suppressed transparent populations have different total inputs because of adaptation (see Figure 7Ba, times t1 and t3). The suppressed population that was active before the current state still has some adaptation, and therefore its total input is lower and further from the threshold than the total input of the other suppressed population. Hence, if coherent dominance durations are short, a switch forward is more likely than a switch back. As coherent durations become longer, the total input of both suppressed populations becomes similar and the probability of a depth switch becomes 0.5 (see Figure 7Ba, times t1 and t3). Moreover, if the first transparent population in the triplet was active for a long time, it recruited more adaptation and will have less probability of reappearing for a longer time.
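The dependence on coherent duration can be illustrated with a back-of-the-envelope sketch. It assumes that adaptation in a suppressed population decays exponentially with time constant τa, that the other transparent population has fully recovered, and that adaptation enters the total input subtractively with gain g. The values of g, τa, and the initial adaptation level are hypothetical.

```python
import math

def input_gap(t, a0=1.0, g=0.3, tau_a=2.0):
    """Difference in total input between the two suppressed transparent
    populations t seconds after coherence becomes dominant.

    a0: adaptation of the previously active transparent population at t = 0
    g, tau_a: assumed adaptation gain and time constant (s)
    """
    return g * a0 * math.exp(-t / tau_a)
```

Under these assumptions the gap shrinks by a factor of e every τa seconds, so after a long coherent percept the two transparent populations are nearly equidistant from threshold, and a larger `a0` (a longer preceding T1) takes longer to decay to any given level.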

The question that arises now is what the balance is between input, inhibition, adaptation, and noise that generates the desired output. Of course, a system that is dominated by adaptation would show small variability, while a system dominated by noise would show switching probabilities independent of percept durations and exponential distributions for dominance durations (peaked around the timescale of the noise, on the order of 200 ms). Moreover, the relative inputs to the populations will contribute to the dominance durations and percept probabilities. We include a mathematical exploration of this balance in Appendix 1, under Balance between adaptation, noise, and input strength and Unbalanced inhibition and input strength.

Noise and adaptation

A unique contribution from tristability is that it helps to constrain more precisely the balance between noise and adaptation beyond the constraints for bistability. Hitherto, the discussion about the role of noise and adaptation in the switching mechanism for bistable models has mainly focused on how they affect the histograms for dominance durations and the lack of correlations for successive percepts. For tristability, we can add another constraint: the bias in switch-back probabilities. If adaptation is removed from the system or noise is too strong, the switch-back probability in Figure 5 remains constant at 0.5 independent of the durations of coherence. See also Balance between adaptation, noise, and input strength in Appendix 1.

Using the parameters of the model that best fit the data, we removed adaptation from the model and found that distributions were exponential instead of log-normal or gamma; there was no dependence of the switch-back probability on the coherence duration, and durations increased (switch rate decreased). For instance, for α = 120, we found that the mean percept durations doubled (a halving of the switch rate). Namely, within a 3-min run, we had on average 39 switches with adaptation (for experiments, we had a range from 19 to 45); without adaptation we would have had 21 switches. For α = 80, we observed a reduction of switches from 30 to 10 in 3 min. These observations relate to the results of Blake, Sobel, and Gilroy (2003), who managed to reduce the amount of adaptation in the system by means of a visual stimulus that was moving in the visual field so that it was constantly engaging unadapted neural tissue. They found, as our numerical model predicts, that perceptual alternations were slowed down when adaptation was reduced.
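The relation between switch counts and mean percept durations quoted above can be checked with a short calculation. The sketch uses the approximation that n switches divide a 180-s run into about n percepts; the function name is an assumption for this example.

```python
RUN_S = 180.0  # duration of one 3-min run, in seconds

def mean_duration(n_switches, run_s=RUN_S):
    """Approximate mean percept duration, taking n switches as about n percepts."""
    return run_s / n_switches

# Ratio of mean durations without vs. with adaptation
ratio_120 = mean_duration(21) / mean_duration(39)   # alpha = 120
ratio_80 = mean_duration(10) / mean_duration(30)    # alpha = 80
```

For α = 120 the mean duration goes from about 4.6 s to about 8.6 s, a factor of roughly 1.9, which is the approximate doubling mentioned in the text. For α = 80 the same calculation gives a factor of 3.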

Other possible network architectures

So far, we have considered a model architecture in which three populations compete at the same level through direct cross-inhibition (Figure 2). We also considered a hierarchical architecture for the model (Figure 8A), in which depth and movement are encoded by two different groups of populations (motion segregation activates depth perception). The model consists of four populations: The first pair of populations, labeled C and T, encode motion perception (coherence vs. transparency). They compete through inhibition. When the population encoding transparency is active (coherence is suppressed), a second pair of populations (excited by population T) is recruited. Those two subpopulations, labeled TR and TL, encode depth perception (which drifting grating is on top), and they compete through inhibition as well.

Transparent percepts are significantly shorter when followed by a coherent percept for the hierarchical model. (A) Model architecture for a hierarchical model. The competition is split into two competitions: coherent versus transparent and, within transparent, TL versus TR. (B) Histogram of dominance durations of transparent percepts TL and TR showing abundance of short durations and reduction of the most frequent ones. The distributions are no longer gamma or log-normal. Compare with histograms in Figure 4. (C) Normalized dominance durations (normalized by the median) for transparent percepts observed before a coherent percept (BC) and before the opposite transparent percept (BT) for the hierarchical model (HM). In the hierarchical model, the duration of a transparent percept preceding the coherent one is forced shorter when the latter becomes active, which is not observed in the nonhierarchical model or in the experiments, which in fact show the opposite trend (results not shown).

Figure 8


We have run simulations for this model and have observed that when the two transparent patterns (depth reversals) are distinguished, the distributions in the hierarchical model are no longer log-normal or gamma; instead, there is an abundance of short durations and a reduction of the most frequent ones (see Figure 8B). Indeed, since the competition between the two transparent percepts TR and TL takes place only when the transparent percept T is active, the transparent percept preceding the coherent one is prematurely terminated when coherence becomes active. Hence, the mean dominance durations of the transparent percepts preceding a coherent one are significantly shorter than those of the transparent percepts preceding the opposite transparent percept (see Figure 8C).

Based on firing-rate models, our results suggest that a nonhierarchical architecture where motion is encoded together with depth can better fit the experimental results. We do not claim that the architecture described in Figure 2 is the only or optimal possibility. Indeed, we think that several network architectures (probably more complex) could produce similar results. However, we emphasize that the one considered herein (Figure 2) is the simplest architecture that could encode a direct influence between the individual transparent percepts and the coherent one.

Discussion

We have studied the dynamics of perceptual switching for tristable visual plaids. We have developed an idealized neuronal competition model (that extends the existing models for bistability) that can account for the results, reported here, from behavioral experiments of switching. Working with this relatively simple model (with minimal degrees of freedom), we have discovered that noise and adaptation have different roles. Thus, while noise is ultimately responsible for switches (adaptation alone cannot trigger a switch) and thereby controls percept durations, adaptation affects the percept choice.

Different roles for noise and adaptation

The issue of perceptual history dependence and, in consequence, the roles that adaptation and noise play in switching, has been central in discussions on binocular rivalry. Several researchers have reported the absence of correlations between successive dominance durations (Fox & Hermann, 1967; Levelt, 1968; Lehky, 1995; Logothetis et al., 1996; Rubin & Hupé, 2005), suggesting that perceptual multistability is a memoryless process; perceptual history does not affect the durations of subsequent percepts. This observation has been interpreted as meaning that adaptation plays a secondary role in the switching mechanism and that switching is dominated by noise (Brascamp et al., 2006; Lankheet, 2006; Moreno-Bote et al., 2007).

However, other studies exploring interrupted bistable stimuli have suggested the possibility of priming (implicit memory effect) in bistable rivalry. When the presentation of an ambiguous stimulus is interrupted by blank presentations or other visual stimuli, subjects tend to report restarting with the just-previous percept (Leopold et al., 2002; Maier et al., 2003). This “stabilization effect” depends not only on the latest percept before the interruption but also on the previous perceptual history that traces back several seconds (Brascamp et al., 2008; Pastukhov & Braun, 2008). Although percept choice and percept switching may involve different processes (Noest, van Ee, Nijs, & van Wezel, 2007), these results suggest that a form of adaptation must be present in ambiguous visual perception. Moreover, recent results for other types of tristable visual stimuli also point in favor of an adaptation model to account for certain aspects of transitions and durations (Naber et al., 2010; Wallis & Ringelhan, 2013).

Our results for tristable percepts indicate that both noise and adaptation are present in the system but they have different roles, thus shedding new light on a longstanding controversy. In the model, we have chosen parameters so that noise drives the switches (adaptation is too weak to produce switches by itself) while adaptation determines the percept choice (adaptation, even if weak, is strong enough to bias the switch-back probabilities). Indeed, short durations of the current percept increase the probability of a switch forward; of the two suppressed percepts, the less adapted one is, on average, favored to become active (see Figure 7A). In contrast, long durations reduce any advantage of either suppressed percept, making them equally likely to become active; therefore, the “memory” of the last transparent percept is erased (see Figure 5). Moreover, adaptation decays or builds up on a timescale of the order of a few seconds, so one might expect that its effect would be erased after one percept. Thus, the adaptation process has the effect of creating a sort of memory as well as disfavoring early transitions (Moreno-Bote et al., 2007).

The nature of the slow adaptation process has a strong influence on this perceptual bias. In our simple model, we could explain the trends shown in Figure 5 only with spike-frequency adaptation (subtractive negative feedback), not with synaptic depression (divisive negative feedback), suggesting a different functional role for these slow negative feedback processes (see Different roles for spike-frequency adaptation and synaptic depression in Appendix 1). Previous studies analyzing different types of slow adaptation, divisive versus subtractive, have not reported functional differences between these two mechanisms in bistable rivalry (Shpiro et al., 2007; Shpiro et al., 2009).

Parametric manipulations

The discovery of different roles for noise and adaptation was made by studying a very specific set of stimuli and parameters. This was necessary to identify the mechanisms with constant input. Such a strong constraint could question the generalization of our result: Would similar results be obtained for different parameters and stimuli?

For parameters of visual plaids, extensive studies of the long-term dynamics of plaids (Hupé & Rubin, 2003, 2004; Rubin & Hupé, 2005; Pressnitzer & Hupé, 2006; Hupé, Joffo, & Pressnitzer, 2008; Hupé & Pressnitzer, 2012) have shown that the effects of parametric manipulations studied so far are independent of each other. In particular, this is the case for motion direction (Hupé & Rubin, 2004) and speed (Hupé & Rubin, 2003), both parameters producing effects notably independent of α. One could, however, question whether the main unexpected empirical observations could have been due to the choice of always having the coherent motion aligned with the vertical direction. Indeed, this vertical symmetry of the stimulus may have led to the coherent percept (also vertical) being more often visited than the two transparent percepts (both oblique), as shown in Figure 3. Even though the critical relationship between the duration of the coherent percept and the switch-back probability (Figure 5B) may be more difficult to explain by the vertical symmetry of the stimulus, one may also legitimately wonder whether such a result may generalize to other parametric conditions. Independent data were in fact collected earlier by J-MH confirming both results over a larger range of parameters. In Appendix 2 we include a summary of these experiments, for which different directions were used. For that reason, and because we wanted as much empirical data as possible for a single condition, we decided to limit the set of parameters to be explored. Indeed, the present data are stronger because they were obtained on a larger data set without parametric manipulation (10 repetitions of 3-min trials, that is 15 times as many sequences for each stimulus and subject as in the previous data set).

For other tristable stimuli, several observations made by Naber and colleagues (2010) and Wallis and Ringelhan (2013) after pooling data over several parameters showed some commonalities with our behavioral data and could be accounted for by our model with the aforementioned roles for noise and adaptation. For instance, Naber and colleagues (2010) reported that switch-back triplets typically have longer durations than average for the intermediate percept (in our case, C) and shorter durations than average for the first percept in the triplet (in our case, T1). On the other hand, Wallis and Ringelhan (2013) reported that the switch-back transitions were longer than the switch-forward ones, a property that was also observed in our model (results not shown). Although we may need to adjust several parameters in the model to reproduce the dominance durations and percept probabilities of each particular stimulus, the roles of noise and adaptation in the model described herein will remain unchanged. The fact that histograms of dominance durations in our model match across a multitude of stimuli (Naber et al., 2010; Wallis & Ringelhan, 2013, supplementary material) provides grounds for this speculation.

Actually, when we considered a different architecture for the model, such as the hierarchical one in Figure 8, the roles of noise and adaptation were the same as for the same-level architecture. Of course, the exact parameters of the model depend on the specific architecture of the model, but the roles they play in the dynamics do not. Indeed, the hierarchical model that could best fit the experimental data was noise driven but could reproduce the switch-back dependence on percept duration with adaptation (results not shown).

The role of inhibition and input strength in percept duration and percept choice

Biases that may be specific to this paradigm should not be confounded with general mechanisms. Here, the higher number of occurrences of the coherent percept (see Figure 3) could simply be accounted for by unbalanced inhibition. Previous work on bistable models has shown that dominance durations are affected by the input strength (Laing & Chow, 2002; Shpiro et al., 2007), but in bistable stimuli, percept probabilities are always fixed at 0.5. Here, we have shown that inhibition and input strength affect both the dominance durations and percept probabilities and the relation between them (see “Model fits with the experimental data,” earlier). Moreover, we have shown how inhibition and input strength can be manipulated asymmetrically to adjust dominance durations and percept probabilities independently (see also Unbalanced inhibition and input strength in Appendix 1). From this understanding, one can then adjust the model to other experimental data. For instance, if a stimulus leads to percepts that are equally dominant in both mean dominance duration and percept probability (see, for example, Wallis and Ringelhan, 2013), we suggest symmetric inhibition and input strength.

Visual plaids and their relevance for motion and depth perception

The same-level model was better than the hierarchical model for fitting the experimental data. This has interesting consequences for the mechanisms of motion segmentation and depth ordering (Adelson & Movshon, 1982; Hupé & Rubin, 2003; Hupé & Pressnitzer, 2012). The underlying ambiguity here comes from the phenomenon known as the aperture problem. The direction of movement of a bar with occluded edges cannot be determined. In the absence of other external cues, the subject tends to perceive the velocity in the direction normal to the stripes. When two gratings are superimposed, they can be perceived as moving independently (each one in the direction orthogonal to its stripes) or coherently (both gratings in the same direction). While transparency is perceived, since gratings move in opposite directions and share intersections, there is a conflict that is resolved by separating the object into two and placing them on different planes. Since the intersections can be assigned equally well to both gratings, alternation occurs.

A reasonable question is whether incoherent motion leads to depth perception or whether depth perception is encoded together with motion segregation. The architecture of our model is designed so that these two cues influence each other with no hierarchy, so one is not leading the other. Simulations of our model with a hierarchical architecture (motion and depth are encoded separately) were not able to reproduce experimental results (Figure 8).

Existing physiological data from the middle temporal visual area (MT) of the visual cortex provide a neural substrate for a nonhierarchical model architecture (see Born & Bradley, 2005, for an MT review). Neurons in the visual area MT are involved in detection of motion direction; they are selective to a particular direction of motion (their activity is enhanced or reduced depending on whether a preferred or nonpreferred motion occurs, respectively; Albright, Desimone, & Gross, 1984; Britten, Shadlen, Newsome, & Movshon, 1992). Subsequent studies have suggested that MT is involved in the perception of depth as well (Bradley, Chang, & Andersen, 1998; DeAngelis, Cumming, & Newsome, 1998; Dodd, Krug, Cumming, & Parker, 2001). The suppression of MT responses due to nonpreferred motion is reduced when the nonpreferred and preferred motions occur in separate depth planes (Bradley, Qian, & Andersen, 1995). Hence, placing opposing movements in different planes prevents the cancellation of the motion signal.

Limitations of the model

We emphasize that we did not attempt to account for all properties of visual plaids. Instead, we tried to keep the model as simple as possible to allow for mathematical analysis of the parameters and still reproduce the most prominent features observed in psychophysics experiments. We believe that some of the remaining features can be explained by straightforward extensions of our model.

For instance, no attempt was made to include a mechanism to deal with first-percept inertia, a phenomenon observed in experiments with visual plaids (Hupé & Rubin, 2003; Hupé & Pressnitzer, 2012). This refers to a tendency, observed at the stimulus onset, for the first percept (which is almost always coherent: first-percept bias) to be longer than the subsequent coherent ones (see Hupé and Pressnitzer, 2012, for a suggestion of an additional mechanism to account for first-percept inertia and for a simple explanation of first-percept bias).

Two other observations may require additional mechanisms, if confirmed: one subject with short transparent durations and a reverse relationship for switch back, and the high switch-back probability for very short durations of the intermediate percept (Figure 5B). Once these additional phenomena are fully documented and explained, the simple model presented here may require additional variables, such as an additional negative feedback, either slower or of a different type (divisive).

Plausible neural bases for noise and adaptation

We think that the inclusion of various additional mechanisms should not change the essential roles of noise and adaptation identified in this study (yet we cannot prove it). We can speculate what these roles mean for the neural correlates of multistable perception. Neural correlates of visual bistable perception have been observed in low-level areas (with fMRI: Tong & Engel, 2001; Wilson, Blake, & Lee, 2001; Haynes, Deichmann, & Rees, 2005; Lee, Blake, & Heeger, 2005, 2007; Wunderlich, Schneider, & Kastner, 2005), high-level visual areas (with monkey electrophysiology: Leopold & Logothetis, 1999; Williams, Elfar, Eskandar, Toth, & Assad, 2003), and in nonvisual parietal and frontal areas (Sterzer & Kleinschmidt, 2007). Among these neural correlates, some may relate to the percept consciously experienced, while others may relate to the mechanism of switching. The distinct roles of adaptation and noise that we have discovered may help in clarifying the apparently conflicting results regarding the brain areas involved in multistable perception. Adaptation is more likely to concern the neural populations that encode each competing percept, and therefore should be observed within the visual cortex. The time course that we observed, for both adaptation of the dominant percept and recovery of the suppressed percept, could be used as a precise signature of the neural correlates. Simply looking for the neural correlates of the perceived interpretations is not decisive, since once an interpretation is selected, it may be both transmitted to higher level areas and fed back to lower visual areas, for example for attention mechanisms (Watanabe et al., 2011). Looking for the dynamics of the neural correlates of both the suppressed and the dominant percepts would provide much more stringent criteria. Our model revealed the critical role of noise in determining the time of switch. 
Noise in the model could reflect many different mechanisms, including blinks and non-stimulus-related eye movements (that change the visual input) as well as high-level attention and intention mechanisms. Our proposed role for noise is therefore fully compatible with the involvement of parietal and frontal structures (Sterzer & Kleinschmidt, 2007). However, depending on the content of such noise, prefrontal activity may not be systematically necessary to trigger a switch, if other sources of noise are available.

In sum, we propose that adaptation and noise are both involved in perceptual alternations, but with different roles: Noise controls the time of the switch, while adaptation controls which percept is next. Based on our results, we think it is worthwhile and interesting to pursue this proposal in other contexts of perceptual multistability (which include a more general group of experiences), to probe whether there is a kind of structured response of the brain to ambiguous stimuli.

Acknowledgments

The research for this study was supported, in part, by i-math Fellowship “Proyectos flechados,” Swartz Foundation, “Juan de la Cierva” fellowship, MCyT/FEDER grant MTM2012-31714, and CUR-DIUE grant 2009SGR859 (GH); and Agence Nationale de Recherche ANR-08-BLAN-0167-01 (J-MH). GH wants to acknowledge the use of the UPC Applied Math cluster system for research computing. GH and JR thank the Mathematical Biosciences Institute (MBI) at Ohio State University for hospitality and support during long-term visits when the final version of the manuscript was written; GH especially appreciates receiving an Early Career Award from the MBI.

Appendix 1

In this appendix, we describe the derivation of the model. In particular, we discuss how we chose the parameters and elements of the model.

Balance between adaptation, noise, and input strength

We start the exploration of the parameter space with a totally symmetric model: identical inhibition β1 = β2 = 1 and identical input strength I = IC = ITL = ITR in Equation 1 for the three populations. We treat the effects of asymmetry later in Unbalanced inhibition and input strength. Notice that in this symmetric model, the mean dominance durations and percept probabilities are exactly the same for the three populations. This case is not observed in the experimental data, but it will be our starting point to reproduce the experimental results obtained for α = 120, when mean dominance durations and percept probabilities are more similar between the three percepts (see Figure 3 and Table 1). Of course, the next step will be to explore possible mechanisms to create an asymmetry in these parameter values so that the simulated and experimental results match well.

So we focus on the experimental results for α = 120—the closest to isodominance (see Table 1). When we take the natural logarithm of the dominance durations in milliseconds, the mean for the three percepts is 8.2 (around 3.6 s) and ranges between 7.9 and 8.9 among subjects (that is, between 2.6 and 7.3 s), and the standard deviation is around 0.74 and ranges between 0.6 and 0.88. We also know that the switch-back probability when the current percept is coherent is on average 0.28 and ranges between 0.13 and 0.42.
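The target statistics above can be sketched as a small helper. The following is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import math
import statistics

# Ranges quoted in the text for alpha = 120 (across subjects).
LOG_MEAN_RANGE = (7.9, 8.9)       # mean of ln(duration in ms)
LOG_SD_RANGE = (0.6, 0.88)        # standard deviation of ln(duration in ms)
SWITCH_BACK_RANGE = (0.13, 0.42)  # switch-back probability after a coherent percept


def log_duration_stats(durations_ms):
    """Return (mean, sd) of the natural log of durations given in milliseconds.

    Assumption: sample standard deviation (statistics.stdev).
    """
    logs = [math.log(d) for d in durations_ms]
    return statistics.mean(logs), statistics.stdev(logs)


def log_ms_to_seconds(log_ms):
    """Convert a log-duration (ln of milliseconds) back to seconds."""
    return math.exp(log_ms) / 1000.0


def in_relevant_range(log_mean, log_sd, p_switch_back):
    """Check the three constraints used to delimit the parameter regions."""
    return (LOG_MEAN_RANGE[0] <= log_mean <= LOG_MEAN_RANGE[1]
            and LOG_SD_RANGE[0] <= log_sd <= LOG_SD_RANGE[1]
            and SWITCH_BACK_RANGE[0] <= p_switch_back <= SWITCH_BACK_RANGE[1])
```

For example, `log_ms_to_seconds(8.2)` gives roughly 3.6 s, and `in_relevant_range(8.2, 0.74, 0.28)` is true for the average values quoted above.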

We start with a symmetric model and vary the adaptation strength γ in Equation 3 and the input strength I for the three populations, searching for those parameter regions where the mean and the standard deviations of the dominance durations and the switch-back probabilities are in the relevant range described previously. We repeat the parameter exploration for different values of the noise level σ in Equation 4. Results are shown in Figure 9. Notice that these results for input and adaptation are always relative to the inhibition strength, which in this case is 1. Hence one can also read the results in Figure 9, as well as in Figures 10 and 11, as relative strength between input or adaptation and inhibition.
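A single point of such a parameter exploration could be simulated along the following lines. This is a rough sketch under stated assumptions, not the model as specified in Equations 1, 3, and 4: the sigmoid gain function, its threshold and slope, the three time constants, the Ornstein-Uhlenbeck form of the noise, and the rule that the dominant percept is the population with the highest rate are all illustrative choices made here. Only β = 1, γ = 0.15, I = 0.95, and σ = 0.08 are values taken from the text.

```python
import math
import random


def sigmoid(x, theta=0.2, k=0.1):
    """Assumed gain function; theta and k are illustrative, not from the paper."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / k))


def simulate(I=0.95, beta=1.0, gamma=0.15, sigma=0.08, t_max=20000.0, dt=1.0,
             tau=10.0, tau_a=2000.0, tau_n=100.0, seed=0):
    """Euler-Maruyama integration of three mutually inhibiting populations.

    Times are in ms; tau, tau_a, tau_n are assumed time constants.
    Returns a list of (percept_index, duration_ms) for completed epochs.
    """
    rng = random.Random(seed)
    r = [1.0, 0.0, 0.0]  # firing rates; population 0 starts dominant
    a = [0.0, 0.0, 0.0]  # slow adaptation variables
    n = [0.0, 0.0, 0.0]  # independent noise terms
    epochs, current, start = [], 0, 0.0
    for step in range(int(t_max / dt)):
        total = sum(r)
        new_r = []
        for i in range(3):
            inhibition = beta * (total - r[i])          # from the other two
            drive = I - inhibition - gamma * a[i] + n[i]
            new_r.append(r[i] + dt * (-r[i] + sigmoid(drive)) / tau)
        for i in range(3):
            a[i] += dt * (-a[i] + r[i]) / tau_a         # adaptation tracks rate
            n[i] += (-n[i] * dt / tau_n
                     + sigma * math.sqrt(2.0 * dt / tau_n) * rng.gauss(0.0, 1.0))
        r = new_r
        winner = max(range(3), key=lambda i: r[i])
        if winner != current:                           # a perceptual switch
            t = (step + 1) * dt
            epochs.append((current, t - start))
            current, start = winner, t
    return epochs
```

Repeating `simulate` over a grid of `gamma` and `I` values, for each `sigma`, and summarizing the returned epochs would give maps of the kind described here. Whether these assumed constants place the sketch in the tristable regime has not been checked, and the highest-rate rule can produce spuriously short epochs when two rates are close.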

Parameter regions for which the model's behavior satisfies various statistical constraints. The mean and standard deviations of dominance durations and the switch-back probabilities are in the relevant range in the presence of noise of strength (left) σ = 0.02, (center) σ = 0.08, and (right) σ = 0.12 for a symmetric model: same inhibition β1 = β2 and same input strength I = IC = ITL = ITR in Equation 1 for the three populations. Regions: blue in (A) where log-mean dominance durations are in the relevant range of 7.9 (2.6 s) to 8.9 (7.3 s); red in (B) where standard deviations are in the relevant range of 0.6 to 0.88; yellow in (C) where switch-back probabilities are in the relevant range of 0.13 to 0.42; and light gray in (D) where the three previous regions overlap. The white line indicates the border between the region with three attractors (each corresponding to a solution where one population is active and the other two are suppressed) and other dynamic regimes. Black regions correspond to those that fail to meet the criteria described as well as those where fewer than 100 percept durations were generated (less than 0.01 switches/s).

Figure 9

Switch-back probability maps as input and strength of the slow negative feedback process vary [adaptation (A) and synaptic depression (B)] in the presence of noise of strength σ = 0.08. Notice that for synaptic depression, the range of switch-back probabilities is restricted to values around 0.5, whereas for adaptation, they show a wider range decreasing down to 0. The white line indicates the border between the region with three attractors (each corresponding to a solution where one population is active and the other two are suppressed) and other dynamic regimes. Small black regions in the upper central part correspond to those where fewer than 100 percept durations were generated (less than 0.01 switches/s).

Figure 10

Mean dominance durations (left column) and percept probabilities (right column) for coherent population (red), transparent left (blue), and transparent right (green). (A–B) Only the inhibition strength β2 between the two transparent percepts is varied; β1 is fixed at 1 and IC = ITL = ITR = 1. (C–D) Only the input strength IC to coherence is varied; ITL = ITR is fixed at 0.95 and β1 = β2 = 1.

Figure 11

We focus on the set of parameters for which the deterministic system presents three stable fixed points, each one corresponding to one population active and the other two suppressed. This set corresponds to the region that lies inside the solid white line in Figure 9.

For parameter values close to the right branch of the boundary curve (solid white line in Figure 9), the system operates in an escape mechanism (Shpiro et al., 2007); the total input for the suppressed populations is closer to the threshold when adaptation is fully removed than is the input for the active population when it is fully adapted (see Figure 7A). For parameter values close to the left branch of the boundary curve, the system is set in a release mechanism (Shpiro et al., 2007); the total input for the active population when it is fully adapted is closer to the threshold than is the input for the suppressed populations when adaptation is fully removed. For values in the middle, both the active and the suppressed populations are equally far from the threshold and both mechanisms are involved in transitions.

The closer the parameters are to the boundary curve, the shorter the dominance durations will be. Indeed, for parameter values that are close to the boundary curve, the total input when the population is fully adapted or adaptation is fully removed is close to the threshold, and therefore the dominance durations are short. Moreover, when noise is added to the system, the durations are shortened. These two observations can explain the changes in the blue region (that satisfies the constraint on mean dominance durations) as noise varies (see Figure 9A). Take a fixed value for adaptation, for instance γ = 0.25; then for small values of noise, the relevant region for dominance durations lies close to the boundary curve. As noise is increased, durations are shortened and the relevant region for dominance durations is now found in an area that is farther from the boundary curve. As noise is increased still more, the relevant region shrinks and moves farther away from the boundary curve, and eventually it becomes impossible to find durations of that length.

The area of the region that matches the standard deviation of the dominance durations (red region in Figure 9B) increases with noise strength (left to right column). Noise in the system is needed to generate transitions and have large variability. Indeed, the black region corresponds to nonrelevant regions as well as regions where fewer than 100 percepts were generated (less than 0.01 switches/s).

From Figure 9C, it is clear that when adaptation is not present in the system or is very small, it is not possible to match the switch-back probabilities observed in the experiments (0.13–0.42). Indeed, the yellow region starts at positive values of adaptation strength, and the minimum value of adaptation strength in the yellow region increases with noise (Figure 9C moving from left to right). Adaptation competes with noise in trying to preserve a sort of “memory.”

Figure 9D shows the overlapping among the three regions. For weak and strong values of noise, the three regions do not overlap. In the case of weak noise, the parameters need to be at a certain distance from the boundary curve (white line) to achieve the desired variability, but then the durations become too long and the effect of adaptation on switch-back probabilities is erased. In the case of strong noise, strong adaptation is needed to compensate for the effect of noise in erasing memory, causing the shortening of durations. So a moderate amount of noise and adaptation provides the best fit to experimental data.

Different roles for spike-frequency adaptation and synaptic depression

For the simulations reported here, we have used a subtractive mechanism for slow adaptation. Other possible mechanisms for negative feedback include synaptic depression—a divisive mechanism acting directly on the inhibitory input (Equations 5 and 6). When we implemented synaptic depression in our model, we could find a region of parameters that matched the experimental data for the mean and standard deviation of dominance durations. However, we were unable to reproduce the dependence of switch-back probability on durations described in Figure 5. Figure 10 shows the probability of switching back for both slow negative feedback mechanisms (spike-frequency adaptation and synaptic depression) and for different values of the input and the strength of slow negative feedback. We can observe that for synaptic depression, the probability remains close to 0.5 (chance level), whereas for adaptation it ranges from 0 to 0.5. The model with synaptic depression therefore cannot account for the observed values of switch-back probabilities.
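The structural difference between the two feedback mechanisms can be illustrated with a toy snapshot. This is a hedged sketch, not a transcription of Equations 5 and 6: the function names and the depression variables `depress[j]` (values between 0 and 1 scaling the inhibition sent by population j) are assumptions introduced here.

```python
def drive_with_adaptation(i, rates, adapt, inputs, beta, gamma):
    """Subtractive feedback: population i's own adaptation lowers its drive."""
    inhibition = sum(beta * rates[j] for j in range(len(rates)) if j != i)
    return inputs[i] - inhibition - gamma * adapt[i]


def drive_with_depression(i, rates, depress, inputs, beta):
    """Divisive feedback: inhibition sent by population j is scaled by depress[j]."""
    inhibition = sum(beta * depress[j] * rates[j]
                     for j in range(len(rates)) if j != i)
    return inputs[i] - inhibition


# Snapshot: coherent (index 0) is active; T1 (index 1) was dominant just
# before and is still partly adapted or depressed; T2 (index 2) is rested.
rates = [1.0, 0.0, 0.0]
inputs = [0.95, 0.95, 0.95]
adapt = [0.5, 0.4, 0.0]
depress = [0.6, 0.8, 1.0]

d_t1 = drive_with_adaptation(1, rates, adapt, inputs, beta=1.0, gamma=0.15)
d_t2 = drive_with_adaptation(2, rates, adapt, inputs, beta=1.0, gamma=0.15)
e_t1 = drive_with_depression(1, rates, depress, inputs, beta=1.0)
e_t2 = drive_with_depression(2, rates, depress, inputs, beta=1.0)
```

In this snapshot the subtractive form gives the rested population T2 a higher drive than the recently dominant T1, whereas the divisive form gives both suppressed populations the same drive because only the active population sends inhibition. This is one possible reading of why switch-back probabilities stay near chance with synaptic depression; the snapshot ignores the full dynamics.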

Unbalanced inhibition and input strength

In Balance between adaptation, noise, and input strength, we worked with a symmetric model (the inhibition strength and external input were the same for the three populations). The next step is to explore different ways to create an asymmetry in the parameter settings that reproduces the dominance durations and percept probabilities observed in experiments (see Figure 3). We explore asymmetry in both the input strength IC ≠ ITL = ITR and the inhibition strength β1 ≠ β2 in Equation 1.

We chose a point in the relevant region of parameters (black dot in the gray area of Figure 9D) for noise strength σ = 0.08, corresponding to adaptation γ = 0.15 and input strength I = 0.95. First, we keep the input fixed at I = 0.95 and vary the inhibition strength between the two transparent percepts β2 while leaving β1 = 1 fixed (Figure 11A, B). Then we keep β1 = β2 = 1 fixed and vary the input strength to the coherent population while keeping ITL = ITR = 0.95 fixed (Figure 11C, D).

Asymmetric input strength leads to larger dominance durations and number of occurrences for the percept with higher input. Asymmetric inhibition strength causes asymmetry in the dominance durations and number of occurrences in opposite directions (Figure 11). Hence, for α = 120 we assume the same input strength for the three percepts (I = IC = ITL = ITR = 0.95) and we introduce unbalanced inhibition between coherent and transparent percepts (β1 = 1 and β2 = 1.05). This imbalance yields higher percept probability for coherence along with longer dominance durations for the transparent percepts (Figure 3A, B).

For smaller values of α, we keep the inhibition unbalanced but, in addition, we assume unbalanced input: As α decreases, the input strength to the coherent population IC increases while the input strength to the transparent populations ITL and ITR decreases, yielding longer dominance durations and higher percept probabilities for coherence (Figure 3A, B).

Appendix 2

We review here some experiments presented at two conferences by J-MH (Society for Neuroscience 2009, Chicago, IL, USA: Hupé, J. M., & Juillard, V. A. Buildup of visual plaid segmentation and auditory streaming may be explained by the perception of these ambiguous stimuli being tristable rather than bistable. Abstract No. 652.16; Vision Sciences Society 2010, Naples, FL, USA: Hupé, J. M. Dynamics of menage a trois in moving plaid ambiguous perception. Journal of Vision, 10: 1217 [abstract]). Part of this data set was also included in the study by Hupé and Pressnitzer (2012), where the exact parameters and protocols were described; overall, they were very similar to those of the present study. Briefly, 25 subjects reported continuously the three possible percepts of red/green plaids displayed for 1 min (for transparent motion they had to indicate whether the red or the green grating was in front). The pattern could move in eight possible directions (when perceived as coherent), four cardinal (right, left, up, down) and four oblique (45° from a cardinal axis), as in work by Hupé and Rubin (2004). The angle α was either 105°, 115°, or 125°. There were only two repetitions of each stimulus.

Results

Switches between two transparent states were interleaved with a coherent percept more often than expected, given the percept probabilities, similar to what we observed in Figure 3. Importantly, this was still the case when the analysis was restricted to oblique directions. After a transparent percept (N = 2,678), the probability of the next percept's being coherent was 0.73, while the coherent percept was experienced on average 45% of the time (compare with Table 1 and Figure 3C; these values are close to those obtained for α = 100). When the analysis was performed using only trials with oblique directions, the probability of the next percept's being coherent after a transparent percept (N = 1,202) was 0.65, while the coherent percept was experienced on average 38.5% of the time (less than for cardinal directions—50%—due to the “oblique plaid effect”; see Hupé & Rubin, 2004). These values are close to those obtained for α = 120 (compare with Table 1 and Figure 3C).
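The two quantities compared in this paragraph can be computed from a reported percept sequence as sketched below. This is an illustrative helper, not the authors' analysis code; the function name and the representation of a sequence as a list of (label, duration) pairs with labels 'C', 'TL', and 'TR' are assumptions.

```python
def coherent_statistics(sequence):
    """Summarize how often coherence follows a transparent percept.

    sequence: list of (label, duration) with labels 'C', 'TL', 'TR'.
    Returns (P(next is C | current is transparent), fraction of time in C).
    """
    transparent = ("TL", "TR")
    n_from_t = 0  # transitions that start from a transparent percept
    n_to_c = 0    # ...and end in the coherent percept
    for (cur, _), (nxt, _) in zip(sequence, sequence[1:]):
        if cur in transparent:
            n_from_t += 1
            if nxt == "C":
                n_to_c += 1
    total_time = sum(d for _, d in sequence)
    time_c = sum(d for label, d in sequence if label == "C")
    p_to_c = n_to_c / n_from_t if n_from_t else float("nan")
    frac_c = time_c / total_time if total_time else float("nan")
    return p_to_c, frac_c


# Toy sequence (durations in seconds), for illustration only.
toy = [("TL", 2.0), ("C", 3.0), ("TR", 1.0), ("TL", 2.0), ("C", 2.0)]
p_to_c, frac_c = coherent_statistics(toy)
```

In the toy sequence, two of the three transitions leaving a transparent percept go to coherence, while coherence occupies half of the total time.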

Switch-back probability in T1CT2 sequences was strongly related to the coherent-percept duration (Figure 12A, left), as well as to the duration of T1 (Figure 12B, left). The results were similar when the analysis was restricted to oblique directions (Figure 12A, B, right).
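The triplet analysis can be sketched as follows, assuming the same (label, duration) sequence representation. The function name is hypothetical, and two details are assumptions made here: the remainder after equal division is placed in the last bin (as the bin sizes in the Figure 5 caption suggest), and the bin midpoint is taken halfway between the shortest and longest duration in the bin.

```python
def switch_back_by_duration(sequence, n_bins=10, key="C"):
    """Bin T1-C-T2 triplets by a duration and return switch-back proportions.

    sequence: list of (label, duration), labels in {'C', 'TL', 'TR'}.
    key: 'C' bins by the intermediate coherent duration, 'T1' by the
         duration of the first transparent percept.
    Returns a list of (bin_midpoint, proportion_switch_back, n_in_bin).
    """
    transparent = ("TL", "TR")
    triplets = []
    for (l1, d1), (l2, d2), (l3, _) in zip(sequence, sequence[1:], sequence[2:]):
        if l1 in transparent and l2 == "C" and l3 in transparent:
            duration = d2 if key == "C" else d1
            triplets.append((duration, l1 == l3))  # True means switch back
    triplets.sort(key=lambda t: t[0])
    n = len(triplets)
    size = n // n_bins
    result = []
    for b in range(n_bins):
        lo = b * size
        hi = (b + 1) * size if b < n_bins - 1 else n  # remainder in last bin
        chunk = triplets[lo:hi]
        if not chunk:
            continue
        midpoint = (chunk[0][0] + chunk[-1][0]) / 2.0
        proportion = sum(1 for _, back in chunk if back) / len(chunk)
        result.append((midpoint, proportion, len(chunk)))
    return result


# Toy sequence (durations in seconds), for illustration only.
toy = [("TL", 1.0), ("C", 2.0), ("TL", 1.0), ("C", 5.0), ("TR", 1.0)]
by_c = switch_back_by_duration(toy, n_bins=2, key="C")
```

A linear regression or sigmoid fit of the returned proportions against the bin midpoints would then give curves of the kind shown in Figures 5 and 12.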

Proportion of triplets of the form T1CT2 that are a switch back (i.e., T1 = T2) as a function of the duration of the intermediate coherent percept C (A) and the first transparent percept T1 (B) for all directions (left) and oblique directions (right). Here, n is the number of durations included in each bin, R² is the coefficient of determination, and p is the p-value.

Figure 12

Notice that the main results and conclusions presented in this article are robust to changes in the direction of moving plaids; compare Figure 5 with Figure 12. We interpret these results with care because the durations of the coherent percept C (though not T1) were also strongly correlated with the steady-state probabilities of each percept (here estimated by the proportion of time the coherent percept was reported in each trial, as in Table 1). Therefore, part of the relationship in Figure 12A between coherent durations and switch-back probabilities could be due to a change of input balance rather than duration per se. Although this is just speculation, such a causal relationship likely plays a role, because switch-back probability reached values well above 0.5 (the chance level) for long coherent durations (see Figure 12A).

(A) Visual plaids consist of two superimposed gratings whose normal vectors differ by an angle α (VP). Representation of different interpretations for visual plaids: coherent motion (C) and transparent motion (T). Transparent motion is ambiguous with respect to depth ordering and admits two different interpretations: with the grating moving to the left perceived on top (TL) and with the grating moving to the right perceived on top (TR). (B) Tristability refers to coherent (C), transparent left (TL), and transparent right (TR) percepts, which we identify with the colors red, blue, and green, respectively.

Figure 1

Network architecture for the neuronal competition model with direct mutual inhibition. Each population activity is correlated to a different percept: coherent (C), transparent right (TR), or transparent left (TL). Each population receives an excitatory deterministic input of strength Ii and independent noise ni. Spike-frequency adaptation is present in each population. Lines with circles represent inhibitory connections of strength βi between the three competing populations.

Figure 2

Statistics of switching: dependence on parameter α for psychophysics experiments (top) and for model simulations (bottom). (A) Mean of the natural logarithm of the dominance durations expressed in milliseconds (seconds in parentheses; for the experimental data, N = 6,516 durations; some epochs were removed, see Methods). (B) Percept probabilities in each trial (proportion of the number of occurrences; for the experimental data, N = 6,817 percepts). (C) Probability to switch to the coherent percept after a transparent percept (for the experimental data, N = 3,752 sequences). Bars represent the means, and error bars are plus and minus one standard error estimated by ANOVA models including the variable subject as a random factor (here and in all the subsequent figures). Parameter values for the model are given in Methods. We used the same parameter values throughout the article, unless stated otherwise.

Figure 3

Figure 5

The probability that two transparent percepts separated by an intervening coherent percept have the same depth pattern increases with the duration of the coherent percept and decreases with the duration of the preceding transparent percept. (A) Given a triplet of the form T1CT2 in the perceptual sequence for α = 100 (T1 and T2 each stand for either TL or TR), we show the probability of a switch back, that is, T1 = T2, as a function of the duration of the coherent percept (B) and the first transparent percept (C). Triplets are ordered according to the dominance durations of the intermediate coherent percept (B) or the first transparent percept (C) in the triplet and then grouped into 10 bins of equal size (for experiments, n = 100, except for the last bin, n = 107). The coordinates of each dot are the middle point of each bin and the proportion of triplets in that bin that are a switch back. Notice that the probability of a switch back is on average below the chance level of 0.5. The red line is the linear regression and the blue line is the sigmoid fit; r is the correlation coefficient, and p-values are below or about 0.01 for both experiments and model. Fits for experimental data were obtained by excluding the first data point (see text).
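The binning procedure described in this caption can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' code: the function name and the triplet representation are hypothetical, the "middle point" of a bin is assumed to be halfway between its shortest and longest duration, and the input is assumed to contain at least `n_bins` triplets.

```python
def switch_back_by_bin(triplets, n_bins=10):
    """Group T1-C-T2 triplets into equal-size bins by duration.

    triplets: list of (t1_label, duration, t2_label), where duration is
    that of the intermediate coherent percept (or of T1, for panel C).
    Returns a list of (bin midpoint, proportion of switch backs, bin size).
    """
    ordered = sorted(triplets, key=lambda t: t[1])
    size = len(ordered) // n_bins
    points = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs the remainder (n = 107 in the caption).
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[start:end]
        durations = [d for _, d, _ in chunk]
        # Assumption: midpoint between the bin's extreme durations.
        midpoint = (durations[0] + durations[-1]) / 2.0
        backs = sum(1 for t1, _, t2 in chunk if t1 == t2)
        points.append((midpoint, backs / len(chunk), len(chunk)))
    return points

# Synthetic example with 1,007 triplets, one in four being a switch back.
example = [("TL", float(i), "TL" if i % 4 == 0 else "TR")
           for i in range(1007)]
points = switch_back_by_bin(example)
```

With 1,007 triplets, integer division gives nine bins of 100 and a final bin of 107, which matches the bin sizes quoted for the experimental data.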

Figure 6

Durations of T1 and C for T1CT2 sequences for experiments (A) and model (B). For the experimental data, the plotted durations were normalized, independently, by the median durations for each subject. Correlations are absent or small. Here, N is the number of dots, R² is the coefficient of determination, and p is the p-value.

Figure 7

Dynamical properties of the model. (A) Schematic representation of the mechanism underlying transitions between suppressed and dominant states. The height of the bars indicates the total input to the input–output function for each population at three different times indicated in (B). (B) Time courses of the total input minus the noise term ni (Ba), activity (Bb), and adaptation (Bc) of the three populations: coherent (red), transparent right (green), and transparent left (blue). The horizontal line in (A) and (Ba) corresponds to the threshold (θ = 0.2) of the input–output function [inset, (A)]. When the bar is above the threshold, the population is active (activity near 1), and when it is below the threshold, the population is suppressed (activity near 0); see the corresponding times in (Ba) and (Bb). The arrows indicate the effect of adaptation on the total input. A downward arrow indicates that adaptation is increasing for the active population, reducing the total input for that population [see the corresponding times in (Ba) and (Bc)]. An upward arrow indicates that adaptation is decreasing for the suppressed population, increasing the total input for that population [see the corresponding times in (Ba) and (Bc)]. Notice that adaptation drives the input level closer to the transition threshold, but the input remains well above or below it.
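The threshold mechanism in panel (A) can be illustrated numerically. The sketch below is a hedged Python illustration, not the model's Equation 1: only the threshold θ = 0.2 comes from the caption, while the sigmoid form, the slope `K`, the weights `beta` and `gamma`, and the function names are hypothetical choices made for the example.

```python
import math

THETA = 0.2  # threshold of the input-output function (from the caption)
K = 0.1      # hypothetical slope parameter of the sigmoid

def gain(x, theta=THETA, k=K):
    """Assumed sigmoidal input-output function; equals 0.5 at the threshold."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / k))

def total_input(drive, inhibition, adaptation, beta=1.0, gamma=0.25):
    """Assumed noise-free total input: drive minus inhibition and adaptation.

    inhibition is the summed activity of the competing populations and
    adaptation is the population's own adaptation variable (both in [0, 1]).
    """
    return drive - beta * inhibition - gamma * adaptation

# Dominant population: adaptation builds up and lowers its input.
dominant_fresh = total_input(drive=1.0, inhibition=0.0, adaptation=0.0)
dominant_adapted = total_input(drive=1.0, inhibition=0.0, adaptation=1.0)

# Suppressed population: adaptation decays and raises its input.
suppressed_adapted = total_input(drive=1.0, inhibition=1.0, adaptation=1.0)
suppressed_recovered = total_input(drive=1.0, inhibition=1.0, adaptation=0.0)
```

With these illustrative values the dominant input falls from 1.0 to 0.75 and the suppressed input rises from −0.25 to 0.0, so adaptation alone never carries either input across the threshold of 0.2. This is consistent with the caption's remark that the input stays well above or below threshold, which leaves the noise term to trigger the switch.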

Figure 8

Transparent percepts are significantly shorter when they precede a coherent percept in the hierarchical model. (A) Model architecture for a hierarchical model. The competition is split into two competitions: coherent versus transparent and, within transparent, TL versus TR. (B) Histogram of dominance durations of transparent percepts TL and TR showing an abundance of short durations and a reduction of the most frequent ones. The distributions are no longer gamma or log-normal. Compare with histograms in Figure 4. (C) Normalized dominance durations (normalized by the median) for transparent percepts observed before a coherent percept (BC) and before the opposite transparent percept (BT) for the hierarchical model (HM). In the hierarchical model, the duration of a transparent percept preceding the coherent one is forced shorter when the latter becomes active. This shortening is not observed in the nonhierarchical model or in the experiments, which in fact show the opposite trend (results not shown).

Figure 9

Parameter regions for which the model's behavior satisfies various statistical constraints. The mean and standard deviations of dominance durations and the switch-back probabilities are in the relevant range in the presence of noise of strength σ = 0.02 (left), σ = 0.08 (center), and σ = 0.12 (right) for a symmetric model: same inhibition β1 = β2 and same input strength I = IC = ITL = ITR in Equation 1 for the three populations. Regions: blue in (A) where log-mean dominance durations are in the relevant range of 7.9 (2.6 s) to 8.9 (7.3 s); red in (B) where standard deviations are in the relevant range of 0.6 to 0.88; yellow in (C) where switch-back probabilities are in the relevant range of 0.13 to 0.42; and light gray in (D) where the three previous regions overlap. The white line indicates the border between the region with three attractors (each corresponding to a solution where one population is active and the other two are suppressed) and other dynamic regimes. Black regions correspond to parameter values that fail to meet the criteria described as well as those where fewer than 100 percept durations were generated (less than 0.01 switches/s).
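The region coloring can be read as a set of range checks applied to the summary statistics of one simulation. The following is a minimal Python sketch under stated assumptions: the function and constant names are hypothetical, the range bounds are treated as inclusive, and the standard deviation is assumed to be that of the log durations, which the caption does not state explicitly.

```python
LOG_MEAN_RANGE = (7.9, 8.9)       # mean of ln(duration in ms), panel A
LOG_SD_RANGE = (0.6, 0.88)        # standard deviation, panel B
SWITCH_BACK_RANGE = (0.13, 0.42)  # switch-back probability, panel C
MIN_DURATIONS = 100               # below this, the point is drawn in black

def in_range(value, bounds):
    """Inclusive range check (inclusivity is an assumption)."""
    return bounds[0] <= value <= bounds[1]

def classify(log_mean, log_sd, p_switch_back, n_durations):
    """Return which panels would color this parameter point.

    Returns None when too few percept durations were generated.
    """
    if n_durations < MIN_DURATIONS:
        return None
    a = in_range(log_mean, LOG_MEAN_RANGE)
    b = in_range(log_sd, LOG_SD_RANGE)
    c = in_range(p_switch_back, SWITCH_BACK_RANGE)
    # Panel D is the overlap of the three previous regions.
    return {"A": a, "B": b, "C": c, "D": a and b and c}

inside = classify(8.3, 0.7, 0.30, 500)
chance_level = classify(8.3, 0.7, 0.50, 500)
too_few = classify(8.3, 0.7, 0.30, 50)
```

In this example, a point with a switch-back probability of 0.5 satisfies the duration constraints in (A) and (B) but is excluded from (C), and therefore from the overlap in (D).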

Figure 10

Switch-back probability maps as the input and the strength of the slow negative feedback process vary [adaptation (A) and synaptic depression (B)] in the presence of noise of strength σ = 0.08. Notice that for synaptic depression, the range of switch-back probabilities is restricted to values around 0.5, whereas for adaptation, the probabilities span a wider range, decreasing down to 0. The white line indicates the border between the region with three attractors (each corresponding to a solution where one population is active and the other two are suppressed) and other dynamic regimes. Small black regions in the upper central part correspond to those where fewer than 100 percept durations were generated (less than 0.01 switches/s).

Figure 11

Mean dominance durations (left column) and percept probabilities (right column) for the coherent (red), transparent left (blue), and transparent right (green) populations. (A–B) Only the inhibition strength β2 between the two transparent percepts is varied; β1 is fixed at 1 and IC = ITL = ITR = 1. (C–D) Only the input strength IC to the coherent population is varied; ITL = ITR is fixed at 0.95 and β1 = β2 = 1.

Figure 12

Proportion of triplets of the form T1CT2 that are a switch back (i.e., T1 = T2) as a function of the duration of the intermediate coherent percept C (A) and the first transparent percept T1 (B) for all directions (left) and oblique directions (right). Here, n is the number of durations included in each bin, R² is the coefficient of determination, and p is the p-value.

Table 1

Average percentage of the time each percept was reported. The percentages were computed in each 3-min trial (N = number of trials), starting from the first report of a transparent percept as in the study by Hupé and Rubin (2003). Numbers represent the average (and the range of values obtained across subjects) for the nine subjects (no trial or subject was excluded; “missing” trials were the few trials interrupted by the participants). The average of “no response” was −0.5% (range: −2.3% to 4.2%); negative values correspond mostly to button-press overlap.
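A negative “no response” percentage arises naturally when button presses overlap, as the following sketch shows. This is an illustrative Python example, not the authors' procedure: the interval representation, the function name, and the data are hypothetical, and “no response” is assumed to be computed as 100% minus the summed percentages of the three percepts.

```python
def percent_time(intervals, trial_end):
    """Percentage of time each percept was reported in one trial.

    intervals: list of (label, press_time, release_time) in seconds.
    Counting starts at the first report of a transparent percept.
    """
    start = min(t0 for label, t0, _ in intervals if label in ("TL", "TR"))
    window = trial_end - start
    totals = {}
    for label, t0, t1 in intervals:
        # Clip each report to the analysis window.
        t0c, t1c = max(t0, start), min(t1, trial_end)
        if t1c > t0c:
            totals[label] = totals.get(label, 0.0) + (t1c - t0c)
    pct = {label: 100.0 * t / window for label, t in totals.items()}
    # Assumption: "no response" is what remains after the three percepts.
    pct["no response"] = 100.0 - sum(pct.values())
    return pct

# Hypothetical 3-min trial; the TL and C reports overlap by 1 s.
trial = [("C", 0.0, 10.0), ("TL", 10.0, 60.0),
         ("C", 59.0, 120.0), ("TR", 120.0, 180.0)]
pct = percent_time(trial, trial_end=180.0)
```

Here the 1 s of overlap is counted for both TL and C, so the three percepts sum to slightly more than 100% and the remainder is negative.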