We present a Bayesian ideal observer model that estimates observer translation and rotation from optic flow and an extra-retinal eye movement signal. The model assumes that the environment is rigid, that velocity measurements are noisy, and that the eye movement signal provides a probabilistic cue for rotation. The model can simulate human heading perception across a range of conditions, including translation with simulated vs. actual eye rotations, environments with various depth structures, and the presence of independently moving objects.

Introduction

As the observer moves through the world, the relative motion of objects in the world creates an informative pattern of visual motion on the retina, referred to as optic flow. Optic flow is known to be a strong visual cue to self-motion (see Warren, 2004 for a review) and has been demonstrated to contribute to control of locomotion and steering (Li & Warren, 2002; Warren, Kay, Zosh, Duchon, & Sahuc, 2001; Wilkie & Wann, 2005).

Perception of self-motion from optic flow is influenced by nonvisual information about eye and head movements. Rotation of an observer's viewpoint while translating, such as due to pursuit eye movements, adds a rotational component to optic flow that can complicate estimation of self-motion. This has been referred to as the rotation problem, described in the next section. While optic flow could be sufficient to resolve this problem (Longuet-Higgins & Prazdny, 1980), there is psychophysical evidence that nonvisual cues to eye and head rotation also contribute to human perception of self-motion during gaze rotation (Banks, Ehrlich, Backus, & Crowell, 1996; Crowell, Banks, Shenoy, & Andersen, 1998; Royden, Banks, & Crowell, 1992).

The model is conceptually simple and has few parameters, yet it is able to account for a range of findings from studies of human perception of self-motion. Because it is an ideal observer model, rather than a mechanistic model, its behavior reflects the properties of the sensory input and the model's computational assumptions and goals. There are various ways that a Bayesian computation might be implemented neurally (e.g., Ma, Beck, Latham, & Pouget, 2006). Our simulation results would be representative of any model that approximates Bayesian estimation and uses similar assumptions and goals, regardless of how the estimation was implemented. A further advantage of our approach is that the model can be easily extended to incorporate additional sensory information, such as vestibular input or static depth cues, as well as constraints and inter-relations between cues. Our model can thus provide a useful tool for generating and evaluating novel predictions regarding perception of self-motion from multiple sources of information.

Problem of decomposing observer translation and rotation

The instantaneous optic flow field on the observer's retina can be described as a combination of radial motion due to the observer's linear translation and laminar motion due to rotation of viewing direction (Koenderink & Van Doorn, 1987; Longuet-Higgins & Prazdny, 1980). The translational component is a radial pattern centered at the observer's instantaneous heading direction. The rotational component can be due to pursuit eye and head movements, or due to rotations of the body as when moving along a curved path. The instantaneous flow field does not distinguish between these sources of rotation (Royden, 1994; Warren, Mestre, Blackwell, & Morris, 1991). Decomposing an optic flow field into translational and rotational components would therefore allow the observer's direction of self-motion to be determined, as well as the total amount of viewpoint rotation due to eye, head, and body rotations.

If depth variations are not present, however, the interpretation of optic flow becomes ambiguous (Longuet-Higgins, 1984). Figure 1 shows an example of two situations that produce identical instantaneous optic flow. In one case, the observer is translating in a slightly leftward direction toward a frontal plane while making a pursuit eye movement to fixate the central point (left panel). The same radial pattern of optic flow would be produced if an observer translates straight ahead toward a slightly slanted surface with no eye movement (middle panel). These are but two possibilities from a family of rigid interpretations of this flow pattern. If the observer's field of view is limited, translation without rotation toward a frontal plane also produces very similar optic flow. Thus, in the absence of depth variations, observer translation and rotation cannot be unambiguously determined from optic flow alone.
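The equivalence between the two situations can be checked numerically. The following is a minimal sketch, not the simulation code used for the figures. It assumes a planar projection with unit focal length, unit observer speed, rotation about the vertical axis only, and the standard instantaneous flow equations for a rigid scene; the function name `planar_flow` and the sign conventions are illustrative choices.

```python
import numpy as np

def planar_flow(x, y, heading, rotation, inv_depth):
    """Image velocity (u, v) for horizontal heading (rad) and yaw rate (rad/s).

    Assumes unit focal length and unit translation speed, so inv_depth is
    the inverse distance in units of (seconds of travel)^-1.
    """
    tx, tz = np.sin(heading), np.cos(heading)
    u = inv_depth * (x * tz - tx) - (1.0 + x**2) * rotation
    v = inv_depth * (y * tz) - x * y * rotation
    return u, v

# Image positions covering roughly a 40 deg field of view.
grid = np.tan(np.deg2rad(np.linspace(-20.0, 20.0, 11)))
x, y = np.meshgrid(grid, grid)

# Situation 1: heading 10 deg off center toward a frontal plane, with a
# 5 deg/s rotation that nulls the flow at the center (fixation).
heading = np.deg2rad(-10.0)
z0 = np.sin(np.deg2rad(10.0)) / np.deg2rad(5.0)   # plane distance, about 2 s of travel
rotation = -np.sin(heading) / z0                  # yaw rate that keeps the center fixated
u_frontal, v_frontal = planar_flow(x, y, heading, rotation, 1.0 / z0)

# Situation 2: heading straight ahead, no rotation, toward a slanted plane
# Z = Z0' + p X, which gives 1/Z = (1 - p x) / Z0' in image coordinates.
p = -np.tan(heading)
inv_depth_slanted = (1.0 - p * x) * np.cos(heading) / z0
u_slanted, v_slanted = planar_flow(x, y, 0.0, 0.0, inv_depth_slanted)

print(np.allclose(u_frontal, u_slanted), np.allclose(v_frontal, v_slanted))
```

Under these assumptions the two flow fields agree at every sampled point, so no amount of measurement precision could separate the two interpretations from the instantaneous flow alone.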

Figure 1

Two situations that would produce the same retinal optic flow. In the situation shown in the left panel, the observer is translating toward a frontal plane with heading 10° to the left while making a 5°/s pursuit eye movement to keep the center point fixated. The resulting optic flow field (right panel) has a focus of expansion at the fixated point in the center. The same instantaneous optic flow could be produced by translation straight ahead toward a slanted plane with no eye rotation (center panel). Thus, observer translation and rotation are ambiguous from this optic flow field.

The rotational component within retinal optic flow is often due to pursuit eye movements, so nonvisual information about eye movements could potentially resolve the ambiguity in degenerate cases. Extra-retinal eye movement signals have been shown to contribute to perceived self-motion (e.g., Royden et al., 1992), and there is evidence that proprioceptive input from active head movements can provide a similar benefit (Crowell et al., 1998).

On the other hand, the rotational component of a retinal flow field is often not solely due to pursuit eye movements. Rotation of the body and rotation of the head relative to the body also contribute to the rotational component of retinal flow. For example, driving a car along a circular path produces a rotational component due to body rotation without a corresponding eye movement signal. Thus, it might not be advantageous to rely solely on eye movement signals to determine the amount of rotation in optic flow. In some conditions, observers can accurately perceive translation direction in the presence of rotation even when extra-retinal information provides a conflicting cue (e.g., Li, Chen, & Peng, 2009; Li & Warren, 2000, 2004).

In summary, the rotational component in optic flow could potentially be identified from either visual or nonvisual information. Human performance indicates that the contribution of nonvisual information depends on the ambiguity of optic flow information. In situations where translation and rotation are not well specified by optic flow alone, extra-retinal eye movement information has a large effect on the perceptual interpretation. When optic flow is richer and unambiguously specifies observer translation and rotation, extra-retinal information has less effect.

This qualitative finding fits well with a Bayesian approach to cue integration. Information from optic flow is represented as likelihoods of different combinations of translation and rotation, rather than as a single interpretation. The ambiguity of optic flow information is represented by the spread of likelihood functions. When likelihood functions from optic flow and extra-retinal cues are combined, the influence of extra-retinal cues naturally depends on the ambiguity of optic flow. This is an emergent property of a Bayesian estimation model. Our simulations demonstrate that such a model can account for not only human perception in the simulated rotation conditions but also a number of other previously observed phenomena in heading perception.
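The dependence on ambiguity can be illustrated with the simplest possible case: two Gaussian likelihoods over rotation rate, one from vision and one from the extra-retinal signal. This is only a one-dimensional sketch of the principle, not the model itself. The visual spreads (5 and 0.5°/s) are hypothetical values chosen for illustration; the function name is also illustrative.

```python
import math

def combine_gaussian_cues(mu_a, sigma_a, mu_b, sigma_b):
    """Peak and spread of the product of two Gaussian likelihoods."""
    w_a, w_b = 1.0 / sigma_a**2, 1.0 / sigma_b**2    # precisions
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)     # precision-weighted mean
    sigma = math.sqrt(1.0 / (w_a + w_b))
    return mu, sigma

# Vision indicates a rotation of -5 deg/s; the extra-retinal cue indicates 0 deg/s.
ambiguous, _ = combine_gaussian_cues(-5.0, 5.0, 0.0, 1.5)   # broad visual likelihood
reliable, _ = combine_gaussian_cues(-5.0, 0.5, 0.0, 1.5)    # narrow visual likelihood
print(ambiguous, reliable)
```

With a broad visual likelihood the combined estimate lies close to the extra-retinal value, and with a narrow visual likelihood it stays close to the visual value, without any explicit rule for switching between cues.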

Model

Bayesian formulation

In this section, we describe the Bayesian formulation of our problem. The goal is to estimate an observer's instantaneous self-motion and the depth structure of the scene given sensory input. In our simulations, we consider only horizontal variations in heading and eye rotation, but the model can be applied to the more general case. Visual input consists of the retinal motion of points in the scene, represented as a set of motion vectors at different locations on the retina. Nonvisual information about eye or head movements is also known to contribute to human perception of optic flow (e.g., Crowell et al., 1998; Royden et al., 1992), so our model also assumes an extra-retinal signal indicating gaze rotation due to active visual pursuit.

The instantaneous motion of an observer through space can be described as a combination of linear translation and rotation of viewing direction. The visual velocity of a point produced by observer motion is a function of these parameters and the distance of the point from the observer. Thus, our model estimates a depth map representing egocentric distances of the points along with observer self-motion. Exact distances to points in the world cannot be resolved from the optic flow alone as the magnitude of the flow is determined by both observer speed and distance, so the depth map represents distances up to an overall speed scaling.
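For reference, one common way to write this relationship is the planar-projection form of the instantaneous flow equations (cf. Longuet-Higgins & Prazdny, 1980). The expressions below assume unit focal length, image coordinates (x_k, y_k), and a particular sign convention; they are given as an illustration and are not necessarily the parameterization used in our simulations.

```latex
\begin{aligned}
u_k &= \frac{x_k T_z - T_x}{z_k} + x_k y_k R_x - (1 + x_k^2)\, R_y + y_k R_z, \\
v_k &= \frac{y_k T_z - T_y}{z_k} + (1 + y_k^2)\, R_x - x_k y_k R_y - x_k R_z .
\end{aligned}
```

Translation and depth enter only through the ratios T/z_k, so replacing (T, z_k) with (sT, s z_k) for any s > 0 leaves every flow vector unchanged. This is the speed scaling ambiguity noted above.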

Given the sensory input from optic flow {v1, v2, …} and efferent copy E, our model estimates the most likely combination of observer translation T, rotation R, and the distances of points in the scene {z1, z2, …}. In a Bayesian formulation, this corresponds to maximizing the posterior probability, P(T, R, {zk}∣{vk}, E). Applying Bayes' rule, this posterior probability is proportional to the likelihood of the visual and extra-retinal sensory information given an observer motion and scene depth structure, multiplied by the prior probability of that motion and depth structure:

$$P(T, R, \{z_k\} \mid \{v_k\}, E) \propto P(\{v_k\}, E \mid T, R, \{z_k\}) \cdot P(T, R, \{z_k\}) \tag{1}$$

If all observer translations and rotations are assumed equally likely and independent of the depth of points in the world, this can be simplified to

$$P(T, R, \{z_k\} \mid \{v_k\}, E) \propto P(\{v_k\}, E \mid T, R, \{z_k\}) \cdot P(\{z_k\}) \tag{2}$$

We further assume that velocity vectors are mutually independent and that the efferent copy is independent of the visual input. The likelihood function can then be expressed as a product of separate likelihood functions from flow information and from extra-retinal information:

$$P(T, R, \{z_k\} \mid \{v_k\}, E) \propto \left( \prod_k P(v_k \mid T, R, z_k) \cdot P(z_k) \right) \cdot P(E \mid T, R, \{z_k\}) \tag{3}$$

The probability P(zk) could represent a prior function on possible point distances, or information from other depth cues. For our simulations, we assume a uniform prior on depth. We also assume that the likelihood function of the efferent signal does not depend on the depth of points. These and other assumptions are further discussed in a later section. The posterior to be maximized therefore simplifies to

$$P(T, R, \{z_k\} \mid \{v_k\}, E) \propto \left( \prod_k P(v_k \mid T, R, z_k) \right) \cdot P(E \mid T, R) \tag{4}$$

Figure 2 illustrates how the likelihood function from optic flow was computed. The likelihood of each T and R combination is a function of how well it fits the input optic flow field (Figure 2a). If velocity vectors are assumed to be independent, the global likelihood can be computed in a vector-by-vector manner (Figure 2b). Given a T and R, the expected velocity at a point is also a function of its depth, which is unknown. The likelihood of an input vector was computed for all possible depths, and the zk that maximized the likelihood was used as an estimate of the depth at that point (Figure 2c). The likelihood of an input velocity vector given T, R, and depth was computed using a generative model described in the next section.
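The vector-by-vector computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for our simulations. It assumes planar projection with unit focal length, unit observer speed, yaw rotation only, and isotropic Gaussian noise with a fixed, hypothetical standard deviation `sigma`. Under that simplified noise model, the depth that maximizes the likelihood of each vector has a closed-form least-squares solution, which stands in here for a search over all possible depths. All function names are illustrative.

```python
import numpy as np

def flow_components(x, y, heading, rotation):
    """Return fields a, b such that expected flow = inv_z * a + b."""
    tx, tz = np.sin(heading), np.cos(heading)
    a = np.stack([x * tz - tx, y * tz], axis=-1)               # translational direction
    b = np.stack([-(1.0 + x**2) * rotation, -x * y * rotation], axis=-1)  # rotational flow
    return a, b

def flow_log_likelihood(x, y, flow, heading, rotation, sigma=0.05):
    """Log-likelihood of a flow field for one (T, R), maximized over each depth."""
    a, b = flow_components(x, y, heading, rotation)
    aa = np.sum(a * a, axis=-1)
    num = np.sum(a * (flow - b), axis=-1)
    # Best-fitting inverse depth per vector (least squares along a).
    inv_z = np.where(aa > 0, num / np.maximum(aa, 1e-12), 0.0)
    inv_z = np.clip(inv_z, 0.0, None)        # points must lie in front of the observer
    residual = flow - (inv_z[:, None] * a + b)
    # Independent vectors: the log-likelihoods add (constants dropped).
    return -0.5 * np.sum(residual**2) / sigma**2

# Example: flow from straight-ahead translation toward a frontal plane (tau = 1.5 s).
g = np.tan(np.deg2rad(np.arange(-20.0, 21.0, 4.0)))
px, py = [c.ravel() for c in np.meshgrid(g, g)]
a0, b0 = flow_components(px, py, 0.0, 0.0)
observed = (1.0 / 1.5) * a0 + b0

h = np.deg2rad(10.0)
ll_true = flow_log_likelihood(px, py, observed, 0.0, 0.0)
ll_ridge = flow_log_likelihood(px, py, observed, h, -np.tan(h) / 1.5)  # FOE kept at center
ll_off = flow_log_likelihood(px, py, observed, h, 0.0)
print(ll_true, ll_ridge, ll_off)
```

In this sketch the veridical interpretation fits exactly, a heading offset paired with a compensating rotation fits nearly as well, and the same heading offset without rotation fits poorly. Evaluating the function over a grid of headings and rotations would trace out a likelihood surface of the kind shown in the figures.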

Figure 2

Illustration of the computation of likelihood functions from optic flow. (a) The goal is to compute the likelihood for all possible combinations of observer translation T and rotation R given a set of optic flow vectors, {v1, v2, …}. The scaled depths of flow vectors, {z1, z2, …}, relative to observer speed, are also free parameters. (b) Likelihood of a (T, R) combination. Flow vectors are assumed to be independent, so the likelihood of an optic flow field {v1, v2, …} given (T, R) is the product of the likelihood of each individual flow vector vk. The figure shows examples of observed and expected flow vectors for a given T and R. In our simulations, we sampled points on a hex grid with 4° spacing. (c) Likelihood of an individual flow vector vk. The flow vector expected from translation and rotation in a rigid environment can be computed given T, R, and the scaled depth of the point zk. With further assumptions about sensory measurement noise, the likelihood of an observed vector vk can be computed from how much it deviates from an expected vector. The estimate of scaled depth zk was taken to be the value that maximized the likelihood of vk.

To estimate the likelihood of a velocity vector given a combination of T and R, we assumed that visual measurements are corrupted by noise. Given T, R, and zk, expected velocity at a location is specified. However, because noise is present, other possible velocities would also have some probability.

Following Koenderink and Van Doorn (1987), and in accordance with the data collected by Crowell and Banks (1996), we assumed that sensory noise in motion direction is constant and that noise in speed perception is proportional to speed (i.e., Weber-fraction discrimination), except at very slow speeds (Crowell & Banks, 1996; McKee & Nakayama, 1984; Westheimer & Wehrhahn, 1994). Noise was assumed to be Gaussian distributed and centered around the input velocity vector on a tangent plane. For speed, the noise magnitude was σspeed = ∣vk∣ · (0.20 + 0.02/∣vk∣). For motion direction, the noise magnitude was σdir = 30° · (1 + 0.02/∣vk∣).
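The noise model can be written out directly. The sketch below uses the two formulas given above; the units of speed, the treatment of speed and direction errors as independent, and the function names are assumptions made for illustration. The distribution is centered on the observed vector, as described above, so the noise magnitudes are evaluated at the observed speed.

```python
import numpy as np

def sigma_speed(speed):
    """Speed noise: Weber fraction of 0.20 plus a constant term at slow speeds."""
    return speed * (0.20 + 0.02 / speed)

def sigma_direction_deg(speed):
    """Direction noise in degrees: 30 deg, growing at very slow speeds."""
    return 30.0 * (1.0 + 0.02 / speed)

def vector_log_likelihood(observed, expected):
    """Log-likelihood of an expected vector given an observed one (constants dropped).

    Assumes a nonzero observed speed and independent Gaussian errors in
    speed and direction (an illustrative simplification).
    """
    obs_speed = np.hypot(observed[0], observed[1])
    exp_speed = np.hypot(expected[0], expected[1])
    d_speed = exp_speed - obs_speed
    d_dir = np.rad2deg(np.arctan2(expected[1], expected[0])
                       - np.arctan2(observed[1], observed[0]))
    d_dir = (d_dir + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
    return (-0.5 * (d_speed / sigma_speed(obs_speed)) ** 2
            - 0.5 * (d_dir / sigma_direction_deg(obs_speed)) ** 2)

print(sigma_speed(1.0), sigma_direction_deg(1.0))
print(vector_log_likelihood((1.0, 0.0), (0.0, 1.0)))
```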

Note that these noise parameters do not directly correspond to human discrimination thresholds for speed and motion direction, which are an order of magnitude lower. Our model treats each motion vector as contributing independently, so combining n vectors reduces aggregate noise by a factor of √n. If the number of simulated motion vectors is large, an assumption of independence is probably not accurate and leads to predicted thresholds that are unrealistically small. To compensate for this, we assumed larger amounts of noise in estimates of individual motion vectors. For the velocity field sampling we used, the model's noise parameters predict heading discrimination thresholds of around 0.5°. Typical measured thresholds in comparable conditions would be 1–2° (Warren, Morris, & Kalish, 1988), and expert observers can achieve thresholds as low as 0.2° (Crowell & Banks, 1996).
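As a worked illustration of the scaling, consider n independent measurements that each carry noise with standard deviation σ. The value n = 100 below is a hypothetical round number, not the number of vectors in our simulations.

```latex
\sigma_{\text{aggregate}} = \frac{\sigma}{\sqrt{n}},
\qquad \text{e.g. } n = 100 \;\Rightarrow\; \sigma_{\text{aggregate}} = \frac{\sigma}{10}.
```

A tenfold inflation of single-vector noise is therefore offset by pooling on the order of a hundred independent vectors.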

The likelihood function for an extra-retinal signal specifying eye and head rotations, P(E∣T, R), was assumed to be independent of T and was modeled as a Gaussian distribution over R with a standard deviation of 1.5°/s. This parameter is not intended to directly model noise in the efferent and proprioceptive signals. Any such noise would be a factor in the relationship between E and R. However, an equally important factor is whether the rotational component in the flow is due primarily to pursuit eye and head movements or whether a significant component is due to rotation of the body. Typically, most of the rotation in retinal flow would be due to gaze pursuit. However, rotation can also arise from body rotation, such as when traveling along a curved path (see Circular paths section). When this occurs, the rotational component of optic flow would not directly correspond to the amount of gaze rotation. Thus, there is naturally occurring variation in the relationship between R and E, beyond the variability due to noise in the efferent copies. Our assumed distribution P(E∣T, R) was intended to model this variability. While the range of noise in efferent and proprioceptive feedback could be inferred from previous empirical work (e.g., Wertheim, 1981), we know of no direct basis for estimating the variability in R − E due to body rotation and path curvature. The 1.5°/s parameter value was arbitrary, chosen to provide realistic performance in simulated rotation conditions.

Simulation results

Single frontal plane, translation only

We first consider the case of observer translation toward a frontal plane with no eye movement. This situation produces a purely radial pattern of optic flow with a focus of expansion at the direction of instantaneous heading, which we simulated to be straight ahead (zero heading). If eye rotation is not known, the optic flow from a frontal plane is ambiguous (Longuet-Higgins, 1984); multiple combinations of observer translation and rotation could produce identical, or highly similar, velocity fields (see Figure 1).

Figure 3a shows the likelihood functions computed for this case. The left panel shows the likelihood computed from optic flow, P({vk}∣T, R). There is a range of combinations of heading and eye rotation that have nonzero likelihood, including the veridical interpretation of heading = 0° and no eye rotation. The likelihood is distributed along a line in the heading-rotation space, which corresponds to the set of valid rigid interpretations of the optic flow field. Thus, the ambiguity of optic flow in the special case of a frontal plane is captured by the likelihood function. Many possible combinations of heading and eye rotation could produce the input velocity field, and the likelihoods computed by our model are concentrated around this set of interpretations.

Figure 3

Likelihood functions computed for observer translation toward a frontal plane under different rotation conditions. (a) Translation with no gaze rotation. The instantaneous optic flow forms a radial pattern with focus of expansion (FOE) at the heading direction. The likelihood function computed from optic flow, P({vk}∣T, R), has likelihood concentrated along a family of combinations of translation and rotation, corresponding to approximately rigid interpretations of the velocity field. When combined with likelihood from an extra-retinal eye movement signal P(E∣T, R), the result has a peak at the correct interpretation. (b) Translation with leftward rotation due to a pursuit eye movement. The instantaneous flow radiates from a point offset from the heading direction. The velocity field is again consistent with multiple combinations of translation and rotation. The likelihood from the extra-retinal signal is centered at the correct (nonzero) rotation rate, so the combined likelihood then has a peak at the correct interpretation. (c) Translation with simulated leftward rotation that is not due to an eye movement. In this case, the eye movement signal provides a conflicting cue that rotation is zero, rather than the actual simulated rotation. When this information is combined with the ambiguous likelihood function from optic flow, the result has a maximum at an incorrect interpretation of leftward heading and no rotation.

Even though optic flow is ambiguous in the case of a frontal plane, humans are able to make fairly accurate and reliable judgments of heading (e.g., Warren et al., 1988). It is likely that nonvisual information about eye movements is used to resolve this ambiguity. Our model treats an extra-retinal eye movement signal as a probabilistic cue for rotation. If rotation is typically due to eye movements, then the extra-retinal signal in this case indicates no rotation. The middle panel of Figure 3a shows the likelihood from an extra-retinal eye movement signal, P(E∣T, R), used in our model. The likelihood is concentrated around zero rotation. We assumed that visual and extra-retinal signals are independent, so P(E∣T, R) is a function solely of rotation.

In an ambiguous case like the optic flow from a frontal plane, an extra-retinal eye movement signal would have a strong influence on the perceptual interpretation. The right panel shows the effect of combining the model's likelihood functions from visual and extra-retinal cues. While the information from either cue is ambiguous by itself, the combined likelihood is concentrated around the veridical interpretation. The additional extra-retinal information serves to disambiguate the information provided by optic flow.

Single frontal plane, translation and rotation

Figure 3b illustrates the case of observer translation toward a frontal plane while making a pursuit eye movement. In this simulation, the observer is moving straight ahead toward a frontal plane at a distance of 1.5 times the observer's speed (τ = 1.5 s) while rotating the eyes at 5°/s to the left. The leftmost panel shows the likelihood computed from optic flow alone. There is again a family of likely interpretations, which includes the correct interpretation (heading = 0°, rotation = −5°/s).

Because the rotation is due to actual pursuit eye movements, extra-retinal cues would suggest that the optic flow should have a nonzero rotational component. We assume that extra-retinal information accurately indicates eye rotation rate, which in this case corresponds to the rotational component of the optic flow. The likelihood function from extra-retinal cues in this case is shown in the middle panel of Figure 3b. When visual and extra-retinal information is integrated, the result has a concentrated peak at the correct interpretation (right panel). Thus, when a rotational component is introduced due to actual eye movements, extra-retinal information allows the model to correctly decompose the optic flow into translational and rotational components even for the ambiguous case of a frontal plane.

We now consider the case where optic flow is a combination of translation and rotation, but the rotation is not due to eye movements. For example, suppose the same retinal flow as in Figure 3b was presented, but while the eyes are stationary. This corresponds to “simulated rotation” conditions tested in some previous studies (e.g., Royden et al., 1992; Warren & Hannon, 1988). In this case, extra-retinal cues do not correctly indicate rotation. Using extra-retinal input to resolve the ambiguity of optic flow would therefore lead to systematic error.

Figure 3c shows the likelihood functions from optic flow and extra-retinal information for this case. When the extra-retinal information is combined, the result is a likelihood function concentrated at an incorrect combination of translation and rotation. The model predicts that perceived heading will be strongly biased in the direction of rotation. For the case of translation and simulated rotation toward a frontal plane, human heading judgments show similar biases (Li et al., 2009; Royden et al., 1992; Stone & Perrone, 1997). In such cases, human observers tend to perceive their heading direction to be toward the focus of expansion on the screen, which corresponds to our model's interpretation.
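The three frontal-plane conditions can be reproduced qualitatively with a toy version of the model. The sketch below is not our simulation: it replaces the full flow likelihood with a small-angle, narrow-field approximation in which a frontal plane constrains only the quantity heading/τ + rotation. The spread of that constraint (`SIGMA_FLOW`) is a hypothetical value; the 1.5°/s spread of the extra-retinal likelihood is the value given earlier. Leftward headings and rotations are taken as negative.

```python
import numpy as np

TAU = 1.5          # time to contact of the frontal plane (s)
SIGMA_FLOW = 0.5   # hypothetical spread of the flow constraint (deg/s)
SIGMA_E = 1.5      # spread of the extra-retinal likelihood (deg/s)

headings = np.linspace(-15.0, 15.0, 121)     # candidate headings (deg)
rotations = np.linspace(-10.0, 10.0, 81)     # candidate rotation rates (deg/s)
H, R = np.meshgrid(headings, rotations, indexing="ij")

def map_estimate(true_heading, true_rotation, efferent):
    """Peak of the combined log-likelihood over the (heading, rotation) grid."""
    c = true_heading / TAU + true_rotation               # what the flow specifies
    log_flow = -0.5 * ((H / TAU + R - c) / SIGMA_FLOW) ** 2   # ridge in (T, R) space
    log_eff = -0.5 * ((R - efferent) / SIGMA_E) ** 2          # Gaussian over rotation
    i, j = np.unravel_index(np.argmax(log_flow + log_eff), H.shape)
    return headings[i], rotations[j]

no_rotation = map_estimate(0.0, 0.0, efferent=0.0)      # condition (a)
real_pursuit = map_estimate(0.0, -5.0, efferent=-5.0)   # condition (b)
simulated = map_estimate(0.0, -5.0, efferent=0.0)       # condition (c)
print(no_rotation, real_pursuit, simulated)
```

In this approximation the first two conditions peak at the veridical heading, while the simulated rotation condition peaks at zero rotation and a leftward heading at the focus of expansion.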

Stone and Perrone (1997) further observed that these heading biases increase linearly as a function of simulated rotation rate, as is predicted theoretically (Koenderink & Van Doorn, 1987). This is also consistent with our model. Changing the simulated rotation rate would correspond to shifting the likelihood function from optic flow upward or downward, while leaving the likelihood function from extra-retinal signals unchanged. The heading estimate from combined information would shift in a linear manner.
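The linear relationship can be seen from a small-angle approximation of the flow from a frontal plane. The expressions below are a sketch under the assumptions of a narrow field of view, small heading angles, and one particular sign convention; h denotes heading and x horizontal image position, both in angular units.

```latex
u(x) \approx \frac{x - h}{\tau} - R
\quad\Longrightarrow\quad
x_{\mathrm{FOE}} \approx h + R\,\tau .
```

If the extra-retinal signal leads the rotation estimate to be near zero, the heading estimate falls near the focus of expansion, so the heading error is approximately Rτ and grows in proportion to the simulated rotation rate. For R = 5°/s and τ = 1.5 s, this gives an error of about 7.5°.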

Multiple depth planes

A single frontal plane is a degenerate case in which the instantaneous optic flow admits multiple possible rigid interpretations (Longuet-Higgins, 1984). When a scene has depth structure, optic flow has a unique rigid interpretation and it becomes possible theoretically to recover translational and rotational components from optic flow alone (first shown by Longuet-Higgins & Prazdny, 1980, see also Koenderink & Van Doorn, 1987). In this section, we consider the optic flow produced by translation and simulated rotation toward two separate frontal planes at different depths relative to the observer.

We first consider a situation in which the two planes have a modest amount of separation in depth, with τ = 1.5 s for the nearer plane (as before) and τ = 2.25 s for the farther plane (Figure 4a). Adding the second plane in this case provides only a limited amount of depth contrast. As can be seen from the likelihood functions, the optic flow from each frontal plane by itself is ambiguous, consistent with an infinite set of possible observer translation and rotation combinations. The set of interpretations is different for the two planes because they have different τ. When the likelihood functions from the two planes are combined, the resulting likelihood is no longer ambiguous and has a peak at the correct interpretation. Thus, when flow information from different depths in the scene is available, the maximum likelihood interpretation from optic flow alone is disambiguated, and translational and rotational observer motions can be resolved.

Figure 4

Likelihood functions for observer translation and simulated rotation toward two planes at different depths. (a) Narrow range of depth (1.5:1). The three middle panels show likelihood functions computed from only the near plane, only the far plane, and from motion vectors from both planes. The likelihood from aggregate motion has a peak at the correct interpretation, but there remains a range of other interpretations that have appreciable likelihood. When combined with a conflicting extra-retinal signal, the resulting estimate is biased. (b) Wide range of depth (3:1). The optic flow fields from the near and far planes are more distinct, and the likelihood from aggregate motion vectors is more concentrated around the correct interpretation. Integration with an extra-retinal signal results in less bias.

Unlike the case of a single plane, an extra-retinal signal is not required to disambiguate the likelihood function from optic flow in this case. However, a conflicting extra-retinal signal would still have influence, because the optic flow does not strongly constrain the interpretation of observer motion. While the likelihood function from optic flow alone has a peak at the correct interpretation, there remains a wide range of T and R combinations with high likelihood. When a conflicting extra-retinal signal is added, the peak of the resulting likelihood (right panel) is biased away from the correct interpretation and toward the rotation rate indicated by the extra-retinal cue. The bias is reduced compared to the case of a single frontal plane, because the likelihood from optic flow is less ambiguous. However, because the depth range is small, the bias is not fully eliminated.

If the distance between two planes is increased, the interpretation of optic flow becomes less ambiguous, and a conflicting extra-retinal signal has less effect. Figure 4b shows model simulations for two planes that have larger depth separation, τ = 1.5 s and τ = 4.5 s. Again, the maximum likelihood interpretation from each frontal plane by itself is a family of T and R combinations. In this case, however, the likelihood functions of the two planes are more disparate and their combination results in a more concentrated likelihood. Because there is less spread in the likelihood function from optic flow alone, adding the same conflicting extra-retinal information has much less effect on the final estimates of T and R.
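The effect of depth separation can be illustrated with a toy linear-Gaussian version of the two-plane case. This is a sketch under simplifying assumptions, not our simulation: each plane is taken to constrain only heading/τ + rotation (a small-angle, narrow-field approximation), with a hypothetical spread `sigma_flow`, and the extra-retinal cue is a Gaussian over rotation with the 1.5°/s spread given earlier. Because every term is Gaussian, the peak of the combined likelihood is a weighted least-squares solution.

```python
import numpy as np

def heading_bias(taus, sim_rotation=-5.0, efferent=0.0,
                 sigma_flow=0.5, sigma_e=1.5):
    """Estimated heading (deg) when true heading is 0 and rotation is simulated."""
    taus = np.asarray(taus, dtype=float)
    # One row per plane: heading / tau + rotation = sim_rotation, weighted by 1/sigma_flow.
    rows = np.column_stack([1.0 / taus, np.ones_like(taus)]) / sigma_flow
    targets = np.full(taus.shape, sim_rotation) / sigma_flow
    # Extra-retinal row: rotation = efferent, weighted by 1/sigma_e.
    rows = np.vstack([rows, [0.0, 1.0 / sigma_e]])
    targets = np.append(targets, efferent / sigma_e)
    (heading, rotation), *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return heading

single = heading_bias([1.5])          # one frontal plane
narrow = heading_bias([1.5, 2.25])    # depth ratio 1.5:1
wide = heading_bias([1.5, 4.5])       # depth ratio 3:1
print(single, narrow, wide)
```

In this toy version the heading error shrinks as the far plane is moved farther away, which is the qualitative pattern described above. The numerical values depend on the hypothetical `sigma_flow` and should not be read as model predictions.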

These simulations illustrate a general property of our model: estimated heading during simulated rotation becomes less biased as the range of depth within a scene becomes larger. Studies of human heading perception have found similar effects. Stone and Perrone (1997) tested perception of heading for simulated translation and rotation toward either one frontal plane or two frontal planes that were separated in depth (τfar/τnear = 2). Heading judgments remained biased in the two-plane condition, but biases were approximately half as large as in the single plane condition. These results are consistent with partial compensation for a conflicting extra-retinal signal, as in our simulations shown in Figure 4a. Li, Sweet, and Stone (2006) tested translation and rotation for a simulated environment with larger depth range, a cloud of random dots with times to contact ranging from 1.5 s to 5 s, and found almost no bias in heading judgments. This is comparable to the performance of our model for two planes with large depth separation (Figure 4b). Li et al. (2009) tested cloud stimuli with various ranges of depth and observed systematic reduction in heading bias as the range of depth was expanded. When the cloud had a narrow range of depth (τfar/τnear = 1.5), heading biases were approximately 50% as large as expected for a frontal plane. When the cloud had a more extended range of depth (τfar/τnear > 8), biases were reduced to only 5%.

Our model shows similar reductions in bias as depth range is increased. Figure 5 plots heading bias for the case of translation and simulated rotation toward two planes as a function of separation between planes, expressed as a tau ratio. The figure also plots human results from Li et al.'s (2009) and Stone and Perrone's (1997) studies cited above, which are close to model predictions.

Heading bias due to simulated rotation as a function of depth range. Simulations were the same as in Figure 4, except that the distance of the far plane was parametrically varied. Depth range is expressed as the ratio of the times to contact of the far and near planes (τfar/τnear). Heading bias is expressed as a proportion of the bias expected from the near plane alone (i.e., if the near plane FOE was interpreted as heading). The solid line plots model simulation results. The open circles and square plot human results from Li et al. (2009) and Stone and Perrone (1997), respectively. The human data were normalized with respect to the FOE of the nearest points, and the tau ratios were based on the times to contact of the nearest and farthest points.

Figure 5

A ground plane provides variation in depth relative to the observer, so the optic flow from a ground plane would strongly constrain estimates of observer translation and rotation. Figure 6a shows the likelihood function computed for translation plus simulated rotation along a ground plane. Observer translation was straight ahead at 2 m/s and rate of rotation was 5°/s, which was comparable to the previous simulations. The nearest motion vectors sampled had a depth of 5 m (20° from horizon), and the farthest had a depth of 50 m (2° from horizon). As in the case of two planes separated in depth, likelihood is clustered around the correct translation and rotation. Adding a conflicting extra-retinal signal therefore has little effect on the final estimate.

Likelihood functions for observer translation and simulated rotation along a ground plane, with varied regions of the ground visible. (a) Wide range of depth (5–50 m). The likelihood from optic flow is concentrated around the correct interpretation, so addition of a conflicting extra-retinal signal produces little bias. (b) Only near regions of the ground (5–9 m). Although the likelihood from optic flow has a peak at the correct interpretation, it remains highly spread out. A conflicting extra-retinal signal would produce larger bias. (c) Only far regions of the ground (9–50 m). Rotation rate is better specified by optic flow when distant regions are visible, so a conflicting extra-retinal signal produces less bias than when only near regions are visible.

Figure 6

The accurate performance of the model in this case might appear to conflict with studies of human heading perception, many of which have reported large biases for the case of translation and simulated rotation along the ground (Ehrlich, Beck, Crowell, Freeman, & Banks, 1998; Li & Warren, 2000, 2004; Royden, 1994; Royden et al., 1992; Royden, Cahill, & Conti, 2006). However, an important distinction must be made between perception of heading and perception of future path. When optic flow simulates travel on a straight path while rotating, observers often experience the illusion of traveling on a curved path. An interpretation of this illusion is discussed in a later section. For a curved path of travel, one's future path does not correspond to the direction of heading at an instant. Previous studies testing translation and simulated rotation along the ground have used path judgments. From such data, it is difficult to distinguish the effects of (1) inaccurate perception of heading and (2) illusory perception of path curvature. Furthermore, there is evidence suggesting that observers experience dual percepts in this cue conflict situation and are capable of judging either heading or future curved path (Li & Warren, 2004; Royden et al., 2006).

Li and Warren (2000) found that observers were capable of similar, near-accurate performance in conditions with either simulated or actual eye rotations when the simulated environment was a large-field textured ground plane. In this case, observers appeared to be judging heading, or future straight path. In the same study, Li and Warren (2000) also varied the visible range of depth on the ground, presenting either near, intermediate, or far regions. Reducing the range of depth caused judgments to be biased due to simulated rotation, and biases were largest when only the near region of the ground was visible.

Our model behaves similarly when the visible region of the ground is varied. Figures 6b and 6c show likelihood functions computed from optic flow for movement along a ground plane, with the same translation and rotation as before, but using subsets of the motion vectors corresponding to either the near half of the visual field (5–9 m) or the far half (9–50 m). In both cases, the likelihood functions are less concentrated than when the full ground is visible (Figure 6a), so a conflicting extra-retinal signal has more potential to introduce bias. When only near regions are visible, the optic flow only weakly constrains the possible rotation, so the extra-retinal signal has a large effect. This is similar to the case of two planes with a narrow range described in the previous section. When only far regions are visible, rotation is more strongly constrained by optic flow information, so the extra-retinal signal has comparatively less effect. These simulation results are consistent with the human data from Li and Warren (2000).

Independently moving object

The presence of an independently moving object has been observed to cause biases in perceived heading (Royden & Conti, 2003; Royden & Hildreth, 1996; Warren & Saunders, 1995). Moving objects violate the constraint of rigidity, which is a basic computational assumption used to interpret optic flow. However, in the conditions tested in these previous studies, the instantaneous optic flow fields also have rigid interpretations. If observer translation and rotation are estimated from these flow fields using an assumption of rigidity, as in our model, the resulting heading estimate would be biased. Our model exhibits heading biases similar to those observed in human data.

We first consider a situation similar to the conditions tested by Warren and Saunders (1995), which is shown in Figure 7a. The background and moving object are both frontal planes. The heading of the observer relative to the background was zero. Because the object is moving, the heading of the observer relative to the object is different from the heading relative to the stationary background. We simulated a difference in heading of 5°. The moving object was simulated to be approaching at a faster rate than the background, with times to contact of 0.75 s vs. 1.5 s.

Likelihood functions for observer translation toward a stationary frontal plane and an independently moving planar object that is approaching in depth. Middle panels show likelihood from background motion, object motion, and combined motion. In each case, the likelihood from combined motion has a concentrated peak that corresponds to an approximate rigid interpretation. (a) Object approaching faster than the background with heading 5° to the right. The bias based on the rigid interpretation is rightward, opposite the direction of object motion. (b) Object approaching slower than the background with heading 5° to the right. The bias based on the rigid interpretation is leftward, in the direction of object motion.

Figure 7

Although the simulated scene is nonrigid because of the moving object, the instantaneous optic flow admits a rigid interpretation. Recall that translation and rotation toward two frontal planes at different depths, as discussed in a previous section, result in two radial patterns of motion with different FOEs and expansion rates. Approximately the same velocity field as in Figure 7a could be produced by observer translation toward two stationary planes at different depths, if heading was 10° to the right and the observer also rotated by 6.7°/s to the left. When our model is applied to this situation, the likelihood from optic flow is concentrated around this rigid interpretation, which could explain human heading biases.

Figure 7a shows likelihood functions computed separately from just the motion of the stationary background and from just the motion of the moving object. Because both the background and object are frontal planes, the likelihood functions from each component are similar to the examples shown previously in Figure 4. For both planes, there is a range of combinations of heading and rotation that have high likelihood. The relationship between rotation and heading differs between the two, because the background and object have different times to contact. For the object motion, the heading corresponding to translation without rotation is also shifted by 5° (the FOE of the object).

The right panel of Figure 7a shows the combined likelihood function computed from both background and object motion, along with the extra-retinal eye rotation signal. The peak has a heading that is shifted toward the FOE of the object, opposite the lateral motion of the moving object, consistent with human heading biases observed by Royden and Conti (2003) and Warren and Saunders (1995).

We now consider a similar situation in which an independently moving object has the same offset FOE relative to the background but is approaching the observer at a slower rate than the background. The background had τ = 1.5 s, as before, but the moving object had τ = 3 s. The rigid interpretation of the combined optic flow would be a heading 5° to the left and rotation 3.3°/s to the right. Figure 7b shows the likelihood functions for this case. The likelihood function from combined background and object motion has a peak at the rigid interpretation. When further combined with extra-retinal information (right panel), the peak remains shifted relative to a correct interpretation of background motion. This corresponds to a heading bias in the direction of object motion, opposite the previous case. Thus, the predicted direction of bias reverses depending on whether the moving object approaches faster or slower than the background.
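The two rigid interpretations quoted above can be checked with a short calculation. The sketch below is not the model itself. It assumes a small-angle approximation in which rotation at rate R shifts the focus of expansion of a frontal plane with time to contact τ by R·τ, so that FOE = heading + R·τ, with rightward taken as positive for both heading and rotation. The function name `rigid_interpretation` is hypothetical.

```python
import numpy as np

def rigid_interpretation(foe_deg, tau_s):
    """Solve FOE_i = heading + rotation * tau_i for two frontal planes.

    Assumed convention: angles in degrees, rotation in deg/s, rightward positive.
    Small-angle approximation; this is an illustration, not the full model.
    """
    foe = np.asarray(foe_deg, dtype=float)
    tau = np.asarray(tau_s, dtype=float)
    # Each row encodes one plane's constraint: 1 * heading + tau_i * rotation = FOE_i
    design = np.column_stack([np.ones_like(tau), tau])
    heading, rotation = np.linalg.solve(design, foe)
    return float(heading), float(rotation)

# Background FOE at 0 deg (tau = 1.5 s), object FOE at 5 deg to the right.
fast = rigid_interpretation([0.0, 5.0], [1.5, 0.75])  # object approaches faster
slow = rigid_interpretation([0.0, 5.0], [1.5, 3.0])   # object approaches slower

print(f"faster object: heading {fast[0]:.1f} deg, rotation {fast[1]:.1f} deg/s")
print(f"slower object: heading {slow[0]:.1f} deg, rotation {slow[1]:.1f} deg/s")
```

Under these assumptions the faster object gives a heading of 10° to the right with 6.7°/s leftward rotation, and the slower object gives a heading of 5° to the left with 3.3°/s rightward rotation, matching the values in the text. The sign of the heading flips because the object's time to contact moves from below to above that of the background.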

Royden and Conti (2003) similarly observed these two directions of bias in human performance when the rate of approach of a moving object was varied. Figure 8 plots the simulated heading bias of the model over a range of object τ values, together with human results from Royden and Conti (2003). There is close agreement between model and human results.

Heading bias due to an approaching moving object as a function of the object's rate of approach. The simulated object motion conditions were chosen to match those of Royden and Conti (2003). The moving object's FOE differed from the background FOE by 5°, and its rate of approach relative to the background, expressed as a time-to-contact ratio, varied between 0.1 and 3. The solid line plots heading estimates of the model, and open circles plot human results from Royden and Conti (2003).

Figure 8

Another situation with an independently moving object that has been tested is when the object moves in a purely lateral direction relative to the observer. Human heading judgments show a bias in the direction of object motion, approximately proportional to object speed (Royden & Conti, 2003; Royden & Hildreth, 1996). Duffy and Wurtz (1993) reported a similar bias for superimposed radial and lateral motion fields.

Figure 9 shows the likelihood functions computed from the radial motion of the stationary background and from the lateral object motion. The object motion in this case is very similar to motion that would be produced by an eye rotation alone, or translation plus eye rotation if observer speed was low or the object was very distant. The likelihood function from optic flow reflects this similarity, differentiating between possible rotation rates but not between possible headings. Thus, the contribution of object motion is similar to providing an erroneous eye movement signal. Duffy and Wurtz (1993) proposed that constant lateral motion could provide a visual re-afferent signal for eye movement. The contribution of lateral motion in our model could be interpreted in a similar manner.

Likelihood functions for observer translation toward a stationary frontal plane and an independently moving planar object moving laterally relative to the observer. Middle panels show likelihood from background motion, object motion, and combined motion. In each case, the likelihood from combined motion has a concentrated peak that corresponds to an approximate rigid interpretation. The motion from the laterally moving object acts like a cue for rotation, resulting in a leftward bias.

Figure 9

When background and object motion are combined, the maximum likelihood occurs for a heading that is shifted away from the background heading, in the direction of object motion (leftward in this example). This is consistent with the direction of bias in human judgments. Increasing the speed of a laterally moving object would increase the bias in an approximately linear manner, also consistent with human data (Duffy & Wurtz, 1993; Royden & Conti, 2003).

Thus, heading biases that have been observed in the presence of an independently moving object can be interpreted as rigid interpretations of a nonrigid scene. The scenes tested previously happen to have approximately rigid interpretations if one allows for uncertainty in the rotational component of optic flow. Our model exhibits the same pattern of biases observed in human data.

Discussion

Summary

We simulated a Bayesian ideal observer that estimates observer translation and rotation based on optic flow and an extra-retinal signal. An important property of the likelihood functions derived from optic flow is a dependence on depth variation. When a wide range of depth is present, translation and rotation are unambiguously specified by optic flow. As the range of depth is decreased, optic flow becomes more ambiguous. Our simulations capture this fundamental property, which allows us to model effects of depth structure observed in human heading judgments.

We treated extra-retinal information about pursuit eye or head movements as a probabilistic cue to rotation. Because we assume some uncertainty in rotation based on the extra-retinal signal, its contribution depends on the quality of information specified by optic flow. Our model can therefore account for why extra-retinal information strongly influences perceived self-motion in some cases (e.g., frontal plane) and has little effect in other cases (e.g., dense cloud). Because the estimate of observer rotation depends on both optic flow and extra-retinal information, it would not necessarily match the rate of pursuit. The difference provides a potential cue to path curvature, as discussed in a later section.

The analysis of optic flow in our model assumes a rigid environment. If a scene were nonrigid, the maximum likelihood estimate of observer translation and rotation under an assumption of rigidity would generally be inaccurate. Our simulations demonstrate that some previously observed biases due to an independently moving object could be explained in this manner.

Comparison to other heading models

A number of models have been proposed for how the visual system estimates translation and rotation from optic flow and how an extra-retinal signal could contribute to this analysis. In this section, we compare our ideal observer model to previously proposed models.

A neural network model by Lappe and Rauschecker (1993) is capable of estimating heading in the presence of rotation from optic flow alone. The model implements an algorithm developed by Heeger and Jepson (1990), which directly fits a heading direction to global optic flow under an assumption of rigidity, allowing for unknown rotation and depths of points. Our model is similar in that it also optimizes a measure of fit to an optic flow field under an assumption of rigidity, with both translation and rotation as free parameters. We estimate a combination of translation and rotation, rather than translation alone. The heading maps computed by Lappe and Rauschecker's model would be similar to likelihood functions computed by our model if the likelihoods were collapsed across possible rotations. For example, Lappe and Rauschecker (1993) show that reducing depth range causes their model's estimate of heading from optic flow to become more ambiguous, consistent with our simulations (Figures 4 and 5). Lappe (1998) generalized the earlier Lappe and Rauschecker (1993) model to include extra-retinal input. As in our simulations, the extra-retinal input has a large effect when optic flow is ambiguous, and less effect when heading is well specified by optic flow alone.

Another class of models solves the problem of estimating translation in the presence of rotation by using global motion templates tuned to various combinations of translation and rotation. Perrone (1992) and Perrone and Stone (1994, 1998) first proposed such a template model. Another model by Beintema and Van den Berg (1998, 2001) also uses translation and rotation templates, combined with a biologically inspired mechanism for integrating extra-retinal signals. A generalized template model such as these could potentially be used to implement our model's estimation of translation and rotation from optic flow. Templates would represent possible observer motion parameters (T, R), with log likelihoods of flow vectors representing the fit to each template. To reduce the number of templates in their model, Perrone and Stone (1994, 1998) also assume that observers make pursuit eye movements to fixate a stationary object. This additional constraint has been criticized as inconsistent with human performance (Crowell, 1997) but could be modeled in a Bayesian framework in terms of the prior P(T, R) (see the later Observer motion prior section).

Another strategy for achieving invariance to rotation is based on analysis of differential motion. A model by Royden (1997) uses this approach, following earlier work by Longuet-Higgins and Prazdny (1980) and Rieger and Lawton (1985). In Royden's model, a radial optic flow pattern is fit to a representation of local differential motion, rather than directly to velocity vectors. The component of optic flow due to rotation is approximately constant across the visual field, so it does not affect differential motion. If differential motion can be computed, such as from difference in velocity of neighboring points at different depths, the result can be fit to a radial pattern. Royden (1997) proposed a first stage of local motion filters, modeled after area MT, to compute a representation of differential motion, followed by a second stage of motion pattern analysis. When local variations in the depth are present, the model can successfully recover heading in the presence of rotation.

Royden (1997) demonstrated that this model could account for effects of independently moving objects discussed previously and proposed an explanation similar to that presented here. Our simulations indicate that biases caused by a planar moving object would be shown by any model that attempts to estimate translation and rotation under an assumption of rigidity, if applied to these kinds of nonrigid stimuli.

A distinguishing feature of a model based on differential motion is that local variations in depth are required to successfully recover heading in the presence of rotation. Our model does not share this requirement, nor do the models of Lappe and Rauschecker (1993) and Perrone and Stone (1994, 1998). Depth variation within the scene as a whole is required for any model to decompose heading and rotation. Use of differential motion further requires that significant variation is present in local regions of an image. Psychophysical evidence indicates that local differential motion is not necessary to observe heading biases due to a moving object (Duijnhouwer, Beintema, Van den Berg, & Van Wezel, 2006), but that the effect is much larger when local contrast is present (Duijnhouwer et al., 2006; Duijnhouwer, Van Wezel, & Van den Berg, 2008; Royden & Conti, 2003).

Addition of extra-retinal information can be modeled in various ways. Our model is a likelihood computation, so integration of optic flow and extra-retinal information corresponds to simply multiplying likelihood functions (or adding log-likelihood functions). A template model with templates selective for translation and rotation could implement an extra-retinal signal in a similarly direct manner. In the neural network model of Lappe (1998), extra-retinal information is integrated by means of “pursuit” cells that encode eye movement, which respond to both optic flow and extra-retinal cues. Beintema and Van den Berg (1998, 2001) propose a “gain field” mechanism to construct optic flow templates that are dynamically modulated by eye movement signals, analogous to neural mechanisms involved in head-centric position coding. Either of these approaches could be used to implement an approximation to the ideal observer simulated here.
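The statement that cue integration amounts to adding log-likelihood functions can be illustrated with a toy grid computation. The sketch below is a deliberate simplification and not the model's velocity-based likelihood. It assumes that each frontal plane contributes a Gaussian ridge around the line FOE = heading + R·τ (small-angle approximation, rightward positive), that the extra-retinal signal contributes a Gaussian in R centered on the signaled rotation, and that the noise values `sigma_foe` and `sigma_e` are arbitrary hypothetical choices.

```python
import numpy as np

def map_estimate(taus, rot_sim=5.0, e_signal=0.0, sigma_foe=1.0, sigma_e=2.0):
    """Grid estimate of (heading, rotation) for straight-ahead translation
    with simulated rotation `rot_sim` (deg/s) and extra-retinal signal `e_signal`.

    All parameter values are hypothetical; the ridge likelihood is a toy
    stand-in for a likelihood computed from individual motion vectors.
    """
    headings = np.linspace(-15.0, 15.0, 601)    # deg
    rotations = np.linspace(-10.0, 10.0, 401)   # deg/s
    H, R = np.meshgrid(headings, rotations, indexing="ij")

    log_like = np.zeros_like(H)
    for tau in taus:
        foe_observed = 0.0 + rot_sim * tau       # true heading is 0 deg
        foe_predicted = H + R * tau
        # Visual term: one log-likelihood ridge per plane, summed across planes
        log_like += -0.5 * ((foe_observed - foe_predicted) / sigma_foe) ** 2

    # Extra-retinal term: added in the log domain (a product of likelihoods)
    log_like += -0.5 * ((e_signal - R) / sigma_e) ** 2

    i, j = np.unravel_index(np.argmax(log_like), log_like.shape)
    return float(headings[i]), float(rotations[j])

h_one, r_one = map_estimate([1.5])        # single frontal plane
h_two, r_two = map_estimate([1.5, 4.5])   # two planes, tau values as in Figure 4b
print(f"one plane:  heading {h_one:.2f} deg, rotation {r_one:.2f} deg/s")
print(f"two planes: heading {h_two:.2f} deg, rotation {r_two:.2f} deg/s")
```

With a single plane the visual ridge leaves rotation unconstrained, so the extra-retinal term pulls the estimate to zero rotation and heading lands on the displaced FOE. With two well separated planes the ridges intersect near the simulated rotation, and the same extra-retinal term shifts the estimate only slightly.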

Crowell and Banks (1996) presented an ideal observer model for analysis of purely radial optic flow. Crowell and Banks did not attempt to model estimation of heading from optic flow in the presence of rotation, which was our goal. Rather, their analysis focused on the precision of heading estimation from optic flow and how precision depends on eccentricity and location of the FOE. We adopted a sensory noise model similar to that of Crowell and Banks (1996), and consequently, our model would exhibit similar qualitative performance for the special case of purely radial optic flow.

Circular paths

Moving along a curved path is a naturally occurring situation that produces optic flow with a rotational component that is not due to pursuit eye or head movements. In the previous examples, we have discussed simulated rotation as a cue conflict condition: the effect of pursuit eye movement is simulated without an actual eye movement. However, rotation could also arise naturally from turning of the body, as when traveling on a curved path. When driving a car along a curve, for example, one's body rotates smoothly relative to the outside environment. Body rotation contributes a rotational component to optic flow that is separate from the rotation due to any pursuit eye or head movements. Instantaneous rotation (R) in this case would not be equal to rotation indicated by extra-retinal signals (E).

Because one's body typically rotates with change in heading, the difference between R and E provides a potential cue to instantaneous change in heading, or path curvature. Body orientation is closely coupled to heading direction when driving, and probably when walking as well, so this would often be a reliable cue. Previous studies have found that simulating rotation while traveling on a straight path can produce an illusory percept of traveling on a curved path (e.g., Ehrlich et al., 1998; Royden et al., 1992). This is consistent with perceptual interpretation of rotation that is not due to eye movements, R − E, as rate of path curvature. Ehrlich et al. (1998) found that path judgments for this case were very similar to when travel on a circular path with equivalent rotation was simulated, consistent with this hypothesis.

Our model can be applied to estimating path curvature based on this rotational cue. The model uses combined information from optic flow and extra-retinal signals to estimate rotation, so R is not always equal to the actual amount of simulated rotation. If rotation is strongly specified by optic flow, as in the case of a ground plane with extended depth range (Figure 6a), the estimate of R would differ from E by the amount of simulated rotation. Perception of a curved path would be expected. This models the illusory perception of path curvature in the case of simulated rotation and straight translation.

If rotation were not specified by optic flow, such as in the case of a frontal plane (Figure 3), the estimate of rotation (R) would be determined by the extra-retinal signal (E). There would be no difference between estimated rotation and the rotation indicated by eye movements, R − E = 0. Based on this cue, one would expect no perception of curvature. This is consistent with human judgments for simulations of a curved path toward a frontal plane (Stone & Perrone, 1997). For intermediate situations, where optic flow only weakly specifies rotation, some curvature would be perceived, but less than the amount indicated by simulated rotation.
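The curvature cue described above can be made concrete with elementary circular-motion kinematics. The sketch below is an illustration rather than part of the model. It assumes that the residual rotation R − E is read directly as the rate of change of heading, and it converts that rate to a path radius using radius = speed / angular rate. The function name and the example values (2 m/s and 5°/s, taken from the ground plane simulation) are illustrative.

```python
import math

def path_curvature_cue(rot_est_deg_s, extra_retinal_deg_s, speed_m_s):
    """Return (turn rate in deg/s, implied path radius in m).

    Assumption: rotation not attributed to eye or head movement (R - E)
    is interpreted as change in heading along a circular path.
    """
    turn_rate = rot_est_deg_s - extra_retinal_deg_s
    if turn_rate == 0.0:
        return 0.0, math.inf            # straight path, no curvature cue
    radius = speed_m_s / abs(math.radians(turn_rate))
    return turn_rate, radius

# Rotation well specified by optic flow (e.g., extended ground plane), no eye movement
ground = path_curvature_cue(rot_est_deg_s=5.0, extra_retinal_deg_s=0.0, speed_m_s=2.0)
# Rotation not specified by optic flow (e.g., frontal plane): estimate follows E
frontal = path_curvature_cue(rot_est_deg_s=0.0, extra_retinal_deg_s=0.0, speed_m_s=2.0)

print(f"ground plane: turn rate {ground[0]:.1f} deg/s, radius {ground[1]:.1f} m")
print(f"frontal plane: turn rate {frontal[0]:.1f} deg/s, radius {frontal[1]}")
```

In the first case the full simulated rotation is attributed to path curvature, giving a circular path of roughly 23 m radius. In the second case R − E is zero and a straight path is implied. Intermediate estimates of R would give a larger radius, that is, less perceived curvature than the simulated rotation indicates.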

Our model predicts a relationship between biases in perceived heading and perceived curvature. For example, if the visible region of the ground is reduced, as in Figures 6b or 6c, conditions that produce heading biases would be expected to also produce systematic underestimation of curvature. This is a novel, testable prediction. Most previous studies of path perception were not designed to differentiate between biases in perceived heading and curvature. An exception is an experiment by Ehrlich et al. (1998), in which future path was judged at different distances. Based on these responses, both perceived heading and curvature were biased relative to veridical.

Model assumptions

We made a number of simplifying assumptions to derive the model and perform simulations. In this section, we discuss some of these assumptions in more detail.

Our model treats visual and extra-retinal signals as independent probabilistic cues, which allows the likelihood of combined information to be separated into visual and extra-retinal components (Equation 3). For this purpose, the key property required is that the uncertainties are statistically independent. The likelihood function P({v}∣T, R) depends on uncertainty in local motion estimates. Contributing factors might be local aperture problems, dynamic occlusion, or neural noise in early visual processing. The likelihood function P(E∣T, R) is a function of noise in extra-retinal signals and whether rotation is due to gaze pursuit movements or body rotation. These sources of variability are qualitatively distinct and involve different modalities, so there is no reason to expect strong dependencies.

Visual and extra-retinal cues are likely not independent in the sense of co-occurrence. Some combinations of optic flow and extra-retinal signals probably co-occur more than others. For example, when translating in a rightward direction it is probably more likely that eye and head movements are leftward (for gaze fixation on an environment point). This type of dependence, however, would be modeled as a nonuniform prior over the space of possible observer motions, P(T, R). Alternate prior functions are discussed in a later section. If noise in sensory measures of {v} and E were correlated, we could not decompose the joint likelihood function as in Equation 3. However, if co-variation in {v} and E were solely due to correlations in T and R, this could be captured by the prior P(T, R).

Another simplifying assumption was to model the optic flow field as a collection of distinct motion vectors with independent noise. As discussed previously, this leads to unrealistic behavior when the number of motion vectors is increased. If motion estimates were truly independent, arbitrary precision in heading estimation could be achieved by simply increasing density. For human heading judgments, increasing density of motion (i.e., the number of dots in a random dot motion display) has diminishing benefit (Crowell & Banks, 1996; Warren et al., 1988). A more realistic model would treat the visual input as a dense field of motion estimates that have correlated measurement noise over some neighborhood around each sample location.

However, a more realistic model of local motion input would have little effect on the ability of our model to simulate the reported effects. Sampling density affects the spread of likelihood functions computed from optic flow but not the shape. When optic flow is ambiguous, likelihood would be concentrated around the family of translation and rotation combinations that correspond to approximate rigid interpretations, regardless of amount of noise. For example, decreasing the number of motion estimates would increase the spread of the likelihood function shown in Figure 3a, but likelihood would still be concentrated around the same 1D family of rigid interpretations. The spread of likelihood functions depends on another set of free parameters in our model: the assumed noise in estimates of local motion direction and speed. These parameters trade off. Increasing density of sampling has approximately the same effect as decreasing uncertainty in velocity estimates. Our approach was to choose a combination of parameters that results in a realistic precision in heading estimation for the simple case of a frontal plane with no rotation.
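The trade-off between sampling density and measurement noise, and the diminishing benefit of density when noise is correlated, can be seen in a simple averaging example. This sketch is an analogy, not the heading computation itself. It uses the standard formula for the standard deviation of the mean of n equally noisy samples with a common pairwise correlation `rho`. A single shared correlation is an assumed simplification, whereas a more realistic model would let correlation fall off with distance between sample locations.

```python
import math

def sd_of_mean(sigma, n, rho=0.0):
    """Standard deviation of the mean of n samples, each with noise SD `sigma`
    and pairwise correlation `rho` (rho = 0 means independent noise)."""
    return sigma * math.sqrt((1.0 + (n - 1) * rho) / n)

# Independent noise: quadrupling density is equivalent to halving the noise.
base = sd_of_mean(sigma=10.0, n=25)
denser = sd_of_mean(sigma=10.0, n=100)
less_noisy = sd_of_mean(sigma=5.0, n=25)
print(f"independent: base {base:.3f}, 4x density {denser:.3f}, half noise {less_noisy:.3f}")

# Correlated noise: precision approaches a floor of sigma * sqrt(rho).
for n in (10, 100, 10000):
    print(f"rho = 0.1, n = {n:5d}: SD of mean {sd_of_mean(10.0, n, rho=0.1):.3f}")
print(f"floor: {10.0 * math.sqrt(0.1):.3f}")
```

With independent noise, precision keeps improving as the square root of the number of samples, which is why density and assumed noise level trade off in the model. With correlated noise, added samples soon stop helping, which is qualitatively closer to the human data cited above.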

Another simplifying assumption was that all depths are equally likely, i.e., a uniform prior on depth. A prior that assigned lower probabilities to larger distances would better reflect ecological experience. We informally tested some other possible depth priors, P(z) ∼ 1/z and P(z) ∼ e^(−kz), and found that the prior had little effect on simulation results. If a motion vector is consistent with a combination of translation and rotation, it generally has high likelihood only for a small range of depths. Consequently, the effect of a nonuniform prior is primarily to scale the overall likelihood of points with near vs. far depths, which does not alter the shape of the combined likelihood distributions nor the final estimates of observer motion.

The choice of depth prior would have more effect for an alternate formulation of Bayesian estimation that marginalizes over depth. Our model fits a depth to each sample point and uses this depth to determine the likelihood of the corresponding motion vector. This corresponds to simultaneously estimating observer motion parameters and a scaled depth map. If the goal were to estimate only the observer motion parameters, an alternate approach would be to compute the likelihood of each motion vector by integrating over all possible depths:

$$P(v \mid T, R) = \int P(v \mid T, R, z)\, P(z)\, dz. \tag{5}$$

We tested such a model and found that the choice of prior could have a significant effect on performance. For example, depending on the prior, the likelihood function from optic flow could be strongly weighted toward either low rotation rates or high rotation rates. However, if depths are inferred from the optic flow, as in our model, or if depths were accurately specified by other static cues, the choice of depth prior would have little effect.
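The difference between fitting a depth to each point and integrating over depth (Equation 5) can be sketched on a discrete grid of depths. This is an illustration only: the 1D flow equation, the Gaussian noise model, and all names and values below are assumptions made for the example, not the model's actual implementation.

```python
import math

def flow_likelihood(v_obs, x, heading, rotation, z, sigma=0.5):
    """Gaussian likelihood of one observed horizontal flow value.
    Toy 1D flow model (an assumption): v = (x - heading) / z + rotation."""
    v_expected = (x - heading) / z + rotation
    norm = sigma * math.sqrt(2 * math.pi)
    return math.exp(-0.5 * ((v_obs - v_expected) / sigma) ** 2) / norm

def profile_likelihood(v_obs, x, heading, rotation, depths):
    """Fit a depth to the point: take the best likelihood over candidate depths."""
    return max(flow_likelihood(v_obs, x, heading, rotation, z) for z in depths)

def marginal_likelihood(v_obs, x, heading, rotation, depths, prior):
    """Equation 5 on a grid: likelihood averaged over depths, weighted by P(z)."""
    weights = [prior(z) for z in depths]
    total = sum(weights)  # normalize the prior over the grid
    return sum(w / total * flow_likelihood(v_obs, x, heading, rotation, z)
               for w, z in zip(weights, depths))

depths = [0.5 + 0.1 * i for i in range(96)]  # scaled depths from 0.5 to 10
args = dict(v_obs=3.0, x=10.0, heading=0.0, rotation=1.0, depths=depths)

profile = profile_likelihood(**args)
marg_uniform = marginal_likelihood(**args, prior=lambda z: 1.0)
marg_inverse = marginal_likelihood(**args, prior=lambda z: 1.0 / z)
```

In this toy case the fitted-depth likelihood does not involve the depth prior at all, whereas the marginal likelihood changes when the uniform prior is replaced by a 1/z prior, because the prior reweights every depth that contributes to the integral.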

Observer motion prior

During normal experience, some combinations of observer translation and rotation would be more common than others. Considered separately, some translation directions might be more frequent in general (e.g., straight ahead), and similarly some rotations may be more frequent overall (e.g., low rotation rates). There is likely some interdependency as well; some combinations of translation and rotation might be more common than would be expected based on the frequencies of translation and rotation considered separately. Such statistical regularities can be represented as a prior distribution over the set of possible observer translations and rotations.

The prior distribution P(T, R) describes the relative probabilities of different combinations of observer translation and rotation prior to considering the sensory information available at a specific instance, i.e., it represents information from previous experience. By definition, the prior is independent of sensory cues. It is also reasonable to assume that the prior on observer translation and rotation is independent of the prior on depths. The prior therefore contributes to a Bayesian estimate as if it were an additional, independent cue.

For our simulations, we assumed for simplicity that all translations and rotations are equally likely (i.e., that P(T, R) is a uniform distribution), so the prior function could be ignored. If P(T, R) were nonuniform, it would be multiplied with the likelihood function from combined sensory cues to obtain the estimated posterior probability function. This could have a significant effect on estimates of observer motion. Figure 10 illustrates some priors that represent different statistical regularities in our experience of translations and rotations.

If we tend to look at objects in the general direction that we are moving, then translation directions near the direction of gaze would be more likely than eccentric translation directions. Figure 10a illustrates a prior function corresponding to this regularity. When combined with sensory information, this prior would tend to bias estimated heading toward the gaze direction. Human heading judgments often show an overall center bias (Ehrlich et al., 1998; Hanada & Ejima, 2000; Van den Berg, 1996; Warren & Saunders, 1995), which is consistent with this predicted effect.
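The center bias produced by such a prior can be sketched by multiplying a likelihood and a prior on a grid of candidate headings. The Gaussian shapes, the spreads, and the variable names are illustrative assumptions; the sketch considers heading only and ignores rotation.

```python
import math

def gaussian(x, mean, sigma):
    """Unnormalized Gaussian profile."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

headings = [h * 0.1 for h in range(-300, 301)]  # candidate headings, -30 to 30 deg

# Sensory likelihood over heading: peak at 10 deg, spread 3 deg (illustrative values).
likelihood = [gaussian(h, 10.0, 3.0) for h in headings]
# Prior favoring headings near the gaze direction at 0 deg (illustrative spread 10 deg).
prior = [gaussian(h, 0.0, 10.0) for h in headings]

# Unnormalized posterior: pointwise product of likelihood and prior.
posterior = [l * p for l, p in zip(likelihood, prior)]

map_uniform = headings[likelihood.index(max(likelihood))]
map_with_prior = headings[posterior.index(max(posterior))]
```

With these values, the estimate under a uniform prior stays at the likelihood peak, while the estimate under the gaze-centered prior is pulled a small distance toward 0°. A broader likelihood or a narrower prior would produce a larger shift.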

During normal experience, rotations of different rates would also be expected to vary in frequency. A large portion of observer rotation is likely due to pursuit eye and head movements to maintain gaze fixation. The rotation rate required for pursuit depends on the distance of a fixated point and its direction relative to observer translation. If we often look at objects in our future path while moving, which would tend to be distant and near the heading direction, then low rotation rates would be proportionally more common. Figure 10b illustrates a prior that assigns higher probability to low rotation rates than high rotation rates.

When combined with sensory information, such a prior would act like a weak cue specifying zero rotation. In the situation in which extra-retinal information specifies nonzero rotation (actual eye movements), the combined influence of the prior and extra-retinal input would be equivalent to a single cue specifying a reduced rotation rate. For example, combining the likelihood function P(E∣T, R) from Figure 3b with the prior shown in Figure 10b results in a distribution concentrated around a rotation rate between E and zero. If such a prior were assumed, any compensation for pursuit eye and head movements based on extra-retinal information would therefore be incomplete.

For human motion perception, there is evidence that eye movement compensation is only partial (Freeman & Banks, 1998; Sumnall, Freeman, & Snowden, 2003; Van den Berg & Beintema, 2000) with an extra-retinal signal gain of about 50% (Bradley, Maxwell, Andersen, Banks, & Shenoy, 1996; Shenoy, Bradley, & Andersen, 1999; Warren, 2004). One way to model these results would be to simply assume an inaccurate extra-retinal signal. The peak of the likelihood function P(E∣T, R) would then be at R = aE rather than R = E, for some a < 1. However, a prior function P(T, R) that is weighted toward zero rotations, as in Figure 10b, has an equivalent effect, and thus provides a means to model partial compensation without assuming inaccurate extra-retinal information.
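The equivalence between a zero-rotation prior and a reduced extra-retinal gain can be made explicit under a Gaussian assumption. The sketch below assumes that both P(E∣T, R) and the rotation prior are Gaussian in R; the function name `effective_gain` is hypothetical, and the paper does not commit to these functional forms.

```python
def effective_gain(sigma_signal: float, sigma_prior: float) -> float:
    """Gain a such that the product of a Gaussian likelihood centered on E
    (spread sigma_signal) and a zero-centered Gaussian prior (spread sigma_prior)
    peaks at R = a * E."""
    # The peak of the product is the precision-weighted mean of E and 0.
    return sigma_prior ** 2 / (sigma_prior ** 2 + sigma_signal ** 2)

gain_equal_spread = effective_gain(sigma_signal=1.0, sigma_prior=1.0)
gain_broad_prior = effective_gain(sigma_signal=1.0, sigma_prior=3.0)
```

Under this assumption, a gain near 0.5 corresponds to a prior whose spread is comparable to that of the extra-retinal signal, and the gain approaches 1 as the prior becomes broad relative to the signal.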

Partial compensation for active eye movements also provides another potential explanation for center biases in heading judgments. In situations where optic flow is ambiguous, extra-retinal information is essential to accurately estimate heading during active eye rotations (Figure 3c). If compensation for eye movements were only partial, due to the conflict between the extra-retinal signal and the prior, the result would be a heading bias toward the fixation point. When fixation is near the center of the screen, as would be common, such heading biases would be toward the center.

A prior on observer translation and rotation could also represent interdependency between observer translation and rotation. Because we typically fixate stationary points while moving, direction of translation and rotation would tend to be related: rightward translation would tend to co-occur with leftward rotation, and vice versa. Similarly, there is likely a relationship between heading eccentricity and magnitude of rotation. The correlation between translation and rotation would not be perfect. For rotations due to gaze pursuit, the expected relationship between T and R depends on distance of the fixation point, and body rotations due to path curvature may be largely independent of translation direction. Thus, a prior distribution that represents this interdependency would remain broad but would have proportionally higher probability for combinations of translation direction and rotation corresponding to straight travel while fixating.

Figure 10c shows an example of a prior function that incorporates interdependency between T and R. Overall, the probabilities of different translation directions are nonuniform and centered around zero, as in Figure 10a. The overall distribution of rotations is also nonuniform and weighted toward zero, as in Figure 10b. However, for this prior, the relative probabilities of different rotations depend on translation direction as well as rotation. In situations that are representative of typical experience, the influence of this prior would lead to less bias than a prior that was solely a function of rotation. In atypical situations, the prior could produce larger biases.

Additional vestibular information

Our model includes an extra-retinal cue E representing proprioceptive and efferent signals conveying eye and head movements used for active gaze pursuit. These extra-retinal signals have been the focus of most previous studies of self-motion perception from optic flow. However, the vestibular system provides another important source of nonvisual information about self-motion (Israel, Chapuis, Glasauer, Charade, & Berthoz, 1993; Telford, Howard, & Ohmi, 1995). Vestibular input can be treated as providing additional independent cues, which are easily incorporated in a Bayesian model.

Two components of vestibular information can be distinguished: cues to linear acceleration of the head provided by the otoliths (HT) and cues to rotation of the head provided by the semicircular canals (HR). The information provided by these cues can be modeled as additional likelihood functions, P(HT∣T, R) and P(HR∣T, R), which would be combined with the likelihood functions from optic flow and an extra-retinal gaze pursuit signal (Equation 4).

The likelihood function from linear acceleration cues, P(HT∣T, R), would depend on observer translation but not rotation. Considering solely the translational component of observer motion, the general effect of this additional information would be to shift the estimate of T from combined cues toward the translation direction specified by HT. The degree of shift would further depend on the relative uncertainty of HT and other sensory cues. “Weighting” of cues according to their reliability is a natural emergent property of a Bayesian estimation model (Knill & Saunders, 2003). This general behavior would be consistent with a recent study of visual-vestibular integration, which found that conflicting visual and vestibular cues to heading are perceptually weighted in a statistically optimal manner by both monkeys and humans (Fetsch, Turner, DeAngelis, & Angelaki, 2009).
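Reliability weighting can be sketched for the simplest case of two independent Gaussian cues to heading. The function name, the cue values, and the spreads are hypothetical, and rotation is ignored here.

```python
import math

def combine_cues(mean_a, sigma_a, mean_b, sigma_b):
    """Peak and spread of the product of two independent Gaussian likelihoods."""
    w_a = 1.0 / sigma_a ** 2  # reliability (precision) of cue a
    w_b = 1.0 / sigma_b ** 2  # reliability (precision) of cue b
    mean = (w_a * mean_a + w_b * mean_b) / (w_a + w_b)
    sigma = math.sqrt(1.0 / (w_a + w_b))
    return mean, sigma

# Visual heading cue at 0 deg (spread 2 deg), vestibular cue at 10 deg (spread 4 deg).
heading, spread = combine_cues(0.0, 2.0, 10.0, 4.0)
```

The combined estimate lies closer to the more reliable cue, and its spread is smaller than that of either cue alone. No explicit weights are specified; they follow from multiplying the likelihood functions.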

Because our model estimates T and R jointly, and these parameters have correlated uncertainty, the addition of conflicting vestibular input could also have more complex effects. An example is illustrated in Figure 11. The likelihood function from combined optic flow and E (left panel, taken from Figure 3) has likelihood that is spread asymmetrically, concentrated around combinations of T and R that are consistent with the optic flow. Suppose that the vestibular cue to translation HT specifies a different heading direction (middle panel). When combined with the other sensory information, the conflicting vestibular cue would affect the estimate of rotation as well as translation. This situation might produce illusory perception of rotation or path curvature. Saunders and Knill (2001) observed an analogous interaction for the problem of perceiving 3D surface orientation from stereo and pictorial cues and used a similar Bayesian framework for interpretation.
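The interaction described above can be sketched with a toy example in which the combined likelihood from optic flow and E is replaced by a correlated 2D Gaussian over heading and rotation. The Gaussian form, the sign and size of the correlation, and all names and values are assumptions chosen for illustration; they are not taken from Figure 11.

```python
import math

def flow_and_eye_likelihood(t, r, sigma_t=4.0, sigma_r=2.0, rho=0.8):
    """Toy stand-in for the likelihood from optic flow and E: a correlated
    2D Gaussian over heading t (deg) and rotation r (deg/s), peaked at (0, 0)."""
    q = (t / sigma_t) ** 2 - 2 * rho * (t / sigma_t) * (r / sigma_r) + (r / sigma_r) ** 2
    return math.exp(-0.5 * q / (1 - rho ** 2))

def vestibular_likelihood(t, heading_cue=6.0, sigma_h=4.0):
    """Vestibular translation cue: depends on heading only, not on rotation."""
    return math.exp(-0.5 * ((t - heading_cue) / sigma_h) ** 2)

grid = [i * 0.1 for i in range(-100, 101)]  # -10 to 10 in steps of 0.1

# Multiply the two likelihoods at every (t, r) and keep the combination with the peak value.
_, t_hat, r_hat = max(
    ((flow_and_eye_likelihood(t, r) * vestibular_likelihood(t), t, r)
     for t in grid for r in grid),
    key=lambda item: item[0],
)
```

Although the vestibular cue carries no information about rotation in this sketch, the peak of the combined likelihood moves away from zero rotation as well as toward the vestibular heading, because the first likelihood couples the two parameters.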

Figure 11

Effect of a conflicting vestibular cue to observer translation (HT) in a situation where translation and rotation are not fully specified by optic flow ({vk}) and an extra-retinal eye signal (E). The peak of the combined likelihood function corresponds to a heading direction between the heading specified by the vestibular cue and by optic flow and E. The estimate of rotation would also be biased, due to the shape of the likelihood function from optic flow and E.

Vestibular information about head rotation (HR) can be modeled in a similar manner as an extra-retinal signal specifying eye and head pursuit movements. The likelihood function P(HR∣T, R) would be a function of observer rotation but not translation. An important difference is that HR would be less closely coupled with overall observer rotation than E. For example, in situations where fixation of gaze during observer translation is accomplished mainly through eye movements, observer rotation would be accompanied by little or no physical rotation of the head. Thus, within our formulation, P(HR∣T, R) would be a broad distribution in comparison to P(E∣T, R) and would correspondingly have less influence on perception of observer translation and rotation. This would be consistent with the results of Crowell et al. (1998), who found that active and passive head movements produce differing amounts of pursuit compensation.

While HR might be a weak cue to overall observer rotation, it would conversely be a good indicator of the rotational component due to body rotation and path curvature. If our model were extended to estimate curvature as well as instantaneous translation and rotation, HR could be used as a direct cue. Passive translation and rotation without vision have been observed to produce an illusory perception of a curved path (Ivanenko, Grasso, Israel, & Berthoz, 1997), consistent with HR providing a cue to curvature.

Additional static depth information

Optic flow provides information about the depth structure of a scene, and our model estimates a scaled depth map in addition to observer motion parameters. In natural conditions, additional depth information is typically available from static cues. Our model can easily be extended to incorporate such information.

Suppose that in addition to the velocity of a point, vk, the visual input includes some independent estimate of depth, dk. The model would now estimate zk for a given T and R by maximizing the likelihood from combined visual information:

$$\max_{z_k} P(v_k, d_k \mid T, R, z_k) = P(v_k \mid T, R, z_k) \cdot P(d_k \mid z_k), \tag{6}$$

where the likelihood function P(dk∣zk) represents the information provided by the additional depth cue. The combined likelihood P(vk, dk∣T, R, zk) would be then used in the place of P(vk∣T, R, zk) in Equation 4. If dk is assumed to be independent, no other modifications to the model are required.
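The depth estimate in Equation 6 can be sketched by maximizing the product of the two likelihoods over a grid of candidate depths. As before, the 1D flow equation, the Gaussian noise models, and all names and values are assumptions made for illustration.

```python
import math

def gaussian(x, mean, sigma):
    """Unnormalized Gaussian profile."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)

def best_depth(v_obs, d_obs, x, heading, rotation, depths, sigma_v=0.5, sigma_d=1.0):
    """Return (z, likelihood) maximizing P(v | T, R, z) * P(d | z) over candidate depths.
    Toy 1D flow model (an assumption): v = (x - heading) / z + rotation."""
    def joint(z):
        v_expected = (x - heading) / z + rotation
        return gaussian(v_obs, v_expected, sigma_v) * gaussian(d_obs, z, sigma_d)
    z_best = max(depths, key=joint)
    return z_best, joint(z_best)

depths = [0.5 + 0.1 * i for i in range(96)]  # scaled depths from 0.5 to 10

# Static cue agrees with the depth implied by the flow (z = 5) ...
z_consistent, lik_consistent = best_depth(3.0, 5.0, 10.0, 0.0, 1.0, depths)
# ... versus a static cue that specifies a conflicting depth (z = 8).
z_conflict, lik_conflict = best_depth(3.0, 8.0, 10.0, 0.0, 1.0, depths)
```

When the static cue agrees with the flow, the fitted depth is unchanged. When it conflicts, the fitted depth is a compromise between the two cues and the maximized likelihood for that (T, R) combination is reduced.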

When moving through a rigid environment in natural conditions, information from various depth cues would be consistent, so one would expect the same estimate of depth with or without additional static depth cues. In experiment conditions, however, there are often conflicting depth cues. Most studies of self-motion perception have used monocular viewing, and when random dots are used to simulate motion along a ground plane, texture cues to scene depth are degraded or conflicting. Conflicting static depth cues would generally impair both the accuracy and reliability of our model's estimates.

These results are generally consistent with our model. Static depth information would be expected to provide benefit when it is able to resolve conflicts or ambiguity in the information provided by optic flow alone. The benefit from stereo cues reported by Van den Berg and Brenner (1994) was only in conditions with low signal to noise. Grigo and Lappe (1998) studied the case of heading biases due to superimposed lateral motion. Such heading biases can be interpreted as being due to a rigid interpretation of a nonrigid scene, as in the example shown in Figure 9. Adding stereo information that conflicts with the (erroneous) rigid interpretation would be expected to reduce bias, as Grigo and Lappe (1998) observed.

Ehrlich et al. (1998) studied the illusory perception of travel along a curved path in simulated rotation conditions and found that the magnitude of the illusion did not decrease with addition of stereo cues. In a previous section, we argued that these errors are not due to under-specification of observer translation and rotation but rather the use of R − E as a cue to path curvature. If the biases are not due to incorrect estimation of observer translation and rotation even in the case where static depth cues are conflicting, then it is not surprising that the additional binocular depth cues would have little effect.

Possible neural substrates

Neurophysiological evidence indicates that visual and extra-retinal cues about self-motion are integrated in multiple visual areas. Two brain areas in particular have been identified as functionally appropriate to compute self-motion: areas MST and VIP. These areas would be candidates for a neural implementation of a Bayesian estimation process.

Area MT has recently been found to also receive extra-retinal pursuit input (Nadler, Angelaki, & DeAngelis, 2008; Nadler, Nawrot, Angelaki, & DeAngelis, 2009). However, psychophysical evidence suggests that compensation for pursuit eye movements happens at the level of global patterns of motion (Van den Berg & Beintema, 2000), which would implicate areas with large receptive fields, like MST and VIP, rather than MT or earlier visual areas.

Recent studies have investigated the neural basis for heading perception from optic flow combined with vestibular translation cues. Responses of MST neurons during multimodal heading perception were consistent with weighted linear summation of different cues (Gu, Angelaki, & DeAngelis, 2008; Morgan, DeAngelis, & Angelaki, 2008; see Angelaki, Gu, & DeAngelis, 2009 for an overview of multisensory integration at the neural level), as would be expected from a Bayesian model of cue integration. This suggests that, even at the neural level, cue integration can be appropriately modeled as a Bayesian estimation process.

Conclusion

A Bayesian ideal observer that estimates observer translation and rotation, based on optic flow and an extra-retinal signal, was able to reproduce a variety of effects observed in human heading perception. The influence of extra-retinal information was shown to depend on the depth structure of the environment, as in human results. We were also able to model biases in perceived heading due to the presence of a moving object and how these biases depend on object motion.

The Bayesian framework used here is well suited to modeling integration of multiple sensory cues with varying degrees of ambiguity. The model is conceptually simple. Likelihood functions are computed for each cue based on generic assumptions and multiplied to simulate cue integration. Correlated heading and rotation errors, and the variable influence of the extra-retinal signal, are natural consequences of the structure of the likelihood functions. Our ideal observer model could be further extended to incorporate additional sensory information, such as vestibular input or static depth cues, as well as nonuniform prior assumptions.

Acknowledgments

This research was supported by a grant from the Hong Kong Research Grants Council (HKU 750209H). We thank two anonymous reviewers for their helpful comments and suggestions.

Klam, F., & Graf, W. (2003). Vestibular signals of posterior parietal cortex neurons during active and passive head movements in macaque monkeys. Annals of the New York Academy of Sciences, 1004, 271–282.

Schaafsma, S. J., & Duysens, J. (1996). Neurons in the ventral intraparietal area of awake macaque monkey closely resemble neurons in the dorsal part of the medial superior temporal area in their responses to optic flow patterns. Journal of Neurophysiology, 76, 4056–4068.

Tanaka, K., & Saito, H. (1989). Analysis of motion of the visual field by direction, expansion/contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the Macaque monkey. Journal of Neurophysiology, 62, 626–641.

Wilkie, R. M., & Wann, J. P. (2005). The role of visual and nonvisual information in the control of locomotion. Journal of Experimental Psychology: Human Perception and Performance, 31, 901–911.

Figure 1

Two situations that would produce the same retinal optic flow. In the situation shown in the left panel, the observer is translating toward a frontal plane with heading 10° to the left while making a 5°/s pursuit eye movement to keep the center point fixated. The resulting optic flow field (right panel) has a focus of expansion at the fixated point in the center. The same instantaneous optic flow could be produced by translation straight ahead toward a slanted plane with no eye rotation (center panel). Thus, observer translation and rotation are ambiguous from this optic flow field.

Figure 2

Illustration of the computation of likelihood functions from optic flow. (a) The goal is to compute the likelihood for all possible combinations of observer translation T and rotation R given a set of optic flow vectors, {v1, v2, …}. The scaled depths of flow vectors, {z1, z2, …}, relative to observer speed, are also free parameters. (b) Likelihood of a (T, R) combination. Flow vectors are assumed to be independent, so the likelihood of an optic flow field {v1, v2, …} given (T, R) is the product of the likelihood of each individual flow vector vk. The figure shows examples of observed and expected flow vectors for a given T and R. In our simulations, we sampled points on a hex grid with 4° spacing. (c) Likelihood of an individual flow vector vk. The flow vector expected from translation and rotation in a rigid environment can be computed given T, R, and the scaled depth of the point zk. With further assumptions about sensory measurement noise, the likelihood of an observed vector vk can be computed from how much it deviates from an expected vector. The estimate of scaled depth zk was taken to be the value that maximized the likelihood of vk.

Figure 3

Likelihood functions computed for observer translation toward a frontal plane under different rotation conditions. (a) Translation with no gaze rotation. The instantaneous optic flow forms a radial pattern with focus of expansion (FOE) at the heading direction. The likelihood function computed from optic flow, P(vk∣T, R), has likelihood concentrated along a family of combinations of translation and rotation, corresponding to approximately rigid interpretations of the velocity field. When combined with likelihood from an extra-retinal eye movement signal P(E∣T, R), the result has a peak at the correct interpretation. (b) Translation with leftward rotation due to a pursuit eye movement. The instantaneous flow radiates from a point offset from the heading direction. The velocity field is again consistent with multiple combinations of translation and rotation. The likelihood from the extra-retinal signal is centered at the correct (nonzero) rotation rate, so the combined likelihood then has a peak at the correct interpretation. (c) Translation with simulated leftward rotation that is not due to an eye movement. In this case, the eye movement signal provides a conflicting cue that rotation is zero, rather than the actual simulated rotation. When this information is combined with the ambiguous likelihood function from optic flow, the result has a maximum at an incorrect interpretation of leftward heading and no rotation.

Figure 4

Likelihood functions for observer translation and simulated rotation toward two planes at different depths. (a) Narrow range of depth (1.5:1). The three middle panels show likelihood functions computed from only the near plane, only the far plane, and from motion vectors from both planes. The likelihood from aggregate motion has a peak at the correct interpretation, but there remains a range of other interpretations that retain appreciable likelihood. When combined with a conflicting extra-retinal signal, the resulting estimate is biased. (b) Wide range of depth (3:1). The optic flow patterns from the near and far planes are more distinct, and the likelihood from aggregate motion vectors is more concentrated around the correct interpretation. Integration with an extra-retinal signal results in less bias.

Figure 5

Heading bias due to simulated rotation as a function of depth range. Simulations were the same as in Figure 4, except that the distance of the far plane was parametrically varied. Depth range is expressed as the ratio of the time to contact of the near and far planes. Heading bias is expressed as a proportion of the bias expected from the near plane alone (i.e., if the near plane FOE was interpreted as heading). The solid line plots model simulation results. The open circles and square plot human results from Li et al. (2009) and Stone and Perrone (1997), respectively. The human data were normalized with respect to the FOE of the nearest points, and the tau ratios were based on the time to contact of the nearest and farthest points.

Figure 6

Likelihood functions for observer translation and simulated rotation along a ground plane, with varied regions of the ground visible. (a) Wide range of depth (5–50 m). The likelihood from optic flow is concentrated around the correct interpretation, so addition of a conflicting extra-retinal signal produces little bias. (b) Only near regions of the ground (5–9 m). Although the likelihood from optic flow has a peak at the correct interpretation, it remains highly spread out. A conflicting extra-retinal signal would produce larger bias. (c) Only far regions of the ground (9–50 m). Rotation rate is better specified by optic flow when distant regions are visible, so a conflicting extra-retinal signal produces less bias than when only near regions are visible.

Figure 7

Likelihood functions for observer translation toward a stationary frontal plane and an independently moving planar object that is approaching in depth. Middle panels show likelihood from background motion, object motion, and combined motion. In each case, the likelihood from combined motion has a concentrated peak that corresponds to an approximate rigid interpretation. (a) Object approaching faster than the background with heading 5° to the right. The bias based on the rigid interpretation is rightward, opposite the direction of object motion. (b) Object approaching slower than the background with heading 5° to the right. The bias based on the rigid interpretation is leftward, in the direction of object motion.

Figure 8

Heading bias due to an approaching moving object as a function of the object's rate of approach. The simulated object motion conditions were chosen to match those of Royden and Conti (2003). The moving object's FOE differed from the background FOE by 5°, and its rate of approach relative to the background, expressed as a time-to-contact ratio, varied between 0.1 and 3. The solid line plots heading estimates of the model, and open circles plot human results from Royden and Conti (2003).

Figure 9

Likelihood functions for observer translation toward a stationary frontal plane and an independently moving planar object moving laterally relative to the observer. Middle panels show likelihood from background motion, object motion, and combined motion. In each case, the likelihood from combined motion has a concentrated peak that corresponds to an approximate rigid interpretation. The motion from the laterally moving object acts like a cue for rotation, resulting in a leftward bias.

Figure 11

Effect of a conflicting vestibular cue to observer translation (HT) in a situation where translation and rotation are not fully specified by optic flow ({vk}) and an extra-retinal eye signal (E). The peak of the combined likelihood function corresponds to a heading direction between the heading specified by the vestibular cue and the heading specified by optic flow and E. The estimate of rotation would also be biased, due to the shape of the likelihood function from optic flow and E.
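
Why a cue to heading alone can also shift the rotation estimate can be sketched with a Gaussian approximation. This is an illustrative assumption, not the model's actual likelihood: the joint likelihood from optic flow and E is treated as a two-dimensional Gaussian over heading and rotation with correlated errors, the vestibular cue as a Gaussian over heading only, and all names and numbers are hypothetical.

```python
def combine_with_heading_cue(mean_vis, cov_vis, heading_vest, sigma_vest):
    """Peak of a 2-D Gaussian (heading, rotation) likelihood multiplied by
    a 1-D Gaussian likelihood that constrains heading only.

    Illustrative assumption: Gaussian likelihoods, so the combined peak
    has a closed form. The heading shift carries over to rotation in
    proportion to the heading-rotation covariance.
    """
    h_vis, r_vis = mean_vis
    (var_h, cov_hr), (_, _var_r) = cov_vis
    gain = 1.0 / (var_h + sigma_vest ** 2)
    shift = heading_vest - h_vis            # conflict between the cues
    heading = h_vis + var_h * gain * shift
    rotation = r_vis + cov_hr * gain * shift
    return heading, rotation


# Hypothetical values: optic flow and E peak at heading 5 deg and rotation
# 2 deg/s, with SDs of 4 deg and 1 deg/s and correlation 0.8, so that the
# likelihood is elongated along a heading-rotation trade-off.
mean_vis = (5.0, 2.0)
cov_vis = ((16.0, 3.2), (3.2, 1.0))

# Vestibular cue indicating straight-ahead heading, with an SD of 4 deg.
heading_est, rotation_est = combine_with_heading_cue(mean_vis, cov_vis, 0.0, 4.0)
print(f"combined heading:  {heading_est:.2f} deg")
print(f"combined rotation: {rotation_est:.2f} deg/s")
```

In this sketch the combined heading falls between the two cues, and the rotation estimate moves as well even though the vestibular cue says nothing about rotation, because the visual likelihood couples the two parameters.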