Abstract

Human observers localize events in the world by using sensory signals from multiple modalities. We evaluated two theories of spatial localization that predict how visual and auditory information are weighted when these signals specify different locations in space. According to one theory (visual capture), the signal that is typically most reliable dominates in a winner-take-all competition, whereas the other theory (maximum-likelihood estimation) proposes that perceptual judgments are based on a weighted average of the sensory signals in proportion to each signal’s relative reliability. Our results indicate that both theories are partially correct, in that relative signal reliability significantly altered judgments of spatial location, but these judgments were also characterized by an overall bias to rely on visual over auditory information. These results have important implications for the development of cue integration and for neural plasticity in the adult brain that enables humans to optimally integrate multimodal information.
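The two theories contrasted above make different predictions when vision and audition disagree. The sketch below is a minimal illustration, not the authors' analysis code. It assumes each cue delivers a location estimate with Gaussian noise of standard deviation sigma, and that reliability is 1 / sigma². The function names and all numbers are hypothetical.

```python
def mle_combine(locations, sigmas):
    """Combine single-cue location estimates in proportion to relative reliability.

    Reliability is taken as 1 / sigma**2 (an assumption of this sketch); each
    cue's weight is its reliability divided by the summed reliability of all cues.
    """
    if not locations or len(locations) != len(sigmas):
        raise ValueError("need one sigma per location")
    reliabilities = [1.0 / (s * s) for s in sigmas]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    estimate = sum(w * x for w, x in zip(weights, locations))
    # Under these assumptions the combined estimate is less variable than either cue alone.
    combined_sigma = (1.0 / total) ** 0.5
    return estimate, weights, combined_sigma


def visual_capture(visual_location, auditory_location):
    """Winner-take-all alternative: the visual signal alone sets the percept.

    auditory_location is accepted only to mirror the two-cue input; it is ignored.
    """
    return visual_location


# Hypothetical conflict: vision signals +1.5 deg, audition signals -1.5 deg,
# and vision is assumed to be twice as precise (sigma 1.0 vs 2.0 deg).
estimate, weights, combined_sigma = mle_combine([1.5, -1.5], [1.0, 2.0])
print(estimate, weights, combined_sigma)  # MLE pulls the percept toward vision, not all the way
print(visual_capture(1.5, -1.5))          # capture ignores audition entirely
```

Raising the visual sigma (as added visual noise would) lowers the visual weight under MLE, whereas visual capture predicts no change. That contrast is what the manipulation of visual noise is meant to test.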

Schematic illustration of single-modality and multimodality trials. The standard stimulus is shown on the left and the comparison stimulus is on the right. For simplicity, the comparison stimulus is shown only at one of the seven possible locations at which it could appear. (a) Auditory-only trial. (b) Visual-only trial. (c) Visual–auditory trial.

Results for one subject on the auditory-only trials. The horizontal axis shows the comparison locations (in degrees of visual angle away from the center of the workspace), and the vertical axis shows the percentage of trials in which the subject judged the comparison stimulus as depicting an event located to the right of the event depicted in the standard stimulus. The curve fitted to the data points is a cumulative normal distribution.
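A cumulative normal fitted to such data yields two numbers: its mean, the comparison location judged "right of the standard" on 50% of trials (the point of subjective equality, PSE), and its standard deviation, which sets the slope and is commonly used as an estimate of the cue's noise. The sketch below fits both by a coarse least-squares grid search using only the standard library. This is a simplification (psychometric functions are often fitted by binomial maximum likelihood instead), and the comparison offsets and response proportions are hypothetical, not data from this study.

```python
import math


def cum_normal(x, mu, sigma):
    """Cumulative normal distribution with mean mu and SD sigma, evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_psychometric(xs, ps):
    """Least-squares grid search for the (mu, sigma) of a cumulative normal.

    mu is the PSE; sigma controls the slope. Grid: mu in [-5, 5] deg and
    sigma in [0.05, 5] deg, both in steps of 0.05 deg.
    """
    best = None
    for i in range(-100, 101):
        mu = i / 20.0
        for j in range(1, 101):
            sigma = j / 20.0
            err = sum((cum_normal(x, mu, sigma) - p) ** 2 for x, p in zip(xs, ps))
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    return best[1], best[2]


# Hypothetical data: seven comparison offsets (deg of visual angle) and the
# proportion of trials on which the comparison was judged right of the standard.
offsets = [-6, -4, -2, 0, 2, 4, 6]
p_right = [0.00, 0.01, 0.11, 0.40, 0.77, 0.96, 1.00]
pse, sigma = fit_psychometric(offsets, p_right)
print(f"PSE = {pse:.2f} deg, sigma = {sigma:.2f} deg")
```

A grid search was chosen only to keep the sketch dependency-free; a numerical optimizer would give finer estimates.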

Results for one subject on the visual-only trials. The solid and dashed curves are cumulative normal distributions fitted to the data points in the lowest-noise and highest-noise conditions, respectively.

Results for one subject on the visual–auditory trials. The solid and dashed curves are cumulative normal distributions fitted to the data points in the lowest-noise and highest-noise conditions, respectively.

Average point of subjective equality (PSE) over all ten subjects on the visual–auditory trials. The horizontal axis represents the visual noise level (1, lowest level; 5, highest level), and the vertical axis gives the average PSE in degrees of visual angle (the error bars give the standard errors of the means).

Average visual weights over all ten subjects on the visual–auditory trials. The horizontal axis represents the visual noise level (1, lowest level; 5, highest level), and the vertical axis gives the average visual weight (the error bars give the standard errors of the means).
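One way a visual weight can be recovered from a PSE is sketched below. The sketch assumes, without confirmation from the captions, that the standard stimulus places the visual and auditory signals at different locations, that its perceived location is w·visual + (1 − w)·auditory, and that the comparison stimulus has no conflict, so the PSE equals the perceived location of the standard. The MLE prediction uses single-modality slopes (sigmas) as in the abstract's reliability-weighting account. Function names and numbers are hypothetical.

```python
def empirical_visual_weight(pse, visual_location, auditory_location):
    """Visual weight implied by a PSE under the linear-weighting assumption.

    Solves pse = w * visual_location + (1 - w) * auditory_location for w.
    """
    if visual_location == auditory_location:
        raise ValueError("a cue conflict is needed to identify the weight")
    return (pse - auditory_location) / (visual_location - auditory_location)


def predicted_visual_weight(sigma_visual, sigma_auditory):
    """MLE prediction: relative visual reliability, with reliability = 1 / sigma**2."""
    r_v = 1.0 / sigma_visual ** 2
    r_a = 1.0 / sigma_auditory ** 2
    return r_v / (r_v + r_a)


# Hypothetical values for a single visual noise level (degrees of visual angle).
w_obs = empirical_visual_weight(pse=0.9, visual_location=1.5, auditory_location=-1.5)
w_pred = predicted_visual_weight(sigma_visual=1.0, sigma_auditory=2.0)
print(f"observed {w_obs:.2f}, predicted {w_pred:.2f}")
```

Repeating this comparison at each noise level would show whether observed weights track the predicted ones; an overall bias toward vision of the kind described in the abstract would appear as observed weights that sit consistently above the MLE predictions.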