Numerous studies of perceptual learning have demonstrated the potential for neural plasticity in adult visual cortex; however, the effect of sensory input from other modalities on such learning has been largely neglected. Considering that the natural environment is predominantly multimodal, and that inputs from other modalities can affect visual processing as early as V1 (Watkins, Shams et al., 2006), multisensory interactions may play a role in perceptual learning. For example, we recently found that training with sound facilitated coherent motion detection and discrimination (Seitz, Kim & Shams, 2006). In the current study, we trained subjects over five days on a visual motion coherence detection task with visual-only, congruent audiovisual, or incongruent audiovisual stimuli. Consistent with our previous findings, when performance was compared on trials containing only visual signals, subjects trained with congruent audiovisual stimuli showed significantly greater learning than those trained with visual-only stimuli. Subjects trained with incongruent audiovisual stimuli, however, showed no such enhancement, and in fact did not demonstrate any significant learning. Thus, congruency between the auditory and visual stimuli modulates the effect of sound on visual learning, suggesting that the benefits of multisensory training are not merely due to increased attention or arousal during training, but may result from interactions at a perceptual level.