Some properties can be signaled by multiple sensory modalities. This creates an opportunity to benefit from redundancy, in the form of heightened sensitivity. Here we show, for audio-visual rate perception, that the benefits of cross-modal redundancy involve metacognition. People judged which of two intervals contained the more rapidly changing stimulus, defined by luminance flicker, auditory flutter, or both. People then rated their decisional confidence as high or low. Overall, people were more sensitive in audio-visual trials than in either auditory or visual trials. This advantage was not, however, apparent when trials with equal levels of confidence were compared. High-confidence audio-visual performance was equivalent to high-confidence performance with the best uni-modal signal for that participant. Low-confidence audio-visual performance was equivalent to low-confidence uni-modal performance, averaged across presentation modality. As there was a high correlation between performance and confidence, these data suggest that cross-modal facilitation was based on metacognitive processes, that is, on accurate and reportable estimates of the precision with which rate had been encoded in either modality on a trial-by-trial basis. Such a process would be advantageous overall, as cross-modal presentation would increase the probability that a disproportionately precise rate estimate had been encoded in at least one of the two modalities. This advantage would be lost for comparisons of high-confidence trials, as audio-visual performance is then being compared to uni-modal trials marked by similarly high levels of confidence and performance. Cross-modal benefits would also be lost for comparisons of low-confidence trials, because low confidence during audio-visual presentation signals that no precise estimate has been encoded in either modality. We suggest that cross-modal facilitation in other contexts will also involve metacognitive processes.
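
The probabilistic argument above can be illustrated with a minimal simulation. This is a sketch under strong simplifying assumptions, not the analysis or model used in the study. It assumes each modality encodes rate either precisely or coarsely on a given trial, that the two modalities are identical and independent, and that confidence is high whenever at least one available modality was encoded precisely. The function name `simulate` and all parameter values (`p_precise`, `sd_precise`, `sd_coarse`, `rate_diff`) are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_trials=100_000, p_precise=0.3, sd_precise=0.5,
             sd_coarse=3.0, rate_diff=1.0, seed=0):
    """Return proportion correct by condition ('uni', 'av') and confidence.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    results = {}
    for condition, n_modalities in (("uni", 1), ("av", 2)):
        # tallies[confidence] = [number correct, number of trials]
        tallies = {"high": [0, 0], "low": [0, 0]}
        for _ in range(n_trials):
            # Each modality independently yields a precise encoding
            # with probability p_precise on this trial.
            precise = [rng.random() < p_precise for _ in range(n_modalities)]
            # Assumed metacognitive rule: confidence is high if any available
            # modality was encoded precisely, and that estimate drives the
            # decision; otherwise a coarse estimate is used.
            confidence = "high" if any(precise) else "low"
            sd = sd_precise if confidence == "high" else sd_coarse
            # Noisy estimate of the rate difference between the two intervals;
            # the response is correct when the estimate has the right sign.
            correct = rng.gauss(rate_diff, sd) > 0
            tallies[confidence][0] += correct
            tallies[confidence][1] += 1
        n_correct = tallies["high"][0] + tallies["low"][0]
        results[condition] = {
            "overall": n_correct / n_trials,
            "high": tallies["high"][0] / max(tallies["high"][1], 1),
            "low": tallies["low"][0] / max(tallies["low"][1], 1),
        }
    return results


if __name__ == "__main__":
    for condition, accuracy in simulate().items():
        print(condition, accuracy)
```

Under these assumptions, the audio-visual condition shows higher overall accuracy only because high-confidence trials are more frequent, while accuracy within each confidence level matches the uni-modal condition. That is the qualitative pattern described above.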