Maggie Mae Mell, Thomas Naselaris, Medical University of South Carolina, United States

Abstract:

Encoding models based on feedforward convolutional neural networks (CNNs) accurately predict BOLD responses to natural scenes in many visual cortical areas. However, CNN-based models fail for a fraction of voxels in every visual area. Is the unexplained variance in these voxels just noise? We investigated this question using voxel-to-voxel (vox2vox) encoding models, which predict activity in a target voxel given activity in a population of source voxels. We found that linear vox2vox models increased prediction accuracy over CNN-based models for any pair of source/target visual areas, and recovered receptive field location even in voxels for which the CNN-based model failed. Vox2vox model prediction accuracy depended critically on the source/target pair: for feedforward models (source area lower in the visual hierarchy than target area), prediction accuracy decreased with hierarchical distance between source and target, whereas for feedback models (source area higher in the visual hierarchy than target area) it did not. In contrast, the same analysis applied across layers of a CNN did not reveal this feedforward/feedback asymmetry. We conclude that the variance unexplained by CNN-based encoding models is shared across visual areas, encodes meaningful information about the stimulus, and may be related to feedback connections that are present in the brain but absent in the neural network.
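
To make the idea of a linear vox2vox model concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. The abstract states only that the models are linear. The ridge penalty, the use of Pearson correlation as the prediction-accuracy measure, the train/test split, and all function and variable names (`fit_vox2vox`, `predict_vox2vox`, `prediction_accuracy`, `ridge_lambda`) are assumptions introduced here for illustration.

```python
import numpy as np


def fit_vox2vox(source_train, target_train, ridge_lambda=1.0):
    """Fit a linear map from source-voxel activity to one target voxel.

    Assumption: ridge regression in closed form; the abstract only says "linear".
    source_train: (n_stimuli, n_source_voxels), target_train: (n_stimuli,)
    """
    # Center both sides so that no explicit intercept column is needed.
    src_mean = source_train.mean(axis=0)
    tgt_mean = target_train.mean()
    X = source_train - src_mean
    y = target_train - tgt_mean
    n_sources = X.shape[1]
    # Closed-form ridge solution: (X^T X + lambda * I)^{-1} X^T y
    weights = np.linalg.solve(X.T @ X + ridge_lambda * np.eye(n_sources), X.T @ y)
    return weights, src_mean, tgt_mean


def predict_vox2vox(source, weights, src_mean, tgt_mean):
    """Predict target-voxel activity from source-voxel activity."""
    return (source - src_mean) @ weights + tgt_mean


def prediction_accuracy(predicted, measured):
    """Assumed accuracy measure: Pearson correlation on held-out data."""
    return float(np.corrcoef(predicted, measured)[0, 1])


# Synthetic stand-in for BOLD data: 20 hypothetical source voxels, 1 target voxel.
rng = np.random.default_rng(0)
n_train, n_test, n_sources = 400, 100, 20
true_weights = rng.normal(size=n_sources)
source = rng.normal(size=(n_train + n_test, n_sources))
target = source @ true_weights + 0.5 * rng.normal(size=n_train + n_test)

# Fit on the training stimuli, evaluate on the held-out stimuli.
weights, src_mean, tgt_mean = fit_vox2vox(source[:n_train], target[:n_train])
predicted = predict_vox2vox(source[n_train:], weights, src_mean, tgt_mean)
accuracy = prediction_accuracy(predicted, target[n_train:])
print(f"held-out prediction accuracy (Pearson r): {accuracy:.3f}")
```

Under these assumptions, a feedforward model would draw `source` from an area lower in the visual hierarchy than the target voxel, and a feedback model would draw it from a higher area; repeating the fit for each source/target pair would give the accuracy-versus-hierarchical-distance comparison described above.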