Abstract

Contrastive learning between multiple views of the data has recently achieved state-of-the-art performance in the field of self-supervised representation learning. Despite its success, the influence of different view choices has been less studied. In this paper, we use empirical analysis to better understand the importance of view selection, and argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI. We also consider data augmentation as a way to reduce MI, and show that increasing data augmentation indeed leads to decreasing MI and improves downstream classification accuracy. As a by-product, we also achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification ($73\%$ top-1 linear readout with a ResNet-50). In addition, transferring our models to PASCAL VOC object detection and COCO instance segmentation consistently outperforms supervised pre-training.

Code: http://github.com/HobbitLong/PyContrast
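The two-view contrastive setup the abstract refers to is commonly trained with an InfoNCE-style objective, whose minimization lower-bounds the MI between views. The following is a minimal illustrative NumPy sketch, not the paper's implementation: the `info_nce` helper is hypothetical, and the "views" here are simply noisy copies of shared features standing in for augmented images.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE contrastive loss between two batches of view embeddings.

    z1, z2: (N, D) arrays; rows i of z1 and z2 embed two views of sample i.
    Matching rows are positives; all other rows in the batch are negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matched views) as targets.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
# Two "views": shared task-relevant features plus independent noise,
# mimicking independent augmentations of the same image.
view1 = shared + 0.1 * rng.normal(size=shared.shape)
view2 = shared + 0.1 * rng.normal(size=shared.shape)
loss = info_nce(view1, view2)
```

With correctly paired views the loss is far below its value for mispaired (shuffled) views, which is what makes the objective a usable training signal.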