Abstract: Deep learning has recently made major breakthroughs in computer vision, largely by training deep neural networks on large amounts of labeled data. However, annotating a large-scale dataset is expensive and time-consuming. To address this annotation cost, Shrivastava's team at Apple proposed learning from simulated images produced with existing computer graphics techniques, refined through adversarial training, thereby avoiding the expensive image annotation process. Their method introduces three innovations: a ‘self-regularization’ term, a local adversarial loss, and updating the discriminator with a history of refined images, so that the refiner produces realistic images while preserving the annotated features of the synthetic input. Experiments showed that the method generates highly realistic images. The team also evaluated the refined images quantitatively by training a gaze estimation model and a hand pose estimation model; the results showed a significant improvement over training on raw synthetic images and achieved state-of-the-art performance on the MPIIGaze dataset without any labeled real data. However, the researchers did not conduct experiments in complex scenarios involving multiple objects, so the applicability of the proposed method in such scenarios remains untested.
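
The history-of-refined-images idea can be illustrated as a small replay buffer: each discriminator minibatch mixes freshly refined images with samples from past refiner outputs, which stabilizes adversarial training. The sketch below is a hypothetical NumPy stand-in (class and method names are illustrative, not from the paper's code), assuming images are arrays:

```python
import numpy as np

class ImageHistoryBuffer:
    """Illustrative buffer of past refined images: each discriminator
    minibatch mixes current refiner output with samples from history,
    as in the history-of-refined-images idea described above."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.images = []  # stored past refined images
        self.rng = np.random.default_rng(seed)

    def mix(self, batch):
        """Return a minibatch: half fresh refined images, half drawn
        from history; the drawn entries are overwritten with new ones."""
        half = len(batch) // 2
        fresh, rest = list(batch[:half]), list(batch[half:])
        if len(self.images) >= half:
            idx = self.rng.choice(len(self.images), size=half, replace=False)
            old = [self.images[i] for i in idx]
            for i, img in zip(idx, rest):
                self.images[i] = img  # refresh sampled history slots
        else:
            old = rest  # history still too small: fall back to fresh images
        self.images.extend(fresh)
        self.images = self.images[-self.capacity:]  # bound memory use
        return np.stack(old + fresh)
```

In training, the discriminator would be updated on `buffer.mix(refined_batch)` rather than on the current refiner output alone, so it does not forget artifacts the refiner produced earlier.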