Our method learns visual categories from fewer labelled images than previous approaches. We modify the pseudolabel method, which augments labelled training images with unlabelled images, so that it can handle both labelled training images and queried images that are likely to belong to the desired class; this is achieved by modifying the example weighting and selection processes. The resulting method scales the pseudolabel approach to web-scale datasets of millions of images. Results are demonstrated on a toy problem devised from the SUN 397 dataset, and on the full SUN 397 dataset expanded with images gathered from Google's image search without human intervention.
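The selection and weighting modifications described above can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: the threshold, the reduced per-example weight for queried images, and the function name are all hypothetical.

```python
import numpy as np

def select_queried(probs, threshold=0.9, query_weight=0.5):
    """Illustrative selection step for queried (image-search) examples.

    Keep a queried image only when the current model's maximum class
    probability clears the confidence threshold, assign it the argmax
    class as a pseudolabel, and give it a reduced per-example weight
    relative to hand-labelled data. All defaults are hypothetical.
    """
    probs = np.asarray(probs)
    conf = probs.max(axis=1)               # model confidence per example
    keep = np.where(conf >= threshold)[0]  # indices passing the threshold
    labels = probs[keep].argmax(axis=1)    # hard pseudolabels
    weights = np.full(keep.shape, query_weight)  # down-weighted vs. labelled data
    return keep, labels, weights
```

In this sketch, examples retrieved by image search never outweigh the hand-labelled training set, which is one plausible way to limit the damage from mislabelled search results.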

Abstract

This paper tackles the important unsolved problem of training deep models with small amounts of annotated data. We propose a semi-supervised self-training bootstrap for deep learning which retrieves and exploits additional images from internet image search. We adapt the pseudolabel method proposed by Dong-Hyun Lee in 2013, previously used on the elementary MNIST handwritten-digit classification task, and show that with suitable modifications to its example weighting and selection mechanisms it can be applied to general image classification tasks supported by online image search. The proposed approach requires no human supervision, is practical and efficient, and actively avoids overtraining. Its usefulness is demonstrated on the SUN 397 dataset with only 50 training images per category. When exploiting the results of Google's Image Search, we achieve a significant improvement: a classification accuracy of 51%, as opposed to 39% without our method.
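The pseudolabel method referenced above weights the unlabelled-data loss term with a coefficient alpha(t) that is ramped up over training: zero for the first epochs, then a linear ramp, then constant. A minimal sketch of this schedule, using the epoch boundaries and final weight reported in Lee's 2013 paper (T1 = 100, T2 = 600, alpha_f = 3), follows; any adapted variant would tune these constants.

```python
def pseudolabel_weight(t, t1=100, t2=600, alpha_f=3.0):
    """Ramp-up schedule alpha(t) for the unlabelled-loss weight.

    alpha(t) = 0                              for t <  t1
    alpha(t) = alpha_f * (t - t1) / (t2 - t1) for t1 <= t < t2
    alpha(t) = alpha_f                        for t >= t2

    Keeping alpha small early lets the network first fit the labelled
    data, so that pseudolabels are not dominated by an untrained model.
    """
    if t < t1:
        return 0.0
    if t < t2:
        return alpha_f * (t - t1) / (t2 - t1)
    return alpha_f
```

For example, at epoch 350 the schedule gives 3.0 * 250 / 500 = 1.5, halfway up the ramp.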