I'm interested in computer vision and all the learning problems that are
associated with it. In particular, I'm interested in weakly supervised and unsupervised learning.
In computer vision, the standard labels we use (e.g., bounding boxes, keypoint annotations)
are not only expensive to collect, but also a poor approximation
of what we actually know about images.
Yet some types of labels come cheaply: for example, GPS tags, web text,
and even raw image context. My work aims to show that these cues can provide
roughly the same information as manually collected labels, and let us
learn representations that are driven by the data rather than by annotators.