To Teach Computers to See, Give Them Imaginations

A team of MIT researchers offers a novel solution to an old computer science problem.

Computers regularly beat the human brain at challenges of speed, logic, and calculation. But when it comes to object recognition, humans are still the state of the art. Eager to understand this uniquely human trait, a team of researchers from MIT is attempting to measure the processes of imagination to see whether they are quantifiable, and whether they could even be taught to a machine.

The human mind's capacity for imagination is pretty incredible. We can imagine an object in striking detail, even if it's something we've never seen, touched, tasted, smelled, or heard. Past experiences and a knowledge of the world around us help shape these realistic imaginative visions, but how this process works is not entirely clear.

The MIT researchers, led by computer scientist and doctoral student Carl Vondrick, began with a pretty simple experiment. They generated a set of images that are nothing more than random arrangements of coloured blocks (visual white noise, basically). Then they showed these images to test subjects and asked whether any of them resembled a specific object.

In one instance, the target object was a car. For the most part, subjects saw nothing, but every once in a while one would identify a car in the white noise image. This image was set aside, and the subject moved on to consider the next white noise image.

Image: Vondrick et al.

The team assessed 100,000 images this way, showing the full set to workers on Amazon's Mechanical Turk and asking them to classify each image as a car or not. Though all of the images were just white noise, some test subjects managed to identify a car. Separate tests produced the same result for other target objects, such as people, televisions, bottles, and sports balls.

A few subtle, mainly cultural differences emerged in the tests. When subjects were asked to identify a sports ball, for example, the colour of the ball varied from one part of the world to another. In the United States, the ball was orangish, like a basketball, whereas in India it was red. Regardless of these minor differences, averaging the selected images produced a blurry template of the target object.

In the study of human cognition, this technique is known as "classification images," and it is used in psychophysics to estimate the standard object templates that the human visual system uses for recognition tasks. In this case, those templates are rooted in people's imaginations.
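The averaging step can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the researchers' code: the human test subject is replaced by a hypothetical `simulated_observer` that says "yes" whenever a noise image happens to correlate with a hidden mental picture (here, a bright horizontal bar), and the image size, threshold, and number of images are arbitrary choices.

```python
import random

SIZE = 8  # images are SIZE x SIZE, stored as flat lists of pixel values


def make_noise_image(rng):
    """Return one white-noise image: independent Gaussian pixel values."""
    return [rng.gauss(0.0, 1.0) for _ in range(SIZE * SIZE)]


def simulated_observer(image, hidden_template, threshold=4.0):
    """Hypothetical stand-in for a human subject (an assumption).

    Answers "yes" when the noise happens to line up with the hidden template.
    """
    score = sum(p * t for p, t in zip(image, hidden_template))
    return score > threshold


def average_selected(images, responses):
    """Average, pixel by pixel, the images that drew a "yes" response."""
    selected = [img for img, said_yes in zip(images, responses) if said_yes]
    if not selected:
        raise ValueError("no image was selected")
    return [sum(pixels) / len(selected) for pixels in zip(*selected)]


# Hypothetical "mental image": a bright bar across the two middle rows.
hidden_template = [
    1.0 if row in (3, 4) else 0.0 for row in range(SIZE) for _ in range(SIZE)
]

rng = random.Random(0)  # fixed seed so the run is repeatable
images = [make_noise_image(rng) for _ in range(2000)]
responses = [simulated_observer(img, hidden_template) for img in images]
template = average_selected(images, responses)

# Compare the recovered template inside and outside the bar region.
bar = [v for v, t in zip(template, hidden_template) if t > 0]
background = [v for v, t in zip(template, hidden_template) if t == 0]
bar_mean = sum(bar) / len(bar)
background_mean = sum(background) / len(background)
print(f"selected {sum(responses)} of {len(images)} noise images")
print(f"mean inside bar: {bar_mean:.2f}, outside: {background_mean:.2f}")
```

Although every input is pure noise, the average of the selected images should come out brighter where the observer "expected" the bar, which is the sense in which a blurry template emerges.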

Because the collected image templates emerged from white noise, the team found that they could use these templates to train a machine's vision algorithm to see the same patterns and identify the same objects, like the car.
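One simple way a program could put such a template to use is to score new images by how closely they correlate with it. The sketch below is an illustration only and is not the training method the article describes: the function names `correlation_score` and `looks_like_target`, the 4x4 toy images, and the 0.8 threshold are all assumptions made for the example.

```python
import math


def correlation_score(image, template):
    """Normalized correlation between two equal-length flat images (-1 to 1)."""
    mean_i = sum(image) / len(image)
    mean_t = sum(template) / len(template)
    centered_i = [p - mean_i for p in image]
    centered_t = [t - mean_t for t in template]
    dot = sum(a * b for a, b in zip(centered_i, centered_t))
    norm = math.sqrt(sum(a * a for a in centered_i)) * math.sqrt(
        sum(b * b for b in centered_t)
    )
    return dot / norm if norm else 0.0


def looks_like_target(image, template, threshold=0.8):
    """Hypothetical decision rule: accept images that correlate strongly."""
    return correlation_score(image, template) > threshold


# Toy 4x4 template (an assumption): bright bar across the two middle rows.
toy_template = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# A noisy image with roughly the same bar, and a checkerboard with no bar.
bar_like = [0.1, 0, 0.2, 0, 0.9, 1, 0.8, 1, 1, 0.9, 1, 0.7, 0, 0.1, 0, 0.2]
checkerboard = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]

for name, img in [("bar_like", bar_like), ("checkerboard", checkerboard)]:
    print(name, looks_like_target(img, toy_template))
```

Subtracting the mean before comparing makes the score insensitive to overall brightness, so only the spatial pattern counts.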

A vision algorithm with this imaginative template would be much better at recognizing cars than an algorithm built around recognizing one specific car. And with a large enough human-based data set, machines could be taught to imagine a wide variety of objects without ever having "seen" them.

Computers are still a long way from having human-like imaginations, but identifying and quantifying the way humans imagine things is a step in the right direction. The MIT researchers have published their work on the open-access arXiv preprint server.