Computer mimics human ability to match images

In this file photo, tourists pose in front of the Eiffel Tower in Paris. A computer can match up photos and paintings based on the uniqueness of features such as the Eiffel Tower.

Every year, thousands of tourists stand in front of Paris' Eiffel Tower to have their picture taken, painted, or sketched. Though every image is different, each contains the sky-piercing tower. Now, a computer can match up all those images based on that one identifying feature.

This could be useful, for example, to someone who is wondering how the Eiffel Tower and its surroundings have changed since their grandparents had their picture painted in front of it on their honeymoon. In this case, the computer could search online for a modern photo that matches the old painting.

The technique differs from photo-matching methods that focus on similarities in shapes, colors, or composition. Those work well when searching for exact or very close matches, but fail when applied across domains, such as pictures taken in different seasons, or a painting and a photograph.

"The language of a painting is different than the language of a photograph," Alexei Efros, an associate professor of computer science and robotics at Carnegie Mellon University in Pittsburgh, Penn., said in a news release. "Most computers latch onto the language, not what's being said."

In the video below that explains how this all works, for example, a standard computer algorithm tasked with finding images similar to a painting of a temple returns images of clouds and ground that most closely match the raw image, not the temple that is of most interest to humans.

The goal of this work is to find visually similar images even if they appear quite different at the raw pixel level. This task is particularly important for matching images across visual domains, such as photos taken over different seasons or lighting conditions, paintings, or hand-drawn sketches.

Efros and his colleagues programmed a computer to find the unique element that sets an image apart from others in a sample and then use that uniqueness to match it with similar images.

The uniqueness is computed based on a dataset of randomly selected images. So, to use the Eiffel Tower example, the person standing in front of the tower is likely similar to other people in other photos and thus given little weight, but the tower itself is unlike anything else in the other photos.
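The idea of weighting features by how much they stand out from a random background set can be sketched in a few lines. This is a simplified illustration, not the team's actual implementation: images are stood in for by plain feature vectors, and "uniqueness" is approximated as each feature's deviation from the background distribution, so common elements (people, sky) get low weight and rare ones (the tower) get high weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: each image is a flat feature vector
# (real systems use descriptors such as edge-histogram grids).
n_features = 64
background = rng.normal(0.0, 1.0, size=(500, n_features))  # random images

def uniqueness_weights(query, background):
    """Weight each feature by how far it deviates from the background:
    features shared with most other images score low, while features
    unlike anything in the background set score high."""
    mu = background.mean(axis=0)
    sigma = background.std(axis=0) + 1e-8
    return np.abs(query - mu) / sigma  # per-feature z-score magnitude

def weighted_similarity(query, candidate, weights):
    """Compare two images, emphasizing the query's most unique features."""
    wq = weights * query
    return float(wq @ candidate /
                 (np.linalg.norm(wq) * np.linalg.norm(candidate) + 1e-8))

# A query image with one strongly distinctive feature: index 7
# plays the role of "the tower" amid otherwise ordinary content.
query = rng.normal(0.0, 1.0, n_features)
query[7] = 8.0

w = uniqueness_weights(query, background)
```

Because the weights are learned against random images rather than hand-picked, the same machinery works whether the distinctive element is a tower, a temple, or a fountain.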

Efros said the approach is the "best approximation" yet achieved to how humans compare images.

The technique can be used for automated image searches, for example, or combined with a GPS-tagged photo collection to determine where a particular painting of a landmark such as the Eiffel Tower was painted.

In the following video, the team shows how the program can also be used to assemble what they call a "visual memex" — a dataset that explores the visual similarities and contexts of a set of photos. The video traverses a graph of 200 images of Medici Fountain, another Paris landmark, taken from various distances.
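A graph like the one in the video can be sketched simply: treat each photo as a node and connect it to its most similar neighbors, then walk the graph from photo to photo. This is an illustrative sketch only, using plain cosine similarity on made-up feature vectors where the real system would use its learned cross-domain similarity metric.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: 20 "photos", each a 16-dimensional feature vector.
photos = rng.normal(size=(20, 16))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Build the graph: link each photo to its k most similar neighbors.
k = 3
edges = {}
for i, p in enumerate(photos):
    sims = sorted(((cosine(p, q), j) for j, q in enumerate(photos) if j != i),
                  reverse=True)
    edges[i] = [j for _, j in sims[:k]]

# Traverse: repeatedly hop to the most similar unvisited neighbor,
# like the video's walk through the Medici Fountain images.
path = [0]
while len(path) < 5:
    unvisited = [j for j in edges[path[-1]] if j not in path]
    if not unvisited:
        break
    path.append(unvisited[0])
```

Each hop moves between visually similar photos, so the traversal drifts smoothly through viewpoints rather than jumping at random.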

This video demonstrates traversal of the Visual Memex graph, which is built using the team's similarity metric.