NYTimes: A Google Prototype for a Precision Image Search

April 28, 2008
A Google Prototype for a Precision Image Search
By JOHN MARKOFF

SAN FRANCISCO — Google researchers say they have a software technology
intended to do for digital images on the Web what the company’s
original PageRank software did for searches of Web pages.

On Thursday at the International World Wide Web Conference in Beijing,
two Google scientists presented a paper describing what the
researchers call VisualRank, an algorithm for blending
image-recognition software methods with techniques for weighting and
ranking images that look most similar.

Although image search has become popular on commercial search engines,
results are usually generated today by using cues from the text that
is associated with each image.

Despite decades of effort, image analysis remains a largely unsolved
problem in computer science, the researchers said. For example, while
progress has been made in automatic face detection in images, finding
other objects such as mountains or tea pots, which are instantly
recognizable to humans, has lagged.

“We wanted to incorporate all of the stuff that is happening in
computer vision and put it in a Web framework,” said Shumeet Baluja, a
senior staff researcher at Google, who made the presentation with
Yushi Jing, another Google researcher. The company’s expertise in
creating vast graphs that weigh “nodes,” or Web pages, based on their
“authority” can be applied to images that are the most representative
of a particular query, he said.
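
The idea of applying graph "authority" to images can be illustrated with a short sketch. What follows is not Google's VisualRank implementation, which the article does not detail. It is a generic PageRank-style power iteration run over a made-up table of pairwise image-similarity scores. The function name visual_rank, the damping value of 0.85 and the sample scores are all assumptions chosen for illustration.

```python
import numpy as np

def visual_rank(similarity, damping=0.85, iterations=100, tol=1e-9):
    """PageRank-style power iteration over an image-similarity matrix.

    Illustrative sketch only: the name, the damping factor and the
    stopping rule are assumptions, not details from the article.
    """
    s = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(s, 0.0)  # an image should not vote for itself
    n = s.shape[0]
    col_sums = s.sum(axis=0)
    has_links = col_sums > 0
    # Normalize each column so it sums to 1; an image with no similar
    # neighbors spreads its weight evenly over all images.
    transition = np.where(has_links, s / np.where(has_links, col_sums, 1.0), 1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        new_rank = damping * (transition @ rank) + (1.0 - damping) / n
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank

# Hypothetical similarity scores for four images returned by one query:
# images 0 to 2 resemble one another, image 3 is an outlier.
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
scores = visual_rank(similarity)
least_representative = int(np.argmin(scores))
```

In this toy example the outlier, image 3, ends up with the lowest score, while the three mutually similar images share most of the weight. That mirrors the article's description of favoring the images most representative of a query.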

The research paper, “PageRank for Product Image Search,” is focused on
a subset of the images that the giant search engine has cataloged
because of the tremendous computing costs required to analyze and
compare digital images. To do this for all of the images indexed by
the search engine would be impractical, the researchers said. Google
does not disclose how many images it has cataloged, but it asserts
that its Google Image Search is the “most comprehensive image search
on the Web.”

The company said that in its research it had concentrated on the 2000
most popular product queries on Google’s product search, words such as
iPod, Xbox and Zune. It then sorted the top 10 images both from its
ranking system and the standard Google Image Search results. With a
team of 150 Google employees, it created a scoring system for image
“relevance.” The researchers said the retrieval returned 83 percent
fewer irrelevant images.

Google is not the first into the visual product search category. Riya,
a Silicon Valley start-up, introduced Like.com in 2006. The service,
which refers users to shopping sites, makes it possible for a Web
shopper to select a particular visual attribute, such as a certain
style of brown shoes or a style of buckle, and then be presented with
similar products available from competing Web merchants.

Rather than relying on a text query, the service focuses on the
ability to match shapes or objects that might be hard to describe in
writing, said Munjal Shah, the chief executive of Riya.

“I think what they’re trying to accomplish is largely impossible,” he
said. “Our belief is, there is not large-scale solutions.”

Mr. Shah said there had been a number of technology demonstrations by
Google Labs researchers, such as a project in 2005 that used machine
learning techniques to recognize the gender of a person in an image.
However, the company has been slow to deploy its research, he said.