Thursday, May 24, 2007

Google Image Search Q&A

I asked Google some questions about their image search engine. Here are the answers from Radhika Malpani, Senior Staff Software Engineer at Google.

Google is confident it has the best web search, as measured by quality metrics in testing. Are you also confident you have the industry’s best image search engine... and if so, why?

Yes. To create the best image search engine, we focus on comprehensiveness, relevance, and user experience. According to our internal testing, we score highest on all three factors combined.

I launched a new domain with many images about half a year ago... and half a year was also roughly how long it took for the first images to show up in Google (using a “site:example.com” query). Isn’t that quite slow? And what factors influence the speed of image indexing?

Many factors are involved before images show up in Google search results. One of the key factors is the PageRank of the site the image is on.

You do a good job of filtering out adult images for non-adult search queries, whereas Yahoo’s image search often shows adult content for non-adult terms. What approaches do you use to make your results safe?

This has been an important area of focus for us. We have created advanced proprietary technology that checks keywords and phrases, URLs, and the image itself.

Google Image Search was temporarily shown to all users in a new design, one that revealed certain image information only on mouse hover. Why did you guys decide to revert to the old version?

Google is always working to improve search. Any changes we make focus on improving the experience for our users. In this particular case, users did not find the user experience in this experiment as helpful as the previous experience, and so we decided to end the experimental user interface.

In Google Images, you can choose between “large”, “medium”, and “small” images for results. What are the specific pixel dimensions behind these terms, e.g. where does “medium” end and “large” start?

Currently, the dimensions are: small < 51x51, large >= 1024x768, and medium captures everything in between. As images continue to improve with advances in digital photography, these dimensions will likely change over time.
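To make the thresholds concrete, here is a minimal Python sketch that buckets an image by its pixel dimensions. It is an illustration only, not Google’s code: the function name `size_category` is hypothetical, and the interview does not say whether a threshold applies to both dimensions or how mixed cases are handled, so this sketch assumes both width and height must satisfy it (so a 2000x500 panorama, or a 768x1024 portrait shot, falls into “medium”).

```python
# Thresholds quoted in the interview (May 2007); they may have changed since.
SMALL_MAX = (51, 51)     # small: strictly below 51x51
LARGE_MIN = (1024, 768)  # large: at least 1024x768


def size_category(width: int, height: int) -> str:
    """Bucket an image into "small", "medium", or "large" by pixel size.

    Assumption (not stated by Google): both width and height must meet a
    threshold; anything that meets neither rule counts as "medium".
    """
    if width < SMALL_MAX[0] and height < SMALL_MAX[1]:
        return "small"
    if width >= LARGE_MIN[0] and height >= LARGE_MIN[1]:
        return "large"
    return "medium"


for w, h in [(50, 50), (640, 480), (1024, 768), (2000, 500)]:
    print(f"{w}x{h}: {size_category(w, h)}")
```

Under these assumptions the loop prints small, medium, large, and medium for the four sample sizes.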

What can you tell us about how Google ranks images? For websites, there’s e.g. PageRank... what’s the approach for images?

We use many signals to rank images. Some examples of signals include PageRank of the web site, text describing the image, and the image itself.

Do you have any tips for webmasters on how to get their images indexed well?

We recommend having high-quality images and creating useful websites that accurately describe these images. We suggest that your alt and title attributes accurately describe your images as well.

Do you have any suggestions for webmasters on how to use the “alt” and “title” attributes on the img element? Is there more to it than “just go by what the World Wide Web Consortium suggests”? And what role do these attributes play in Google Image indexing?

We recommend that webmasters use these attributes to help describe the image accurately.

Yahoo provides a REST/XML API for server-side queries. Google allows people to use the AJAX APIs, but has never offered a server-side solution for image results. What are the reasons for this?

We are always evaluating new ways to improve the user experience, but have nothing to share with regard to the use of the AJAX API for image search results at this time.

What do you think of developers screenscraping Google Images content for their own tools?

Do you detect duplicate images? Are there any penalties for duplicates?

We have the capability to detect exact duplicates (e.g., same resolution, same size, same bits).
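The answer only says Google can detect exact duplicates; it does not describe how. One common way to find byte-identical files is to hash each file’s contents and group files that share a digest, sketched below in Python. This is an assumed technique, not Google’s method, and the function name `find_exact_duplicates` and the sample URLs and bytes are hypothetical. Note that it only catches identical files: a resized or re-encoded copy produces a different hash, which would require near-duplicate detection, something the answer does not claim.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images: dict[str, bytes]) -> list[list[str]]:
    """Group image URLs whose file contents are byte-for-byte identical.

    Identical bytes imply the same file size and the same resolution,
    matching the "same size, same bits" notion of an exact duplicate.
    A SHA-256 match is treated as identical; a stricter version could
    also compare the raw bytes after a hash match.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(url)
    # Keep only digests shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: placeholder bytes standing in for image files.
crawled = {
    "http://example.com/a.jpg": b"\xff\xd8fake-jpeg-1",
    "http://example.com/copy-of-a.jpg": b"\xff\xd8fake-jpeg-1",
    "http://example.com/b.jpg": b"\xff\xd8fake-jpeg-2",
}
print(find_exact_duplicates(crawled))
```

Hashing keeps memory use to one digest per file, so files never need to be compared pairwise.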

When can we expect image searches like yours to really “understand” image content, through image analysis, not textual keyword analysis? When can I enter “elephant” and Google finds an elephant even though the word “elephant” is not anywhere on that page (or contained in file names, or backlinks pointing to the page)?

At some point, image search engines will be able to return results based solely on analyzing the content of the image, but I think that is still a couple of years away.

How does Google decide when to directly display images in web search results? This happens in a search for e.g. “marissa mayer”, but not in a search for “eric schmidt”...

We have algorithms that determine when we think showing an image would be useful to our users. As with everything else, these algorithms are evolving, and we are continuously working to improve this experience.

How many image searches are performed on Google every month?

As a matter of policy, Google does not release these figures.

What do you think is the next big thing for image search?

The next big thing for image search would be the ability to search based on visual concepts, such as a picture of a house on a mountain with a river in front of it.