As part of our research for our post on how we block search engines,
we looked into which search engines
support which privacy standards. This information doesn’t seem to exist
anywhere else on the Internet, so below are our findings, starting with
the big guys, and moving towards more obscure or foreign search engines.

Google, Bing

Yahoo, AOL

Yahoo!’s search engine is provided by Bing. AOL’s is provided by Google.
These are easy ones.

Ask, Yandex, Nutch

Ask (crawler known as teoma) and Yandex (Russia’s search engine, crawler
known as yandex) support the robots meta tag, but do not appear to
support the x-robots-tag header. Ask’s page on the topic is here, and
Yandex’s is here.
The popular open source crawler, Nutch, also supports the robots HTML
tag, but not the x-robots-tag header.
Update: Newer versions of Nutch now support x-robots-tag!
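
To make the difference between the two mechanisms concrete, here is a minimal Python sketch of how a crawler might look for a noindex directive in both places. The function and class names are made up for illustration, and the parsing is simplified: it ignores the optional per-crawler prefix that an x-robots-tag value can carry. A crawler that supports only the meta tag corresponds to calling is_noindex with supports_header=False.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def meta_directives(html):
    """Return the set of robots directives found in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def header_directives(headers):
    """Return the set of directives found in any X-Robots-Tag header.

    `headers` is a plain dict of header name to value (hypothetical input shape).
    """
    found = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            for part in value.split(","):
                if part.strip():
                    found.add(part.strip().lower())
    return found


def is_noindex(html, headers, supports_header=True):
    """Decide whether a page asks not to be indexed.

    supports_header=False models a crawler that reads only the meta tag.
    """
    directives = meta_directives(html)
    if supports_header:
        directives |= header_directives(headers)
    return "noindex" in directives
```

A page that sends noindex only through the HTTP header would be treated as indexable by a crawler that reads only the meta tag, which is why the distinction above matters.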

The Internet Archive, Alexa

The Internet Archive uses Alexa’s crawler, which is known as
ia_archiver. This crawler does not seem to support either the HTML
robots meta tag or the x-robots-tag HTTP header. Their page on the
subject is here. I have
requested more information from them, and will update this page if I
hear back.

DuckDuckGo, Blekko, Baidu

DuckDuckGo and Blekko support neither the robots meta tag nor the
x-robots-tag header, per emails I’ve had with each of them. I also
requested information from Baidu, but their response was in Chinese and
did not address my question. They do have some information here, but
it does not seem to cover the noindex value for the
robots tag. In any case, the only way to block these crawlers seems to
be via a robots.txt file.
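
As a rough illustration of the robots.txt approach, the Python sketch below uses the standard library's urllib.robotparser to check that a sample file blocks specific crawlers while leaving everyone else alone. The ia_archiver name comes from the section above; Baiduspider and DuckDuckBot are assumed user-agent tokens that are not taken from this post, so check each crawler's own documentation before relying on them. The helper names and example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. Only ia_archiver is named in this post; the other two
# user-agent tokens are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the given crawler may fetch the URL under these rules."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("ia_archiver", "Baiduspider", "DuckDuckBot", "SomeOtherBot"):
        print(agent, is_allowed(rules, agent, "https://example.com/page"))
```

Note that robots.txt only asks crawlers not to fetch pages; it relies on each crawler choosing to honor the file.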