WebmasterWorld Out Of Google & MSN

Well, that didn't take long. I wrote on Monday about how WebmasterWorld head
Brett Tabke decided to ban all search spiders, including those from the major
search engines, in an effort to combat bandwidth loss and server sluggishness
caused by rogue spiders. Brett figured he had about 60 days until he'd see pages
get dropped. It took two.

As of this moment, a site:webmasterworld.com search at Google shows NO pages
from the site. Prior to the ban, about 2 million pages were listed. Oddly,
Google isn't even returning the site's home page using its Open Directory
listing.

In other words, on Monday, as I recall, a search for webmasterworld brought up
the WebmasterWorld home page with a title and description.
That title and description were being pulled from WebmasterWorld's Open
Directory listing, which is still in place there.

A search today for the same thing doesn't bring up the site at all. Yes,
WebmasterWorld banned Google from spidering it. However, that doesn't prevent
Google from listing at least the home page by making use of the Open Directory
information, which doesn't require spidering the WebmasterWorld web site.

Interestingly, the Google Directory, which is powered by the Open Directory, has
no listing for WebmasterWorld in the exact category where it appears at the Open
Directory. That suggests the robots.txt ban pulled WebmasterWorld not only out
of Google's web search results but out of the Google Directory as well. That
would be entirely new; I don't recall hearing of it happening before.

Dave Naylor, who's been watching the situation, tells me he suspects this
indicates Google manually pulled everything about the site.

Over at MSN, site:webmasterworld.com brings back one match. Since it lacks a
title and description, it looks to be a listing of the WebmasterWorld home page
based on links MSN has seen pointing to it, rather than on a crawl of the page
itself. Google can and does do a similar thing, calling these "partially
indexed" URLs. It's not doing that for WebmasterWorld, however.

Should the pages have dropped so quickly? With Google, things might have been
helped along by the fact that it has an automatic page removal system. Don't
panic! It only works if a site has specifically put up a robots.txt file
blocking Google. People can't just come along and remove your pages unless you
yourself have installed such a robots.txt file.
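Whether a given robots.txt actually blocks Google is mechanical enough to check. The sketch below uses Python's standard-library urllib.robotparser to compare two assumed files: a blanket "User-agent: *" ban of the kind described here, and a narrower ban aimed at a hypothetical rogue crawler called "BadBot". Neither file is WebmasterWorld's actual robots.txt, and the helper name blocks_crawler is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed blanket ban: every user agent is disallowed from every path.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""

# Assumed narrower ban aimed only at a hypothetical rogue crawler ("BadBot").
ROGUE_ONLY_BAN = """\
User-agent: BadBot
Disallow: /
"""


def blocks_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    home = "http://www.webmasterworld.com/"
    for bot in ("Googlebot", "msnbot"):
        print(bot, "blocked by blanket ban:", blocks_crawler(BLANKET_BAN, bot, home))
    print("Googlebot blocked by rogue-only ban:",
          blocks_crawler(ROGUE_ONLY_BAN, "Googlebot", home))
```

The blanket file blocks Googlebot and msnbot alike, while the rogue-only file leaves Googlebot free to crawl. Under the rule described above, only the first kind of file would give Google's removal system anything to act on.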

Even without a submission to Google's automated page removal tool, I still
thought it was way too optimistic to assume a popular site like WebmasterWorld
would be allowed to retain its pages after expressly banning spiders. MSN
certainly has no automated page removal system, and it matched Google in
dropping pages.