Archive for July, 2008

Because of the hoopla around cuil today, I thought I’d take a peek at this newest search engine’s referrers.

Cuil crawler info: I know I’ve been seeing this bot for the past year or so. Cuil’s crawler is apparently called twiceler (is that a pun?), and the user agent string uses cuill.com, which 302 redirects to the cuil.com domain. As of this writing, the cuil Webmaster info URL has changed from the one in the bot’s user agent string.
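If you want to pick the crawler out of your own logs, a minimal sketch might look like the following. It assumes only what is noted above, that the bot identifies itself as twiceler somewhere in its user agent string. The function name and the sample user agent strings are made up for illustration and are not copied from a real log.

```python
def is_cuil_crawler(user_agent):
    """Return True if the user agent string looks like cuil's twiceler bot.

    Assumption: the crawler name "twiceler" appears somewhere in the
    user agent string; the match is case-insensitive.
    """
    return "twiceler" in user_agent.lower()


# Hypothetical sample strings, for illustration only.
sample_bot = "Mozilla/5.0 (Twiceler http://www.cuill.com/)"
sample_browser = "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0"

print(is_cuil_crawler(sample_bot))
print(is_cuil_crawler(sample_browser))
```

A plain substring match is deliberately loose here, since the exact format of the user agent string may change over time.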

If you happen to see “&sl=long” appended to the referrer, e.g. (http://www.cuil.com/search?q=cleverhack&sl=long), it indicates that the visitor was using the two column layout. If cuil ever gets significant market share, you can bet there will be SEOs stressing about how their sites show in the two column vs. three column layout.
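For anyone who wants to pull this out of their logs automatically, here is a rough sketch using Python’s standard urllib.parse. The function name is hypothetical, and it assumes that a referrer without sl=long means the three column layout, which I have not verified beyond the referrers I’ve seen.

```python
from urllib.parse import urlparse, parse_qs


def describe_cuil_referrer(referrer):
    """Return (search_query, layout) for a cuil referrer, or None otherwise."""
    parts = urlparse(referrer)
    host = (parts.hostname or "").lower()
    if host not in ("cuil.com", "www.cuil.com"):
        return None  # not a cuil search referrer

    params = parse_qs(parts.query)
    query = params.get("q", [""])[0]

    # Observed: sl=long marks the two column layout.
    # Assumption: anything else is the three column layout.
    if params.get("sl", [""])[0] == "long":
        layout = "two-column"
    else:
        layout = "three-column"
    return query, layout


print(describe_cuil_referrer("http://www.cuil.com/search?q=cleverhack&sl=long"))
```

Referrers from any other host come back as None, so the function can be run over a whole log without filtering first.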

Otherwise, a cuil visitor presents in your visitor logs pretty much as any other visitor from the big search engines. The IP address belongs to the user (not a proxy like ask.com) and so does the user agent.

As for my thoughts about cuil, I am not impressed with the image thumbnails shown with the search results, as nearly all I have seen so far have been wildly inappropriate for the results. As for information volume, I haven’t done a statistical survey, but Google still returns a larger volume of results than cuil.

So earlier today I was doing some catching up on Google Alerts for some domains that I manage.

And I kept on finding pages which were unusually formatted.

When I first noticed these pages in the middle of last week, I took them for the work of a stupidly overzealous SEO who was planting link farms on sites he owns.

Now, I don’t think so - after examining a number of these rogue SEO pages, it looks like someone is taking advantage of an exploit in Apache to post directories full of these rogue SEO pages, to boost their page rank (while adding outside links on these rogue pages to, I guess, appear genuine).

All of the pages I’ve found are on machines running Apache in shared hosting settings with poorly maintained / designed parent sites. That sure as heck points to an exploit.

Since, like I noted before, the site is poorly maintained, you can go ahead and browse the parent directories. The main Web site seems to be a homepage (created in Microsoft FrontPage) for a concert promoter in Allentown, PA. The hosting provider is E-Commerce, Inc. And this was just one out of a number of pages that I found hosted by E-Commerce, Inc. I also found other pages on sites hosted by The Planet and, irony abounding, The Institute for Intelligence Studies at Mercyhurst College.