So, I got a referrer I didn't recognize this morning from a Web site called Competitio.us and I just had to check it out. Competitio.us is another brand tracking Web service, currently in beta. From what I could gather on the Web site, you (as a brand or project manager) sign up for the service and then set up Competitio.us to monitor news and traffic statistics about your…competition. Get it?

As for design, Competitio.us is using XHTML Transitional and their markup looks pretty clean. The underlying technology platform appears to be Ruby on Rails on Linux. Their color scheme and font choices just work. I'm going to give them bonus points for minimizing the use of images on their public pages and for their harnessing of the H1 and H2 tags. I think their front page looks really good in terms of how much copy they've integrated, although they're currently not showing on Google for "competitor research," a term I'd think they'd want to rank for. They may want to look at the keywords in their copy and maybe introduce some link anchor text with those keywords. In addition, their images are not alt tagged, the title tag on their pages could use some keywords, and there are no meta description tags at all.
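For what it's worth, these on-page basics are easy to check with a short script. Here's a minimal sketch in Python (not anything Competitio.us runs, and the URL is just a placeholder) that flags an empty title tag, a missing meta description, and images without alt text:

# Quick-and-dirty on-page check: title tag, meta description, image alt text.
# A sketch only -- the URL below is a placeholder, not a real endpoint.
from html.parser import HTMLParser
from urllib.request import urlopen

class SEOCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.has_meta_description = False
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name", "").lower() == "description":
            self.has_meta_description = True
        elif tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

page = urlopen("http://www.example.com/").read().decode("utf-8", "replace")
checker = SEOCheck()
checker.feed(page)
print("title:", checker.title.strip() or "(missing)")
print("meta description present:", checker.has_meta_description)
print("images without alt text:", checker.images_missing_alt)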

As for “hacking” Competitio.us, even with the referrer information I have, I cannot view how or why my blog is being tracked. The referrer only links to this blog's main URL. For giggles, I even signed up for a Competitio.us account and tried the URL in the referrer, and only saw an ugly Application error (Rails) message. Someone at Competitio.us may want to redirect a request like that to an informational page, so that users cannot probe URLs not associated with their account.

Since some are speculating about shoppers' habits this upcoming weekend, I'd like to chime in. Speaking as a person formerly involved in e-commerce, I can tell you that Black Friday and the upcoming Cyber Monday won't be all that. Sure, the weekend will be good for online retailers, but these won't be the busiest days for online retailers during the 2006 Holiday Season.

In fact, the busiest days for online retailers this year will be December 12th and 13th. The reason? Christmas is on a Monday this year, and that pushes up the last day for a package to ship via UPS Ground within the Continental US to Friday, December 15th. (You can spec out a shipment on the UPS Web site to see what I mean.) Many retailers will use the 14th as an extra day for shipping and handling and to sort out issues on their end.
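If you want to see the arithmetic, here's a rough sketch, assuming (my assumption, not a UPS quote) a worst case of five business days in transit for UPS Ground within the continental US:

# Back out the last UPS Ground ship date for Christmas 2006, assuming a
# worst case of five business days in transit (an assumption, not a UPS quote).
from datetime import date, timedelta

deliver_by = date(2006, 12, 22)    # last weekday before Christmas, which falls on Monday the 25th
transit_business_days = 5

ship_date = deliver_by
remaining = transit_business_days
while remaining > 0:
    ship_date -= timedelta(days=1)
    if ship_date.weekday() < 5:    # count Monday through Friday only
        remaining -= 1

print("Last day to ship via Ground:", ship_date.strftime("%A, %B %d"))
# prints: Last day to ship via Ground: Friday, December 15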

The bulk of e-commerce shoppers are those who would rather not shop in brick and mortar stores - hence waiting until the second week of December - but who also don't want to be bothered with expedited shipping expenses or with a package arriving at the last minute.

Shipping is a big deal to consumers: we've seen that consumers would rather take a "free shipping" deal than a percentage-off deal, even when the percentage off would save them more. This shopper psychology is why sites like Amazon pull stunts like "free expedited shipping" in the last days before Christmas - they're trying to convince folks to stick around and buy without worrying about shipping.
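To put made-up numbers on it: on a $120 order with $7.95 flat-rate shipping, a 15% discount beats free shipping by about ten dollars, yet plenty of shoppers will still pick the free shipping.

# Made-up numbers comparing a free-shipping offer to a 15%-off offer.
order_total = 120.00
shipping = 7.95

with_free_shipping = order_total                   # 120.00 out the door
with_percent_off = order_total * 0.85 + shipping   # 109.95 out the door

print("free shipping total: $%.2f" % with_free_shipping)
print("15%% off total:       $%.2f" % with_percent_off)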

-My Dad just had to compare Microsoft Virtual Earth vs. Google Earth. He seemed to like the zooming action of both, but thought it was difficult to tilt the view within Google Earth.

-I am administering a Google AdWords campaign which is unfortunately showing ads on a number of those cruddy, waste-of-bandwidth Made for AdSense (MFA) sites. While I don't want to take the ads off the content network, the only solution I can think of is to Google for the advertised URL and manually exclude the MFA sites from the campaign. What a pain.

Speedy Spider is the crawler for the Sweden-based search engine Entireweb. I have not seen any referrers from Entireweb, but the Speedy Spider User Agent featured a URL to the informative Speedy Spider FAQ. In addition, Speedy Spider is quite polite for a bot, only crawling a page or two per visit.

Oh, wait, just when you thought we were done here with research services for the Google-impaired, there is yet another one. BuzzLogic has been sending its crawler to this blog for the past few weeks and, as it happens, currently has a private beta for companies.

What is different about BuzzLogic’s crawler though is that it’s revealing a referrer which really, honestly should not be seen in the Web logs. Also, their crawler does not have any identifying information in the User Agent field. Here’s an example.

The questionable referrer, which I am seeing via Sitemeter, looks like this:

If I had to guess, however BuzzLogic compiles the collected data, it ends up in a static HTML file. I've seen that static HTML file change day by day, with a different time/date stamp for each individual instance it hits my Web server.

This is what I see via my Web logs.
Host: 64.34.246.44 (I was only able to connect this to BuzzLogic through a traceroute of the IP address. The BuzzLogic Web server is hosted on what seems to be a completely different hosting provider.)
/wp-content/plugins/sociable/images/reddit.png (This crawler is hitting my image files for some reason.)
Http Code: 200 Date: Nov 19 10:37:14 Http Version: HTTP/1.1 Size in Bytes: 5943
Referer: -
Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.4 (like Gecko)
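For the curious, here's roughly how I tie an otherwise anonymous IP to a company: a reverse DNS lookup first, and a traceroute when that comes up empty. A minimal sketch in Python, using the IP from my logs (this is just the generic technique, not anything BuzzLogic-specific):

# Try a reverse DNS lookup on a crawler's IP; fall back to traceroute.
import socket
import subprocess

ip = "64.34.246.44"   # the address from my Web logs

try:
    hostname, _, _ = socket.gethostbyaddr(ip)
    print("reverse DNS:", hostname)
except socket.herror:
    print("no PTR record; trying traceroute instead")
    subprocess.run(["traceroute", ip])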

Yet another monitor-the-Web-just-for-you-your-brand-and-your-PR-department-which-can't-use-Google service, a bot from Webclipping was spied hitting my RSS feeds recently. Clicking around the Webclipping site (which doesn't look all that hot in Firefox 2.0), the service seems to be similar to other monitoring outfits, including brandimensions. (A side note: brandimensions, which I've written about before, charmingly has a Flash-only-I-don't-really-want-to-be-found-by-search-engines homepage. As you can see, I'm not exactly a fan of a service that compiles my content and doesn't allow me to see the context.)

A still-in-private-beta service, Hoopla purports to be “the next big portal that renders other online news and blog services obsolete.” There's also an accompanying blog, currently with only one entry.

I found Hoopla via my usual discovery method: my Web site logs, where their crawler was hitting my RSS feeds. It appears they need to crawl the Web and blogosphere for a bit in order to collect content for their portal. I can't tell if the folks behind Hoopla are American and/or German, though. It looks like the anonymous WHOIS registration is through an American company, and the language on the Hoopla parked page is definitely colloquial American English, but the crawler is from a German IP.

FeedSweep provides a way to display syndicated RSS content on your site. So, for example, you could show the cleverhack feed if you really, really wanted to. One gripe: I couldn't find an Add Feed to FeedSweep button.
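If you'd rather roll this yourself, the general idea is simple. Here's a minimal sketch in Python using the third-party feedparser library - this isn't FeedSweep's code or API, just the generic technique, and the feed URL is a guess at my own:

# Generic sketch of showing a syndicated feed as an HTML list.
# Requires the third-party feedparser library; the feed URL is a guess.
import feedparser

feed = feedparser.parse("http://www.cleverhack.com/feed/")

print("<ul>")
for entry in feed.entries[:5]:   # five most recent entries
    print('  <li><a href="%s">%s</a></li>' % (entry.link, entry.title))
print("</ul>")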

Fatcast is an online RSS reader similar to Bloglines. Again, I could not find an Add Feed to Fatcast button. However, the service does allow one to share their list of feeds if they wish, so of course I made one exclusively with cleverhack feeds.

Last, but certainly not without emotion, is WeFeelFine. It appears to be a research project which searches the blogosphere for words and phrases describing how people feel. The applet which displays the emotional information looks quite cool (click on the We Feel Fine link on the home page) and allows one to search via demographics - age, sex, location. (Warning: the applet seems to take a bit of memory in Firefox.) Anyone up for reading about how some emo twentysomethings from Seattle feel?

And how did I find WeFeelFine? Through their crawler, which didn't have any identifying info in the User Agent - but a lookup on the IP provided the domain name.

Today Slashdot had a front page article about how to create a Web spider on Linux. Aside from the fact that the subject matter just totally excites my inner nerd, I wanted to make a point especially for those who would be writing a spider, bot or crawler for fun and profit.

Here's a true story from not so long ago, when I was a Webmaster. One very busy morning, a crawler was hitting my site and annoying the heck out of me - it was a little too aggressive. I really wanted to ban it, but I saw a URL in the User Agent, so I tracked down the source. The homepage for the bot at the time looked like this, and the site it was crawling for wasn't live yet. At that point, I had a choice - I could just ban the bot and be done with it, or allow the bot to run and hope that the not-yet-live site would someday provide some benefit.

As it turns out, I held my nose and allowed the bot to run. In fact, a few weeks later, it did slow down and was friendlier - so I didn’t mind it as much. The other part to this story is that the site in question went live in April 2006 - and it did show the crawled content.

In other words, if your bot is legit, identify it or face the chance that you could be banned from the very sites you want to crawl. While the shopwiki page at the time wasn't the best example of a parked page, at least I had some information to go on as a Webmaster.
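If you are writing one of these for fun and profit, the courteous minimum looks something like the sketch below: honor robots.txt, put a URL a Webmaster can chase down in your User Agent, and pause between requests. The bot name and URLs are placeholders, and this is Python rather than whatever the Slashdot article used:

# A rough sketch of a polite, self-identifying crawler.
# The bot name and URLs below are placeholders -- substitute your own.
import time
import urllib.robotparser
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

USER_AGENT = "MyExampleBot/0.1 (+http://www.example.com/bot-info.html)"

def fetch(url):
    # Check robots.txt on the target host before requesting the page.
    parts = urlsplit(url)
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return resp.read()

for page in ["http://www.example.com/", "http://www.example.com/about"]:
    html = fetch(page)
    time.sleep(5)   # be friendly: wait between requests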