Meet the Crawlers

I haven't been to one of these sessions in about a year, so I decided to check it out again. These sessions are always full, which I think is why it's on the last day (if it were my conference, I would put it on the last day too). That is also why I am here: people want to hear about this session, but you normally won't find any 'experts' in this room (besides the people on the panel; I hope I don't insult anyone). Danny Sullivan is moderating this session. Most of the people in the room are first-time SES attendees.

Michael Palka from Ask Jeeves is up first. He starts off with the number of properties they own, which they purchased over the past year. This seems to be their way of saying they are different. They are the number 5 search engine overall. He then gives an overview of how search engines work; I'll spare readers that part of the presentation (you already know how spiders dig around and eat up all your bandwidth). Next he gets into their "subject-specific popularity" and "communities." Check out the Ask Jeeves and Teoma forum for my post on this, if you're interested. He goes into the problems with crawling the Internet... He then gives up the secret sauce on how to rank well: (1) page content, (2) meta tags, and (3) links. :) He said to put a date stamp on the site showing when it was last updated; this helps both users and the bot.

Jen Fitzpatrick, director of engineering at Google, was up next. She starts off with PageRank and explains it in a theoretical sense. She then talks about text analysis and how the crawlers work: first looking at news, then fresh content, and then the rest of the Web. Then she goes over the Webmaster guidelines, Google's dos and don'ts. Do make your site's content relevant, do submit to directories, do let others link to you, and read google.com/webmasters/. Don't cloak, don't send automated queries to Google, don't hide text and links, and don't do other things... She discusses the 301 redirect, the HTTP If-Modified-Since header (respond with 304 Not Modified when the page hasn't changed), and the robots.txt file. She then gets into AdWords and AdSense a bit. She also brings up the Google Search Appliance, a combined hardware and software solution meant for corporate America (I thought that product wasn't doing too well). She then talks very briefly about the other Google products: Gmail, the toolbar, etc. She says Google is very active with Webmasters, e.g. GoogleGuy.
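Fitzpatrick only touched on these mechanics, so here is a minimal Python sketch of how a server might answer a crawler: a permanent 301 for a page that has moved, and a 304 Not Modified when the crawler's If-Modified-Since date is no older than the page's last change. The MOVED table, the paths, and the respond() function are hypothetical illustrations, not anything Google described; a real server would also send the page body with its 200 response.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical table of pages that have moved permanently: old path -> new path.
MOVED = {"/old-page.html": "/new-page.html"}


def respond(path, headers, last_modified):
    """Return (status, response_headers) for a crawler's GET request.

    headers is assumed to be a plain dict keyed by canonical header names;
    last_modified is assumed to be an aware datetime for the page's last change.
    """
    # Moved pages get a permanent redirect so crawlers update their index.
    if path in MOVED:
        return 301, {"Location": MOVED[path]}

    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    lm = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    response_headers = {"Last-Modified": format_datetime(lm, usegmt=True)}

    since_header = headers.get("If-Modified-Since")
    if since_header:
        try:
            since = parsedate_to_datetime(since_header)
        except (TypeError, ValueError):
            since = None  # Unparseable date: ignore the header and send the full page.
        # Unchanged since the crawler's copy: 304 with no body saves bandwidth.
        if since is not None and since.tzinfo is not None and lm <= since:
            return 304, response_headers

    return 200, response_headers


if __name__ == "__main__":
    page_date = datetime(2004, 7, 31, 12, 0, tzinfo=timezone.utc)
    print(respond("/index.html", {"If-Modified-Since": "Sat, 31 Jul 2004 12:00:00 GMT"}, page_date))
    print(respond("/old-page.html", {}, page_date))
```

The comparison is "not newer than" rather than exact equality, so a crawler holding a copy dated at or after the last change gets the cheap 304.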

Ken Moss from MSN Search is on this panel for the first time. He says he is very excited to be here. He brings up a live MSN Search page (search.msn.com) and searches for search engine strategies conference; he said these results are not from MSNBot, they are provided by Inktomi. So why are they developing a new engine? He said because there is still a lot of innovation left in this technology, and in two years the industry will be very different. He also showed how the ads on the search site differ now compared to before. He searched for flowers and it came up with good results, but the third result had those >> that Google disallows but Inktomi allows. He told everyone to go to sandbox.msn.com, then skipped down to the MSN Search Technology Preview. He runs the search engine strategies query again and you see a ton of duplicate results from the same site (he doesn't point that out, of course); he asks for feedback so they can improve it. They want as much feedback as possible; that is how they plan to get better. There is a feedback link at the top of the page, a little bar next to each result so you can give feedback on that specific result, and you can also email them at msnbot@microsoft.com. This site will be taken down August 8th and come back later, improved. So please provide feedback soon.

Tim Mayer from Yahoo! Search was up next (nice guy; we chatted a bit in the speaker room). They took the search technologies they purchased and made them their own. They power half of all U.S. web searches. Yahoo! has 260+ million users. They want to discover all the content on the Web; 99% of the index is free-crawled, and the other 1% is paid inclusion (PFI). Yahoo! has grown significantly, focusing on freshness and volume. They look at freshness in two ways: updated content and new pages. They have CAP (their Content Acquisition Program). They support a Crawl-delay directive, which tells the bots to wait x seconds between requests. What if I unsubscribe from PFI; will I be dropped? If you were in before, you will be in after, as long as they can reach you through a natural crawl. Yahoo! has RSS support, and they recommend adding your RSS feed to My Yahoo!. They support Atom in My Yahoo!, but the bots do not support it.
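Crawl-delay is a non-standard robots.txt extension, so support varies from crawler to crawler. The sketch below uses Python's standard urllib.robotparser (its crawl_delay() method needs Python 3.6 or later) to show how a polite crawler might read the value. The robots.txt contents, the 10-second delay, and the helper name crawl_delay_for are illustrative assumptions; "Slurp" is the user-agent token of Yahoo!'s crawler and is used here only as an example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: ask Yahoo!'s crawler ("Slurp") to wait 10 seconds
# between requests, and keep every crawler out of /cgi-bin/.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /cgi-bin/

User-agent: *
Disallow: /cgi-bin/
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(crawl_delay_for("Slurp"))      # the Slurp group sets a 10-second delay
    print(crawl_delay_for("Googlebot"))  # falls back to "*", which sets no delay
```

A crawler honoring the directive would pause that many seconds between fetches to the same host. An agent not named in any group falls back to the * group, which here sets no delay, so crawl_delay() returns None.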

Q: Is anyone using click data from ISPs to rank sites?
A: Google said it doesn't comment on its ranking algorithms, but Google does not purchase ISP data on who clicks on which sites. MSN said privacy is a big concern for them, but they do have opt-in data analysis. Yahoo and Ask do the same thing.