Friday, April 11, 2008

Today I'm stuck at home trying to recover from the latest disease Ethan has brought home. Here are my 5 items of interest for the week:

Google recently announced their newest attempt at crawling the Deep Web. When their crawler sees a form on a "high-quality site", it enters words and selects various checkboxes and radio buttons. The form is then submitted, and the resulting page is crawled. Very cool.

Google also announced the Google App Engine. It's a service that allows you to run your web applications on Google's infrastructure so they can grow to accommodate a large amount of traffic. The service is free until an application exceeds a set disk space and bandwidth quota. Unfortunately, there's a limit of 10,000 developers, and I wasn't quick enough to sign up, so I'll have to wait until they increase the limit.
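Back to the Deep Web crawling for a second: here's a rough Python sketch of what "entering words and selecting options" might look like for a simple GET form. This is only my guess at the general idea, not how Google actually does it. The function name `candidate_submissions`, the example.com URL, and the field names and values are all made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_submissions(action_url, fields):
    """Yield one GET URL per combination of candidate field values.

    `fields` maps each form field name to the list of values to try
    (guessed keywords for text boxes, listed options for radio buttons).
    """
    names = list(fields)
    for combo in product(*(fields[name] for name in names)):
        yield action_url + "?" + urlencode(dict(zip(names, combo)))


# Hypothetical form: one text box and one radio button group.
form_fields = {
    "q": ["used cars", "trucks"],
    "condition": ["new", "used"],
}

urls = list(candidate_submissions("http://example.com/search", form_fields))
for url in urls:
    print(url)
```

Each generated URL would then be fetched and crawled like any other page. The number of combinations grows quickly with the number of fields, so a real crawler would have to be picky about which values it tries.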

Jansen, Booth, and Spink have developed a system that can automatically classify a search query as navigational, transactional, or informational. They used their classifier on a large dataset of search engine queries and were accurate 74% of the time. Based on their results, approximately 80% of all search engine queries are informational, 10% transactional, and 10% navigational.
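To give a feel for what a query classifier like this could look like, here's a toy rule-based sketch in Python. To be clear, these are not the rules Jansen, Booth, and Spink used. The keyword list, the URL pattern, and the function name `classify_query` are my own assumptions, picked only to illustrate the three categories.

```python
import re

# Assumed cue words for transactional intent (illustrative only).
TRANSACTIONAL_TERMS = {"buy", "download", "order", "purchase"}

# Assumed cue for navigational intent: the query looks like a web address.
NAVIGATIONAL_PATTERN = re.compile(r"www\.|\.com\b|\.org\b|\.net\b|\.edu\b")


def classify_query(query):
    """Return 'navigational', 'transactional', or 'informational'."""
    text = query.lower()
    if NAVIGATIONAL_PATTERN.search(text):
        return "navigational"
    if any(word in TRANSACTIONAL_TERMS for word in text.split()):
        return "transactional"
    # Anything without a navigational or transactional cue is treated
    # as informational.
    return "informational"


for q in ["www.harding.edu", "download firefox", "how does a search engine work"]:
    print(q, "->", classify_query(q))
```

Falling through to informational by default fits the finding that most queries are informational, but a handful of hand-picked rules like these would surely misclassify plenty of real queries.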

Congratulations to the Harding Programming Team on their first place finish at CCSC-MS. We showed them Razorbacks how to program...