Meet The Blog & Feed Search Engines

I was looking forward to the "Google Print & The Copyright Debate" session, but it was cancelled. :( So now I am going to a session about blog and feed search engines; they are all filled with spam anyway. Detlev is moderating this session. I believe Jeremy Zawodny from Yahoo! was unable to make it, so someone may take his place.

First up is Kaushal Kurapati, Senior Product Manager of Search at Ask Jeeves, to discuss Bloglines. Bloglines has 1 billion articles indexed. The "feeds that matter" include 1.3 million feeds with at least one subscriber, and there are 2 to 3 million new articles per day. Bloglines is a free online service for searching, subscribing to, creating and sharing news feeds and blog content. RSS or Atom works, and 10 languages are supported. You can track buzz, and track the future with search subscriptions. New features: hotkeys, package tracking, weekly and monthly horoscopes, and winning lottery numbers. He shows screen captures of Bloglines. The average Bloglines user visits 4x per day, which makes for a very active audience.

Bob Wyman, CTO and Co-founder, PubSub is up next. His talk is "Bringing Light to the Gray Web": there is the Visible Web, the Hidden (Dark) Web, and the Gray Web (the changing Web and the structured Web). PubSub takes the queries in and indexes the queries, then checks new documents against them live as they appear (unlike a typical search engine, which indexes the documents and runs queries against that index). As they find what you are looking for, they store it for you and tell you about it. He gives an overview of the technical process. The second problem they work on is "structured blogging," where they allow you to specify more information about why you are writing a blog entry, so the entry becomes more structured. They have about 20 million blogs, 50k newsgroups, etc. He shows off the "LinkRank" feature, which ranks blogs and sites based on how many links they get on a trend (time-sensitive) basis. Some cool stats. He actually threatened black hats: if they do things that are "unnatural," then "we" (as search engines) would do "nasty things" to you (black hats).
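To make the "index the queries, then match documents as they arrive" idea concrete, here is a minimal sketch in Python. It is not PubSub's actual implementation; the class name `ProspectiveMatcher`, its methods, and the simple rule that every query word must appear in the document are all assumptions made for illustration.

```python
from collections import defaultdict


class ProspectiveMatcher:
    """Hypothetical sketch: store queries up front, match documents as they arrive."""

    def __init__(self):
        self.subscriptions = {}               # subscription id -> set of required terms
        self.term_index = defaultdict(set)    # term -> ids of subscriptions using that term
        self.matches = defaultdict(list)      # subscription id -> ids of matching documents

    def subscribe(self, sub_id, query):
        """Index a query (assumed semantics: all of its words must be present)."""
        terms = set(query.lower().split())
        self.subscriptions[sub_id] = terms
        for term in terms:
            self.term_index[term].add(sub_id)

    def publish(self, doc_id, text):
        """Match one new document against the stored queries and record the hits."""
        words = set(text.lower().split())
        # Only subscriptions sharing at least one term with the document are candidates.
        candidates = set()
        for word in words:
            candidates |= self.term_index.get(word, set())
        matched = sorted(s for s in candidates if self.subscriptions[s] <= words)
        for sub_id in matched:
            self.matches[sub_id].append(doc_id)  # "store it for you"
        return matched                           # "and tell you about it"


matcher = ProspectiveMatcher()
matcher.subscribe("alice", "blog spam")
matcher.subscribe("bob", "podcast")
print(matcher.publish("post-1", "New blog spam filter released"))
```

The point of the design is that the work per new document depends on the queries that share a word with it, not on the size of any document archive, which is what makes the live matching described above plausible at blog scale.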

Scott Johnson, Founder and CTO, Feedster. They are launching a new design soon, and we are the first to see it. How does Feedster get data? End user submission, crawler discovery, a ping server of their own, monitoring industry ping servers (pingomatic, weblogs.com), feedmesh (a distributed network of ping servers), and batch data loads (45k plus podcasts). What is a blog? The original assumption was that 1 feed equals a blog. No, that is not correct! A blog is a non-reviewed, non-edited publication that is generally the result of a single person's effort. They are adopting a tagged data model for feeds (blog, news, podcast, sale, forum, etc.): every feed has one master tag which defines its nature, and they are also all tagged with an "everything" tag. It's not just blog search though; Yahoo got it right (kinda). It's search across rapidly changing data, with easy access to the latest across categories. The all new Feedster homepage is now revealed. On the top you see "master buckets" for search ("blogs, news, podcasts, etc."). He then showed the new results landing page, which enables you to filter by language. The default area on the left is blogs. There is a tag button to allow you to tag, and your tags flow into a pink box on the top right that says "my tags". They also have a best results box in orange, which brings up the best results based on your search. Then there is a blue box for "news articles" that has recent news articles, and a green box for podcasts. Everything is taggable, blog searches automatically search blogs, news and podcasts, there is search zooming (jump from blog search to news search, etc.) and 1-click end user spam reporting, and there is more coming.
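The tagged data model Scott described (one master tag per feed, plus an "everything" tag on all of them, plus user tags) could be sketched roughly like this. The `Feed` class, the `feeds_in_bucket` helper, and the example.com URLs are hypothetical names made up for illustration, not Feedster's real schema.

```python
from dataclasses import dataclass, field

EVERYTHING = "everything"  # the tag every feed carries, per the talk


@dataclass
class Feed:
    url: str
    master_tag: str                               # exactly one: "blog", "news", "podcast", ...
    user_tags: set = field(default_factory=set)   # tags added by end users ("my tags")

    def all_tags(self):
        """Master tag, the shared 'everything' tag, and any user tags."""
        return {self.master_tag, EVERYTHING} | self.user_tags


def feeds_in_bucket(feeds, tag):
    """Return the feeds that belong to a 'master bucket' or carry a given tag."""
    return [f for f in feeds if tag in f.all_tags()]


feeds = [
    Feed("http://example.com/a.rss", "blog", {"seo"}),
    Feed("http://example.com/b.rss", "news"),
    Feed("http://example.com/c.rss", "podcast", {"seo"}),
]
print([f.url for f in feeds_in_bucket(feeds, "blog")])
print(len(feeds_in_bucket(feeds, EVERYTHING)))
```

Under this sketch, "search zooming" from blog search to news search is just the same query run against a different tag, and a search over "everything" covers all feed types at once.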

Nathan Stoll, Product Manager, Google. He said he is the product manager of Google News, but he is here to answer questions on Google Blog Search, the new reader (lens) and Google News. You may notice there are multiple blog search UIs; depending on where you come from, they serve up different user interfaces.

Q & A:
Q: Something about blog spam...
A: Bob Wyman says it will be under control soon; he calls splog generators "scum." Then Scott Johnson said Bob is arrogant to think that it will be under control. He said that, just like email spam, it will forever be an issue. Bob retracts his statement. Then Kaushal compares it to search spam as well: it's a never-ending battle. Nathan adds that his colleague Matt Cutts recognizes that false positives happen as well.