Sunday, February 06, 2005

why dmoz?

I have been perusing parts of the . The book is very good, although I am very stale on some of the math. (Side note: to judge the book by its color, this is one of the best books I have held. The color, paper type, weight, size and font are VERY nice. That is why I posted the full-color link.) It got me thinking about a few web-related ideas.
First, the best explanation of Google's PageRank algorithm I have heard to date: imagine a random traveler on the graph structure of the web who, at each page, follows one of the available links at random. A page's rank is then proportional to how often, over a long walk, this random traveler stops on that page. Cool. For a somewhat different explanation (and a REALLY COOL animation), see Doug Gregor's (of boost.org fame) project.
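To make the random-traveler picture concrete, here is a small Python sketch that computes rank two ways on a made-up four-page graph: by simulating the traveler and counting visits, and by the usual power iteration. Note one assumption beyond the description above: as commonly formulated, PageRank lets the traveler occasionally jump to a random page (the "damping" factor, 0.85 here) so the walk never gets stuck. The graph, function names and parameters are all hypothetical, not Google's implementation.

```python
import random

# Hypothetical toy web graph: each page maps to the pages it links to.
GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank_power(graph, damping=0.85, iterations=100):
    """Power iteration: repeatedly redistribute rank along outgoing links."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the (1 - damping) "random jump" share...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                # ...plus an equal split of each linking page's rank.
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Dangling page (no links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

def pagerank_random_walk(graph, steps=200_000, damping=0.85, seed=0):
    """Random traveler: the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        links = graph[current]
        if links and rng.random() < damping:
            current = rng.choice(links)  # follow a random outgoing link
        else:
            current = rng.choice(pages)  # jump to any page at random
    return {p: count / steps for p, count in visits.items()}

power = pagerank_power(GRAPH)
walk = pagerank_random_walk(GRAPH)
for page in GRAPH:
    print(page, round(power[page], 3), round(walk[page], 3))
```

The two columns should agree to roughly two decimal places, with C (the page everybody links to) ranked highest and D (linked to by nobody) getting only the random-jump share.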
Second, the book mentioned the obvious limitations of human-edited directories such as dmoz.org. The real killer here is coverage: there is no way to get enough volunteer power to cover a significant chunk of the Web, which is what it would take to make the directory approach useful. So why is there still some push behind these directories? I think the search engines are among the big beneficiaries here. Why? As the book shows, drawing on several studies, even the search engines' coverage is not complete (for technical reasons). So let's put the pieces together: search engine graph traversal is not complete, but their ranking depends on the graph traversal. What to do? Why not seed the search with some nice, non-spammed results of volunteer work? That's what I think, anyway; I would love to hear other opinions.
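A toy illustration of the seeding argument: a crawler can only find pages reachable by following links from wherever it starts, so a cluster of sites with no inbound links from the "mainstream" web stays invisible until something like a directory entry points into it. The link graph, site names and `crawl` function below are hypothetical, and a real crawler is of course far more involved (politeness, scheduling, duplicate detection).

```python
from collections import deque

# Hypothetical link graph. The hobby sites link out to shop.com, but nothing
# in the popular.com cluster links back to them.
LINKS = {
    "popular.com": ["news.com", "shop.com"],
    "news.com": ["popular.com"],
    "shop.com": [],
    "hobby-site.org": ["hobby-forum.org"],
    "hobby-forum.org": ["hobby-site.org", "shop.com"],
}

def crawl(seeds, links):
    """Breadth-first crawl: return every page reachable from the seed URLs."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

engine_seeds = ["popular.com"]
directory_seeds = ["hobby-site.org"]  # e.g. a hand-edited directory listing

print(sorted(crawl(engine_seeds, LINKS)))
# ['news.com', 'popular.com', 'shop.com']
print(sorted(crawl(engine_seeds + directory_seeds, LINKS)))
# ['hobby-forum.org', 'hobby-site.org', 'news.com', 'popular.com', 'shop.com']
```

Adding a single directory seed doubles the hobby cluster's visibility from zero to complete, and since link-based ranking only sees the crawled graph, the newly found links (here, hobby-forum.org pointing to shop.com) also feed into the ranking.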
Follow-up: I wrote this up without googling it first. Interesting corroboration from jdMorgan here, and a great thread here.