Thursday, July 7, 2005

Folksonomies vs generated categorizations

Several friends have been pointing me at del.icio.us for quite a while now, which has led me to give some thought to folksonomies.

I'm reminded of a talk at the Web 2.0 conference in which Google demonstrated a research project that auto-categorizes search results. The thrust was that when you do a search, rather than just returning a flat list of links, Google would examine the context of the search terms within the resulting pages to work out which sense of the term each page uses. That is, "bismuth" can be a metal or a ship. So Google would return "metal" and "ship" for your search on "bismuth", and you could select the category you meant to get a more precise result.
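To make the idea concrete, here is a toy sketch of grouping results by the words that surround the search term. It is not how Google does it (the talk didn't go into that level of detail); the cue-word table, the `categorize` function, and the sample snippets are all my own made-up assumptions, and a real system would presumably learn its categories rather than have them typed in by hand.

```python
from collections import defaultdict

# Hypothetical cue words for each sense of the term; purely illustrative.
CATEGORY_CUES = {
    "metal": {"element", "alloy", "melting", "atomic"},
    "ship": {"vessel", "hull", "crew", "sailed"},
}

def categorize(snippets):
    """Group result snippets by the category whose cue words they share most."""
    groups = defaultdict(list)
    for snippet in snippets:
        words = set(snippet.lower().split())
        # Score each category by how many of its cue words appear in the snippet.
        scores = {cat: len(words & cues) for cat, cues in CATEGORY_CUES.items()}
        best = max(scores, key=scores.get)
        # A snippet with no cue words at all stays uncategorized.
        label = best if scores[best] > 0 else "uncategorized"
        groups[label].append(snippet)
    return dict(groups)

# Made-up snippets standing in for search results.
results = [
    "Bismuth is a brittle element with a low melting point",
    "The Bismuth sailed with a crew of forty",
    "Bismuth fan page",
]
grouped = categorize(results)
```

Here `grouped` maps each category name to the snippets that landed in it, which is roughly the list of choices the searcher would be offered.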

Folksonomies (del.icio.us being the obvious example) are really interesting from the point of view of enabling an architecture of participation around categorization, but will result in a somewhat colloquial categorization on which subsequent viewers can't truly depend. At best. I mean, if we listen to George Carlin ("Toledo Window Box"), we might imagine some folks will tag with "bush", some with "hemp", some with "boo", some with "smoke", some with "weed", some with "gauge", some with "grass", some with "tea", some with ... you get the idea. We wind up with a number of disjoint islands all referring to the same concept. And because the terms are colloquial, what I mean by 'tea' won't be the same as what you mean by 'tea', and you will undoubtedly be surprised and amused when you visit some places I've tagged thus.
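A small sketch shows both problems at once. The bookmarks, users, and URLs below are invented for illustration, and `index_by_tag` is just a hypothetical stand-in for whatever a tagging site does internally; the only assumption that matters is that lookups are by exact tag.

```python
from collections import defaultdict

# Hypothetical bookmarks as (user, url, tag); all values are made up.
bookmarks = [
    ("alice", "http://example.com/a", "tea"),
    ("bob", "http://example.com/b", "grass"),
    ("carol", "http://example.com/c", "weed"),
    ("dave", "http://example.com/d", "tea"),  # dave means the beverage
]

def index_by_tag(bookmarks):
    """Map each tag to the set of URLs bookmarked under exactly that tag."""
    index = defaultdict(set)
    for _user, url, tag in bookmarks:
        index[tag].add(url)
    return dict(index)

index = index_by_tag(bookmarks)

# Same concept, different tags: the two sets never meet.
islands_are_disjoint = index["grass"].isdisjoint(index["weed"])

# Same tag, different meanings: alice's and dave's links sit side by side.
tea_links = index["tea"]
```

Someone browsing "grass" never sees what was filed under "weed", while someone browsing "tea" gets alice's links mixed in with dave's.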

A folksonomy isn't - can't be - very authoritative. This is probably what Blaise Cronin was referring to: "Undoubtedly, these are the same individuals who believe that the free-for-all, communitarian approach of Wikipedia is the way forward. Librarians, of course, know better." Free-for-all classification is, one suspects, precisely the problem the Dewey Decimal Classification System was designed to address.

All this sets up something of a competition between Google's category extraction and del.icio.us. It'll be fun to watch it play out.