Rants, raves (and occasionally considered opinions) on phyloinformatics, taxonomy, and biodiversity informatics. For more ranty and less considered opinions, see my Twitter feed. ISSN 2051-8188

Saturday, December 23, 2006

I [Richard Cave] recently gave a brief presentation at the yearly CrossRef member meeting on unique author identification in scientific publishing. I had gathered information for the presentation from speaking with PLoS staff and online articles, but didn’t put pen to paper until the night before the meeting. Given my procrastination and rambling presentation, I think that it’s a good idea to write down my notes so that they are more understandable.

Saturday, December 02, 2006

The November 2006 issue of D-Lib magazine contains an article by Elaine Peterson entitled "Beneath the Metadata: Some Philosophical Problems with Folksonomy" (doi:10.1045/november2006-peterson). She writes:

The choice to use folksonomy for organizing information on the Internet is not a simple, straightforward decision, but one with important underlying philosophical issues. Although folksonomy advocates are beginning to correct some linguistic and cultural variations when applying tags, inconsistencies within the folksonomic classification scheme will always persist...Most information seekers want the most relevant hits when keying in a search query. Folksonomy is a scheme based on philosophical relativism, and therefore it will always include the failings of relativism. A traditional classification scheme will consistently provide better results to information seekers.

This article is one of the most irritating things I've read in a while, and as much as I like philosophy, it reinforces my prejudice that invoking philosophy is almost always a bad idea. Casting the discussion about folksonomy versus classification as a clash between "Aristotelian categories" and "philosophical relativism" just substitutes name-calling for analysis, and the paper makes unsubstantiated claims such as "A traditional classification scheme based on Aristotelian categories yields search results that are more exact", and "A traditional classification scheme will consistently provide better results to information seekers." Er, how do we know this? Do we have data to support this? And, um, what classification scheme does Google use, exactly?

Now, I'm a fan of classifications, and would argue that biological taxonomy has one of the largest, most elaborate classifications that is actively used, complete with detailed rules governing its maintenance. Indeed, much of this iPhylo blog is about a project to add classification to a database (TreeBASE) that eschews classification (to its detriment). However, classification is problematic: there are competing classifications, and within biological taxonomy there is much discussion about how names relate to classifications (see earlier posts More on names (and frogs) and Synonomy and kinds of name). Despite being armed with one of the best developed classifications around, biologists also use informal names to refer to groups, partly because our knowledge of the real world changes, and hence our classifications change (though they often lag behind the latest research).

Classifications can also constrain the kinds of questions we can ask. For example, NCBI's classification of animals lacks the Ecdysozoa, a group whose existence is controversial but which I guess most zoologists would accept. Despite this broad acceptance, NCBI's classification prevents users from asking questions such as "how many sequences have been obtained from members of the Ecdysozoa?" To see this, try typing "Ecdysozoa" as a search term in NCBI's Taxonomy Browser. If you want to ask this question, you need to construct a complex query that specifies all the groups belonging to the Ecdysozoa. This problem motivated a paper Gabriel Valiente and I wrote (doi:10.1186/1471-2105-6-208) that suggested using edit scripts to modify trees so that users can generate their preferred classification using the NCBI tree as a starting point. The other motivation was that the NCBI tree is continually growing as the NCBI database grows.
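To make the edit script idea a little more concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, just an illustration of the general idea under some loud assumptions: the tree is a tiny toy (not the real NCBI classification), it is stored as a child-to-parent map, and the only edit operation supported is a hypothetical "insert" that adds a new group and moves some existing children under it. The names `insert_group`, `apply_edit_script`, and `members` are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

# Toy classification as a child -> parent map.
# Assumption: this is an invented miniature tree, NOT the real NCBI tree.
toy_tree: Dict[str, Optional[str]] = {
    "Metazoa": None,
    "Bilateria": "Metazoa",
    "Arthropoda": "Bilateria",
    "Nematoda": "Bilateria",
    "Mollusca": "Bilateria",
    "Chordata": "Bilateria",
}


def insert_group(tree: Dict[str, Optional[str]], name: str, parent: str,
                 children: List[str]) -> Dict[str, Optional[str]]:
    """Return a copy of the tree with a new node `name` placed under
    `parent`, adopting the listed `children` of that parent."""
    if name in tree:
        raise ValueError(f"{name} is already in the tree")
    if parent not in tree:
        raise ValueError(f"unknown parent {parent}")
    edited = dict(tree)  # leave the starting tree untouched
    edited[name] = parent
    for child in children:
        if tree.get(child) != parent:
            raise ValueError(f"{child} is not a child of {parent}")
        edited[child] = name
    return edited


def apply_edit_script(tree: Dict[str, Optional[str]],
                      script: List[Tuple]) -> Dict[str, Optional[str]]:
    """Apply each operation in the script in order (only 'insert' here)."""
    for op, *args in script:
        if op == "insert":
            tree = insert_group(tree, *args)
        else:
            raise ValueError(f"unsupported operation {op}")
    return tree


def members(tree: Dict[str, Optional[str]], group: str) -> List[str]:
    """List every node that has `group` somewhere among its ancestors."""
    found = []
    for node in tree:
        ancestor = tree[node]
        while ancestor is not None:
            if ancestor == group:
                found.append(node)
                break
            ancestor = tree[ancestor]
    return sorted(found)


# A one-step edit script: add Ecdysozoa and move two phyla under it.
script = [("insert", "Ecdysozoa", "Bilateria", ["Arthropoda", "Nematoda"])]
my_tree = apply_edit_script(toy_tree, script)

print(members(toy_tree, "Ecdysozoa"))  # the unedited tree has no such group
print(members(my_tree, "Ecdysozoa"))   # the edited tree can answer the question
```

The point of the sketch is only that the user's preferred classification is expressed as a short list of edits against a shared starting tree, so a question about "members of the Ecdysozoa" becomes a simple subtree query on the edited tree instead of a hand-built list of groups.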

Given these issues, the flexibility of folksonomies may offer some advantages. Indeed, I think the notion of "tagging" may prove a useful way to think about taxonomic names. Guy and Tonkin's article "Folksonomies: Tidying up tags?" (doi:10.1045/january2006-guy) offers a rather more sensible perspective:

We agree with the premise that tags are no replacement for formal systems, but we see this as being the core quality that makes folksonomy tagging so useful.