Tags have continued to percolate through the ecosystem after their most auspicious introduction in Delicious.com. (Note the phrase “most auspicious”; tags have always been with us.) It’s great to see them increase both because they are a great way to get use out of the craziness while preserving it in its original form for others, and because there is great value in scaling tags, as Flickr has shown.

Amanda Filipacchi has a great post at the New York Times about the problem with classifying American female novelists as American female novelists. That’s been going on at Wikipedia, with the result that the category American novelist was becoming filled predominantly with male novelists.

Part of this is undoubtedly due to the dumb sexism that thinks that “normal” novelists are men, and thus women novelists need to be called out. And even if the category male novelist starts being used, it still assumes that gender is a primary way of dividing up novelists, once you’ve segregated them by nation. Amanda makes both points.

From my point of view, the problem is inherent in hierarchical taxonomies. They require making decisions not only about the useful ways of slicing up the world, but also about which slices come first. These cuts reflect cultural and political values and have cultural and political consequences. They also get in the way of people who are searching with a different way of organizing the topic in mind. In a case like this, it’d be far better to attach tags to Wikipedia articles so that people can search using whatever parameters they need. That way we get better searchability, and Wikipedia hasn’t put itself in the impossible position of coming up with a taxonomy that is neutral to all points of view.
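To make the contrast concrete, here is a minimal sketch of tag-based lookup. The article titles and tags are invented for illustration, and `find_articles` is a hypothetical helper, not anything Wikipedia provides. The point is only that no tag comes "first": a searcher can combine whatever parameters matter to them.

```python
from typing import Dict, List, Set

# Hypothetical articles and tags, invented purely for illustration.
ARTICLE_TAGS: Dict[str, Set[str]] = {
    "Novelist A": {"novelist", "american", "woman", "20th-century"},
    "Novelist B": {"novelist", "american", "man", "19th-century"},
    "Novelist C": {"novelist", "british", "woman", "19th-century"},
    "Poet D": {"poet", "american", "woman", "20th-century"},
}


def find_articles(required: Set[str],
                  tags_by_article: Dict[str, Set[str]] = ARTICLE_TAGS) -> List[str]:
    """Return titles whose tag set contains every required tag.

    The tags are unordered, so nationality, gender, and era are all
    equally valid starting points for a search.
    """
    return sorted(
        title
        for title, tags in tags_by_article.items()
        if required <= tags  # subset test: all required tags are present
    )


if __name__ == "__main__":
    # Slice by nation and profession, ignoring gender entirely.
    print(find_articles({"novelist", "american"}))
    # Slice by gender and era, ignoring nation entirely.
    print(find_articles({"woman", "19th-century"}))
```

In a hierarchy, someone has to decide whether "American" sits above "woman" or the other way around. Here that decision is deferred to whoever is searching.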

Wikipedia’s categories have been broken for a long time. We know this in the Library Innovation Lab because a couple of years ago we tried to find every article in Wikipedia that is about a book. In theory, you can just click on the “Book” category. In practice, the membership is not comprehensive. The categories are inconsistent and incomplete. It’s just a mess.

It may be that a massive crowd cannot develop a coherent taxonomy because of the differences in how people think about things. Maybe the crowd isn’t massive enough. Or maybe the process just needs far more guidance and regulation. But even if the crowd can bring order to the taxonomy, I don’t believe it can bring neutrality, because taxonomies are inherently political.

There are problems with letting people tag Wikipedia articles. Spam, for example. And without constraints, people can lard up an object with tags that are meaningful only to them, offensive, or wrong. But there are also social mechanisms for dealing with that. And we’ve been trained by the Web to lower our expectations about the precision and recall afforded by tags, whereas our expectations are high for taxonomies.

Well, here’s an application of some of the ideas in Everything Is Miscellaneous that I wasn’t expecting: The US GAAP Taxonomy. A post at the XBRL Business Information Exchange says:

The US GAAP Taxonomy was built by the accounting standards setter, the FASB. It was built by accountants. It is a consensus-based product. Not one SEC XBRL filer uses the US GAAP Taxonomy as is to file with the SEC. Every SEC [filer] reorganizes the US GAAP Taxonomy.

But the US GAAP Taxonomy is not built to be reorganized. The structure of the taxonomy is more like a book. Can the US GAAP Taxonomy be reorganized? Of course it can. But it is certainly not optimized to allow for reorganization and reorganization is not even mentioned in the design characteristics. As such, it will cost more and be harder to create and maintain these reorganizations.

So how do you make it easier to reorganize? Many smaller pieces which can be put together as needed is vastly easier for a computer to deal with than having one large piece and trying to break that piece apart. That is one example of what can be done. Another is communicating the metadata which exists in the taxonomy, for example the information modeling patterns employed. A third is to make the existing metadata real metadata, rather than burying it in the labels of the concepts. Another is to add more metadata.
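As a rough illustration of what the post is suggesting, here is a sketch in which each concept is a small piece carrying explicit metadata, so that any grouping can be assembled on demand. The concept names, the metadata keys, and the `reorganize` helper are all assumptions made up for this example; they are not taken from the actual US GAAP Taxonomy or from any XBRL tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """One small piece: a name plus explicit metadata (not buried in a label)."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical concepts and metadata keys, for illustration only.
CONCEPTS: List[Concept] = [
    Concept("Cash", {"statement": "balance sheet", "period_type": "instant"}),
    Concept("Inventory", {"statement": "balance sheet", "period_type": "instant"}),
    Concept("Revenue", {"statement": "income statement", "period_type": "duration"}),
]


def reorganize(concepts: List[Concept], key: str) -> Dict[str, List[str]]:
    """Group concept names by the value of one metadata key.

    Concepts that lack the key are collected under "(unspecified)".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for concept in concepts:
        groups[concept.metadata.get(key, "(unspecified)")].append(concept.name)
    return dict(groups)


if __name__ == "__main__":
    # The same pieces, put together two different ways.
    print(reorganize(CONCEPTS, "statement"))
    print(reorganize(CONCEPTS, "period_type"))
```

Because the grouping is computed from the metadata rather than baked into one book-like structure, adding more metadata adds more possible organizations without anyone having to break the existing one apart.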

The post points out that it’s not that everything about that taxonomy should be thrown into a big pile. There are key data points required by law and to achieve financial integrity. Still, this is not a place I would have thought miscellanizing would help. It seems, however, that I may well be happily wrong.

The disagreement among librarians is, to my mind, itself evidence that there is no one right way to organize physical objects. Classification is pragmatic. You classify in a way that works, but what works depends upon what you’re trying to do. Libraries serve multiple purposes, so librarians have to make hard decisions. If the DDC isn’t the safe and obvious choice, then libraries have to confront the question of their mission. The classification question quickly becomes existential in the JP Sartre sense.

At the end, she quotes from Everything Is Miscellaneous where I say that the Dewey system “can’t be fixed.” I still think that’s right in its context: No single classification system can work for everyone or for every purpose, although they can be better or worse at what they’re trying to do. In that sense, the DDC can be improved, and the OCLC has continuously improved it. But because it’s premised on assigning a single main category to each book, it repeats the limitations of the physical world that require physical books each to go on a single shelf. Any single classification is going to be inapt for some purposes, and is going to embody biases constitutive of its culture. It’s the job of a library and of a bookstore to decide which single way of classifying works best for its patrons, with the obvious recognition that no single way works best for all. Books are miscellaneous. Libraries, bookstores, and the shelves over your desk are not.

Anyway, Barbara’s article is a fascinating look at how libraries are trying to do the best for their patrons, working within the constraints of the physical.

Tim Spalding, founder of the estimable LibraryThing, is calling on us all to create an open shelves classification project to replace Dewey and his pals. LibraryThing is a brilliant implementation of what a library built on a social network of readers can be, so I’m excited about Tim’s new idea.