Becoming a tagging kung-fu master

You’ve heard the hype about tagging. You’ve seen people flocking to sites like Flickr and del.icio.us, where they jump head-first into a pulsing mass of disjointed tags, possibly never to be heard from again. And you’ve wondered: how exactly is tagging worthwhile again?

Any idiot can tag, but you want tags that are useful rather than a disorganized mess. This is not an unreasonable desire, and by completing three simple steps before you start tagging, you too can become a tagging kung-fu master. (Or, if you want more intellectual cred, explicate your personal taxonomy.)

Whether you are tagging in a private, public, or collaborative system, consistency is the byword. Without a consistent pattern you won’t know what tags to assign to items, what tags to search for to find items, or what items you’ll likely get while browsing your tags. The following three steps will help you create a consistent pattern to follow. Even if you’ve been tagging for a while, you may find these steps helpful for refining your knowledge of your own tagging habits and practices. (Please note, however, that these steps are focused on developing a personal tagging system; to optimize your tagging for collaborative use, you would need to develop your system somewhat differently.)

Step 1: Know what

Are you tagging PDFs in Yep, notes in Notae, characters in Avenir, or photos in iPhoto ’08? Whether you’re tagging in one program or several, you need to make a list of the general types of items that you want to tag.

Tagging many different kinds of items does not make planning a tagging system much more complicated, but because you’ll tag different kinds of items differently, you definitely need to think about what you’re going to tag.

Step 2: Know when

Part of knowing your target is knowing what kind of metadata is already available to you (through, say, Spotlight or the Finder) and not duplicating that metadata in your tags. For instance, every file in Mac OS X has a date created and date modified attached to it. As a result, tagging your files with a date is typically a silly idea. Tagging Word documents “word” is also redundant; the system knows which documents are Word documents and finding all of them is only a saved search away. Before you proceed to the third step, you need to make sure you know what information about your target you already have available. You don’t have to write it down if you don’t want to; just be aware.

Although there may be some situations in which you want to tag an item with every possible tag you can think of, most of the time you will want to keep your tags succinct and well-targeted, which means avoiding redundancy. Tags may be extremely flexible, but in some ways they are the least efficient kind of metadata because they carry no indication of what they are marking. When you search the “date modified” field, you know exactly what you’re finding. An “05-31-2007” tag, on the other hand, could be any number of things.
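To make the redundancy check concrete, here is a minimal Python sketch that flags proposed tags which only repeat what the system already records. The `redundant_tags` function, the `FORMAT_TAGS` table, and the date pattern are hypothetical stand-ins; a real workflow would consult Spotlight or the Finder rather than guess from the filename.

```python
import os
import re

# Hypothetical mapping from tag words to file extensions the system already tracks.
# This is an illustrative assumption, not a list taken from Spotlight or the Finder.
FORMAT_TAGS = {
    "word": {".doc", ".docx"},
    "pdf": {".pdf"},
    "jpeg": {".jpg", ".jpeg"},
}

# Matches date-like tags such as "05-31-2007" or "2007-05-31".
DATE_TAG = re.compile(r"^(\d{2}-\d{2}-\d{4}|\d{4}-\d{2}-\d{2})$")


def redundant_tags(filename, tags):
    """Return the tags that merely repeat metadata the file system already records."""
    ext = os.path.splitext(filename)[1].lower()
    redundant = []
    for tag in tags:
        t = tag.lower()
        if DATE_TAG.match(t):
            # Dates duplicate the created/modified timestamps every file already has.
            redundant.append(tag)
        elif ext in FORMAT_TAGS.get(t, set()):
            # A format tag duplicates what the file's type already says.
            redundant.append(tag)
    return redundant


print(redundant_tags("budget.docx", ["word", "05-31-2007", "taxes", "draft"]))
# ['word', '05-31-2007']
```

Only the descriptive tags ("taxes", "draft") survive, which is exactly the kind of information the system cannot infer on its own.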

Step 3: Pick your attributes

This is the heart of a consistent tagging system, and can be summed up in a single question: how do you think about the item you are tagging? For instance, when you are filing or searching for a photo, what do you think of? The location of the photo? The subject or people in the photo? The event taking place when you took the photo? Something else entirely?

Write out a list of the attributes that come to mind when you think about your target items. Ideally, this should be a brainstormed list that includes every attribute you might possibly want to tag. As you make the list for each of your target item types, star the attributes that spring immediately to mind.

Once you have a list, go through it to weed out the attributes that are covered by the item’s non-tag metadata. Then go through it again and pick out the attributes you want to use for tagging. Try to keep it a short, specific list focused on the attributes that sprang immediately to mind. You should also add any attributes that didn’t spring immediately to mind but that you want to make a habit of tagging anyway because they will be useful.
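The weeding process above can be sketched in a few lines of Python. The attribute names, the stars, and the set of attributes covered by metadata are made-up examples, and `pick_attributes` is a hypothetical helper, not part of any tagging program.

```python
# Hypothetical brainstormed attributes for photos: True marks a starred attribute
# (one that sprang immediately to mind).
brainstorm = {
    "location": True,
    "people": True,
    "event": True,
    "date taken": True,
    "camera model": False,
    "mood": False,
    "project": False,
}

# Attributes assumed to be covered by non-tag metadata in this example
# (many photo applications record the date and camera model automatically).
covered_by_metadata = {"date taken", "camera model"}

# Unstarred attributes you still want to make a habit of tagging.
extra = {"project"}


def pick_attributes(brainstorm, covered, extra):
    """Drop metadata-covered attributes, then keep starred ones plus chosen extras."""
    remaining = {a: starred for a, starred in brainstorm.items() if a not in covered}
    return [a for a, starred in remaining.items() if starred or a in extra]


print(pick_attributes(brainstorm, covered_by_metadata, extra))
# ['location', 'people', 'event', 'project']
```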

When you have this list of attributes, you are ready to tag. You should probably put your list somewhere visible, for example on a Post-it by your computer or a virtual sticky note on-screen, at least until you’ve either memorized it or developed good tagging habits.

When you’re tagging, try to consistently attach a tag for every one of the attributes that you’ve selected. The more often you hit all of them, the easier it will be to find files later. Additionally, knowing which attributes you are tagging makes coming up with specific tags much easier. Rather than agonizing over every photograph, you can quickly attach a location, person, and event (or whatever attributes you decide on). Ideally, your attributes and tags should fit into the following sentence: “This [item]’s [attribute] is [tag].” For example, “this photo’s location is New York.”
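As a sketch of the sentence test, assume (hypothetically) that you record each tag alongside the attribute it fills. Most tagging programs store only flat tag names, so the `ATTRIBUTES` table and both helper functions are illustrative assumptions. The code reports which chosen attributes an item still lacks and renders each tag in the “This [item]’s [attribute] is [tag]” form.

```python
# Hypothetical attribute lists per item type, chosen in Step 3.
ATTRIBUTES = {
    "photo": ["location", "person", "event"],
}


def missing_attributes(item_type, tags):
    """List the chosen attributes that this item has not been tagged with yet."""
    return [a for a in ATTRIBUTES[item_type] if a not in tags]


def describe(item_type, tags):
    """Render each tag as a sentence: This [item]'s [attribute] is [tag]."""
    return [f"This {item_type}'s {attr} is {tag}." for attr, tag in tags.items()]


photo_tags = {"location": "New York", "event": "graduation"}
print(missing_attributes("photo", photo_tags))  # ['person']
for sentence in describe("photo", photo_tags):
    print(sentence)
# This photo's location is New York.
# This photo's event is graduation.
```

If a tag cannot be phrased this way, that is a hint it does not belong to any of the attributes you chose.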

The specific tags that you use will doubtless shift over time and circumstance, but the attributes that you are tagging should remain much more stable. By defining a standardized set of attributes for each kind of item that you are tagging and only deviating when necessary (or when the way you think about a given type of item begins to change), you will be able to create a consistent tagging system that helps you find items quickly because it matches the way you think.

And more importantly, you will have taken your first steps on the road to becoming a full tagging kung-fu master. Or to developing a streamlined personal taxonomy. Whichever works for you.

I've been thinking a fair bit about tagging, and have come to the conclusion that there are two fundamentally different tagging strategies, depending on whether your goal is to categorize your information or to make it easy to retrieve a particular item. I'll use the example of bookmarks here, but essentially the same applies to any collection of information.

The "categorize" strategy is essentially the one proposed by Ian. The goal of this strategy is to be able to quickly retrieve all bookmarks belonging to a particular category. It consists of two steps. In the first step, you make a list of tags that you will use, corresponding to your categories. In the second step, you use tags from this list to tag your bookmarks as you add them to your collection. From time to time you might decide to add another tag to your list, if you've found that tag to be useful.

For example, your tags might be microsoft, windows, linux, gentoo, debian, kernel, and cplusplus. If you wanted to see all bookmarks about kernels, whether about the Linux kernel or the Windows XP kernel, you'd use the kernel tag to find them. Or you might use the combination of kernel and linux to find only bookmarks about the Linux kernel.
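Under the "categorize" strategy, a search is simply a set intersection over tags. The minimal Python sketch below assumes a made-up collection in which each bookmark name maps to a set of tags; `find` is a hypothetical helper that returns the bookmarks carrying every requested tag.

```python
# Hypothetical bookmark collection: each bookmark name maps to its set of tags.
bookmarks = {
    "linux-kernel-docs": {"linux", "kernel"},
    "xp-kernel-internals": {"windows", "microsoft", "kernel"},
    "debian-install": {"linux", "debian"},
    "cpp-faq": {"cplusplus"},
}


def find(bookmarks, *tags):
    """Return the bookmarks carrying every one of the given tags (set intersection)."""
    wanted = set(tags)
    return sorted(name for name, t in bookmarks.items() if wanted <= t)


print(find(bookmarks, "kernel"))           # ['linux-kernel-docs', 'xp-kernel-internals']
print(find(bookmarks, "kernel", "linux"))  # ['linux-kernel-docs']
```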

The second, alternative strategy I call the "retrieval" strategy. Its goal is to be able to quickly retrieve one particular bookmark. The idea is that if you roughly remember the content of one particular website, you can quickly find the right bookmark. For example, you would use the tags gentoo, kernel, compilation, and udev to tag a page about compiling the kernel with Udev support in Gentoo Linux.

The fundamental difference between the two strategies now emerges. If you were using the "categorize" strategy, you would likely also tag the Gentoo kernel page with linux, since it clearly falls into that category, but you would not use udev, since you are unlikely ever to have another bookmark about Udev and you want to avoid categories containing single items. On the other hand, if you are using the "retrieval" strategy and want to find that particular page you remember about compiling Udev into the Gentoo kernel, then the tag gentoo makes the tag linux redundant. Indeed, since Gentoo implies Linux (unless your collection contains bookmarks about that particular kind of penguin), the bookmarks tagged gentoo are a subset of those tagged linux, and adding the tag linux to your search does not alter your results. The udev tag, though, will be very helpful, since it narrows your results down to the bookmark you want.
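The subset argument can be checked on a small, hypothetical collection. In this sketch every `gentoo` bookmark also carries `linux`, so adding `linux` to a search leaves the results unchanged, while the specific `udev` tag isolates the one page you were after. The bookmark names and the `tagged` and `find` helpers are illustrative assumptions.

```python
# Hypothetical collection; every gentoo bookmark is also tagged linux.
bookmarks = {
    "gentoo-udev-kernel": {"gentoo", "linux", "kernel", "compilation", "udev"},
    "gentoo-kernel-config": {"gentoo", "linux", "kernel", "compilation"},
    "gentoo-portage-tips": {"gentoo", "linux", "portage"},
    "debian-kernel-build": {"debian", "linux", "kernel", "compilation"},
}


def tagged(tag):
    """All bookmarks carrying a single tag."""
    return {name for name, t in bookmarks.items() if tag in t}


def find(*tags):
    """Bookmarks carrying every given tag."""
    wanted = set(tags)
    return sorted(name for name, t in bookmarks.items() if wanted <= t)


# The gentoo bookmarks form a subset of the linux bookmarks...
print(tagged("gentoo") <= tagged("linux"))  # True
# ...so adding "linux" to a gentoo search changes nothing.
print(find("gentoo", "kernel", "linux") == find("gentoo", "kernel"))  # True
# A specific tag like "udev" narrows the results to the single wanted page.
print(find("gentoo", "kernel", "udev"))  # ['gentoo-udev-kernel']
```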

To summarize, the "categorize" strategy calls for relatively broad tag categories (such as linux or windows, but not compilation or udev); otherwise your tag list would quickly become a cluttered mess. For the "retrieval" strategy, on the other hand, the tags should be as specific as possible. This will clutter up my tag list, you say? No matter, as long as you don't use the tags to browse through your bookmarks.

I am curious if there is a way to reconcile the two strategies, to get the "best of both worlds".

(As an aside, note that the "categorize" strategy is not much different from conventional bookmark handling in a web browser: first you create your bookmark folder hierarchy, then you add bookmarks to these folders as you surf the web. The difference is that with folders you can only make sets and subsets, not intersections of sets: if a bookmark belongs to a particular folder, it cannot also belong to a folder that is neither a subset nor a superset of the first folder. Another difference between the two approaches is that no tagging system I know of allows you to define subset relationships between tags. For example, you could define gentoo as a sub-tag of linux, so that anything you tag with gentoo would also be tagged with linux. Similarly, you could define linux as a sub-tag of the operating systems tag, and so on.)
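The sub-tag idea from the aside could look something like the following Python sketch. Since no tagging system the author knew of offered this, the `PARENTS` table and the `expand` function are purely hypothetical: each tag lists its direct parents, and `expand` adds every ancestor, so tagging a page gentoo also makes it findable under linux and operating systems.

```python
# Hypothetical sub-tag declarations: each tag maps to its direct parent tags.
PARENTS = {
    "gentoo": {"linux"},
    "debian": {"linux"},
    "linux": {"operating systems"},
    "windows": {"operating systems"},
}


def expand(tags):
    """Return the tags plus every ancestor implied by the sub-tag declarations."""
    result = set(tags)
    stack = list(tags)
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            # Skipping tags already seen keeps this safe even if declarations loop.
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


print(sorted(expand({"gentoo", "udev"})))
# ['gentoo', 'linux', 'operating systems', 'udev']
```

Because the broad tags are implied rather than typed, a scheme like this might let you tag as specifically as the "retrieval" strategy suggests while still browsing by the broad categories of the "categorize" strategy.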

43 Folders is powered by Drupal, which rules. The site was designed and made wonderful by the astounding Chris Glass. Ben Durbin is the sine qua non and our personal consigliere. 43f’s web hosting is sponsored by A2.