Safe Search is Off

The index asserts nothing; it only says "There!" It takes hold of our eyes, as it were, and forcibly directs them to a particular object, and there it stops.
— C. S. Peirce

A new plague of time-squandering has descended on the Lounge; it has easily pushed out earlier rivals, like playing Scrabble with friends on Facebook and watching old Patsy Cline videos on YouTube. Now the Loungeurs are in the grip of Google Image Labeler. The devil himself could not have devised a better method of timesucking: you can easily incorporate it into any multitasking scheme; it asks, nominally, only two minutes out of your day; and it's all about words!

The initial thrill of playing the game was the challenge of coming up with perfect labels: words that most succinctly describe a small image appearing on your screen. This thrill quickly disappears, however, because you find that perfect labels don't win points, and points are what it's about: not only because everyone likes scoring, but because you cannot move on to a new image until you and a perfect stranger who is your playing partner in cyberspace land on the same label for the current picture (and thus each score). So you evolve very quickly from trying to find the perfect label (an activity that you could almost justify to yourself since it exercises the depth and breadth of your imagination and vocabulary) to trying to find the lowest common denominator label: the word most likely to come into the mind of your partner, whom you develop a composite picture of after a few games. The composite picture that develops of the typical Google Image Labeler, in our view, is a twenty-something male, steeped in contemporary pop culture but perhaps not much else, who is probably multislacking while at work, or perhaps taking a break from gaming online. Is this who should be entrusted with the monumental task of indexing images on the Internet?

Google Image Labeler will (according to Google) help "improve the relevance of image search for users like yourself." While in the thrall of it we are often put in mind of Charles Sanders Peirce: he's a late-19th/early-20th century American philosopher, much beloved in the Lounge not only for his clear thought but because he was also a lexicographer: he contributed more than 5,000 definitions to the Century Dictionary. Peirce devoted a lot of thought to signs, by which he meant something intermediary between an object and the mind. His most famous classification of signs is threefold: the ikon, the index, and the symbol. Ikons resemble their objects; any photograph, image, or realistic drawing or painting is an example of an ikon. Indices bear a real relationship to their objects, such that a change in the object would be reflected in a change in the index: an index is an indication of another thing, in the most literal sense. Symbols bear an arbitrary relationship to their objects, and are connected to them only by virtue of usage and convention; most words are symbols.

In Google Image Labeler, players examine ikons. They assign symbols to them (words, or "labels"), with the ostensible view of generating an index. This would be an index in two senses: the conventional one, being an organized list of items with systematic reference to another thing; and a Peircean index, in that the index as a whole would bear a real relationship to the set of images labeled.

Here's the problem: Google Image Labeler is currently designed to attract a maximum of inferior symbols (labels, in Google's terminology), and a minimum of good ones. A couple of examples:

The second picture here would also attract (and Google would reward) two other labels that we have not listed, referring to features of Ms. Bardot's anatomy — that's the typical level of cognition where minds meet in Google Image Labeler.

The value of an index — as any user of a reference book can attest — is the thoroughness, aptness, and granularity of its contents. Indexers work in probably even more obscurity than lexicographers, but the service they provide is immeasurable: they enable us to find a needle in a haystack. A good index effectively imposes a numbered, three-dimensional grid on the haystack and tells us which numbered box to look in.

Indexing a book, though it is a highly specialized skill, has an inbuilt simplicity: it equates like with like, that is, words with words: words found in an index are overwhelmingly also found in their reference, and thus a book index actually bears an ikonic relationship with its object. Indexing something other than words (images, smells, sounds, and so forth) is inherently more complex: it requires the assignment of words to things that are not words, and mainly do not contain words. This relationship can be symbolic only. Surely, then, this is a job that, to be done properly, requires even more specialized skill. So it seems a pretty far stretch to think that Google is going to succeed in generating a valuable word index by crowdsourcing the job to anyone willing to have a crack at it: their labeler seems better suited to collecting noise than signal, and everyone who uses search engines can attest that excess noise is already a big part of the problem in trying to find information online.

The proviso in our observation is that we don't know what Google is going to do with the data it generates, and of course it is possible, if not likely, that the marvelous minds there have anticipated or discovered the shortcomings we note and have found a way to deal with them. One thing is clear: they will have no shortage of data, because they have cleverly entrained an army of volunteers who will feed their datastream 24/7.

Some good introductions to Peirce's semiotics, which we recommend as being just as stimulating and much more edifying than labeling images, can be found here:

Google Image Labeler is an example of a gwap: a "game with a purpose." If you're into that sort of thing, you will be able to waste (or put to good use! it all depends on your view) a considerable amount of time here:

Luis von Ahn, Assistant Professor at Carnegie Mellon University, developed a game he called the ESP Game, which is the basis for Google Image Labeler. He gave a fascinating talk about his work to Google employees, in which he addresses some of the questions we raise, while leaving others a bit dangling:

As a library and information science student, these are very pertinent issues for me. Thanks for the article! I played GWAP for a while, as an experiment. You can build a profile and also chat with the other person at the end of a game, but no one ever took me up on it. (The 20 something gamer theory comes to mind--I'm a 50 something grad student). I thought two minds thinking alike (on GWAP; I have not done Google Image Labeler) would be the perfect meeting ground. Perhaps they should develop some sort of tiered membership and place more weight on two qualified people playing the game and bill it as a dating site, as well as get better results. Tags or labels are very much influenced by our associations with an image and our common knowledge of the world and our training, of course. What a great way it would be to meet another like mind. Isn't that what we do, when two people actually meet, play a game of association and find out if the other person knows what you know? It's part of chemistry!

Thanks for indulging this tangent. I tried indexing in one class and I agree, it is incredibly difficult and underappreciated. When I come across a book that has no index, I immediately place the book in a lower category of quality and tend to think these are self-published books.

Thanks for saving me time: I won't be tempted to go to that site. While doing graduate work from August 2006 to August 2008, I appreciated using Visual Thesaurus, since I could find the best word to portray what I wanted to say in a paper or my thesis. And of course the end of grad work doesn't mean that the need to use good words to say what I want to impart is no longer important. My time is worth a lot, and there is not enough of it to do all the things I want to do, so not engaging in something that doesn't use my time optimally is great.

I'm thinking that Google must already have thought of the flaws in specificity you're pointing out. I tried it a few times to see what it was like. The "off-limits" list makes me think that the same pictures are shown to multiple pairs of players, and each time, more and more commonly-matched labels are added to this list. This forces the process to gradually become more and more specific, until (in the example you showed), two players who happen to know that the person in the photograph is Brigitte Bardot label it as such. Yes, it might take hundreds of "plays" before this happens, but if the pool of players is large enough (and evidently they must think it will be), it's bound to happen sooner or later.

Well, now, don't we agree that the purpose of concordance is the highest probability of semantic agreement? Why isn't this a form of concordance, climbing, as it were, to heights founded on more common stuff?

As someone who frequently looks for appropriate images to use in signs for book displays, I found this very enlightening. I was looking for a woman smoking under a lamppost and was surprised when "woman smoking lamppost" turned up some pictures that were close to what I wanted although nothing was exactly as I had imagined.

Based on your idea for best label, I searched for "Lili Marlene" and found some things that were also close but nothing quite right; "woman smoking lamppost" was better.

I have decided that as a user of images, I find the 20-something-gamer is more helpful, unless I have a specific person or place in mind to represent the idea or mood I wish to convey.

I have not used GWAP or looked into Google Image Labeler (GIL). On first read, the story had me dreading the chaos--- the systematic organization of images using index terms devised by a non-distinct group. I indexed a large technical manual once on my own, and appreciate how difficult it is to create an index. I'm also an amateur photographer, and have not even attempted to index my own photos. I find this by far the most interesting idea for indexing content since, oh, 3x5 cards, or the machine-generated permuted index.

Emily O, you said "they should develop some sort of tiered membership and place more weight on two qualified people playing the game." Well, let's see how the collective we (apart from Google) can take another step. Suppose Google licensed or gave away the tools to a community that cares (call them editors) to create their own sandbox of content.

Many useful web sites could come from this. Sorting images, sure, and even a match-making site. Content (search-engine) sites could be created by specialist teams "playing GWAP" on a work on the scale of the largest encyclopedia. Collectors and curators of art could assemble an index of the world's art, AIA an index of world buildings, doctors and medical schools an index of, well, I'm not sure I'd want to browse that site, but doctors would! These search engine web sites could have lasting (content) value, more so than the inevitable "Google collection of babes" (though I suspect Google's advertisers would draw more ad revenue from the latter database than all the former; good for all, if that makes the tools available for free---hah!).

The ThinkMap analogy: a thesaurus is rendered in a dramatically different interactive form, distinct from but based upon the printed thesaurus. I could just as easily see ThinkMap being used to map six degrees of Kevin Bacon on the E! web site. That application of the tool doesn't debase the underlying tool. GIL as a collaborative tool brings great promise.

Thank you for giving us the opportunity to engage in intelligent discourse. I particularly enjoy the chance to refine my pronunciation of rarely used words. Please excuse my using this otherwise designated space!

First of all, you have a wonderful name and you are an entertaining writer. I appreciate the euphony of your name and the quality of your articles.

The day this article came out I read it and was intrigued by the subject. I am a passionate advocate for good indexes and meaningful icons. But, I was particularly fascinated by your choice of the form of the word ikon instead of icon to mean "a visual representation." Ikon and icon are synonyms on VT, but ikon is a visual representation or a religious painting or panel while icon is primarily used in the sense of a graphical symbol used in a graphical user interface. The American Heritage Dictionary defines icon as (1) a visual representation, (2) a symbol, (3) a person who has become a symbol, (4) the aforementioned GUI symbol. When I look up ikon in the AHD it says variant of icon in the sense of a visual representation. The AHD definitions match my idea of the word.

For years I worked in Silicon Valley and was the director of a team of people who wrote highly technical object-oriented database management systems documentation and training courses for a programmer audience. We also designed a graphical user interface for a software product for our Windows users. We had long meetings attempting to determine the best icons (in a teensy size, which further complicated the issue) to show a selected action or topic. My team included, among others, two writers with PhDs in linguistics, an electrical engineer/biologist, a Java programmer, a C/C++ programmer, a graphic designer/illustrator, and a geologist who had become my production person. This multitalented group, after much hard work, came up with some creative and elegant solutions for extremely difficult-to-label concepts. These design meetings were even more dramatically challenging than our style and standards meetings. I see by your resume that you have some experience in technical documentation, so you may know whereof I speak. And during all of this work we spoke of icons but never ikons, which is why your choice has particular significance to me.

In the days after reading your article I could not get the choice of ikon as a usage out of my head and I tried to find a reason for your choice. I thought maybe you were British and had a classical education at a British public school and you were naturally gravitating towards the Greek form of the word. I also notice that you have written a book about the differences between British and American English usage and was wondering if your choice was an example of that. But your Web site places your birth in Creede, Colorado and indicates that the last members of your family to be born in Britain experienced that circumstance in the 17th century. Then I wondered if you were intentionally trying to use the form of the word that Mr. Peirce had used, but I looked up his paper on the subject and he did not use the word at all as far as I can tell. I thought maybe you were deliberately trying to avoid the usage most popular in high tech, but that does not make sense when you are using the terms in reference to Google's Image Labeler, which is obviously in the high tech realm.

So, I am intrigued, why did you choose your form, and upon further reflection, would you choose it still?

Many thanks to all for your comments: it is very gratifying to me to write on a subject I find interesting and find that I have struck a chord in others. A couple of notes:
1) Kcecelia: So sorry for creating a confusion. My intention was to follow Peirce, so the spelling ‘icon’ should have been used. I reread some of his papers (from The Essential Peirce, which I checked out of the library) and took notes the week before I wrote the column; my notes all had ‘ikon,’ but in looking at the same papers in the scanned book online, I can’t imagine why. Your theories about it, however, do you credit for great thoroughness!
2) Others: many of your comments have made me rethink GIL and I’m now a bit more charitably inclined towards its usefulness. It seems likely, as Mary Beth J suggests, that multiple labels for a single image are a more useful indexing tool than single labels, and GIL certainly does generate multiples. As Wood F’s comment implies, Google has the luxury of collecting as much data about a single image as it wants, for there will always be willing players. So one is reminded of the monkeys-at-keyboards-producing-the-bible analogy: if you wait long enough, the data you want will arrive.
3) charles F., I’m with you! Peirce’s writings are fantastic brain food and it’s unfortunate that he is not widely read outside of universities; I hope that the column might bring a few more readers to him.

Thanks so much for responding to the comments generated in response to your article. It is a satisfying feature of VT that it allows such direct communication with the author of a piece. Thanks also for your explanation of your usage of ikon rather than icon. I am relieved to find that the explanation is so straightforward and that the usage of icon is not steeped in some controversy to which I was not privy. I look forward to encountering your next VT topic.

Wordage is so fascinating. Without it we cannot get to the full ignition of a word or thought, and I revel in the writer that can bring additional substance to a word by using one not so common and yet so full that it provides shades not used in de rigueur writing. Thank you for lifting me out of the doldrums.
Lawrence B.