2005-04-27

Local and Global Taxonomies

The meanings of “birdie”, “Tiger”, and “the Road” could all be mapped, within the machine of our example, to the numbers it was originally asking for. The word “shot” is ambiguous in golf, but in our example the machine is looking for the total score on a particular hole, not for whether Tiger shot into the woods or into a bunker. So the value of “shot” is the value of the score.

An important detail in the example is the need for certain local and temporal facts that are not part of the general corpus of golfing knowledge. The numerical value of a birdie depends on the par of the hole on which it is scored, since a birdie is one stroke under par. Tiger’s competition number is not a permanent part of his identity but a designator assigned to him for this particular tournament. “The Road” is the nickname of the 17th hole on the Old Course at St Andrews, often called the world’s oldest golf course, and the same name would probably have a different numerical equivalent if it were used to designate a hole at another course.
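A minimal sketch of how a machine might resolve these terms, assuming a few lookup tables. The table of scores relative to par stands in for shared, general golfing knowledge; the hole nickname table, the par value and the competitor number are hypothetical local and event-specific facts, not data from any real system:

```python
from typing import Tuple

# General golfing knowledge: scores relative to par, valid on any course.
RELATIVE_TO_PAR = {
    "albatross": -3,
    "eagle": -2,
    "birdie": -1,
    "par": 0,
    "bogey": 1,
    "double bogey": 2,
}

# Local facts for one course (illustrative values): hole nicknames and pars.
COURSE_NICKNAMES = {"the road": 17}
COURSE_PARS = {17: 4}

# Event-specific facts: competitor numbers valid for one tournament only.
# The number below is invented for illustration.
COMPETITOR_NUMBERS = {"tiger": 42}


def resolve_hole(term: str) -> int:
    """Map a local hole nickname, or a plain hole number, to a hole number."""
    key = term.strip().lower()
    return COURSE_NICKNAMES[key] if key in COURSE_NICKNAMES else int(key)


def resolve_strokes(score_term: str, hole: int) -> int:
    """Convert a relative score term to strokes using the hole's local par."""
    return COURSE_PARS[hole] + RELATIVE_TO_PAR[score_term.strip().lower()]


def resolve_report(player: str, score_term: str, hole_term: str) -> Tuple[int, int, int]:
    """Resolve a natural-language report into (competitor number, hole, strokes)."""
    hole = resolve_hole(hole_term)
    return COMPETITOR_NUMBERS[player.lower()], hole, resolve_strokes(score_term, hole)


print(resolve_report("Tiger", "birdie", "the Road"))  # (42, 17, 3)
```

Keeping the general and local tables separate mirrors the distinction above: the birdie rule travels to any course, while the nickname and the competitor number only make sense for one course or one tournament.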

In the interest of standardization and completeness, the taxonomy of golf, both the part established by tradition and the part that is topical, local and event-specific, could be shared by the human race, just as we share the dictionaries and encyclopaedias of our many written languages.

Just who should take responsibility for such a task is debatable, but many IT thinkers believe it is the logical extension of the Internet, the next step in the IT revolution. Provided the costs of such an enterprise could be distributed feasibly, the savings would be significant. Though golf might not be their first concern, governments could build infrastructures of taxonomies for practical purposes, making computer-aided transactions more efficient.[1]

Ambiguity in natural language is not eliminated unless we disambiguate the words themselves. “Hole”, for example, has several meanings even in golf. A hole is not only a section of the course; holes are everywhere: the cup on the green, wherever an animal decides to dig, in our pockets, and so on. So even if a universal taxonomy of golf eventually exists, there must be a way of discerning which sort of hole is meant.
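One classic way of choosing between senses is to compare the words around an ambiguous term with a short gloss for each sense, a simplified version of the Lesk algorithm. The sketch below assumes a tiny hand-written sense inventory for “hole”; the sense IDs, glosses and stopword list are invented for illustration:

```python
# Simplified Lesk-style disambiguation: pick the sense whose gloss shares
# the most content words with the context. Sense IDs and glosses are hypothetical.

STOPWORDS = {"a", "an", "and", "as", "at", "by", "from", "in", "into", "its",
             "of", "on", "one", "such", "the", "to", "which", "with"}

SENSES = {
    "hole": {
        "golf:hole-section": "one section of the course played from tee to green with its own par",
        "golf:hole-cup": "the cup on the green into which the ball must be putted",
        "general:hole-burrow": "an opening dug in the ground by an animal such as a rabbit",
        "general:hole-gap": "a gap worn or torn in cloth such as a pocket",
    }
}


def content_words(text: str) -> set:
    """Lower-case, replace punctuation with spaces, and drop stopwords."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return {w for w in cleaned.split() if w not in STOPWORDS}


def disambiguate(word: str, context: str) -> str:
    """Return the sense ID with the largest gloss/context overlap.

    Ties go to the sense listed first, so the golf course sense is the default.
    """
    context_words = content_words(context) - {word}
    best_sense, best_overlap = "", -1
    for sense_id, gloss in SENSES[word].items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


print(disambiguate("hole", "The ball rolled across the green and dropped into the hole"))  # golf:hole-cup
print(disambiguate("hole", "A rabbit dug a hole beside the fairway"))  # general:hole-burrow
```

Real disambiguation needs far richer context than word overlap, but the sketch shows why a taxonomy alone is not enough: something still has to pick the sense.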

One method currently in vogue is to use double-speak. The general idea is that we use natural language as we customarily do and, on top of it, add an extra layer of reference pointers for words and phrases to clear up any doubts about their meaning. The reference pointers are addresses of a source of authority. If, for example, we were to write “a kilo of gold”, we could also write the address of an authority, perhaps somewhere in Paris, where the meanings of “kilo” and “gold” could be resolved. For golf the double-speak score notation could look like this:
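A minimal sketch of the double-speak idea, assuming an invented notation in which an annotated word or phrase is followed by the address of its authority in angle brackets. The example.org addresses and the competitor number are hypothetical placeholders, not real authorities:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    text: str                  # the natural-language word or phrase
    ref: Optional[str] = None  # hypothetical address of an authority for its meaning


def plain_text(tokens: List[Token]) -> str:
    """The layer a human reads: natural language only."""
    return " ".join(t.text for t in tokens)


def double_speak(tokens: List[Token]) -> str:
    """Natural language with each annotated phrase followed by <authority address>."""
    return " ".join(f"{t.text}<{t.ref}>" if t.ref else t.text for t in tokens)


def pointers(tokens: List[Token]) -> Dict[str, str]:
    """The layer a machine reads: phrase -> authority address."""
    return {t.text: t.ref for t in tokens if t.ref}


score = [
    Token("Tiger", "http://example.org/tournament/competitors#42"),
    Token("made"),
    Token("a"),
    Token("birdie", "http://example.org/golf/terms#birdie"),
    Token("on"),
    Token("the Road", "http://example.org/courses/st-andrews-old#hole-17"),
]

print(plain_text(score))  # Tiger made a birdie on the Road
print(double_speak(score))
```

Even in this toy form, every meaningful phrase carries a second string alongside it, which hints at how quickly the extra layer becomes burdensome.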

This technique, called mark-up, has probably been around since the Sumerians discovered that written language, however useful for counting crops, still lacked precision. It passed a milestone from the 1970s onward with the development of standardized mark-up languages such as SGML (published as an ISO standard in 1986), which will be discussed in detail in other parts of this book. Unfortunately, double-speak in natural language and SGML is tremendously burdensome and resource-demanding, and consequently only a few large corporations and military establishments have adopted the language for use in their daily activities.

Of course, any formalized interaction in the absence of hard-wiring uses common points of reference. This is, for example, what standards are all about, and the modern successor of SGML, called XML, does so in a clever way: it uses the already proven addressing and hyperlinking technologies of the Internet to provide its unique addresses. But what is at the other end of such an address? If a computer agent busily parsing information came upon the predicate phrase “is the owner of” coupled to the address of some authority for the canonical definition of that phrase, what would it find there, if not a definition written in natural language? What is a poor machine looking for logic and numbers to make of that?
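A minimal sketch of the problem using Python’s standard xml.etree.ElementTree parser. The vocabulary address and the isOwnerOf element are hypothetical, and the dictionary standing in for the authority is a simulation: the parser hands the agent an address, and whatever “definition” it then looks up there is still a sentence in natural language.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical vocabulary address; the parser never fetches it.
VOCAB = "http://example.org/vocab/relations#"

DOC = f"""<rel:statement xmlns:rel="{VOCAB}">
  <rel:subject>Tiger</rel:subject>
  <rel:isOwnerOf>a set of golf clubs</rel:isOwnerOf>
</rel:statement>"""

# Simulated authority: what an agent might find at the predicate's address.
AUTHORITY = {
    VOCAB + "isOwnerOf": "Indicates that the subject has legal possession of the object.",
}


def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def statements(xml_text: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate address, object) triples from the document."""
    root = ET.fromstring(xml_text)
    subject = ""
    triples = []
    for child in root:
        namespace, local = split_tag(child.tag)
        if local == "subject":
            subject = child.text or ""
        else:
            triples.append((subject, namespace + local, child.text or ""))
    return triples


for subject, predicate, obj in statements(DOC):
    print(predicate)             # an address, not a meaning
    print(AUTHORITY[predicate])  # which resolves only to more natural language
```

The namespace gives the predicate a globally unique name, which is genuinely useful for matching, but uniqueness is not the same as meaning.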

[1] This is essentially what happens when trading systems such as EDI (Electronic Data Interchange) and EDIFACT are created. Though EDI is a commercial initiative, EDIFACT is sponsored by the United Nations. See http://www.itworld.com/Man/3830/CWD010703EDIXML/