The ostrich and the platypus

The representation of “cell wall” in the Gene Ontology. Picture source: screen shot from geneontology.org.

One evening in December I sat in the living room of a friend half an hour south of Paris. We sipped wine and talked about the recent kidnappings of hens from her hen house. She knew what kind of animal was stealing them, but she only knew its name in French, and neither of us knew the English word for it. Words for animals are a great illustration of Zipf’s Law: you know so, so many of them, but the vast majority of those almost never get used. We discussed this fact, and that discussion quickly led to the word ornithorynque: “platypus.”

Why the hell would a couple of computational linguists half an hour south of Paris need to talk about a platypus, or for that matter, an ostrich? My friend and I both work with things called ontologies. You can think of an ontology as a set of things and a set of relationships between them, where the relationships are generally restricted to either “A is a B” or “X is part of Y.” For example, the Gene Ontology contains the specifications that a cell wall is an external encapsulating structure, that an external encapsulating structure is a cell part, and that a cell part is part of a cell. Armed with that information, a computer (or a person) can infer things, such as that a cell wall is part of a cell. This might seem obvious to you, but it’s not obvious at all to a computer. A computer can’t really understand language, and to a computer, cell wall and cell migration both look pretty similar: two nouns in a row, the first of which is cell. But a cell wall is a part of a cell, and cell migration is not. Ontologies are one way of encoding the kinds of information that we think humans use (and therefore computers presumably need) to understand language. For example, that information lets you understand that if I say The children ate the cookies. They were delicious, then they means the cookies, but if I say The children ate the cookies. They were hungry, then they means the children.
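If you want to see what that inference looks like to a computer, here is a minimal sketch in Python. It encodes only the three Gene Ontology statements mentioned above as a toy dictionary; the names `EDGES` and `is_part_of` are mine, made up for illustration, and real ontology tools are a good deal more careful about which relations can be chained together.

```python
# Toy fragment: only the three statements from the paragraph above,
# not the real Gene Ontology. Names here are hypothetical.
EDGES = {
    "cell wall": [("is_a", "external encapsulating structure")],
    "external encapsulating structure": [("is_a", "cell part")],
    "cell part": [("part_of", "cell")],
}


def is_part_of(term, whole, edges=EDGES):
    """Return True if we can infer that `term` is part of `whole`.

    Assumed rule: follow is_a and part_of links upward; the chain
    counts as "part of" if at least one part_of link was crossed.
    """
    stack = [(term, False)]  # (current node, crossed a part_of link yet?)
    seen = set()
    while stack:
        node, crossed = stack.pop()
        if node == whole and crossed:
            return True
        if (node, crossed) in seen:
            continue
        seen.add((node, crossed))
        for relation, parent in edges.get(node, []):
            stack.append((parent, crossed or relation == "part_of"))
    return False


print(is_part_of("cell wall", "cell"))       # True
print(is_part_of("cell migration", "cell"))  # False
```

The computer gets the right answer for cell wall only because someone wrote those three facts down. It has nothing at all to say about cell migration, because the toy ontology doesn't mention it.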

Necessary and sufficient conditions for being a cow. The claim of the diagram is that in order to be a cow, you must have four legs, hooves, and no feathers. The claim is also that if you have four legs, hooves, and no feathers, that is enough to establish that you are a cow. Do you buy (translation of buy in this context: “accept the claim of”) this cartoon? Picture source: http://searchengineland.com/how-prototype-theory-influences-a-social-media-strategy-59608.

Ontologies are a great idea, but in practice, it isn’t that easy to get them to work. Let’s take mammals, since it’s a mammal that was stealing my friend’s chickens. In an ontology, in order for a category to be fully defined, you have to state the necessary and sufficient conditions for something to belong to it. That is, the conditions that must be met to belong to the category (the necessary conditions) and the conditions that, if they are met, are enough to let you belong to that category (the sufficient conditions). In French, we call these les conditions nécessaires et suffisantes, or CNS. Let’s think about the necessary and sufficient conditions to be a mammal. Nurse your young; three middle ear bones; hair; neocortex; endotherm; give live birth. Damn, what about the platypus? The platypus is a mammal, but it lays eggs. That’s why the platypus, l’ornithorynque (n.m.), came up in our conversation. The fact that things like the platypus exist is a problem for ontologies (and ontologists). Ontologies have to assume really rigid boundaries for semantic categories, established by conditions nécessaires et suffisantes, and in practice, people don’t seem to think about semantics that way.
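Here is the platypus problem as a minimal sketch in Python. The six conditions are the ones listed above; the function name `is_mammal`, the feature strings, and the extra “hooves” feature for the cow are my own hypothetical choices, not anything taken from a real ontology.

```python
# The six conditions from the text, treated as one rigid definition.
MAMMAL_CONDITIONS = {
    "nurses young",
    "three middle ear bones",
    "hair",
    "neocortex",
    "endotherm",
    "live birth",
}


def is_mammal(features):
    """Rigid test: every listed condition must be present."""
    return MAMMAL_CONDITIONS <= set(features)


# A cow meets every condition (plus some extras that don't matter here).
cow = MAMMAL_CONDITIONS | {"hooves"}

# A platypus meets five of the six, but lays eggs instead.
platypus = (MAMMAL_CONDITIONS - {"live birth"}) | {"lays eggs"}

print(is_mammal(cow))       # True
print(is_mammal(platypus))  # False, although the platypus is a mammal
```

You can patch the definition by dropping the live-birth condition, but the rigid approach only works if someone has already found every exception.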

How do people think about semantics, then? There’s decent evidence for what’s called the prototype theory. The prototype theory posits that we represent a category in terms of some prototypical member of it. Some things might be closer to the prototype, and others might be farther from it, but we can accommodate all of them within the category, since it doesn’t require rigid boundaries. If you have feathers, and you’re bipedal, and you lay eggs, and you fly, then clearly you’re a bird: you’re like the prototype for a bird. But even if you don’t fly, you can still be a bird, and that’s how an ostrich gets into the conversation. Last summer I was giving a talk about semantic representations, and I was reviewing prototype theory. The ostrich is a classic example to use when you’re talking about prototype theory: unlike a prototypical bird, it doesn’t fly, but it’s still a bird. I couldn’t remember the French word for ostrich, which I constantly confuse with the word for Austria. Mercifully, my host was sitting in the front row, and he told me: autruche.
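To make the contrast with rigid conditions concrete, here is a minimal sketch in Python of graded category membership. Prototype theory itself doesn’t prescribe a formula; the “fraction of prototype features shared” score, the name `closeness`, and the feature lists for the robin and the cow are assumptions of mine, chosen only to illustrate the idea.

```python
# The prototypical bird, using the four features from the text.
BIRD_PROTOTYPE = {"feathers", "bipedal", "lays eggs", "flies"}


def closeness(features, prototype=BIRD_PROTOTYPE):
    """Fraction of the prototype's features that the animal shares.

    This is one simple, assumed measure; there are many others.
    """
    return len(prototype & set(features)) / len(prototype)


robin = {"feathers", "bipedal", "lays eggs", "flies"}
ostrich = {"feathers", "bipedal", "lays eggs"}
cow = {"four legs", "hooves", "hair"}

print(closeness(robin))    # 1.0
print(closeness(ostrich))  # 0.75
print(closeness(cow))      # 0.0
```

Instead of a yes-or-no answer, every animal gets a position relative to the prototype: the ostrich is a less typical bird than the robin, but it is a lot closer to bird than the cow is.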

If you’re interested in reading about this kind of stuff in French, I’m a big fan of the book Initiation à l’étude du sens, “Introduction to the study of meaning,” by Sandrine Zufferey and Jacques Moeschler. I don’t know of any book in English that’s better.