May 19, 2013

Studies of language acquisition and language understanding display a remarkable lack of attention to the subject matter of the utterances being studied. This is probably because nobody knows how to represent and process meaning, whereas the forms of utterances are readily available. Thus "language acquisition" has come to mean the study of learning how to construct utterances "of the right form", and studies of language understanding focus on translating forms of utterances into other symbolic forms equally devoid of the richness and detail of the things the utterances are supposed to convey.

A real theory of language acquisition should study how babies learn to decode form-meaning mappings in an environment where lots of things are going on in addition to what is being said. A real theory of language understanding should study what kinds of rich interconnected concepts and embodied simulations get triggered by words and constructions, how we decide what to simulate given the scant detail in descriptions, and what inferences are made possible beyond what is explicitly stated.

All this is AI-complete, you say? Well, by limiting ourselves to studying language in isolation, we may have come to the end of the line, where the ~80% accuracy limit of machine-learning-based computational linguistics (on almost any linguistic problem you can think of) is preventing us from building truly transformative applications. Maybe we are shooting ourselves in the foot, and maybe, just maybe, some problems that look difficult right now are difficult not because we are missing the right machine learning algorithm or sufficient labeled data, but because we are ignoring the constraints imposed by the meaning side of things. We may have finally run out of options other than to try to crack the real problem, i.e. modeling what utterances are ABOUT.