Friday, October 6, 2017

Computation, games, humor, and common sense

New Yorker cartoons are usually captioned these days, with
fewer in the lovely mute style of a William Steig. A general theory of language use should be
able to explain how cartoon captions, a genre of text, are understood. The
cartoons illustrate (sic) the dependence of language comprehension on context
(the one created by the drawing) and background knowledge (about, for example,
rats running mazes, guys marooned on islands, St. Peter’s gate, corporate
culture, New Yorkers). The popular Caption Contest is an image-labeling task,
generating humorous labels for an incongruous scene.

A bit later:

The weekly caption contest has yielded a massive amount of
data that is being analyzed using NLP and machine learning techniques. (The contest: Entrants submit captions for a
cartoon; from the 5000 or so entries the editors pick three finalists; readers
pick the winner by voting on-line.) Just
think of the studies that can be done with this goldmine of a data set! Identify the linguistic properties that
distinguish the winning captions from the two losers. Build a classifier that
can estimate relative funniness from properties such as word choices,
grammatical complexity, affective valence (“sentiment”), readability, structure
of the joke, etc. Use the classifier to
predict the winners on other weeks. Or the rated humorosity of other cartoons.

Heavy hitters from places like Microsoft, Google, Michigan,
Columbia, Yale, and Yahoo have taken swings at this ... The results (from the
few published studies I found) have been uninspiring.

Liberman then
asks a crucial question: “Is labeling New Yorker cartoons harder than playing
Go?” – you may recall that not too long ago Google’s DeepMind made a media
splash when one of its programs defeated the best human expert on Go.
Liberman’s answer to his question is, of course, yes, labeling cartoons is
harder.

He explains:

Go has a conventionalized set-up and explicit rules. A
captioning model has to figure out
what game is being played. Captioning is
a type of scene labeling but that requires recognizing what’s in the scene
which in this case is, literally, ridiculous: exaggerated, crude, eccentric,
stylized renderings of the world. Quite different from the naturalistic scenes
that have been the focus of so much attention in AI.

He goes on to
observe:

... humor turns on a vast amount of background knowledge. That
feeling when you just don’t get it happens when we either don’t know the relevant
stuff or can’t figure out what’s relevant to that cartoon. A deep learning network might well acquire
the requisite knowledge of the world but not from 80,000 drawings: insufficient
data. Same for analyzing the captions:
it’s necessary to know the language.
People have acquired most of the knowledge that's required by other
means.

That, common
sense background knowledge, is one of the problems that squelched classic
symbolic processing several decades ago. Researchers realized that in dealing with
even very simple language, we draw on a vast back of common sense knowledge.
How are we doing to hand-code all that into a computer? And once we've done it,
what of combinatorial explosion?

According to
Wikipedia Go
is a combinatorial game, like Chess and Checkers, but also like
Tic-tac-toe. The latter, of course, is trivial, while the former three are not.
The moves in such games can be represented as a tree, where each possible game
is a path from the root of the tree to some leaf. For each such game the number
of such paths is necessarily finite. In the trivial case of Tic-tac-toc the
number of paths is relatively small. In the other cases it is not, and the game
tree for Go is by far the largest one.

The fact that
each of these games is finite means that, in
point of abstract principle, none of them is any more complex than
Tic-tac-toe. But that’s only in principle, because the principle is indifferent
to computational resources. The principle, in effect, supplies all the
necessary resources.

Real
computation is not like that. Real computation always takes place with finite
resources. Go requires more resources than Chess.

But humor, as
in the task of supplying captions for New
Yorker cartoons, does not appear to be finite even in point of abstract
principle. It is a beast of an altogether different kind.

It may be the
case that, at any moment in time, each one of us has only a finite stock of
common sense knowledge available. I haven’t got the foggiest idea of how many
such bits that might be, nor does it much matter. But the collective stock of
common sense knowledge must surely be unbounded for all practical purposes –
assuming when can define it in some reasonable way. The field available for
humor is thus unbounded as well. And that’s different from merely being very
very very large, albeit finite.

What about
arithmetic? Is it more like those finite games or more like common sense
knowledge and humor? If my understanding of Gödel’s proof, and similar
mathematical beasts, is reasonable, then arithmetic may look “simple” like
Chess or Go, but it is in fact unbounded, like humor.