Computer scientists Chloé Kiddon and Yuriy Brun are interested in how humans recognize double entendres and whether machines can learn to do the same. Spotting double entendres requires "both deep semantic and cultural understanding," they write (PDF). As Kiddon explained in an interview, a double entendre is really a type of metaphor that brings together two conceptual realms: one straight-laced and one raunchy. So "that's what she said" jokes aren't just crude, cheap ways to get a laugh; they're also a fertile testing ground for whether computers can be trained to "think" metaphorically about language, the way humans do.

Kiddon and Brun define a TWSS as a sentence that is funny when followed by the phrase "That's what she said." Telltale TWSS markers include 1) the presence of nouns that are often euphemisms for more sexually suggestive nouns and 2) syntactical structures common to X-rated literature. The researchers give banana as one example of a seemingly respectable noun that could moonlight in porn writing. For racy syntax, they offer "[subject] stuck [object] in" and "[subject] could eat [object] all day."

To train their computer program, DEviaNT (Double Entendre via Noun Transfer), which assesses the TWSS potential of individual statements, Kiddon and Brun gathered 1.5 million sentences from erotic literature and 57,000 from more mainstream texts, such as Barry Goldwater's 1961 essay "A Foreign Policy for America" (which is just chock-full of euphemistic eroticism, we're sure). By analyzing big swathes of lexical content, DEviaNT began to learn which terms frequently appear together in risqué contexts (thus indicating a potential TWSS) and which tend to cluster in more decorous settings. The program then honed its skills on 2,000 sentences from twssstories.com, an online forum for "That's what she said" jokes; more practice came courtesy of fmylife.com, textsfromlastnight.com, and wikiquotes.

DEviaNT uses an "adjective sexiness function," or AS(a), and a "verb sexiness function," or VS(v), to gauge the likelihood that a given sentence will work as a TWSS. For example, suck has a high VS(v), while sweep has a low one. Kiddon and Brun created the functions by studying the modifiers and verbs that typically orbit a pool of 76 explicit, predetermined nouns, most of which describe sexual body parts. (A sentence that actually contains one of those 76 words is unlikely to be a TWSS, though, since it wouldn't be euphemistic enough.)
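For the curious, here is a rough sketch of how a score in the spirit of VS(v) or AS(a) might be computed. It is not the researchers' actual formula: it simply assumes a word's score is the share of its occurrences that land within a couple of tokens of a seed noun. The function name `sexiness_scores`, the `window` parameter, the placeholder seed noun `xnoun`, and the four-sentence corpus are all invented for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the pool of 76 explicit seed nouns
# (a placeholder token, not the researchers' list).
SEED_NOUNS = {"xnoun"}


def sexiness_scores(sentences, window=2):
    """Score each non-seed word by the share of its occurrences that fall
    within `window` tokens of a seed noun. Returns word -> score in [0, 1].

    This is an assumed, simplified scoring rule, not the published one.
    """
    total = Counter()      # how often each word appears at all
    near_seed = Counter()  # how often it appears close to a seed noun
    for sentence in sentences:
        tokens = sentence.lower().split()
        seed_positions = [i for i, t in enumerate(tokens) if t in SEED_NOUNS]
        for i, tok in enumerate(tokens):
            if tok in SEED_NOUNS:
                continue
            total[tok] += 1
            if any(abs(i - p) <= window for p in seed_positions):
                near_seed[tok] += 1
    return {word: near_seed[word] / total[word] for word in total}


# Toy corpus, made up for this example.
corpus = [
    "she will suck the xnoun",
    "he had to sweep the floor",
    "they sweep the porch daily",
    "suck the xnoun again",
]
scores = sexiness_scores(corpus)
print(scores["suck"], scores["sweep"])
```

On this toy corpus, suck always appears next to the seed noun and sweep never does, so the two land at opposite ends of the scale, which matches the contrast described above. A real system would need a proper tokenizer, part-of-speech tags to separate adjectives from verbs, and far more text.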

Now that it's been properly trained, you can feed DEviaNT a string of text, and it can tell you whether tacking on the four magic words will result in hilarity. At the moment, the program has about the same dirty-joke-telling ability as a 12-year-old boy: The researchers write that it can pick out TWSS set-ups with around 70 percent accuracy. But that number would increase to 99.5 percent, Kiddon explains, if every sentence had a 50 percent likelihood of being a TWSS, thus giving the computer more positive results to work with. (As it is, a very small fraction of spoken and written statements make the cut.) Now that even computers are engaging in frat-style mockery, we're just waiting for an iPhone app that automatically adds "between the sheets" to our fortune cookie fortunes.
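Why would the same program look so much sharper if half of all sentences were TWSS material? A back-of-the-envelope sketch using Bayes' rule shows the effect, under the assumption that the quoted figure measures how often a flagged sentence really is a TWSS. The hit rate, false-alarm rate, and 1 percent base rate below are invented for illustration and are not taken from the paper; `flagged_correct_share` is a hypothetical helper name.

```python
def flagged_correct_share(hit_rate, false_alarm_rate, base_rate):
    """Share of flagged sentences that really are TWSS set-ups.

    hit_rate:         fraction of true TWSS sentences the program flags
    false_alarm_rate: fraction of ordinary sentences it flags by mistake
    base_rate:        fraction of all sentences that are TWSS set-ups
    """
    true_flags = hit_rate * base_rate
    false_flags = false_alarm_rate * (1 - base_rate)
    return true_flags / (true_flags + false_flags)


# Illustrative numbers only (assumptions, not the researchers' results).
HIT_RATE = 0.2
FALSE_ALARM_RATE = 0.001

rare = flagged_correct_share(HIT_RATE, FALSE_ALARM_RATE, base_rate=0.01)
balanced = flagged_correct_share(HIT_RATE, FALSE_ALARM_RATE, base_rate=0.5)
print(f"TWSS is rare (1% of sentences):    {rare:.3f}")
print(f"TWSS is common (50% of sentences): {balanced:.3f}")
```

The program itself is identical in both cases. When genuine set-ups are rare, even a tiny false-alarm rate produces enough bad flags to drag the score down; when they are common, the same false alarms are swamped by real hits.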

For a look at TWSS in action, check out this highlight reel from The Office.