Sorry, Internet, a Computer Didn't Actually 'Pass' the Turing Test

A computer tricked a single human into thinking it was actually a 13-year-old Ukrainian boy—big whoop.

If you believe the headlines, a major milestone in the rise of the machines was passed this weekend. The basic gist, from a Press Association report: “A ‘super computer’ has duped humans into thinking it was a 13-year-old boy to become the first machine to pass the Turing test, experts have said.” Of course, other outlets have interpreted this news in their own enthusiastic ways; the sensationalistic Daily Mail, for instance, dialed it up a level, insisting the result showed that the computer "can 'think.'"

But what really happened? A computer program called Eugene Goostman took part in an event at the Royal Society in London, held on the 60th anniversary of Turing’s death, in which five machine intelligences held text-based conversations with a panel of three judges, with the aim of tricking the judges into believing the machines were human. The idea of the challenge, famously devised by Alan Turing in 1950, is that if a machine can have a meaningful conversation with a human, that strongly suggests the machine may be sentient, that it can "think" on some level.

The Goostman software posed as a young Ukrainian boy who wrote in somewhat broken English, successfully convincing one of the three judges, or 33 percent, that it was human. On this basis, the organizer, cybernetics showman Professor Kevin Warwick, declared that the program had cleared the 30 percent threshold required to "pass" the test, opening up a new era in artificial intelligence. “This milestone will go down in history as one of the most exciting,” Warwick told the press.

It does seem exciting, yes, but there is a serious problem at play here: The machine didn't actually pass Turing’s test. An AI is said to pass the Turing test if it can reliably fool human interrogators. As Turing put it in his original paper:

We now ask the question, "What will happen when a machine takes the part of [the test subject] in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"

The key words here are "reliably" and "often." Turing didn’t ask whether a machine could ever, on a single occasion, convince a human judge that it too is human; he asked whether a machine could do so reliably. In this instance, a single judge was fooled—an impressive achievement, sure, but not exactly the kind of robust, repeatable science described in Turing’s paper.
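To put a rough number on why one fooled judge is weak evidence, here is a minimal Python sketch. It is an illustration only, not anything the organizers or Turing computed. It assumes the three-judge panel described above, treats the judges as independent, and gives each the same made-up chance `p` of being fooled; the function name `prob_at_least` and the example rates are hypothetical.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Chance that at least k of n judges are fooled.

    Assumes judges are independent and each is fooled with probability p
    (a simplifying assumption, not something the contest measured).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-judge fooling rates, chosen only for illustration.
for p in (0.05, 0.15, 0.30):
    chance = prob_at_least(1, 3, p)
    print(f"per-judge rate {p:.0%}: chance at least 1 of 3 is fooled = {chance:.3f}")
```

Under these assumptions, a program that fools any given judge only 15 percent of the time would still fool at least one judge on a three-person panel about 39 percent of the time, so a single success says little about whether it can do so reliably.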

Another problem is how the judge was fooled. In this respect, the approach Eugene Goostman’s developers took is brilliant—a textbook case of using lateral thinking to exploit loopholes and ambiguities in the rules of a contest.

The program itself is fairly standard stuff, a type of chatterbot that uses some combination of language processing, keyword matching, and large databases of text to choose appropriate responses to text input, thus simulating a conversation. Chatterbots can do some pretty cool and interesting things—Apple’s Siri is one example—but they don’t generally fool humans.
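For a sense of what "keyword matching" means in practice, here is a deliberately tiny Python sketch. It is not Eugene Goostman's code, which is not reproduced here; the rule table, the canned replies and the `respond` function are all invented for illustration, and a real chatterbot would draw on far larger databases of text.

```python
import re

# Hypothetical keyword -> canned reply table, invented for illustration only.
RULES = [
    ({"hello", "hi", "hey"}, "Hello! Nice to meet you."),
    ({"weather", "rain", "sunny"}, "I do not go outside much, so I cannot say."),
    ({"name"}, "People call me Bot."),
]

# Used when nothing matches; a vague deflection keeps the conversation going.
FALLBACK = "Interesting. Tell me more about that."


def respond(message: str) -> str:
    """Return the reply of the first rule that shares a keyword with the message."""
    # Lowercase and split into whole words so "this" does not match "hi".
    words = set(re.findall(r"[a-z']+", message.lower()))
    for keywords, reply in RULES:
        if words & keywords:
            return reply
    return FALLBACK


print(respond("Hi there"))
print(respond("Explain quantum physics"))
```

Nothing in the sketch models what the words mean: it only checks whether a known keyword appears, and falls back on a stock deflection when none does.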

To win this contest, Goostman’s developers added one key ingredient. Rather than try to beat the challenge head on, they tricked a judge into making allowances for their software by portraying their machine as, in Wikipedia’s words, “a 13-year-old boy from Odessa, Ukraine, with a father who works as a gynecologist, and who owns a pet guinea pig.” Obvious fuck-ups that would normally have given the game away in seconds (nonsense sentences or inappropriate replies to questions) could then be explained by the "boy’s" poor English skills or youth.

It’s clever, and hats off to them, but I doubt it's what Alan Turing had in mind when he devised the test, and it raises all sorts of questions about the rules. Is there a minimum age for AIs taking part? Why not have an AI that pretends to be an eight-year-old, or a toddler? What about someone with the English skills of a recently discovered Brazilian tribesman? We shouldn't deny Eugene its achievement, of course; as an exercise in fooling humans, Goostman is fascinating. But it has no real understanding of what it’s saying. It’s a superb piece of engineering, but it isn’t a machine that can think.

All this makes you wonder how useful modern-day versions of the Turing test have actually been in driving artificial intelligence research in the last few decades. Researchers in the machine learning field often talk about "strong AI" versus "weak AI." Strong AI is what you’d imagine: a sentient machine, general in purpose and knowledge. Think Data from Star Trek, or HAL from 2001, or the Machine from Person of Interest. In contrast, weak AI is narrower; it has no real intelligence or awareness and relies on fairly specific tricks and techniques to solve a particular problem. Think Siri, or predictive texting, or Google’s news clustering algorithms.

Turing devised his test with strong AI in mind. He believed that sentience and information integration in some sort of "conscious" mind would be necessary for a computer to achieve a meaningful dialogue with a human, and that this mind would need to be connected to some way of experiencing the world, perhaps through a mechanical body. “In the process of trying to imitate an adult human mind, we are bound to think a good deal about the process which has brought it to the state that it is in,” he wrote.

In contrast, weak AI has dominated the modern incarnations of his test, and many of the competitors have been little more than browser games—chatterbots designed and engineered specifically to try to pass a pretty low threshold. Eugene Goostman is the strongest competitor to date, and a fantastic bit of work, but I’m sure the developers would be the first to admit that it has little value in terms of strong AI research, while commercially it’s less interesting than more focused applications, such as Siri.

Arguably, researchers have made a lot more progress pursuing other challenges. IBM’s Watson supercomputer, which famously beat human contestants on Jeopardy!, is still very much a weak AI, though since it has something like 80 teraFLOPS of processing power, you wouldn’t say that to its face. It is also far ahead of anything that has taken part in Turing tests in its ability to integrate information and extract meaning from it, one of the key measures of sentience. Watson may not be able to chat as fluently as Goostman, but it is capable of understanding far more.

Perhaps the most important thing we’ve learned from this is that achieving a 33 percent score on the Turing test isn’t as big a deal as we thought it was, and doesn’t bring us any closer to birthing our future machine overlords. Decades from now, a machine will arrive that can beat Turing’s challenge reliably, and that day will be mildly terrifying. For now, though, I’m not convinced that contests like Professor Warwick’s, with generous terms and a focus on quick wins and PR, are going to bring that day any closer.