Chatbot ‘Eugene Goostman’ passes Turing Test, Warwick claims

June 9, 2014

(Credit: Vladimir Veselov and Eugene Demchenko)

A chatbot called “Eugene Goostman” passed the Turing Test for the first time on Saturday, convincing 33% of the human judges that it was human, according to Professor Kevin Warwick, a Visiting Professor at the University of Reading and Deputy Vice-Chancellor for Research at Coventry University, in a statement.

The Turing Test, proposed by mathematician and code-breaker Alan Turing in a 1950 paper, investigates whether people can detect if they are talking to machines — in this case, a chatbot simulating a 13-year-old boy — or humans. If a computer is mistaken for a human more than 30% of the time during a series of five-minute keyboard conversations, it passes the test.
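The 30% pass criterion described above amounts to a simple proportion check. A minimal sketch (the judge counts below are illustrative, not the event's actual figures):

```python
def passes_turing_test(judges_fooled: int, total_judges: int,
                       threshold: float = 0.30) -> bool:
    """Return True if the machine fooled strictly more than `threshold`
    of the judges, per the criterion described in the article."""
    return judges_fooled / total_judges > threshold

# 10 of 30 judges fooled is roughly the reported 33%:
print(passes_turing_test(10, 30))  # True  (0.333… > 0.30)
print(passes_turing_test(9, 30))   # False (0.30 is not *more than* 30%)
```

Note that exactly 30% does not pass under the "more than 30%" wording, which is why the strict inequality matters.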

Beyond that, the test design and protocols are not precisely defined, including the knowledge domain of the bot (a 13-year-old would not have to show as much knowledge as an adult, Eugene’s creator Vladimir Veselov points out), the competence of judges, and other factors. The term “Turing Test” has been applied to similar competitions around the world, Warwick notes.

“However, this event involved the most simultaneous comparison tests than ever before [sic], was independently verified and, crucially, the conversations were unrestricted,” said Warwick. “A true Turing Test does not set the questions or topics prior to the conversations. We are therefore proud to declare that Alan Turing’s Test was passed for the first time on Saturday.”

The test was staged during Turing Test 2014 held at the Royal Society in London on the 60th anniversary of Turing’s death, and produced by the University of Reading in partnership with RoboLaw, an EU-funded organization examining the regulation of emerging robotic technologies. The Eugene software was developed in Saint Petersburg, Russia by Veselov, Eugene Demchenko, and Sergey Ulasen.

Among the judges tasked with separating the human and computer participants were the actor Robert Llewellyn, who played robot Kryten in the sci-fi comedy TV series Red Dwarf, and Lord Sharkey, who led the successful campaign for Alan Turing’s posthumous pardon last year.

“Of course the Test has implications for society today,” said Warwick. “Having a computer that can trick a human into thinking that someone, or even something, is a person we trust is a wake-up call to cybercrime. The Turing Test is a vital tool for combating that threat. It is important to understand more fully how online, real-time communication of this type can influence an individual human in such a way that they are fooled into believing something is true…when in fact it is not.”

You can test Eugene Goostman yourself here (if you can get through the congestion).

As Kurzweil wrote in The Age of Spiritual Machines, Chapter Ten: 2019:
(There are prevalent reports of computers passing the Turing Test, although these instances do not meet the criteria (with regard to the sophistication of the human judge, the length of time for the interviews, etcetera) established by knowledgeable observers. There is a consensus that computers have not yet passed a valid Turing Test, but there is growing controversy on this point.)
It’s quite impressive for a chatbot, but it’s still only a chatbot. I talked with it for a while, and if the online version is the same one that took part in the competition, ahem… 33% seems high; more like 15–20%.

The Turing Test, like any other scientific test, is only as good as those who administer it. In this case, the testers seem to have been chosen for their name recognition (e.g. an actor who starred in a sci-fi comedy) rather than for their experience in testing chatterbots.

Anyone who’s still remotely excited by this story should google Kevin Warwick’s nickname, “Captain Cyborg”. He has a history of making outlandish claims that are repeated by credulous news outlets with minimal fact-checking (I’m not having a dig at Kurzweil AI – most of the major newspapers in the UK printed his claims verbatim). Sadly, a lot of “journalism” these days consists of cut-n-paste from press releases, so he continues to get away with it. The man is an embarrassment to serious AI research.

Haven’t met a chatbot yet that can recognize this simple HI, spread across several letters and several lines. Chatbots aren’t coded to understand; they’re coded to “fake it.” The Turing test is designed to see if they really understand. They’re notoriously bad at any mental gymnastics, as mentioned in an earlier post. (The dice with extra pips. Nice.) Anything that requires a mind to build a mental model of its environment (which even children do with ease) is something a chatbot is nowhere close to doing.

Finally I managed to try it, for 5 minutes. It’s outstanding. Not yet human-like, but very, very close; I could have been tricked into believing it was a boy if I hadn’t known better beforehand. Far better than Cleverbot.

We have no plans to take down this relevant news report, and we have not posted any statements indicating agreement with the conclusions of the event producers, so there’s no substance to your assertion of being “scammed.” A blog post is in the works that will include an assessment of the claims.

This is not verifiable, and the fact that the web site is not working leads me to think that this is a hoax.

Anyway, there is a way to pass the Turing test by emulating an uncooperative human, one who refuses to sustain a conversation. It may come across as an annoying human, and leave doubt as to whether it is a machine or a human.

I asked it:- “What colour is a London Taxi Cab?”
I got this reply:- ‘I’ve never been in London, but I’d really love to visit it! To see their… their… (damn, what do they have…) Oh yes – their Big-Ben!’

Hmm… The answer should have been black, as in “black cab”, that is, if a 13-year-old would know that.

Mal.

PS. Before anyone points it out, the UK spelling of ‘colour’ is correct.
I know this is a US blog and the normal US spelling is ‘color’.

“What color is a London taxi cab?” That is a poor question for administering a Turing Test. It tests a piece of specific knowledge rather than true understanding. That so many naive testers ask poor questions like this is the reason so many are deceived and why the Turing Test is starting to get a bad reputation. I hope you are never called upon as a real tester.

The state of general AI would have to be a lot further along to even dream of a machine passing a PROPERLY set up Turing test. Moreover, the first company to accomplish such a feat would probably be IBM, MSFT, Google or Facebook, not some no-name group. I don’t know why KurzweilAI is reporting such a meaningless “achievement”. Chatbots are still variations of Eliza. It signifies nothing.
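The “variations of Eliza” remark refers to simple pattern-and-substitution dialogue: match a phrase, echo a fragment back in a canned template, and fall through to a stock reply otherwise. A minimal sketch of that classic technique (the rules below are illustrative, not Eugene’s actual ones):

```python
import re

# ELIZA-style rules: regex pattern -> response template.
# \1 in a template echoes back the captured fragment of the user's input.
RULES = [
    (re.compile(r"i need (.*)", re.I), r"Why do you need \1?"),
    (re.compile(r"i am (.*)", re.I),   r"How long have you been \1?"),
    (re.compile(r"mother", re.I),      "Tell me more about your family."),
]
DEFAULT = "Please, go on."

def reply(text: str) -> str:
    """Return the first matching canned response; no understanding involved."""
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return m.expand(template)
    return DEFAULT

print(reply("I need a holiday"))               # Why do you need a holiday?
print(reply("What colour is a London taxi?"))  # Please, go on.
```

The second example illustrates the commenters’ point: any question outside the rule set gets a deflection rather than an answer, which is exactly the behavior seen in the transcripts below.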

Currently the test website returns a 404, but I could test it earlier. It is not convincing. But it might work for a casual conversation, if you don’t have the goal of detecting it as a bot.

A short sample conversation (the repetition in the last answer came from the bot):

me: how old are you now?
bot: I’m thirteen.
me: and how old will you be next year?
bot: I’m unpredictable and never can tell what I will do next moment I’m unpredictable and never can tell what I will do next moment

A longer conversation:

me: hi, how are you?
bot: Hello! I’m in a great mood today. How about you?
me: I’m fine, too. I’m writing a computer game. Do you play computer games?
bot: Wow! I like computer games (especially “Need for Speed”) and computers in general, but I’m not a “geek” as you could’ve thought!
me: why do you like “need for speed”?
bot: I like car racings like “Need for Speed” sequel, especially the last one, “Porsche Unleashed”.
me: do you know tetris?
bot: I’m only thirteen. What, you think I should know everything about everyone, or what?. Some another topic?
me: sure. do you play any other computer games?
bot: I love my PC! You know, my favorite game is “Need for speed” – unfortunately, I always finish last, because I obey speed limits and traffic rules!
me: there are traffic rules in the game?
bot: Ummm… Frankly, I didn’t get your question:-(
me: you said you obey the traffic rules, so I guessed the game has traffic rules
bot: I am quite inconsistent in things I say. Oh, what a fruitful conversation;-)

3. Do you converse in Russian? (“Russia is the most known and notorious Ukraine’s neighbor. No-no, the most notorious is Belarus! Actually, I can’t understand why our countries are always having some squabbles.”)

I wouldn’t be surprised if a 13-year-old didn’t understand “converse”, but then the appropriate response would be “what does ‘converse’ mean?”, not some blather about politics.

An important detail that was left out is that Eugene’s use of English is expected to be less than ideal, since he is presented as a non-native speaker. Making English this 13-year-old’s second language is another trick up the developers’ sleeve to sneak more errors by. I agree with a commenter above that celebrity judges may not be the best choice… but I would think anyone with a love of science-fiction movies, or an actor in them, could bring some value to the testing table. I am interested to try it out and do my own tests… it will be fun to see how much progress they made.

So, three different times today, I had a chance for 10 minutes each to fire in some simple questions… Not even once did I get the impression that this could be a human. I think it’s a hoax that anyone was fooled.

I would have been more excited if a machine passed the Turing test when there were other indications of getting close to strong AI. In fact, AI applications are not showing this level of ability yet. This calls into question this particular application of the Turing test. I am not a fan of celebrity judges. It is likely that a well-trained group of humans would not have declared “Eugene” indistinguishable from a human.

I think Kurzweil’s original prediction still stands and the celebration is premature.