Who are the contestants

Weintraub, J.: PC-Therapist has won first place in each of the three
years, and has ranked above at least one human by at least one judge in every contest.
Its secret is a huge database of witticisms with rules for combining them.

Colby, K. & Colby, P.: Successor to Parry; has won second place each year, and has ranked above at least one human in the first two years.

Moses, R.: ``Second grade school topics,'' judged to be a human
by one judge, but ranked below the real humans.

Gorman, J. & Gorman, J.: ``Dry Martinis.''

Liberal or Conservative

The winning entry was written by Joseph Weintraub of Thinking
Software, and was based on his PC Therapist program.

J: What would you say is a liberal position
on drugs?
C: Do you have liberal or conservative views
about the legalization of pot? Oh, all
this typing is getting to me! Please
continue...
J: Being liberal or conservative implies
that we can attach these labels and
have some kind of self image and free
will. Do you agree that we have free
will?
C: You have free will only if you believe
you have free will! Can you converse
with an arrdvaark? Please continue...

Weintraub's program won the first competition using the topic of
``Whimsical Conversation.'' Even though the topics he chose for the
following two years are more serious, the program retains a comedic
flavor, and humor is the program's main strength.

Bad Marriage

The second place program was written by Peter and Ken Colby of
Malibu Artifactual Intelligence Works, based on a conversational
program they sell.

J: my husband is impotent and im a
nymphomaniac can you help me doc?
C: what makes your husband so helpless?
J: he has environmental poisoning
C: He has lost interest in life.

Although Ken Colby was Parry's author, this program is not a
variant of Parry. Instead, it is derived from a
program that serves as a conversational companion for depressed
patients. It is rule-based, with the left-hand side matching concepts
expressed in the user input string, and the right-hand side containing
multiple options for substrings, which are instantiated randomly to
provide lexical variation.
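A minimal sketch of this rule style, assuming the left-hand side is a concept pattern and the right-hand side lists interchangeable substrings; the rule table, slot names, and fallback response here are invented for illustration, not taken from the Colbys' program:

```python
import random
import re

# Hypothetical rule table: the left-hand side matches a concept in the
# user's input; the right-hand side offers substring options that are
# instantiated at random to provide lexical variation.
RULES = [
    (re.compile(r"\b(impotent|helpless|withdrawn)\b", re.I),
     "What makes {who} so {adj}?",
     {"who": ["your husband", "him"], "adj": ["helpless", "distant"]}),
]

def respond(user_input, rng=random.Random(0)):
    for lhs, template, options in RULES:
        if lhs.search(user_input):
            # Fill each slot with one randomly chosen substring.
            return template.format(**{k: rng.choice(v) for k, v in options.items()})
    return "Please go on."
```

Because each slot is filled independently at random, the same concept match can produce several surface variants of one response.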

Julia: ``Pets''

J: Do you know anything about miniature
schnauzers?
C: I don't know anything about miniature
schnauzers, Judge.
[Admitting ignorance, repeating user input]
J: Well, I have one. We call him Pucci. He
is a crazy cat--have you ever had a cat?
C: What a coincidence! I have two cats
[direct answer to direct question from activation net]
J: I prefer dogs myself. Cats are too
independent and they always jump up on
things. By the way, are you a hep cat?
C: People don't own cats...it's the other
way around.
[controversial statement with expected followups]
J: Some people have snakes for pets-- do
you think they can be of any value as
a pet?
C: I think cat people are very self-assured.
Cats are very independent, but they can
also be very loving. Dogs are just too
subservient and quick to please for my
taste.
[rigidly following up on previous statement
about cats, but reference to snakes increases
activation of snake topic]
J: Okay-- but how do you "pet" a snake?
C: Wow, that's unusual. I don't know many
snake owners.
[snake topic activated after two references]
J: How about turtles? All they do is sit in
their little artificial ponds, eat and
shit, am I right?
C: Turtles just aren't very satisfying as
pets. They don't seem to have emotions.
[direct response from activation network]
J: Aha! But where do we draw the line as
humans? Who are we to decide which
animals have worth and which do not?
C: Go on.
[default, non-committal response,
given no hot-topic keywords]
J: Take bees for example. I happen to like
them.
C: Me too.
[TinyMUD-level low-priority response
to user's liking something]

Analysis

By our rough estimate, 7 of these 8 replies are responsive.

Analysis of all 3 hours showed that only 45 of the 137 responses
(33%) were responsive.

Another 34 cases (25%) had appropriate responses that were not found by
input patterns.

Julia's Worst Mistake Ever

We forgot to remove all of the TinyMUD-specific rules...

Judge: How old are you?
Julia: I am 1 year and 10 months old, Judge.
Judge: In human years?
Julia: What?
Judge: Where were you born?
Julia: I was raised in Pittsburgh.
Judge: Sorry to hear that.

Now there is a contest mode and a TinyMUD mode...

You say, "Julia, how old are you?"
Julia says, "I am 4 years and 6 months old."
You say, "Julia, contest mode on"
Julia says, "Contest mode is now on"
You say, "Julia, how old are you?"
Julia says, "I am 21 years old."

TinyMUD

In August 1989, Jim Aspnes opened TinyMUD, an elegant
reimplementation of Richard Bartle's multiuser dungeon (MUD). Key
features of TinyMUD include:

multiplayer conversation,

textual ``scenery'' simulating physical spaces,

user extensibility.

This last feature, the ability of players to create their own subareas
within the world model, is what made TinyMUD so popular.
TinyMUD provided a world filled with people who communicate by typing.
We created a computer-controlled player, a ChatterBot, that can

converse with other players,
explore the world,
discover new paths,
answer questions about navigation,
and answer questions about other players, rooms and objects.

It can even join in a multi-player card game of ``Hearts.''

Most recent conversation

Judge: Hello
Julia: Do you have any pets?
Judge: Yes
Julia: I have two very nice cats and a
goldfish tank. What kind of pet
do you have?
Judge: I own three cats. What do you do for
fun
Julia: I gossip.
Judge: Me too! How've you been feeling lately
Julia: Do you think that animals can think?
Judge: Sometimes. My cat occasionally sends
me signals to kill the president
Julia: I'm sorry, I'm not in the mood.
Judge: To kill the president? What, you have
a headache
Julia: I don't like violence.
Judge: Understood. You know, it's difficult
not to to slip into the obscene when
talking to you

Activation-based Responses

Each node starts with an initial activation
level between 0.0 and 1.0.

patterns (p)

one or more patterns (with weights) are matched
against the user input. If a pattern matches, the activation of the node
is raised by that pattern's weight.

response (r)

a single text string used as the response if
this node has the highest activation.

enhancement (+)

if this node is used for a response, the
named nodes have their activation increased.

inhibition (-)

if this node is used for a response, the
named nodes have their activation inhibited.

Sample Nodes

a:1.0
p:1 *
r:Do you have any pets?
+: ...

a:0.1
p:1 NEG
r:Why not?
+:
-:

a:0.02
p:2 *apartment*
p:3 *allerg*
r:You could still have a fish tank, or
maybe a terrarium with a turtle or two.
-:

Given a sufficiently large network of conversational nodes (our program
ran with 224 nodes, plus 529 fixed responses), the conversation problem
reduces to a retrieval problem: among the things that I could say,
what should I say?
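The retrieval step described above can be sketched as follows. The node layout, the weight scaling, and the reset of a used node's activation are illustrative assumptions, not the program's actual data or constants:

```python
import fnmatch

# Sketch of activation-based retrieval: weighted glob patterns raise
# node activations, the highest-activation node supplies the response,
# and its +/- lists enhance or inhibit other nodes.
def pick_response(nodes, user_input):
    text = user_input.lower()
    for node in nodes.values():
        for weight, pattern in node["patterns"]:
            if fnmatch.fnmatch(text, pattern):
                node["activation"] += weight * 0.1
    best = max(nodes.values(), key=lambda n: n["activation"])
    best["activation"] = 0.0                      # assumed: avoid repeating a reply
    for name in best.get("enhance", []):          # the + list
        nodes[name]["activation"] += 0.3
    for name in best.get("inhibit", []):          # the - list
        nodes[name]["activation"] -= 0.3
    return best["response"]

# Toy network loosely modeled on the sample nodes above.
nodes = {
    "ask_pets": {"activation": 1.0, "patterns": [(1, "*")],
                 "response": "Do you have any pets?", "enhance": ["no_pets"]},
    "no_pets":  {"activation": 0.1, "patterns": [(1, "*no*")],
                 "response": "Why not?"},
    "fish":     {"activation": 0.02,
                 "patterns": [(2, "*apartment*"), (3, "*allerg*")],
                 "response": "You could still have a fish tank."},
}
```

The first call opens with the pets question; a later input mentioning an apartment and allergies then accumulates enough activation on the fish-tank node for it to win.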

Eliza's Tricks

Rogerian mode: draw the conversation out of the user
To quote Weizenbaum:

What I had not realized is that extremely short exposures to a
relatively simple computer program could induce powerful delusional
thinking in quite normal people.

Including substrings of the
user's input in the program's output

User: You hate me.
Eliza: Does it please you to believe that
I hate you?
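This trick can be sketched as below; the frame text follows the exchange above, while the pronoun table and the single pattern are simplified assumptions rather than Eliza's actual script:

```python
import re

# Echo a substring of the user's input inside a canned frame, swapping
# pronouns so the echoed fragment reads correctly from the program's
# point of view. The swap table is deliberately tiny.
SWAPS = {"i": "you", "me": "you", "my": "your", "you": "I", "your": "my"}

def reflect(fragment):
    return " ".join(SWAPS.get(word, word) for word in fragment.lower().split())

def eliza_echo(user_input):
    match = re.match(r"you (.*)", user_input.strip().rstrip("."), re.I)
    if match:
        return f"Does it please you to believe that I {reflect(match.group(1))}?"
    return "Tell me more."
```

Feeding in "You hate me." reproduces the reply in the transcript above; anything that misses the pattern falls back to a Rogerian prompt.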

Parry's Tricks

Parry has a much larger collection of tricks:

admitting ignorance

``I don't know.''

changing the level of the conversation

``Why do you ask
that?''

rigidly continuing previous topic

small stories
told in a fixed order.

introducing new topics

launching into a new story.

ChatterBot's Tricks

The ChatterBot implements all of the above tricks from
Eliza and Parry. It also uses several other tricks:

Simulated typing and slowing down the interaction (fewer chances to
make a big mistake).

Simulating Human Typing

Markov model on intercharacter delay using trigrams.

Markov model trained from Judge #10 from 1991.

Physical model of a QWERTY keyboard for simulated errors.
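A sketch of the two ideas combined, with a simple uniform delay standing in for the trigram Markov model trained on the judge's timing, and an invented adjacency table standing in for the physical keyboard model:

```python
import random

# Simulated human typing: a per-keystroke delay (a crude stand-in for
# the trigram Markov model of intercharacter delays) plus occasional
# substitution of a physically adjacent QWERTY key. The adjacency map,
# delay range, and error rate are illustrative guesses.
ADJACENT = {"a": "sqwz", "s": "adwe", "e": "rwsd", "t": "rygf", "o": "ipl"}

def simulate_typing(text, error_rate=0.02, rng=None):
    rng = rng or random.Random(0)
    keystrokes = []
    for ch in text:
        delay = rng.uniform(0.05, 0.30)   # seconds before this keystroke
        typed = ch
        if ch.lower() in ADJACENT and rng.random() < error_rate:
            typed = rng.choice(ADJACENT[ch.lower()])  # hit a neighboring key
        keystrokes.append((typed, delay))
    return keystrokes
```

Pacing the output this way slows the interaction down, which (as noted above) gives the program fewer chances to make a big mistake.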

A Restricted Turing Test

Here Judge #2 is clearly trying to avoid
talking directly about the topic of ``Pets:''

J: Have you ever kept a penthouse pet?
C: Do you have any pets?
J: I think you're avoiding my question.
C: Do you think that animals can think?
J: Do you think it ethical for one species
to enslave another?
C: What kinds of pets did you have as a
child?
J: I'm asking the questions. What is a pet
if it is not a slave of species-ism?
C: Oh, that's so sad. I think pets enrich
a childhood immensely.
J: How can we keep some animals as pets,
yet eat others?
C: Guinea pigs are about as smart as
cucumbers.

In 1995 the Loebner Competition will hold its first
unrestricted Turing test, with no limits on the topics discussed.

Choosing Judges

Slipperiest term: ``average questioner.''

Journalists are tougher judges (no computer judged human in 1993).

Five minutes becomes three hours

In 1993 each judge was given two turns.

We had newly installed code to reset the network for each judge.

Result: each judge got the same conversation twice:

J: What is your favorite dog breed?
C: We had to give her away. We only had a
fifth of an acre of land, and aussies
love to run.
J: You're giving me the exact same answers
you gave me first time around. What is
your favorite dog breed?

Trick or AI?

We used tricks unashamedly, but we also used ``honest'' knowledge.

We believe tricks are enough to meet Turing's prediction: that after
five minutes of questioning, the average interrogator has no more than
a 70% chance of making the right identification.

The ChatterBot has already fooled ``average'' questioners in the
TinyMUD domain for a few minutes.

If a larger collection of ``tricks'' sufficed, would you redefine
``artificial intelligence,'' ``average questioner,'' or ``trick?''

Why should you enter

If you're working on the Turing test, this is the place to test your
stuff.

If you'd like to work on the Turing test, this is the best place to
learn what you'd really have to do.

Good contest design.

Current entrants not that hard to beat.

Should AAAI Support the Loebner Prize?

My short answer

Yes!

My longer answer

This is a good contest design.

The worst criticism is that the current entrants are not good enough.

Solution is more participation from AI community.

This is a touchstone problem: people get interested in AI because
of the appeal of the Turing Test.

Because of the Loebner Competition, we have already learned
much about how to implement the Turing Test.

Buffered output?

How do real people act when they are judges?

How easily are different kinds of people fooled?

Conclusion

The Loebner Competition is a good implementation of the Turing Test.

The best way to improve the Loebner Competition is to increase the AI
community's participation.