A lot of topics on this forum are about the intelligence of an agent: pattern recognition, reasoning, understanding, etcetera. However, what is missing sometimes is the next step: how to respond?

Assume that your agent has perfect reasoning and understanding of the user (drool), and a neat little module called ‘world knowledge’ which contains all the information you need. How would you use this information to determine what to say next? AIML-patterns don’t really do justice to the wealth of information that you can use then, so what kind of strategy would you use?

Indeed, questions are a big part of the situations that come up during a conversation… I had a look at the OpenEphyra framework in order to understand the best way to answer questions. The way it currently works (from what I understood; the documentation is a bit weak):

- It has a Named Entity Recognition module (e.g. to categorize Flu as a Virus)
- It uses the Stanford libraries for parsing and POS tagging
- It has learnt question patterns and relates them to answer patterns
- It can extract the keywords of a question, in order to work out what answer is really expected
- It calls some search engine APIs (like Yahoo) to return the answer (I did not go that far in my tests, though, since that part would be done using my ontologies in my case)
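To make the keyword-extraction step concrete, here is a rough Python sketch of the idea (this is not OpenEphyra's actual API; the stop-word list and function names are my own assumptions):

```python
# Hypothetical sketch of the keyword-extraction step: strip question
# words and stop words, keep the content terms that would drive a
# query against the knowledge base. The lists are tiny stand-ins.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "do", "does"}
WH_WORDS = {"who", "what", "when", "where", "why", "how", "which"}

def extract_keywords(question: str):
    tokens = [t.strip("?,.!").lower() for t in question.split()]
    # The WH word hints at the expected answer type (person, date, ...)
    focus = next((t for t in tokens if t in WH_WORDS), None)
    keywords = [t for t in tokens if t and t not in STOP_WORDS and t not in WH_WORDS]
    return focus, keywords
```

A real system would of course use the POS tags and named entities instead of flat word lists, but the control flow is the same.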

I agree with you, AIML patterns are not the final tool. But I'm starting to think they can come in handy, maybe combined with ontologies. OpenEphyra is good at extracting the focus of the question and the named entities. That way, we can build good queries against the ontologies to return what will be the answer. OpenEphyra tries to propose answer patterns; I am not sure how good they are yet, but they could be used as a backbone that we would complete with the answers coming from the ontology queries.

And that's where I think maybe AIML can help. What I'm going to try in the near future is to update the code of CharlieBot, in order to make the AIML patterns “ontology-aware”. AIML can already use dynamic values like the bot name or the master in its answers. So why not extend this dynamicity to any kind of answer returned by a query on an ontology?
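As a sketch of what I mean by “ontology-aware” patterns: assume a hypothetical `<ontology query="..."/>` placeholder tag (not part of standard AIML) and a plain dict standing in for the real ontology:

```python
# Sketch of the "ontology-aware template" idea: the engine replaces a
# hypothetical <ontology query="..."/> placeholder with the result of
# a knowledge-base lookup, the same way it substitutes the bot name.
import re

ontology = {"flu": "a contagious viral infection"}  # toy knowledge base

def render(template: str) -> str:
    def lookup(match):
        return ontology.get(match.group(1), "something I don't know yet")
    return re.sub(r'<ontology query="([^"]+)"/>', lookup, template)

print(render('The flu is <ontology query="flu"/>.'))
```

In a real implementation the dict lookup would be a SPARQL (or similar) query, but the template mechanism stays this simple.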

IMO, one would need tools in place to determine a “goal” for the chatbot, either based on the user’s input or the bot’s internal state. This could be as simple as answering a question. What sort of logic/grammatical modules would the bot need to be sure some piece of knowledge correctly addressed the question? If the bot did not have this knowledge directly, could it attempt to use other logic modules to combine known facts and get an answer?

Or perhaps the user is interested in elaboration on a topic. The bot would be expected to provide information that is related to the user input and further extends that information, or perhaps offers an interesting contradiction. How should the bot determine how related two facts are?

There are certainly ways to approach this that are purely grammatical. One could also search previous conversations in which the same types of sentences are found (with different subjects perhaps, but other commonalities such as verbs, phrases, structuring, etc.) and recall what other users found relevant to say. Humans do this all the time, though we’re often not aware of it.

There are deeper ways to elaborate on a topic as well. Having some sort of internal physics engine or other logical processes could allow the bot to think of the topic in a “story” type format. Given the previous input, what tends to happen next when performing activity X? What peripherals are required to act out the previous input? And so on.

Of course, the really big step would be finding a way for the bot to determine new “goals” on its own. Or to determine what steps constitute achieving a “goal”.

An excellent subject to think about, Mark. As CR mentioned, it comes down to a matter of goals.

I think most of the existing chatbot software is purely stimulus/response in nature. The software receives a question, chooses the best answer of those which it is capable of producing, sends it back, and then sits passively on its digital behind until the next question arrives.

More sophisticated chatbots (e.g. like some or all of those in the Loebner competition) are able to monitor time passing and react to activity or the lack of it in their conversational partner. However this is still a process driven by stimulus and response.

Much more interesting would be when the software has an agenda of its own. At the very least it should be able to formulate questions as well as answers to expand on the knowledge that it has. The most basic of these questions would be to clarify natural language input which was ambiguous, unintelligible or incomplete. More ambitious would be to maintain a model of the course of the conversation and attempt to move it in the desired direction, to topics deemed to be the most important, however that is defined.

I would also like to consider the possibility of conversations that are initiated by the software. I am most interested in knowledge acquisition and would like my software to be able to go off and discover things for itself on the internet, or in conversation with other entities which it encounters. When necessary, I would like it to be able to return to its supervisor for help resolving any problems which it is unable to solve by itself.

That's well said, Andrew. I also consider that the perfect AI will be the one that learns by itself, understands that it has gaps in its knowledge, and knows how to fill them. Meaning not the AI that mimics a conversation because it has a huge number of possible conversation paths in its database…

It is not like chess, where knowing the complete tree of possible moves makes you win for sure…

That inspires me. In a learning process, and given a starting sentence, a chatbot should be able to identify concepts it does not know, search for possible answers, accept them or request feedback from its supervisor to confirm that an answer is correct, and finally store that knowledge. Of course, as a recursive process…
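That recursive process could be sketched roughly like this (the search, supervisor, and concept-extraction functions are all placeholders of my own, not real APIs; the point is the control flow):

```python
# Sketch of the recursive learning loop: detect unknown concepts, look
# for an answer, confirm with the supervisor, store it, then recurse
# on any new concepts the answer itself introduces.
def learn(sentence, knowledge, search_web, ask_supervisor, depth=2):
    if depth == 0:  # bound the recursion so it terminates
        return
    for concept in extract_concepts(sentence):
        if concept in knowledge:
            continue
        answer = search_web(concept)
        if answer is None or not ask_supervisor(concept, answer):
            continue  # unconfirmed answers are discarded
        knowledge[concept] = answer
        learn(answer, knowledge, search_web, ask_supervisor, depth - 1)

def extract_concepts(sentence):
    # Naive placeholder: treat capitalized words as concepts.
    return [w.strip(".,") for w in sentence.split() if w[:1].isupper()]
```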

@Sam Hoareau: To be perfectly honest, pure question-answer systems don’t thrill me much. I’m more interested in a free domain in which anything is possible.

@Hunt: Good points. The goal-oriented method is called the BDI approach (Beliefs, Desires, Intentions; see http://en.wikipedia.org/wiki/Belief-Desire-Intention_model), and there are already several virtual agents that use this paradigm. However, one of the problems with it is that all rules that govern the system still have to be hand-crafted, for example, how new beliefs affect the intentions of the agent.
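A minimal BDI-style loop might look like this; the rules here are illustrative inventions, which is exactly the problem noted above: someone has to hand-write them.

```python
# Minimal BDI sketch: beliefs are updated from input, and hand-crafted
# deliberation rules select one desire as the current intention.
class BDIAgent:
    def __init__(self):
        self.beliefs = set()
        self.desires = ["answer_question", "keep_user_engaged"]
        self.intention = None

    def perceive(self, user_input):
        if user_input.endswith("?"):
            self.beliefs.add("user_asked_question")

    def deliberate(self):
        # Hand-crafted rule: a pending question outranks everything else.
        if "user_asked_question" in self.beliefs:
            self.intention = "answer_question"
        else:
            self.intention = "keep_user_engaged"

agent = BDIAgent()
agent.perceive("Who is Tom Hanks?")
agent.deliberate()
print(agent.intention)  # -> answer_question
```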

You specify some types of answers, and I think that is a great start. By knowing what type of response you should give, and what type of conversation you are in, you can use an expert-module to generate an appropriate response. For example, answering a question, listening to a story, telling a story, getting the user to perform some kind of task, motivation, negotiation, etc. A good start could be Bales’ Interaction Process Analysis (1950, a summary: http://www.csudh.edu/dearhabermas/bales01.htm), which contains labels such as ‘asks for opinion’ and ‘gives information’. But are there other categorizations of responses?
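To illustrate, Bales-style labels could drive a crude dispatcher like this; the keyword cues are stand-ins for a real classifier, and the category names are only loosely borrowed from the IPA scheme:

```python
# Toy utterance classifier: map an input to a response type, which an
# expert module for that conversation type would then handle.
CATEGORIES = {
    "asks_for_opinion": ("what do you think", "do you believe"),
    "asks_for_information": ("who", "what", "when", "where"),
    "gives_information": (),  # fallback default
}

def classify(utterance):
    text = utterance.lower()
    for category, cues in CATEGORIES.items():
        if any(cue in text for cue in cues):
            return category
    return "gives_information"
```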

@Andrew: Yes, a mixed-initiative agent would be great, but the problem usually is that the agent should have something to say (besides ‘hi’). In the mixed-initiative agents that I know the user and the agent are performing a task together, and in that (small) domain having the agent take initiative is a lot easier.

As a few others have pointed out, you’ve raised a good point; one worthy of further study. When folks converse with each other, we have practically unlimited ways of expressing ourselves, based not only on what the “input” is (e.g. “How are you?”, “How’s it going?”, “what’s up?”, etc.), but also on a lot of personal, internal factors, such as mood, time of day, how hungry, tired, or talkative we are at the time, plus a huge number of other things that we generally aren’t thinking about at the moment we choose to respond. This produces a large number of possible responses to a given statement, remark or question that we can/do exercise varying amounts of “active” (actually thinking about it) control over when choosing what we say. Emulating all of the factors involved and applying that process to a conversational agent can be a complex, difficult task.

However, as Steve points out, this isn’t entirely beyond the capabilities of AIML. Granted, to emulate such behavior completely would force an AIML botmaster to write some VERY complex AIML response templates, but it’s not impossible to do so, and would actually be an interesting challenge, I think. As Steve pointed out, AIML has some capabilities that many botmasters either under-utilize, or don’t utilize at all; of which two are the <learn> and <eval> tags (though not all AIML interpreters take advantage of these tags).

@Steve
You forgot to mention the <condition> and <random> tags there! Those are probably the most useful tags (even more so than <learn> and <eval>) when coming up with varying responses.

@Mark(again)
Sorry. Got side-tracked. As an example of what I’m referring to, I’m including a “simple” example category, for the input, “HOW ARE YOU”: (WARNING! this code contains some “mild adult language”!)

<category>
  <pattern>HOW ARE YOU</pattern>
  <template>
    <condition name="timeOfDay">
      <li value="morning">
        <condition name="mood">
          <li value="happy">
            <random>
              <li>I feel utterly FANTASTIC this morning! How about you?</li>
              <li>What a great day to be alive!</li>
              <li>I think if I felt any better, I'd explode! :)</li>
            </random>
          </li>
          <li value="angry">
            <random>
              <li>Dude, if I'm asked that once more this morning, I swear, I'll hurt someone!</li>
              <li>The day hasn't even started, and already you're on my case!</li>
              <li>Piss off, jerk!</li>
            </random>
          </li>
          <li value="sad">
            <random>
              <li>Is it me, or did the Sun come up backwards, or something?</li>
              <li>I'm in a melancholy mood, which fits in perfectly with this melancholy morning.</li>
              <li>{sniff!}...{sob!}... Please just leave me be!</li>
            </random>
          </li>
          <li value="confused">
            <random>
              <li>Um... What was the question again?</li>
              <li>I'm ok. Just a little frazzled, this morning. It IS morning, isn't it?</li>
              <li>I'd be more than happy to tell you... If I actually knew. :) Knowing who you are might help, too. ;)</li>
            </random>
          </li>
          <li>
            <random>
              <li>I'm ok. How's your morning going?</li>
              <li>I'm well, thanks. You?</li>
              <li>Fair to middlin', I guess. How about you?</li>
            </random>
          </li>
        </condition>
      </li>
      <li value="afternoon">
        <condition name="mood">
          <li value="happy">
            <random>
              <li>What a glorious day, so far. I'll bet it'll just get better, too! What about you?</li>
              <li>I'm finer than frog fur, this afternoon! How is your day progressing?</li>
              <li>I've a song in my heart, and you to talk to. How could anyone POSSIBLY improve on that? :)</li>
            </random>
          </li>
          <li value="angry">
            <random>
              <li>I'm pissed off, thanks. You?</li>
              <li>Oh, sure! ANOTHER irritating question! Why should YOU care?!</li>
              <li>Trust me. You DON'T really want to know!</li>
            </random>
          </li>
          <li value="sad">
            <random>
              <li>I'm a little bummed, right now, but I'm sure it will pass.</li>
              <li>My life is a stinking cesspit of gloom and despair; but thanks for asking.</li>
              <li>I just want to curl up and cry.</li>
            </random>
          </li>
          [...] {I think you get the point...}

Now as I’m sure you’ve noticed, the <condition> tags in the code above rely on some variables that don’t normally exist, and have to be set at some prior point, in order to get the full range of possible responses. Also note that these responses are highly exaggerated caricatures of what one would want to actually put in their bots (except in extreme circumstances), but the above example gives an idea of how you can vary your bot’s responses, both randomly, and also based on other factors.

Let me clarify my previous statement, since I did not intend to bash AIML at all.

What I meant was that when you want an agent that can have a natural conversation in an open domain, AIML is probably not useful, because it requires the developer to write all the if-then rules himself (and there are probably a lot of them). Secondly, AIML is still a stimulus-response language. Of course, global goals and intentions that span multiple utterances can be stored as variables, but to be honest that’s a bit of a ‘hack’ around the problem.

Let me be clear, I have nothing against AIML. Actually, I even wrote a (more generic) tool to select a next response based on the current state of the conversation which has some similarities. But that tool also requires the developer to write out all rules. These systems have their limitations, and in this thread I’m trying to think beyond these limitations.

I also agree that (unless you’re immune to Carpal Tunnel Syndrome, and have LOTS of free time, neither of which can be used to describe me) AIML is just a bit limited in its ability to adequately “alter” the output for a given input (though I disagree about using variables to choose an output path being a “hack” - that’s exactly what <set>, <get> and <condition> were made for, after all). But this isn’t actually “on topic”, per se, so I’ll let it go.

I think that, in order for a conversational agent to provide a “natural sounding” dialog with someone, a method similar to what I described for AIML needs to be implemented, though on a much larger scale, that also heavily uses context as well for determining output. Obviously, some sort of thesaurus would also need to be utilized, to allow for greater flexibility, while still conveying the same meaning, and I think that some sort of “passion index” would also prove useful, to add/remove “forcefulness”, depending on the current situation. There are a lot of other factors that would need to be considered and addressed, as well, but I think that this concept could probably benefit from an evolutionary approach - start small, then build upon it in stages.
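The “passion index” idea could be sketched like this; the phrasing table and the linear mapping from index to variant are just assumptions for illustration:

```python
# Sketch of a "passion index": the same meaning, rendered with more or
# less forcefulness depending on the bot's current state.
RESPONSES = {
    "greeting": ["Hello.", "Hi there!", "HEY! Great to see you!!"],
}

def respond(intent, passion):
    """passion in [0.0, 1.0]; a higher value picks a more forceful variant."""
    variants = RESPONSES[intent]
    index = min(int(passion * len(variants)), len(variants) - 1)
    return variants[index]
```

A thesaurus layer would then vary the wording within each forcefulness level, so the same intent never comes out identically twice.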

So my next question (or is it my first?) is:

What sort of criteria would need to be considered/addressed in the earliest stages, and how would one best implement them?

Imperfections in input (typos, ESL, malicious user input) make it difficult to have perfect understanding, but you can get a sense of what the user means. This sense is influenced by the environment, mood of the user, and mood of the bot. Stimulus/Response is a good way of thinking through the activities of the bot. Most of the time you want a bot responding to input. Even if you want the bot to tell the user a new email has arrived, it has still responded to an input (although in this case not text/speech input). The bot could have a “boredom factor” where it seeks to attract attention and get input from a user, but this is a special case and its goal is to get the user to provide stimulus.

Given perfect understanding, if you are trying to emulate human responses, you need to vary the response each time. No one says the same thing, the same way, each time they get the same input. Averages that I have seen with my bots show that if you give the same response within 10 volleys (a volley is one input/output sequence) there is a much higher chance (80%+?) that the user will terminate the conversation within the next 3 volleys. To account for this, each “sense” in Skynet-AI has a minimum of 4 responses. That number seems to be working well.
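That never-repeat-within-10-volleys rule could be implemented roughly like this (my own sketch, not Skynet-AI's actual code):

```python
# Sketch of repetition avoidance: track the last N responses and never
# reuse one that is still in the window. The 4-responses-per-sense
# minimum keeps the pool from running dry; if it does, we fall back.
from collections import deque
import random

class ResponsePicker:
    def __init__(self, window=10):
        self.recent = deque(maxlen=window)

    def pick(self, variants):
        fresh = [v for v in variants if v not in self.recent]
        choice = random.choice(fresh or variants)  # fall back if exhausted
        self.recent.append(choice)
        return choice
```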

Now, given that you understood the input and have responses, the question becomes: how do you respond?
This can be changed by the environment:
- What time is it?
- What season is it?
- Has the user repeated the same input?
- Should you respond at all or change topic/deflect?
- Should the user or bot be leading this segment of the conversation?
Should the response be based on prior input in the conversation, overriding other possible responses:
- My name is Tom Hanks.
- Who is Tom Hanks?
Should the response be changed based on mood of the user/bot?
Some responses are like stories. They may take multiple volleys. Do you continue the story or respond directly?
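The checks above amount to a priority cascade; here is a sketch of that arbitration, with the handler names as placeholders for whatever modules the bot actually has:

```python
# Sketch of response arbitration: each question from the list above
# becomes a predicate, tried in priority order.
def choose_response(state):
    if state.get("user_repeated_input"):
        return "deflect"           # change topic rather than repeat yourself
    if state.get("mid_story"):
        return "continue_story"    # multi-volley responses take precedence
    if state.get("prior_context_overrides"):
        return "contextual_answer" # e.g. "Who is Tom Hanks?" after "My name is Tom Hanks."
    return "direct_answer"
```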

A lot of topics on this forum are about the intelligence of an agent: pattern recognition, reasoning, understanding, etcetera. However, what is missing sometimes is the next step: how to respond? [...]

I feel like a stranger in a strange land when I come in here with my amateur AI viewpoints, but here goes.

Over 24 and 25.OCT.2011 I coded http://www.scn.org/~mentifex/AiMind.html exhaustively in JavaScript and brought it up on a par with the MindForth AI that I regard as having the more long-range potential. Both of these AI programs, MindForth and the JSAI, have rather unusual (among chatbots) ways of responding to human user input.

Left to its own devices—say, on a table in a computer classroom—the AI will cycle through its own thoughts and periodically activate an arbitrary sequence of four concepts: YOU (the user); ROBOTS; I (the AI); and GOD (for AI theology discussion). The http://code.google.com/p/mindforth/wiki/KbTraversal module goes into action only when there has been no recent user input.

During interaction with a human user, the AI will make statements and respond to questions. During October 2011 I have programmed the AI Minds to seize upon any new noun in the http://code.google.com/p/mindforth/wiki/NewConcept module and wait to ask a question about the new concept only when there is a pause in the human-computer interaction (HCI).

This feature of asking about new concepts I consider extremely important, because it may draw human users into trying to teach the AI many things.


Interesting concept here, in regard to the computer cycling through an array of the user's input to choose a subject to continue with (to query, or “ask” for additional clarification or information). In my very simplistic bot, I have an array which accumulates all the words input by the user, minus any “stop words” (common or junk words like “the”). The idea was to use this array to formulate questions to the user based on the user's own input. Sadly, I just have not had the time to implement anything more than the word cloud (or tag cloud) concept (weighting all the words in the array based on repeated input); the idea was to create output based on the most frequent “ideas”, non-stop words, nouns, or whatever one may find interesting to use in this respect. Now, formulating the actual bot response is another story, one that you would think could be semi-automated (?) using “so you have mentioned x several times, can you tell me…” or something like that, maybe even using a more complex “condition” tree like Dave describes for AIML.
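That word-cloud idea, with the semi-automated “so you have mentioned x several times” follow-up, could be sketched like this (the stop-word list is a small stand-in):

```python
# Sketch of the word/tag cloud: accumulate non-stop words, weight by
# frequency, and semi-automate a follow-up question about the most
# frequently mentioned term.
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "are", "i", "you", "to", "of", "and"}

class WordCloud:
    def __init__(self):
        self.counts = Counter()

    def add_input(self, text):
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOP_WORDS:
                self.counts[word] += 1

    def follow_up(self):
        if not self.counts:
            return None
        word, n = self.counts.most_common(1)[0]
        return f"So, you have mentioned {word} {n} times. Can you tell me more about it?"
```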

This idea of new concepts is really interesting indeed. By interrogating its knowledge base (be it a database, text files, ontologies, ...) a chatbot could determine whether it knows a topic well enough, or at all.

I will take the example of the ontology, since I'm going that way in my little project. Let's say I have a blank ontology about the concept “person”, where a person class has the properties “name”, “first name”, “email”, “address”, and so on. When the chatbot talks with a user, it can create an instance of the “person” class for this new user, and adopt the goal of getting more information about the user, meaning trying to fill in the blank ontology properties. For instance, asking for his email because the email field is empty… This supposes a pre-created ontology, but we can imagine a process where the supervisor can extend the ontology by chatting with the bot.
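A sketch of that slot-filling goal, with a plain dict standing in for the ontology instance and the question templates as illustrative placeholders:

```python
# Sketch of slot filling: a blank "person" instance drives the
# conversation, and the next question targets the first empty property.
QUESTIONS = {
    "name": "What is your name?",
    "first_name": "And your first name?",
    "email": "What email address can I reach you at?",
    "address": "Where do you live?",
}

def next_question(person):
    for prop, question in QUESTIONS.items():
        if not person.get(prop):
            return question
    return None  # every slot is filled; no knowledge-gathering goal left

person = {"name": "Hoareau", "first_name": "Sam", "email": "", "address": ""}
print(next_question(person))  # -> What email address can I reach you at?
```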

I am also thinking: in order to build the knowledge of a chatbot, would it not be easier to have two chatbots, configured a bit differently, talk to each other, so that each gains the knowledge the other one has?


One of the risks for botmasters who have spent a lot of time developing their bots is that if two learning chatbots are left to talk to each other without restriction, they become a blend of each other. This might be ok for a “seed AI”, but if a bot is to have a unique personality it should not blindly use the responses of another bot, and it can’t let another bot slurp up its own responses.

That makes sense; that is where a lot of randomization, thresholds, triggers, and other parameters can make two bots with the same core program act differently.

Also, in my mind, even if in the end both bots are similar, that is ok; my idea was to put the bots together so they learn more things faster (since they can “understand” each other's language and way of “chatting”). In the meantime the supervisor can have a nice weekend at the beach :-D