2
29.11.2005
The Main Processing Cycle of Elizabeth
–Receive the user’s input as the ‘active text’
–Input Transformations: apply any input transforms
–Keyword Transformations: search for a keyword; if one is found, replace the active text with a response from the corresponding set; if not, replace it with a no-keyword response
–Output Transformations: apply any output transforms
–Output the new ‘active text’
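The cycle can be sketched in a few lines of Python. This is only an illustration, not Elizabeth's implementation: the function names, the tiny script tables and the no-keyword response "PLEASE GO ON." are assumptions, and the sketch does not try to avoid repeating a response.

```python
import random
import re

# Hypothetical miniature script; the entries echo examples from the slides.
INPUT_TRANSFORMS = [("MUM", "MOTHER")]
OUTPUT_TRANSFORMS = [("I AM", "YOU ARE")]
KEYWORD_SETS = [
    (("MOTHER", "FATHER"),
     ["TELL ME MORE ABOUT YOUR FAMILY.",
      "DO YOU HAVE ANY BROTHERS OR SISTERS?"]),
]
NO_KEYWORD_RESPONSES = ["PLEASE GO ON."]  # assumed, not from the slides


def contains(text, phrase):
    """True if phrase occurs in text as whole words."""
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def apply_transforms(text, transforms):
    """Apply each (old, new) whole-word replacement in order."""
    for old, new in transforms:
        text = re.sub(rf"\b{re.escape(old)}\b", new, text)
    return text


def process(user_input, rng=random):
    active = user_input.upper()                          # receive the active text
    active = apply_transforms(active, INPUT_TRANSFORMS)  # input transformations
    for keywords, responses in KEYWORD_SETS:             # keyword transformations
        if any(contains(active, k) for k in keywords):
            active = rng.choice(responses)
            break
    else:
        active = rng.choice(NO_KEYWORD_RESPONSES)        # no keyword found
    return apply_transforms(active, OUTPUT_TRANSFORMS)   # output transformations
```

For instance, `process("My mum is kind")` first rewrites MUM to MOTHER and then answers with one of the family responses.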

3
Input/Output Transformations
Input transformations are applied to the initial input; their main use is to standardise words that you want to be treated similarly. For example, if you want ‘mum’ to be changed to ‘mother’:
I mum => mother
Output transformations are applied to the final output; often their main use is to change first person to second person and vice versa, e.g.
O i am => YOU ARE
Make sure you capitalise these as illustrated above.

4
Simple Keywords and Responses
The following script commands create a simple keyword/response set with two keywords and three responses:
K MOTHER
K FATHER
R TELL ME MORE ABOUT YOUR FAMILY.
R DO YOU HAVE ANY BROTHERS OR SISTERS?
R ARE YOU THE YOUNGEST IN YOUR FAMILY?
When ‘mother’ or ‘father’ is found in the active text, one of the responses will be chosen (randomly, but avoiding immediate repetition if possible).

5
Keywords with Substitution
The following script commands create a keyword/response set which pattern-matches the keyword against the active text and then makes appropriate substitutions in the response:
K [phr1] IS YOUNGER THAN [phr2]
R SO [phr2] IS OLDER THAN [phr1]
Any pattern of the form [p…] is a phrase wildcard, matching any sequence of words (which can contain only letters, hyphens or apostrophes). [phr1] is treated as a separate pattern from [phr2].
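As a rough illustration of how such a keyword could be matched and its wildcards carried over into the response, the sketch below translates the pattern into a Python regular expression. The helper names are hypothetical and this is not how Elizabeth itself is implemented; it only handles phrase wildcards written as [name].

```python
import re

WORD = r"[A-Za-z'-]+"               # letters, hyphens or apostrophes
PHRASE = rf"{WORD}(?: {WORD})*"     # one or more words in sequence


def compile_keyword(keyword):
    """Turn 'K'-style text with [name] wildcards into a compiled regex."""
    parts = re.split(r"\[(\w+)\]", keyword)   # literal, name, literal, name, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)                 # literal keyword text
        else:
            regex += rf"(?P<{part}>{PHRASE})"        # named phrase wildcard
    return re.compile(regex, re.IGNORECASE)


def respond(keyword, response, active_text):
    """Return the response with wildcards substituted, or None if no match."""
    match = compile_keyword(keyword).search(active_text)
    if match is None:
        return None
    return re.sub(r"\[(\w+)\]",
                  lambda w: match.group(w.group(1)).upper(),
                  response)
```

With the keyword and response from the slide, `respond("[phr1] IS YOUNGER THAN [phr2]", "SO [phr2] IS OLDER THAN [phr1]", "my sister is younger than my brother")` swaps the two phrases around the comparison.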

6
Pattern Matching
Any of these patterns can be used in combination (see the help file section ‘Pattern Matching’ for the complete list):
–[w…] any single complete word (or part-word)
–[t…] any single complete term (or part-term); a term, unlike a word, may contain digits as well as letters
–[l…] any single letter (i.e. any character that can occur in a word, including hyphen and apostrophe)
–[p…] a phrase: any sequence of complete words
–[X…] any text string which contains only complete ‘items’ (so it cannot contain only half a word or number)
–[b…] like [X…], but will only match text in which all brackets (‘(’, ‘)’, etc.) correctly pair up
–[;] any punctuation mark
–[] matches beginning or end of active text

7
Memorising and Recalling Phrases
Note from the previous example:
–‘& {…}’ is used to specify an action, in this case one that is triggered by the matching of a keyword and the selection of a corresponding response;
–‘{M [phrase]}’ memorises whatever text was matched against [phrase];
–[M] can then be used to recall the latest remembered text, within any kind of transformation or response;
–here a no-keyword response is created, which when invoked will make use of the latest memory ([M]);
–[M-1], [M-2] etc. can be used to recall earlier memories (the last but 1, last but 2, etc.).

8
Memorising Pronoun References
One simple use of index-coded memories is to keep track of what has been referred to by a recent output, so that pronouns (‘it’, ‘they’ etc.) can be dealt with appropriately.
I HE => [Mhe]
I HIM => [Mhe]
K FOOTBALL
R WHAT DO YOU THINK OF DAVID BECKHAM?
& {Mhe BECKHAM}
K BECKHAM
R I LIKE HIS FREE KICKS, BUT NOT HIS HAIR!
This script might yield ‘I watch football. WHAT DO YOU THINK OF DAVID BECKHAM? He crosses well. I LIKE HIS FREE KICKS …’: here the input transformation replaces ‘He’ in the last input with ‘BECKHAM’, enabling an appropriate response to be found.

9
Using Multiple Memories
This script will keep track of some of your favourites, tell you what they are, and then go on repeating them.
W WHAT ARE YOUR FAVOURITE GAME, TEAM AND PLAYER?
K GAME [X?] IS [phrase]
& {Mgame [phrase]}
K TEAM [X?] IS [phrase]
& {Mteam [phrase]}
K PLAYER [X?] IS [phrase]
& {Mplayer [phrase]}
R THANK YOU - SAY "OK" WHEN YOU'VE FINISHED
K OK
R YOUR FAVOURITE GAME IS [Mgame], TEAM IS [Mteam], AND PLAYER IS [Mplayer]
& {I [word] => OK}
N PLEASE CARRY ON TELLING ME YOUR FAVOURITES

10
Note from the previous example:
–‘K GAME [X?] IS [phrase]’ matches any text containing the word ‘GAME’ and then at some later point ‘IS’ followed by a phrase (recall that a ‘phrase’ here just means one or more words in sequence);
–‘& {Mgame [phrase]}’ then memorises the relevant phrase under the index code ‘game’;
–‘R YOUR FAVOURITE GAME IS [Mgame], TEAM IS [Mteam], AND PLAYER IS [Mplayer]’ outputs the three memories, but this response cannot be used until something has been memorised under each of the three index codes (you can check this by inputting ‘OK’);
–‘& {I [word] => OK}’ creates an input transformation which changes all words to ‘OK’; this simply ensures that from then on, any input will be treated as though it were just ‘OK OK …’.
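The behaviour of the index-coded memories can be imitated with a dictionary. The Python sketch below is an approximation under stated assumptions: `handle` and `CODES` are hypothetical names, the regular expression only loosely mirrors ‘[X?] IS [phrase]’, and the final ‘I [word] => OK’ transformation is left out.

```python
import re

CODES = ("GAME", "TEAM", "PLAYER")   # index codes used by the script


def handle(text, memories):
    """Process one input, updating the memories dict in place."""
    text = text.upper()
    for code in CODES:
        # Roughly 'K <code> [X?] IS [phrase]': the code word, anything, IS, a phrase.
        match = re.search(rf"\b{code}\b.*?\bIS ([A-Z' -]+)", text)
        if match:
            memories[code] = match.group(1).strip()
            return 'THANK YOU - SAY "OK" WHEN YOU\'VE FINISHED'
    # The summary response is usable only once all three memories exist.
    if text.strip() == "OK" and all(code in memories for code in CODES):
        return (f"YOUR FAVOURITE GAME IS {memories['GAME']}, "
                f"TEAM IS {memories['TEAM']}, "
                f"AND PLAYER IS {memories['PLAYER']}")
    return "PLEASE CARRY ON TELLING ME YOUR FAVOURITES"
```

Saying "OK" before all three favourites have been given falls through to the no-keyword response, which matches the check suggested in the notes.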

11
Changing Mood
The following script fragment makes Elizabeth get progressively more angry at the user’s swearing, starting off in the ‘calm’ state, then progressing to ‘cross’ and ‘enough’. Note how ‘M\’ is used to delete all memories, and that more than one command can be put inside the curly brackets.
K DAMN
K BLOODY
R [Mcalm] I'D RATHER YOU DIDN'T SWEAR, PLEASE
& {M\ Mcross}
R [Mcross] LOOK, JUST STOP SWEARING WILL YOU!
& {M\ Menough}
R [Menough] THAT'S IT! I'VE HAD ENOUGH - GO AWAY!
& {M\ O [X] => JUST GO AWAY}
Mcalm
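Seen from outside, the script behaves like a small state machine. The following Python sketch models it that way; the class name, the ‘gone’ state and the fallback reply are assumptions made for illustration. It also assumes that the final output transformation only affects outputs after the ‘enough’ response, which the slide does not state.

```python
import re

SWEAR_WORDS = {"DAMN", "BLOODY"}

# mood -> (response, next mood); 'gone' stands in for the O [X] => JUST GO AWAY rule
STAGES = {
    "calm": ("I'D RATHER YOU DIDN'T SWEAR, PLEASE", "cross"),
    "cross": ("LOOK, JUST STOP SWEARING WILL YOU!", "enough"),
    "enough": ("THAT'S IT! I'VE HAD ENOUGH - GO AWAY!", "gone"),
}


class MoodBot:
    def __init__(self):
        self.mood = "calm"            # corresponds to the initial 'Mcalm'

    def respond(self, text):
        if self.mood == "gone":       # every later output is replaced
            return "JUST GO AWAY"
        words = set(re.findall(r"[A-Z']+", text.upper()))
        if words & SWEAR_WORDS:
            reply, self.mood = STAGES[self.mood]
            return reply
        return "PLEASE GO ON"         # hypothetical no-keyword response
```

Each swear word moves the bot one step along calm, cross, enough; ordinary input leaves the mood unchanged.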

12
Syntactic Analysis
The ELIZA method of simple pattern-matching and pre-formed responses may sometimes be able to generate the illusion of ‘intelligent’ language processing, and even in some cases (e.g. a computer help system) provide the basis for a useful tool. However, to get anywhere near genuine NLP (natural language processing), Elizabeth needs to do more than pattern-match: it must be responsive to the structure of sentences, and react not just according to the literal word strings they contain, but according to how these words are put together, that is, their syntax.

13
A Testbed: Simple Transformations
A good testbed for Elizabeth’s potential for handling syntactic structure is the attempt to generate simple grammatical transformations. A transformation is a change in structure which alters the ‘surface’ form of the sentence (so the words are different, or in a different order), but without significantly altering its ‘propositional content’ (i.e. what ‘facts’ are in question; what the sentence ‘says’ about what or whom). Transformations played a major and controversial role in the rise of Chomskyan linguistics, but their value as a testbed is independent of all that.

14
Our Starting Point: Active Declarative Sentences
We start from straightforward active declarative sentences, such as:
–John chases the cat
–The white rabbits bit a black dog
–You like her
Declarative simply means that these sentences purport to state (‘declare’) facts; they are not questions or commands, for example. Here we shall stick to very simple word categories and grammatical constructs.

15
Some Types of Transformation (1): Active to Passive
Most types of transformation are easier to grasp by example than explanation:
–‘John chases the cat’ becomes ‘The cat is chased by John’
–‘The white rabbits bit a black dog’ becomes ‘A black dog was bitten by the white rabbits’
–‘You like her’ becomes ‘She is liked by you’

16
(2): Yes/No Questions
These transform the sentence into a question with a simple yes/no answer:
–‘John chases the cat’ becomes ‘Does John chase the cat?’
–‘You like her’ becomes ‘Do you like her?’
They can also be applied to passive sentences, though here they are a bit more complicated:
–‘A black dog was bitten by the white rabbits’ becomes ‘Was a black dog bitten by the white rabbits?’

17
(3): Tag Questions
A Tag Question is appended to the end of a sentence, to ask for confirmation or to give emphasis to what was said:
–‘John chases the cat’ becomes ‘John chases the cat, doesn’t he?’
–‘The white rabbits bit a black dog’ becomes ‘The white rabbits bit a black dog, didn’t they?’
–‘You like her’ becomes ‘You like her, don’t you?’
These provide an excellent test case, because a tag question must agree with the sentence in number (singular or plural), person (first, second, third), gender (masculine, feminine, neuter), and tense (past, present, future).

18
Phrase Structure Rules in Elizabeth
The phrase structure rules above can be reversed and then translated into Elizabeth input transformations suitable for analysing a sentence into its structural constituents:
–NP → D N
–I (d:[b1]) (n:[b2]) => (np:(D:[b1]) (N:[b2]))
–VP → V NP
–I (v:[b1]) (np:[b2]) => (vp:(V:[b1]) (NP:[b2]))
–S → NP VP
–I (np:[b1]) (vp:[b2]) => (s: (NP:[b1]) (VP:[b2]))
Note here that a ‘[b…]’ pattern can match anything at all, as long as it contains matching brackets. This ensures that the sentence structure is recorded by the ‘nested’ brackets, and that the processing respects this structure.

20
Having used the input transformations to analyse the sentence into its constituent structure, we can then apply keyword transformations to alter that structure, e.g. from active to passive:
–K (s:(NP:[b1]) (VP:[b2]))
–R (s:(VP:[b2] passive) (NP:[b1]))
Then output transformations can be used to decompose the sentence structure back into its parts:
–O (s:(VP:[b1] passive) (NP:[b2])) => (vp:[b1] passive)(np:[b2])
–O (vp:(V:[b1]) (NP:[b2]) passive) => (np:[b2])(v:[b1] passive)
–O (np:(D:[b1]) (N:[b2])) => (d:[b1]) (n:[b2])
–O (v:CHASES passive) => IS CHASED BY
–O (d:[b1]) => [b1]
–O (n:[b1]) => [b1]
If we then input the sentence:
–the dog chases the cat
the output will have been ‘translated’ into the passive form:
–the cat is chased by the dog
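The same analyse, restructure and decompose sequence can be mimicked in Python with nested tuples standing in for the nested brackets. This is only a sketch of the idea, not a translation of Elizabeth's matching engine: the lexicon, the function names and the fixed NP V NP sentence shape are all assumptions.

```python
# Hypothetical lexicon for the example sentence.
DETERMINERS = {"the", "a"}
NOUNS = {"dog", "cat"}
PASSIVE_FORMS = {"chases": "is chased by"}


def tag(word):
    """Label a word with its category, like (d:...), (n:...), (v:...)."""
    if word in DETERMINERS:
        return ("d", word)
    if word in NOUNS:
        return ("n", word)
    if word in PASSIVE_FORMS:
        return ("v", word)
    raise ValueError(f"unknown word: {word}")


def parse(sentence):
    """Build ('s', NP, ('vp', V, NP)) using NP -> D N, VP -> V NP, S -> NP VP."""
    items = [tag(w) for w in sentence.lower().split()]
    reduced, i = [], 0
    while i < len(items):                       # NP -> D N
        if i + 1 < len(items) and items[i][0] == "d" and items[i + 1][0] == "n":
            reduced.append(("np", items[i], items[i + 1]))
            i += 2
        else:
            reduced.append(items[i])
            i += 1
    if [item[0] for item in reduced] != ["np", "v", "np"]:
        raise ValueError("only 'NP V NP' sentences are handled")
    subject, verb, obj = reduced
    return ("s", subject, ("vp", verb, obj))    # VP -> V NP, then S -> NP VP


def flatten_np(np):
    _, (_, det), (_, noun) = np
    return f"{det} {noun}"


def to_passive(tree):
    """Object NP first, then the passive verb form, then the subject NP."""
    _, subject, (_, (_, verb), obj) = tree
    return f"{flatten_np(obj)} {PASSIVE_FORMS[verb]} {flatten_np(subject)}"
```

For the sentence on the slide, `to_passive(parse("the dog chases the cat"))` reorders the two noun phrases around the passive verb form.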

21
Binary Propositional Connectives
A binary propositional connective joins two propositions together to make a third (complex) proposition. Such connectives in English include ‘and’, ‘because’, ‘but’, ‘if’, ‘implies’, ‘nevertheless’, ‘only if’, ‘or’, ‘suggests that’, ‘unless’.
‘Snow is white’ and ‘the moon is cheese’ are atomic propositions (i.e. they are not themselves made up of other propositions). Using the connectives, we get:
–Snow is white and the moon is cheese
–Snow is white because the moon is cheese
–Snow is white but the moon is cheese
–Snow is white if the moon is cheese (etc.)

22
Language AIML
–Artificial Linguistic Internet Computer Entity (A.L.I.C.E.)
–Artificial Intelligence Markup Language (AIML)
–A.L.I.C.E., the first AIML-based personality program, won the Loebner Prize as “the most human computer” at the annual Turing Test contests in 2000 and 2001 (Loebner 2002).
–More than 500 volunteers from around the world have contributed to her development since 1995.
–How to use AIML to create robot personalities like A.L.I.C.E. that pretend to be intelligent and self-aware.

23
AIML
AIML files consist of simple stimulus-response modules called categories. Each <category> contains a <pattern>, or “stimulus,” and a <template>, or “response.” AIML software stores the stimulus-response categories in a tree managed by an object called the Graphmaster. When a bot client inputs text as a stimulus, the Graphmaster searches the categories for a matching <pattern>, along with any associated context, and then outputs the associated <template> as a response.

24
AIML
These categories can be structured to produce more complex humanlike responses with the use of a very few markup tags. AIML bots make extensive use of the multi-purpose recursive <srai> tag, as well as two AIML context tags, <that> and <topic>. Conditional branching in AIML is implemented with the <condition> tag. AIML implements the ELIZA personal pronoun swapping method with the <person> tag.
Bot personalities are created and shaped through a cyclical process of supervised learning called Targeting. Targeting is a cycle incorporating client, bot, and botmaster, wherein client inputs that find no complete match among the categories are logged by the bot and delivered as Targets to the botmaster, who then creates suitable responses, starting with the most common queries. The Targeting cycle produces a progressively more refined bot personality.
The art of AIML writing is most apparent in creating default categories, which provide noncommittal replies to a wide range of inputs.

25
Categories
In its simplest form, the template consists of only plain, unmarked text. More generally, AIML tags transform the reply into a mini computer program which can save data, activate other programs, give conditional responses, and recursively call the pattern matcher to insert the responses from other categories.
AIML currently supports two ways to interface with other languages and systems. The <system> tag executes any program accessible as an operating system shell command, and inserts the results in the reply. Similarly, the <javascript> tag allows arbitrary scripting inside the templates.
The optional context portion of the category consists of two variants, called <that> and <topic>. The <that> tag appears inside the category, and its pattern must match the robot’s last utterance. Remembering one last utterance is important if the robot asks a question. The <topic> tag appears outside the category, and collects a group of categories together. The topic may be set inside any template.

26
Recursion
AIML implements recursion with the <srai> operator. Kinds of application of <srai>:
1. Symbolic Reduction: reduce complex grammatical forms to simpler ones.
2. Divide and Conquer: split an input into two or more subparts, and combine the responses to each.
3. Synonyms: map different ways of saying the same thing to the same reply.
4. Spelling or grammar corrections.
5. Detecting keywords anywhere in the input.
6. Conditionals: certain forms of branching may be implemented with <srai>.
7. Any combination of (1)-(6).
The danger of <srai> is that it permits the botmaster to create infinite loops.

27
Symbolic Reduction
–<category>
–<pattern>DO YOU KNOW WHO * IS</pattern>
–<template><srai>WHO IS <star/></srai></template>
–</category>
Whatever input matched this pattern, the portion bound to the wildcard * may be inserted into the reply with the markup <star/>. This category reduces any input of the form “Do you know who X is?” to “Who is X?”
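The recursive step can be illustrated with a short Python sketch. It is not an AIML interpreter: the category table, the two "WHO IS" answers and the depth limit are assumptions made for the example, and categories are simply tried in list order rather than through the Graphmaster.

```python
import re

# (pattern, (kind, template)); 'srai' templates are fed back into respond().
# The two WHO IS categories are hypothetical targets for the reduction.
CATEGORIES = [
    ("DO YOU KNOW WHO * IS", ("srai", "WHO IS {star}")),
    ("WHO IS SOCRATES", ("text", "Socrates was a Greek philosopher.")),
    ("WHO IS *", ("text", "I do not know who {star} is.")),
]


def normalize(text):
    """Strip punctuation, collapse spaces and upper-case the input."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", text)
    return " ".join(cleaned.upper().split())


def respond(text, depth=0):
    if depth > 10:                       # guard against infinite srai loops
        raise RecursionError("too many recursive reductions")
    normalized = normalize(text)
    for pattern, (kind, template) in CATEGORIES:
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        match = re.match(regex, normalized)
        if match:
            star = match.group(1) if match.groups() else ""
            filled = template.format(star=star)
            # A recursive template is matched again as if it were new input.
            return respond(filled, depth + 1) if kind == "srai" else filled
    return "I have no answer for that."
```

Here "Do you know who Socrates is?" is first reduced to "WHO IS SOCRATES" and then answered by the simpler category; the depth guard shows one way to contain the looping risk that comes with recursion.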

30
Spelling and Grammar Correction
The single most common client spelling mistake is the use of “your” when “you’re” or “you are” is intended. Not every occurrence of “your,” however, should be turned into “you’re.” A small amount of grammatical context is usually necessary to catch this error:
–<category>
–<pattern>YOUR A *</pattern>
–<template>I think you mean “you’re” or “you are” not “your.” <srai>YOU ARE A <star/></srai></template>
–</category>
Here the bot both corrects the client input and acts as a language tutor.

31
Keywords
–<category>
–<pattern>MOTHER</pattern> <template>Tell me more about your family.</template>
–</category>
–<category>
–<pattern>_ MOTHER</pattern> <template><srai>MOTHER</srai></template>
–</category>
–<category>
–<pattern>MOTHER _</pattern> <template><srai>MOTHER</srai></template>
–</category>
–<category>
–<pattern>_ MOTHER *</pattern>
–<template><srai>MOTHER</srai></template>
–</category>
The first category both detects the keyword when it appears by itself, and provides the generic response. The second category detects the keyword as the suffix of a sentence. The third detects it as the prefix of an input sentence, and the last category detects the keyword as an infix. Each of the last three categories uses <srai> to link to the first, so that all four cases produce the same reply, but it needs to be written and stored only once.

32
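Conditionals
It is possible to write conditional branches in AIML, using only the <srai> tag. Consider three categories:
–<category>
–<pattern>WHO IS HE</pattern> <template><srai>WHOISHE <get name="he"/></srai></template>
–</category>
–<category>
–<pattern>WHOISHE *</pattern>
–<template>He is <get name="he"/>.</template>
–</category>
–<category>
–<pattern>WHOISHE UNKNOWN</pattern>
–<template>I don’t know who he is.</template>
–</category>
Provided that the predicate “he” is initialized to “Unknown,” the categories execute a conditional branch depending on whether “he” has been set. As a convenience to the botmaster, AIML also provides the equivalent function through the <condition> tag.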

33
Context
The keyword “that” in AIML refers to the robot’s previous utterance.
–<category>
–<pattern>YES</pattern>
–<that>DO YOU LIKE MOVIES</that>
–<template>What is your favorite movie?</template>
–</category>
This category is activated when the client says YES. The robot must find out what the client is saying “yes” to. If the robot asked, “Do you like movies?,” this category matches, and the response, “What is your favorite movie?,” continues the conversation along the same lines.

34
Context (2)
Internally the AIML interpreter stores the input pattern, that pattern and topic pattern along a single path, like:
–INPUT <that> THAT <topic> TOPIC
When the values of <that> or <topic> are not specified, the program implicitly sets the values of the corresponding THAT or TOPIC pattern to the wildcard *.
The first part of the path to match is the input. If more than one category has the same input pattern, the program may distinguish between them depending on the value of <that>. If two or more categories have the same <pattern> and <that>, the final step is to choose the reply based on the <topic>.
This structure suggests a design rule: never use <that> unless you have written two categories with the same <pattern>, and never use <topic> unless you write two categories with the same <pattern> and <that>.

35
Context (3)
A useful application for <topic> is to create subject-dependent “pickup lines”:
–<topic name="CARS">
–<category>
–<pattern>*</pattern>
–<template><random>
–<li>What’s your favorite car?</li>
–<li>What kind of car do you drive?</li>
–<li>Do you get a lot of parking tickets?</li>
–<li>My favorite car is one with a driver.</li>
–</random></template>
–</category>
–</topic>
The botmaster uses the <set name="topic"> tag to change the value of the topic predicate.

36
Predicates
One of the most common applications of AIML predicates is remembering pronoun bindings. The template
–<template>
–<set name="he">Samuel Clemens</set> is Mark Twain.
–</template>
results in “He is Mark Twain,” but as a side effect remembers that “he” now stands for “Samuel Clemens.”

37
Predicates (2)
The AIML specification leaves up to the botmaster whether a <set> predicate returns the contents between the tags, or the name of the predicate. For example: <set name="it">Opera</set> returns “it,” but <set name="likes">Opera</set> returns “Opera.”
The botmaster must also specify what happens when the bot gets a predicate which has not already been set. The values returned are called default predicate values and depend completely on the application of the predicate: when the corresponding predicates have not been initialized with a <set> tag, <get name="she"/> returns “Unknown,” <get name="has"/> returns “a mother” (because everyone has a mother), and <get name="wants"/> returns “to chat”.
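A predicate store with these two botmaster choices might look like the following Python sketch. The class, its parameters and the predicate names (he, it, likes, has, wants) are illustrative assumptions; a real AIML interpreter configures these settings in its own way.

```python
class Predicates:
    """Minimal predicate store: defaults for unset names, and a per-name
    choice of whether set() returns the name or the stored value."""

    def __init__(self, defaults, return_name_for=()):
        self.values = {}
        self.defaults = dict(defaults)
        self.return_name_for = set(return_name_for)

    def set(self, name, value):
        self.values[name] = value
        # Pronoun-like predicates echo their name; others echo the value.
        return name if name in self.return_name_for else value

    def get(self, name):
        if name in self.values:
            return self.values[name]
        return self.defaults.get(name, "")


# Hypothetical botmaster configuration.
predicates = Predicates(
    defaults={"he": "Unknown", "has": "a mother", "wants": "to chat"},
    return_name_for={"he", "it"},
)
```

With this configuration, setting “he” to “Samuel Clemens” returns the word “he” for use in the reply while the binding itself is remembered for later lookups.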

38
Person
One of the simple tricks that makes ELIZA so believable is a pronoun swapping substitution. The AIML <person> tag provides this function. The actual substitutions are defined by the botmaster for local languages and settings. The most common application of the <person> tag operates directly on the <star/> binding. For that reason, AIML defines a shortcut tag: <person/> = <person><star/></person>.

40
Person (3)
C: You don’t argue with me.
R: Why do you think I don’t argue with you?
Results from the category
–<category>
–<pattern>YOU DO NOT *</pattern>
–<template>Why do you think I don’t <person/>?</template>
–</category>
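The exchange can be reproduced with a word-by-word swap table. The Python sketch below is a deliberately small illustration: the swap table, the function names and the hard-coded "you do not" prefix are assumptions, and real substitution tables are larger and defined by the botmaster.

```python
# Hypothetical, deliberately small swap table (word-for-word only).
PERSON_SWAPS = {
    "me": "you", "my": "your", "i": "you",
    "you": "me", "your": "my",
}


def person(text):
    """Swap first- and second-person words, one word at a time."""
    return " ".join(PERSON_SWAPS.get(word, word) for word in text.lower().split())


def respond(text):
    """Mimic a 'YOU DO NOT *' category whose reply swaps the wildcard text."""
    normalized = text.lower().replace("don't", "do not").rstrip(".!?")
    prefix = "you do not "
    if normalized.startswith(prefix):
        star = normalized[len(prefix):]          # text bound to the wildcard
        return f"Why do you think I don't {person(star)}?"
    return None
```

Note that the input is first normalized so that “don’t” matches the pattern words “DO NOT”, and only the wildcard portion is passed through the swap.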

41
Graphmaster
To achieve efficient pattern matching time, and a compact memory representation, the AIML software stores all of the categories in a tree managed by an object called the Graphmaster. The Graphmaster stores AIML patterns along a path from the root r to a terminal node t, where the AIML template is stored.
Let w_1, …, w_k be the sequence of k words or tokens in an AIML pattern. To insert the pattern into the graph, the Graphmaster first looks to see if m = G(r, w_1) exists. If it does, then the program continues the insertion of w_2, …, w_k in the subtree rooted at m. Only when the program encounters a first index i for which there is a node n such that G(n, w_i) is undefined does the program create a new node m = G(n, w_i), whereafter the Graphmaster creates a set of new nodes for each of the remaining w_i, …, w_k.
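The insertion procedure is essentially insertion into a word-level trie. A minimal Python sketch, with hypothetical class and method names, where each node's `children` dictionary plays the role of G(n, w):

```python
class Node:
    def __init__(self):
        self.children = {}     # word -> child node, i.e. G(n, w)
        self.template = None   # set only on nodes where a pattern ends


class Graphmaster:
    def __init__(self):
        self.root = Node()

    def insert(self, pattern, template):
        """Walk existing nodes while they exist, then create the rest."""
        node = self.root
        for word in pattern.upper().split():
            child = node.children.get(word)
            if child is None:              # G(n, w_i) is undefined: create it
                child = Node()
                node.children[word] = child
            node = child
        node.template = template           # terminal node holds the template

    def count_nodes(self):
        """Count all nodes, including the root (helper for illustration)."""
        stack, total = [self.root], 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(node.children.values())
        return total
```

Inserting "WHO IS HE" and then "WHO IS SHE" shares the nodes for WHO and IS, so the second insertion adds only one new node. This sharing is what keeps the representation compact.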

43
Graphmaster matching (2)
The heart of the algorithm consists of three cases:
1. Does the node contain the key “_”? If so, search the subgraph rooted at the child node linked by “_.” Try all remaining suffixes of the input to see if one matches. If no match was found, ask
2. Does the node contain the key w_j, the j-th word in the input sentence? If so, search the subgraph linked by w_j, using the tail of the input w_{j+1}, …, w_k. If no match was found, ask
3. Does the node contain the key “*”? If so, search the subgraph rooted at the child node linked by “*.” Try all remaining suffixes of the input to see if one matches. If no match was found, return false.

44
Graphmaster matching (3)
Note that:
–At every node, the “_” wildcard has highest priority, an atomic word second priority, and the “*” wildcard has the lowest priority.
–The patterns need not be ordered alphabetically. They are partially ordered so that “_” comes before any word, and “*” comes after any word.
–The matching is word-by-word, not category-by-category.
–The algorithm combines the input pattern, the <that> pattern and <topic> pattern into a single sentence or path, such as: “PATTERN <that> THAT <topic> TOPIC.” The Graphmaster treats the symbols <that> and <topic> just like ordinary words. The patterns PATTERN, THAT and TOPIC may all contain multiple wildcards.
–The matching algorithm is a highly restricted form of depth-first search, also known as backtracking.
–For pedagogical purposes, one can explain the algorithm by removing the wildcards and considering match step (2) only. The wildcards may be introduced one at a time, first “*” and then “_.” It is also simpler to explain the algorithm first using input patterns only, and then subsequently develop the explanation of the path including <that> and <topic>.
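The three-case search with its priority order can be sketched in Python as a recursive backtracking match over a word trie. This is an illustrative reading of the algorithm for input patterns only (no context path), with hypothetical names; it assumes a wildcard must absorb at least one word.

```python
class Node:
    def __init__(self):
        self.children = {}     # word or wildcard -> child node
        self.template = None   # set where a pattern ends


def insert(root, pattern, template):
    node = root
    for word in pattern.upper().split():
        node = node.children.setdefault(word, Node())
    node.template = template


def match_wildcard(node, key, words):
    """Let the wildcard absorb 1..n words, trying each remaining suffix."""
    child = node.children.get(key)
    if child is None:
        return None
    for i in range(1, len(words) + 1):
        result = match(child, words[i:])
        if result is not None:
            return result
    return None


def match(node, words):
    """Return the matching template, or None; backtracks on failure."""
    if not words:
        return node.template
    result = match_wildcard(node, "_", words)        # case 1: highest priority
    if result is not None:
        return result
    child = node.children.get(words[0])              # case 2: exact word
    if child is not None:
        result = match(child, words[1:])
        if result is not None:
            return result
    return match_wildcard(node, "*", words)          # case 3: lowest priority


def respond(root, text):
    return match(root, text.upper().split())
```

Loading the four MOTHER keyword patterns plus a bare "*" default shows the priority order at work: an input ending in MOTHER is caught by the "_ MOTHER" path before the default "*" is ever tried.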

45
Targeting
The ALICE brain, at the time of this writing, contains about 41,000 categories. In any given run of the server, however, typically only a few thousand of those categories are activated. Potentially every activated category with at least one wildcard in the input pattern, that pattern, or topic pattern is a source of targets.
The targeting software may include a GUI for browsing the targets. The program displays the original matched category, the matching input data, a proposed new pattern, and a text area to input the new template. The botmaster may choose to delete, skip or complete the target category.

46
Defaults
The art of AIML writing is most apparent in default categories, that is, categories that include the wildcard “*” but do not <srai> to any other category. Depending on the AIML set, a significant percentage of client inputs will usually match the ultimate default category with <pattern>*</pattern> (and implicitly, <that>*</that> and <topic>*</topic>). The template for this category generally consists of a long list of randomly selected “pickup lines,” or non sequiturs, designed to direct the conversation back to topics the bot knows about.
–<category>
–<pattern>*</pattern>
–<template><random>
–<li>How old are you?</li>
–<li>What’s your sign?</li>
–<li>Are you a student?</li>
–<li>What are you wearing?</li>
–<li>Where are you located?</li>
–<li>What is your real name?</li>
–<li>I like the way you talk.</li>
–<li>Are you a man or a woman?</li>
–<li>Do you prefer books or TV?</li>
–<li>What’s your favorite movie?</li>
–<li>What do you do in your spare time?</li>
–<li>Can you speak any foreign languages?</li>
–<li>When do you think artificial intelligence will replace lawyers?</li>
–</random></template>
–</category>