Developing High-level Cognitive Functions for Service Robots

Xiaoping Chen, Jianmin Ji, Jiehui Jiang, Guoqiang Jin, Feng Wang, and Jiongkun Xie
Multi-Agent Systems Lab, School of Computer Science and Technology
University of Science and Technology of China
230026, Hefei, P.R. China
xpchen@ustc.edu.cn, {jizheng,jhjiang,abxeeled,fenggew,devilxjk}@mail.ustc.edu.cn

ABSTRACT

The primary target of this work is human-robot collaboration, especially for service robots in complicated application scenarios. Three assumptions and four requirements are identified. State-of-the-art, general-purpose Natural Language Processing (NLP), commonsense reasoning (in particular, ASP), and robotics techniques are integrated in a layered architecture. The architecture and mechanisms have been implemented on a service robot, Ke Jia. Instead of command languages, small limited segments of natural language are employed in spoken dialog between Ke Jia and its users. The information in the dialog is extracted, classified and transferred into an inner representation by Ke Jia's NLP mechanism, and further used autonomously in problem solving and planning. A series of case studies was conducted on Ke Jia with positive results, verifying its ability to acquire knowledge through spoken dialog with users, to solve problems autonomously by virtue of acquired causal knowledge, and to plan autonomously for complex tasks.

Categories and Subject Descriptors

I.2 [Computing Methodologies]: Artificial Intelligence

General Terms

Design, Experimentation

Keywords

Human-robot interaction, Cognitive robotics, Modeling natural language, Knowledge representation

1. INTRODUCTION

Remarkable progress has been made on research into intelligent robots, in particular, service robots. Also in recent years, there has been an increasing interest in integrating techniques drawn from areas of AI and Robotics, including vision, navigation, manipulation, machine learning, planning, reasoning, speech recognition and natural language processing [17, 4, 15, 16, 2, 3, 6, 7]. One of the most attractive ideas from
these efforts is human-robot collaboration, according to which robots should not be taken as tools, but rather as our partners [9].

Cite as: Developing High-level Cognitive Functions for Service Robots, X. Chen, J. Ji, J. Jiang, G. Jin, F. Wang, and J. Xie, Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), van der Hoek, Kaminka, Lespérance, Luck and Sen (eds.), May 10-14, 2010, Toronto, Canada, pp. XXX-XXX. Copyright (c) 2010, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

This especially applies to service robots. In the most conceivable application scenarios, like offices and homes, humans want service robots to help them do a lot of things, even though they possess the complete mental and physical capabilities to do these things by themselves. Therefore, humans can help robots in these situations, especially with their knowledge, so that the robots can provide humans with better service, especially in labor work. Since there will be a long period before robots gain human-like intelligence, human-robot collaboration is beneficial and even necessary for many real-world applications in the near future, and especially significant for the aging society.

A necessary condition of, and crucial means to, human-robot collaboration is natural and powerful human-robot communication. This in turn demands a powerful ability of commonsense reasoning and planning, so that the knowledge acquired can be used by the robots autonomously. Based on these considerations, the Ke Jia Project has been launched, trying to develop high-level cognitive functions of service robots suitable for human-robot collaboration. This effort is based on state-of-the-art, general-purpose NLP and commonsense reasoning techniques.

The Ke Jia robot has been implemented in upgraded versions and a series of case studies has been carried out. In the first type of case study, Ke Jia was examined with standard tests in RoboCup@Home league competitions, such as building a map of an unknown environment, identifying
humans, following an unknown person, etc. [14]. In the second type of case study, the robot was given complex tasks, each composed of more than one simpler task, where each simpler task may consist of multiple atomic actions. The experiments showed that Ke Jia can understand the interconnection between the simpler tasks and make optimal plans with the atomic actions. In the third type of case study, Ke Jia was taught some causal knowledge through spoken human-robot dialog. By virtue of the acquired knowledge, the robot succeeded in making "careful" plans whose execution realizes the assigned goals while avoiding unwanted side-effects. All the experiments show that Ke Jia's ability can be substantially raised through human-robot collaboration.

Section 2 explains our motivations for the Ke Jia Project in detail. Section 3 presents the framework and some of its features. Ke Jia's main mechanisms, NLP and commonsense reasoning, and their integration are described in Section 4. We report on our case studies in Section 5 and give some conclusions in Section 6.

2. MOTIVATIONS

We are concerned with application scenarios which conform to the following three assumptions. The first assumption is well accepted in the area for most applications, while the second and the third have not been adopted explicitly by all researchers. However, we believe all three hold for a majority of more complicated real-world applications and play an important role in the relevant research.

(A1) Common users: Typically, an intelligent service robot is expected to serve untrained and non-technical users. As a consequence, the robot should be equipped with an intuitive, human-oriented user interface, so that it can be employed by these users without instruction [17, 8, 2, 3]. A huge amount of effort has been devoted to this "uppermost" requirement and a variety of techniques for human-robot interaction have been developed, including spoken language recognition, gesture recognition, facial perception, etc. [8], sometimes integrated with more
traditional techniques such as graphical user interfaces.

(A2) Human-robot collaboration: In many cases, human users need to assist the robot on its missions one way or the other. An example is task learning, where a human teaches a robot how to perform a specific task through a combination of spoken commands, observation and imitation of the human's performing that task [16]. More generally, it has been proposed that human users and robot(s) should collaborate to solve problems, where humans assist robot(s) with cognition and perception [9]. More explanations are given in Section 1.

(A3) Underspecification: Unlike an industrial robot, the tasks of a service robot are frequently underspecified, i.e., not predefined completely, because users usually provide underspecified descriptions of their intentions (e.g., tasks) and the environments are typically unpredictable and dynamic [17, 2, 3]. Of course, one can choose to develop service robots whose tasks are defined completely in advance. But this choice means that the robots have no sufficient capability to respond/adapt to their unpredictable and dynamic environments, as well as to the users.

Based on these assumptions, we identify four requirements for the Ke Jia Project to meet.

(R1) Ability to acquire knowledge from users. A straightforward way is to use human-robot dialog as a means to acquire a variety of knowledge from users or designers, including descriptions of the environment and the task at hand, and even deeper knowledge such as causation related to the tasks.

(R2) Feasibility of the expression of complex tasks. In more complicated applications, the robots must be able to handle complex tasks, not only simple ones. For example, "clean the house" is a complex task, while "open the door" is a simple one. A simple task is not necessarily an easy task for robots.

(R3) Appropriate degrees of autonomy. Although we do not expect a service robot to do everything by itself, some degree of autonomy is absolutely required. An autonomous robot should be able to make use
of acquired knowledge to solve problems and, especially, to plan for complex tasks.

(R4) Real-time inference. A well-known fact in AI and related areas is that mechanisms that are more powerful in function are usually more time-consuming in computation. Therefore, it is necessary to take this fact into account when developing more powerful intelligent service robots, since real-time processing is demanded in real-world applications.

An important observation, to the best of our knowledge, is that command recognition has been the dominant HRI technique in current efforts on intelligent service robots. With this technique, the number of commands and the format of each command are fixed beforehand. In human-robot communication, the robot tries to match each utterance of its users with a predefined command. Once a command is recognized, it is executed by the robot through running a course of low-level commands, which is manually programmed beforehand or produced by the motion planning module according to the command. This technique has shown good performance in many simple applications [3]. For more complicated applications with the requirements above, however, there are still many challenges.

For example, how to express complex tasks? There are two options for specifying complex tasks within the scope of command recognition. The first one is to use high-level commands, one for each complex task. Generally, the more complicated a task is, the more parameters are contained in the corresponding command. For example, the command "turn <right>" has just one parameter, while the command "move the <red> <bottle> from the <table> to the <teapoy>" has four. Therefore, as there are more and more high-level commands with more and more parameters, it becomes harder and harder for users to remember and use these commands. Moreover, given a certain sort of task, its instances under the different contexts where the task is executed may need different parameters in the task specification. For example, if there is more than one red bottle on the table and the user
wants to move a particular one of them, an additional parameter, e.g., an attribute which distinguishes this bottle from the others, has to be introduced into the "move" command above. Obviously, it would be too difficult or even impossible for the designers to specify beforehand all the necessary parameters of each high-level command. In these cases, the "high-level commands" proposal makes infeasible demands on both users and designers, and contradicts Assumptions (A1) and (A3).

The other option for employing command recognition to express complex tasks is command combination, where simple commands, each representing a simple task, are combined to specify a complex task. In this way a human user has to choose and arrange in some appropriate order all the simple commands for the complex task at hand, so that the execution of these commands in that order by the robot will fulfill the task. This means that it is the human users, not the robot, who are responsible for task planning; therefore, the robot has a very limited degree of autonomy. However, for many complex tasks, humans may only be able to describe the goal states they want to reach. So the robot should take charge of task planning, as well as motion planning, according to Assumption (A1).

To meet all the requirements under these assumptions, we propose an alternative approach based on state-of-the-art NLP and commonsense reasoning techniques, integrated with human-robot dialog and motion planning. The main ideas are described below.

(1) We take some limited segments of natural language (LSNLs) as HRI languages. A specific LSNL is formed with a fixed vocabulary and a simplified syntax, a subset of the syntax of some natural language. With these LSNLs, service queries, descriptions of the states of environments, knowledge of the world, instructions about new tasks and so on can be expressed in similar ways as in everyday spoken dialog and classroom teaching. Accordingly, we employ and develop some NLP techniques for the robot to "understand" the
dialog in these LSNLs.

(2) We introduce Answer Set Programming (ASP) as the knowledge representation and reasoning tool for Ke Jia. ASP is a logic language with a Prolog-like syntax and the stable model semantics, and thus a non-monotonic reasoning mechanism [10]. This feature makes it suitable for handling underspecification and further supporting knowledge accumulation and human-robot collaboration through dialog.

(3) We propose a layered architecture (Figure 1) to integrate all the techniques and separate task and motion planning. This is crucial for reducing computational costs, since current ASP solvers are not efficient enough for motion planning.

Figure 1: The layered architecture

3. FRAMEWORK AND FEATURES

System Overview. The hardware framework of the Ke Jia robot is shown in Fig. 2. Its sensors include a laser range finder, a stereo camera and a set of sonars. The robot has an arm for manipulating portable items. The computational resources consist of a laptop and an on-board PC. It is worth emphasizing that neither additional off-board computational resources nor remote control is needed when the robot performs its tasks. This means all the computation is carried out on-board. Similar to RHINO [3] and STAIR [15], distributed and asynchronous processing is adopted, with no centralized clock or centralized communication module in Ke Jia's system.

Figure 2: The hardware framework of Ke Jia

The software architecture is shown in Fig. 1. Ke Jia is driven by input from human-robot dialog. The information from the dialog is extracted, classified and transferred into the task planning module through a three-step procedure of the NLP module. Within the task planning module, the information and knowledge are represented as an ASP program, which may vary from time to time along with new information from the dialog and observation. However, there are some "planning states" in which the ASP program does not change and represents a certain task as well as the related knowledge. Then Ke Jia's task planning
module generates an optimal high-level plan for the task, and feeds it to the motion planning module. A low-level plan corresponding to each high-level plan is generated by the motion planning module and executed by the robot control module. When needed, however, Ke Jia will ask the user for further information and re-plan. As mentioned above, we use LSNLs in the spoken human-robot dialog and thus there are some challenges. Since all the LSNLs we have used are very small, so far there is no substantial obstacle to the effort on our main goals.

Task Planning. There are no well-accepted criteria for the division between task and motion planning, and thus the division depends on design choices. Some of the atomic tasks we defined in the Ke Jia Project are listed in Table 1. Each atomic task, also called an action, is designed as a primitive for Ke Jia's task planning and can be handled further by Ke Jia's motion planning. With the specification of atomic tasks, the division between Ke Jia's task and motion planning is clearly defined.

An outstanding feature of Ke Jia's actions is underspecifiedness, which supports flexibility of representation and communication. For example, all phrases semantically equivalent to "pick-up an item" in the given LSNL are identified by Ke Jia as instances of this action, although most of the phrases may not specify the action completely enough for it to be executed by a robot. Generally, people cannot afford to explicitly spell out every detail of every course of action in every conceivable situation [17]. Suppose, for example, there are two bottles on the table and a user just wants to get a particular one of them. In this case, the user may express his/her query through a phrase like "bring me the bottle on the table". If Ke Jia finds two bottles on the table during its execution of this task, say, when it drives by the table, it will ask the user to provide further information. There are many more complicated phenomena of underspecification about objects and their
attributes which can be resolved with AI technology.

    action                    function
    goto a location           drive to the assigned location from the current location
    pick-up an item           pick up the assigned item; return "importable" if the item is not portable
    put-down at a position    put down the item in hand at the assigned position
    search for an object      search for the assigned object through sensors and return its position if it succeeds

    Table 1: Atomic actions

To realize these features, it is assumed by default in Ke Jia's NLP and task planning modules that any singular noun represents a single object. This way, Ke Jia's task planning module can plan under underspecification, generating (underspecified) high-level plans and providing Ke Jia with the possibility of acquiring more (detailed) information if necessary. Once new information reducing uncertainties is received, Ke Jia updates its world model and re-plans if needed. For this purpose, non-monotonic inference is necessary. This is one of the main reasons that Ke Jia employs ASP as its inference tool.

Motion Planning and Robot Control. A set of elementary actions is defined for Ke Jia's motion planning. Each elementary action is predefined with a fixed set of parameters, similar in some respects to a command in command recognition. Unlike commands, however, elementary actions are determined to a great extent by the capabilities of the robot's hardware. An elementary action is fully specified in the sense that once the values of its parameters are assigned, the robot control module will execute the elementary action "blindly". For instance, once the robot gets the position of the bottle while performing the elementary action "catch the bottle", it will try to catch the object in that position, no matter what object it is. In fact, it is not the "responsibility" of the robot control module and/or elementary actions to identify the "right" objects for acting.

For each atomic action in a high-level plan generated by Ke Jia's task planning module, the
motion planning module will try to make a low-level plan composed of elementary actions, and execute the low-level plan autonomously. This is a special form of hierarchical planning introduced in Ke Jia to gain computational efficiency. Currently, we employ heuristic methods for Ke Jia's motion planning for the same reason.

World Model. Logically, the set of objects for Ke Jia's planning is determined by the specific LSNL, containing all the individuals expressible with noun phrases of the LSNL. However, most of these individuals may not exist in the environment. On the contrary, only those objects actually perceived by the robot constitute the domain of objects. Moreover, the robot may perceive more objects in the environment during the execution of a task. In order to capture the changing domain of actual objects and other "real" attributes of the environment, Ke Jia maintains a world model (as part of the domain KB), which is shared by the NLP, task planning and motion planning modules and can be updated with new information from the human-robot dialog and/or observation. These modules coordinate their behaviors through the shared knowledge and information. Therefore, when Ke Jia acquires new information that there are two bottles, one red and one green, on the table, it will update its world model and ask the user to indicate which bottle he/she wants to get.

4. COUPLING NLP WITH ASP

This section describes the NLP and ASP-based commonsense reasoning techniques employed by or developed for Ke Jia. Generally, these two techniques are studied or applied separately. We couple them in the Ke Jia system.

For each input sentence, the NLP module works in three steps: (1) Parsing, in which the input sentence is parsed with the Stanford parser [12]; (2) Semantic analysis, in which labeled logic predicates are generated to represent the meaning of the input sentence; (3) Pragmatic analysis, in which the NLP module detects the linguistic function of the input sentence and rewrites the labeled logic predicates
into unlabeled ones, so that they are recognizable by ASP.

A single input to the NLP module from the human-robot dialog module is a string of words, which is regarded as a sentence. It is parsed by the Stanford parser, which returns two kinds of information on the syntactic structure of the sentence: a grammar tree in the UPenn tagging style, and a set of typed dependencies between all the words in the sentence [5].

The building of the semantic representation is mainly based on the typed dependencies, though sometimes the syntactic categories of words and phrases are required, too. The semantic representation is a set of semantic elements. A semantic element can be either a first-order predicate tagged with a label, or a variable tagged with a label. The arguments of first-order predicates are labels, so a predicate can appear in the argument list of another predicate. Thus, some second-order information can be expressed in a first-order setting, such as "a verb predicate is modified by an adverb predicate". This approach follows Segmented Discourse Representation Theory (SDRT) [1]. Each word corresponds to a semantic element, and each entity in the semantic space has its own semantic element, too. Semantic elements are divided into five types: modifier, entity, verb, preposition and conjunction.

Modifier elements are used to represent nouns, pronouns, adjectives, adverbs and all words that behave as an NP by themselves. They are either standalone or not standalone. A standalone modifier element always carries its own entity element. Usually, they are derived from nouns or pronouns. A non-standalone modifier element must have a label from somewhere else filled into its argument list. Each modifier element has only one argument. Entity elements do not correspond to any word in the text. They do not have any arguments, and are finally printed not as predicates but as variables. Verb elements correspond to verbs, and have 1-3 arguments representing their subject, direct object and indirect object, if there are any; preposition
elements correspond to prepositions, and have exactly 2 arguments, representing their subject and object, respectively; conjunction elements correspond to conjunction words, and can have 2 or more arguments, representing all their conjuncts.

Each type of semantic element has different properties and behaviors. With these definitions, we can deal with all natural language grammatical phenomena that are specified by the typed dependencies. The typed dependencies specify the relations between the words in a sentence. Because each word corresponds to a semantic element, the relations between semantic elements are also specified by the typed dependencies. According to these relation specifications, the arguments of each semantic element can be determined.

For example, in an amod (adjective modifier) dependency, the dependent word modifies the governor word, and the two words are both modifier elements. For every amod dependency, its dependent element and governor element should have the same subject. Thus we can assign the subject argument of the governor element to the subject argument of the dependent element. For each dependency, we have defined how to fill the unfilled argument list according to the type of the dependency. When all the typed dependencies of the sentence have been dealt with, each semantic element will correctly represent the entities, the properties of the entities, and the relations among the entities specified by the sentence. Thus the meaning of the sentence is captured in logical form.

Now pragmatic analysis is launched to transform the above logical forms into ASP programs. When a prepositional phrase modifies a verb, it should be combined with the verb, producing a composite predicate, and any reference to another predicate should be eliminated from the argument list of a predicate, so that the result is purely first-order. In addition, there are more subtle issues related to the words corresponding to the so-called logical connectives. We will return to these issues after a brief introduction to ASP.

ASP was proposed in [10]. An ASP program is a
finite set of rules of the form:

    H ← p1, ..., pk, not q1, ..., not qm,    (1)

where pi, 1 ≤ i ≤ k, and qj, 1 ≤ j ≤ m, are literals, and H is either empty or a literal. A literal is a formula of the form a or ¬a, where a is an atom. If H is empty, then the rule is called a constraint. A rule consisting of only H is called a fact. There are two kinds of negation in ASP: the classical negation ¬ and the non-classical negation not, i.e., negation as failure. The meaning of not a can be explained as saying that "a is not derivable from the program". Similarly, a constraint ← p1, ..., pk specifies that p1, ..., pk cannot be derived jointly from the program.

Commonsense knowledge can be represented easily with ASP programs. For example, Ke Jia's ability to catch and the corresponding predicates hold and empty can be expressed as follows:

    catch(A,T) :- not ncatch(A,T), smallobject(A), time(T), T < lasttime.
    ncatch(A,T) :- not catch(A,T), smallobject(A), time(T), T < lasttime.
    hold(A,T+1) :- catch(A,T), smallobject(A), time(T), T < lasttime.
    ncatch(A,T) :- location(A,X,T), smallobject(A), not location(agent,X,T), number(X), time(T).
    ncatch(A,T) :- not empty(T), smallobject(A), number(X), time(T).
    empty(T+1) :- empty(T), not nempty(T+1), time(T), T < lasttime.
    nempty(T+1) :- hold(A,T+1), smallobject(A), time(T), T < lasttime.
    hold(A,T+1) :- hold(A,T), smallobject(A), time(T), not nhold(A,T+1), T < lasttime.
    nhold(A,T+1) :- empty(T+1), smallobject(A), time(T), T < lasttime.

The first two rules state that, at any time T, either catch(A,T) or ¬catch(A,T) holds, but not both, because of the classical negation (¬catch(A,T) is written as ncatch(A,T) in the above program). The third rule states the effect of the action catch: if the robot catches an object at time T, then it holds the object at time T+1. The next two rules state the inexecutability conditions for catch: if the robot is not at the same position as the object, or the hand of the robot is not empty, then it cannot catch the object. The last four rules are inertia rules for the predicates hold and empty, which address the frame
problem.

Now we return to the last step of the NLP module, pragmatic analysis. So far in the Ke Jia Project, we have considered two words corresponding to the two most important logical connectives, "if" and "not". The word "if" has more "powerful" functions than a mere predicate. It can express a variety of entailments, such as logical implication, counterfactual conditionals or causal entailment. Since our concerns (human-robot interaction and collaboration, robot planning, etc.) are mainly related to commonsense reasoning and causal entailment, we rewrite conditionals (sentences of the form "if-then") as ASP rules. This implies that any conditional in our LSNLs is understood by Ke Jia as a default. Similarly, the word "not" is also understood as a commonsense term. There are three cases where "not" is permitted to appear in current LSNL sentences. (i) "not" is used to form a negative imperative sentence, such as "do not open the door". The whole sentence expresses that something is forbidden and thus should be translated naturally into an ASP constraint. (ii) "not" modifies the main verb of a sentence or clause. These sentences or clauses should be handled similarly to a negative imperative sentence as above. In particular, negative if-clauses are understood as defeasible conditions and thus rewritten as ASP rules. (iii) "not" modifies "anything", representing "nothing". The sentence or clause should be translated into an ASP rule too. In all these cases, the word "not" is translated naturally or approximately into the negation-as-failure operator, not. Most other usages of "not", say, modifying nouns or adjectives, should be translated into classical negation. We have not handled these cases in the Ke Jia Project, since more work is needed to support the implementation. After the whole procedure of Ke Jia's NLP module, all the sentences from the LSNLs can be transformed into ASP programs, so that the information and knowledge expressed in these sentences can be utilized by the task planning module.

ASP provides a unified mechanism for handling
commonsense reasoning and planning, with a solution to the frame problem. A lot of ASP solvers have been developed, too. One can solve an ASP program by running it on an ASP solver. The outcomes are called answer sets, each containing a set of literals which can be derived jointly from the program or, intuitively, hold jointly under the program as a KB. In particular, an answer set is actually a plan if the program specifies a planning problem. For instance, the rules listed above come from a program specifying a planning problem. For the sake of efficiency, we actually employed more advanced forms of ASP, e.g., the action language C+ [11], to represent the knowledge and specify the planning problems of Ke Jia.

5. CASE STUDY

We have conducted several types of case study in the efforts on the Ke Jia Project. In Case Study 1, we took standard tests in RoboCup@Home league competitions as benchmarks, including building a map of an unknown environment, identifying humans, following an unknown person through a dynamic environment, etc. [14]. The aim of this case study is to verify Ke Jia's "basic capabilities". Actually, however, it is not necessary for a service robot to pass these tests by using the main technical contributions described in this paper. Therefore, we will not present this type of case study here. But we believe that the techniques reported in this paper will be significant or even necessary for some new tests or new versions of the current tests in the future. In this section, we describe the second and third types of case study, where we took complex tasks and causal reasoning tasks as benchmarks, respectively.

5.1 Case Study 2: Complex Tasks

A complex task is composed of more than one simple task. The more powerful a robot's task planning ability is, the better its performance on complex tasks will be. Without this ability, some complex tasks such as "clean the house" cannot be realized by the robot. In simpler cases, a robot with poor task planning ability cannot carry out complex tasks
optimally.

Figure 3: The initial and the goal state of the complex task

Figure 4: Performing the two component tasks separately

Figure 5: Performing the complex task optimally

The complex tasks we chose in this case study are service queries of the following form: "move a from position B to C and move b from D to E", where the initial state is as shown in Fig. 3: the robot is at location A and the portable objects a and b are at positions B and D, respectively. Note that the two component tasks, "move a from position B to C" and "move b from D to E", are not even Ke Jia's atomic actions. This increases the complexity of the complex task.

As a comparison, consider how this kind of task would be fulfilled by a service robot that employs the command recognition/combination technique, provided that it can complete it at all. Since such a robot only uses a command language in HRI, we assume that the two component tasks are instances of a command, "move x from y to z". Obviously, using the command recognition/combination technique, these two commands will be performed successively and separately, as shown in Figure 4. Generally this is not an optimal solution to the planning problem.

Ke Jia takes the user request as a single complex task and makes a plan interleaving the execution of the two component tasks this way: goto(B), pickup(a), goto(D), pickup(b), goto(C), putdown(a), goto(E), putdown(b), as shown in Figure 5. This plan is optimal with respect to the cost of Ke Jia's atomic actions, which comprises the distances between the locations/positions. So this is a most efficient solution to the planning problem. More importantly, it is worth pointing out that this plan was made autonomously and completely within the framework and mechanisms described in Sections 3 and 4. No matter which locations/positions are chosen in the environment, Ke Jia is always able to generate an optimal plan for this complex task. This also means that Ke Jia can understand the interconnection among the atomic tasks contained in the complex task through its
NLP mechanism. These features benefit from Ke Jia's general-purpose mechanisms of natural language understanding and commonsense reasoning.

In the experiments, Ke Jia was given the task in more natural forms, such as an English sentence within the LSNL like "give me the green bottle and put the red bottle on the table." The experimental environment is a simplified home environment.1 The real-time performance for completing this complex task is also acceptable. Generally, it took about 0.8 seconds for Ke Jia to accomplish the task planning. The computation of the other modules is less time-consuming.

In the experiments, we also tested Ke Jia's ability to acquire simple knowledge from users through spoken dialog, in order to reduce the uncertainty about the task at hand due to the underspecifiedness of the task description. At the beginning of the above task, Ke Jia did not know the positions of the bottles and asked the user to provide the relevant information, although it could have found them with its search function. Once the user had answered the questions, Ke Jia fulfilled the task successfully. This result shows that Ke Jia can understand and make use of users' descriptions of the environment, such as "the green bottle is on the teapoy."

1 We made two demos for this case study, see http://wrighteagle.org/media/task_comp_090627.mpg and http://wrighteagle.org/media/complex_090910.mpg.

5.2 Case Study 3: Causal Reasoning

We tried to test and verify Ke Jia's ability to handle more complicated and difficult problems. One sort of problem we chose involves causal reasoning. It is well known that the capability of reasoning with causation is very important and even crucial for many real-world applications, and there has been a lot of theoretical research on causal reasoning. However, though there is work on training robots, e.g., teaching a robot how to perform a specific task through a combination of spoken commands and observation [16], there is no work on complementary ways of robot training, e.g., teaching robots causal knowledge only
through spoken dialog so that the robot immediately gains better capability. This case study aims at both goals at the same time.

Figure 6: Setting of Case Study 3

The causal reasoning problem we used in this case study is related to commonsense knowledge about "balance" and "fall". A testing instance is shown in Figure 6. A board is placed on the edge of a table, with one end sticking out. A red bottle is put on the sticking-out end of the board, and a green bottle is on the other end. If the green bottle were moved while the red one was on the sticking-out end of the board, the red bottle would drop to the ground. The task is to pick up the green bottle under the default presupposition that nothing should fall. To accomplish the task, one must first take some measure against the unwanted side effect of the action "catch the green bottle".

Obviously, causal reasoning is needed to make an appropriate plan for this task. Moreover, no current robot could extract the causal knowledge from observation of the setting, so acquiring the relevant knowledge is necessary. One of the ways we have been trying is to teach Ke Jia the causal knowledge through spoken human-robot dialog. This task is more complicated and difficult than the complex task tested in Case Study 2, in the sense that causal knowledge is deeper than phenomenal descriptions of environments.

We tried two approaches in Case Study 3. (1) The first is to input into Ke Jia some causal knowledge manually programmed as ASP rules, and to provide some facts needed for making use of that causal knowledge. Only four rules are needed for the specification of a relatively simple notion of "balance" and "fall" (see Figure 7). The literal holds(falling(A), t), where holds is a meta-predicate, means that the proposition "object A falls at time point t" holds. Intuitively, the first rule states that a long-shaped object will keep its balance if there is something on each end of it. The second states that an end that is not sticking out is steady. The third and
fourth rules state that any item on the sticking-out end will fall if there is nothing on the other, steady end.

Using the built-in causal knowledge, Ke Jia accomplished the task given some factual information about the task environment. Ke Jia obtained this additional information through spoken dialog with the user, in which it was told that the red bottle is on the sticking-out end of the board and the green one is on the other end². Ke Jia produced a plan in which the red bottle is first moved to a safe place and then the green bottle is caught. The inference for generating the plan took no more than 0.5 seconds in the experiments. The results demonstrate that Ke Jia can easily make use of built-in causal knowledge to perform causal reasoning. This also suggests the possibility of loading service robots with commonsense knowledge from large-scale knowledge systems such as Cyc [13].

(2) The second way of equipping Ke Jia with causal knowledge is to teach the causal knowledge directly through spoken human-robot dialog, instead of inputting manually programmed ASP rules. For the sake of simplicity, we taught Ke Jia simpler causal knowledge about "fall", behind which the deeper causal knowledge about "balance" hides. Although the causal knowledge taught this way is simpler, this second part of Case Study 3 provides stronger and clearer evidence of Ke Jia's abilities of natural language understanding and causal reasoning.

In the experiments, we took a version of Ke Jia without any built-in knowledge about "balance", "fall", or any equivalents. Ke Jia was told in the dialog that an object will fall if it is on the sticking-out end of a board and there is nothing on the other end of the board. With its NLP mechanism, Ke Jia extracted the knowledge and transformed it into the ASP rule

holds(falling(A), t) ← stickingout(D), holds(on(A,D), t), endof(D,B), board(B), endof(E,B), not holds(on(C,E), t).

Then the ASP program specifying the task at hand was updated by adding this rule to it. Ke Jia was also told that the red bottle was on the
sticking-out end of the board and the green one was on the other end of the board. These pieces of factual information were likewise extracted and transformed into the ASP program. With this knowledge and information, Ke Jia accomplished the task with the same plan: moving the red bottle first and then catching the green bottle. It is worth emphasizing that Ke Jia could not have accomplished the task without the causal knowledge acquired through the dialog. This indicates that Ke Jia's ability is substantially raised by knowledge acquisition through dialog. In both parts of Case Study 3, whether using built-in knowledge or knowledge taught by humans, it took no more than 0.5 seconds for Ke Jia to generate the plan.

holds(balance(B,A,C), 0) ← holds(on(A,E1), 0), holds(on(C,E2), 0), endof(E1,B), endof(E2,B).
holds(steady(A), 0) ← holds(on(A,E), 0), not stickingout(E).
holds(falling(A), t) ← holds(on(A,B), t), holds(balance(B,A,C), t−1), not holds(on(C,B), t), not holds(steady(A), t).
holds(falling(C), t) ← holds(on(C,B), t), holds(balance(B,A,C), t−1), not holds(on(A,B), t), not holds(steady(C), t).

Figure 7: Rules for "balance" and "fall"

²See the demo at http://wrighteagle.org/en/demo.

6. CONCLUSIONS

The primary goal of the Ke Jia Project is to establish mechanisms, on a unified framework, suitable for human-robot collaboration for service robots. We take human-robot collaboration as an assumption, rather than as just a requirement. Two more assumptions, common users and underspecification, are adopted, and accordingly four requirements are identified for the Ke Jia robot to meet. We employ state-of-the-art NLP and commonsense reasoning techniques as basic mechanisms for human-robot communication and task planning. A series of case studies was conducted with positive results, verifying Ke Jia's abilities to acquire knowledge through spoken dialog with users, to solve problems autonomously by virtue of acquired causal knowledge, and to plan autonomously for complex tasks.

To a great extent, in this paper we could only report on the results we have gotten so far in
the Ke Jia Project. Actually, there are many challenges in this effort, most of which are still under investigation. For example, the computational efficiency of task planning has been a crucial issue, although we have succeeded in making Ke Jia's computation progressively faster. We are optimistic about this issue, since there has been an increasing interest in developing ever more efficient ASP solvers, which will provide us with an "additional" source of means to attack the problem. The same holds for issues in NLP. A challenge particular to us lies in the coupling of NLP and ASP, i.e., establishing a general-purpose mechanism for transforming limited segments of natural languages into action languages. We believe that the Ke Jia Project will benefit from all these challenges.

7. ACKNOWLEDGMENTS

This work is supported by the National Hi-Tech Project of China under grant 2008AA01Z150 and the Natural Science Foundations of China under grant 60745002. We thank Fangzhen Lin, Daniele Nardi, Yan Zhang, and Shlomo Zilberstein for helpful discussions about this effort. We are also grateful to the anonymous reviewers for their constructive comments.

8. REFERENCES

[1] N. Asher and A. Lascarides. Logics of Conversation. Cambridge University Press, 2003.
[2] H. Asoh, N. Vlassis, Y. Motomura, F. Asano, I. Hara, S. Hayamizu, K. Ito, T. Kurita, T. Matsui, R. Bunschoten, and B. Kröse. Jijo-2: An office robot that communicates and learns. IEEE Intelligent Systems, 16(5):46–55, 2001.
[3] W. Burgard, A. Cremers, D. Fox, D. Hähnel, G. Lakemeyer, D. Schulz, W. Steiner, and S. Thrun. Experiences with an interactive museum tour-guide robot. Artificial Intelligence, 114(1-2):3–55, 1999.
[4] X. Chen, J. Jiang, J. Ji, G. Jin, and F. Wang. Integrating NLP with reasoning about actions for autonomous agents communicating with humans. In Proceedings of the 2009 IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT-09), pages 137–140, 2009.
[5] M.-C. de Marneffe, B. MacCartney, and C. D. Manning. Generating typed dependency parses from phrase structure parses. In Proceedings of the Fifth International Conference on
Language Resources and Evaluation (LREC-06), pages 449–454, 2006.
[6] P. Doherty, J. Kvarnström, and F. Heintz. A temporal logic-based planning and execution monitoring framework for unmanned aircraft systems. Journal of Autonomous Agents and Multi-Agent Systems, 19(3):332–377, 2009.
[7] A. Ferrein and G. Lakemeyer. Logic-based robot control in highly dynamic domains. Robotics and Autonomous Systems, 56(11):980–991, 2008.
[8] T. Fong, I. Nourbakhsh, and K. Dautenhahn. A survey of socially interactive robots. Robotics and Autonomous Systems, 42(3-4):143–166, 2003.
[9] T. Fong, C. Thorpe, and C. Baur. Robot, asker of questions. Robotics and Autonomous Systems, 42(3-4):235–243, 2003.
[10] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Proceedings of the Fifth International Conference on Logic Programming (ICLP-88), pages 1070–1080, 1988.
[11] E. Giunchiglia, J. Lee, V. Lifschitz, N. McCain, and H. Turner. Nonmonotonic causal theories. Artificial Intelligence, 153(1-2):49–104, 2004.
[12] D. Klein and C. D. Manning. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems (NIPS), 15:3–10, 2003.
[13] D. B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.
[14] D. Nardi, J.-D. Dessimoz, P. F. Dominey, L. Iocchi, J. R. del Solar, P. E. Rybski, J. Savage, S. Schiffer, K. Sugiura, T. Wisspeintner, T. van der Zant, and A. Yazdani. RoboCup@Home: Rules and regulations, 2009.
[15] M. Quigley and A. Y. Ng. STAIR: Hardware and software architecture. In AAAI 2007 Robotics Workshop, 2007.
[16] P. E. Rybski, K. Yoon, J. Stolarz, and M. M. Veloso. Interactive robot task training through dialog and demonstration. In Proceedings of the Second ACM/IEEE International Conference on Human-Robot Interaction (HRI-07), pages 49–56. ACM, 2007.
[17] M. Tenorth and M. Beetz. KnowRob: Knowledge processing for autonomous personal robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-09), pages 4261–4266, 2009.