
Multi-Layered Learning System for Real Robot Behavior Acquisition

Yasutake Takahashi and Minoru Asada
Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University
Yamadagaoka 2-1, Suita, Osaka, 565-0871, Japan
{yasutake,asada}@ams.eng.osaka-u.ac.jp

Abstract

This paper presents a series of studies on a multi-layered learning system for vision-based behavior acquisition by a real mobile robot. The work aims at building an autonomous robot which is able to develop its knowledge and behaviors from low levels to higher ones through interaction with the environment during its life. The system creates learning modules with small, limited resources, acquires purposive behaviors with compact state spaces, and abstracts states and actions with the learned modules. To show the validity of the proposed methods, we apply them to simple soccer situations in the context of RoboCup (Asada et al. 1999) with real robots, and show the experimental results.

Introduction

One of the main concerns about autonomous robots is how to implement a system with a learning capability that acquires both a variety of knowledge and behaviors through the interaction between the robot and the environment during its lifetime. There has been a lot of work on different learning approaches by which robots acquire behaviors, based on methods such as reinforcement learning, genetic algorithms, and so on. Reinforcement learning, especially, has recently been receiving increased attention as a method for behavior learning that needs little or no a priori knowledge and offers a high capability for reactive and adaptive behaviors. However, a simple and straightforward application of reinforcement learning methods to real robot tasks is considerably difficult, since the almost endless exploration they require easily scales up exponentially with the size of the state/action spaces, which seems impractical from a practical viewpoint.

One potential solution might be an application of the so-called mixture of experts
proposed by Jacobs and Jordan (Jacobs et al. 1991), in which a set of expert modules learn and a gating system weights the output of each expert module to form the final system output. This idea is very general and has a wide range of applications. However, we have to consider the following two issues in order to apply it to real robot tasks:

Copyright (c) 2004, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

• Task decomposition: how to find a set of simple behaviors and assign each of them to a learning module, or an expert, in order to achieve the given initial task. Usually, a human designer carefully decomposes the long time-scale task into a sequence of simple behaviors such that each short time-scale subtask can be accomplished by one learning module.

• Abstraction of state and/or action spaces for scaling up: the original mixture of experts consists of experts and a gate for expert selection; therefore, there is no abstraction beyond the gating module. In order to cope with complicated real robot tasks, abstraction of the state and/or action spaces is necessary.

Connell and Mahadevan (Connell & Mahadevan 1993) decomposed the whole behavior into sub-behaviors, each of which can be learned independently. Morimoto and Doya (Morimoto & Doya 1998) applied a hierarchical reinforcement learning method by which an appropriate sequence of subgoals for the task is learned in the upper level, while behaviors to achieve the subgoals are acquired in the lower level. Hasegawa and Fukuda (Hasegawa & Fukuda 1999; Hasegawa, Tanahashi, & Fukuda 2001) proposed a hierarchical behavior controller, which consists of three types of modules (behavior coordinator, behavior controller, and feedback controller), and applied it to a brachiation robot. Kleiner et al. (Kleiner, Dietl, & Nebel 2002) proposed a hierarchical learning system in which the modules at the lower layer acquire low-level skills and the module at the higher layer coordinates them.
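The mixture-of-experts combination mentioned above can be sketched with a minimal forward pass; the linear experts, the softmax gate, and all dimensions below are our illustrative choices, not part of any of the cited systems (which also train both the gate and the experts):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

class MixtureOfExperts:
    """Forward pass of a mixture of experts: the gate produces one
    weight per expert and the system output is the weighted sum
    y(x) = sum_i g_i(x) * y_i(x)."""

    def __init__(self, expert_weights, gate_weights):
        self.expert_weights = expert_weights  # list of (out_dim, in_dim) arrays
        self.gate_weights = gate_weights      # (n_experts, in_dim) array

    def __call__(self, x):
        expert_outputs = np.array([w @ x for w in self.expert_weights])
        gains = softmax(self.gate_weights @ x)  # non-negative, sums to one
        return gains @ expert_outputs
```

With a gate that strongly favors one expert, the mixture output approaches that expert's output, which is the degenerate "selection" case a gating module in a hierarchical system implements.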
However, in these proposed methods, the task decomposition has been done by the designers very carefully in advance, or the construction of the state/action spaces for the higher-layer modules is independent of the learned behaviors of the lower modules. As a result, it seems difficult to abstract situations and behaviors based on the already acquired learning/control modules.

A basic idea for coping with the above two issues is that any learning module has a limited resource constraint, and this constraint on the learning capability leads us to introduce a multi-module, multi-layered learning system: one learning module has a compact state-action space and acquires a simple map from states to actions, and a gating system enables the robot to select one of the behavior modules depending on the situation. More generally, the higher module controls the lower modules depending on the situation. The definition of this situation depends on the capability of the lower modules, because the gating module selects one of the lower modules based on their acquired behaviors. From another viewpoint, the lower modules provide not only the rational behaviors but also abstracted situations for the higher module: how feasible a module is, how close it is to its subgoal, and so on. It is reasonable to utilize such information in order to construct the state/action spaces of the higher modules from the already abstracted situations and behaviors of the lower ones. Thus, the hierarchical structure can be constructed not only from experts and a gating module but from more layers with multiple homogeneous learning modules.

In this paper, we show a series of studies towards the developmental construction of such a hierarchical learning structure. The first one (Takahashi & Asada 2000) is the automatic construction of a hierarchical structure with purely homogeneous learning modules. Since the resources (and therefore the capability) of one learning module are limited, the initially given task is automatically decomposed into a set of small subtasks, each of which corresponds to one of the small learning
modules, and the upper layer is recursively generated to cover the whole task. In this case, all the learning modules in one layer share the same state and action spaces, although some modules need only part of them. The second work (Takahashi & Asada 2001) and the third one (Takahashi & Asada 2003) focused on state and action space decomposition according to the subtasks, to make the learning much more efficient. Further, the fourth one (Takahashi, Hikita, & Asada 2003) realized unsupervised decomposition of a long time-scale task by finding compact state spaces, which consequently leads to the subtask decomposition. We have applied these methods to simple soccer situations in the context of RoboCup (Asada et al. 1999) with real robots, and show the experimental results.

Multi-Layered Learning System

The architecture of the multi-layered reinforcement learning system is shown in Figure 1, in which (a) and (b) indicate a hierarchical architecture with two levels and an individual learning module embedded in the layers, respectively. Each module has its own goal state in its state space, and it learns the behavior to reach that goal, that is, to maximize the sum of the discounted rewards received over time, based on the Q-learning method. At the bottom level, the state and the action are constructed using sensory information and motor commands, respectively. The input to and output from the higher level are the goal state activation and the behavior activation, respectively, as shown in Figure 1(b). The goal state activation g is a normalized state value[1], and g = 1 when the situation is the goal state. When a module receives a behavior activation from a higher module, it calculates the optimal policy for its own goal and sends action commands to the lower module. An action command at the bottom level is translated into an actual motor command, and the robot then takes the action in the environment.

[1] The state value function estimates the sum of the discounted rewards received over time when the robot takes the optimal policy, and is obtained by Q-learning.
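A single behavior learning module of this kind can be sketched as follows; the corridor task, the learning constants, and the method names are our illustrative assumptions, not the paper's implementation. The module learns a tabular Q function for its own goal and reports its normalized state value as the goal state activation g, with g = 1 at the goal state:

```python
import random

class BehaviorModule:
    """One learning module: tabular Q-learning toward its own goal state."""

    def __init__(self, n_states, n_actions, goal, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.goal, self.alpha, self.gamma = goal, alpha, gamma

    def value(self, s):                     # state value V(s) = max_a Q(s, a)
        return max(self.q[s])

    def goal_activation(self, s):           # g: normalized state value, 1 at goal
        if s == self.goal:
            return 1.0
        vmax = max(self.value(i) for i in range(len(self.q))) or 1.0
        return self.value(s) / vmax

    def greedy_action(self, s):
        return max(range(len(self.q[s])), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * self.value(s_next)
        self.q[s][a] += self.alpha * (target - self.q[s][a])

# Tiny stand-in environment: a 5-state corridor, action 0 = left, 1 = right;
# reward 1 on reaching the goal state 4, where the episode ends.
def step(s, a):
    s_next = max(0, min(4, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == 4 else 0.0), s_next == 4

module = BehaviorModule(n_states=5, n_actions=2, goal=4)
random.seed(0)
for _ in range(500):                        # random exploration suffices here
    s = random.randint(0, 3)
    for _ in range(20):
        a = random.randint(0, 1)
        s_next, r, done = step(s, a)
        module.update(s, a, r, s_next, done)
        if done:
            break
        s = s_next
```

After training, `module.goal_activation` grows toward the goal, which is the kind of "closeness" signal a higher module can read as part of its state.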
Figure 1: A hierarchical learning architecture: (a) the whole system; (b) a behavior learning module. A behavior activation is an instruction from the higher level to execute the learned policy; a goal state activation is the normalized state value, i.e., the closeness to the module's own goal state.

Figure 2: A sketch of a state value function

One basic idea is to use the goal state activations g of the lower modules as the representation of the situation for the higher modules. Figure 2 shows a sketch of a state value function where a robot receives a positive reward of one when it reaches a specified goal. The state value function can be regarded as the closeness to the goal of the module. The states of the higher modules are constructed using the patterns of the goal state activations of the lower modules. In contrast, the actions of the higher-level modules are constructed using the behavior activations sent to the lower modules.

Behavior Acquisition on a Multi-Layered System (Takahashi & Asada 2000)

An Experiment System

Figure 3 shows a picture of the environment, in which a mobile robot we designed and built, a ball, and a goal are included.

Figure 3: A mobile robot, a ball and a goal

The robot has two TV cameras: one has a wide-angle lens, and the other an omni-directional mirror. The driving mechanism is a PWS (Powered Wheels Steering) system, and the action space is constructed in terms of the two torque values to be sent to the two motors that drive the two wheels.

Architecture and Results

In this experiment, the robot receives the information of only one goal for simplicity. The state space at the bottom layer is constructed in terms of the centroids of the goal region on the images of the two cameras, each tessellated into a 9 by 9 grid.
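The tessellation can be sketched as a simple uniform discretization; the image dimensions and the torque range below are our illustrative assumptions:

```python
def grid_index(value, lo, hi, bins):
    """Uniformly map a continuous value in [lo, hi) to one of `bins` cells."""
    i = int((value - lo) / (hi - lo) * bins)
    return min(max(i, 0), bins - 1)         # clamp values on the boundary

def centroid_state(x, y, width=120, height=100, bins=9):
    """9x9 cell of a goal-region centroid on one camera image."""
    return grid_index(x, 0, width, bins), grid_index(y, 0, height, bins)

def torque_action(left, right, t_max=1.0, bins=3):
    """3x3 cell of a torque pair for the two wheels."""
    return (grid_index(left, -t_max, t_max, bins),
            grid_index(right, -t_max, t_max, bins))
```

Each image contributes 9 × 9 = 81 cells; the paper counts 162 (9×9×2) states over the two camera images and 9 (3×3) actions.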
The action space is constructed in terms of the two torque values to be sent to the two motors corresponding to the two wheels, and is tessellated into a 3 by 3 grid. Consequently, the numbers of states and actions are 162 (9×9×2) and 9 (3×3), respectively. The state and action spaces at the upper layers are constructed by the learning modules at the lower layer, which are automatically assigned.

Figure 4: A hierarchical architecture on a monolithic state space

Figure 5: The distribution of learning modules at the bottom layer on the normal camera image

Figure 6: The distribution of learning modules at the bottom layer on the omni-directional camera image

The experiment consists of two stages: the learning stage and the task execution stage based on the learned result. First of all, the robot moves at random in the environment for about two hours. The system learns and constructs four layers, and one learning module is assigned at the top layer (Figure 4). We call the layers, from the bottom, the bottom, middle, upper, and top layers. In this experiment, the system assigned 40 learning modules at the bottom layer, 15 modules at the middle layer, and 4 modules at the upper layer. Figures 5 and 6 show the distributions of the goal state activations of the learning modules at the bottom layer in the state spaces of the wide-angle camera image and the omni-directional mirror image, respectively. The x and y axes indicate
the centroid of the goal region on the images. The numbers in the figures indicate the corresponding learning module numbers. The figures show that the learning modules are automatically assigned uniformly over the state space.

Figure 7: A rough sketch of the state transitions on the multi-layer learning system

Figure 7 shows a rough sketch of the state transitions and the commands to the lower layer on the multi-layer learning system during a navigation task. The robot was initially located far from the goal, facing in the opposite direction to it. The target position was just in front of the goal. The circles in the figure indicate the learning modules and their numbers. The empty up arrows (broken lines) indicate that the upper learning module recognizes the state corresponding to the lower module as its goal state. The small solid arrows indicate the state transitions while the robot accomplished the task. The large down arrows indicate that the upper learning module sends a behavior activation to the lower learning module.

Figure 8: A hierarchical architecture on decomposed state spaces

State Space Decomposition and Integration (Takahashi & Asada 2001)

The system mentioned in the previous section dealt with a whole state space from the lower layer to the higher one. Therefore, it cannot handle changes of the state variables, because the system supposes that all tasks can be defined on the state space at the bottom level. Further, it is easily caught by the curse of dimensionality if the number of state variables is large.
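Throughout these systems, the state of a higher module is the pattern of the lower modules' goal state activations, and its actions are behavior activations sent down to them. A schematic sketch (the fixed-table stub modules and the 0.8 threshold are our illustrative assumptions, not the paper's values):

```python
class LowerStub:
    """Stand-in for a learned lower module with a fixed value table."""
    def __init__(self, activations, policy):
        self.activations, self.policy = activations, policy
    def goal_activation(self, s):
        return self.activations[s]
    def greedy_action(self, s):
        return self.policy[s]

class UpperModule:
    """Abstracts the situation as the pattern of lower goal state
    activations and delegates control by activating one lower module."""

    def __init__(self, lower_modules, threshold=0.8):
        self.lower, self.threshold = lower_modules, threshold

    def abstract_state(self, sensor_state):
        # one bit per lower module: "is that module's goal (almost) reached?"
        return tuple(int(m.goal_activation(sensor_state) >= self.threshold)
                     for m in self.lower)

    def activate(self, index, sensor_state):
        # behavior activation: the chosen module runs its own greedy policy
        return self.lower[index].greedy_action(sensor_state)

# Two stub modules over a 3-state world.
near_goal_a = LowerStub([0.1, 0.9, 1.0], [1, 1, 0])
near_goal_b = LowerStub([1.0, 0.5, 0.2], [0, 0, 0])
upper = UpperModule([near_goal_a, near_goal_b])
```

The upper module thus never sees raw sensor values; its state space is already abstracted by the lower modules' learned value functions.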
Here, we introduce the idea that the system constructs a whole state space from several decomposed state spaces. At the bottom level, there are several decomposed state spaces in which modules are assigned to acquire low-level behaviors in the small state spaces. The modules at the higher level manage the lower modules assigned to different state spaces. In this paper, we define the term "layer" as a group of modules sharing the same state space, and the term "level" as a class in the hierarchical structure. There might be several layers at one level (see Figure 8).

Figure 8 shows an example hierarchical structure. At the lowest level, there are four learning layers, and each of them deals with its own logical sensory space (ball positions on the perspective camera image and on the omni-directional one, and goal positions on both images). At the second level, there are four learning layers. The "ball pers. × goal pers." layer deals with the lower modules of the "ball pers." and "goal pers." layers. The arrows in the figure indicate the flows from the goal state activations to the state vectors; the arrows from the action vectors to the behavior activations are omitted. At the third level, the system again has three learning layers.

Figure 9: A sequence of the behavior activations of learning modules and the commands to the lower layer modules

After the learning stage, we let our robot do a couple of tasks, for example chasing a ball, moving in front of the goal, and shooting a ball into the goal, using this multi-layer learning structure. When the robot chases a ball, the system uses the "ball pers." and "ball omni" layers at the 1st level, "ball pers.+omni" at the 2nd level, and "ball pers.+omni" at the 3rd level. When the robot moves in front of the goal, the system uses the "goal pers." and "goal omni" layers at the 1st level, "goal pers.+omni" at the 2nd level, and "goal pers.+omni" at the 3rd level. And when the robot shoots a ball into the goal, the
system uses all 4 layers at the 1st level, all 3 layers at the 2nd level, the "ball × goal" layer at the 3rd level, and the layer at the 4th level. All layers at the 1st level, and the "ball pers.+omni" and "goal pers.+omni" layers, are reused among the three behaviors. In the case of the shooting behavior, the target situation is given by reading the sensor information when the robot pushes the ball into the goal: the robot captures the ball and the goal at the center bottom of the perspective camera image. As an initial position, the robot is located far from the goal, facing in the opposite direction to it. The ball is located between the robot and the goal.

Figure 9 shows a sequence of the behavior activations of the learning modules and the commands to the lower layer modules. The down arrows indicate that the higher learning modules fire the behavior activations of the lower learning modules.

Action Space Decomposition and Coordination (Takahashi & Asada 2003)

Figure 10: A robot: it has a PWS vehicle, a pinball-like kicking device, and a small camera with an omni-directional mirror

Figure 10 shows a picture and a top view of another soccer robot we designed and built for the middle-size league of RoboCup. The driving mechanism is the same as the last one, and the robot is equipped with a pinball-like kicking device at the front of the body. If one learning module has to manipulate all actuators simultaneously, the exploration space of actions scales up exponentially with the number of actuators, and it is impractical to apply a reinforcement learning system.

Fortunately, a complicated behavior which needs many kinds of actuators can often be decomposed into simple behaviors, each of which needs a small number of actuators. The basic idea of this decomposition is that we can classify the behaviors based on aspects of the actuators. For example, we may classify the actuators into navigation devices and manipulators; then some of the behaviors depend tightly on the navigation devices, not on the manipulators, while the others depend on
the manipulators, not on the navigation devices. The action space based only on the navigation devices seems to be enough for the acquisition of the former behaviors, while the action space based on the manipulators would be sufficient for the manipulation tasks. If we can assign learning modules to both action spaces and integrate them at a higher layer, much smaller computational resources are needed, and the learning can be accelerated significantly.

Architecture and Results

Figure 11: A hierarchical learning system for the behavior of placing the ball in the center circle (task 1)

Figure 12: A hierarchical learning system for the behavior of shooting the ball into the goal (task 2)

We have implemented two kinds of hierarchical systems to check whether the basic idea can be realized. Each system has been assigned a task (Figures 11 and 12). One is placing the ball in the center circle (task 1), and the other is shooting the ball into the goal (task 2). We have prepared the following subtasks for the vehicle: "Chasing a ball", "Looking at the goal in front", "Reaching the center circle", and "Reaching the goal". We have also prepared the following
subtasks for the kicking device: "Catching the ball", "Kicking the ball", and "Setting the kicking device to the home position". Then, the upper-layer modules integrate these lower ones. After the learner acquired the low-level behaviors, it put new learning modules at the higher layer, as shown in Figures 11 and 12, and learned the two kinds of behaviors. We let our robot learn the behavior for task 1 (placing a ball in the center circle) first. The robot acquired the "Chasing a ball" and "Reaching the center circle" behaviors for the vehicle, and the "Catching the ball" and "Setting the kicking device to the home position" behaviors for the kicking device. Then the robot learned the behavior for task 2 (shooting the ball into the goal). It reused the behaviors "Chasing a ball", "Catching the ball", and "Setting the kicking device to the home position", and learned the other behaviors anew. Figure 13 shows a sequence of shooting a ball into the goal with the hierarchical learning system (see also Figure 12).

Figure 13: A sequence of an acquired behavior (shooting)

Task Decomposition based on Self-Interpretation of Instructions by a Coach (Takahashi, Hikita, & Asada 2003)

When we develop a real robot which learns various behaviors during its life, it seems reasonable that a human instructs or shows some example behaviors to the robot in order to accelerate the learning before it starts to learn. We proposed a behavior acquisition method based on a hierarchical multi-module learning system with self-interpretation of coach instructions. The proposed method enables a robot to

1. decompose a long-term task into a set of short-term subtasks,
2. select the sensory information needed to accomplish the current subtask,
3. acquire a basic behavior for each subtask, and
4. integrate the learned behaviors into a sequence of behaviors that accomplishes the given long-term task.

Figure 14: Basic concept: a coach gives instructions to a learner; the learner follows the instructions and finds basic behaviors by itself.
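Steps 1-4 can be caricatured as follows. In the actual method, segments are found by estimating compact state spaces from the instructed trajectories; here we assume ready-made situation labels, so this is only a schematic sketch:

```python
def interpret_instruction(instruction, known_modules):
    """Segment an instructed sequence into subtasks and report which
    subtasks need new learning modules.

    `instruction` is a list of abstract situation labels observed while
    following the coach; consecutive equal labels form one subtask."""
    segments = []
    for label in instruction:
        if not segments or segments[-1] != label:
            segments.append(label)
    new_modules = [label for label in segments if label not in known_modules]
    return segments, new_modules
```

A learner that already has a module for a familiar segment reuses it and creates modules only for the unfamiliar segments, mirroring the reuse behavior described below.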
Figure 14 shows a rough sketch of the basic idea. There are a learner, an opponent, and a coach in a simple soccer situation. The coach has a priori knowledge of the tasks to be played by the learner. The learner does not have any knowledge of the tasks but just follows the instructions. In Figure 14, the coach shows an instruction for shooting a ball into a goal without colliding with an opponent. After some instructions, the learner segments the whole task into a sequence of subtasks, acquires a behavior for each subtask, finds the purpose of the instructed task, and acquires a sequence of behaviors to accomplish the task by itself. When the coach gives new instructions, the learner reuses the learning modules for familiar subtasks and generates new learning modules for unfamiliar subtasks at the lower level. The system generates a new module for the sequence of behaviors of the whole instructed task at the upper level. The details are described in (Takahashi, Hikita, & Asada 2003).

Experiments

Figure 15: Real robot and environment: (a) a real robot and a ball; (b) top view of the field, with the learning agent, the ball, and an opponent agent

Figure 15 (a) shows the mobile robot. The robot has an omni-directional camera system. Simple color image processing is applied to detect the ball area and the opponent one in the image in real time (every 33 ms). Figure 15 (b) shows a situation the learning agent can encounter. The robot receives instructions for the tasks in the following order:

Task 1: chasing a ball
Task 2: shooting a ball into a goal without obstacles
Task 3: shooting a ball into a goal with an obstacle

Figures 16 (a), (b), and (c) show one of the example behaviors for each task. Figure 17 shows the constructed systems after the learning of each task. First of all, the coach gives some instructions for the ball-chasing task. The system produces one module which acquires the behavior of ball chasing. At the second stage, the coach gives some instructions for the shooting task. The learner produces another module which has a policy of going around the ball until the directions to the ball and the goal become the same.
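The stopping condition of that go-around policy is a simple bearing test; the 10-degree tolerance and the bearing convention below are our illustrative choices, not values from the paper:

```python
import math

def aligned(ball_bearing, goal_bearing, tol=math.radians(10.0)):
    """True when the ball and the goal lie in (nearly) the same direction
    from the robot, i.e. the robot stands behind the ball with respect to
    the goal. Bearings are angles (radians) in the omni-image frame."""
    diff = (ball_bearing - goal_bearing + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= tol
```

The modulo arithmetic wraps the difference into (-pi, pi], so bearings on opposite sides of the +/- pi seam still compare correctly.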
At the last stage, the coach gives some instructions for the shooting task with obstacle avoidance. The learner produces another module which acquires the behavior of going to the intersection between the opponent and the goal while avoiding collisions. Figure 18 shows a sequence of an experiment with real robots for this task.

Conclusions and Future Works

We showed a series of approaches to the problem of decomposing the large state-action space at the bottom level into several subspaces and merging those subspaces at the higher level. As future work, there are a number of issues for extending our current methods.

Interference between modules: One module's behavior might interfere with another one which uses different actuators. For example, the action of a navigation module may disturb the state transition from the viewpoint of the kicking device module; the catching behavior will succeed if the vehicle stays still, while it will fail if the vehicle moves.

Self-assignment of modules: It is still an important issue to find a purposive behavior for each learning module automatically. In the paper (Takahashi & Asada 2000), the system distributes the modules uniformly over the state space; however, this is not very efficient. In the paper (Takahashi, Hikita, & Asada 2003), the system decomposes the task by itself; however, the method uses many heuristics and needs instruction from a coach. In many cases, the designers have to define the goal of each module by hand, based on their own experiences and insights.

Figure 16: Example behaviors for the tasks: (a) task 1; (b) task 2; (c) task 3

Self-construction of hierarchy: Another missing point in the current method is that it does not have a mechanism that constructs the learning layers by itself.

Acknowledgments

We would like to thank Motohiro Yuba and Kouichi Hikita for their efforts in the real robot experiments.

References

Asada, M.; Kitano, H.; Noda, I.; and
Veloso, M. 1999. RoboCup: Today and tomorrow - what we have learned. Artificial Intelligence 193-214.

Connell, J. H., and Mahadevan, S. 1993. Robot Learning. Kluwer Academic Publishers. Chapter: Rapid Task Learning for Real Robots.

Hasegawa, Y., and Fukuda, T. 1999. Learning method for hierarchical behavior controller. In Proceedings of the 1999 IEEE International Conference on Robotics and Automation, 2799-2804.

Hasegawa, Y.; Tanahashi, H.; and Fukuda, T. 2001. Behavior coordination of brachiation robot based on behavior phase shift. In Proceedings of the 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, volume CD-ROM, 526-531.

Jacobs, R.; Jordan, M.; Nowlan, S.; and Hinton, G. 1991. Adaptive mixtures of local experts. Neural Computation 3:79-87.

Kleiner, A.; Dietl, M.; and Nebel, B. 2002. Towards a life-long learning soccer agent. In Kaminka, G. A.; Lima, P. U.; and Rojas, R., eds., The 2002 International RoboCup Symposium Pre-Proceedings, CD-ROM.

Morimoto, J., and Doya, K. 1998. Hierarchical reinforcement learning of low-dimensional subgoals and high-dimensional trajectories. In The 5th International Conference on Neural Information Processing, volume 2, 850-853.

Takahashi, Y., and Asada, M. 2000. Vision-guided behavior acquisition of a mobile robot by multi-layered reinforcement learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, 395-402.

Takahashi, Y., and Asada, M. 2001. Multi-controller fusion in multi-layered reinforcement learning. In International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI2001), 7-12.

Takahashi, Y., and Asada, M. 2003. Multi-layered learning systems for vision-based behavior acquisition of a real mobile robot. In Proceedings of SICE Annual Conference 2003 in Fukui, volume CD-ROM, 2937-2942.

Takahashi, Y.; Hikita, K.; and Asada, M. 2003. Incremental purposive behavior acquisition based on self-interpretation of instructions by coach. In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and
Sys-tems,CDROM.LM[(Ab;b):(Max;Front)]Worldsensor motorstate action(a) Task 1LM[(Ab;b):(Max;Front)]Worldsensor motorstate 1 actionstate 2LM[(Ab;b;bog):(Max;Don0t care;Min)]upper layerstateLearningModuleactionb,b1 2g,g1 2lower layerLM[shoot](b) Task 2LM[(Ab;b):(Max;Front)]Worldsensor motorstate 1 state 2LM[(Ab;b;bog):(Max;Don0t care;Min)]upperlayerstateLearningModuleactionb,b1 2g,g1 2lowerlayerLM[shoot]LM[(Aop;op;ogop):(Max;Don0t care;Max)]state 3actionupper layerlower layerstateLearningModuleactionb,b1 2g,g1 2(c) Task 3Figure 17:The acquired hierarchical structure1 2 35 6 7 84Figure 18:A sequence of an experiment of real robots:shooting a ball into a goal with an obstacle (task3)