Model-Based Learning for Mobile Robot Navigation from the Dynamical Systems Perspective

Jun Tani
Sony Computer Science Laboratory Inc.
Takanawa Muse Building, 3-14-13 Higashi-gotanda, Shinagawa-ku, Tokyo, 141 JAPAN
tani@csl.sony.co.jp, http://www.csl.sony.co.jp/person/tani.html

(Published in IEEE Trans. Systems, Man and Cybernetics (Part B), Special Issue on Learning Autonomous Robots, Vol. 26, No. 3, pp. 421–436, 1996)

November 22, 2004

Abstract

This paper discusses how a behavior-based robot can construct a "symbolic process" that accounts for its deliberative thinking processes using models of the environment. The paper focuses on two essential problems: one is the symbol grounding problem, and the other is how the internal symbolic processes can be situated with respect to the behavioral contexts. We investigate these problems by applying a dynamical systems approach to the robot navigation learning problem. Our formulation, based on a forward modeling scheme using recurrent neural learning, shows that the robot is capable of learning grammatical structure hidden in the geometry of the workspace from the local sensory inputs through its navigational experiences. Furthermore, the robot is capable of generating diverse action plans to reach an arbitrary goal using the acquired forward model, which incorporates chaotic dynamics. The essential claim is that the internal symbolic process, being embedded in the attractor, is grounded since it is self-organized solely through interaction with the physical world. It is also shown that structural stability arises in the interaction between the neural dynamics and the environmental dynamics, which accounts for the situatedness of the internal symbolic process. The experimental results using a mobile robot, equipped with a local sensor consisting of a laser range finder, verify our claims.

1 INTRODUCTION

In recent research on autonomous robots, the majority of interest has shifted from the AI-based approach to so-called behavior-based robotics [5, 31]. A consensus that the emphasis on deliberative computation and explicit representation in AI does not provide satisfactory solutions to scaling up from toy problems to real-world complex problems has encouraged research in behavior-based robotics. The behavioral functions of these robots are simple, but their reactive-type action-selection mechanism makes them suitable for real-world environments. Furthermore, the reaction against explicit coding schemes in AI initiated a new approach of so-called adaptive behavior [33]. Adaptive behavior focuses on how an "animal" or "animat" can attain an intrinsic function that coordinates perception and action solely based on its own behavioral experience. It has been demonstrated that an autonomous robot can acquire simple behavioral functions such as collision-avoidance, wall-following, or road-following by various adaptation methodologies, including neural learning [28, 37], genetic programming [26], reinforcement learning [8, 22] and others. These approaches are equivalent in a general sense in that the aim of the adaptation is to self-organize a direct state-action map which allows situated behaviors of the agent.

However, there is a suspicion that the absence of representations (modeling of the world) in these approaches might restrict the robot's progress in emulating the equivalent cognitive abilities of animals or humans. An intelligent robot should have a certain "symbolic process" with which it is capable of simulating its own potential action plans flexibly, using its internal model, before choosing a course of action. Such high-order cognitive activities stand on the combinatorial power of symbol systems [15], which enable the robot to conduct certain grammatical manipulations of knowledge. We
consider that the Deliberative Thinking Paradigm of AI itself is not misleading at all. However, the paradigm faces two essential problems. One is the "symbol grounding problem" as Harnad [15] has discussed, namely "How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads?" This problem asks us how to build internal representations, and how to use them without generating fatal gaps from the physical data obtained from the environment. The other problem is how the symbolic processes can be situated in the current behavioral context, i.e. how the robot can recognize its situation, which has been determined from the history of its interaction with the environment. The aim of this paper is to provide an answer to the above problems by presenting our novel approach of model-based learning in the domain of mobile robot navigation.

The problem of mobile robot navigation has been studied using explicit global representation. More specifically, a robot builds an environmental map, represented in global coordinates, by gathering geometrical information as it travels [2, 10, 13]. Though a variety of methodologies have been proposed in this context, potential problems still remain, especially in robot localization. The robot's position is mathematically identifiable by comparing the current local sensory input (typically range image and dead reckoning) with the global map. This process is not always robust enough for realistic "noisy" environments. There is a gap between what the global map represents and what the robot senses in the real environment.

There have been efforts [27, 48] to construct a direct sensor-situation (position) map by utilizing the idea of Kohonen's Self-Organizing Map [25]. Although this approach, which is based on a statistical clustering technique, might be able to generate an intrinsic representation of the sensory-situation association, it seems to have potential limitations in its localization
capability. The position cannot always be identified solely from the current sensory input, since the sensory input could be the same in different locations. In order to avoid this problem, Krose [27] utilizes global orientation information from compass readings, and Zimmer [48] utilizes dead reckoning information. We, however, speculate that the current situation or position may be identifiable without introducing global information, but instead from context-dependence: by utilizing the past sensory sequence acquired during travel. The problem of how to construct a context-dependent representation seems to be important in this instance.

Kuipers [29], Mataric [32] and others have developed an alternative approach using landmarks, which is aimed at achieving behavior-based local representation. A mobile robot acquires a graphical representation of landmark types as it moves in an environment. This representation is equivalent to a finite state machine (FSM), and comprises a topological modeling of the environment. Thus, it represents the grammatical structure of the environment with obstacles. In navigation, the robot can identify its topological position by anticipating the landmark types in the FSM representation. Although this behavior-based local representation scheme is likely to improve robustness in navigation, its stability is not clear if an erroneous landmark-matching happens to take place: an FSM would halt if fed an illegal symbol. This navigation strategy is susceptible to such a catastrophe if the landmark type is misread. Although robustness can be enhanced through improving the landmark detection scheme by combining, for example, global positioning (as conducted in [32]) or other sensor-fusion techniques, it would remain limited as long as the model is represented symbolically.

In this paper, we focus on the dynamical systems approach [3, 19] as an alternative, with the expectation that its language can be utilized to build an effective representational and computational framework for behavior-based
robots. Although one may speculate that highly analog representations by dynamical systems lack the combinatorial power of symbolic systems, a recent study of symbolic dynamics [7] showed otherwise. Crutchfield [7] clarified the relations between formal languages [16] and nonlinear dynamical systems, showing that chaotic dynamics correspond to a regular or higher language in the language hierarchy [16]. Furthermore, experimental studies on recurrent neural networks (RNNs) [36, 44] demonstrated that such networks can learn primitive grammatical descriptions from examples by generating deterministic chaos. Therefore, it is quite plausible that symbolic processes, which account for the cognitive tasks of planning or mental simulation, can be constructed as being embedded in chaotic dynamical systems.

Upon describing the internal symbolic processes of the robot using the language of dynamical systems, we become able to analyse their interactions with the physical environment. We focus on the coupling between the internal dynamics and the environmental dynamics made through the sensory-motor feedback, then we investigate how coherence arises between these two dynamical systems. We speculate that a key to solving the symbol grounding as well as the situatedness problems lies in a qualitative understanding of the dynamical mechanism of this coherence. Our analysis of this mechanism will explain the dynamical structure that is essential to building a behavior-based robot characterized by its model-based intelligent activities.

The remainder of this paper is organized as follows. Section II defines the navigation problem which we study in this paper. In order to clarify the navigational conditions, the basic navigation architecture is introduced in Section III. Section IV presents our formulation of model-based learning using the forward modeling scheme [19, 21, 46], which is implemented using a recurrent neural network (RNN) [18, 35, 36] architecture. We describe the application of chaotic
dynamics to the planning process and discuss its qualitative meaning from the viewpoint of deterministic dynamics. Section V presents a series of experiments using the mobile robot in order to test our approach. Section VI shows an analysis of the dynamical structure that accounts for the mechanisms of situatedness. Section VII discusses and summarizes the themes of this paper.

2 THE NAVIGATION PROBLEM

Before presenting the detailed formulation, we describe the navigation problem on which this paper focuses. We consider that there are two types of approaches to navigation learning which are fundamentally different in how the navigational knowledge is represented and how it is utilized. The first type is skill-based learning. In this approach the robot learns skills for achieving a fixed goal such as a homing or cyclic routing task. The robot has to go home or move into a predetermined cyclic loop, starting from an arbitrary position in the adopted workspace. Our previous work [42, 43] showed that the robot can achieve these tasks by acquiring an adequate state-action map (a map of sensory-based internal states to motor commands). The second type is model-based learning, which is the main subject of this paper. The advantage of model-based learning is that the process of planning using the internal model enables the robot to adapt flexibly to different goal tasks. Our model-based learning approach, applied to a real mobile robot, assumes the following conditions and specifications.

• The robot cannot access its global position, but should navigate based on its local sensory (range image) input.
• There are no explicit landmarks accessible to the robot in the adopted workspace.
• No a priori knowledge of the workspace geometry is given. The robot should obtain it from its travel experience.
• The robot should be capable of planning the shortest route to an arbitrary position.
• The robot should be robust against temporary disturbances, including noise and temporary geometrical changes in the workspace.
3 NAVIGATION ARCHITECTURE

The YAMABICO mobile robot [17] was used as an experimental platform. Figure 1 shows a picture of YAMABICO. The robot can obtain range images by a range finder consisting of laser projectors and three CCD cameras. The ranges for 24 directions, covering a 160 degree arc in front of the robot, are measured every 150 milliseconds by triangulation. The specifiable range is 0.2 m to 5.0 m. The main navigation-level computation is conducted on a host computer via wireless communication. The robot maneuvers by differentiating the rotation velocities of the left and right wheels, and it normally moves at a speed of 0.3 m/s.

Figure 1: The YAMABICO mobile robot equipped with a laser range sensor (laser projector and CCD cameras).

In our formulation, maneuvering commands are generated as the output of a composite system consisting of two levels. The control level generates a collision-free, smooth trajectory using a variant of the potential field method [24], while the navigation level directs the control level in a macroscopic sense, responding to the sequential branching that appears in the sensory flows. The control level is fixed; the navigation level, on the other hand, can be adapted through learning.

Firstly, let us describe the control level. The robot can sense the forward range readings of the surrounding environment, given in robot-centered polar coordinates by r_i (1 ≤ i ≤ N), as shown in Fig. 2. The angular range profile R_i is obtained by smoothing the original range readings with an appropriate Gaussian filter. The maneuvering focus of the robot is the maximum in this range profile (the angular direction of the largest range). The robot proceeds towards the maximum of the profile, i.e. an open space in the environment. This control scheme is implemented as follows:

    V_dif = k_p · θ_f    (1)

where V_dif is the differential rotational velocity between the left and right wheels, θ_f is the angular displacement of the focus point from the center, and k_p is a constant gain.

The navigation level focuses on the topological
changes in the range profile as the robot moves. Fig. 3(a) shows a robot trajectory measured in an experimental workspace; Fig. 3(b) shows the corresponding temporal sensory flow. After starting from the "start", the robot moves through the workspace by tracking the maximum. The corresponding range profile contains a single maximum. (In the shaded sequence, the middle part, corresponding to a larger range, is darker. The sides, having a smaller range, are brighter.)

Figure 2: The range profile is obtained from the frontal side of the robot (r_i: range readings; R_i: smoothed range profile). The robot moves by tracking the maximum in the range profile.

As the robot moves through the workspace, the profile gradually changes until another local maximum appears when the robot reaches location (1) and it perceives a new open area to the left. At this moment of branching, the navigation level decides whether to transfer the focus to the new local maximum or to remain with the current one. (In this implementation, the occurrence of branching is not confirmed until a time lag T has elapsed, so that the sensing of branching may not be perturbed by noise.)
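The control-level behavior described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the gain k_p, the Gaussian smoothing width, and the exact angle convention for the 24 beams over the 160 degree arc are illustrative assumptions, since the paper does not report these values.

```python
import numpy as np

def control_step(ranges, kp=1.0, sigma=1.5):
    """One control-level update: smooth the range readings r_i into the
    profile R_i, then steer toward the profile maximum (the "focus").

    ranges: 24 readings over a 160-degree frontal arc (robot-centered).
    kp and sigma are illustrative values, not taken from the paper.
    """
    r = np.asarray(ranges, dtype=float)
    n = len(r)
    # Gaussian smoothing of the raw readings (normalized per output index,
    # so truncated kernels at the arc edges still average correctly).
    idx = np.arange(n)
    w = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    profile = (w / w.sum(axis=1, keepdims=True)) @ r
    # Angular displacement theta_f of the focus (profile maximum) from center.
    angles = np.linspace(-80.0, 80.0, n)        # degrees across the arc (assumed)
    theta_f = angles[int(np.argmax(profile))]
    v_dif = kp * theta_f                        # Eq. (1): differential wheel velocity
    return profile, theta_f, v_dif
```

A peak in the range readings to the robot's left yields a negative θ_f, and hence a differential velocity that turns the robot toward that open space.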
In this example, the robot decides to transfer the focus to the new maximum and proceeds to the left. In the same manner, the decision process is repeated at point (2), where the focus transfers to a new local maximum on the left-hand side; at point (3), where it stays with the current branch by traveling forwards; and at (4), where it transfers to a new branch to the left. These binary branching decisions generate the trajectory shown. In some instances, the robot maneuvers into a concave dead end from which it cannot escape with the above maneuvering scheme. To avoid this, the navigation scheme is enhanced as follows: when the robot comes extremely close to a dead end, it is instructed to turn through 180 degrees. Thereafter it proceeds as usual. (The concave dead end is easily detected by monitoring the range values in the forward direction with respect to a certain threshold.) Hereafter, we denote this dead-end point as the "terminal point".

Although the proposed binary branching scheme simplifies the problem of navigation substantially, a technical difficulty arises when multiple branching situations take place, i.e. more than two new branches are sensed simultaneously. In such singular situations, the robot takes the rightmost (or the leftmost) new branch as the new one, thereby losing the opportunity to select other new branches. Therefore, with the current navigation scheme, there are cases in which the robot cannot take all the possible paths allowed by the geometry of the workspace.

Figure 3: An example of travel and its associated sensory flow.

Figure 4: The navigation architecture employed here, comprising the control and navigation levels. The navigation level receives the compressed range image (raw range data quantized by a Kohonen map) and the local travel distance, and outputs the motor command (branching decision); the control level outputs the wheel control signal.

The navigation level utilizes two types of sensory inputs at branch or terminal points: the current range image, and the local travel distance from the
previous to the current point, as measured by the wheels' rotational encoders. The filtered range profile consists of N = 24 range values. Since the pertinent information in the range profile at a given moment is assumed to be only a small fraction of the total, we employ a vector quantization technique, known as the Kohonen network [25], so that the information in the profile may be compressed into specific lower-dimensional data. The Kohonen network employed here consists of an l-dimensional lattice with m nodes along each dimension (l = 3 and m = 6 for the experiments with YAMABICO). The range image consisting of 24 values is input to the lattice, and the most highly activated unit in the lattice, the "winner" unit, is found. The address of the winner unit in the lattice denotes the output vector of the network. The virtue of this scheme is that the original topological structure of the input space is well preserved in the output space, which assures the generality of the transformation, because the output vector is a smoothly-varying function of the input profile. Although the real range image exhibits a stochastic distribution at each branch point, the Kohonen network is capable of clustering such noisy range image information with a topology-preserving map that is self-organized in the off-line learning phase.

The navigation architecture presented in this section is summarized in Fig. 4. In this architecture, the navigation problem is simplified to one of determining the branching sequence. Hereafter, we focus on how the navigation level achieves this. We use the terms "motor command" and "motor program" to indicate a branching decision and a sequence of branching decisions, respectively.

4 MODEL-BASED LEARNING

This section describes how the robot learns the internal model of the environment, and how such a model can be utilized to generate navigation plans, i.e. what motor programs are needed in order to reach a given goal. Here, we attempt to apply the scheme of forward modeling [19, 21, 46] to the problem. Before
presenting our formulation, we outline the robot's operation in model-based learning.

First, the robot goes through the learning phase. The robot wanders around the adopted workspace by collision-free maneuvering, making each branching decision at random. Meanwhile the robot collects the sensory-motor sequence, i.e. the sequence of branching decisions and the resulting sequence of sensory inputs perceived at the branch points. Thereafter, the robot attempts to acquire a "topological" model of the workspace in terms of a forward model through off-line neural learning. After learning has taken place, we examine how accurately the robot has learned the model of the workspace by measuring the robot's capability in lookahead prediction. If the learning of the model is found to be insufficient, the above learning process is repeated with more sampled data.

After the learning phase is completed, the navigation phase can be initiated. In the navigation phase, the robot conducts plan-based navigation. A branch or terminal point is selected as a goal position, which is specified to the robot by means of the sensory input that would be obtained at that position. The robot has to generate a motor program (the sequence of branching decisions) to reach the goal by the shortest route.

After completing the off-line learning, the robot initially cannot recognize its situation/position, and therefore cannot initiate any planning activities. The problem to consider is how one can situate the robot in the environment. In addressing this problem, we use two distinct operational modes, namely the open-loop mode and the closed-loop mode, and introduce a switching mechanism between them. First, the robot travels around the workspace, receiving sensory input at each branch. In the meanwhile the robot begins to recognize the current situation/position from what it has sensed during its travel. We call this mode the "open-loop mode" since the internal computation of the robot is coupled to the environment via the sensory-motor
loop. When the robot becomes situated, it is ready for planning activities. By shutting off the sensory-motor loop with respect to the environment, the robot simulates internally its sensory-motor sequence, then generates a motor program to reach a given goal point. This is the closed-loop mode. Once a motor program is generated, it is executed with the robot once again switched to the open-loop mode.

4.1 Forward modeling

The idea of forward modeling has been used to explain planning and trajectory control for voluntary human arm movements as well as for industrial manipulators [19, 46]. In motor skill learning for an arm, the transformation from proximal coordinate systems (e.g., joint torques of the arm) to distal coordinate systems (e.g., endpoint coordinates of the arm) is trained on a feed-forward network which serves as a forward dynamical model for the arm. Once this forward model is acquired, the temporal joint torques necessary for a given specification in distal coordinates, such as the arm endpoint, can be computed by the network through the forward model. This is called "computation of inverse dynamics". This framework can be applied to our problem of navigation learning. By learning the forward model of the workspace, the robot becomes able to conduct lookahead prediction of the sensory input sequence for an arbitrary motor program, i.e. it can simulate the travel resulting from a given navigation plan. Also, the acquired forward model can generate a motor program to reach a goal, specified by its distal sensory image, through computation using inverse dynamics. This subsection explains how the forward model can be used as a predictor of the sensory input sequence, and how such a forward model can be learned by means of a recurrent neural network (RNN).

Figure 5: Forward model using the RNN architecture (inputs p_n, x_n and context units c_n; output p̂_{n+1} and the next context). The dotted line indicates the closed sensory loop which is used for lookahead prediction over multiple steps.

The scheme of generating a motor program using inverse dynamics will
be explained in the next subsection.

The RNN architecture

The objective forward model is embodied using a standard discrete-time RNN architecture [11, 18], as shown in Figure 5. This RNN architecture receives the current sensory input p_n and the current motor command x_n, then outputs the prediction of the next sensory input p̂_{n+1}. Here, p_n and x_n are a vector and a scalar, respectively. A standard sigmoid-type function [38] is employed to compute the activation of each neural unit. The sensory input p consists of the compressed range profile p_r, obtained from the output of the Kohonen net, and the local travel distance p_l. The motor command x_n takes a binary value of 0 (corresponding to staying at the current branch) or 1 (corresponding to transfer to a new branch).

We employ the idea of the context loop [11, 18], which enables the network to obtain a certain temporal internal representation. (In Figure 5, there is a feedback loop from the context units in the output layer to those in the input layer.) The current context input c_n (a vector) is a copy of the context output at the previous time step: by this means the context units remember the previous internal state. The navigation problem is an example of a so-called "hidden state problem" [30]: a given sensory input does not always represent a unique situation/position of the robot. Therefore, the current situation/position is identifiable not by the current sensory input alone, but by the memory of the sensory-motor sequence stored during travel. An adequate temporal internal representation of the travel history, obtained by taking advantage of the context loop, can achieve just such a memory structure. Here, the mapping function of the RNN can be written as:

    c_{n+1} = f_c(p_n, x_n, c_n, W_c)    (2)
    p̂_{n+1} = f_p(p_n, x_n, c_n, W_p)

where f_c and f_p are the nonlinear maps from the current step to the next step; W_c and W_p denote their parameter sets of connectivity weights. These connectivity weights are determined through the training of the RNN, the methodology of which will be described later.

Using the forward model

As we
have described earlier, the forward model represented by this RNN architecture is run in the open-loop mode before it is used in the closed-loop mode in the navigation phase. In the open-loop mode, the robot conducts one-step lookahead prediction (it predicts the next sensory input as the result of the current motor command) while it travels in the workspace using an arbitrary motor program. The one-step prediction is obtained by inputting the current sensory input and the current motor command to the network. The RNN, at the beginning of travel, cannot predict the next sensory input correctly, since the initial context value is set randomly. However, the context value can become situated as the RNN continues to receive the sensory-motor sequence. Then the RNN will begin to predict correctly. (In Section VI we will explain the underlying mechanism of situatedness in detail.) This situatedness also accounts for the robot's mechanism of auto-recovery from miscellaneous temporary disturbances during its travel. Although the robot might lose its context because of sudden noise or temporary geometrical changes in the workspace, it can recover the context, as long as it continues to travel, by using the sensory input sequence.

After the prediction in the open-loop mode recovers, the RNN can be switched into the closed-loop mode by stopping the robot at a branch point. A lookahead prediction of an arbitrary length for a specified motor program can be made by copying the previous prediction of the sensory input to the current sensory input. (The dotted line in Figure 5 indicates how the closed loop for sensory input is made.)
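The one-step map of Eq. (2) and the closed-loop rollout can be sketched as follows. This is a hedged sketch under assumptions: the paper specifies only the input/output interface of the forward model, so the single shared hidden layer, the layer sizes, and the random (untrained) weights here are all illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ForwardModel:
    """Minimal Elman-style forward model in the spirit of Eq. (2).

    p_dim, c_dim, h_dim and the single hidden layer are assumptions;
    weights are random stand-ins for the trained W_c, W_p.
    """

    def __init__(self, p_dim=4, c_dim=8, h_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = p_dim + 1 + c_dim                       # [p_n, x_n, c_n]
        self.W_h = rng.normal(0, 0.3, (h_dim, in_dim))   # shared hidden layer
        self.W_p = rng.normal(0, 0.3, (p_dim, h_dim))    # prediction head (f_p)
        self.W_c = rng.normal(0, 0.3, (c_dim, h_dim))    # context head (f_c)

    def step(self, p_n, x_n, c_n):
        """One-step map: returns (p_hat_{n+1}, c_{n+1})."""
        h = sigmoid(self.W_h @ np.concatenate([p_n, [x_n], c_n]))
        return sigmoid(self.W_p @ h), sigmoid(self.W_c @ h)

    def lookahead(self, p0, c0, motor_program):
        """Closed-loop mode: feed each prediction back as the next sensory input."""
        p, c, preds = p0, c0, []
        for x in motor_program:
            p, c = self.step(p, x, c)
            preds.append(p)
        return preds
```

In the open-loop mode, the same `step` would instead be driven by the real sensory input at each branch, which is what allows the context vector to become situated before `lookahead` is invoked.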
Let us denote the motor program by x*: (x_0 x_1 x_2 ...). Then the lookahead prediction of the sensory input sequence p̂*: (p̂_1 p̂_2 p̂_3 ...) can be obtained by recursively applying x* to the RNN mapping function, using the initial values of the context units c_0 and the sensory input p_0 which have been obtained in the open-loop mode.

Learning the forward model

We will now describe briefly how to determine the connectivity weights by using the sensory-motor sequence sampled during the actual wandering travel of the robot. The training of the RNN searches for the optimal W_c and W_p such that the RNN, switched to the closed-loop mode, can make a correct lookahead prediction p̂* for the sampled sequence p* using the associated motor program x*. (This search process should also determine the value of the initial context c_0 that produces a correct lookahead prediction for the sampled sequence.) Therefore, the network is trained to minimize the cost function J given by:

    J = 1/2 Σ_n (p_n − p̂_n)^T (p_n − p̂_n)    (3)

The optimal connectivity weights and initial context minimizing the cost J in (3) are computed using the back-propagation through time (BPTT) algorithm [38]. The RNN is transformed into a cascade network without loops. The forward computation of this network, with the temporal connectivity weights and the temporal initial context value c_0, generates the temporal lookahead prediction p̂* which corresponds to the motor program x*. Then, the error between the sampled sequence p* (as the teacher) and the temporal lookahead prediction p̂* is calculated and back-propagated [38] in order to update the temporal values of W_c, W_p and c_0. This computation is repeated until the error (the cost function J) is minimized. This learning achieves locally optimal mapping functions f_c and f_p in (2) by organizing an adequate temporal internal representation in the context loop. In the actual training, the sequence of the sampled data is broken into smaller sub-sequences (15 data units for each sub-sequence in the experiment described later), each of
which is used to train the network simultaneously. This technique is used because the error of temporal lookahead prediction over numerous steps can accumulate to a substantially large value in the middle of training, which hampers the smooth convergence of the learning process.

Numerous studies have been conducted on the problem of learning the sequential behavior of agents [6, 30, 47], including our prior work on skill-based learning [43, 42]. These studies have shown that certain temporal internal representations are indispensable to the solution. Model-based learning, presented here, differentiates itself in that its learning comprises not just learning sequences but also extracting grammatical structure hidden in the sequences. Elman [11] investigated experimentally the capability of RNNs for learning a simple grammar from certain letter sequences, and examined the internal representation obtained as a function of time. His study showed that the RNN, after successful learning, becomes capable of following letter sequences, since the activation of the context units represents the hidden state of the target automaton. Learning the forward model of the environment is analogous to the learning of a grammar by a finite state machine (FSM). The robot attempts to extract the grammatical regularities hidden in the branching structure of the environment from the sampled sensory-motor sequences, so that they can be used to generate the lookahead prediction of the sensory sequence for an arbitrary motor program. This objective is achieved when an adequate memory structure is successfully self-organized in the RNN.

4.2 Plan generation

The objective of planning is to find a motor program x* that generates a path to the desired branch or terminal point under the condition of minimum travel distance. We investigate how an optimal motor program can be derived from the obtained forward model in an autonomous manner. We consider the following cost function for the motor program:

    E = E_g + γ E_c + μ E_m    (4)
    E_g = 1/2 (p_d − p̂_τ)^T (p_d − p̂_τ)
    E_c = 1/2 Σ_{n=1}^{τ} (p̂_{l,n})^2
    E_m = − Σ_{n=0}^{τ−1} ∫_0^{x_n} [φ((x − 0.5)/T) − x] dx

The total cost E is defined as the sum of three cost terms, E_g, E_c and E_m, with respective weights 1, γ, and μ. E_g denotes the norm between the lookahead prediction of the sensory input at the τ-th step, p̂_τ, and the desired sensory input p_d. Here, τ represents the number of branching steps from the current point to the goal point. This cost term indicates how close the current motor program's prediction of the distal sensory input is to that of the goal. E_c denotes the cost incurred for the minimum travel distance, which is the mean-square sum of the local travel distance p̂_{l,n} over τ steps. The term E_m is employed in order to restrict the value of each motor command to a binary value of 0.0 or 1.0. (Note that only binary values are legal for the motor commands.) φ is a standard sigmoid function, and T is a parameter defining its steepness.

Now, the optimal motor program, which minimizes the cost function, is computed iteratively. A difficult point is that the number of steps of the motor program, τ, is also a variable to be determined, since we cannot tell a priori how many branching steps ahead the goal point is. In our formulation τ is determined through the iterations. We have defined the maximum number of future branching points to be considered in the planning as τ_max (we took τ_max = 15 steps in the experiment described later). As a result, the RNN is transformed into a cascaded feed-forward network consisting of τ_max steps. The forward computation is conducted on the cascaded network, in which the lookahead prediction of τ_max steps for the temporal motor program (x_0 x_1 x_2 ... x_{τ_max}) is obtained. To determine the temporal value of τ, the cost E is computed based on (4) with τ varied from 1 to τ_max. The τ which gives the minimum cost is taken as the temporal value of τ. This τ represents the valid length of the temporal motor program. Next, an update of the motor command at each step is obtained. The gradient of the cost function with respect to each motor command
x_n (0 ≤ n ≤ τ−1) is calculated; this indicates the direction of update for the motor commands. δx_n (0 ≤ n ≤ τ−1) is given by:

δx_n = −δE/δx_n     (5)
     = −δE_g/δx_n − γ δE_c/δx_n + μ (φ((x_n − 0.5)/T) − x_n)

In the second line of this equation, the gradient of the cost function is represented as the sum of the gradients of each cost item. In obtaining δE_g/δx_n, the error between the desired sensory input and the lookahead prediction of the sensory input at the τth step is calculated; then this error is back-propagated [38] through the cascaded network to the motor command unit x_n so that the contribution of x_n to the error can be estimated. This estimate yields the objective gradient value. The value of δE_c/δx_n is also calculated by using the back-propagation scheme. The prediction of the local travel distance at each step is obtained as an output from the cascaded forward network. Then, back-propagation is applied from the output unit to each motor command unit so that the contribution of each motor command's value can be obtained in order to minimize the local travel distance. The third term is the gradient of E_m, which is obtained analytically from (4). Since the sequence of motor commands after the τth step does not contribute to the cost, δx_n (τ ≤ n ≤ τ_max) is set to 0. The exact update for each motor command in the temporal motor program, Δx_n (0 ≤ n ≤ τ_max), can be obtained by applying the steepest descent method to (5) using:

Δx_n(t+1) = η δx_n + α Δx_n(t)     (6)

where η is the search rate and α is the momentum term. The details of the method of back-propagation through the forward model are given in ref. [19]. One cycle of forward and backward computation is completed after updating the temporal motor program. The temporal motor program, as well as its valid length τ, changes gradually through the iteration of this cycle, thereby minimizing the cost. When the cost is minimized, the sequence of motor commands obtained as x_n (0 ≤ n ≤ τ−1) is the desired motor program.

4.3 Chaotic search

One might think that the optimal motor program could be obtained
easily through iterative calculations by the steepest descent method described in the previous subsection. However, this is not true. Many researchers [23, 24] have studied robot path planning for minimum travel in complex obstacle domains, and they have shown that such planning cannot avoid a combinatorial explosion. This indicates that the landscape of the defined cost function would be quite “rugged” in our formulation, and that the computation of x_n by the method of steepest descent, whose search dynamics are those of a typical fixed point because of its positive damping term, can easily be trapped in a local minimum. Such planning processes would halt after generating a sub-optimal plan. (The experiments described later will show this explicitly.) In order to realize an autonomous search process which generates various alternatives, it is necessary to introduce nonequilibrium dynamics which are capable of avoiding the local minimum problem.

In our previous work, we studied the characteristics of a dynamical system called the “chaotic steepest descent” (CSD) model, which has a nonlinear resistance that varies periodically [41]. We review this model briefly. Let us consider a dynamical system defined on a rugged energy landscape by:

m ẍ + R(ẋ, ωt) = −κ ∇E(x)     (7)
R(ẋ, ωt) = [d_0 sin(ωt) + d_1] ẋ + d_2 ẋ² sgn(ẋ)

where m is an inertia constant, R is the nonlinear resistance function, ∇E(x) is the gradient of the energy function, κ is the gradient constant, ω is the periodicity of the resistance perturbation, and d_0, d_1 and d_2 are the nonlinear resistance coefficients. The resistance may provide positive or negative damping, depending on the value of ẋ. As the resistance characteristics change, by slowly increasing the negative damping part, the resulting state tends to travel from one energy basin to another. On the other hand, when the positive damping is increased, the state tends to converge. Repetition of these unstable and stable phases generates chaotic state transitions among basins. We employ the CSD model to
update x_n:

m ẍ_n + R(ẋ_n, ωt) = −δE_g/δx_n − γ δE_c/δx_n + μ (φ((x_n − 0.5)/T) − x_n)     (8)

The dynamics run for a pre-determined period, during which various motor programs are generated in the stable phases, and the motor program with the minimum cost is selected as an optimal solution.

The question may arise as to why the planning search method uses chaotic dynamics rather than other alternatives. An easy alternative might be an exhaustive random search in the binary space of the motor programs. Although the random search works perfectly well for an application of the current experimental size, it would not be scalable to more complex tasks. Another alternative is to apply stochastic dynamics to the search process. External additive noise would prevent a search based on the steepest descent method from entering the local minimum traps. Although there is at present no mathematical proof to confirm that the efficiency of a chaotic search is better than that of a stochastic search in combinatorial search problems, numerous studies, especially in biological systems, have suggested its plausibility. Skarda and Freeman [40], Aihara [1], and Tsuda, Koerner and Shimizu [45] have suggested that for effective memory searches the biological brain takes advantage of internal noise, induced by the deterministic chaos which emerges in natural neural circuits; Nara and Davis [34] stressed that control of the parameter set can “harness” a chaotic search into an effective subspace far smaller than the problem domain. They observed that this harnessing of chaos exhibits much more diverse behavior than that of stochastic systems involving temperature control. The introduction of chaos here is based on the hypothesis that cognitive planning tasks in biological brains use the forward model described by deterministic dynamics. Recent research [39] has found a biological example of computing the inverse dynamics of eye movement in the cerebellum. This suggests that biological brains actually compute certain simple motor plans based on deterministic
dynamics using internal models. We believe that planning at the cognitive level utilizes deterministic chaos to solve problems because it must deal with combinatorial computation.

5 EXPERIMENTS

We conducted experiments on the scheme presented above using the mobile robot YAMABICO. In the learning phase, the robot repeats cycles of the learning trial, with an increasing number of samples in the training data, until statistical tests of lookahead prediction satisfy certain criteria. Then experiments of plan-based navigation are conducted.

5.1 Learning and lookahead prediction

The adopted RNN architecture is three-layered, and has 10, 12 and 9 units for the input, hidden, and output layers respectively, including four context units in each of the input and output layers. During each learning trial the robot wanders around the adopted workspace for a certain period, making each branching decision at random, in order to collect an additional amount of data (sensory-motor sequences). Thereafter, the data set which has been accumulated so far is used for training the RNN. For each trial, the connectivity weights of the network are set randomly and trained off-line using the data. The training of the RNN is conducted for 20,000 iterations, which are repeated if the mean square learning error per output unit cannot be decreased below 0.01. After the learning error is minimized, the test of lookahead prediction is conducted for 10 different travels. Each travel starts from an arbitrary position in the workspace using random branching. The robot travels with the RNN switched to the open-loop mode until the RNN becomes able to predict the next sensory input (i.e., it becomes situated). The robot is stopped when the prediction error for all sensory input units becomes less than 0.15 twice in succession. At this moment, we assume that the robot is situated. Then, lookahead prediction is conducted, with the RNN switched to the closed-loop mode, for an arbitrary motor program which comprises seven branching steps. Thereafter, the robot is
directed by the motor program in order to compare the actual sensory input with the lookahead prediction. After 10 travels, the mean square prediction error per sensory input unit (MSPE) is calculated. If the test of lookahead prediction is not satisfactory, we let the robot travel again in order to sample further data, which are used to re-train the network.

Table 1: Summary of three trials of learning, which shows the number of samples in the training data set, the iterations required for the training of the RNN, the average branching steps necessary to become situated, and the mean square prediction error per sensory input unit (MSPE) during navigation over 10 travels for each trial.

trial | samples used for training | learning iterations | avg. steps | MSPE
1st   | 49                        | 20,000              | 19.4       | 0.131
2nd   | 102                       | 20,000              | 9.6        | 0.072
3rd   | 193                       | 40,000              | 5.2        | 0.009

The experimental results of the learning are summarized in Table 1. In the first trial, the robot sampled 49 steps of the sensory-motor sequence, and the training process of the RNN with the sampled sequence converged after the first 20,000 iterations. This computation took about 20 minutes using a Sony News workstation with an R4000 CPU (100 MHz). In the travel after this training, many steps were required (the average over ten travels was 19.4) until the RNN in the open-loop mode supplied good predictions (i.e., became situated). In the ensuing lookahead predictions in the closed-loop mode, the RNN could usually not predict more than three steps ahead. It seemed that the RNN had learned only particular instances of the sampled sequences, not their more general structure. The calculated MSPE was 0.131. In the second trial, the robot sampled further sensory-motor pairs, by which the number of samples in the training data set was raised to 102. The training process of the RNN with this data set converged after 20,000 iterations (it took about 45 minutes). After the training, the steps necessary to become situated were fewer (9.6 on average), and the lookahead prediction often was
good for several steps. However, once the prediction failed in the middle of a sequence, it continued to fail for subsequent steps. The MSPE was reduced to 0.072. In the third trial, the RNN was trained with 193 sensory-motor pairs after 89 further pairs were sampled. The training could not converge within the first 20,000 iterations, but it converged after another 20,000 iterations (about 170 minutes in total). After this learning trial, it was observed that the robot became situated within a few steps (5.2 on average), and also that lookahead predictions became accurate except in cases affected by noise. The MSPE was reduced to 0.009. Since the RNN could correctly predict sequences it had never exactly learned, it can be said that the RNN succeeded in extracting the necessary rules in a generalized form. Figure 6 shows the distribution of the prediction error for all sensory input units in the third trial. It is shown that the fraction of “good” predictions, with an error of less than 0.1, is more than 70 percent. This result implies that the robot successfully learned the forward model of the workspace.

Figure 6: Distribution of the prediction error for all sensory input units in the final trial of the learning phase.

An example of the comparison between a lookahead prediction and the sensory sequence during travel is shown in Figure 7. In (a) the arrow denotes the branching point where the robot conducted a lookahead prediction using a motor program given by 1100111. The robot, after conducting the predictions, traveled following the motor program, generating an “eight-figure” trajectory, as shown. In (b) the left-hand side shows the sensory input sequence, while the right-hand side shows the lookahead sequence, the motor program and the context sequence. The values are indicated by the bar heights. This sequence consists of eight branching steps (from the 0th to the 7th step) including the initial
one at the “start” point. It can be seen that the lookahead for the sensory input agrees very well with the actual values. It is also observed that the context, as well as the prediction of sensory input, at the 0th and the 7th steps are almost the same. This indicates that the robot predicted its return to the initial position at the 7th step in its “mental” simulation. The robot actually returned to the “start” point at the 7th step in its test travel.

We have stated that situatedness accounts for the mechanism of auto-recovery from temporary perturbation. The next experiment demonstrates such an example. The robot traveled in the workspace while predicting the next sensory input with the RNN switched to the open-loop mode. During this travel, an additional obstacle was introduced. Figure 8(a) shows the trajectory of the robot's travel; Figure 8(b) shows the comparison of the actual sensory input and the corresponding one-step lookahead prediction. The branching sequence number is indexed beside the trajectory; these numbers correspond to the prediction sequence in Figure 8(b). The prediction starts to be incorrect once the robot passes the second branching point, as it encounters the unexpected obstacle. The robot, however, continued to travel, and in the meanwhile we removed the obstacle from the workspace (when the robot passed the fourth branch). After the sixth branching point, the prediction returns to the correct values. This indicates that the lost context is recovered while the RNN receives the regular sensory input sequence. It is noted that the values of the context units at this branch are almost the same as those at the first branch. This shows that the robot recognized its return to the same branching point because it became situated in the behavioral context again.

5.2 Planning

In this section, we demonstrate that our scheme provides a mechanism for the autonomous generation of motor programs. Consider the following experiment. In Figure 9, the robot was stopped at branch A after it
became situated. Then the robot performed its planning of the route to the given destination, B. B is one of the dead-end positions, and its sensory input has already been given to the robot. The robot conducted its planning by following the dynamics described by (8) with the following parameters: ε = 1.0; γ = 0.05; μ = 20.0; T = 0.05; m = 1.0; d_0 = 0.1; d_1 = −0.11; d_2 = 3.8; ω = 2π/400; and κ = 0.001.

Figure 7: (a) The robot conducted lookahead prediction for a motor program given by 1100111 at the branching point indicated by the arrow, after which it traveled according to the motor program, generating the trajectory shown. In (b) the left side shows the actual sensory sequence, and the right side shows the lookahead prediction, the motor program and the context sequence. The sensory and the context sequences are shown for eight steps, including their initial values, p_0 and c_0, at the bottom. The motor program is shown for seven steps (x_0 → x_6).

Figure 8: Auto-recovery from the addition of an obstacle. In (a) the trajectory of the robot's travel is shown. The additional obstacle is indicated by an arrow. The numbers indicate the branching sequence number. (b) shows the sensory sequence on the left and the one-step lookahead, the motor program and the context sequence on the right.

Figure 9: The robot planning for specified goal B from the current location A.

Figure 10 shows the resulting time evolution of a planning process involving 2,500 iterations. The temporal motor program (for every 10 iterative steps) is shown in the lower part, and the cost of each plan is plotted in the upper part. A temporal motor program is indicated by a column consisting of black and white squares, where a white square denotes persistence in the current branch (0) and a black square denotes transit to a new one (1). Symbol size indicates the actual activation value of x_n. In
this figure, we show only the valid length (τ steps) of motor commands in the temporal motor program. Note that the valid length of the temporal motor program changes through the iterations. From Figure 10, it can be seen that multiple motor programs with relatively low cost are generated in stable phases through successive state transitions. These are 101, 01010, 110 and 00, as indicated by arrows in Figure 10.

We tested these motor programs by letting the robot activate them. Figure 11(a)–(d) shows the resultant travel for each of the programs. While program (b) proved to be redundant, generating a fruitless loop, and program (c) pursued the wrong goal, the other programs produced acceptable results. Note that the good programs produced slightly lower costs. We examined case (c) and determined that a false goal was reached because its sensory pattern resembled that of the desired goal.

Figure 10: The chaotic search process of the motor program is shown in three rows. In each row, the upper part indicates the cost time history, and the lower part the temporal motor program. The temporal motor program is indicated by a column consisting of black and white squares, representing motor commands of 1 and 0 respectively. The motor programs obtained in the stable phases (101, 01010, 110, 00) are indicated by arrows. Time flows from row 1 to row 3.

Figure 11: Travels based on various motor programs. (a) and (d) are almost optimal trajectories, (b) is redundant, and (c) pursues the wrong goal. (a), (b), (c) and (d) correspond to the motor programs 101, 01010, 110 and 00, respectively.

Figure 12: The number of identical motor programs generated during the search as a function of d_2.

5.3 Parameter sensitivity of chaotic searching

It is interesting to observe how the parameter settings affect the chaotic search. The parameter d_2 represents a coefficient of positive
damping in the CSD dynamics; therefore it is assumed that its value will affect the dynamical characteristics substantially. In the following, we focus on this parameter and investigate how the characteristics of the chaotic search vary depending on its value. We conducted this experiment using the same planning task described above.

The parameter d_2 was varied between 2.2 and 5.4 in intervals of 0.4, and for each value the search process was computed for 40,000 steps. The motor program was sampled at the most stable phase of each cycle, i.e. when ωt = π/2 in (8), which results in 100 samples of the motor program for each search process. First, we examined how the diversity of the generated programs varies with d_2. Figure 12 plots the number of identical motor programs generated for each parameter value. It is shown that the number of identical motor programs decreases as the value of d_2 increases. When d_2 was set to 5.4, only one motor program was generated — no state transitions took place. Furthermore, we investigated the cost distribution and the frequency of the state transitions in order to examine the detailed structure of the search process. In Figure 13, the left-hand side shows the cost frequency of the programs generated, and the right-hand side shows the frequency of repeated occurrences of the same program, for d_2 values of 2.2, 3.8, and 5.0. The figure shows that, for small d_2 values, the cost tends to be spread over a wide range, and the motor program generated is different on almost every cycle. The search proceeds almost at random over a wide range of the problem space. On the other hand, for larger d_2 values, the cost distribution becomes approximately optimal, and the probability of repeating the same motor program increases. The search tends to proceed more precisely towards optimal and sub-optimal solutions. However, in these cases the search becomes more likely to be trapped in one of the sub-optimal solutions for long periods (the state transitions take place only intermittently).
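The alternation of energized and damped phases in (7), and the way the positive-damping term caps the energy available for basin hopping, can be sketched numerically. The following is a minimal one-dimensional illustration, not the paper's actual planner: the double-well landscape, parameter values, and Euler integration are all hypothetical choices made for the sketch.

```python
import math

def grad_E(x):
    # toy rugged "cost landscape": E(x) = (x^2 - 1)^2, two basins at x = ±1
    return 4.0 * x * (x * x - 1.0)

def csd_run(steps=50_000, dt=0.01, m=1.0, kappa=1.0,
            d0=1.2, d1=-0.2, d2=0.5, omega=2 * math.pi / 100):
    # chaotic steepest descent in the style of (7): the linear damping term
    # alternates sign with sin(omega * t); d2*v^2*sgn(v) always damps
    x, v = 0.9, 0.0          # start inside the right-hand basin
    xs = []
    for i in range(steps):
        t = i * dt
        R = (d0 * math.sin(omega * t) + d1) * v \
            + d2 * v * v * math.copysign(1.0, v)
        a = (-kappa * grad_E(x) - R) / m
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs

xs = csd_run()
# during anti-damping phases enough energy is pumped in to cross the barrier
# at x = 0, so the trajectory visits both basins instead of settling in one
print(min(xs) < -0.5, max(xs) > 0.5)
```

Raising d2 in this sketch plays the role described above: it lowers the terminal velocity of the energized phase, so less energy is available for hopping and the state lingers longer in one basin.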
The risk of local minimum traps becomes more pronounced as d_2 is set to larger values.

An important question, therefore, is how to determine the optimal value of d_2. We believe that it is determined by the trade-off between the time required for planning and the cost of the motor program thus obtained. If the minimum-cost plan is required regardless of the period of time taken, d_2 might be set to a small value. The resultant near-random search would find the minimum-cost plan in the long run, without being captured by the local minimum traps. On the other hand, if the time spent is crucial, d_2 should be set to a larger value. The resultant search would find a sub-optimal plan quickly, but it might not be able to reach the global cost minimum in such a limited time. We believe that the optimal parameter will be determined by considering the real-time requirements of the planning process. Although the current paper will not include further discussion of real-time planning issues, the experimental results convince us that the idea of “harnessing chaos” is potentially applicable in this context.

Figure 13: Frequency of the cost of generated motor programs (left chart), and the frequency of repeated occurrences of the same motor program (right chart), for d_2 values of (a) 2.2, (b) 3.8, and (c) 5.0.

6 THE DYNAMICAL MECHANISM OF SITUATEDNESS

Our experimental results have shown that the robot can become situated from an arbitrary initial state through interaction with the environment. This section explores
the mechanism underlying this situatedness by investigating the essential dynamical structure that arises from the coupling of the internal neural system and the environmental system.

First, we define the term “attractor” for both the environmental and the neural dynamics. Let us focus on the environmental dynamics F, which define how the robot actually travels in the workspace with respect to the motor commands. Suppose the robot travels in the workspace for an infinite time period, receiving a motor program x∗ which is generated randomly. Let s∗: (s_0 s_1 ⋯ s_∞) and p∗: (p_0 p_1 ⋯ p_∞) be the sequences of branch positions and sensory inputs, respectively, during the resultant travel of the robot. The branch position s_n represents the state of the environmental dynamics F at time n. Since s∗ would be limited to a subspace of the entire workspace after an initial transient period, an invariant set s̄∗ is formed in s∗. (Hereafter, X̄∗ and X̄_n represent the invariant set and one of its elements, respectively, in an infinite sequence X∗.)
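The construction of this invariant set can be mimicked with a toy stand-in for F: a hypothetical five-branch workspace driven by random binary motor commands, where the set of branches still visited after the initial transient plays the role of s̄∗. The transition table below is invented purely for illustration.

```python
import random

# hypothetical workspace: branch positions 0..4; T[s] = (next if x=0, next if x=1).
# Branches 0 and 1 are transient; {2, 3, 4} is closed under all transitions.
T = {0: (2, 3), 1: (3, 2), 2: (3, 4), 3: (4, 2), 4: (2, 3)}

def invariant_set(s0, steps=2000, transient=100, seed=7):
    rng = random.Random(seed)
    s = s0
    visited = set()
    for n in range(steps):
        s = T[s][rng.randrange(2)]   # random motor command x_n ∈ {0, 1}
        if n >= transient:           # discard the initial transient period
            visited.add(s)
    return visited

# the same invariant set emerges regardless of the starting branch:
sets = [invariant_set(s0) for s0 in T]
print(sets[0], all(S == sets[0] for S in sets))  # → {2, 3, 4} True
```

In this caricature the invariant set {2, 3, 4} is reached from every initial state, which is exactly the global-attractor property defined next.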
We define this invariant set s̄∗ as the attractor of F. It is important to note that this attractor is a global attractor, since the robot's trajectory in the workspace converges to the same attractor regardless of its starting position. Also, we define an invariant set p̄∗ for the sequence of sensory inputs to which s̄∗ corresponds.

For the neural dynamics f, let us consider a lookahead prediction of the RNN (in the closed-loop mode) with respect to a motor program x∗ of infinite length which is generated randomly. This generates an infinite sequence of transitions of the context, c∗. When this infinite sequence forms an invariant set, this invariant set c̄∗ is defined to be the attractor of f. The prediction of the sensory sequence which corresponds to c̄∗ is indicated as p̂∗. We note that the generation of a global attractor might not be assured for f, depending on the learning process. A trajectory with a different c_0 might reach a different attractor. Since the objective of learning is to make the neural dynamics f emulate the environmental dynamics F by means of the sequence of sensory inputs, f in the limit of a learning process satisfies:

∃c_0, ∃s_0 ⇒ p̂∗ = p∗     (9)

for an arbitrary motor program x∗. The notion here is that there is at least one attractor

Figure 14: The internal dynamics are made coherent by the environmental dynamics through entrainment using sensory coupling.

for f for which the lookahead prediction of the sensory input can be made correctly, which will satisfy (9).

Now let us consider the coupling between these two dynamics, as shown schematically in Figure 14. In the open-loop mode, the RNN predicts the next sensory input p̂_{n+1} using the current sensory input p_n while the robot travels following the motor program x∗. In the figure, the sensory return loop exiting from the environmental dynamics and entering the neural dynamics is shown. Suppose that both the
environmental dynamics F and the neural dynamics f start to run from their transient states (s_0, c_0). Initially, the prediction of the sensory input does not match the measured one. In the meanwhile, the environmental dynamics F converge onto their global attractor, thereby producing the regular sequence of sensory inputs p̄∗. Upon receiving this regular sequence of sensory inputs p_n, the neural dynamics f begin to output the next prediction p̂_{n+1} correctly, as being equal to p_{n+1}, while the context values c_n converge onto c̄∗. This convergence is assured in general for all c_0 by (9), provided the neural dynamics f have been trained to have the correct global attractor.

The above analysis has shown that the creation of the global attractor in the neural dynamics is essential to achieving the situatedness and the auto-recovery mechanism of the robot. Here, we examine whether or not the RNN trained in our previous experiment has generated a global attractor. The RNN is switched to the closed-loop mode, then forward computation is conducted with a randomly generated motor program for two thousand forward steps. The resultant orbit of the context c∗ is plotted in the two-dimensional space (c_1, c_2) using the activation states of two arbitrarily chosen context units, excluding the first 100 points which resulted from the initial transient steps. Figure 15(a) shows the orbit obtained, while (b) shows an enlargement of part of (a), where a highly one-dimensional structure is observed. We repeated this several times with different initial values for the context units, and found that they all resulted in the same invariant set. (The same qualitative results were obtained for any pair of the context units.)
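The open-loop re-entrainment described above can be caricatured in a few lines, assuming (purely for illustration) that f is a perfect copy of F and that the branch position is fully observable; real RNN dynamics are of course far richer.

```python
# caricature of entrainment: environment F and a learned copy f of it.
# In open-loop mode f is driven by the *actual* sensory input, so an
# arbitrarily wrong internal context is re-grounded after one observation.
# All names and dynamics here are illustrative, not the paper's system.

def F(s, x):            # environmental dynamics: next branch position
    return (3 * s + x) % 7

def sense(s):           # sensory input emitted at a branch
    return s            # fully observable toy case

program = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
s, c = 2, 5             # true state vs. (wrong) initial internal context
errors = []
for x in program:
    p = sense(s)        # robot measures the actual sensory input p_n
    errors.append(abs(sense(c) - p))
    c = F(p, x)         # open loop: context updated from the measured p_n
    s = F(s, x)         # environment moves on
print(errors)           # → [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The initial mismatch vanishes as soon as the internal update is driven by the measured input; had the model instead fed back its own wrong prediction (closed loop), the error would persist, which is why auto-recovery relies on the open-loop sensory coupling.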
This confirmed that the neural dynamics used in the experiment are characterized by a global attractor. Although no theory has been established to explain the creation of a low-dimensional global attractor in recurrent neural learning, this tendency has been observed in other numerical experiments on the learning of simple grammatical structures [36, 44].

The mechanism underlying situatedness will now be discussed qualitatively by introducing the physical term “entrainment” [9, 12]. Entrainment is the dynamical phenomenon whereby coupled nonlinear oscillators become stably synchronized. Recently, Beer [3] studied the self-organization of locomotion controllers in the context of the walking motions of insects, where he observed entrainment of the intrinsic oscillation of the leg controller by the environmental dynamics. Similarly, in our case, the internal neural dynamics become coherent with the environmental dynamics through the sensory loop, where we observe entrainment of the internal dynamics by the environmental dynamics. In the initial transient state, the neural system and the environmental system are “incoherent”; therefore the neural system cannot recognize the present situation/position. In the meanwhile, the two systems start to become coherent by means of entrainment, with the result that the dynamical state of the entire system is confined to the attractor, which has reduced dimensionality. At this point, it can be said that the internal neural system has been situated in the environment. This dynamical mechanism which generates situatedness is an inherent one, as long as the essential dynamical structure of the coupled system is characterized by global attractor dynamics.

7 DISCUSSION AND CONCLUSION

A primitive conceptualization of the symbol grounding process is conjectured as a result of our experiments. Figure 16 illustrates the concept. As the robot travels around the workspace, clusters of sensory input are collected in the sensory space, arising from its branching
sequences. Meanwhile, a dynamical mapping is self-organized in the internal state space such that it accounts for the transitions among the clusters of collected sensory inputs. If different symbols are assigned to each cluster of sensory input, the mental simulation process carried out by the internal chaotic dynamics might be equivalent to a symbolic process of manipulating a set of symbols: terminal symbols (the sensory input) and nonterminal symbols (the internal state). Here, our primitive symbols are not in the arbitrary shape of the usual symbol tokens¹, but in the nonarbitrary shape arising from the physical interaction between the robot and the environment.

¹ The discussion inherits Harnad's [15] claim: “Symbol manipulation would be governed not just by the arbitrary shapes of the symbol tokens, but by the nonarbitrary shapes of the icons and category invariants in which they are grounded.”

Figure 15: (a) shows the orbit c∗ projected in (c_1, c_2) space using the activation states of two context units; (b) is an enlargement of the rectangular section in (a), in which a highly one-dimensional structure is seen.

Figure 16: The symbol grounding process.

One might consider that such symbolic processes could be represented more easily in the form of an FSM. We, however, consider that the internal representation of an FSM is still “parasitic,” since its symbols are manipulated in an arbitrary shape regardless of their meaning in the physical world. A crucial gap exists between actual physical systems defined in metric space and their representation in non-metric space, which makes discussion of the structural stability of the whole system difficult. In contrast to this state of affairs, the representation in our scheme can be said to be intrinsic to the system, since it is embedded in the attractor dynamics which share the same metric space with the physical environment. Here, structural stability arises
in the interaction between the internal and environmental systems, which accounts for the situatedness of the internal process. Although the symbol grounding process described here is still primitive, we believe strongly that this philosophy is indispensable for designing intelligent autonomous robots operating in the physical world.

It is important that we address the issue of the scalability of our scheme. Although the learning of the forward model in the adopted simple workspace successfully generated the global attractor after some trial and error, this sort of global convergence would inevitably become more difficult in more complex environments. Facing this problem, one approach to take is to refine the learning algorithms of RNNs, since the scalability largely depends on the learning capability of the RNNs. Recent research into RNN learning processes has shown progress. Giles [14] reported that increasing the “order” of the connectivity of an RNN remarkably enhances its learning capability. (The “order” refers to the dimensionality of the product terms in the weighted sum, which reflects the connectivity of the network.)
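As a rough illustration of what such “order” buys, a second-order unit forms products s_j·x_k of state and input activations, so a single weight per (state, input) pair can encode an FSM transition directly. The sketch below hand-builds such a weight tensor for a two-state toggle machine; it is an invented example, not a trained network in the style of [14].

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(W, s, x):
    # second-order recurrent transition: s'_i = sigmoid(sum_jk W[i][j][k]*s[j]*x[k])
    return [sigmoid(sum(W[i][j][k] * s[j] * x[k]
                        for j in range(len(s)) for k in range(len(x))))
            for i in range(len(W))]

# 2-state FSM over inputs {0, 1}: input 1 toggles the state, input 0 keeps it.
G = 8.0  # gain pushing the sigmoid outputs towards 0/1
W = [[[ G, -G], [-G,  G]],    # unit 0 active after (s=0, x=0) or (s=1, x=1)
     [[-G,  G], [ G, -G]]]    # unit 1 active after (s=0, x=1) or (s=1, x=0)

s = [1.0, 0.0]                      # start in state 0 (one-hot)
for bit in [1, 1, 0, 1]:            # toggle, toggle, keep, toggle
    x = [1.0 - bit, float(bit)]     # one-hot input
    s = step(W, s, x)
print(round(s[1]))                  # → 1
```

Each weight W[i][j][k] stands for one transition arc "state j, input k → state i", which is why such higher-order networks infer small grammars so readily; a first-order network must encode the same table indirectly through its hidden layer.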
Bengio et al. [4] showed theoretically that learning long-term dependencies with a standard gradient-descent method applied to back-propagation is difficult. They showed the advantages of other, non-gradient methods, such as pseudo-Newton, time-weighted pseudo-Newton, multi-grid random search, and simulated annealing. The idea of expert nets proposed by Jordan and Jacobs [20] is also attractive. The essential idea is to divide a complex learning problem into simpler problems by introducing a “sub-net” architecture. It will be challenging to study whether the same learning principle can be applied to the learning of large-scale FSMs by RNNs. Although numerous other research projects concerning RNNs are in progress, the learning capability of RNNs is still an open problem. It is expected that further advances in the theory of RNNs will lead to the discovery of better learning algorithms.

Another possible research direction consists of investigating strategies that could cope with insufficient learning without fatal degradation of the system performance in more complex environments. Insufficient learning can arise in three ways. Firstly, the robot might not be able to obtain, through its behavior, all the possible input data (sensory-motor sequences) necessary for building a correct model of the environment. Secondly, even if the robot could obtain all of the possible input data, it might not be able to “digest” all of it to form a correct model in a limited learning time. Thirdly, the environment may change after the robot has learned its model. In these situations, our basic assumption of embedding a correct model in the form of a global attractor is inevitably violated, and thereby the mechanisms of situatedness as well as of optimal planning might not be assured. However, it is expected that the robot could temporarily recover its context if it happened to travel around well-known parts of the workspace, and it might then be able to generate certain sub-optimal plans from there. The
current paper has formulated a model-based approach based on the assumption of sufficient learning of the model. It is, however, expected that the theory will have to be extended further to the problem of insufficient learning, which is likely in open environments of greater complexity. This field of study is quite attractive to us, since an animal or "animat" often lives under such conditions.

ACKNOWLEDGEMENT

The author wishes to thank Marco Dorigo and the anonymous referees who greatly helped to improve the presentation of the ideas in this paper.

References

[1] K. Aihara. Chaotic neural networks. Physics Letters A, Vol. 144, No. 6, pp. 333–340, 1990.
[2] M. Asada. Map building for a mobile robot from sensory data. IEEE Trans. Syst. Man Cybern., Vol. 37, No. 6, pp. 1326–1336, 1990.
[3] R. D. Beer. A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, Vol. 72, No. 1, pp. 173–215, 1995.
[4] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Networks, Vol. 5, No. 2, pp. 157–166, 1994.
[5] R. Brooks. A robust layered control system for a mobile robot. IEEE J. Robotics and Automat., Vol. RA-2, No. 1, pp. 14–23, 1986.
[6] M. Colombetti and M. Dorigo. Training agents to perform sequential behavior. Adaptive Behavior, Vol. 2, No. 3, pp. 247–275, 1994.
[7] J. P. Crutchfield. Inferring statistical complexity. Phys. Rev. Lett., Vol. 63, pp. 105–108, 1989.
[8] M. Dorigo and M. Colombetti. Robot shaping: developing autonomous agents through learning. Artificial Intelligence, Vol. 71, No. 2, pp. 321–370, 1994.
[9] T. Eisenhammer, A. Hubler, T. Geisel, and E. Luscher. Scaling behavior of the maximum energy exchange between coupled anharmonic oscillators. Physical Review, Vol. A-41, pp. 3332–3342, 1990.
[10] A. Elfes. Sonar-based real-world mapping and navigation. IEEE J. Robotics and Automation, Vol. RA-3, No. 3, pp. 249–265, 1987.
[11] J. L. Elman. Finding structure in time. Cognitive Science, Vol. 14, pp. 179–211, 1990.
[12] T. Endo and S. Mori. Mode analysis of a ring of a large number of mutually coupled van der Pol oscillators. IEEE Trans. Circuits Syst., Vol. CAS-25, No. 1, pp. 7–18, 1978.
[13] F. Freyberger, P. Kampman, and G. Schmidt. Constructing maps for indoor navigation of a mobile robot by using an active 3D range imaging device. In Proc. of the IEEE Int. Workshop on Intelligent Robots and Systems '90, pp. 143–148, 1990.
[14] C. L. Giles, G. Z. Sun, H. H. Chen, Y. C. Lee, and D. Chen. Higher order recurrent networks and grammatical inference. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, pp. 380–387. San Mateo, CA: Morgan Kaufmann, 1990.
[15] S. Harnad. The symbol grounding problem. Physica D, Vol. 42, pp. 335–346, 1990.
[16] J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley, 1979.
[17] S. Iida and S. Yuta. Vehicle command system and trajectory control for autonomous mobile robots. In Proc. of the IEEE/RSJ Int. Workshop on Intelligent Robots and Systems '91, pp. 212–217, 1991.
[18] M. I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. In Proc. of the Eighth Annual Conference of the Cognitive Science Society, pp. 531–546. Hillsdale, NJ: Erlbaum, 1986.
[19] M. I. Jordan. Indeterminate motor skill learning problems. In M. Jeannerod, editor, Attention and Performance, XIII. Cambridge, MA: MIT Press, 1988.
[20] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, Vol. 6, No. 2, pp. 181–214, 1994.
[21] M. I. Jordan and D. E. Rumelhart. Forward models: supervised learning with a distal teacher. Cognitive Science, Vol. 16, pp. 307–354, 1992.
[22] L. P. Kaelbling. An adaptive mobile robot. In Proc. of the First European Conf. on Artificial Life, pp. 41–47, 1992.
[23] S. Kambhampati and L. S. Davis. Multiresolution path planning for mobile robots. IEEE J. Robotics and Automation, Vol. RA-2, No. 3, pp. 135–145, 1986.
[24] O. Khatib. Real-time obstacle avoidance for manipulators and mobile robots. The Int. J. of Robotics Research, Vol. 5, No. 1, pp. 90–98, 1986.
[25] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, Vol. 43, pp. 59–69, 1982.
[26] J. R. Koza. Evolution of subsumption using genetic programming. In Proc. of the First European Conf. on Artificial Life, pp. 110–119, 1992.
[27] B. J. Krose and M. Eecen. A self-organizing representation of sensor space for mobile robot navigation. In Proc. of the Int. Conf. on Intelligent Robotics and Systems '94, pp. 257–263, 1994.
[28] B. J. Krose and J. W. van Dam. Adaptive state space quantisation for reinforcement learning of collision-free navigation. In Proc. of the IEEE Int. Workshop on Intelligent Robots and Systems '92, pp. 9–14, 1992.
[29] B. Kuipers. A qualitative approach to robot exploration and map learning. In AAAI Workshop on Spatial Reasoning and Multi-Sensor Fusion (Chicago), pp. 774–779, 1987.
[30] L.-J. Lin and T. M. Mitchell. Reinforcement learning with hidden states. In Proc. of the Second Int. Conf. on Simulation of Adaptive Behavior, pp. 271–280, 1992.
[31] P. Maes, editor. Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back. Cambridge, MA: MIT Press, 1991.
[32] M. Mataric. Integration of representation into goal-driven behavior-based robots. IEEE Trans. Robotics and Automation, Vol. 8, No. 3, pp. 304–312, 1992.
[33] J. A. Meyer and S. W. Wilson, editors. From Animals to Animats: Proc. of the First International Conference on Simulation of Adaptive Behavior. Cambridge, MA: MIT Press, 1991.
[34] S. Nara and P. Davis. Memory search using complex dynamics in a recurrent neural network model. Neural Networks, Vol. 6, No. 7, pp. 963–974, 1993.
[35] F. Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, Vol. 59, pp. 2229–2232, 1987.
[36] J. B. Pollack. The induction of dynamical recognizers. Machine Learning, Vol. 7, pp. 227–252, 1991.
[37] D. A. Pomerleau. Neural Network Perception for Mobile Robot Guidance. Boston: Kluwer, 1993.
[38] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[39] M. Shidara, K. Kawano, H. Gomi, and M. Kawato. Inverse-dynamics encoding of eye movement by Purkinje cells in the cerebellum. Nature, Vol. 365, pp. 50–52, 1993.
[40] C. A. Skarda and W. J. Freeman. Does the brain make chaos in order to make sense of the world? Behavioral and Brain Sciences, Vol. 10, pp. 161–165, 1987.
[41] J. Tani. Proposal of chaotic steepest descent method for neural networks and analysis of their dynamics. Electronics and Communications in Japan, Part 3, Vol. 75, No. 4, pp. 62–70, 1992.
[42] J. Tani and N. Fukumura. Embedding task-based behavior into internal sensory-based attractor dynamics in navigation of a mobile robot. In Proc. of the IEEE Int. Conf. on Intelligent Robots and Systems '94, pp. 886–893, 1994.
[43] J. Tani and N. Fukumura. Learning goal-directed sensory-based navigation of a mobile robot. Neural Networks, Vol. 7, No. 3, pp. 553–563, 1994.
[44] J. Tani and N. Fukumura. Embedding a grammatical description in deterministic chaos: an experiment in recurrent neural learning. Biological Cybernetics, to appear.
[45] I. Tsuda, E. Koerner, and H. Shimizu. Memory dynamics in asynchronous neural networks. Prog. Theor. Phys., Vol. 78, pp. 51–71, 1987.
[46] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement. Biological Cybernetics, Vol. 61, pp. 73–85, 1989.
[47] B. M. Yamauchi and R. D. Beer. Sequential behavior and learning in evolved dynamical neural networks. Adaptive Behavior, Vol. 2, No. 3, pp. 219–246, 1994.
[48] U. R. Zimmer. Robust world-modelling and navigation in a real world. Neurocomputing, to appear.