- sparseness: the model should aim for the minimum number of entities and relationships to produce the desired behavior,
- a suitable level of detail for formalization: all components and relationships have to be specified to a degree of detail that allows for implementation as a computational model,
- avoidance of over-specialization: where functional aspects or quantitative relationships are unknown, the model should not be unnecessarily constrained.

Also, the model should support an experimental paradigm, to be evaluated against competing approaches, so that progress can be measured. This could be a set of challenge problems, a competition between different solutions, or a suitable application.

A human-like intelligence could likely exist in a non-human body, and in a simulated world, as long as the internal architecture (the motivational and representational mechanisms and the structure of cognitive processes) is similar to that of humans, and the environment provides sufficient stimulation. The desires and fears of humans correspond to their needs, such as environmental exploration, identification and avoidance of danger, and the attainment of food, shelter, cooperation, procreation, and intellectual growth. Since the best way to satisfy the individual needs varies with the environment, the motivational system is not aligned with particular goal situations, but with the needs themselves, through a set of drives.

Let us call an event that satisfies a need of the system a goal, or an appetitive event, and one that frustrates a need an aversive event (for instance, a failure or an accident). Goals and aversive events are given by the environment; they are not part of the architecture. Instead, the architecture specifies a set of drives according to the needs of the system. Drives are indicated as urges, as signals that make a need apparent.
An example of a need would be nutrition, which relates to a drive for seeking out food. On the cognitive level of the system, the activity of the drive is indicated as hunger. The connection between urges and events is established by reinforcement learning. In our example, that connection will have to establish a representational link between the indicator for food and a consumptive action (i.e., the act of ingesting food), which in turn must refer to an environmental situation that made the food available. Whenever the urge for food becomes active in the future, the system may use the link to retrieve the environmental situation from memory and establish it as a goal.

236 J. Bach

This defines some additional requirements for the architecture. The system needs:

- a set of suitable urges,
- a way of evaluating them to establish goals and identify adverse events,
- a world model that represents environmental situations and events,
- a protocol memory that makes past situations and events accessible,
- a reinforcement learning mechanism working on that protocol,
- a mechanism for anticipation, to recollect memory content according to the current environmental situation and needs,
- a decision making component, which pitches the current urges and the available ways to satisfy them against each other, and chooses a way of action,
- an action regulation component, so this way of action can be followed through.

A more advanced architecture will also require mechanisms for planning, classification and problem solving, to actively construct ways from a given situation to a goal situation (instead of just remembering a successful way from the past), and mechanisms for reflection, to reorganize and abstract existing memory content. Note that many possible architectures may satisfy this set of requirements; I will therefore not specify an implementation here, but focus on the motivational side.

4 An Outline of a Motivational System, According to the Psi Theory

The Psi theory [8, 9] originates in the works of the psychologist Dietrich Dörner and has been transformed into a cognitive architecture by the author [10]. Unlike high-level descriptions of motivation as they are common in psychology, such as those by Maslow [11] or Kuhl [12], the motivational model outlined in the Psi theory is rigorous enough to be implemented as a computational model, and unlike narrow, physiological models (such as the one by Tyrrell [13]), it also addresses cognitive and social behavior. A simulation model of the Psi theory has been demonstrated with MicroPsi [14]. In the following, I will identify the core components of the motivational system.

4.1 Needs

All urges of the agent stem from a fixed and finite number of ‘hard-wired’ needs, implemented as parameters that tend to deviate from a target value. Because the agent strives to maintain the target value by pursuing suitable behaviors, its activity can be described as an attempt to maintain a dynamic homeostasis.

All behavior of Psi agents is directed towards a goal situation that is characterized by a consumptive action satisfying one of the needs. In addition to what the physical (or virtual) embodiment of the agent dictates, there are cognitive needs that direct the agents towards exploration and the avoidance of needless repetition.
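This homeostatic scheme can be sketched in a few lines of Python. The sketch is illustrative only: the class name `Need`, the linear forms and the numeric values are assumptions, not part of the Psi specification; the theory only requires that the urge grows with the deviation from the target value and that changes in the deviation yield proportional pleasure/displeasure signals.

```python
class Need:
    """A 'hard-wired' homeostatic parameter with a target value.

    The urge grows with the deviation from the target; changes in the
    deviation yield proportional pleasure/displeasure (reinforcement)
    signals, as described in the text. Names and linear forms are
    illustrative assumptions, not the Psi theory's specification.
    """

    def __init__(self, name, target=1.0, weight=1.0):
        self.name = name
        self.target = target
        self.weight = weight      # needs are weighted against each other
        self.value = target       # start with no deviation

    @property
    def urge(self):
        # the urge is proportional to the deviation from the target value
        return self.weight * abs(self.target - self.value)

    def update(self, new_value):
        """Return a reinforcement signal: positive when the deviation
        shrinks (pleasure), negative when it grows (displeasure)."""
        old_deviation = abs(self.target - self.value)
        self.value = new_value
        return old_deviation - abs(self.target - self.value)


water = Need("water")
water.update(0.4)            # the agent loses water: displeasure
signal = water.update(0.9)   # drinking reduces the deviation: pleasure
```

A behavior that reduces the deviation thus produces exactly the kind of reinforcement signal that the associator described in Sect. 4.2 can use to link the urge indicator to the successful action.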
The needs of the agent should be weighted against each other, so differences in importance can be represented.

A Motivational System for Cognitive AI 237

Physiological needs

Fuel and water: In our simulations, water and fuel are used up whenever an agent executes an action, especially locomotion. Certain areas of the environment cause the agent to lose water quickly, which associates them with additional negative reinforcement signals.

Intactness: Environmental hazards may damage the body of the agent, creating an increased intactness need and leading to negative reinforcement signals (akin to pain).

These simple needs can be extended at will, for instance by needs for shelter, for rest, for exercise, for certain types of nutrients, etc.

Cognitive needs

Certainty: To direct agents towards the exploration of unknown objects and affairs, they possess an urge specifically for the reduction of uncertainty in their assessment of situations, in their knowledge about objects and processes, and in their expectations. Because the need for certainty is implemented similarly to the physiological urges, the agent reacts to uncertainty just as it would to pain signals and will display a tendency to remove this condition. This is done by triggering explorative behavior. Events leading to an urge for uncertainty reduction include:

- the agent meets unknown objects or events,
- for the recognized elements, there is no known connection to behavior (the agent has no knowledge of what to do with them),
- there are problems perceiving the current situation at all,
- there has been a breach of expectations; some event has turned out differently than anticipated,
- over-complexity: the situation changes faster than the perceptual process can handle,
- the anticipated chain of events is either too short or branches too much; both conditions make predictions difficult.

In each case, the uncertainty signal is weighted according to the appetitive or aversive relevance of the object of uncertainty. The urge for certainty may be satisfied by “certainty events”, the opposite of uncertainty events:

- the complete identification of objects and scenes,
- the complete embedding of recognized elements into agent behaviors,
- fulfilled expectations (even negative ones),
- a long and non-branching chain of expected events.

Like all urge-satisfying events, certainty events create a positive reinforcement signal and reduce the respective need. Because the agent may anticipate the reward signals from successful uncertainty reduction, it can actively look for new uncertainties to explore (“diversive exploration”).

Competence: When choosing an action, Psi agents weigh the strength of the corresponding urge against the chance of success. The measure for the chance of success to satisfy a given urge using a known behavior program is called “specific competence”. If the agent has no knowledge of how to satisfy an urge, it has to resort to “general competence” as an estimate. Thus, general competence amounts to something like the self-confidence of the agent, and it is an urge of its own. (Specific competencies are not urges.) The general competence reflects the ability to overcome obstacles, which can be recognized as sources of negative reinforcement signals, and to do so efficiently, which is represented by positive reinforcement signals. Thus, the general competence of an agent is estimated as a floating average over the reinforcement signals and the inverted displeasure signals. The general competence is a heuristic for how well the agent expects to perform in unknown situations.

As in the case of uncertainty, the agent learns to anticipate the positive reinforcement signals resulting from satisfying the competence urge. A main source of competence is the reduction of uncertainty. As a result, the agent actively seeks out problems that allow gaining competence, but avoids overly demanding situations to escape the frustration of its competence urge. Ideally, this leads the agent into an environment of medium difficulty (measured against its current abilities to overcome obstacles).

Aesthetics: Environmental situations and relationships can be represented in infinitely many ways.
Here, ‘aesthetics’ corresponds to a need for improving representations, mainly by increasing their sparseness, while maintaining or increasing their descriptive qualities.

Social needs

Affiliation: Because the explorative and physiological desires of Psi agents are not sufficient to make them interested in each other, they have a need for positive social signals, so-called ‘legitimacy signals’. With a legitimacy signal (or l-signal for short), agents may signal each other “okayness” with regard to the social group. Legitimacy signals are an expression of the sender’s belief in the social acceptability of the receiver. The need for l-signals requires frequent replenishment and thus amounts to an urge to affiliate with other agents. Agents can send l-signals to reward each other for cooperation. Anti-l-signals are the counterpart of l-signals. An anti-l-signal (which basically amounts to a frown) ‘punishes’ an agent by depleting its legitimacy reservoir. Agents may also be extended by internal l-signals, which measure the conformance to internalized social norms.

Supplicative signals are ‘pleas for help’, i.e. promises to reward a cooperative action with l-signals or likewise cooperation in the future. Supplicative signals work like a specific kind of anti-l-signal, because they increase the legitimacy urge of the addressee when not answered. At the same time, they lead to (external and internal) l-signals when help is given. They can thus be used to trigger altruistic behavior.

The need for l-signals should adapt to the environment of the agent, and may also vary strongly between agents, thus creating a wide range of types of social behavior. By making the receivable amount of l-signals dependent on the priming towards particular other agents, Psi agents might be induced to display ‘jealous’ behavior. Social needs can be extended by romantic and sexual needs.
However, there is no explicit need for social power, because the model already captures social power as a specific need for competence: the competence to satisfy social needs.
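As a minimal illustration of these affiliation mechanics, consider the following sketch. The class name `AffiliationNeed`, the decay constant and the signal strengths are assumptions made for the example; only the qualitative behavior follows the text: the reservoir depletes over time, l-signals replenish it, and anti-l-signals (or unanswered supplicative signals) deplete it.

```python
class AffiliationNeed:
    """Legitimacy reservoir of a Psi agent: it depletes over time,
    so the affiliation urge needs frequent replenishment."""

    def __init__(self, capacity=1.0, decay=0.05):
        self.capacity = capacity
        self.decay = decay          # per-step passive depletion (assumed)
        self.reservoir = capacity

    @property
    def urge(self):
        # the affiliation urge grows with the legitimacy deficit
        return self.capacity - self.reservoir

    def step(self):
        """Passive depletion: without social signals, the urge rises."""
        self.reservoir = max(0.0, self.reservoir - self.decay)

    def receive_l_signal(self, strength=0.2):
        """An l-signal ('okayness') replenishes the reservoir."""
        self.reservoir = min(self.capacity, self.reservoir + strength)

    def receive_anti_l_signal(self, strength=0.2):
        """An anti-l-signal (a 'frown') depletes the reservoir; an
        unanswered supplicative signal has the same effect."""
        self.reservoir = max(0.0, self.reservoir - strength)
```

Rewarding cooperation then simply means calling `receive_l_signal` on the partner's affiliation need, while a frown calls `receive_anti_l_signal`.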

Even though the affiliation model is still fragmentary, we found that it provides a good handle on the agents during experiments. The experimenter can attempt to induce the agents to act simply by the prospect of a smile or frown, which is sometimes a good alternative to a more tangible reward or punishment.

4.2 Behavior Control and Action Selection

All goal-directed actions have their source in a motive that is connected to an urge, which in turn signals a physiological, cognitive or social need. Actions that are not directed immediately onto a goal are either carried out to serve an exploratory goal or to avoid an aversive event. When a positive goal is reached (a need is partially or completely fulfilled), a positive reinforcement signal is created, which is used for learning (by strengthening the associations of the goal with the actions and situations that have led to the fulfillment). In those cases in which a sub-goal does not yet lead to a consummative act, reaching it may still create a reinforcement via the competence it signals to the agent. After finally reaching a consumptive goal, the intermediate goals may receive further reinforcement by a retrogradient (backwards in time along the protocol) strengthening of the associations along the chain of events that has led to the target situation.

Appetence and Aversion: For an urge to have an effect on the behavior of the agent, it does not matter whether it really has an effect on its (physical or simulated) body, but only that it is represented in the proper way within the cognitive system. Whenever the agent performs an action or is subjected to an event that reduces one of its urges, a reinforcement signal with a strength that is proportional to this reduction is created by the agent’s “pleasure center”. The naming of the “pleasure” and “displeasure centers” does not necessarily imply that the agent experiences something like pleasure or displeasure.
As in humans, their purpose lies in signaling the reflexive evaluation of positive or harmful effects with respect to physiological, cognitive or social needs. (Experiencing these signals would require an observation of these signals at certain levels of the perceptual system of the agent.) Reinforcement signals create or strengthen an association between the urge indicator and the action/event. Whenever the respective urge of the agent becomes active in the future, it may activate the now connected behavior/episodic schema. If the agent pursues the chains of actions/events leading to the situation alleviating the urge, we are witnessing goal-oriented behavior.

Conversely, during events that increase a need (for instance by damaging the agent or frustrating one of its cognitive or social urges), the “displeasure center” creates a signal that causes an inverse link from the harmful situation to the urge indicator. When in future deliberation attempts (for instance, by extrapolating into the expectation horizon) the respective situation gets activated, it also activates the urge indicator and thus signals an aversion. An aversion signal is a predictor for aversive situations, and such aversive situations are avoided if possible.

Motives: A motive consists of an urge (that is, the value of an indicator for a need) and a goal that has been associated with this indicator. The goal is a situation schema characterized by an action or event that has successfully reduced the urge in the past, and the goal situation tends to be the end element of a behavior program. The situations leading to the goal situation (that is, earlier stages in the connected occurrence schema or behavior program) might become intermediate goals. To turn this sequence into an instance that may initiate a behavior, orient it towards a goal and keep it active, we need to add a connection to the pleasure/displeasure system. The result is a motivator and consists of:

- a need sensor, connected to the pleasure/displeasure system in such a way that an increase in the deviation of the need from the target value creates a displeasure signal, and a decrease results in a pleasure signal. This reinforcement signal should be proportional to the strength of the increment or decrement;

- optionally, a feedback loop that attempts to normalize the need automatically;

- an urge indicator that becomes active if there is no way of automatically adjusting the need to its target value. The urge should be proportional to the need;

- an associator (part of the pleasure/displeasure system) that creates a connection between the urge indicator and an episodic schema/behavior program, specifically to the aversive or appetitive goal situation. The strength of the connection should be proportional to the pleasure/displeasure signal. Note that usually, an urge gets connected with more than one goal over time, since there are often many ways to satisfy or increase a particular urge.

Motive selection: If a motive becomes active, it is not always selected immediately; sometimes it will not be selected at all, because it conflicts with a stronger motive or the chances of success when pursuing the motive are too low. In the terminology of Belief-Desire-Intention agents [15], motives amount to desires; selected motives give rise to goals and thus are intentions. Active motives can be selected at any time: for instance, an agent seeking fuel could satisfy a weaker urge for water on the way, just because the water is readily available. Thus, the active motives, together with their related goals, behavior programs and so on, are called intention memory. The selection of a motive takes place according to a value × success probability principle, where the value of a motive is given by its importance (indicated by the respective urge), and the success probability depends on the competence of the agent to reach the particular goal.

In some cases, the agent may not know a way to reach a goal (i.e., it has no epistemic competence related to that goal). If the agent performs well in general, that is, it has a high general competence, it should still consider selecting the related motive. The chance to reach a particular goal might be estimated using the sum of the general competence and the epistemic competence for that goal. Thus, the motive strength to satisfy a need d is calculated as urge_d × (generalCompetence + competence_d), i.e. the product of the strength of the urge and the combined competence.

If the window of opportunity is limited, the motive strength should be enhanced with a third factor: urgency. The rationale behind urgency lies in the aversive goal created by the anticipated failure of meeting the deadline. The urgency of a motive related to a time limit could be estimated by dividing the time needed by the time left, and the motive strength for a motive with a deadline can be calculated as (urge_d + urgency_d) × (generalCompetence + competence_d), i.e. as the combined urgency multiplied with the combined competence. The time the agent has left to reach the goal can be inferred from episodic schemas stored in the agent’s current expectation horizon, while the time necessary to finish the goal-oriented behavior can be determined from the behavior program. (Obviously, these estimates require a detailed anticipation of things to come, which may be difficult to obtain.)

At each time, only one motive is selected for the execution of its related behavior program. There is a continuous competition between motives, to reflect changes in the environment and the internal states of the agent. To avoid oscillations between motives, the switching between motives may be taxed with an additional cost: the selection threshold, a bonus that is added to the strength of the currently selected motive. The value of the selection threshold can be varied according to circumstances, rendering the agent ‘opportunistic’ or ‘stubborn’.
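The selection rule can be sketched as follows. This is a minimal illustration under assumptions: the function names, the dictionary representation of motives and all numeric values are mine, and general competence is passed in as a plain parameter rather than computed as the floating average described in Sect. 4.1.

```python
def motive_strength(urge, competence, general_competence,
                    time_needed=None, time_left=None):
    """Motive strength per the Psi selection rule:
    (urge + urgency) * (generalCompetence + specific competence),
    where urgency = time needed / time left if a deadline exists."""
    urgency = 0.0
    if time_needed is not None and time_left is not None and time_left > 0:
        urgency = time_needed / time_left
    return (urge + urgency) * (general_competence + competence)


def select_motive(motives, current=None, selection_threshold=0.2,
                  general_competence=0.5):
    """Pick the strongest motive; the currently selected motive gets a
    bonus (the selection threshold) to avoid oscillations."""
    best, best_strength = None, float("-inf")
    for m in motives:
        s = motive_strength(m["urge"], m["competence"], general_competence,
                            m.get("time_needed"), m.get("time_left"))
        if m is current:
            s += selection_threshold   # 'stubbornness' bonus
        if s > best_strength:
            best, best_strength = m, s
    return best


fuel = {"name": "fuel", "urge": 0.8, "competence": 0.4}
water = {"name": "water", "urge": 0.5, "competence": 0.9}
# fuel: 0.8 * (0.5 + 0.4) = 0.72 beats water: 0.5 * (0.5 + 0.9) = 0.70
chosen = select_motive([fuel, water])
```

With `current=water`, the selection threshold bonus (0.70 + 0.2 = 0.90) keeps the water motive selected, illustrating how a larger threshold makes the agent more ‘stubborn’.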

Intentions: As explained above, intentions amount to selected motives, combined with a way to achieve the desired outcome. Within the Psi theory, an intention refers to the set of representations that initiates, controls and structures the execution of an action. (It is not required that an intention be conscious, that it is directed onto an object, etc.; here, intentions are simply those things that make actions happen.) Intentions may form intention hierarchies, i.e. to reach a goal it might be necessary to establish sub-goals and pursue these. An intention can be seen as a set comprising a goal state, an execution state, an intention history (the protocol of operations that took place in its context), a plan, the urge associated with the goal state (which delivers the relevance), the estimated specific competence to fulfill the intention (which is related to the probability of reaching the goal) and the time horizon during which the intention must be realized.

The dynamics of modulation: In the course of action selection and execution, Psi agents are modulated by several parameters. The agent’s activation or arousal (which resembles the ascending reticular activation system in humans) determines the action-readiness of the agent. It is proportional to the current strength of the urge signals. The perceptual and memory processes are influenced by the agent’s resolution level, which is inversely related to the activation. A high resolution level increases the number of features examined during perception and memory retrieval, at the cost of processing speed, but with less resulting ambiguity. The selection threshold determines how easily the agent switches between conflicting intentions, and the sampling rate or securing threshold controls the frequency of reflective and orientation behaviors.
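The modulator dynamics can be sketched as a simple update rule. The linear forms and constants below are illustrative assumptions; the Psi theory only specifies the qualitative relations (activation proportional to the urge signals, resolution level inversely related to activation).

```python
def update_modulators(urges, min_resolution=0.1):
    """Qualitative Psi modulator relations, sketched linearly:
    activation follows the combined urge strength, and the resolution
    level is inversely related to the activation."""
    activation = min(1.0, sum(urges) / max(1, len(urges)))
    resolution = max(min_resolution, 1.0 - activation)
    return {"activation": activation, "resolution": resolution}


# A relaxed agent examines many features during perception and recall;
# a highly aroused one trades resolution for speed and action-readiness.
calm = update_modulators([0.1, 0.2])
aroused = update_modulators([0.9, 0.8])
```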
The values of the modulators of an agent at a given time, together with the status of the urges, define a cognitive configuration, a setup that may be interpreted as an emergent emotional state.

5 Summary

The Psi theory defines a possible solution for a drive-based, poly-thematic motivational system. It does not only explain how physiological needs can be pursued, but also addresses the establishment of cognitive and social goals. Its straightforward integration of needs allows adapting it quickly to different environments and types of agents; a version of the model has been successfully evaluated against human performance in a problem-solving game [9].

The existing implementation of the Psi theory in the MicroPsi architecture [14] still restricts social signals to simple l-signals and anti-l-signals, and it does not cover a need for improving internal representations (‘aesthetics’). Still, it may act as a qualitative demonstrator of an already quite broad computational model of motivation.

The suggested motivational model can be implemented in a variety of different ways, and we are currently working on transferring it to other cognitive architectures to obtain further scenarios and test-beds for criticizing and improving it.

References