Wednesday, December 14, 2011

Learning theory suggests that animals attend to pertinent environmental cues when reward contingencies unexpectedly change so that learning can occur. We have previously shown that activity in the basolateral nucleus of the amygdala (ABL) responds to unexpected changes in reward value, consistent with unsigned prediction error signals theorized by Pearce and Hall. However, changes in activity were present only at the time of unexpected reward delivery, not during the time when the animal needed to attend to conditioned stimuli that would come to predict the reward. This suggested that a different brain area must be signaling the need for attention necessary for learning. One likely candidate to fulfill this role is the anterior cingulate cortex (ACC). To test this hypothesis, we recorded from single neurons in ACC as rats performed the same behavioral task that we have used to dissociate signed from unsigned prediction errors in dopamine and ABL neurons. In this task, rats chose between two fluid wells that produced varying magnitudes of and delays to reward. Consistent with previous work, we found that ACC detected errors of commission and reward prediction errors. We also found that activity during cue sampling encoded reward size, but not expected delay to reward. Finally, activity in ACC was elevated during trials in which attention was increased following unexpected upshifts and downshifts in value. We conclude that ACC not only signals errors in reward prediction, as previously reported, but also signals the need for enhanced neural resources during learning on trials subsequent to those errors.
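The Pearce–Hall mechanism invoked here can be sketched in a few lines: the unsigned prediction error |r − V| drives an associability (attention) term, which then gates how fast value is updated on subsequent trials. A minimal illustrative sketch in Python (the function name and parameter values are assumptions for illustration, not taken from the study):

```python
def pearce_hall_update(V, alpha, r, gamma=0.3, kappa=0.5):
    """One trial of Pearce-Hall learning.

    V     : current value estimate for the cue
    alpha : associability (attention paid to the cue)
    r     : reward received on this trial
    gamma : rate at which alpha tracks the unsigned error (assumed)
    kappa : overall learning-rate scaling (assumed)
    """
    unsigned_error = abs(r - V)
    # Associability moves toward the most recent unsigned error...
    alpha_new = (1 - gamma) * alpha + gamma * unsigned_error
    # ...and the current associability scales the value update.
    V_new = V + kappa * alpha * (r - V)
    return V_new, alpha_new

# After a long run of stable reward, attention has decayed; an
# unexpected upshift in reward restores it, so learning is faster
# on the trials that follow -- the signal attributed to ACC above.
V, alpha = 0.0, 0.1
for _ in range(20):
    V, alpha = pearce_hall_update(V, alpha, 1.0)   # stable reward
alpha_before = alpha
V, alpha = pearce_hall_update(V, alpha, 2.0)       # surprising upshift
assert alpha > alpha_before
```

Note that the associability term rises after both upshifts and downshifts, matching the valence-independent elevation of ACC activity reported above.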

Learning can be motivated by unanticipated success or unexpected failure. The former encourages us to repeat an action or activity, whereas the latter leads us to find an alternative strategy. Understanding the neural representation of these unexpected events is therefore critical to elucidate learning-related circuits. We examined the activity of neurons in the lateral prefrontal cortex (PFC) and caudate nucleus of monkeys as they performed a trial-and-error learning task. Unexpected outcomes were widely represented in both structures, and neurons driven by unexpectedly negative outcomes were as frequent as those activated by unexpectedly positive outcomes. Moreover, both positive and negative reward prediction errors (RPEs) were represented primarily by increases in firing rate, unlike the manner in which dopamine neurons have been observed to reflect these values. Interestingly, positive RPEs tended to appear with shorter latency than negative RPEs, perhaps reflecting the mechanism of their generation. Last, in the PFC but not the caudate, trial-by-trial variations in outcome-related activity were linked to the animals' subsequent behavioral decisions. More broadly, the robustness of RPE signaling by these neurons suggests that actor-critic models of reinforcement learning in which the PFC and particularly the caudate are considered primarily to be “actors” rather than “critics,” should be reconsidered to include a prominent evaluative role for these structures.
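The actor/critic division of labor at issue can be made concrete with a toy implementation: the critic computes the reward prediction error and learns state value, while the same RPE trains the actor's action preferences. A sketch under assumed parameters (purely illustrative of the standard framework, not a model of the recording study):

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into choice probabilities."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
prefs = [0.0, 0.0]   # actor: preferences over two actions
V = 0.0              # critic: value estimate for the (single) state

for _ in range(2000):
    p = softmax(prefs)
    action = 0 if random.random() < p[0] else 1
    reward = 1.0 if action == 0 else 0.0   # action 0 is the rewarded one
    rpe = reward - V            # critic evaluates: reward prediction error
    V += 0.1 * rpe              # critic learns the expected reward
    prefs[action] += 0.1 * rpe  # the same RPE trains the actor's policy

assert softmax(prefs)[0] > 0.9  # actor comes to prefer the rewarded action
```

The abstract's point is that under this scheme the "actor" structures need only receive an evaluative signal, whereas the robust outcome coding observed in PFC and caudate suggests those structures compute evaluations themselves.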

The computational processes by which attention improves behavioral performance were characterized by measuring visual cortical activity with functional magnetic resonance imaging as humans performed a contrast-discrimination task with focal and distributed attention. Focal attention yielded robust improvements in behavioral performance accompanied by increases in cortical responses. Quantitative analysis revealed that if performance were limited only by the sensitivity of the measured sensory signals, the improvements in behavioral performance would have corresponded to an unrealistically large reduction in response variability. Instead, behavioral performance was well characterized by a pooling and selection process for which the largest sensory responses, those most strongly modulated by attention, dominated the perceptual decision. This characterization predicts that high-contrast distracters that evoke large responses should negatively impact behavioral performance. We tested and confirmed this prediction. We conclude that attention enhanced behavioral performance predominantly by enabling efficient selection of the behaviorally relevant sensory signals.

Midbrain dopamine (DA) neurons respond to sensory stimuli associated with future rewards. When reward is delivered probabilistically, DA neurons reflect this uncertainty by increasing their firing rates in a period between the sensory cue and reward delivery time. Probability of reward, however, has been externally conveyed by visual cues, and it is not known whether DA neurons would signal uncertainty arising internally. Here we show that DA neurons code the uncertainty associated with a perceptual judgment about the presence or absence of a vibrotactile stimulus. We observed that uncertainty modulates the activity elicited by a go cue instructing monkey subjects to communicate their decisions. That is, the same go cue generates different DA responses depending on the uncertainty level of a judgment made a few seconds before the go instruction. Easily detected suprathreshold stimuli elicit small DA responses, indicating that future reward will not be a surprising event. In contrast, the absence of a sensory stimulus generates large DA responses associated with uncertainty: was the stimulus truly absent, or did a low-amplitude vibration go undetected? In addition, the responses of DA neurons to the stimulus itself increase with vibration amplitude, but only when monkeys correctly detect its presence. This finding suggests that DA activity is not related to actual intensity but rather to perceived intensity. Therefore, in addition to their well-known role in reward prediction, DA neurons code subjective sensory experience and uncertainty arising internally from perceptual decisions.

Standard economic and evolutionary models assume that humans are fundamentally selfish. On this view, any acts of prosociality—such as cooperation, giving, and other forms of altruism—result from covert attempts to avoid social injunctions against selfishness. However, even in the absence of social pressure, individuals routinely forego personal gain to share resources with others. Such anomalous giving cannot be accounted for by standard models of social behavior. Recent observations have suggested that, instead, prosocial behavior may reflect an intrinsic value placed on social ideals such as equity and charity. Here, we show that, consistent with this alternative account, making equitable interpersonal decisions engaged neural structures involved in computing subjective value, even when doing so required foregoing material resources. By contrast, making inequitable decisions produced activity in the anterior insula, a region linked to the experience of subjective disutility. Moreover, inequity-related insula response predicted individuals’ unwillingness to make inequitable choices. Together, these data suggest that prosocial behavior is not simply a response to external pressure, but instead represents an intrinsic, and intrinsically social, class of reward.

Decisions are most effective after collecting sufficient evidence to accurately predict rewarding outcomes. We investigated whether human participants optimally seek evidence and we characterized the brain areas associated with their evidence seeking. Participants viewed sequences of bead colors drawn from hidden urns and attempted to infer the majority bead color in each urn. When viewing each bead color, participants chose either to seek more evidence about the urn by drawing another bead (draw choices) or to infer the urn contents (urn choices). We then compared their evidence seeking against that predicted by a Bayesian ideal observer model. By this standard, participants sampled less evidence than optimal. Also, when faced with urns that had bead color splits closer to chance (60/40 versus 80/20) or potential monetary losses, participants increased their evidence seeking, but the increase was smaller than the ideal observer model predicted. Functional magnetic resonance imaging showed that urn choices evoked larger hemodynamic responses than draw choices in the insula, striatum, anterior cingulate, and parietal cortex. These parietal responses were greater for participants who sought more evidence on average and for participants who increased their evidence seeking more when draws came from 60/40 urns. The parietal cortex and insula were associated with potential monetary loss. Insula responses also showed modulation with estimates of the expected gains of urn choices. Our findings show that participants sought less evidence than predicted by an ideal observer model and that their evidence-seeking behavior may relate to responses in the insula and parietal cortex.
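The ideal-observer benchmark for this beads task can be sketched directly: with equal priors over the two urns, the posterior after a sequence of draws is just a normalized product of bead likelihoods. A minimal sketch (the 60/40 and 80/20 splits come from the task; the function itself is an illustrative reconstruction, not the authors' code):

```python
def posterior_majority_blue(draws, split=0.8):
    """P(urn is majority-blue | sequence of draws).

    draws : sequence of bead colors, 'b' (blue) or 'o' (orange)
    split : majority bead proportion in each urn (e.g., 0.8 for 80/20)
    """
    p_blue_urn = 1.0     # likelihood of the draws under each hypothesis
    p_orange_urn = 1.0
    for d in draws:
        p_blue_urn *= split if d == 'b' else (1 - split)
        p_orange_urn *= (1 - split) if d == 'b' else split
    # Equal priors on the two urns, so normalize the likelihoods.
    return p_blue_urn / (p_blue_urn + p_orange_urn)

# With an 80/20 urn, two blue beads already make majority-blue very
# likely; under a 60/40 split the same evidence is much weaker, so an
# ideal observer should keep drawing longer -- the pattern participants
# showed only in attenuated form.
assert posterior_majority_blue('bb', split=0.8) > posterior_majority_blue('bb', split=0.6)
```

An ideal observer would choose the urn once the posterior clears a threshold set by the payoff structure; the finding above is that participants stopped drawing before that point.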

Many psychiatric disorders are characterized by abnormal risky decision-making and dysregulated dopamine receptor expression. The current study was designed to determine how different dopamine receptor subtypes modulate risk-taking in young adult rats, using a “Risky Decision-making Task” that involves choices between small “safe” rewards and large “risky” rewards accompanied by adverse consequences. Rats showed considerable, stable individual differences in risk preference in the task, which were not related to multiple measures of reward motivation, anxiety, or pain sensitivity. Systemic activation of D2-like receptors robustly attenuated risk-taking, whereas drugs acting on D1-like receptors had no effect. Systemic amphetamine also reduced risk-taking, an effect which was attenuated by D2-like (but not D1-like) receptor blockade. Dopamine receptor mRNA expression was evaluated in a separate cohort of drug-naive rats characterized in the task. D1 mRNA expression in both nucleus accumbens shell and insular cortex was positively associated with risk-taking, while D2 mRNA expression in orbitofrontal and medial prefrontal cortex predicted risk preference in opposing nonlinear patterns. Additionally, lower levels of D2 mRNA in dorsal striatum were associated with greater risk-taking. These data strongly implicate dopamine signaling in prefrontal cortical-striatal circuitry in modulating decision-making processes involving integration of reward information with risks of adverse consequences.

Reinforcements and punishments facilitate adaptive behavior in diverse domains ranging from perception to social interactions. A conventional approach to understanding the corresponding neural substrates focuses on the basal ganglia and its dopaminergic projections. Here, we show that reinforcement and punishment signals are surprisingly ubiquitous in the gray matter of nearly every subdivision of the human brain. Humans played either matching-pennies or rock-paper-scissors games against computerized opponents while being scanned using fMRI. Multivoxel pattern analysis was used to decode previous choices and their outcomes, and to predict upcoming choices. Whereas choices were decodable from a confined set of brain structures, their outcomes were decodable from nearly all cortical and subcortical structures. In addition, signals related to both reinforcements and punishments were recovered reliably in many areas and displayed patterns not consistent with salience-based explanations. Thus, reinforcement and punishment might play global modulatory roles in the entire brain.

The orbitofrontal cortex (OFC) has been hypothesized to carry information regarding the value of expected rewards. Such information could be used for generating instructive error signals conveyed by dopamine neurons. Here the authors report that this is indeed the case. However, contrary to the simplest hypothesis, OFC lesions did not result in the loss of all value information. Instead, lesions caused the loss of value information derived from model-based representations.

Reward signals are widespread in the brain, but why? A study now identifies an important difference in the reward signals encoded by the neurons in the primate anterior cingulate and orbitofrontal cortices during decision making, suggesting that reward-related activity in these areas is shaped by different contextual factors.

Reward prediction error (RPE) signals are central to current models of reward-learning. Temporal difference (TD) learning models posit that these signals should be modulated by predictions not only of the magnitude but also of the timing of reward. Here we show that BOLD activity in the VTA conforms to such TD predictions: responses to unexpected rewards are modulated by a temporal hazard function and activity between a predictive stimulus and reward is depressed in proportion to predicted reward. By contrast, BOLD activity in ventral striatum (VS) does not reflect a TD RPE, but instead encodes a signal tied to the behaviorally relevant variable, here the timing but not the magnitude of reward. The results have important implications for dopaminergic models of cortico-striatal learning and suggest a modification of the conventional view that VS BOLD necessarily reflects inputs from dopaminergic VTA neurons signaling an RPE.
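The core TD predictions being tested can be reproduced with a toy tabular TD(0) model of a single cue–reward trial: after learning, value is elevated across the cue–reward interval and the prediction error at reward delivery is near zero. An illustrative sketch (all parameters are assumptions, not the authors' model fit):

```python
def td_learn(n_trials, n_steps=5, reward=1.0, lr=0.2, gamma=1.0):
    """Tabular TD(0) over a short cue -> ... -> reward trial.

    Returns the learned value V[t] of each within-trial time step.
    """
    V = [0.0] * (n_steps + 1)   # one extra terminal step after reward
    for _ in range(n_trials):
        for t in range(n_steps):
            r = reward if t == n_steps - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD prediction error
            V[t] += lr * delta
    return V

V = td_learn(200)
# Anticipatory value spans the cue-reward interval...
assert V[0] > 0.9
# ...so the error when the now-expected reward arrives is near zero.
assert abs(1.0 - V[4]) < 0.05
```

In this scheme an omitted or mistimed reward produces an error whose size depends on the hazard of reward at that moment, which is the temporal-hazard modulation the VTA BOLD signal showed.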

Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling.

Humans are noted for their capacity to override self-interest in favor of normatively valued goals. We examined the neural circuitry that is causally involved in normative, fairness-related decisions by generating a temporarily diminished capacity for costly normative behavior, a 'deviant' case, through non-invasive brain stimulation (repetitive transcranial magnetic stimulation) and compared normal subjects' functional magnetic resonance imaging signals with those of the deviant subjects. When fairness and economic self-interest were in conflict, normal subjects (who make costly normative decisions at a much higher frequency) displayed significantly higher activity in, and connectivity between, the right dorsolateral prefrontal cortex (DLPFC) and the posterior ventromedial prefrontal cortex (pVMPFC). In contrast, when there was no conflict between fairness and economic self-interest, both types of subjects displayed identical neural patterns and behaved identically. These findings suggest that a parsimonious prefrontal network, the activation of right DLPFC and pVMPFC, and the connectivity between them, facilitates subjects' willingness to incur the cost of normative decisions.

Thursday, September 15, 2011

Although the human amygdala and striatum have both been implicated in associative learning, only the striatum's contribution has been consistently computationally characterized. Using a reversal learning task, we found that amygdala blood oxygen level–dependent activity tracked associability as estimated by a computational model, and dissociated it from the striatal representation of reinforcement prediction error. These results extend the computational learning approach from striatum to amygdala, demonstrating their complementary roles in aversive learning.

Confidence is an essential ingredient of success in a wide range of domains ranging from job performance and mental health to sports, business and combat [1–4]. Some authors have suggested that not just confidence but overconfidence—believing you are better than you are in reality—is advantageous because it serves to increase ambition, morale, resolve, persistence or the credibility of bluffing, generating a self-fulfilling prophecy in which exaggerated confidence actually increases the probability of success [3–8]. However, overconfidence also leads to faulty assessments, unrealistic expectations and hazardous decisions, so it remains a puzzle how such a false belief could evolve or remain stable in a population of competing strategies that include accurate, unbiased beliefs. Here we present an evolutionary model showing that, counterintuitively, overconfidence maximizes individual fitness and populations tend to become overconfident, as long as benefits from contested resources are sufficiently large compared with the cost of competition. In contrast, unbiased strategies are only stable under limited conditions. The fact that overconfident populations are evolutionarily stable in a wide range of environments may help to explain why overconfidence remains prevalent today, even if it contributes to hubris, market bubbles, financial collapses, policy failures, disasters and costly wars [9–13].
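The model's central claim, that overconfidence pays when contested benefits are large relative to conflict costs, can be illustrated with a toy Monte-Carlo version of a claim-the-resource game. This is a loose, hawk–dove-style simplification with assumed parameter values, not the authors' actual model:

```python
import random

def expected_payoff(bias, opp_bias, benefit, cost, noise=1.0, n=20000, seed=1):
    """Average payoff of a focal individual with confidence `bias`
    competing against opponents with `opp_bias` (illustrative model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        mine, theirs = rng.gauss(0, 1), rng.gauss(0, 1)   # true abilities
        # Each side claims the resource if it perceives itself stronger;
        # perception of the opponent is noisy, self-perception is biased.
        i_claim = mine + bias > theirs + rng.gauss(0, noise)
        they_claim = theirs + opp_bias > mine + rng.gauss(0, noise)
        if i_claim and they_claim:          # fight: the stronger side wins
            total += (benefit if mine > theirs else 0.0) - cost
        elif i_claim:                       # uncontested claim
            total += benefit
    return total / n

# When the contested benefit is large relative to the cost of conflict,
# an overconfident strategy (bias > 0) outperforms unbiased self-assessment
# against the same overconfident population.
p_over = expected_payoff(1.0, 1.0, benefit=4.0, cost=1.0)
p_unbiased = expected_payoff(0.0, 1.0, benefit=4.0, cost=1.0)
assert p_over > p_unbiased
```

Intuitively, bias converts some forgone claims into either uncontested gains or fights whose expected value is positive whenever the benefit/cost ratio is high enough, which is the condition stated in the abstract.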

Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based on work in nonhuman primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging. Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis—social preferences, response latencies, and modeling neural responses—are consistent with reinforcement learning theory and nonhuman primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one's peers in altering subsequent behavior.

There is a growing consensus in behavioral neuroscience that the brain makes simple choices by first assigning a value to the options under consideration and then comparing them. Two important open questions are whether the brain encodes absolute or relative value signals, and what role attention might play in these computations. We investigated these questions using a human fMRI experiment with a binary choice task in which the fixations to both stimuli were exogenously manipulated to control for the role of visual attention in the valuation computation. We found that the ventromedial prefrontal cortex and the ventral striatum encoded fixation-dependent relative value signals: activity in these areas correlated with the difference in value between the attended and the unattended items. These attention-modulated relative value signals might serve as the input of a comparator system that is used to make a choice.

The ability to learn from the consequences of actions—no matter when those consequences take place—is central to adaptive behavior. Despite major advances in understanding how immediate feedback drives learning, it remains unknown precisely how the brain learns from delayed feedback. Here, we present converging evidence from neuropsychology and neuroimaging for distinct roles for the striatum and the hippocampus in learning, depending on whether feedback is immediate or delayed. We show that individuals with striatal dysfunction due to Parkinson's disease are impaired at learning when feedback is immediate, but not when feedback is delayed by a few seconds. Using functional imaging (fMRI) combined with computational model-derived analyses, we further demonstrate that healthy individuals show activation in the striatum during learning from immediate feedback and activation in the hippocampus during learning from delayed feedback. Additionally, later episodic memory for delayed feedback events was enhanced, suggesting that engaging distinct neural systems during learning had consequences for the representation of what was learned. Together, these findings provide direct evidence from humans that striatal systems are necessary for learning from immediate feedback and that delaying feedback leads to a shift in learning from the striatum to the hippocampus. The results provide a link between learning impairments in Parkinson's disease and evidence from single-unit recordings demonstrating that the timing of reinforcement modulates activity of midbrain dopamine neurons. Collectively, these findings indicate that relatively small changes in the circumstances under which information is learned can shift learning from one brain system to another.

Neuroimaging and neuropsychological studies implicate both frontal and temporoparietal cortices when humans reason about the mental states of others. Here, we report an event-related potentials study of the time course of one such “theory of mind” ability: visual perspective taking. The findings suggest that posterior cortex, perhaps the temporoparietal cortex, calculates and represents the perspective of self versus other, and then, later, the right frontal cortex resolves conflict between perspectives during response selection.

We investigated how rapidly the reward-predicting properties of visual cues are signaled in the human brain and the extent to which these reward prediction signals are contextually modifiable. In a magnetoencephalography study, we presented participants with fractal visual cues that predicted monetary rewards with different probabilities. These cues were presented in the temporal context of a preceding novel or familiar image of a natural scene. Starting at ∼100 ms after cue onset, reward probability was signaled in the event-related fields (ERFs) over temporo-occipital sensors and in the power of theta (5–8 Hz) and beta (20–30 Hz) band oscillations over frontal sensors. While theta power decreased with reward probability, beta power increased with it. Thus, in humans anticipatory reward responses are generated rapidly, within 100 ms after the onset of reward-predicting cues, which is similar to the timing established in non-human primates. Contextual novelty enhanced the reward anticipation responses in both ERFs and in beta oscillations starting at ∼100 ms after cue onset. This very early context effect is compatible with a physiological model that invokes the mediation of a hippocampal-VTA loop, according to which novelty modulates neural response properties within the reward circuitry. We conclude that the neural processing of cues that predict future rewards is temporally highly efficient and contextually modifiable.

Wednesday, August 31, 2011

Humans and monkeys can learn to classify perceptual information in a statistically optimal fashion if the functional groupings remain stable over many hundreds of trials, but little is known about categorization when the environment changes rapidly. Here, we used a combination of computational modeling and functional neuroimaging to understand how humans classify visual stimuli drawn from categories whose mean and variance jumped unpredictably. Models based on optimal learning (Bayesian model) and a cognitive strategy (working memory model) both explained unique variance in choice, reaction time, and brain activity. However, the working memory model was the best predictor of performance in volatile environments, whereas statistically optimal performance emerged in periods of relative stability. Bayesian and working memory models predicted decision-related activity in distinct regions of the prefrontal cortex and midbrain. These findings suggest that perceptual category judgments, like value-guided choices, may be guided by multiple controllers.

The macaque ventrolateral prefrontal (VLPF) area 12r is thought to be involved in higher-order nonspatial information processing. We found that this area is connectionally heterogeneous, and the intermediate part is fully integrated in a cortical network involved in selecting and controlling object-oriented hand and mouth actions. Specifically, intermediate area 12r displayed dense connections with the caudal half of area 46v and orbitofrontal areas and relatively strong extraprefrontal connections involving the following: (1) the hand- and mouth-related ventral premotor area F5 and the anterior intraparietal (AIP) area, jointly involved in visuomotor transformations for grasping; (2) the SII sector that is connected to AIP and F5; (3) a sector of the inferotemporal area TEa/m, primarily corresponding to the sector densely connected to AIP; and (4) the insular and opercular frontal sectors, which are connected to AIP and F5. This connectivity pattern differed markedly from those of the caudal and rostral parts of area 12r. Caudal area 12r displayed dense connections with the caudal part of the VLPF, including oculomotor areas 8/FEF and 45B, relatively weak orbitofrontal connections and extraprefrontal connections limited to the inferotemporal cortex. Rostral area 12r displayed connections mostly with rostral prefrontal and orbitofrontal areas and relatively weaker connections with the fundus and the upper bank of the superior temporal sulcus. The present data suggest that the intermediate part of area 12r is involved in nonspatial information processing related to object properties and identity, for selecting and controlling goal-directed hand and mouth actions.

Spontaneous mimicry of other people's actions serves an important social function, enhancing affiliation and social interaction. This mimicry can be subtly modulated by different social contexts. We recently found behavioral evidence that direct eye gaze rapidly and specifically enhances mimicry of intransitive hand movements (Wang et al., 2011). Based on past findings linking medial prefrontal cortex (mPFC) to both eye contact and the control of mimicry, we hypothesized that mPFC might be the neural origin of this behavioral effect. The present study aimed to test this hypothesis. During functional magnetic resonance imaging (fMRI) scanning, 20 human participants performed a simple mimicry or no-mimicry task, as previously described (Wang et al., 2011), with direct gaze present on half of the trials. As predicted, fMRI results showed that performing the task activated mirror systems, while direct gaze and inhibition of the natural tendency to mimic both engaged mPFC. Critically, we found an interaction between mimicry and eye contact in mPFC, superior temporal sulcus (STS) and inferior frontal gyrus. We then used dynamic causal modeling to contrast 12 possible models of information processing in this network. Results supported a model in which eye contact controls mimicry by modulating the connection strength from mPFC to STS. This suggests that mPFC is the originator of the gaze–mimicry interaction and that it modulates sensory input to the mirror system. Thus, our results demonstrate how different components of the social brain work together to control mimicry online according to the social context.

Associative learning is a dynamic process that allows us to incorporate new knowledge within existing semantic networks. Even after years, a seemingly stable association can be altered by a single significant experience. Here, we investigate whether the acquisition of new associations affects the neural representation of stimuli and how the brain categorizes stimuli according to preexisting and emerging associations. Functional MRI data were collected during a differential fear conditioning procedure and at test (4–5 weeks later). Two pictures of faces and two pictures of houses served as stimuli. One of each pair coterminated with a shock in half of the trials (partial reinforcement). Applying Multivoxel Pattern Analysis (MVPA) in a trial-by-trial manner, we quantified changes in the similarity of neural representations of stimuli over the course of conditioning. Our findings show an increase in similarity of neural patterns throughout the cortex on consecutive trials of the reinforced stimuli. Furthermore, neural pattern similarity reveals a shift from original categories (faces/houses) toward new categories (reinforced/unreinforced) over the course of conditioning. This effect was differentially represented in the cortex, with visual areas primarily reflecting similarity of low-level stimulus properties (original categories) and frontal areas reflecting similarity of stimulus significance (new categories). Effects were not dependent on overall response amplitude and were still present during follow-up. We conclude that trial-by-trial MVPA is a useful tool for examining how the human brain encodes relevant associations and forms new associative networks.

Lateral habenula (LHb) neurons signal negative “reward-prediction errors” and inhibit midbrain dopamine (DA) neurons. Yet LHb neurons are largely glutamatergic, indicating that this inhibition may occur through an intermediate structure. Recent studies in rats have suggested a candidate for this role, the GABAergic rostromedial tegmental nucleus (RMTg), but this neural pathway has not yet been tested directly. We now show using electrophysiology and anatomic tracing that (1) the monkey has an inhibitory structure similar to the rat RMTg; (2) RMTg neurons receive excitatory input from the LHb, exhibit negative reward-prediction errors, and send axonal projections near DA soma; and (3) stimulating this structure inhibits DA neurons. Surprisingly, some RMTg neurons responded to reward cues earlier than the LHb, and carry “state-value” signals not found in DA neurons. Thus, our data suggest that the RMTg translates LHb reward-prediction errors (negative) into DA reward-prediction errors (positive), while transmitting additional motivational signals to non-DA networks.

Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
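The distinctive HRL prediction tested here reduces to two lines of arithmetic: the ordinary RPE compares the outcome against the overall task's expected value, while a separate pseudo-reward prediction error is computed against the subgoal's own value. A minimal illustrative sketch (the names and numbers are assumptions, chosen to show the dissociation):

```python
def prediction_errors(top_value, subgoal_value, top_reward, pseudo_reward):
    """Return (top-level RPE, subgoal-level pseudo-reward PE) for one event."""
    rpe = top_reward - top_value        # ordinary reward prediction error
    ppe = pseudo_reward - subgoal_value # pseudo-reward PE at the subgoal level
    return rpe, ppe

# Completing a subtask (say, picking up a package en route to a delivery)
# may leave the expected final reward unchanged, so the top-level RPE is
# zero -- yet HRL still predicts a subgoal-level error whenever the
# subtask outcome was not fully anticipated.
rpe, ppe = prediction_errors(top_value=1.0, subgoal_value=0.5,
                             top_reward=1.0, pseudo_reward=1.0)
assert rpe == 0.0 and ppe > 0.0
```

Events of exactly this kind, which change the prospects for a subgoal without changing the prospects for the overall goal, are what allow subgoal-related prediction errors to be measured separately from ordinary RPEs.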

The basolateral amygdala (BLA) has a crucial role in emotional learning irrespective of valence [1–5, 21–23]. The BLA projection to the nucleus accumbens (NAc) is thought to modulate cue-triggered motivated behaviours [4, 6, 7, 24, 25], but our understanding of the interaction between these two brain regions has been limited by the inability to manipulate neural-circuit elements of this pathway selectively during behaviour. To circumvent this limitation, we used in vivo optogenetic stimulation or inhibition of glutamatergic fibres from the BLA to the NAc, coupled with intracranial pharmacology and ex vivo electrophysiology. Here we show that optical stimulation of the pathway from the BLA to the NAc in mice reinforces behavioural responding to earn additional optical stimulation of these synaptic inputs. Optical stimulation of these glutamatergic fibres required intra-NAc dopamine D1-type receptor signalling, but not D2-type receptor signalling. Brief optical inhibition of fibres from the BLA to the NAc reduced cue-evoked intake of sucrose, demonstrating an important role of this specific pathway in controlling naturally occurring reward-related behaviour. Moreover, although optical stimulation of glutamatergic fibres from the medial prefrontal cortex to the NAc also elicited reliable excitatory synaptic responses, optical self-stimulation behaviour was not observed by activation of this pathway. These data indicate that whereas the BLA is important for processing both positive and negative affect, the glutamatergic pathway from the BLA to the NAc, in conjunction with dopamine signalling in the NAc, promotes motivated behavioural responding. Thus, optogenetic manipulation of anatomically distinct synaptic inputs to the NAc reveals functionally distinct properties of these inputs in controlling reward-seeking behaviours.

Wednesday, July 20, 2011

The macaque orbital prefrontal cortex (PFo) has been implicated in a wide range of reward-guided behaviors essential for efficient foraging. The PFo, however, is not a homogeneous structure. Two major subregions, distinct by their cytoarchitecture and connections to other brain structures, compose the PFo. One subregion encompasses Walker's areas 11 and 13 and the other centers on Walker's area 14. Although it has been suggested that these subregions play dissociable roles in reward-guided behavior, direct neuropsychological evidence for this hypothesis is limited. To explore the independent contributions of PFo subregions to behavior, we studied rhesus monkeys (Macaca mulatta) with restricted excitotoxic lesions targeting either Walker's areas 11/13 or area 14. The performance of these two groups was compared to that of a group of unoperated controls on a series of reward-based tasks that has been shown to be sensitive to lesions of the PFo as a whole (Walker's areas 11, 13, and 14). Lesions of areas 11/13, but not area 14, disrupted the rapid updating of object value during selective satiation. In contrast, lesions targeting area 14, but not areas 11/13, impaired the ability of monkeys to learn to stop responding to a previously rewarded object. Somewhat surprisingly, neither lesion disrupted performance on a serial object reversal learning task, although aspiration lesions of the entire PFo produce severe deficits on this task. Our data indicate that anatomically defined subregions within macaque PFo make dissociable contributions to reward-guided behavior.

Motivation improves the efficiency of intentional behavior, but how this performance modulation is instantiated in the human brain remains unclear. We used a reward-cued antisaccade paradigm to investigate how motivational goals (the expectation of a reward for good performance) modulate patterns of neural activation and functional connectivity to improve preparation for antisaccade performance. Behaviorally, subjects performed better (faster and more accurate antisaccades) when they knew they would be rewarded for good performance. Reward anticipation was associated with increased activation in the ventral and dorsal striatum, and cortical oculomotor regions. Functional connectivity between the caudate nucleus and cortical oculomotor control structures predicted individual differences in the behavioral benefit of reward anticipation. We conclude that although both dorsal and ventral striatal circuitry are involved in the anticipation of reward, only the dorsal striatum and its connected cortical network is involved in the direct modulation of oculomotor behavior by motivational incentive.

The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
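Divisive normalization has a simple closed form. The sketch below uses the standard single-semisaturation version; the parameter value and variable names are illustrative, not the paper's fitted model:

```python
def normalized_value(values, i, sigma=1.0):
    """Divisively normalized value of option i: its raw value divided by a
    semisaturation constant plus the summed value of all available options."""
    return values[i] / (sigma + sum(values))

# Adding a high-value alternative (even outside the response field)
# suppresses the represented value of the target, although its raw
# value is unchanged -- the context dependence described in the abstract:
alone = normalized_value([2.0], 0)            # 2 / (1 + 2)
with_rival = normalized_value([2.0, 4.0], 0)  # 2 / (1 + 6)
```

The same divisive form describes extra-classical receptive field effects in visual cortex, which is the analogy the authors draw.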

How does the brain translate information signaling potential rewards into motivation to get them? Motivation to obtain reward is thought to depend on the midbrain [particularly the ventral tegmental area (VTA)], the nucleus accumbens (NAcc), and the dorsolateral prefrontal cortex (dlPFC), but it is not clear how the interactions among these regions relate to reward-motivated behavior. To study the influence of motivation on these reward-responsive regions and on their interactions, we used dynamic causal modeling to analyze functional magnetic resonance imaging (fMRI) data from humans performing a simple task designed to isolate reward anticipation. The use of fMRI permitted the simultaneous measurement of multiple brain regions while human participants anticipated and prepared for opportunities to obtain reward, thus allowing characterization of how information about reward changes physiology underlying motivational drive. Furthermore, we modeled the impact of external reward cues on causal relationships within this network, thus elaborating a link between physiology, connectivity, and motivation. Specifically, our results indicated that dlPFC was the exclusive entry point of information about reward in this network, and that anticipated reward availability caused VTA activation only via its effect on the dlPFC. Anticipated reward thus increased dlPFC activation directly, whereas it influenced VTA and NAcc only indirectly, by enhancing intrinsically weak or inactive pathways from the dlPFC. Our findings of a directional prefrontal influence on dopaminergic regions during reward anticipation suggest a model in which the dlPFC integrates and transmits representations of reward to the mesolimbic and mesocortical dopamine systems, thereby initiating motivated behavior.

Tuesday, July 12, 2011

We consider the mechanisms that enable decisions to be postponed for a period after the evidence has been provided. Using an information theoretic approach, we show that information about the forthcoming action becomes available from the activity of neurons in the medial premotor cortex in a sequential decision-making task after the second stimulus is applied, providing the information for a decision about whether the first or second stimulus is higher in vibrotactile frequency. The information then decays in a 3-s delay period in which the neuronal activity declines before the behavioral response can be made. The information then increases again when the behavioral response is required. We model this neuronal activity using an attractor decision-making network in which information reflecting the decision is maintained at a low level during the delay period, and is then selectively restored by a nonspecific input when the response is required. One mechanism for the short-term memory is synaptic facilitation, which can implement a mechanism for postponed decisions that can be correct even when there is little neuronal firing during the delay period before the postponed decision. Another mechanism is graded firing rates by different neurons in the delay period, with restoration by the nonspecific input of the low-rate activity from the higher-rate neurons still firing in the delay period. These mechanisms can account for the decision making and for the memory of the decision before a response can be made, which are evident in the activity of neurons in the medial premotor cortex.

Understanding cooperation and punishment in small-scale societies is crucial for explaining the origins of human cooperation. We studied warfare among the Turkana, a politically uncentralized, egalitarian, nomadic pastoral society in East Africa. Based on a representative sample of 88 recent raids, we show that the Turkana sustain costly cooperation in combat at a remarkably large scale, at least in part, through punishment of free-riders. Raiding parties comprised several hundred warriors and participants are not kin or day-to-day interactants. Warriors incur substantial risk of death and produce collective benefits. Cowardice and desertions occur, and are punished by community-imposed sanctions, including collective corporal punishment and fines. Furthermore, Turkana norms governing warfare benefit the ethnolinguistic group, a population of a half-million people, at the expense of smaller social groupings. These results challenge current views that punishment is unimportant in small-scale societies and that human cooperation evolved in small groups of kin and familiar individuals. Instead, these results suggest that cooperation at the larger scale of ethnolinguistic units enforced by third-party sanctions could have a deep evolutionary history in the human species.

Thursday, June 30, 2011

From group hunting to global warming, how to deal with collective action may be formulated in terms of a public goods game of cooperation. In most cases, contributions depend on the risk of future losses. Here, we introduce an evolutionary dynamics approach to a broad class of cooperation problems in which attempting to minimize future losses turns the risk of failure into a central issue in individual decisions. We find that decisions within small groups under high risk and stringent requirements to success significantly raise the chances of coordinating actions and escaping the tragedy of the commons. We also offer insights on the scale at which public goods problems of cooperation are best solved. Instead of large-scale endeavors involving most of the population, which as we argue, may be counterproductive to achieve cooperation, the joint combination of local agreements within groups that are small compared with the population at risk is prone to significantly raise the probability of success. In addition, our model predicts that, if one takes into consideration that groups of different sizes are interwoven in complex networks of contacts, the chances for global coordination in an overall cooperating state are further enhanced.
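The risk structure of such a collective-risk dilemma can be written as a simple expected payoff. The threshold, risk, endowment, and cost values below are illustrative assumptions, not the paper's parameters:

```python
def payoff(is_cooperator, n_cooperators, M=3, risk=0.9, endowment=1.0, cost=0.1):
    """Expected payoff in a collective-risk public goods game (sketch):
    cooperators pay `cost`; if fewer than M group members cooperate,
    everyone loses their remaining endowment with probability `risk`."""
    kept = endowment - (cost if is_cooperator else 0.0)
    if n_cooperators >= M:
        return kept
    return kept * (1.0 - risk)

# Under high risk, defecting in a failing group pays far less than
# cooperating in a group that reaches the threshold:
defector_failed = payoff(False, n_cooperators=2)   # 1.0 * 0.1 = 0.1
cooperator_met = payoff(True, n_cooperators=3)     # 0.9
```

This is why high risk and stringent success requirements make coordination within small groups individually worthwhile in the model.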

In numerous high-profile studies, researchers have recently begun to integrate computational models into the analysis of data from experiments on reward learning and decision making (Platt and Glimcher, 1999; O'Doherty et al., 2003; Sugrue et al., 2004; Barraclough et al., 2004; Samejima et al., 2005; Daw et al., 2006; Li et al., 2006; Frank et al., 2007; Tom et al., 2007; Kable and Glimcher, 2007; Lohrenz et al., 2007; Schonberg et al., 2007; Wittmann et al., 2008; Hare et al., 2008; Hampton et al., 2008; Plassmann et al., 2008). As these techniques are spreading rapidly, but have been developed and documented somewhat sporadically alongside the studies themselves, the present review aims to clarify the toolbox (see also O'Doherty et al., 2007). In particular, we discuss the rationale for these methods and the questions they are suited to address. We then offer a relatively practical tutorial on the basic statistical methods used to answer these questions and how they can be applied to data analysis. The techniques are illustrated with fits of simple models to simulated datasets. Throughout, we flag interpretational and technical pitfalls of which we believe authors, reviewers, and readers should be aware. We focus on cataloging the particular, admittedly somewhat idiosyncratic, combination of techniques frequently used in this literature, but also on exposing these techniques as instances of a general set of tools that can be applied to analyze behavioral and neural data of many sorts. A number of other reviews (Daw and Doya, 2006; Dayan and Niv, 2008) have focused on the scientific conclusions that have been obtained with these methods, an issue we omit almost entirely here. There are also excellent books that cover statistical inference of this general sort with much greater generality, formal precision, and detail (MacKay, 2003; Gelman et al., 2004; Bishop, 2006; Gelman and Hill, 2007).
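The core of the fitting approach this literature uses is a trial-by-trial likelihood. A minimal sketch, assuming a two-armed bandit and a Q-learning-plus-softmax model (the model form and parameter names are generic illustrations, not the review's notation):

```python
import math

def neg_log_likelihood(choices, rewards, alpha, beta):
    """Negative log-likelihood of a choice sequence under Q-learning with a
    softmax choice rule: alpha is the learning rate, beta the inverse
    temperature. Lower values mean the parameters explain the data better."""
    q = [0.0, 0.0]
    nll = 0.0
    for c, r in zip(choices, rewards):
        p_chosen = math.exp(beta * q[c]) / (math.exp(beta * q[0]) + math.exp(beta * q[1]))
        nll -= math.log(p_chosen)
        q[c] += alpha * (r - q[c])  # update only the chosen option's value
    return nll

# With beta = 0 the model predicts random choice, so the likelihood is (1/2)^n:
baseline = neg_log_likelihood([0, 0, 1], [1.0, 1.0, 0.0], alpha=0.5, beta=0.0)
```

In practice this function is minimized over (alpha, beta) with a numerical optimizer (e.g. `scipy.optimize.minimize`), and the resulting likelihoods feed into the model-comparison statistics the review discusses.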

Monday, June 27, 2011

Deciding when to leave a depleting resource to exploit another is a fundamental problem for all decision makers. The neuronal mechanisms mediating patch-leaving decisions remain unknown. We found that neurons in primate (Macaca mulatta) dorsal anterior cingulate cortex, an area that is linked to reward monitoring and executive control, encode a decision variable signaling the relative value of leaving a depleting resource for a new one. Neurons fired during each sequential decision to stay in a patch and, for each travel time, these responses reached a fixed threshold for patch-leaving. Longer travel times reduced the gain of neural responses for choosing to stay in a patch and increased the firing rate threshold mandating patch-leaving. These modulations more closely matched behavioral decisions than any single task variable. These findings portend an understanding of the neural basis of foraging decisions and endorse the unification of theoretical and experimental work in ecology and neuroscience.
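The behavioral benchmark for such patch-leaving decisions is the marginal value theorem from foraging theory. A sketch of the rule, with illustrative numbers (the paper's actual analysis fits firing-rate thresholds, not this formula):

```python
def should_leave(current_gain_rate, total_reward, total_time, travel_time):
    """Marginal value theorem rule: leave the patch once its instantaneous
    gain rate drops below the average rate for the whole environment,
    with travel time to the next patch counted as unproductive time."""
    background_rate = total_reward / (total_time + travel_time)
    return current_gain_rate < background_rate

# Longer travel times lower the background rate, so the same depleting
# patch is worth staying in longer -- matching the reported threshold shift:
leave_short_travel = should_leave(0.5, total_reward=10.0, total_time=10.0, travel_time=5.0)
leave_long_travel = should_leave(0.5, total_reward=10.0, total_time=10.0, travel_time=15.0)
```

The finding that longer travel times raised the firing-rate threshold for leaving is the neural analogue of this rate comparison.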

Everyday choice options have advantages (positive values) and disadvantages (negative values) that need to be integrated into an overall subjective value. For decades, economic models have assumed that when a person evaluates a choice option, different values contribute independently to the overall subjective value of the option. However, human choice behavior often violates this assumption, suggesting interactions between values. To investigate how qualitatively different advantages and disadvantages are integrated into an overall subjective value, we measured the brain activity of human subjects using fMRI while they were accepting or rejecting choice options that were combinations of monetary reward and physical pain. We compared different subjective value models on behavioral and neural data. These models all made similar predictions of choice behavior, suggesting that behavioral data alone are not sufficient to uncover the underlying integration mechanism. Strikingly, a direct model comparison on brain data decisively demonstrated that interactive value integration (where values interact and affect overall valuation) predicts neural activity in value-sensitive brain regions significantly better than the independent mechanism. Furthermore, effective connectivity analyses revealed that value-dependent changes in valuation are associated with modulations in subgenual anterior cingulate cortex–amygdala coupling. These results provide novel insights into the neurobiological underpinnings of human decision making involving the integration of different values.

Reward-guided decision-making and learning depends on distributed neural circuits with many components. Here we focus on recent evidence that suggests four frontal lobe regions make distinct contributions to reward-guided learning and decision-making: the lateral orbitofrontal cortex, the ventromedial prefrontal cortex and adjacent medial orbitofrontal cortex, anterior cingulate cortex, and the anterior lateral prefrontal cortex. We attempt to identify common themes in experiments with human participants and with animal models, which suggest roles that the areas play in learning about reward associations, selecting reward goals, choosing actions to obtain reward, and monitoring the potential value of switching to alternative courses of action.

In decision under risk, people choose between lotteries that contain a list of potential outcomes paired with their probabilities of occurrence. We previously developed a method for translating such lotteries to mathematically equivalent “motor lotteries.” The probability of each outcome in a motor lottery is determined by the subject's noise in executing a movement. In this study, we used functional magnetic resonance imaging in humans to compare the neural correlates of monetary outcome and probability in classical lottery tasks in which information about probability was explicitly communicated to the subjects and in mathematically equivalent motor lottery tasks in which probability was implicit in the subjects' own motor noise. We found that activity in the medial prefrontal cortex (mPFC) and the posterior cingulate cortex quantitatively represent the subjective utility of monetary outcome in both tasks. For probability, we found that the mPFC significantly tracked the distortion of such information in both tasks. Specifically, activity in mPFC represents probability information but not the physical properties of the stimuli correlated with this information. Together, the results demonstrate that mPFC represents probability from two distinct forms of decision under risk.
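Probability distortion of the kind the mPFC signal tracked is conventionally modeled with an inverse-S weighting function. The sketch below uses the one-parameter Tversky-Kahneman form as a stand-in; the abstract does not commit to this exact functional form, and gamma here is illustrative:

```python
def weight(p, gamma=0.6):
    """One-parameter probability weighting function. With gamma < 1 it has
    the classic inverse-S shape: small probabilities are overweighted and
    large probabilities underweighted relative to their true values."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

overweighted = weight(0.05)   # > 0.05
underweighted = weight(0.95)  # < 0.95
```

The study's point is that this distortion appeared both when probability was stated explicitly and when it was implicit in the subject's own motor noise.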

Cooperation among genetically unrelated individuals is a fundamental aspect of society, but it has been a longstanding puzzle in biological and social sciences. Recently, theoretical studies in biology and economics showed that conditional cooperation (cooperating only with those who have exhibited cooperative behavior) can spread over a society. Furthermore, experimental studies in psychology demonstrated that people are actually conditional cooperators. In this study, we used functional magnetic resonance imaging to investigate the neural system underlying conditional cooperation by scanning participants during interaction with cooperative, neutral and non-cooperative opponents in prisoner's dilemma games. The results showed that: (i) participants cooperated more frequently with both cooperative and neutral opponents than with non-cooperative opponents; and (ii) a brain area related to cognitive inhibition of pre-potent responses (right dorsolateral prefrontal cortex) showed greater activation, especially when participants confronted non-cooperative opponents. Consequently, we suggest that cognitive inhibition of the motivation to cooperate with non-cooperators drives the conditional behavior.

How are decision-making strategies altered by hypothetical outcomes resulting from unchosen actions? Abe and Lee find that monkeys adjust their strategies in a rock-paper-scissors task according to both actual and hypothetical outcomes. Neurons in the prefrontal cortex modulated their activity related to actual and hypothetical outcomes differently depending on the animal's choices, thereby encoding choice-outcome conjunctions for both experienced and hypothetical outcomes.

The acquisition of reward and the avoidance of punishment could logically be contingent on either emitting or withholding particular actions. However, the separate pathways in the striatum for go and no-go appear to violate this independence, instead coupling affect and effect. Respect for this interdependence has biased many studies of reward and punishment, so potential action–outcome valence interactions during anticipatory phases remain unexplored. In a functional magnetic resonance imaging study with healthy human volunteers, we manipulated subjects' requirement to emit or withhold an action independent from subsequent receipt of reward or avoidance of punishment. During anticipation, in the striatum and a lateral region within the substantia nigra/ventral tegmental area (SN/VTA), action representations dominated over valence representations. Moreover, we did not observe any representation associated with different state values through accumulation of outcomes, challenging a conventional and dominant association between these areas and state value representations. In contrast, a more medial sector of the SN/VTA responded preferentially to valence, with opposite signs depending on whether action was anticipated to be emitted or withheld. This dominant influence of action requires an enriched notion of opponency between reward and punishment.

Recent work in neuroeconomics has shown that regions in orbitofrontal and medial prefrontal cortex encode the subjective value of different options during choice. However, these electrophysiological and neuroimaging studies cannot demonstrate whether such signals are necessary for value-maximizing choices. Here we used a paradigm developed in experimental economics to empirically measure and quantify violations of utility theory in humans with damage to the ventromedial frontal lobe (VMF). We show that people with such damage are more likely to make choices that violate the generalized axiom of revealed preference, which is the one necessary and sufficient condition for choices to be consistent with value maximization. These results demonstrate that the VMF plays a critical role in value-maximizing choice.

Decisions are often based on a combination of new evidence with prior knowledge of the probable best choice. Optimal combination requires knowledge about the reliability of evidence, but in many realistic situations, this is unknown. Here we propose and test a novel theory: the brain exploits elapsed time during decision formation to combine sensory evidence with prior probability. Elapsed time is useful because (1) decisions that linger tend to arise from less reliable evidence, and (2) the expected accuracy at a given decision time depends on the reliability of the evidence gathered up to that point. These regularities allow the brain to combine prior information with sensory evidence by weighting the latter in accordance with reliability. To test this theory, we manipulated the prior probability of the rewarded choice while subjects performed a reaction-time discrimination of motion direction using a range of stimulus reliabilities that varied from trial to trial. The theory explains the effect of prior probability on choice and reaction time over a wide range of stimulus strengths. We found that prior probability was incorporated into the decision process as a dynamic bias signal that increases as a function of decision time. This bias signal depends on the speed–accuracy setting of human subjects, and it is reflected in the firing rates of neurons in the lateral intraparietal area (LIP) of rhesus monkeys performing this task.
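The dynamic bias signal can be sketched as an offset on the accumulated evidence that grows with elapsed decision time. The saturating time course and the gain below are illustrative assumptions; the paper derives the bias trajectory from the expected accuracy at each decision time rather than from this particular form:

```python
import math

def decision_variable(evidence_sum, t, log_prior_odds, k=0.5):
    """Accumulated sensory evidence plus a prior-dependent bias that starts
    at zero and grows toward log_prior_odds as decision time t elapses."""
    return evidence_sum + log_prior_odds * (1.0 - math.exp(-k * t))

# Early in the trial the evidence dominates; the longer the decision
# lingers (weaker evidence), the more the prior tips the balance:
early = decision_variable(0.2, t=0.0, log_prior_odds=1.0)
late = decision_variable(0.2, t=10.0, log_prior_odds=1.0)
```

This captures the paper's logic: lingering decisions tend to arise from unreliable evidence, so an increasing bias implements reliability-weighted combination of prior and likelihood.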

Ryan K. Jessup and John P. O'Doherty
The Journal of Neuroscience, 27 April 2011, 31(17): 6296-6304

Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor–critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum—as predicted by an actor–critic instantiation—is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor–critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus–response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.
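The two strategies being contrasted make opposite predictions after a run of rewards on one option. A deliberately crude sketch (the decision rules, names, and thresholds here are illustrative caricatures, not the paper's fitted models):

```python
def rl_choice(q_values):
    """Reinforcement learning: choose the action with the higher cached value,
    so a recently rewarded option tends to be repeated."""
    return max(range(len(q_values)), key=q_values.__getitem__)

def gamblers_fallacy_choice(recent_outcomes, last_choice):
    """Gambler's fallacy caricature: after a streak of rewards on the chosen
    option, expect the streak to break and switch to the other option."""
    if len(recent_outcomes) >= 2 and all(recent_outcomes):
        return 1 - last_choice
    return last_choice

# After repeated wins on option 0, RL stays while the fallacy switches:
rl_pick = rl_choice([0.8, 0.2])                     # 0 (stay)
fallacy_pick = gamblers_fallacy_choice([1, 1, 1], last_choice=0)  # 1 (switch)
```

The fMRI contrast in the study hinges on exactly such trials, where the two strategies prescribe different choices.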

The orbitofrontal cortex (OFC) is implicated in a variety of adaptive decision-making processes. Human studies suggest that there is a functional dissociation between medial and lateral OFC (mOFC and lOFC, respectively) subregions when performing certain choice procedures. However, little work has examined the functional consequences of manipulations of OFC subregions on decision making in rodents. In the present experiments, impulsive choice was assessed by evaluating intolerance to delayed, but economically optimal, reward options using a delay-discounting paradigm. Following initial delay-discounting training, rats received bilateral neurotoxic or sham lesions targeting whole OFC (wOFC) or restricted to either mOFC or lOFC subregions. A transient flattening of delay-discounting curves was observed in wOFC-lesioned animals relative to shams—differences that disappeared with further training. Stable, dissociable effects were found when lesions were restricted to OFC subregions; mOFC-lesioned rats showed increased, whereas lOFC-lesioned rats showed decreased, preference for the larger-delayed reward relative to sham-controls—a pattern that remained significant during retraining after all delays were removed. When locations of levers leading to small–immediate versus large–delayed rewards were reversed, wOFC- and lOFC-lesioned rats showed retarded, whereas mOFC-lesioned rats showed accelerated, trajectories for reversal of lever preference. These results provide the first direct evidence for dissociable functional roles of the mOFC and lOFC for impulsive choice in rodents. The findings are consistent with recent human functional imaging studies and suggest that functions of mOFC and lOFC subregions may be evolutionarily conserved and contribute differentially to decision-making processes.
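Delay-discounting performance is standardly summarized with the hyperbolic discount function V = A / (1 + kD). A small sketch; the k values below are illustrative, not fitted parameters from this study:

```python
def discounted_value(amount, delay, k=0.2):
    """Hyperbolic discounting: the subjective value of `amount` delivered
    after `delay`. Larger k means steeper discounting (more impulsive)."""
    return amount / (1.0 + k * delay)

# An impulsive chooser (large k) prefers a small immediate reward over a
# large delayed one; a patient chooser (small k) shows the reverse:
small_now = discounted_value(1.0, delay=0)
large_later_impulsive = discounted_value(4.0, delay=20, k=1.0)   # 4/21
large_later_patient = discounted_value(4.0, delay=20, k=0.05)    # 4/2
```

In these terms, the mOFC lesions shifted rats toward smaller effective k (more choices of the larger-delayed reward) and the lOFC lesions toward larger effective k.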

The nucleus accumbens (NAc) is involved in many reward-related behaviors. The NAc has two major components, the core and the shell. These two areas have different inputs and outputs, suggesting that they contribute differentially to goal-directed behaviors. Using a discriminative stimulus (DS) task in rats and inactivating the NAc by blocking excitatory inputs with glutamate antagonists, we dissociated core and shell contributions to task performance. NAc core but not shell inactivation decreased responding to a reward-predictive cue. In contrast, inactivation of either subregion induced a general behavioral disinhibition. This reveals that the NAc actively suppresses actions inappropriate to the DS task. Importantly, selective inactivation of the shell but not core significantly increased responding to the nonrewarded cue. To determine whether the different contributions of the NAc core and shell depend on the information encoded in their constituent neurons, we performed electrophysiological recording in rats performing the DS task. Although there was no firing pattern unique to either core or shell, the reward-predictive cue elicited more frequent and larger magnitude responses in the NAc core than in the shell. Conversely, more NAc shell neurons selectively responded to the nonrewarded stimulus. These quantitative differences might account for the different behavioral patterns that require either core or shell. Neurons with similar firing patterns could also have different effects on behavior due to their distinct projection targets.

Tuesday, April 12, 2011

How similar are the experiences of social rejection and physical pain? Extant research suggests that a network of brain regions that support the affective but not the sensory components of physical pain underlie both experiences. Here we demonstrate that when rejection is powerfully elicited, by having people who recently experienced an unwanted break-up view a photograph of their ex-partner as they think about being rejected, areas that support the sensory components of physical pain (secondary somatosensory cortex; dorsal posterior insula) become active. We demonstrate the overlap between social rejection and physical pain in these areas by comparing both conditions in the same individuals using functional MRI. We further demonstrate the specificity of the secondary somatosensory cortex and dorsal posterior insula activity to physical pain by comparing activated locations in our study with a database of over 500 published studies. Activation in these regions was highly diagnostic of physical pain, with positive predictive values up to 88%. These results give new meaning to the idea that rejection "hurts." They demonstrate that rejection and physical pain are similar not only in that they are both distressing; they share a common somatosensory representation as well.

Wednesday, April 6, 2011

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
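The contrast between the two algorithms can be made concrete: model-free TD reads out cached values, while a model-based planner recomputes values from a learned map. A toy sketch of why only the planner adapts instantly when the maze layout changes (states, rewards, and values below are invented for illustration):

```python
# A learned map of a tiny task: which state each action leads to, and rewards.
transitions = {('start', 'L'): 'goal', ('start', 'R'): 'dead_end'}
rewards = {'goal': 1.0, 'dead_end': 0.0}

# Cached model-free values, learned by TD before any change to the layout.
q_cached = {('start', 'L'): 0.9, ('start', 'R'): 0.1}

def plan_value(state, action):
    """Model-based value: consult the (current) map, then the reward."""
    return rewards[transitions[(state, action)]]

# The layout changes: 'L' now leads to the dead end and 'R' to the goal.
transitions[('start', 'L')] = 'dead_end'
transitions[('start', 'R')] = 'goal'

# The planner revalues immediately; the cached values remain stale until
# relearned -- the behavioral signature the frequent maze changes expose.
assert plan_value('start', 'R') > plan_value('start', 'L')
assert q_cached[('start', 'L')] > q_cached[('start', 'R')]
```

The study's finding is that striatal value signals tracked the planner-like quantity, not the stale cache.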

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
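The dissociation between the two candidate teaching signals can be sketched directly. The learning rate and update forms below are illustrative simplifications, not the paper's fitted models:

```python
def q_update(q, chosen, obtained, alpha=0.3):
    """Value-based (Q-learning) update: only the chosen action's reward
    expectancy changes, and only in response to the obtained outcome."""
    q = list(q)
    q[chosen] += alpha * (obtained - q[chosen])
    return q

def preference_update(pref, chosen, obtained, forgone, alpha=0.3):
    """Policy-style update consistent with the reported striatal pattern:
    obtained and forgone rewards push the relative preference for the
    chosen action in opposite directions."""
    pref = list(pref)
    pref[chosen] += alpha * (obtained - forgone)
    return pref

# A large forgone reward lowers the preference for the chosen action even
# when the obtained outcome is neutral -- something q_update cannot do:
pref_after = preference_update([0.0, 0.0], chosen=0, obtained=0.0, forgone=1.0)
q_after = q_update([0.0, 0.0], chosen=0, obtained=0.0)
```

Opposite-signed striatal responses to obtained versus forgone rewards match the second rule, which is the basis of the paper's conclusion.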

Self-projection, the capacity to re-experience the personal past and to mentally infer another person's perspective, has been linked to medial prefrontal cortex (mPFC). In particular, ventral mPFC is associated with inferences about one's own self, whereas dorsal mPFC is associated with inferences about another individual. In the present fMRI study, we examined self-projection using a novel camera technology, which employs a sensor and timer to automatically take hundreds of photographs when worn, in order to create dynamic visuospatial cues taken from a first-person perspective. This allowed us to ask participants to self-project into the personal past or into the life of another person. We predicted that self-projection to the personal past would elicit greater activity in ventral mPFC, whereas self-projection of another perspective would rely on dorsal mPFC. There were three main findings supporting this prediction. First, we found that self-projection to the personal past recruited greater ventral mPFC, whereas observing another person's perspective recruited dorsal mPFC. Second, activity in ventral versus dorsal mPFC was sensitive to parametric modulation on each trial by the ability to relive the personal past or to understand another's perspective, respectively. Third, task-related functional connectivity analysis revealed that ventral mPFC contributed to the medial temporal lobe network linked to memory processes, whereas dorsal mPFC contributed to the fronto-parietal network linked to controlled processes. In sum, these results suggest that ventral–dorsal subregions of the anterior midline are functionally dissociable and may differentially contribute to self-projection of self versus other.

The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also shows hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
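A minimal sketch of the hybrid account the BOLD data favored: the striatal error is computed against a weighted mixture of model-based and model-free expectations rather than the model-free value alone. The weight `w`, the two-state transition structure, and all names are illustrative assumptions:

```python
def model_based_q(transitions, state_values):
    """Q_MB for each first-stage action: expected value of the states it
    leads to, under known transition probabilities."""
    return [sum(p * v for p, v in zip(row, state_values)) for row in transitions]

def hybrid_rpe(reward, q_mb, q_mf, w):
    """Prediction error against a mixture of model-based and model-free
    values; w=1 is purely model-based, w=0 purely model-free."""
    expected = w * q_mb + (1 - w) * q_mf
    return reward - expected
```

A "pure model-free report" corresponds to `w = 0`; the finding is that the BOLD signal behaved as if `w` matched the mixture that best explained choices.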

In attentional models of learning, associations between actions and subsequent rewards are stronger when outcomes are surprising, regardless of their valence. Despite the behavioral evidence that surprising outcomes drive learning, neural correlates of unsigned reward prediction errors remain elusive. Here we show that in a probabilistic choice task, trial-to-trial variations in preference track outcome surprisingness. Concordant with this behavioral pattern, responses of neurons in macaque (Macaca mulatta) dorsal anterior cingulate cortex (dACC) to both large and small rewards were enhanced when the outcome was surprising. Moreover, when, on some trials, probabilities were hidden, neuronal responses to rewards were reduced, consistent with the idea that the absence of clear expectations diminishes surprise. These patterns are inconsistent with the idea that dACC neurons track signed errors in reward prediction, as dopamine neurons do. Our results also indicate that dACC neurons do not signal conflict. In the context of other studies of dACC function, these results suggest a link between reward-related modulations in dACC activity and attention and motor control processes involved in behavioral adjustment. More speculatively, these data point to a harmonious integration between reward and learning accounts of ACC function on one hand, and attention and cognitive control accounts on the other.
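The unsigned-error account these dACC responses support can be sketched as a Pearce-Hall-style associability update, in which |prediction error|, not its sign, boosts subsequent attention and learning. Parameter names and values here are illustrative assumptions:

```python
def pearce_hall_step(v, assoc, reward, eta=0.3, kappa=0.5):
    """One trial: the unsigned prediction error |reward - v| drives
    associability (attention), which in turn gates the value update."""
    unsigned_pe = abs(reward - v)
    new_assoc = (1 - eta) * assoc + eta * unsigned_pe   # surprise -> attention
    new_v = v + kappa * assoc * (reward - v)            # attention gates learning
    return new_v, new_assoc, unsigned_pe
```

Both a surprisingly large and a surprisingly small reward raise `new_assoc`, matching the enhanced dACC responses to surprising outcomes of either valence.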

A suboptimal bias toward accepting the status quo option in decision-making is well established behaviorally, but the underlying neural mechanisms are less clear. Behavioral evidence suggests the emotion of regret is higher when errors arise from rejection rather than acceptance of a status quo option. Such asymmetry in the genesis of regret might drive the status quo bias on subsequent decisions, if indeed erroneous status quo rejections have a greater neuronal impact than erroneous status quo acceptances. To test this, we acquired human fMRI data during a difficult perceptual decision task that incorporated a trial-to-trial intrinsic status quo option, with explicit signaling of outcomes (error or correct). Behaviorally, experienced regret was higher after an erroneous status quo rejection compared with acceptance. Anterior insula and medial prefrontal cortex showed increased blood oxygenation level-dependent signal after such status quo rejection errors. In line with our hypothesis, a similar pattern of signal change predicted acceptance of the status quo on a subsequent trial. Thus, our data link a regret-induced status quo bias to error-related activity on the preceding trial.

According to reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. In contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, according to a power law distribution, which is consistent across areas and reproduced by a 'reservoir' neural network model. These neuronal memory timescales were weakly, but significantly, correlated with those of the monkeys' decisions. Our findings suggest a flexible memory system in which neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to the task demands.
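The core object here, a reward memory with a tunable time constant, reduces to a leaky integrator; in this picture neurons differ only in their `tau`. A toy sketch, not the authors' fitting procedure:

```python
import math

def reward_memory(rewards, tau):
    """Leaky integration of a reward sequence with time constant tau
    (in trial units): m <- m * exp(-1/tau) + r."""
    decay = math.exp(-1.0 / tau)
    m, trace = 0.0, []
    for r in rewards:
        m = m * decay + r
        trace.append(m)
    return trace
```

A short-tau unit forgets a reward within a trial or two, while a long-tau unit integrates over tens of trials; a population whose taus follow a power law spans both regimes.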

Uncertainty about the function of orbitofrontal cortex (OFC) in guiding decision-making may be a result of its medial (mOFC) and lateral (lOFC) divisions having distinct functions. Here we test the hypothesis that the mOFC is more concerned with reward-guided decision making, in contrast with the lOFC's role in reward-guided learning. Macaques performed three-armed bandit tasks and the effects of selective mOFC lesions were contrasted against lOFC lesions. First, we present analyses that make it possible to measure reward-credit assignment (a crucial component of reward-value learning) independently of the decisions animals make. mOFC lesions did not produce the impairments in reward-credit assignment that are seen after lOFC lesions. Second, we examined how the reward values of choice options were compared. We present three analyses, one of which examines reward-guided decision making independently of reward-value learning. Lesions of the mOFC, but not the lOFC, disrupted reward-guided decision making. Impairments after mOFC lesions were a function of the multiple option contexts in which decisions were made. Contrary to axiomatic assumptions of decision theory, the mOFC-lesioned animals' value comparisons were no longer independent of irrelevant alternatives.

Orbitofrontal cortex (OFC) is widely held to be critical for flexibility in decision-making when established choice values change. OFC's role in such decision making was investigated in macaques performing dynamically changing three-armed bandit tasks. After selective OFC lesions, animals were impaired at discovering the identity of the highest value stimulus following reversals. However, this was not caused either by diminished behavioral flexibility or by insensitivity to reinforcement changes, but instead by paradoxical increases in switching between all stimuli. This pattern of choice behavior could be explained by a causal role for OFC in appropriate contingent learning, the process by which causal responsibility for a particular reward is assigned to a particular choice. After OFC lesions, animals' choice behavior no longer reflected the history of precise conjoint relationships between particular choices and particular rewards. Nonetheless, OFC-lesioned animals could still approximate choice-outcome associations using a recency-weighted history of choices and rewards.
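The contrast between precise contingent learning and the recency-weighted approximation left intact after OFC lesions can be sketched as two credit-assignment rules. This is a toy model; the parameters and the eligibility-trace form are assumptions, not the paper's analysis:

```python
def contingent_credit(values, history, alpha=0.2):
    """Precise contingent learning: each reward updates only the value
    of the choice that actually produced it."""
    values = list(values)
    for choice, reward in history:
        values[choice] += alpha * (reward - values[choice])
    return values

def spread_credit(values, history, alpha=0.2, lam=0.5):
    """Recency-weighted approximation: a reward is credited to recent
    choices in proportion to an exponentially decaying trace, blurring
    the precise conjoint choice-reward relationship."""
    values = list(values)
    trace = [0.0] * len(values)
    for choice, reward in history:
        trace = [lam * t for t in trace]
        trace[choice] += 1.0
        total = sum(trace)
        for a in range(len(values)):
            values[a] += alpha * (trace[a] / total) * (reward - values[a])
    return values
```

Under the second rule, a recently chosen but non-causal option still absorbs some credit for a reward, which is the lesioned animals' signature pattern.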

In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.
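The distinction can be made concrete by representing an outcome as a feature vector: a value error is a scalar, while an identity error lives over outcome features and can be nonzero even when total value is unchanged. The banana/grape encoding below is illustrative, not the study's stimuli:

```python
def value_error(expected_value, received_value):
    """Classic TDRL teaching signal: a scalar difference in reward value."""
    return received_value - expected_value

def identity_error(expected_features, received_features):
    """Model-based 'identity' error: per-feature mismatch between the
    expected and received outcome."""
    return [r - e for e, r in zip(expected_features, received_features)]
```

Swapping an expected banana pellet (`[1, 0]`) for an equally valued grape pellet (`[0, 1]`) yields a zero value error but a nonzero identity error, which is exactly the learning signal that value-only TDRL cannot supply.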

Friday, February 4, 2011

Reward from a particular action is seldom immediate, and the influence of such delayed outcome on choice decreases with delay. It has been postulated that when faced with immediate and delayed rewards, decision makers choose the option with maximum temporally discounted value. We examined the preference of monkeys for delayed reward in an intertemporal choice task and the neural basis for real-time computation of temporally discounted values in the dorsolateral prefrontal cortex. During this task, the locations of the targets associated with small or large rewards and their corresponding delays were randomly varied. We found that prefrontal neurons often encoded the temporally discounted value of reward expected from a particular option. Furthermore, activity tended to increase with discounted values for targets presented in the neuron's preferred direction, suggesting that activity related to temporally discounted values in the prefrontal cortex might determine the animal's behavior during intertemporal choice.
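Intertemporal choice of this kind is commonly modeled with hyperbolic discounting; a minimal sketch follows, where the discount rate `k` and the function names are illustrative, not the study's fitted values:

```python
def discounted_value(amount, delay, k=0.2):
    """Hyperbolic temporal discounting: DV = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

def choose(small_soon, large_late, k=0.2):
    """Pick the option with the larger discounted value
    (each option is an (amount, delay) pair)."""
    dv_s = discounted_value(*small_soon, k=k)
    dv_l = discounted_value(*large_late, k=k)
    return 'small' if dv_s > dv_l else 'large'
```

The neural claim is then that dorsolateral prefrontal activity tracks `discounted_value` for the option in a neuron's preferred direction, computed trial by trial as amounts and delays change.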

A large body of evidence exists on the role of dopamine in reinforcement learning. Less is known about how dopamine shapes the relative impact of positive and negative outcomes to guide value-based choices. We combined administration of the dopamine D2 receptor antagonist amisulpride with functional magnetic resonance imaging in healthy human volunteers. Amisulpride did not affect initial reinforcement learning. However, in a later transfer phase that involved novel choice situations requiring decisions between two symbols based on their previously learned values, amisulpride improved participants' ability to select the better of two highly rewarding options, while it had no effect on choices between two very poor options. During the learning phase, activity in the striatum encoded a reward prediction error. In the transfer phase, in the absence of any outcome, ventromedial prefrontal cortex (vmPFC) continually tracked the learned value of the available options on each trial. Both striatal prediction error coding and tracking of learned value in the vmPFC were predictive of subjects' choice performance in the transfer phase, and both were enhanced under amisulpride. These findings show that dopamine-dependent mechanisms enhance reinforcement learning signals in the striatum and sharpen representations of associative values in prefrontal cortex that are used to guide reinforcement-based decisions.

Successful social interaction depends on not only the ability to identify with others but also the ability to distinguish between aspects of self and others [1-4]. Although there is considerable knowledge of a shared neural substrate between self-action and others' action [5], it remains unknown where and how in the brain the action of others is uniquely represented. Exploring such agent-specific neural codes is important because one's action and intention can differ between individuals [1]. Moreover, the assignment of social agency breaks down in a range of mental disorders [6-8]. Here, using two monkeys monitoring each other's action for adaptive behavioral planning, we show that the medial frontal cortex (MFC) contains a group of neurons that selectively encode others' action. These neurons, observed in both dominant and submissive monkeys, were significantly more prevalent in the dorsomedial convexity region of the MFC including the pre-supplementary motor area than in the cingulate sulcus region of the MFC including the rostral cingulate motor area. Further tests revealed that the difference in neuronal activity was not due to gaze direction or muscular activity. We suggest that the MFC is involved in self-other differentiation in the domain of motor action and provides a fundamental neural signal for social learning.