The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies demonstrate that behavior also bears hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences, we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
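The hybrid valuation idea described above — choice reflecting a weighted mixture of model-based and model-free value estimates — can be sketched in a few lines. This is a minimal illustration, not the authors' fitted model; the function names, the SARSA-style update, the mixture weight `w`, and the softmax parameters are all illustrative assumptions.

```python
import math

def model_free_update(q, reward, alpha=0.5):
    """One model-free (temporal-difference-style) update.

    The signed reward prediction error delta drives learning;
    alpha is an assumed learning rate.
    """
    delta = reward - q          # signed prediction error
    return q + alpha * delta, delta

def hybrid_values(q_mf, q_mb, w):
    """Mix model-based and model-free values per option.

    w is the model-based share (0 = purely model-free,
    1 = purely model-based).
    """
    return [w * mb + (1.0 - w) * mf for mf, mb in zip(q_mf, q_mb)]

def softmax(values, beta=3.0):
    """Softmax choice rule: turn hybrid values into choice probabilities.

    beta is an assumed inverse-temperature parameter.
    """
    m = max(values)  # subtract max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this kind of scheme, fitting `w` to choices and asking whether the same `w` accounts for striatal BOLD is one way to phrase the abstract's test of signal "purity."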

In attentional models of learning, associations between actions and subsequent rewards are stronger when outcomes are surprising, regardless of their valence. Despite the behavioral evidence that surprising outcomes drive learning, neural correlates of unsigned reward prediction errors remain elusive. Here we show that in a probabilistic choice task, trial-to-trial variations in preference track how surprising the outcome is. Concordant with this behavioral pattern, responses of neurons in macaque (Macaca mulatta) dorsal anterior cingulate cortex (dACC) to both large and small rewards were enhanced when the outcome was surprising. Moreover, when, on some trials, probabilities were hidden, neuronal responses to rewards were reduced, consistent with the idea that the absence of clear expectations diminishes surprise. These patterns are inconsistent with the idea that dACC neurons track signed errors in reward prediction, as dopamine neurons do. Our results also indicate that dACC neurons do not signal conflict. In the context of other studies of dACC function, these results suggest a link between reward-related modulations in dACC activity and attention and motor control processes involved in behavioral adjustment. More speculatively, these data point to a harmonious integration between reward and learning accounts of ACC function on one hand, and attention and cognitive control accounts on the other.
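The contrast between signed prediction errors (as carried by dopamine neurons) and unsigned errors (as reported here for dACC) can be made concrete with a toy calculation. This is an illustrative sketch only; the function names and the specific reward values are assumptions, not taken from the study.

```python
def expected_reward(p_large, large, small):
    """Expected value when a large reward occurs with probability p_large,
    and a small reward otherwise."""
    return p_large * large + (1.0 - p_large) * small

def prediction_errors(expected, received):
    """Return (signed, unsigned) prediction errors.

    The signed error carries valence (better/worse than expected, as in
    dopamine signaling); the unsigned error carries only magnitude, i.e.
    surprise regardless of valence.
    """
    signed = received - expected
    return signed, abs(signed)
```

With a likely large reward (say `p_large = 0.8`), actually receiving the small reward yields a large negative signed error but a large *unsigned* error, whereas the expected large reward yields a small error on both measures; a surprise-tracking signal responds to the former more than the latter even though it is the worse outcome.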

A suboptimal bias toward accepting the status quo option in decision-making is well established behaviorally, but the underlying neural mechanisms are less clear. Behavioral evidence suggests the emotion of regret is stronger when errors arise from rejection rather than acceptance of a status quo option. Such asymmetry in the genesis of regret might drive the status quo bias on subsequent decisions, if indeed erroneous status quo rejections have a greater neuronal impact than erroneous status quo acceptances. To test this, we acquired human fMRI data during a difficult perceptual decision task that incorporated a trial-to-trial intrinsic status quo option, with explicit signaling of outcomes (error or correct). Behaviorally, experienced regret was higher after an erroneous status quo rejection compared with acceptance. Anterior insula and medial prefrontal cortex showed increased blood oxygenation level-dependent (BOLD) signal after such status quo rejection errors. In line with our hypothesis, a similar pattern of signal change predicted acceptance of the status quo on a subsequent trial. Thus, our data link a regret-induced status quo bias to error-related activity on the preceding trial.