2Interacting Minds Centre, School of Culture and Society, Aarhus University, Aarhus 8000, Denmark, 3Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom, and

Abstract

Expectation of reward can be shaped by the observation of actions and expressions of other people in one's environment. A person's apparent confidence in the likely reward of an action, for instance, makes qualities of their evidence, not observed directly, socially accessible. This strategy is computationally distinguished from associative learning methods that rely on direct observation, by its use of inference from indirect evidence. In twenty-three healthy human subjects, we isolated effects of first-hand experience, other people's choices, and the mediating effect of their confidence, on decision-making and neural correlates of value within ventromedial prefrontal cortex (vmPFC). Value derived from first-hand experience and other people's choices (regardless of confidence) were indiscriminately represented across vmPFC. However, value computed from agent choices weighted by their associated confidence was represented with specificity for ventromedial area 10. This pattern corresponds to shifts of connectivity and overlapping cognitive processes along a posterior-anterior vmPFC axis. Task behavior and self-reported self-reliance for decision-making in other social contexts correlated. The tendency to conform in other social contexts corresponded to increased activation in cortical regions previously shown to respond to social conflict in proportion to subsequent conformity (Campbell-Meiklejohn et al., 2010). The tendency to self-monitor predicted a selectively enhanced response to accordance with others in the right temporoparietal junction (rTPJ). The findings anatomically decompose vmPFC value representations according to computational requirements and provide biological insight into the social transmission of preference and reassurance gained from the confidence of others.

SIGNIFICANCE STATEMENT Decades of research have provided evidence that the ventromedial prefrontal cortex (vmPFC) signals the satisfaction we expect from imminent actions. However, we have a surprisingly modest understanding of the organization of value across this substantial and varied region. This study finds that using cues of the reliability of other people's knowledge to enhance expectation of personal success generates value correlates that are anatomically distinct from those concurrently computed from direct, personal experience. This suggests that representation of decision values in vmPFC is suborganized according to the underlying computation, consistent with what we know about the anatomical heterogeneity of the region. These results also provide insight into the observational learning process by which someone else's confidence can sway and reassure our choices.

Introduction

The human brain can use a variety of learning strategies to better its situation. Unifying economic theories suggest that different decision processes converge to a single expectation of satisfaction (or “value”) that will be gained or lost from available actions—correlates of which are found in activity throughout ventromedial prefrontal cortex (vmPFC) (Levy and Glimcher, 2012; Bartra et al., 2013) and guide decisions toward options of greater value. Whether this activity represents a single computation of value, a collection of distinct computations, or both is not clear. However, there are both computational and anatomical reasons to suspect subregional specialization.

Reward predictions from different types of information are computed differently. For instance, one can learn about an action's value by maintaining a running average of rewards received from performing it. However, some knowledge is only socially and/or inferentially accessible; if one observes someone else perform an action, then one can recruit inferential strategies for a judgment of that action's value. For instance, people provide signals of high confidence when they have good knowledge of the likely outcomes to their actions (Patel et al., 2012), so confident actions of others should have greater influence on our own appraisals (Thomas and Mcfadyen, 1995). Therefore, if the other person appears to know what she is doing (and assuming her intentions are similar to mine), one can use a rule of imitate-if-appears-confident or, more flexibly, one can infer what she knows from her actions and confidence and then combine this with knowledge inferred from other cues and personal experience. These and similar strategies, unlike directly sampling outcomes, may require steps of indirection and integration of multiple cues (e.g., of others' preference and confidence), but also enable us to compute value in the absence of first-hand experience with prior choice outcomes. We developed a task that provides different types of information for evaluating the value of the same choice: first-hand knowledge, choices of others, and their confidence. We then compared contributions of each information source to decision making and neural representations of value across vmPFC.

We considered that different computations of value could make use of shifting connectivity, cytoarchitecture, and overlapping cognition toward the anterior of vmPFC (Kringelbach and Rolls, 2004). Cytoarchitecturally, laminar density and granular layer IV volume increase along a posterior to anterior axis (Ongür et al., 2003; Mackey and Petrides, 2010), indicative of increasing interarea connectivity (Barbas, 2007). Local connectivity required for integrating value signals in situ dominates posterior vmPFC. In contrast, anterior regions maintain a balance of local and distant connectivity that can additionally recruit higher-order input (Sepulcre et al., 2010). Correspondingly, moving anterior and dorsal from classical representations of value in area 25 (through areas labeled as area 14 m (Mackey, 2010) or 10 m and 10 r; Ongür et al., 2003) to medial area 10, known overlapping processes become more abstract, integrative, and inferential (Ramnani and Owen, 2004; Amodio and Frith, 2006; Burgess et al., 2007; Sescousse et al., 2013). Representations of value that require more inferential or integrative processes may map to distinct regions across this cortical landscape.

An anatomical distinction between value from first-hand knowledge and value from agents' confidence-weighted influence would exemplify the mapping of value to cortex as determined by its computational requirements. It would also provide the foundation for investigation of the neural mechanism by which the supporting confidence of others guides our actions. Based on changes of connectivity and overlapping function, value from confidence-weighted influence was predicted to recruit anterior regions of vmPFC preferentially, whereas direct, personal sampling and the main effect of agents' choices (regardless of confidence) were not.

We linked individual differences of influence on task performance to sensitivity of neural responses to corresponding task stimuli. Finally, to establish the relevance of our task and findings to social environments, we tested the relationship between observed effects (neural and behavioral) and influence of others on participant behavior (conformity and self-monitoring) outside of the laboratory.

Materials and Methods

Participants

Participants were right-handed and had no history of brain injury or mental disorder. Each gave informed consent. Inclusion criteria for fMRI analysis included at least some use of each information source during the behavioral task to enable contrasts between value-based activation derived from different sources (exclusions were indicated by near-perfect prediction of choice outcomes by only one or two influences, a simple rule-based strategy that precluded quantitative examination of behavior by leading to extremely large and unstable estimates of regression coefficients) (five participants were excluded). Head movements were required to be consistently smaller than a single voxel (2 mm) (one participant was excluded). Twenty-three participants (13 male, age: M 26, SD 3.6) met all requirements. Each received 110 Danish Kroner (kr) (20 USD) for time spent participating and 60 kr (10 USD) for winnings on tasks (all subjects were paid this same amount). The study was approved by Central Denmark Region ethics (No. 29718).

Task

The urn task (Fig. 1) examines how participants make decisions when provided with different sources of information. It asks participants to infer the color of the next marble to be drawn (red or green) for each of a series of 160 mixed urns, with opportunities to do so by inferring other people's information about the likely outcome. Participants had the following information about each urn: a random sample of eight marbles, four predictions described as being made by past participants (four “agents”) based on their own samples from the same urns, and the confidence expressed by those agents. All samples were said to be replaced before the final marble was drawn.

Figure 1. Urn task. On each trial, a new urn of randomly mixed marbles is presented. Contents are hidden. An animation shows five hands individually reaching for five different samples from the urn (the first two parts last 1.5 to 2 s). The predictions of four agents for the next marble drawn from the urn, and their associated confidence in those predictions, are then shown. Agents were represented as sepia-toned faces with neutral expression. Agents' predictions were expressed by the color of the circle positioned next to their faces. Confidence was indicated by the expression within that circle and the speed at which the answer was shown (rapid + smug = high confidence). Agents' answers took between 300 and 2500 ms to appear. Once agents' choices were in, they remained on the screen for 1250 ms. Next, the participant's own sample of marbles appeared from the bottom of the screen and was displayed with all other information for a further 1250 ms. The sample contained eight marbles, with one to seven of them being red. This was the event modeled in the fMRI analysis. Finally, the participant was asked to make a prediction (red or green). No choice feedback was provided.

Agents' faces were presented with neutral expressions. To ensure that any effects were due to the use of agents' confidence, rather than its inference, we communicated confidence with an explicit cue described to subjects during training: the smug (confident cue) or perplexed (unconfident) animated smiley that appeared within the indicated choice (red or green marble) (Fig. 1). Agents' choices were also said to be presented in the time taken by the agent to make them. Confident response times were between 325 and 1000 ms. Unconfident response times were between 1800 and 2600 ms. Once all agents' responses were made, participants had 1300 ms to observe them before their own sample appeared. All information was available for 1250 ms before a choice could be made, cued by presentation of choice options.

The participant never saw agent samples and agents were said never to see the participant's sample. Agent choices were seen before personal samples because they were assumed to take longer to process. There was no time limit for participant decisions once the cue for a choice was presented. After a choice, confirmation of the choice was shown briefly and followed by a 1 s intertrial interval. The task took 23 min to complete, on average.

After instructions for the task, using visual displays, participants were quizzed by the experimenter on the meaning of each display item and asked what might be in the urn given various combinations of information. Instructions were repeated if necessary. It was critical before progression that participants understood that samples reflected “randomly mixed” urn contents (the urn shook on arrival for emphasis) and were replaced. No trial outcome was provided, so participants could not learn to associate outcomes with colors or agents. They were required to make the best use of the information available and told that a random urn would be selected and sampled to see if the subject was correct, with a potential reward of 30 Danish Kroner (DKK).

The experimenters actually programmed agents' choices, agents' confidences, and agents' deliberation times in a way that fully counterbalanced the factors of participants' samples, agents' choices, and the confidences of agents favoring either color. Agents were drawn with replacement from a set of 30 images from the Radboud face dataset in sepia tone (Langner et al., 2010). The use of fictional agents' responses was necessary to achieve sufficient statistical power and ensure that each subject experienced each possible combination of agents' choice, agents' confidence, and personal sample. This was explained to participants during debriefing.

The task was designed to study how participants decide across a range of combinations of sample, agents' choice, and agents' confidence. Accordingly, trials were generated by creating every possible combination, across trials, of: the number of agents favoring red (vs green), confidence for agents favoring red, confidence of agents favoring green, and participants' samples—all varying independently of one another. Therefore, there was no real-world optimal strategy (in the sense that the observations were not actually produced by simulating draws from an urn whose contents could then be predicted) and each participant was paid the maximum reward at the end of the study.

Confidence expressed by any agent was the same as confidence expressed by any other agent that predicted the same color. At the risk of this aspect appearing odd to participants, it enabled us to test our predictions with a clear choice × confidence factorial design that identified independent contributions of each information source to decisions and neural activity. It also kept the task length manageable for an fMRI study. During debriefing, no participant expressed doubt in the authenticity of the agents' responses.

There were 160 trials in the scanned task. For each number of agents choosing red (0–4), there were 32 trials. Sets of 32 were made up of 7 sample distributions (1–7 reds in the participant sample; samples with 4 reds were presented twice as often as the other combinations) evenly distributed over each available combination of agents' confidence. When 1–3 agents chose red, there were 4 confident/unconfident combinations. When 0 or 4 agents chose red, there were 2 options (high and low confidence in the agents' chosen color).
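The trial counts above can be checked with a short enumeration. This is only a sketch of the counterbalancing arithmetic, not the authors' stimulus code: the function name `build_trials`, the dictionary keys, and the assumption that cells are simply repeated to fill each set of 32 when only two confidence combinations exist are all illustrative.

```python
from itertools import product

# Reds in the participant's sample of eight: 1-7, with 4 listed twice
# because the text says that sample was presented twice as often.
SAMPLES = [1, 2, 3, 4, 4, 5, 6, 7]
LEVELS = ("low", "high")
TRIALS_PER_SET = 32  # trials for each number of agents choosing red


def build_trials():
    trials = []
    for n_red in range(5):  # 0-4 of the four agents choose red
        # Confidence exists only for a color that at least one agent chose.
        conf_red = LEVELS if n_red > 0 else (None,)
        conf_green = LEVELS if n_red < 4 else (None,)
        cells = list(product(SAMPLES, conf_red, conf_green))
        # Assumption: cells are repeated evenly to fill the set of 32.
        repeats = TRIALS_PER_SET // len(cells)
        for sample_reds, c_red, c_green in cells * repeats:
            trials.append({
                "n_red": n_red,
                "sample_reds": sample_reds,
                "conf_red": c_red,
                "conf_green": c_green,
            })
    return trials


trials = build_trials()
print(len(trials))
```

Under these assumptions, each of the five sets contributes 32 trials, and the doubled 4-red sample accounts for a quarter of all trials.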

Procedure

Before scanning, outside of the scanner, the participants guessed the next marble from a series of 28 urns and indicated confidence in these predictions based only on a sample of eight marbles from each (no agents' choices). Participants were told that their own predictions during this task would be used for the urn task of future participants. This provided participants with the agents' perspective. Participants were also given a 30 DKK “prize” for this task. Next, after receiving instructions, the participant practiced for 20 trials before beginning the task in the scanner. Finally, after scanning, the participant filled in questionnaires measuring self-monitoring and the tendency to conform outside of the laboratory before being debriefed.

Behavior analysis

We performed a factorial, mixed-effects logistic regression, which we refer to as the “component value behavioral model,” to analyze participant choices (dependent variable: per trial choice of red vs green) as a function of a number of candidate explanatory factors and their interactions. Choices were analyzed in R (version 3.2.3, R Development Core Team RRID:SCR_001905) using the lme4 package (version 1.1.12). The model allows for the measurement of influence of multiple sources of information on the probability of choosing red. This includes added value of a confident agent's choice over an unconfident one. The following predictors of participant choice of red (described below) were added to the regression: S: proportion of marbles in participant's sample that were red; O: proportion of agents choosing red; CR: confidence of agents choosing red (−0.5 low, 0.5 high); CG: confidence of agents choosing green (−0.5 low, 0.5 high); ORCR: proportion of agents choosing red × their confidence; and OGCG: proportion of agents choosing green × their confidence.

Each predictor was scaled to a range of 0–1 and centered around its mean for each subject. Common scaling allows for the estimated coefficients to be comparable across predictors, with each expressed in units of change in log odds of choice corresponding to a change from the minimal to the maximal value. Centering is recommended (Cohen, 2003) because it allows main effects to be interpreted as effects when all other effect variables are at their mean without affecting interaction terms. Interaction terms were calculated from the product of the scaled and centered predictors. CR and CG had no meaning when no agents chose that color and so these variables were set to their mean (i.e., 0) in these instances, giving them no statistical effect.
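The scaling and centering steps can be illustrated with a few lines of Python. This is a minimal sketch under the assumption that scaling is ordinary min-max scaling within a subject; the helper names `scale_and_center` and `interaction` and the toy trial values are hypothetical, and the actual analysis was run in R.

```python
def scale_and_center(values):
    """Min-max scale one subject's predictor to 0-1, then subtract its mean.

    Assumes the predictor takes at least two distinct values.
    """
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    mean = sum(scaled) / len(scaled)
    return [s - mean for s in scaled]


def interaction(a, b):
    """Interaction regressor: elementwise product of two prepared predictors."""
    return [x * y for x, y in zip(a, b)]


# Toy data for one subject (four trials).
reds_in_sample = [1, 4, 7, 4]   # reds among the eight sampled marbles
agents_red = [0, 2, 4, 2]       # agents choosing red, out of four
conf_red = [0.0, -0.5, 0.5, 0.5]  # set to 0 when no agent chose red

S = scale_and_center(reds_in_sample)
O = scale_and_center(agents_red)
OC = interaction(O, conf_red)
```

After this step a positive S or O means the evidence leans red and a negative value means it leans green, which is why separate predictors for green are not needed.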

The rationale and interpretation of these variables are as follows. S and O are two independent sources of information about the urn in the design; our major question is whether the confidence variables CR and CG modulate the effect of O. S codes the relative proportion of red (vs green) marbles in the participant's own sample and a positive regression coefficient for it captures an increasing tendency to choose red when more red marbles are observed. Because the fraction of green marbles is equal to 1 − S, a model with predictors for both effects (and an intercept) would be rank deficient; however, after mean centering, S codes the difference in red versus green marbles sampled (with a positive value indicating a preponderance of red and negative values the opposite), capturing both effects symmetrically. The same point holds for O. The effect of O captures an increasing tendency to choose red when the relative proportion of agents choosing red (minus those choosing green) is larger. After mean centering, O = −OG (proportion choosing green). Therefore, the interactions ORCR and OGCG capture the difference of this effect for red and green choices (respectively) between confident and unconfident agents. The main effects of confidence, CR and CG, capture the baseline effect of confident votes when the votes are evenly split and control for any overall change in choice tendency when red or green votes are confident independently of the number of votes.

Each effect contained both a fixed and a random effect term; that is, a mean slope at the group level, an error term allowing that slope to vary from subject to subject, and a full covariance matrix among the random slopes. We verified that all of these effects should be included in several ways. First, the significance tests for all regression coefficients (save the intercept) in Table 1 reject the null hypotheses corresponding to the nested models with any one effect being zero. We also present Akaike information criterion (AIC) comparisons for the full model against a set of submodels, each omitting one effect. Conversely, we also tested for progressive improvement in model fit to behavior (by AIC and a likelihood ratio test, χ², p < 0.05; Vazquez et al., 2010) as effects were added incrementally (in the order: intercept, S, O, CR and CG, ORCR and OGCG). Each significantly improved the model fit to behavior (detailed results not reported). An additional interaction term between S and O did not improve the fit.

Table 1. Fixed effects from mixed-effects logistic regression of each variable's effect on choice of red

Component value behavioral model

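The equation for the component value behavioral model is as follows:

$$\ln\frac{P_{ij}}{1 - P_{ij}} = \beta_0 + \beta_1 S_{ij} + \beta_2 O_{ij} + \beta_3 C_{R,ij} + \beta_4 C_{G,ij} + \beta_5 (O_R C_R)_{ij} + \beta_6 (O_G C_G)_{ij} + \epsilon$$

where P is the probability of choosing red for the ith trial for subject j, β represents regression coefficients including random effects for all coefficients, and ϵ is error.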

Individual difference measures

Task behavior

To measure individual differences of information influence, we measured the effect of each information source on the likelihood of each participant's decisions. Using the fixed and random effects of the “component value behavioral model,” we calculated the fit (via negative log likelihood) of this model to choices of individual participants. We then calculated the fit of four other models (using the same fitted parameters as the full model): one without S, one without O, one without both CR and CG, and one without both ORCR and OGCG. The effect of each model reduction on behavior likelihood (smaller model − full model) represented the impact of the subtracted variable(s) for that participant: S effect, O effect, C effect, and OC effect, respectively (Hampton et al., 2008). The intent of this procedure was to measure the effect of each variable in each subject, but using units of differential log likelihood, analogous to variance explained, rather than estimated regression coefficients. This is because the scaling of the latter can be erratic from subject to subject. These effects were entered as covariates in a separate group-level fMRI analysis and tested for relation to behavior outside of the laboratory.
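The model-reduction measure can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the functions `neg_log_lik` and `effect`, the dictionary layout, and the toy coefficients are assumptions, and a reduced model is taken to mean the full model's fitted coefficients with the dropped terms omitted from the linear predictor (no refitting).

```python
import math


def neg_log_lik(betas, rows, choices, drop=()):
    """Negative log likelihood of one participant's choices.

    betas   : dict of coefficients, including an "intercept" entry
    rows    : list of dicts holding the predictors for each trial
    choices : list of booleans, True when red was chosen
    drop    : names of predictors omitted from the linear predictor
    """
    total = 0.0
    for x, chose_red in zip(rows, choices):
        y = betas["intercept"] + sum(
            b * x[name]
            for name, b in betas.items()
            if name != "intercept" and name not in drop
        )
        p_red = 1.0 / (1.0 + math.exp(-y))
        total -= math.log(p_red if chose_red else 1.0 - p_red)
    return total


def effect(betas, rows, choices, drop):
    """Smaller model minus full model, in units of negative log likelihood."""
    return neg_log_lik(betas, rows, choices, drop) - neg_log_lik(betas, rows, choices)


# Toy participant: two trials, coefficients chosen only for illustration.
betas = {"intercept": 0.0, "S": 4.0, "O": 2.0}
rows = [{"S": 0.5, "O": 0.5}, {"S": -0.5, "O": -0.5}]
choices = [True, False]

s_effect = effect(betas, rows, choices, drop=("S",))
o_effect = effect(betas, rows, choices, drop=("O",))
```

A larger value means that removing the variable hurts the prediction of that participant's choices more, so in this toy case the sample matters more than the agents' choices.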

Choices in other contexts

This study was agnostic to whether findings are unique to social environments, but it was imperative to test the relevance of the results to social behavior outside of the laboratory given their social context. To test the relevance of task behavior, we used a self-report “conformity scale” (Mehrabian and Stefl, 1995). The conformity scale assesses weight placed on other people's choices relative to one's own information in a variety of social contexts. It requires participants to rate agreement or disagreement (on a scale from −4 to 4) with 11 statements referring to reliance on others for decisions. Higher scores mean greater reliance on others. The scale does not assess the use of the others' confidence, so to test its relationship to task behavior, we created a simplified model that contained only S and O (removing confidence from the model). Then, similar to above, we calculated the effects of S and O on model fit. We then tested the correlation between these effects (and difference between them) and conformity scale scores. Self-monitoring (Snyder, 1974) was also assessed measuring the tendency to adapt behavior in response to cues from an audience, but no effect was hypothesized given that there was no audience in this task.

Neuroimaging

fMRI procedure

Participants were instructed and given time to practice until the task was understood and responses could be made in <4 s. The task was presented with Presentation version 12 software (Neurobehavioral Systems RRID:SCR_002521). Scanning took place at the Danish Neuroscience Centre, Aarhus, Denmark on a 3 tesla Siemens Trio Scanner fitted with a 32-channel head coil. The urn task displays were back-projected and observed via a mirror and responses were collected from the right hand using a fiber-optic button box.

First-level fMRI analysis

All image analysis was performed with tools of FMRIB's Software Library (FSL RRID:SCR_002823) version 5.0.6 (Smith et al., 2004). Preparation of the EPI data used FSL defaults (FEAT 6.0). Volumes acquired during significant (>1 mm) head movements were replaced with neighboring volumes and events with modeled responses occurring during these acquisitions were removed from all models. Independent component analysis was used to visually identify and remove remaining artifacts in the data using MELODIC (Beckmann and Smith, 2004). General linear models were fit in prewhitened data space for each individual participant. Regressors and temporal derivatives were convolved with the default FSL hemodynamic response function (gamma function, delay: 6 s, SD: 3 s), and filtered by the same high-pass filter as the data. Single-participant results were transformed using nonlinear deformation algorithms into standard space (Montreal Neurological Institute, MNI152).

Component value fMRI model

In the “component value behavioral model” of choices, the dependent variable was choice of red (vs green) and variables were constructed as they related to the evidence in favor of choosing either color. In the fMRI analysis, the dependent measures were neural activity during these choices. However, rather than covarying with the tendency to choose options on the basis of a dimension such as color, BOLD activity is widely reported to vary along the dimension of the chosen versus unchosen option, with larger responses in medial PFC when the chosen (or about to be chosen) option is more likely to be correct, carries more reward, or is otherwise more strongly preferred (Tanaka et al., 2004; Daw et al., 2006; Kim et al., 2006; O'Doherty et al., 2007; Behrens et al., 2008; Wunderlich et al., 2010). Therefore, to test for analogous neural effects as on the behavior, we modeled the BOLD data using precisely the same set of explanatory variables as the behavioral analysis, but expressed with respect to evidence supporting the chosen versus unchosen options on each trial rather than the red versus green options. So when red is chosen, SA and OA (for evidence in “accordance” with the choice) are defined in terms of red samples and choices. When green is chosen, they are defined in terms of green samples and choices. In keeping with the literature on neural correlates of decision variables, we refer to activity correlated with these variables as reflecting their influence as components of “chosen value,” meaning the overall evidence supporting that the subject's choice was correct. However, we stress that, given the symmetry of our task (where choices are mutually exclusive and evidence supporting red opposes green), such activity can be understood as reflecting relative (chosen minus unchosen) value, as indeed has been reported for medial PFC (Boorman et al., 2013).

The set of variables in the fMRI analysis track those from the behavioral analysis, but coded in terms of accordance of an information source with the subject's choice. Each variable had a parametric weight of accordance that varied trial by trial. As in the behavioral model, variables were scaled to a range of 0:1 and mean centered (orthogonal to a constant term). Like the behavioral model, interaction regressors were calculated by multiplying respective main effect regressors after scaling and mean centering.

“A” and “D” subscripts represent accordance or discordance, respectively, of the information source with the choice. Just as in the behavioral analysis, after mean centering, SA codes the relative proportion of accordant (minus discordant) marbles and OA codes the relative proportion of accordant (minus discordant) other agents' choices; therefore, SD (= −SA) and OD (= −OA) are redundant. Also as with behavior, CA and CD are mutually independent and their interactions with OA (and OD = −OA) capture the difference of effect on BOLD from other agents' accordant (vs discordant) choices when those agents are confident compared with when they are unconfident. Also as with behavior, CA and CD had no meaning (and were not presented) when no agents chose the color, so these variables were set to the mean (i.e., 0) in these instances, having no statistical effect.
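The recoding from color-based to accordance-based variables can be sketched as below. This is a hypothetical helper, not the authors' pipeline: the function `to_accordance` and its dictionary keys are illustrative, and the inputs are taken to be raw proportions (0 to 1) and confidence codes (−0.5 or 0.5) before scaling and mean centering.

```python
def to_accordance(trial):
    """Re-express one trial's evidence relative to the chosen color.

    trial keys (all hypothetical names):
      S, O       : proportion of the sample / of the agents favoring red
      C_R, C_G   : confidence of agents choosing red / green
      chose_red  : True when the participant chose red
    """
    red = trial["chose_red"]
    return {
        # Proportion of marbles and agents that agree with the choice.
        "S_A": trial["S"] if red else 1.0 - trial["S"],
        "O_A": trial["O"] if red else 1.0 - trial["O"],
        # Confidence of agents who agree (A) or disagree (D) with the choice.
        "C_A": trial["C_R"] if red else trial["C_G"],
        "C_D": trial["C_G"] if red else trial["C_R"],
    }


example = to_accordance(
    {"S": 0.25, "O": 0.75, "C_R": 0.5, "C_G": -0.5, "chose_red": False}
)
```

In this example the participant chose green against most agents, so accordance from agents is low and the confident agents end up on the discordant side.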

Integrated value fMRI model

Preceding the full model fit described above, which aimed to decompose the influence of different sources of information on value-related BOLD correlates, we wished to verify the presence of activity related to overall value. To define this, we extracted the likelihood assigned by the component value behavioral model to each chosen option of each subject and took this choice probability as our estimate of the integrated chosen value. Such probabilities reflect the transform of the weighted sum of evidences, through the logistic softmax so as to range from 0 to 1, which provides a well normalized summary of the overall evidence in favor of a red versus green choice, or chosen minus unchosen value, which has been shown to track medial PFC activity (Daw and Doya, 2006).

Accordingly, the fitted model to each subject was as follows:

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 O_i + \beta_3 C_{R,i} + \beta_4 C_{G,i} + \beta_5 (O_R C_R)_i + \beta_6 (O_G C_G)_i + \epsilon$$

for each ith trial, where β represents fitted fixed- and random-effect parameters and ϵ is error. If the participant chooses red, then the choice likelihood is $1/(1 + e^{-Y_i})$. If the participant chooses green, then the choice likelihood is $1 - 1/(1 + e^{-Y_i})$. This vector of likelihoods was entered, along with a constant, to predict the neural correlates of integrated value (see Fig. 3A).
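The integrated chosen value can be computed per trial as sketched here, assuming the logistic transform is the standard inverse logit of the fitted linear predictor. The function names and the toy coefficients are illustrative only.

```python
import math


def linear_predictor(betas, x):
    """Y for one trial: intercept plus the weighted sum of the predictors."""
    return betas["intercept"] + sum(
        b * x[name] for name, b in betas.items() if name != "intercept"
    )


def chosen_likelihood(y, chose_red):
    """Model probability of the option the participant actually chose."""
    p_red = 1.0 / (1.0 + math.exp(-y))  # inverse logit of the log odds of red
    return p_red if chose_red else 1.0 - p_red


# Toy coefficients and one trial, for illustration only.
betas = {"intercept": 0.0, "S": 4.0, "O": 2.0}
trial = {"S": 0.25, "O": -0.5}
y = linear_predictor(betas, trial)  # 4*0.25 + 2*(-0.5) = 0.0
value_if_red = chosen_likelihood(y, chose_red=True)
value_if_green = chosen_likelihood(y, chose_red=False)
```

When the evidence is balanced, as in this toy trial, the chosen value is 0.5 whichever color is picked; it approaches 1 only when the choice agrees with strong evidence.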

Timing

The modeled fMRI event, for all regressors, was a 1250 ms period when agents' choices and personal samples were concurrently available before a choice could be made. During this period, both vmPFC sample-driven activation and agent-driven activation were at peak levels (see Fig. 4). A temporal derivative of each event was added to the model to control for slight errors of fit to the hemodynamic response. The interval between modeled events varied as determined by participant reaction time (M 1.2 s, SD 1.6 s) and variable time taken for choices of agents to appear (range 0.32–2.7 s). Eighty percent of participant reaction times were under 1.7 s and 90% under 3 s. This resulted in a positively skewed distribution of periods between modeled events (M 8.6 s, SD 1.7 s).

Spatial linear parametric analysis

To characterize anatomical patterns of activation associated with estimated value from SA (first-hand experience of sample), OA (agents' choices alone), and OACA (interaction effect of agents' confidence and agents' choices), we generated five anatomically defined spheres of 10 mm diameter linearly traversing an anatomical axis (Nicolle et al., 2012; Sul et al., 2015). This axis spanned the superior medial gyrus, dorsal to the gyrus rectus. It began in area 10 m/14 m (Ongür et al., 2003; Mackey and Petrides, 2010), at MNI coordinates [0 mm, 32 mm, −16 mm], within a region defined by meta-analysis as associated with choice value (Levy and Glimcher, 2012; Bartra et al., 2013). It ended toward the frontal pole at the approximate center of ventromedial area 10 (MNI coordinates [0 mm, 64 mm, 8 mm]), a region associated with abstract reasoning about mental states (Amodio and Frith, 2006; see Fig. 4). Mean percentage signal change was extracted from each sphere for each variable. Percentage signal change in response to S, O, and OC variables, as well as the contrasts of OC − S and OC − O (to test how the difference of responses to different information changes along this axis), were examined by mixed-effect linear regressions (lme4 with Kenward–Roger approximation of dof).
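The placement of the spheres can be sketched by interpolating between the two stated end points. Even spacing with the end points as the first and last centers is an assumption here, since the text only says the spheres traverse the axis linearly; the function name `sphere_centers` is illustrative.

```python
import math

START = (0.0, 32.0, -16.0)  # MNI mm, posterior end in area 10 m/14 m
END = (0.0, 64.0, 8.0)      # MNI mm, anterior end in ventromedial area 10
N_SPHERES = 5


def sphere_centers(start, end, n):
    """Evenly spaced points from start to end, end points included."""
    return [
        tuple(s + (e - s) * i / (n - 1) for s, e in zip(start, end))
        for i in range(n)
    ]


centers = sphere_centers(START, END, N_SPHERES)
spacing = math.dist(centers[0], centers[1])  # distance between neighbors, mm
```

Under this assumption neighboring centers are 10 mm apart, which equals the stated sphere diameter, so adjacent spheres would touch without overlapping.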

Finite impulse response sets were then fitted to each regressor in the component value fMRI model. These sets spanned a period of 19.6 s divided into 7 time bins, each the length of a TR (2.8 s). This period began at the point that agents' responses began to appear on the screen. Mean parameter estimates for each of SA, OA, and OACA were extracted in each sphere and plotted in Figure 4. To investigate the nature of the interaction between agents' choice and confidence, we did the same for a second GLM that included separate regressors for confident and unconfident agents supporting the participant's choice. Remaining regressors of the component value fMRI model were included as covariates (SA, CD, and ODCD). We plotted the time courses of the effects of confident agents and unconfident agents in Figure 5.
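The impulse response set described above can be pictured as a design matrix with one indicator column per time bin. The sketch below is not FSL's implementation; it assumes event onsets aligned to scan boundaries and an unweighted event, and the function name `fir_design` is hypothetical.

```python
TR = 2.8      # seconds per volume
N_BINS = 7    # time bins after event onset, one TR each


def fir_design(onset_scans, n_scans, n_bins=N_BINS):
    """One indicator column per post-onset bin.

    onset_scans : scan indices at which events start (assumed TR-aligned)
    n_scans     : total number of volumes in the run
    """
    design = [[0] * n_bins for _ in range(n_scans)]
    for onset in onset_scans:
        for b in range(n_bins):
            if onset + b < n_scans:
                design[onset + b][b] = 1
    return design


window_seconds = N_BINS * TR  # about 19.6 s, as stated in the text
X = fir_design(onset_scans=[0, 3], n_scans=12)
```

Fitting a coefficient to each column yields the response amplitude in each bin without assuming a response shape, which is what allows the time courses to be plotted.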

Individual differences (group level)

Individuals vary in their reliance on others for decision making and this is a reasonably stable trait (McGuire, 1967). We harnessed these differences to test relationships between behavioral influence of specific types of information, their effects on neural correlates of value, and social influence on choice in other contexts.

Neural effects × behavior.

The four measured influences on individual participant behavior (S effect, O effect, C effect, and OC effect) were added, mean-centered, as between-participant covariates in a new group-level component value fMRI model analysis. We asked whether the behavioral O effect, S effect, their difference, and the OC effect predicted the sensitivity of the brain to these influences, highlighted by fMRI effects of OA, SA, and OACA. This analysis used small volume correction within our a priori anatomical region of interest, the vmPFC. This mask was created as the conjunction of all Harvard–Oxford Atlas regions falling within vmPFC with a range along the x-axis of −18 mm to +18 mm and maximum height level of 18 mm to cover, with some room for error, regions anatomically specified as vmPFC in prior studies (Mackey and Petrides, 2010).

Conformity and self-monitoring.

To test the relationship of neural activity during the task to real-world behavior, we tested whether sensitivity to different sources of information during task execution (SA and OA) related to self-reported self-reliance for decision making (conformity scale) and self-monitoring behavior by adding these as covariates in a separate group-level analysis of the component value fMRI model. Age and gender were also added as covariates.

vmPFC effects were insensitive to individual differences, so we explored outside of this region. For the conformity comparison, we used small volume correction within regions identified to respond more to social conflict in proportion to individual differences of subsequent conformity in an independent study (Campbell-Meiklejohn et al., 2010). Social conflict being a driver of conformity in the brain (Wu et al., 2016), it was predicted that those rating high on self-reported conformity would experience greater neural responses to conflict with agents.

Although self-monitoring would not affect behavior in our task (with no live audience), individuals high in self-monitoring behavior might react differently to social cues. With no a priori region of interest for this contrast, we used a whole-brain cluster-corrected analysis.

Results

While scanned with fMRI, participants repeatedly predicted the color of marbles to be drawn from an urn by balancing several sources of evidence: their own sample of marbles from the urn, predictions described as made by four other agents based on their private samples from the same urn, and those agents' confidences. We sought to measure each subject's reliance on their own sample versus agents' choices and, in particular, to examine the extent to which the influence of the agents' choices was modulated by their confidence. The latter is a key measure in this setting for distinguishing behavior from simpler heuristics such as imitation. This is because an agent's confidence implies the quality of the agent's knowledge informing the choice and therefore the extent to which their opinion should be trusted. Indeed, a differential effect of confident agents can be captured in a Bayesian ideal-observer model (simulations not shown), which infers the proportion of marbles in the urn from the different sources of evidence by inferring and marginalizing over the agents' private samples, with confidence as a signal of more decisive evidence.
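The logic of such an ideal observer can be sketched in a few lines of Python. This is not the model described in the supplemental material. It is a minimal illustration under explicit assumptions: a uniform prior over the proportion of red marbles, agents who each see a private sample of `m` marbles, choose its majority color (coin flip on ties), and report high confidence when the majority exceeds `m/2` by at least `margin` marbles. The values of `m` and `margin` and all function names are hypothetical.

```python
import numpy as np
from math import comb

# Grid over theta, the proportion of red marbles in the urn (uniform prior).
THETA = np.linspace(0.005, 0.995, 199)

def binom_pmf(k, n, theta):
    """Probability of k red marbles in n draws, for each theta on the grid."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def agent_likelihood(choice, confident, m=8, margin=2, theta=THETA):
    """P(agent reports (choice, confident) | theta), marginalizing over the
    agent's unobserved private sample (hypothetical agent rule, see lead-in)."""
    like = np.zeros_like(theta)
    for r in range(m + 1):                  # r = reds in the private sample
        excess = r - m / 2
        if excess == 0:
            p_choice = 0.5                  # tie: either color equally likely
            is_conf = False
        else:
            p_choice = 1.0 if (excess > 0) == (choice == "red") else 0.0
            is_conf = abs(excess) >= margin
        if is_conf == confident:
            like += p_choice * binom_pmf(r, m, theta)
    return like

def p_red(own_red, own_n, agents, theta=THETA):
    """Posterior mean of theta given the own sample and agents' reports."""
    post = binom_pmf(own_red, own_n, theta)
    for choice, confident in agents:
        post = post * agent_likelihood(choice, confident, theta=theta)
    post = post / post.sum()
    return float((post * theta).sum())

# Ambiguous own sample (4 red of 8) plus one agent choosing red.
print(p_red(4, 8, []))
print(p_red(4, 8, [("red", False)]))
print(p_red(4, 8, [("red", True)]))
```

Under these assumptions, an identical choice shifts the posterior further when the agent is confident, because confidence is only reported after a more lopsided private sample. This is the differential effect of confident agents that the interaction term is intended to capture.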

Behavior

We used a multilevel mixed logistic regression to estimate parameters reflecting how the tendency to predict red was influenced by each information source that should affect it monotonically.

Each variable was established as a distinct contribution to choice behavior (Fig. 2, Table 1). At the group level, participants were more likely to predict red after samples with more red (vs green) marbles (S), more agents choosing red (vs green) (O), higher confidence of agents associated with red (CR) (regardless of how many agents chose that color), and lower confidence for green (CG). The magnitude of the coefficients for S and O indicates that the four agents (at average confidence) were relied on approximately the same amount as the sample of eight marbles, meaning that each of the agents' choices had an impact approximately twice that of observing a single marble in terms of information on this task. Table 1 additionally presents the different variable effects in terms of AIC and variance explained according to McFadden's pseudo R2.
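The per-item comparison in the preceding paragraph is simple arithmetic, sketched here in Python. The coefficient values are placeholders chosen to be equal, mirroring the statement that the four agents and the eight-marble sample were weighted about equally; they are not the fitted estimates from Table 1, and the function name is hypothetical.

```python
def per_item_ratio(beta_sample, beta_agents, n_marbles=8, n_agents=4):
    """Impact of one agent's choice relative to one observed marble.

    beta_sample and beta_agents are weights for the whole sample and the
    whole group of agents (illustrative values, not the fitted ones).
    """
    per_marble = beta_sample / n_marbles
    per_agent = beta_agents / n_agents
    return per_agent / per_marble

# Equal overall weights for the sample and the agents (placeholder values).
ratio = per_item_ratio(beta_sample=1.0, beta_agents=1.0)
print(ratio)
```

With equal overall weights, the ratio depends only on the number of items in each source: eight marbles against four agents gives each agent's choice twice the weight of a single marble.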

Behavioral effects. Probability (mean proportion) of choices for red as a function of: reds in sample (A), frequency and confidence of red choices by agents (B), and frequency and confidence of green choices by agents (C). Error bars indicate SE.

Our key question concerned the interaction that measures whether agents' choices have a differential effect when they are confident. Here, as hypothesized, we found that the effect of each agent's choice on participant decisions was greater when that agent was confident (ORCR, OGCG).

Conformity and self-monitoring scales

We next considered whether individual differences in this relatively stylized laboratory task tracked an index of real-world social behavior. Greater reliance on personal samples during the urn task corresponded to greater self-reliance for decision making in other social contexts (r = −0.461, p = 0.013, mean conformity scale score −3.7 SD 11.4; see Fig. 7A). Controlling for gender and age, the effect remained significant (r = −0.37, p = 0.049). There was no relationship between the conformity scale and the effect of agents on behavior (p > 0.4), most likely due to the complication of agent confidence. In contrast, self-monitoring relates specifically to behavioral adjustments to social cues to enhance reputation and would not be expected to show an effect in the absence of a live audience and this was confirmed by the data.

Neuroimaging

We pursued a parallel strategy in the brain to distinguish influence of each information type on neural correlates of estimated value in vmPFC. Because vmPFC activity is well known to track the degree to which the chosen option is correct or likely to be rewarded (“chosen value,” often relative to the unchosen option; Tanaka et al., 2004; Daw et al., 2006; Kim et al., 2006; O'Doherty et al., 2007; Behrens et al., 2008; Wunderlich et al., 2010), we redefined all of our explanatory variables in terms of information indicating the chosen (vs unchosen) option will be correct, rather than red versus green. Therefore, neural results are described with respect to “accordance” (evidence indicating that the subject's chosen option will be correct) versus “discordance” (evidence indicating that the subject's choice will be incorrect). Note that these are symmetric: evidence that the subjects' choice is correct is equivalently evidence that the other is incorrect.

Neuroimaging findings are reported as group-level cluster-corrected statistics with a cluster-forming voxel threshold of Z > 3 and cluster significance level of p < 0.05. Results are described in the format: [peak location in MNI coordinates (mm)], Z score of peak voxel, p-value of cluster, size of cluster in voxels (and minimum cluster size at p < 0.05). Scaling of variables should be considered when interpreting the results (i.e., a change from 0 to 4 agents' choices is scaled the same as a change from 0 to 8 marbles in a sample). All observed effects relate to when all other variables are at their mean. Mean effect contrasts all survived whole-brain cluster correction. Only activations within the mPFC are reported because this was the focus of our study. Between-subject analysis used small volume correction using independent masks of regions of interest (detailed in the Materials and Methods).

As with behavior, the key question concerned activity corresponding to the differential impact of agents' choices when they are confident versus unconfident, captured by the interaction (OACA). Such an interaction was observed for BOLD activity only in anterior mPFC (amPFC), primarily occupying ventromedial area 10, with a small extension through anterior 14 m (Mackey and Petrides, 2010; peaks [8 58 6] and [4 62 2], Zmax = 3.78, 85 voxels, min 79). This was the only activation for this contrast in the brain. Within this region, unconfident agent choices did little to influence the representation of value, whereas confident choices had a clear effect (see Fig. 5).

Effect time courses across vmPFC. A, Anatomically defined spherical regions of interest spanning the superior medial gyrus from area 14 m (Mackey and Petrides, 2010) to ventromedial area 10. B, Plots of mean effects of interest within the component value fMRI model across 7 time bins of 2.8 s (1 TR) each, beginning at the onset of agents' responses. Figure shows relative nonspecificity of SA and OA across the region, the increasing specificity of OACA toward dorsoanterior regions, and the sustained response to agents' choices OA into the response window of SA. Error bars indicate SE.

Effects of confident and unconfident agents across vmPFC. Mean effect of an adapted component value fMRI model that separates effects of confident agents' choices from unconfident agents' choices across the five spheres and time bins of Figure 4 are shown. Figure shows the increasing specificity of socially learned value that is contingent on agents' confidence toward dorsoanterior regions of vmPFC. Error bars indicate SE.

Individual differences

Neural effects × behavior

We next tested whether component neural correlates predict component influence on decision-making behavior using a between-subject analysis. At the group level, the effects of different information sources on choices, as estimated from behavior for each individual, were added as covariates to the component value fMRI model. Contrasts were made between them. Within a mask of vmPFC and at a conservative cluster forming threshold (Z > 3), we found that weight placed on personal samples (S effect) predicted a greater neural response to personal sample accordance with choice (SA) ([−6 54 −4], Zmax = 3.83, p = 0.03, 32 voxels, min 28). Dorsal and anterior to this, relatively more weight placed on the choices of agents (O effect − S effect) predicted a greater neural response to agents (OA) ([−4 60 10], Zmax = 4.02, p = 0.05, 28 voxels, min 28; Fig. 6).

Individual differences of task behavior. Effects of information sources on choices, for each individual, were added as covariates to the component value fMRI model analysis. vmPFC responses to SA are predicted by the influence of samples on choice behavior. amPFC responses to OA are predicted by the relative influence of agents (O effect − S effect) on choice behavior.

Social influence in other situations. A, Scatterplot showing relationship between the use of one's own sample during the urn task (arbitrary units) and self-reported reliance on agents outside of the laboratory. B, Between subjects, those more likely to conform in other social situations respond more to conflict with agents and less to conflict with personal samples within regions shown previously to predict conformity from social conflict responses (Campbell-Meiklejohn et al., 2010). C, Between subjects, rTPJ responds more to accordance of agents' choices in proportion to tendency to adapt behavior to social cues in real-world situations (self-monitoring).

Discussion

This study identified distinct behavioral and neural effects of distinct types of evidence used for making predictions. Behaviorally, each information type influenced decisions in a sensible direction. Confident agents had a greater influence on choices of participants, the hypothesized signature of complex and integrative use of information to evaluate options.

Collectively, increased reward expectancy from all evidence was tracked indiscriminately across a posterior to anterior axis in vmPFC, confirming earlier findings (Kim et al., 2006; Daw et al., 2006; Behrens et al., 2008; Wunderlich et al., 2010). However, this signal could also be decomposed according to the contribution of distinct components. Increased reward expectancy due to accordant samples and accordant agents' choices (regardless of confidence) increased activity across vmPFC. In contrast, value that varied with accordant agents' choices but conditional on their confidence was represented preferentially in ventromedial regions of area 10. This is a distinct neurobiological marker of assurance from another person's confidence.

Segregation of value representations over regions of systematically varying cytoarchitecture is consistent with heterogeneity of their computation. The anatomical locations of confidence-based value processes are likely the resolution of computational requirements and connectivity of supporting anatomy. The cytoarchitecture and connectivity of amPFC suggest that value estimates from confident agents specifically involve a form of higher-order cognition (Ongür et al., 2003; Mackey and Petrides, 2010; Sepulcre et al., 2010). Although influences of agents in the task can be modeled (as in our regression) simply as the counting of agents' choices weighted by their confidence, the weighting itself is a signature of the variable's metacognitive treatment.

The nature of this treatment and how it relates to anterior vmPFC is not yet known. One possibility is the use of inference. The urn task can be solved by using statistical inference to infer the contents of the urn given the observed evidence. This requires marginalizing over agents' private samples, the quality of which is inferred from an interaction of agents' choice and confidence. In principle, Bayes' rule can be invoked to infer (and then marginalize out) the probable state of the agents' samples. This could involve inferences of agents' knowledge from their behavior, which would account for BA 10 involvement (Frith and Frith, 2012). Such computations are likely part of a broader class of inferential influences on choice (Tolman, 1948; Hampton et al., 2006, 2008; Daw et al., 2011; Solway and Botvinick, 2012). In addition, BA 10 involvement could relate to the confluence of cognitive processes (Ramnani and Owen, 2004; Zaki, 2013) such as the integration of observed and nonobservable information (Burgess et al., 2007).

Both inferential and integrative processes are useful because they flexibly allow a naive observer to make decisions in new environments. In social contexts, they allow for adaptable valuation using shifting combinations of inferred knowledge, intentions, impulsivity, and optimism of others before deciding how to use their choices to inform one's own. From this perspective, our findings support a theory that the evolution of area 10 could relate to cognitive specialization that optimizes decision making in human cultures with the complexity of human expression (Povinelli and Preuss, 1995; Dunbar and Shultz, 2007).

It is less likely that differences between neural correlates of value relate to a difference of mathematical heuristics: these regions do not come up in fMRI contrasts of counting methods, addition, or multiplication (Piazza et al., 2002; Kawashima et al., 2004). Similarly, whereas previous studies have examined associative learning in social contexts (Behrens et al., 2008; Hampton et al., 2008; Burke et al., 2010), associative learning was precluded in the present study by the omission of trial outcomes and infrequent repetition of agents.

Previously, however, neuroscience has highlighted the relationship between mPFC and associative learning about the reliability of others. For example, Behrens et al. (2008) showed that mPFC is recruited to update beliefs about the accuracy of advice. Meshi et al. (2012) and Boorman et al. (2013) found that mPFC is related to using and evaluating expertise. It will be interesting to explore how inferential and associative learning about the reliability of others relate.

Nicolle et al. (2012) found that the ventral–dorsal vmPFC axis delineates action-relevant from action-irrelevant preferences. Subsequent work has shown that individual differences of value representation along a (ventral–dorsal) vmPFC axis also distinguish self- and other-regarding individuals (Sul et al., 2015). In the present study, we found that ventral area 10 does represent action-relevant preferences, but that this depends on the computations required. It may be the abstract nature of the calculation (counterfactual choice in the previous study, abstractly inferred or integrated information in the present) that determines amPFC involvement. Indeed, specificity across the posterior to anterior vmPFC axis may relate to similar anatomical distinctions between primary and secondary rewards (McNamee et al., 2013; Sescousse et al., 2013; Clithero and Rangel, 2014; Li et al., 2015).

dmPFC activity was negatively correlated with estimated value from various sources of information. As in most decision tasks, the value of the action and uncertainty/conflict associated with that value are inversely correlated, though not perfectly coupled. This is because, for example, it is more difficult to choose the correct option for choices with conflicting information. Given the literature on correlates of different decision variables in midline prefrontal cortex, the activity that we observed in anterior cingulate may reflect a form of conflict (Botvinick et al., 2004) or a cost–benefit process that accounts for both conflict and reduced likelihood of reward (Rushworth et al., 2011).

If a participant was more likely to be influenced by the personal sample, BOLD activity in central vmPFC varied more with the accordance of the sample with their imminent choice. Similarly, if a participant was more likely to be influenced by the choices of agents (relative to the personal sample) BOLD activity in amPFC varied more with accordance of agents' choices with their imminent choice. This suggests that the tendency to be influenced by an information source can be tracked, to an extent, by the sensitivity of that individual to supporting information from that source, within specific mPFC anatomy.

Social behavior outside of the laboratory (i.e., conformity) was inversely correlated with the influence of private evidence during the task. In vmPFC, conformity and self-monitoring in other contexts did not predict activations. However, exploration outside of this region revealed a link between neural responses to task influences and the tendency to be socially influenced in other contexts. The tendency to adopt the decisions of others outside of the laboratory predicted increased supramarginal gyrus responses to conflict with agents, just anterior to rTPJ (Fig. 7B). It also predicted increased dmPFC activity when going with one's own sample. These results replicate findings that the tendency to conform socially can be predicted by the neural response to social conflict in these regions (Campbell-Meiklejohn et al., 2010; for meta-analysis, see Wu et al., 2016). Activity within rTPJ that correlates with reward expectancy from observing agent choices (Fig. 7C) in high self-monitors may relate to findings that reward expectancies from observing choices of others can recruit theory-of-mind-like processes (Bruguier et al., 2010; De Martino et al., 2013). This occurs in the absence of a live audience, suggesting that self-monitoring relates to cognitive processes that are somewhat independent of an action's immediate social consequences. Stimulation of the rTPJ region enhances the tendency to take another's perspective (Santiesteban et al., 2012) and its activity has been shown previously to increase when determining the relevance of someone else's behavior for one's decision making (Carter et al., 2012). Although social interactions in the task were simulated and stylized, their relevance to real-world social settings is supported by these findings.

Conclusions

The finding that decision-related signals in vmPFC are segmented by the unique cognitive requisites of their computation is an important step in our understanding of the representation of value in the brain. Concurrently, our findings provide new neurobiological insight into the transmission of value information between individuals and the mechanism by which confidence expressed by others assures or discourages us in our decisions. Looking to the future, the findings present new questions as to how distinct valuation processes with separate neural mechanisms can be altered independently by experience, damage, and treatment.

Notes

Supplemental material, (i) a Bayesian inference framework for integrating personal information, choices of others and their confidence for decision-making with neural correlates of value, and (ii) whole brain activation tables are available at: https://dx.doi.org/10.6084/m9.figshare.4290776. Statistical parametric maps are also deposited with www.neurovault.org. This material has not been peer reviewed.

Footnotes

This work was funded by the Danish Council for Independent Research: Medical Sciences (Sapere Aude Grant to D.C.-M.) and the Lundbeck Foundation (D.C.-M. and C.F.). N.D. was supported by a Scholar Award from the James S. McDonnell Foundation.

The authors declare no competing financial interests.

Correspondence should be addressed to Dr. Daniel Campbell-Meiklejohn, School of Psychology, Pevensey Building, University of Sussex, Falmer BN1 9QH, United Kingdom. daniel.cm{at}sussex.ac.uk