Abstract

Dorsomedial prefrontal cortex (dmPFC), dorsolateral prefrontal cortex (dlPFC), and inferior frontal gyrus (IFG) have all been implicated in resolving decision conflict whether this conflict is generated by having to select between responses of similar value or by making selections following a reversal in reinforcement contingencies. However, work distinguishing their individual functional contributions remains preliminary. We used functional magnetic resonance imaging to delineate the functional role of these systems with regard to both forms of decision conflict. Within dmPFC and dlPFC, blood oxygen level-dependent responses increased in response to decision conflict regardless of whether the conflict occurred in the context of a reduction in the difference in relative value between objects, or an error following a reversal of reinforcement contingencies. Conjunction analysis confirmed that overlapping regions of dmPFC and dlPFC were activated by both forms of decision conflict. Unlike these regions, however, activity in IFG was not modulated by reductions in the relative value of available options. Moreover, although all three regions of prefrontal cortex showed enhanced activity to reversal errors, only dmPFC and dlPFC were also modulated by the magnitude of value change during the reversal. These data are interpreted with reference to models of dmPFC, dlPFC, and IFG functioning.

Introduction

Optimal decision-making requires selecting the response that yields the greatest value. In many situations, however, specific responses do not elicit constant levels of reward. Instead, the reward associated with a given response may fluctuate over time and across contexts, leading to changes in the level of decision conflict. Decision conflict is defined as the degree of competition between responses initiated by the stimulus (i.e., the relative extent to which a particular response is primed by a given stimulus). Core regions of prefrontal cortex implicated in this form of flexible decision making include dorsomedial prefrontal cortex (dmPFC), dorsolateral prefrontal cortex (dlPFC), and inferior frontal gyrus (IFG) (Ernst et al., 2004; Rogers et al., 2004). Considerable data suggest that at least two of these regions, dmPFC and dlPFC, are involved in resolving conflict in Stroop-like paradigms (Carter et al., 1999; Botvinick et al., 2004). In addition, dmPFC, dlPFC, and IFG have each been implicated in resolving decision conflict whether this conflict is generated by having to select between responses of similar value (Blair et al., 2006; Pochon et al., 2008) or by making selections following a reversal in reinforcement contingencies (Cools et al., 2002; O'Doherty et al., 2003; Remijnse et al., 2005).

However, selecting between two options of similar value and reversal learning potentially embody different forms of decision conflict: whereas the former involves conflict generated by a small reward differential, the latter involves overruling a previously learned response. These key differences raise the possibility that regions of prefrontal cortex make distinct contributions to resolving decision conflict. This appears particularly likely given accounts stressing functional specialization within dissociable regions of frontal cortex.

Thus, it has been argued that dmPFC is implicated in response conflict detection (cf. Carter et al., 1999; Botvinick et al., 2004), error detection (Holroyd et al., 2004), or action-reinforcement learning (Rushworth et al., 2007). In contrast, dlPFC is implicated in maintaining stimulus information against interference from competing nontarget stimuli (Casey et al., 2001), selecting context-appropriate representations (Liu et al., 2006; Hester et al., 2007), or classifying representations with respect to a criterion (Han et al., 2009). Last, IFG is implicated in the selection of appropriate motor responses (Rushworth et al., 2005; Budhani et al., 2007), processing punishment information (O'Doherty et al., 2001), or the inhibition of prepotent responses (Casey et al., 2001). However, the extent of overlap between neural regions activated by decision conflict during dynamic changes in relative reward value and decision conflict generated by a reversal of value remains unclear.

The current study tests two contrasting hypotheses regarding the functional contribution of dmPFC, dlPFC, and IFG in resolving decision conflict. The first suggests that all three regions are implicated in both forms of decision conflict, with dmPFC detecting conflict (cf. Carter et al., 1999; Botvinick et al., 2004), dlPFC enhancing attention to relevant stimulus features (MacDonald et al., 2000), and IFG selecting an appropriate motor response (Rushworth et al., 2005; Budhani et al., 2007). This hypothesis implies significant conjunction of activity across these regions for decision conflict whether it is generated by changes in reward differential or by reversal learning. A second possibility suggests greater functional specialization, with recruitment of dmPFC and dlPFC during both forms of decision conflict but recruitment of IFG only when a suboptimal response must change. We tested these contrasting hypotheses using a novel instrumental learning task that included both forms of decision conflict: (1) conflict generated by dynamic changes in the reward differentials associated with available choices over time (either increasing or decreasing reward differentials); and (2) conflict generated by reversing the reward contingencies.

Materials and Methods

Participants.

Twenty-two subjects participated in the study. Three subjects were excluded due to technical difficulties (user-interface, scanner, or computer failure) and their data were not analyzed, leaving 19 participants in total (10 female and 9 male) aged 21–50 years (mean = 29.6; SD = 8.6). Three subjects did not show evidence of reversal learning (performance was 2 SDs or more below the mean) and were excluded from the functional magnetic resonance imaging (fMRI) analysis, leaving 16 participants (8 female; aged 21–50 years; mean = 29.19; SD = 8.53). All participants underwent a medical exam performed by a physician, were free of psychotropic medication, and were screened with the Structured Clinical Interview for DSM-IV (First et al., 1997) to exclude those with a history of psychiatric or neurological disorder. Before proceeding to the fMRI scanner, all participants completed an abbreviated practice version of the task consisting of 48 trials to ensure that they understood the objectives of the task and were proficient in their responding.

Experimental task.

We developed a novel object discrimination task in which participants made operant responses for positive reinforcement (token dollar amounts). The objective of the task was to maximize monetary gains while response options underwent graded changes in reinforcement. On each trial, subjects selected one object (fractal image) within a pair displayed against a white background. Subjects were told that the values associated with the objects would be changing throughout the task and that it may be necessary to alter their responding at any time. Following each selection, subjects received reinforcement (e.g., “you win $55”). The reward differed depending on their accuracy and the preassigned value of the selected image. Each trial lasted 3000 ms and involved the presentation of a choice screen depicting the two objects (1750 ms), a feedback display (1000 ms), and a fixation cross (250 ms). In addition, 32 fixation trials (also of 3000 ms duration) were presented per run to serve as a baseline. Subjects responded during the choice screen by making left/right button presses on keypads held in both hands. Within each pair, object positions were counterbalanced so that they appeared equally on the left and right side of the screen. The task was programmed in E-Studio (Psychology Software Tools, 2002).

In the first three blocks, we manipulated the reward differential between four pairs of objects (each object pair was presented eight times per block). For two of the pairs, the decreasing reward differential pairs, the reward differential between the correct and incorrect object was initially high (e.g., $95 for correct responses vs $5 for incorrect responses). The correct response then steadily decreased in value from block 1 to block 3 (block 1: $95 vs $5; block 2: $75 vs $25; block 3: $55 vs $45); see Figure 1. For the other two pairs, the increasing reward differential pairs, the reward differential between the correct and incorrect object was initially low (e.g., $55 for correct responses vs $45 for incorrect responses). The correct response then steadily increased in value across blocks 2 and 3 (block 1: $55 vs $45; block 2: $75 vs $25; block 3: $95 vs $5).

Figure 1. An example of the stimuli and reinforcement contingencies used to determine the impact of varying reward differential on BOLD responding (blocks 1–3). In a given run, four distinct object pairs were used. Two object pairs corresponded to the decreasing reward differential condition (left); the reward differential between these objects steadily declined across the blocks. Two object pairs corresponded to the increasing reward differential condition (right); the reward differential for these objects steadily increased across the blocks. In the fourth row, sample stimuli and reinforcement contingencies used to determine the impact of varying the reward differential of reversals on the BOLD response are depicted (block 4). In a given run, two stimuli underwent a reversal in contingencies (shown in the fourth row of columns 1 and 3), and two objects retained their original values (shown in the fourth row of columns 2 and 4). Of the two stimuli that reversed contingencies, one involved a low-differential reversal (fourth row, column 1), and the second underwent a high-differential reversal (fourth row, column 3). Of the two control pairs that retained their original values, one involved a low reward differential between objects within a pair (fourth row, column 2), and the other involved a high reward differential (fourth row, column 4).

Low and high-differential reversals (block 4).

In the fourth block, two of the four object pairs reversed reward values. One of the pairs, the low-differential reversal pair, involved the reversal of two objects of similar values (e.g., the object previously valued at $55 became the “incorrect” object valued at $45; conversely, the object previously valued at $45 became the “correct” object valued at $55). The other reversed pair, the high-differential reversal pair, involved the reversal of two objects with dissimilar values (e.g., the object previously valued at $95 became the “incorrect” object valued at $5, and vice versa). Each reversal pair had a corresponding control pair that retained the same value across blocks 3 and 4. There were therefore a total of four conditions in block 4: low-differential reversal, low-differential control, high-differential reversal, and high-differential control.

Subjects completed four 8 min runs of the task. Each run involved new stimuli with unique reward values to ensure that participants learned new reinforcement values and contingencies each time. As a consequence, participants received a total of 64 trials for each of the variable reward differential conditions (blocks 1–3), and 32 trials for each of the differential reversal conditions (block 4).

fMRI analysis.

Data were analyzed within the framework of the general linear model using the Analysis of Functional Neuroimages program (AFNI) (Cox, 1996). Motion correction was performed by registering all volumes in the EPI dataset to a volume collected shortly before the high-resolution anatomical dataset was acquired. EPI datasets were spatially smoothed (isotropic 6 mm Gaussian kernel) and converted into percentage signal change from the mean to reduce the effect of anatomical variability among the individual maps in generating group maps. The time series data were normalized by dividing the signal intensity of a voxel at each time point by the mean signal intensity of that voxel for each run, and multiplying the result by 100. Resultant regression coefficients represented a percentage signal change from the mean. Regressors depicting each of the trial types were created by convolving the train of stimulus events with a γ-variate hemodynamic response function to account for the slow hemodynamic response. The hemodynamic response function was modeled across the trial. To control for voxelwise correlated drift, a baseline plus linear drift and quadratic trend were modeled in each voxel's time series. Voxelwise group analyses involved transforming single-subject β coefficients into the standard coordinate space of Talairach and Tournoux, followed by a statistical analysis of the functional data. Separate analyses were conducted for each of the two experimental phases as described below.
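The normalization and regressor-construction steps described above can be sketched in Python. This is a simplified illustration rather than the AFNI implementation; the gamma-variate shape and scale parameters, the TR, and the function names are assumptions made for the example.

```python
import numpy as np

def percent_signal_change(ts):
    """Scale a voxel time series to percentage signal change from its run mean."""
    return 100.0 * ts / ts.mean()

def gamma_hrf(tr, duration=16.0, shape=6.0, scale=1.0):
    """Gamma-variate hemodynamic response function sampled at the TR.
    Peaks near (shape - 1) * scale = 5 s; parameter values are illustrative,
    not AFNI's defaults."""
    t = np.arange(0.0, duration, tr)
    h = t ** (shape - 1) * np.exp(-t / scale)
    return h / h.sum()

def build_regressor(onsets_sec, tr, n_vols, hrf):
    """Convolve a train of stimulus events with the HRF to form one
    trial-type regressor for the general linear model."""
    stim = np.zeros(n_vols)
    stim[(np.asarray(onsets_sec) / tr).astype(int)] = 1.0
    return np.convolve(stim, hrf)[:n_vols]
```

One such regressor would be built per trial type, with the baseline, linear drift, and quadratic trend entering the design matrix as additional nuisance columns.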

The second regressor model concentrated on the BOLD response associated with correct responding after varying the reward differential of reversals (block 4). This low- and high-differential reversal analysis involved four separate regressors depicted in Figure 1 (block 4): (1) low-differential reversal (pair 1); (2) low-differential control (nonreversal, pair 2); (3) high-differential reversal (pair 3); and (4) high-differential control (nonreversal, pair 4). Using a similar strategy as that used in previous studies (O'Doherty et al., 2001; Budhani et al., 2007; Mitchell et al., 2008), we contrasted reversal errors with all correct control condition responses to identify regions involved in reversal learning. The percentage signal change relative to the mean within regions of interest (ROIs) was then examined across conditions to determine whether the BOLD response within these regions varied according to the magnitude of change.

Last, a conjunction analysis was conducted to determine the extent to which neural regions sensitive to our reward differential manipulation overlapped with those neural regions that showed a differential BOLD response to reversal errors. We created a mask of the voxels that were active in each of our statistical maps of interest ([reversal errors vs correct control] and [reward differential × block interaction]) using a common threshold value for each map (p < 0.005). Using the 3dCalc function (http://afni.nimh.nih.gov/sscc/gangc/ConjAna.html) in AFNI (Cox, 1996), we then identified voxels that were modulated as a function of reward differentials, reversal errors, or both.
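The mask logic described here can be sketched with NumPy as a simplified stand-in for the 3dCalc step. The threshold arguments are the statistic cutoffs corresponding to p < 0.005 in each map and are assumed to be supplied by the caller.

```python
import numpy as np

def conjunction_code(map_a, map_b, thresh_a, thresh_b):
    """Label each voxel by which thresholded statistical map(s) it survives:
    0 = neither, 1 = map A only, 2 = map B only, 3 = conjunction (both).
    Mirrors the step-function coding commonly used with AFNI's 3dcalc."""
    a = (np.abs(map_a) >= thresh_a).astype(int)
    b = (np.abs(map_b) >= thresh_b).astype(int)
    return a + 2 * b
```

Voxels coded 3 correspond to the conjunction of interest here: regions modulated both by the reward differential × block interaction and by reversal errors.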

Behavioral results from the experimental task reveal a significant reward differential × block interaction for error rates (y-axis = proportion correct). Across blocks 2 and 3, error rates decreased as the reward differential between the two options increased. Conversely, error rates increased when the reward differential between the two options decreased (p < 0.01). An ANOVA conducted on the reversal learning phase of the task revealed a significant effect of reward differential; more errors were committed to low relative to high reward differential pairs. Participants also made significantly more errors to high-differential reversal pairs relative to high-differential control pairs (p < 0.005). Error bars represent the SEM.

Impact of reward differential on reversal

Our second ANOVA examined the reversal of high relative to low reward differential pairs (block 4). A 2 (reward differential: low or high) × 2 (reversal: reversing or nonreversing) ANOVA was conducted on the error data. This revealed a significant main effect of reward differential; subjects made significantly more errors to the low (block 4: $45 vs $55) relative to the high reward differential (block 4: $5 vs $95) pairs (F(1,15) = 18.70; p < 0.005). The main effect of reversal was not significant (F(1,15) = 0.19; ns). However, there was a significant reward differential × reversal interaction (F(1,15) = 16.84; p < 0.005). Subjects made significantly more errors to high-differential reversal stimuli than to high-differential control pairs (those stimuli that had not changed value) (t = 3.84; p < 0.005). In contrast, there was no significant difference between low-differential reversal and low-differential control condition selections (t = 1.65; p > 0.10). Finally, subjects made significantly more errors in the low- versus high-differential control condition (t = 5.71; p < 0.001), but not for the low- versus high-differential reversals (t = 1.22; p > 0.20).
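Because both factors in this 2 × 2 within-subject design have a single degree of freedom, the interaction F reported above is equivalent to the square of a paired t test on the per-subject difference of differences. The sketch below illustrates that computation; the input arrays are hypothetical per-subject error rates, not the study's data.

```python
import numpy as np
from scipy import stats

def interaction_t(low_rev, low_ctl, high_rev, high_ctl):
    """Reward differential x reversal interaction in a 2x2 within-subject
    design, computed as a paired t test on the difference of differences.
    For a 1-df effect, the ANOVA interaction F equals this t squared."""
    contrast = (low_rev - low_ctl) - (high_rev - high_ctl)
    n = len(contrast)
    return contrast.mean() / (contrast.std(ddof=1) / np.sqrt(n))
```

The follow-up comparisons in this paragraph (e.g., high-differential reversal vs high-differential control) are ordinary paired t tests on the corresponding pairs of condition means.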

Both right (top) and left (bottom) IFG showed significantly enhanced activity to reversal errors relative to correct responses (p < 0.05, corrected). However, unlike dorsal regions of prefrontal cortex, activity in IFG was not significantly modulated by the size of the change in reinforcement. Error bars represent the SEM.

Because our low- and high-differential reversals also differed in reward magnitude, the observed differences in percentage signal change to high versus low-differential reversals may reflect choice properties specific to reward magnitude independent of whether a reversal in contingencies had occurred. To control for this potential confound, we compared percentage signal change within our ROIs during low- and high-differential reversals relative to their respective control conditions (those trials that were of equal value but had not undergone a reversal in contingencies). Greater signal change was observed to high-differential reversals versus high-differential control conditions in dmPFC (t = 2.52; p < 0.05), with a marginally significant effect in left dlPFC (t = 2.02; p = 0.06). In contrast, signal in left or right IFG did not distinguish between high-differential reversals and high-differential control trials during correct selections (t < 1; ns). These results are consistent with the hypothesis that the BOLD response within dmPFC and dlPFC was modulated by the size of the differential reversal during correct responding rather than by the magnitude of reinforcement alone.

Conjunction analysis

Our analysis showed that there was a differential BOLD response in dmPFC and dlPFC for both reward differential and magnitude of reward reversal. To test the hypothesis that similar regions of dmPFC and dlPFC were involved in decision-making conflict generated by both reward differential and reversal learning, we conducted a conjunction analysis. Shown in Figure 6, the conjunction of [reversal errors vs correct control] and [reward differential × block interaction] (p < 0.005) revealed common areas of activity in both dmPFC (1242 mm3) and dlPFC (270 mm3).

The results of a conjunction analysis (p < 0.005) demonstrate that overlapping areas of dlPFC (top left) and dmPFC (bottom right) exist that respond to decision conflict whether it is conflict generated by low reward differentials or conflict encountered during reversal learning.

Discussion

The goal of this study was to delineate the function of dmPFC, dlPFC, and IFG in the context of two distinct sources of decision conflict: conflict generated through decreasing reward differentials between available choices and conflict generated by reversing reinforcement contingencies. We observed that activity in dmPFC and dlPFC, but not IFG, was greater when participants selected between two options that were similar in value (had a low reward differential) relative to options that were dissimilar in value (had a high reward differential). Notably, this effect of reward differential was not restricted to the initial learning of stimulus–response associations; activity in these regions changed to reflect the updated levels of decision conflict across trials. Specifically, activity in dmPFC and dlPFC decreased when the reward differential between two options grew larger, and increased when it grew smaller. Reversal errors were also associated with activity in dmPFC and dlPFC. Moreover, greater activity in dmPFC and dlPFC was observed to correct reversals in responding when the stimuli underwent larger relative to smaller changes in value. Importantly, a conjunction analysis revealed that the areas of dmPFC and dlPFC activated by decision conflict generated by diminishing reward differentials were highly overlapping with those areas activated by reversal errors. In contrast, while IFG showed clear activation to the decision conflict induced by reversal errors, there was no indication of IFG activity to decision conflict generated by smaller reward differentials; even the region of IFG identified as responding to reversal errors showed no significant response to this form of conflict. Finally, IFG did not distinguish between reversals in reward that involved larger relative to smaller changes in value.
In summary, whereas dmPFC and dlPFC were responsive to both forms of decision conflict examined, IFG activity was enhanced only when a suboptimal response had been made and an alternative selection was warranted.

According to influential conflict monitoring accounts, dmPFC signals instances of response conflict, triggering compensatory adjustments in cognitive control via activity in dlPFC (Carter et al., 1998, 1999; Botvinick et al., 2004). Subsequent studies have supported the view that cognitive control is achieved by amplifying cortical responses to task-relevant representations (Egner and Hirsch, 2005). There have been suggestions that the functional neuroanatomy of conflict resolution may differ depending on whether the source of conflict is generated by “stimulus-based conflict” or “response-based conflict” (Liu et al., 2006; Egner et al., 2007). While this may be correct, the current data, particularly the conjunction analysis, indicate that the resolution of conflict generated by choosing between options of similar value and the resolution of conflict generated by reversing reinforcement contingencies involve overlapping areas of dmPFC and dlPFC. Moreover, it is notable that the regions of dmPFC and dlPFC are proximal to those identified in previous decision conflict studies (Blair et al., 2006; Marsh et al., 2007; Pochon et al., 2008) as well as those identified in previous work with tasks such as the Stroop (Botvinick et al., 2004).

There are three features of the current results with respect to dmPFC and dlPFC that are worthy of note. First, in line with previous work (Blair et al., 2006; Pochon et al., 2008), both dmPFC and dlPFC showed increased activity when choosing between response options that were similar in value (had a small reward differential) relative to options that were dissimilar in value (had a large reward differential). However, strikingly, activity within dmPFC and dlPFC was modulated by this reward differential even when the objects to be chosen between remained the same. For objects whose reward differential increased across blocks 2 and 3, activity in dmPFC and dlPFC decreased. However, for objects whose reward differential decreased across blocks 2 and 3, activity in dmPFC and dlPFC increased. This was true in both cases despite the fact that the correct response had not changed. We interpret these data in terms of response conflict resolution (cf. Carter et al., 1999; Botvinick et al., 2004); as the differentials decreased, there was increased competition between the two response tendencies (respond toward pattern 1 vs pattern 2) because the reward value differentiating these response tendencies was reduced. While dmPFC appears implicated in action-reinforcement learning (Kennerley et al., 2006; Rushworth et al., 2007), we believe the current data support its potentially additional role in decision conflict resolution. Activity in these regions was greatest in the first block, perhaps reflecting action-reinforcement learning. However, activity in these regions was also modulated by reward differential in blocks 2–3 when the correct response had not changed: it decreased for stimuli whose reward differentials increased across blocks 2 and 3, and increased for stimuli whose reward differentials decreased across these blocks.
We believe our data are compatible with the view that dmPFC's role in conflict resolution leads to the recruitment of dlPFC during decision making, such that attention to relevant stimulus features is enhanced as a function of the level of conflict (cf. Liu et al., 2006; Hester et al., 2007).

Second, in block 2, it is notable that although the reward differential between both increasing and decreasing pairs was equal, there was significantly greater activity to the increasing relative to the decreasing pairs. This finding is consistent with suggestions that dmPFC plays a key role in integrating information about actions and outcomes over time to guide responding (Kennerley et al., 2006; Walton et al., 2007). However, the data are also compatible with the idea that superior learning associated with the decreasing (high reward differential) pairs in block 1 meant less response conflict for these pairs relative to the increasing (low reward differential) pairs in block 2 (cf. Carter et al., 1999; Botvinick et al., 2004).

Third, while subjects were no more likely to make errors for large reward differential reversals relative to small reward differential reversals, both dmPFC and dlPFC showed greater activity for larger differential reversals. This pattern of activity may reflect the higher volatility associated with a greater discrepancy between past and current reward values in the larger versus smaller differential reversals (Rushworth and Behrens, 2008). In this situation, there should be rapid behavioral adjustment on the basis of the reversed reward outcomes, particularly for the high-differential reversals where the cost of error is particularly high. During the reversal component of the task, we believe that there is a conflict between the previous optimal response tendency (“respond toward pattern 1”) and the new optimal response tendency (“respond toward pattern 2”). Of course, it is notable that dmPFC has been implicated in responding to errors (Holroyd et al., 2004). Following previous literature, we identified these regions of dmPFC and dlPFC with the contrast reversal errors versus correct responses (O'Doherty et al., 2001; Budhani et al., 2007; Mitchell et al., 2008). However, activity was significantly greater within both regions for correct responses to high-differential reversal pairs relative to low-differential reversal pairs. Furthermore, whereas these two conditions did not differ significantly in error rate, they do differ in terms of response conflict: the high-differential reversal involves reversing the tendency to “respond toward the previously high pattern 1” and instead “respond toward the previously low pattern 2.”

The current results have implications for attempts to distinguish the functional contribution to decision making of IFG relative to dmPFC and dlPFC. At present, there are two prevailing views about how these regions interact to resolve decision conflict. The first suggests that whereas dmPFC is involved in conflict detection (cf. Carter et al., 1999; Botvinick et al., 2004) and dlPFC in resolving conflict through attentional means (MacDonald et al., 2000), IFG is involved in selecting an appropriate motor response (Rushworth et al., 2005; Budhani et al., 2007). This hypothesis implies significant conjunction of activity across these regions for decision conflict whether it is generated by reward differentials or reversals of reward. A second possibility suggests greater functional specialization with recruitment of dmPFC and dlPFC to both forms of decision conflict, but recruitment of IFG only when a suboptimal response must change. Our results support the latter suggestion; whereas activity in overlapping areas of dmPFC and dlPFC reflected conflict generated by reward differentials or a reversal in contingencies, IFG activity was enhanced only when a suboptimal response had been made and a change must take place on a subsequent trial. To our knowledge, this is the first neuroimaging study to dissociate the functional contribution to reversal learning of IFG from that of dlPFC and dmPFC.

In the current study, expected values were tied to specific stimuli rather than specific motor responses (either left or right button presses could be correct depending on the spatial location of stimuli). As a consequence, we cannot rule out the possibility that IFG may be sensitive to the reinforcement differential between specific actions. In short, if we had manipulated the reward differential between two actions with respect to a single stimulus (e.g., the left vs right button) rather than the reward differential between two stimuli, we might have seen modulation of IFG rather than dlPFC. Future work that varies the reward value associated with specific actions rather than stimuli (cf. Tanaka et al., 2008; Gläscher et al., 2009) is needed to address this unresolved issue. However, the current results do suggest that, at least in the context of the object discrimination paradigm used here, the production of a suboptimal response is necessary for IFG activity. Moreover, it is important to note that while IFG showed greater activity to reversal errors relative to correct responses, activity in this region, unlike dmPFC and dlPFC, was not influenced by whether the reversal involved a high or low differential, a manipulation that should be related to response competition.

Conclusions

In this study we demonstrated the complementary yet dissociable roles that dmPFC, dlPFC, and IFG play with respect to decision conflict. dmPFC and dlPFC were both responsive to increases in decision conflict regardless of whether it was engendered by choosing between two objects of similar value or encountered during reversal learning. Conjunction analysis showed that overlapping areas within these regions were activated by each form of decision conflict. Moreover, whereas dmPFC and dlPFC were responsive to conflict as a function of the current and past stimulus–response values, IFG showed enhanced responding only when a suboptimal response had been made. Indeed, we found no evidence that activity in IFG was modulated by either current or past reward differentials. These results provide additional insight into the distinct functional contributions made by dorsal versus ventral regions of prefrontal cortex to decision making.

Footnotes

This research was conducted at the National Institutes of Health in Bethesda, Maryland, and was supported by the Intramural Research Program of the National Institutes of Health–National Institute of Mental Health and by a grant to D.G.V.M. from the Natural Sciences and Engineering Research Council of Canada.

Correspondence should be addressed to Dr. Derek Mitchell, Departments of Psychiatry and Anatomy & Cell Biology, Schulich School of Medicine & Dentistry, The University of Western Ontario, 339 Windermere Road, London, ON N6A 5A5, Canada. E-mail: dmitch8@uwo.ca