3Department of Neurology, School of Medicine, Boston University, Boston, MA, United States

The balance and smooth shift between flexible, goal-directed behaviors and repetitive, habitual actions are critical to optimal performance of behavioral tasks. The striatum plays an essential role in the control of goal-directed versus habitual behaviors through a rich interplay of numerous neurotransmitters and neuromodulators that modify the input, processing and output functions of the striatum. The adenosine receptors (namely A2AR and A1R), with their high expression in the striatum and their ability to interact with and integrate dopamine, glutamate and cannabinoid signals in the striatum, may represent novel therapeutic targets for modulating instrumental behavior. In this study, we examined the effects of pharmacological blockade of the A2ARs and A1Rs on goal-directed versus habitual behaviors in different information processing phases of instrumental learning using a satiety-based instrumental behavior procedure. We found that the A2AR antagonist acts at the coding, consolidation and expression phases of instrumental learning to modulate animals’ sensitivity to goal-directed valuation without modifying action-outcome contingency. However, pharmacological blockade and genetic knockout of A1Rs did not affect acquisition or sensitivity to goal-valuation of instrumental behavior. These findings provide pharmacological evidence for a potential therapeutic strategy to control abnormal instrumental behaviors associated with drug addiction and obsessive-compulsive disorder by targeting the A2AR.

Introduction

Goal-directed and habitual behaviors are crucial adaptive behaviors in daily life. Goal-directed behavior evaluates actions prospectively and can flexibly adjust actions in response to environmental changes, but this comes at the cost of more cognitive resources. By contrast, habitual behavior usually develops after repeated overtraining for days and represents automatic responses elicited by external or internal triggers during the performance of routine procedures with a lower cognitive load (Dolan and Dayan, 2013). These two behavioral processes can develop in parallel or sequentially and can also reciprocally compete with each other for behavioral control (Yin and Knowlton, 2006; Balleine and O’Doherty, 2010; Kim and Hikosaka, 2015). The balance between flexible goal-directed actions and repetitive habitual behaviors has an essential role in achieving optimal performance of behavioral tasks. Dysregulation of goal-directed versus habitual behaviors is considered a potential mechanism underlying relapse in drug addiction (Ostlund and Balleine, 2008) and obsessive-compulsive disorder (Gillan et al., 2011; Robbins et al., 2012; Burguiere et al., 2015), and may contribute to executive dysfunction in Parkinson’s (Redgrave et al., 2010; de Wit et al., 2011) and Huntington’s disease patients (Lawrence et al., 1998).

The striatum plays an essential role in the control of goal-directed versus habitual behaviors (Yin and Knowlton, 2006; Graybiel and Grafton, 2015; Kim and Hikosaka, 2015). The dorsal medial striatum (DMS) and its connecting orbitofrontal cortex (OFC) are critical for goal-directed valuation (Gremel and Costa, 2013), while the dorsal lateral striatum (DLS) and its connecting infralimbic cortex act as dual operators for habitual behavioral control (Smith and Graybiel, 2013a,b). Additionally, the nucleus accumbens (NAc)-ventral pallidum (VP) pathway is necessary for goal-directed valuation, as inactivation of the NAc-VP pathway impairs predictive learning (Leung and Balleine, 2013). Furthermore, nigro-striatal dopamine signaling acts as a prediction error and motivational signal to drive instrumental learning (Glimcher, 2011; Rossi et al., 2013; Steinberg et al., 2013). Thus, the striatum acts as a key locus in integrating the cortico-striatal glutamate and the substantia nigra-striatal dopamine signals to control goal-directed and habitual behaviors.

The striatal control of instrumental behaviors is accomplished through a rich interplay of numerous neurotransmitters and neuromodulators that modify the input, processing and output functions of the striatum (Lovinger, 2010). Several studies have documented the involvement of the D2 receptor (Kwak et al., 2014), cannabinoid receptor type 1 (CB1R) (Hilario et al., 2007) and 5-hydroxytryptamine 6 (5-HT6) receptor (Eskenazi et al., 2015) in the control of instrumental behavior. However, pharmacological control of instrumental behaviors is under-explored, and effective pharmacological strategies for the control of goal-directed versus habitual behaviors are lacking. Adenosine A1 and A2A receptors are highly expressed in the striatum and are increasingly recognized as important pharmacological targets for controlling cognition under normal and disease conditions (Chen et al., 2013; Chen, 2014). The Gs-coupled facilitating A2A receptor (A2AR) and Gi-coupled inhibitory A1 receptor (A1R) both integrate dopamine (Shen W. et al., 2008), glutamate (Kreitzer and Malenka, 2007), and BDNF (Tebano et al., 2008; Wei et al., 2014) signaling to modulate synaptic plasticity and control cognition. For example, using our newly developed chimeric rhodopsin-A2AR proteins (optoA2AR), we recently demonstrated that transient activation of A2AR by light in a time-locked manner with reward delivery is sufficient to impair goal-directed behavior, whereas focal knockdown of A2AR in the striatum enhances goal-directed behaviors (Yu et al., 2009; Li et al., 2016). Similarly, pharmacological blockade of A2AR promoted goal-directed seeking for ethanol in ENT1 knockout mice (Nam et al., 2013b) and restored goal-directed sensitivity to negative feedback in the methamphetamine (METH)-paired context (Furlong et al., 2017). These pharmacological, genetic, and optogenetic demonstrations of the cognitive “brake” mechanism of A2AR activation led us to propose that pharmacological blockade of the A2AR represents a promising therapeutic strategy for controlling goal-directed behaviors.

As the first step in developing an adenosine receptor-based pharmacological approach to control the goal-directed versus habitual behaviors, we coupled the A2AR antagonist (KW6002) and A1R antagonist (DPCPX) with the satiety-based instrumental learning paradigm to address the effect of pharmacological blockade of the A2AR and A1R on three aspects of instrumental learning processes: (i) behavioral elements of instrumental behaviors (i.e., acquisition of action-outcome contingency versus goal-evaluation) by acquisition of instrumental behavior, the devaluation test and the omission test; (ii) the instrumental learning processes by administering the A2AR antagonist either prior to the training (learning/encoding) or post-training (consolidation) during the random interval (RI) schedule, or immediately before the devaluation and omission tests (expression/retrieval of instrumental behaviors); (iii) the potential role of the A1 receptor in control of instrumental learning.

Materials and Methods

Animals

Animals were handled in accordance with the protocols approved by the Institutional Ethics Committee for Animal Use in Research and Education at Wenzhou Medical University, China. C57BL/6 male mice at least 8 weeks old (23–27 g each) were used in the experiments. The A1R knockout mice (A1R-/-) and wild-type littermate controls (A1R+/+) have been well characterized previously (Johansson et al., 2001), and genotypes were confirmed by PCR analysis before the experiment. Mice were housed at an ambient temperature of 22 ± 0.5°C and a relative humidity of 60 ± 2% with a 12 h light/dark cycle. Mice were single-housed and underwent experiments in the light cycle.

Satiety-Based Instrumental Training and Testing

All instrumental learning experiments were performed in standard operant chambers (Med Associates). Each chamber was equipped with a retractable lever on either side of a pump with a syringe that delivered liquid reward (20% sucrose solution, 20 μl per reinforcer, which can be suspended from the syringe as a drop) and a house light (3 W, 24 V) mounted on the opposite side of the chamber. Training and testing procedures followed Rossi and Yin (2012) and are illustrated in Figure 1A. In brief, mice were first given one 30-min magazine training session during which the sucrose solution was delivered on a random time 60 s schedule with the lever removed. Three days of continuous reinforcement (CRF) training sessions followed, to establish the initial association between lever press and reward. At the start of the session, the house light was illuminated, and one lever was inserted into the chamber. The house light remained illuminated and the lever remained inserted and active during the entire session. During CRF sessions, each lever press resulted in the delivery of one 20 μl drop of 20% sucrose solution. Sessions ended after 60 min or when 50 rewards had been earned, whichever came first. After CRF, mice underwent the RI schedule, which is critical for habitual learning. They were trained for 2 days on RI 30 s, with a 0.1 probability of reward availability every 3 s contingent upon lever pressing, followed by 4 days on RI 60 s (0.1 probability of reward availability every 6 s contingent upon lever pressing). As in CRF training, RI sessions ended after 60 min or when 50 rewards had been earned, whichever came first. To further confirm the goal-directed behavioral pattern, we also employed a random ratio (RR) training paradigm, which promotes goal-directed behavior, as a control. Progressively leaner schedules of reinforcement were used: CRF for 3 days, then RR5 for 2 days (each response was rewarded at a probability of 0.2 on average), RR10 for 2 days and finally RR20 for 2 days. During the training sessions, mice were given 1.5–2 g of home chow daily to maintain them at 80–85% of their free-feeding weight.
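The random-interval contingency described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Med Associates program used in the study: the function names are hypothetical, and the rule that an armed reward stays available until the next lever press is an assumption that the text does not state explicitly. The session limits (60 min or 50 rewards) and the schedule parameters (0.1 probability every 3 s or 6 s) are taken from the text.

```python
import random


def mean_interval_s(tick_s, p_available):
    """Expected time until a reward is armed (geometric waiting time)."""
    return tick_s / p_available


def simulate_ri_session(press_times_s, tick_s, p_available, rng,
                        session_s=3600, max_rewards=50):
    """Count rewards earned under a random-interval (RI) schedule.

    Assumption (not stated in the text): once armed, the reward stays
    available until the next lever press collects it.
    """
    rewards = 0
    armed = False
    next_tick = tick_s
    for t in sorted(press_times_s):
        # Session ends after 60 min or 50 rewards, whichever comes first.
        if t > session_s or rewards >= max_rewards:
            break
        # Process every scheduling tick that elapsed before this press.
        while next_tick <= t:
            if not armed and rng.random() < p_available:
                armed = True
            next_tick += tick_s
        if armed:
            rewards += 1
            armed = False
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    presses = range(2, 3600, 2)  # hypothetical mouse pressing every 2 s
    print("RI30 mean interval (s):", round(mean_interval_s(3, 0.1), 3))
    print("RI60 mean interval (s):", round(mean_interval_s(6, 0.1), 3))
    print("rewards earned on RI30:", simulate_ri_session(presses, 3, 0.1, rng))
```

With a 0.1 probability per 3 s tick the expected interval is 3/0.1 = 30 s, and with 6 s ticks it is 60 s, which is where the RI 30 s and RI 60 s labels come from.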

Following the RI/RR training sessions, a 2-day devaluation test was conducted. A specific satiety procedure was applied to alter the current value of a specific reward. On each day, mice were given free access for at least 1 h to either home chow (used to maintain their weight during the training sessions) or the sucrose solution (the reward earned by lever pressing) to achieve sensory-specific satiety. Immediately after this unlimited pre-feeding session, mice were given a 5-min extinction test during which the lever was inserted and the number of lever presses was recorded without reward delivery. The order of the valued and devalued condition tests (day 1 or day 2) was counterbalanced across animals. Mice sensitive to manipulation of outcome value would significantly reduce their lever presses in the devalued condition compared with the valued condition. Then, after two supplementary RI60 training sessions, mice were further evaluated in a 30-min omission test in which the action-outcome contingency was altered. In the omission test, mice had to withhold the lever pressing acquired in the previous training sessions for 20 s to obtain the reward. Any lever press reset the timer, and mice then had to refrain from pressing for another 20 s before a reward was delivered.
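The omission contingency can be illustrated with a minimal Python sketch. The function name and the example press times are hypothetical, and two details are assumptions not given in the text: the 20 s timer is taken to restart immediately after each reward delivery, and a press that coincides exactly with a delivery is treated as occurring after it.

```python
def omission_rewards(press_times_s, hold_s=20, session_s=1800):
    """Return reward delivery times (s) in a 30-min omission test.

    A reward is delivered whenever hold_s seconds pass without a press;
    any lever press resets the timer.
    Assumption: the timer restarts immediately after each delivery.
    """
    deliveries = []
    timer_start = 0.0
    presses = sorted(t for t in press_times_s if 0 <= t < session_s)
    # Append the session end so the final press-free stretch is counted.
    for t in presses + [session_s]:
        # Deliver every reward earned before this press (or session end).
        while timer_start + hold_s <= t:
            timer_start += hold_s
            deliveries.append(timer_start)
        timer_start = t  # a press resets the counter
    return deliveries


if __name__ == "__main__":
    # Hypothetical mouse that presses once at 30 s and then stops.
    print(len(omission_rewards([30])), "rewards delivered")
    # Hypothetical mouse that keeps pressing every 10 s earns nothing.
    print(len(omission_rewards(range(0, 1800, 10))), "rewards delivered")
```

Under this contingency, continued pressing postpones reward, so a decline in lever pressing over the test indicates sensitivity to the changed action-outcome relationship.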

Drug Administration

The following drugs were used in the present study: KW-6002 ((E)-1,3-diethyl-8-(3,4-dimethoxystyryl)-7-methyl-3,7-dihydro-1H-purine-2,6-dione, a selective adenosine A2AR antagonist) and DPCPX (8-cyclopentyl-1,3-dipropylxanthine, a selective adenosine A1R antagonist). KW-6002 (1 mg/kg, 5 mg/kg, Sundia, United States) was suspended in dimethyl sulfoxide (DMSO, Sigma), ethoxylated castor oil (Sigma) and water in a proportion of 15%:15%:70%. DPCPX (6 mg/kg, Abcam) was dissolved in 0.9% NaCl with 5% DMSO. The control mice were treated with the corresponding vehicles. All solutions were prepared immediately before administration. The doses of KW-6002 and DPCPX were chosen based on previous studies (Chen et al., 2001; Prediger et al., 2004; Nguyen et al., 2014). Drugs were injected intraperitoneally (i.p.) in a volume of 0.1 ml/10 g of body weight. The timing of drug administration depended on the experimental design: 30 min before or 10 min after each daily RI training session to target the learning and consolidation phases of instrumental learning, respectively (Figure 2A), or 30 min before the devaluation/omission test, with no drug given during the RI training sessions, to target the expression of instrumental behavior (Figure 3A).
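The dosing arithmetic implied by these numbers can be made explicit with a short Python sketch. The function names and the 25 g example mouse are hypothetical; only the injection volume (0.1 ml/10 g), the doses, and the 15%:15%:70% vehicle proportions come from the text.

```python
ML_PER_G = 0.1 / 10  # injection volume from the text: 0.1 ml per 10 g


def injection_volume_ml(body_weight_g):
    """Volume to inject for a mouse of the given body weight."""
    return body_weight_g * ML_PER_G


def stock_concentration_mg_per_ml(dose_mg_per_kg):
    """Drug concentration needed so the fixed volume delivers the dose.

    0.1 ml/10 g equals 10 ml/kg, so concentration = dose / 10.
    """
    ml_per_kg = ML_PER_G * 1000
    return dose_mg_per_kg / ml_per_kg


def kw6002_vehicle_ml(total_ml):
    """Split a total volume into the 15%:15%:70% vehicle components."""
    return {
        "DMSO": 0.15 * total_ml,
        "ethoxylated castor oil": 0.15 * total_ml,
        "water": 0.70 * total_ml,
    }


if __name__ == "__main__":
    weight_g = 25  # hypothetical mouse within the 23-27 g range
    print("injection volume (ml):", round(injection_volume_ml(weight_g), 3))
    for dose in (1, 5, 6):  # KW-6002 at 1 and 5 mg/kg, DPCPX at 6 mg/kg
        conc = stock_concentration_mg_per_ml(dose)
        print(dose, "mg/kg needs", round(conc, 3), "mg/ml")
    print(kw6002_vehicle_ml(10.0))
```

For example, a 5 mg/kg dose given at 10 ml/kg requires a 0.5 mg/ml suspension, and a 25 g mouse receives 0.25 ml.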

DPCPX Concentration Detection

Considering the critical role of the striatum in the control of instrumental behavior, we measured the concentration of DPCPX in the striatum of mice after intraperitoneal injection to verify that an effective concentration of DPCPX was reached. Thirty minutes after DPCPX (6 mg/kg, i.p.) administration, the striata of mice were collected and homogenized. A 0.1 ml aliquot of homogenate was added to a 1.5 ml centrifuge tube, followed by the addition of 0.01 ml of methanol and 0.3 ml of acetonitrile. The tubes were vortex mixed for 0.5 min. After centrifugation at 13,000 rpm for 10 min, 100 μl of supernatant was transferred to an auto-sampler vial. Next, 2 μl of the mixture was injected into the LC-MS/MS system for analysis. DPCPX concentrations were determined by ultrahigh performance liquid chromatography with tandem mass spectrometry (UHPLC-MS/MS). UHPLC-MS/MS analyses were performed on an Agilent UHPLC unit (Agilent Corporation, MA, United States) with a ZORBAX Eclipse Plus C18 column (1.8 μm, 2.1 × 50 mm I.D.; Agilent Corporation, MA, United States) thermostated at 25°C. The mobile phase was composed of 0.1% formic acid (A) and acetonitrile (B) with the following gradient: 0.0 min at 50% B, 0.0–2.0 min linear increase to 98% B, and 2.0–3.5 min at 50% B; the flow rate was 0.4 ml/min. The total run time was 3.5 min. The electrospray interface was maintained at 500°C. Nitrogen nebulization was performed with a nitrogen flow of 800 l/h. Argon was used as the collision gas. DPCPX was detected in multiple reaction monitoring (MRM) scan mode with positive ion detection. The precursor-product ion pair used for the MRM detection of DPCPX was m/z 305.4 → 178.1.

Quantitative PCR of A1R mRNA

Striatal tissues from A1R KO mice and their WT littermates were analyzed by quantitative real-time polymerase chain reaction (qPCR) as we have described previously (Zhang et al., 2015), using the following primers for A1R mRNA: forward, 5′-CATCCTGGCTCTGCTTGCTATT-3′; reverse, 5′-TTGGCTATCCAGGCTTGTTCC-3′.

Statistical Analysis

All data are presented as mean ± SEM and were processed with SPSS 17.0. Two-way repeated-measures ANOVA was used with training/testing sessions as the within-subject factor and drug administration/genotype as the between-subject factor, followed by post hoc comparisons with the Bonferroni test. Statistical significance was set at p < 0.05.
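As a minimal illustration of the summary statistics named above, the following Python sketch computes mean ± SEM and Bonferroni-adjusted p-values using only the standard library. It is not the SPSS procedure used in the study and does not perform the repeated-measures ANOVA itself; the function names, lever-press counts, and p-values are hypothetical and chosen only to show the arithmetic.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    n = len(values)
    # SEM = sample standard deviation / sqrt(n)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p by the number of comparisons,
    capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    presses = [12, 18, 15, 20, 10]  # hypothetical lever presses per mouse
    mean, sem = mean_sem(presses)
    print(f"{mean:.2f} ± {sem:.2f}")
    # Hypothetical unadjusted p-values from three pairwise comparisons.
    print(bonferroni([0.01, 0.04, 0.5]))
```

An adjusted p-value below 0.05 would then be treated as significant under the criterion stated above.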

Results

Pharmacological Blockade of A2ARs Promoted Goal-Directed Valuation

To perform flexible, goal-directed actions, animals must acquire the ability to encode both the contingency between a specific action and its outcome, and the current value of the outcome during instrumental conditioning (Balleine and Dickinson, 1998). To investigate the modulatory effect of A2AR blockade on the acquisition of instrumental behaviors, we administered KW6002 (i.p. at 1 mg/kg or 5 mg/kg) or vehicle 5 min prior to each daily RI training session, which is critical for the establishment of habitual action (Figure 1B). To better identify the goal-directed behavioral pattern, we also included, as a control, another group of mice trained in parallel on the RR paradigm, which leads to goal-directed behavior (Figure 1B). All mice gradually increased their lever presses and eventually reached a plateau, indicating that the training paradigm was successful (Figure 1C). Mice treated with KW6002 at 5 mg/kg showed a significantly elevated lever-press rate (interaction effect of training sessions × drug administration groups: F5,140 = 2.659, p = 0.006; between-subject effect of drug administration groups: F3,28 = 3.740, p = 0.022). Statistical significance was observed between the RI + KW6002 5 mg/kg and the RR + Vehicle groups (Bonferroni post hoc test, p = 0.035) but was absent in all other comparison pairs, including the RI + KW6002 5 mg/kg versus RI + Vehicle groups (Bonferroni post hoc test, p = 0.116).

Pharmacological Blockade of A2AR at the Coding, Consolidation and Expression Phases of Instrumental Behavior Exerted Its Enhanced Effect on Goal-Directed Valuation but Not on Action-Outcome Contingency

To further determine the modulatory effect of A2AR on the distinct processes of instrumental behavior (i.e., learning/coding, consolidation and expression phases), we administered KW6002 at specific time points during instrumental learning. Based on our previous studies showing that the biological (i.e., motor) effect of KW6002 at 5 mg/kg is maintained for 150–170 min (Shen H.Y. et al., 2008; Yu et al., 2008), we selected three specific time points for KW6002/vehicle administration (Figures 2A, 3A): (a) prior to training (30 min before RI training), (b) post training (10 min after RI training), or (c) prior to behavioral testing (30 min before the devaluation/omission test, with no drug given during the RI training sessions), to determine the modulatory effects of KW6002 on the coding, consolidation and expression phases of instrumental behavior, respectively.

We then sought to investigate whether A2AR exerted its effect by acting on the expression phase of instrumental behavior. In this experiment, KW6002 was administered 30 min before the behavioral tests (devaluation and omission tests), but not during any of the RI training sessions (Figure 3A). As expected, both pre-manipulation groups gradually increased their lever-press rate and reached a plateau, and did not differ from each other (between-groups effect, F1,13 = 0.395, p = 0.541; interaction effect of training sessions × pre-manipulation groups, F5,65 = 0.554, p = 0.608) (Figure 3B). As Figure 3C shows, mice treated with KW6002 at the expression phase displayed marked sensitivity to outcome devaluation (F1,6 = 10.857, p = 0.017), whereas the controls did not (F1,7 = 0.150, p = 0.710), in the devaluation test. Thus, blockade of A2AR facilitated expression of goal-directed behavior. In the omission test (Figure 3D), both groups decreased their lever presses gradually over the testing time (testing time main effect: F5,65 = 4.226, p = 0.020), indicating that the omission test was effective over time. However, lever pressing declined in parallel in the two groups, as indicated by the absence of a drug treatment × testing time interaction effect (F5,65 = 0.365, p = 0.728), although mice injected with KW6002 tended to press more than the vehicle-treated mice (between-subject effect of drug treatments, F1,13 = 3.369, p = 0.089). The increased lever-press rate with KW6002 in the omission test might be attributable to a general motor effect rather than a learning effect of the A2AR antagonist, because the drug was administered 30 min before the test. Therefore, the action-outcome contingency may not be affected by the A2AR antagonist.

Pharmacological Blockade and Genetic Knockout of A1Rs Did Not Affect Acquisition or Goal-Evaluation of Instrumental Behavior

Discussion

Action-outcome contingency and goal-directed valuation are two cognitive components involved in instrumental conditioning (Balleine and Dickinson, 1998). Action-outcome contingency is determined by the causal relationship between particular actions and outcomes, while goal-directed valuation depends on the anticipation of or desire for the outcome (Yin and Knowlton, 2006). Both components are acquired in the training sessions of instrumental behavior. The outcome devaluation procedure was therefore used to probe specifically the evaluative component of goal-directed actions. We found that pharmacological blockade of A2ARs critically promoted animals’ sensitivity to outcome value (in the devaluation test) but did not affect the action-outcome relationship (as manifested by similar performance in the training sessions and in the omission test). When administered 5 min prior to training, KW6002 at 5 mg/kg apparently elevated the acquisition learning curve. This enhancement is, however, potentially confounded by the enhancement of general motor activity by the A2AR antagonist. Additional studies with the A2AR antagonist administered 30 min prior to or after training can better dissociate the learning process from the motor effect and clarify this issue. The selective modulation of animals’ sensitivity to outcome devaluation by the A2AR antagonist is in agreement with our recent finding that optogenetic activation of striatopallidal A2AR signaling in the DMS alters goal-valuation, as evident in the devaluation test (Li et al., 2016). On the other hand, the lack of effect of the A2AR antagonist on the acquisition of instrumental behaviors is consistent with similar findings from genetic inactivation of striatal A2ARs (Yu et al., 2009) and optogenetic activation of striatopallidal A2AR signaling (Li et al., 2016).

The mechanism underlying the selective modulation of goal-valuation by the A2AR is not clear. A previous study showed that overexpression of the D2R in the striatopallidal pathway is associated with a shift in behavioral control from habitual action to goal-directed responding but does not affect the acquisition phase of instrumental learning (Kwak et al., 2014). Also, loss of striatal endocannabinoid-mediated long-term depression selectively in DLS striatopallidal neurons prevents the transition from goal-directed seeking to habitual responding but does not interfere with lever-press performance in the acquisition phase (Gremel et al., 2016). Given the documented antagonistic interactions between the A2AR and D2R and between the A2AR and CB1R in the striatum, possibly through A2AR-D2R heterodimers (He et al., 2016) and A2AR-CB1R heterodimers (Moreno et al., 2017), these findings suggest that the A2AR may selectively influence coding of the current value of the outcome (but not the contingency association) through its interaction with D2R and CB1R functions in the striatum.

Moreover, this selective control of animals’ sensitivity to reward valuation by A2ARs might be related to a motivational factor, as A2AR (Mingote et al., 2008; Nam et al., 2013a) and D2R (Trifilieff et al., 2013) activities in the striatum contribute to motivational control of behaviors. Lastly, since A2ARs are predominantly expressed in the striatopallidal neurons, the A2AR control of goal-directed valuation is further supported by findings from striatal circuit studies showing that pharmacogenetic inactivation of the striatopallidal pathway enhanced motivation by energizing the initiation of goal-directed behavior (Carvalho Poyraz et al., 2016), while optogenetic stimulation of the striatopallidal pathway suppressed motivational behavior (O’Hare et al., 2016; Vicente et al., 2016).

Defining the specific information processing phases (i.e., learning/coding, consolidation and expression of instrumental behaviors) at which the A2AR antagonist controls goal-directed versus habitual behaviors is critical for our understanding of the neurotransmitter modulatory mechanisms and for the development of effective pharmacological strategies to control aberrant habit formation and drug addiction. Our demonstration of enhanced goal-directed behavior after administration of KW6002 at the pre-training, post-training or expression phases suggests that the A2AR acts at the coding, consolidation and expression phases of instrumental learning to promote animals’ sensitivity to goal-directed valuation. It should be noted that the influence of the pre-training treatment paradigm on goal-directed behavior might be partly attributed to its effect on the consolidation phase, owing to the relatively long-lasting effect (>2 h) of the A2AR antagonist KW6002. The similar control of instrumental behaviors by multiple treatment paradigms of KW6002 indicates that A2AR control of instrumental behaviors is largely independent of the confounding motor activity.

Various neurotransmitter systems have been implicated in the control of the distinct phases of instrumental conditioning. For example, NMDA receptor signaling preferentially affected the coding (by administering NMDA antagonist at the pre-training phase) but not the expression (by administering NMDA antagonist at the post-training phase) of instrumental conditioning (Yin et al., 2005). Furthermore, virus-induced overexpression of D2R (Trifilieff et al., 2013) and 5-HT6 receptor (Eskenazi and Neumaier, 2011; Eskenazi et al., 2015) preferentially affects the coding phase of operant conditioning. Additionally, optogenetic activation of endocannabinoid signaling in the training session and pharmacogenetic suppression of endocannabinoid signaling in the devaluation test gated habit formation (Gremel et al., 2016), indicating that endocannabinoids modulate instrumental learning in both the coding and expression sessions, consistent with the CB1R knockout study (Hilario et al., 2007). Thus, the A2AR may interact with multiple neurotransmitter systems in the cortico-striatal projection pathways to integrate/modulate glutamate, dopamine and endocannabinoid signaling for instrumental behavioral control at multiple phases of information processing. Furthermore, cognitive control and working memory processes are important for the efficient control of goal-directed behavior (Buschman and Miller, 2014). We and others have documented that A2AR antagonists or focal A2AR knockdown in the DMS significantly enhance working memory (Wei et al., 2014; Kaster et al., 2015; Li et al., 2018). Thus, it is possible that when KW6002 is administered prior to the training phase, the A2AR antagonist enhances goal-directed behavior by improving working memory. On the other hand, other mechanisms (such as “off-line” processing during sleep) may contribute to the A2AR antagonist-mediated enhancement of goal-directed behavior when A2AR antagonists are administered after the training or during the expression/retrieval phase.

Pharmacological Blockade and Genetic Knockout of A1Rs Did Not Affect Acquisition or Goal-Evaluation of Instrumental Behavior

Adenosine signaling acts at the facilitating A2AR and inhibitory A1R to exert its homeostatic control of brain function. However, very limited information is available regarding the A1R control of cognition, particularly instrumental behaviors. With its relatively high expression in the cerebral cortex, hippocampus and striatum (Reppert et al., 1991; Dixon et al., 1996), A1R activation exerts profound inhibitory control of excitatory transmission by presynaptic and post-synaptic mechanisms (Dunwiddie and Masino, 2001; Ribeiro et al., 2002). Striatal A1Rs can preferentially interact with the striatal D1Rs via possible A1R-D1R heterodimers in the striatonigral neurons to control striatal signaling and behavior (Gines et al., 2000). Accordingly, A1Rs modulate striatal synaptic plasticity, and prevent scopolamine- and morphine-induced impairment in working memory (Hooper et al., 1996; Lu et al., 2010). However, in fixed-interval and fixed-ratio operant training paradigms, an A1R antagonist failed to increase lever-pressing rate, but decreased fixed ratio 20 (FR20, every 20 lever presses resulted in one reward) responding at higher doses (Randall et al., 2011). Operant performance alone is insufficient to define instrumental learning modes as goal-directed or habitual actions without devaluation and omission tests (Yin and Knowlton, 2006). Thus, the role of the A1R in goal-directed versus habitual behaviors is still unknown. Our study demonstrated that pharmacological blockade or global knockout of A1R did not affect the acquisition of instrumental learning, sensitivity to reward value, or reversal of the action-outcome relationship. This finding is in agreement with a study showing that DPCPX failed to reverse the effect of a D2R antagonist on effort-related tasks, whereas KW6002 and caffeine (a non-selective adenosine antagonist) could (Salamone et al., 2009). These findings suggest that the A1R plays a limited modulatory role in the control of instrumental behavior and that adenosine predominantly acts on A2ARs but not A1Rs to modulate instrumental learning.

In summary, our study demonstrated that pharmacological blockade of A2AR but not A1R promotes goal-directed behaviors by enhancing goal-directed valuation without affecting the action-outcome contingency and by acting at the coding, consolidation, and expression phases of goal-directed learning processes. These findings are consistent with our previous genetic and optogenetic studies, and with recent pharmacological studies of A2AR antagonists to control abnormal instrumental behavior in drug addiction paradigms (Nam et al., 2013a; Pintsuk et al., 2016), providing pharmacological evidence for a therapeutic strategy to enhance goal-directed behaviors in neuropsychiatric disorders. The translational potential of A2AR antagonists is further enhanced by the recent demonstration of the safety profile of the A2AR antagonist KW6002 in clinical phase III trials for motor benefit in >3500 Parkinson’s disease patients (Chen et al., 2013) and by the regular consumption of caffeine (a non-specific adenosine A2AR and A1R antagonist) by 50% of the world population.

Author Contributions

Funding

This study was sponsored by the National Natural Science Foundation of China (Grant Nos. 81600983, 31771178, and 81600991), by the Start-up Fund from Wenzhou Medical University (Grant Nos. 89211010 and 89212012), the Zhejiang Provincial Special Funds (Grant No. 604161241), the Natural Science Foundation of Zhejiang Province of China (Grant Nos. LY15H090020, LQ16H090006, and LQ17H090005), and the Wenzhou Science and Technology Program (Grant Nos. 2016Y0725 and 2016Y0613).

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.