Shifting responsibly: the importance of striatal modularity to reinforcement learning in uncertain environments.

Amemori K, Gibb LG, Graybiel AM - Front Hum Neurosci (2011)

Bottom Line:
We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.

Abstract:
We propose here that the modular organization of the striatum reflects a context-sensitive modular learning architecture in which clustered striosome-matrisome domains participate in modular reinforcement learning (RL). Based on anatomical and physiological evidence, it has been suggested that the modular organization of the striatum could represent a learning architecture. There is not, however, a coherent view of how such a learning architecture could relate to the organization of striatal outputs into the direct and indirect pathways of the basal ganglia, nor a clear formulation of how such a modular architecture relates to the RL functions attributed to the striatum. Here, we hypothesize that striosome-matrisome modules not only learn to bias behavior toward specific actions, as in standard RL, but also learn to assess their own relevance to the environmental context and modulate their own learning and activity on this basis. We further hypothesize that the contextual relevance or "responsibility" of modules is determined by errors in predictions of environmental features and that such responsibility is assigned by striosomes and conveyed to matrisomes via local circuit interneurons. To examine these hypotheses and to identify the general requirements for realizing this architecture in the nervous system, we developed a simple modular RL model. We then constructed a network model of basal ganglia circuitry that includes these modules and the direct and indirect pathways. Based on simple assumptions, this model suggests that while the direct pathway may promote actions based on striatal action values, the indirect pathway may act as a gating network that facilitates or suppresses behavioral modules on the basis of striatal responsibility signals. Our modeling functionally unites the modular compartmental organization of the striatum with the direct-indirect pathway divisions of the basal ganglia, a step that we suggest will have important clinical implications.
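
The idea that a module's "responsibility" is determined by its prediction errors, and that responsibility in turn modulates the module's activity and learning, can be illustrated with a minimal sketch. The abstract does not give the functional form, so everything below is an assumption made for illustration: the softmax over negative squared prediction errors, the `sharpness` parameter, the responsibility-weighted mixing of action values, and the responsibility-scaled update are not taken from the paper.

```python
import math


def responsibilities(prediction_errors, sharpness=1.0):
    """Turn per-module prediction errors into responsibilities that sum to 1.

    ASSUMPTION: a softmax over negative squared errors, so the module that
    predicts the environment best receives the largest responsibility.
    """
    scores = [-sharpness * e * e for e in prediction_errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def combined_action_values(module_values, resp):
    """Mix each module's action values, weighted by its responsibility."""
    n_actions = len(module_values[0])
    return [
        sum(r * values[a] for r, values in zip(resp, module_values))
        for a in range(n_actions)
    ]


def responsibility_scaled_update(value, td_error, resp, learning_rate=0.1):
    """Scale a value update by responsibility (hypothetical learning rule)."""
    return value + learning_rate * resp * td_error


if __name__ == "__main__":
    # Module 0 predicts the current context well, module 1 poorly.
    resp = responsibilities([0.1, 1.0])
    print(resp)
    print(combined_action_values([[1.0, 0.0], [0.0, 1.0]], resp))
```

Under these assumptions, a module with small prediction error dominates both action selection and learning, while a poorly predicting module contributes little to either.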

Figure 7: Neuronal activity in structures of the cortico-basal ganglia-thalamo-cortical network model. (A) Firing frequency of D1 MSNs in the striatum (n = 300). Color scale indicates firing frequency, x-axis indicates neuron index, and y-axis indicates time in arbitrary units. MSNs on the left (from x = 1 to 150) are in module A and MSNs on the right (from x = 151 to 300) are in module B. Neuron “a” in the input region of the cortex (Figure 6A, left) projects to MSNs 54 and 205, and neuron “b” projects to MSNs 99 and 249. (B) Firing frequency of D2 MSNs (n = 300), which receive exactly the same pattern of connections from the input cortex as do D1 MSNs. (C) Firing frequency of GPe neurons (n = 100). Adjacent GPe neurons receive overlapping convergent inhibitory input from adjacent striatal D2 MSNs. As a result of this overlapping convergent connectivity, the focal striatal activity causes less focal GPe inhibition (i.e., the inhibition is spread or “blurred” over adjacent GPe neurons; blue troughs). (D) Firing frequency of GPi/SNr neurons (n = 50), which receive convergent inhibitory input from striatal D1 MSNs (blue troughs) and excitatory input from STN (red peaks). (E) Firing frequency of STN neurons (n = 25), which receive convergent inhibitory input from GPe (red peaks represent lowest inhibition). (F) Firing frequency of thalamic neurons (n = 200) in the simulation using only positive responsibility signals. (G) Membrane potential of neurons in the output region of the cortex (n = 200) in the simulation using only positive responsibility signals. Vertical red bars represent persistent supra-threshold cortical depolarization maintained by self-feedback connections. (H) Firing frequency of thalamic neurons (n = 200) in the simulation using both positive and negative responsibility signals. (I) Membrane potential of neurons in the output region of the cortex (n = 200) in the simulation using both positive and negative responsibility signals. 
The blue troughs observed in the thalamic and cortical activity are deeper in the simulation using both positive and negative responsibility signals. Note: in (A,B), we show only about one out of every six of the inactive MSNs, to make the active MSNs more visible in the figure.
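
The "blurring" described in panel (C) follows from overlapping convergence alone, which a minimal sketch can show. The population sizes (300 D2 MSNs, 100 GPe neurons) come from the caption; the window half-width, the baseline rate, the gain, and the subtractive rate rule are hypothetical choices for illustration and are not the model's actual equations.

```python
def gpe_rates(d2_rates, n_gpe=100, half_width=6, baseline=1.0, gain=1.0):
    """Map D2 MSN rates onto GPe rates through overlapping inhibitory windows.

    ASSUMPTION: each GPe neuron averages the D2 MSNs in a window centred on
    its own position in the striatal array, and that average is subtracted
    from a constant baseline rate (clipped at zero).
    """
    n_msn = len(d2_rates)
    stride = n_msn / n_gpe  # striatal positions per GPe neuron
    rates = []
    for j in range(n_gpe):
        center = int(j * stride + stride / 2)
        lo = max(0, center - half_width)
        hi = min(n_msn, center + half_width + 1)
        window = d2_rates[lo:hi]
        inhibition = gain * sum(window) / len(window)
        rates.append(max(0.0, baseline - inhibition))
    return rates


if __name__ == "__main__":
    d2 = [0.0] * 300
    d2[53] = 1.0  # a single active D2 MSN (neuron 54 with 1-based indexing)
    rates = gpe_rates(d2)
    inhibited = [j for j, r in enumerate(rates) if r < 1.0]
    print(inhibited)  # several adjacent GPe neurons, not just one
```

Because neighbouring windows overlap, one active MSN lowers the rate of several adjacent GPe neurons, so the inhibition is less focal than the striatal activity that caused it.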

Mentions:
Specifically, two neurons in the input region of the cortex, representing the values of two specific actions (“a” and “b”), are activated at times 50 and 200 (Figure 6A, left). The activity levels of these cortical neurons are 0.8 and 1.0. These neurons excite specific neurons in the striatum (Figure 6A, middle and right; Figures 7A,B) and the output region of the cortex (Figure 7G). Corresponding module A and module B MSNs, and corresponding D1 and D2 MSNs, all receive the same cortical input. Specifically, both the D1 and the D2 MSNs having index numbers 54 and 99 (Module A, neurons “a” and “b” in Figure 6) receive the same cortical input at the same times, as do the D1 and the D2 MSNs having index numbers 205 and 249 (Module B, neurons “a” and “b” in Figure 6). Although in our model, for simplicity, corresponding D1 and D2 MSNs receive identical cortical inputs, our modeling framework does not require that these inputs be identical or originate from the same cortical projection neuron. Thus, our model is consistent with the evidence that D1 and D2 MSNs receive their inputs from different types of layer 5 cortical pyramidal neuron (Lei et al., 2004; Reiner et al., 2010).
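
The input configuration described above can be written down as a small lookup, shown here as a sketch rather than the authors' code. The MSN indices, module boundaries, onset times, and activity levels come from the text and the Figure 7 caption. The pairing of neuron "a" with onset 50 and level 0.8 (and "b" with onset 200 and level 1.0) is an assumption, since the text lists the values without saying which neuron they belong to; the `duration` parameter is hypothetical.

```python
N_MSN = 300        # D1 (and D2) MSNs per population, from the caption
MODULE_SIZE = 150  # indices 1-150 are module A, 151-300 are module B

# Cortical neuron -> MSN indices it projects to (1-based, as in the caption).
PROJECTIONS = {"a": (54, 205), "b": (99, 249)}

# ASSUMPTION: which onset and level belong to which cortical neuron.
ONSET = {"a": 50, "b": 200}
LEVEL = {"a": 0.8, "b": 1.0}


def module_of(index):
    """Return the module ('A' or 'B') that contains a 1-based MSN index."""
    if not 1 <= index <= N_MSN:
        raise ValueError(f"MSN index out of range: {index}")
    return "A" if index <= MODULE_SIZE else "B"


def cortical_input(index, t, duration=1):
    """Cortical drive to one MSN at time t.

    The same function serves D1 and D2 MSNs, because corresponding D1 and
    D2 neurons receive identical cortical input in the model.
    """
    total = 0.0
    for name, targets in PROJECTIONS.items():
        active = ONSET[name] <= t < ONSET[name] + duration
        if active and index in targets:
            total += LEVEL[name]
    return total


if __name__ == "__main__":
    for idx in (54, 205, 99, 249):
        print(idx, module_of(idx), cortical_input(idx, 50), cortical_input(idx, 200))
```

Relaxing the identical-input simplification mentioned in the text would amount to giving the D1 and D2 populations separate `PROJECTIONS` tables.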
