Aversive reinforcement learning

Abstract

We hypothesise that human aversive learning can be described
algorithmically by reinforcement learning models. Our first experiment
uses a second-order conditioning design to study sequential outcome
prediction. We show that aversive prediction errors are expressed robustly
in the ventral striatum, supporting the validity of temporal difference
algorithms (as in reward learning), and suggesting a putative critical area
for appetitive-aversive interactions. With this in mind, the second
experiment explores the nature of pain relief, which, as expounded in
theories of motivational opponency, is rewarding. In a Pavlovian
conditioning task with phasic relief of tonic noxious thermal stimulation, we
show that both appetitive and aversive prediction errors are co-expressed in
anatomically dissociable regions (in a mirror opponent pattern) and that
striatal activity appears to reflect integrated appetitive-aversive processing.
Next, we designed a Pavlovian task in which cues predicted either financial
gains, losses, or both, thereby forcing integration of both motivational
streams. This showed anatomical dissociation of aversive and appetitive
predictions along a posterior-anterior gradient within the striatum,
respectively.
Lastly, we studied aversive instrumental control (avoidance). We designed a
simultaneous pain avoidance and financial reward learning task, in which
subjects had to learn about each independently and trade off aversive and
appetitive predictions. We show that predictions for both converge on
the medial head of caudate nucleus, suggesting that this is a critical site for
appetitive-aversive integration in instrumental decision making. We also
tested whether serotonin (5HT) modulates either phasic or tonic
opponency using acute tryptophan depletion. Both behavioural and imaging
data confirm the latter, in which serotonin appears to mediate an average reward
term, providing an aspiration level against which the benefits of exploration
are judged.
In summary, our data provide a basic computational and neuroanatomical
framework for human aversive learning. We demonstrate the algorithmic
and implementational validity of reinforcement learning models for both
aversive prediction and control, illustrate the nature and neuroanatomy of
appetitive-aversive integration, and discover the critical (and somewhat
unexpected) central role for the striatum.
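
As an illustration of the temporal difference account referred to above, the following is a minimal sketch of TD(0) learning in a second-order aversive conditioning chain (cue 1, then cue 2, then a painful outcome). It is not the model fitted in the experiments: the function name, the learning rate, the discount factor, and the coding of pain as an outcome of -1 are all assumptions made for the example.

```python
def td_second_order(n_trials=200, alpha=0.2, gamma=1.0, pain=-1.0):
    """TD(0) over a hypothetical two-cue chain: cue1 -> cue2 -> aversive outcome.

    alpha (learning rate), gamma (discount factor) and pain (outcome value)
    are illustrative assumptions, not values taken from the study.
    """
    v = {"cue1": 0.0, "cue2": 0.0}  # learned aversive predictions
    errors = []                     # (delta at cue2 onset, delta at outcome) per trial
    for _ in range(n_trials):
        # cue1 -> cue2: no outcome yet, so the error is driven by V(cue2).
        delta1 = 0.0 + gamma * v["cue2"] - v["cue1"]
        v["cue1"] += alpha * delta1
        # cue2 -> outcome: the terminal state has value 0.
        delta2 = pain + gamma * 0.0 - v["cue2"]
        v["cue2"] += alpha * delta2
        errors.append((delta1, delta2))
    return v, errors


if __name__ == "__main__":
    values, errors = td_second_order()
    print(values)
    print(errors[0], errors[-1])
```

In this sketch the prediction error initially occurs only at the outcome and then transfers back to the earlier cue as learning proceeds, which is the qualitative signature that a second-order design is meant to expose.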