23 August 2012

What is reward?

So I recently passed my PhD viva and got a paper published (whoop whoop!). The titles of the two texts are ‘Multi-electrode analysis of pattern generation and its adaptation to reward’ and ‘Multi-neuronal refractory period adapts centrally generated behaviour to reward’. That last word, and my use of it in the texts, caused a fair amount of trouble. I naïvely thought it would be OK to leave the term undefined, seeing as we’re still working out how the brain’s reward system operates. For example, the observation that midbrain dopamine neurons are sometimes activated by stimuli purely by virtue of those stimuli being new and unexpected (rather than appetizing, sexual, etc.) suggests that novelty itself might be thought of as a reward. And anyway, we all have a fairly good intuitive understanding of what constitutes a reward, right? Wrong. I need to define reward.

Here’s what we ended up writing in the paper following an extended skirmish with the reviewers:

"We will refer to… a ‘reward’ in the general meaning of a stimulus that promotes approach and consummatory behaviour rather than the more specific meaning of an unconditioned stimulus used as a positive reinforcer in a classical or operant long-term conditioning paradigm." (Harris et al., 2012)

I'd like to contrast this definition with Wolfram Schultz's, who writes:

"A reward is any object or event that generates approach behavior and consumption, produces learning of such behavior, and is an outcome of decision making." (Schultz, 2007)

Schultz's second condition, that rewards produce learning of approach behaviour and consumption, raises the question: does this refer to conditioning proper, in which memory persists long after the reward is removed, or does an effect on short-term memory suffice? For example, is a food object rewarding merely by virtue of inducing a high and sustained feeding rate, or must it also increase the probability that similar food objects be eaten in the future? This question has physiological consequences: both classical and operant conditioning require brief bursts of spikes in midbrain dopamine neurons (Tsai et al., 2009; Kim et al., 2012), whereas the rate and intensity of ongoing behaviour, and the stability of working memory representations, are regulated by the tonic concentration of dopamine, which is set by the number of dopamine neurons engaged in slow pacemaker firing at any given moment (Niv et al., 2007; Cools & Robbins, 2004). In fact, Schultz's definition of reward does require persistent memory formation, i.e. bursts of dopamine. I disagree. I think a stimulus-induced increase in the rate and intensity of approach and consummatory behaviour can be thought of as a reward-response regardless of whether it produces lasting behavioural change. Yael Niv has, for example, argued convincingly that the average rate of reward over time modulates tonic background concentrations of dopamine, and thereby adapts the rate and intensity of foraging behaviour (Niv et al., 2007). There are many indications that this extends also to non-food rewards. This view is also in accordance with Norman White's, who writes that rewards are stimuli that elicit approach behaviour whereas reinforcers induce memory consolidation (White, 1989). Roy Wise similarly notes that 'priming' is an important effect of rewards, but one which does not find its way into long-term memory (Wise, 2009).

Schultz's third condition, that rewards be the outcome of decision making, is also problematic. If this condition is taken to mean that a reward must be the consequence of an overt motor behaviour, as many people would argue, then two objections follow. First, cases of classical conditioning where a neutral stimulus is paired with, for example, food, producing a subsequent preference for the neutral stimulus, do not involve any overt motor behaviour or action and so cannot, according to this definition, be said to involve reward. This is in stark contrast to numerous papers that describe such experiments as 'classical reward conditioning' and the food stimuli used as rewards. Second, say you give a hungry rat a food pellet, either at a randomly chosen time or as a consequence of the rat wandering into a pre-defined part of the cage. Do we really want to say that the pellet is a reward in the latter case but not in the former? Physiologically there will be no difference: the dopamine burst response and its effect on synaptic plasticity will be the same. Isn't it in fact the case that brains are always in the process of deciding how to act, and operate by responding to correlations between their own activity states (be they sensory- or motor-states) and varying concentrations of dopamine? Whether or not a reward is in fact the causal outcome of a decision is irrelevant from the perspective of the brain.

In light of all this, I would suggest the following new definition of reward:

A reward is an object or event that induces approach and consummatory behaviour, and produces short- or long-term learning of that behaviour.

The lack of reference to rewards necessarily being the outcome of overt decision making constitutes a deviation from the way the term reward is used in everyday language (for example, an unexpected tax-return is a reward according to this new definition), but not, I think, from the way many scientists use the term. One might argue that such stimuli should be referred to as 'non-contingent rewards', but, at least in the case of the term 'reinforcement', this approach appears only to have complicated matters (Poling & Normand, 1999). Maybe then, we should drop the term reward entirely, and use 'positive stimulus' instead? However, this term has the serious disadvantage of not being a verbal noun. That is, whereas everyone understands the noun 'reward' and the associated verb 'rewarding', there is no established understanding of the (compound) verb 'positively stimulating' that is associated with the (compound) noun 'positive stimulus'. If anything, 'positive' has optimistic or ethical connotations that would jar with the amoral and downright destructive topics often discussed in relation to reward, such as addiction. The term 'appetitive stimulus' (and 'appetitively stimulating') avoids this problem but implies a focus on satisfying bodily needs, particularly hunger, whereas the key property of reward is that it can apply to any desire or goal.

Have I missed something; some word with the same meaning as 'reward' but better able to match the physiology? Is it time to make up a new word? If not, then I would suggest we stick with reward, using the definition above, accepting it as a slight neologism. The lack of a requirement that a reward necessarily be a consequence of overt decision making or motor behaviour should be appropriately tempered by the understanding that in fact the vast majority of rewards do occur as a consequence of decision making and motor behaviour - specifically as the result of exploration, trial-and-error, or more complex goal-oriented behaviours.

5 comments:

I think that, for the most part, the distinction doesn't really matter. Reinforcers are defined solely in terms of their ability to change the probability of response, whereas rewards are defined in terms of their economic value; it just so happens that rewards are always reinforcing. If you are interested in the rate of response or physiological changes associated with the rate of response, then the distinction is meaningless. By way of analogy, if we think of reinforcement as being a bit like magnetism, then discussing the reward/reinforcer distinction is like discussing the difference between types of magnet when the property of interest (attraction/repulsion or probability of response) is the same no matter which source we choose.

Let me share some thoughts, hopefully based on some evidence and primary research:
- Higher-order concepts in cognitive neuroscience are a real distraction right now, probably not valid and leading to a lot of silly work.
- These concepts include the old ideological chestnuts, loved by less evidence-based disciplines: reward, value, emotions, consciousness, personality, choices, etc.
- Using these concepts is a fundamental error, since it precedes any understanding of the basic neurology or brain behavior and assumes what will need to be proved and operationalized at a neural level. We don't understand anything about the neural basis of emotions in other animals, let alone H. sapiens. Why even use the term? No one less than Joe LeDoux is formally "rethinking" his use of this HO concept in favor of reducing to "survival circuits." If survival circuits need emotions and feelings to do their job -- all animals would be dead. Stat.
- "Reward" is way off the map, in terms of an idea that is a long way from being proven even to exist, let alone how it works. "Value" and "choice" are the same.
- It appears the brain and behavior operate in millisecond timeframes. One good estimate is that the stimuli > behavior step takes place in 150 ms. In these kinds of instantaneous time frames, when would HO concepts like "choice" have time to work?!

Finally, there is the top down problem. HO concepts are now being used by pretty much every brain researcher and behavioral investigator to target their lab work.

So rather than do the basic cellular, anatomical, ethological, cross-species, grinding and hard descriptive work needed on the brain, most time and money is being spent looking for value, choice, consciousness, emotions, feeling, personality traits... the list is endless. This is not how productive science is done. Productive science is bottom-up from endless observations and accurate descriptions.

We do not have decent descriptions of the DA systems (cells, receptors, etc.), let alone enough to slap some notion of value or reward on top of fMRI pictures. It's way too soon.

Reward, in everyday language, for example, seems to mean something given following effort. The brain idea seems to be:
- The brain, some cells, trigger and direct behavior
- Something comes back to the brain -- some cells (maybe different ones)
- What comes back increases the likelihood of that same organism making that same behavior

Maybe the same cells trigger the same behavior maybe not but the positive (or negative?) feedback from the behavior must "get" the brain something new/different. Fair enough?

However, this is very, very complex IRL -- probably why we should start with bacteria and invertebrates. Now let's remember this is all a very fast-moving continual loop and feedback process, so there is no discrete place or time where what the brain gets from the behavior registers. (Thank you, Paul Cisek.)

Where and when the heck does the "getting back" register? -- seems a logical 1st order question.

Thank you for commenting. You seem to be making an eliminative materialist or behaviourist point, arguing that psychological concepts may have no place in a mature science of the brain. That's a completely respectable position to me; however, even in behaviourism (or any theory of learning, including machine learning) the concept of 'reward' (or reinforcement) is a central and extremely useful one.

When and where does the positive feedback register? Well it seems dopamine is the mediator in all reward-seeking behaviours. That makes it doubly reasonable to speak of 'rewards' in neuroscience, since they share a common biological pathway.