Up until this point, I have discussed
instrumental learning and voluntary behaviour almost entirely in
terms of the pursuit of the pleasant, or more automatic versions of
this concept which correspond to the seeking of positive goals.
This has been in the interests of simplicity, but clearly any theory of
willed action would have to include the avoidance of undesired
outcomes as well as the search for wanted rewards, and any more
general and less cognitive account of the effects of motivation on
learning needs equally to consider at least two kinds of
motivation, associated with sought-after and feared events, or with
more reflexive forms of approach and avoidance behaviours.

Perhaps even more than in other chapters, the reader may here
encounter confusion due to arbitrarily selected technical terms in
learning theory. The most conventional distinction between
putatively pleasant versus disagreeable events refers to
‘appetitive’ versus ‘aversive’ reinforcers, corresponding to
appetites and aversions or appetitive and aversive motivation. These
should be regarded as the most conventionally correct terms
(Mackintosh, 1974, 1983), but there are many variations in usage
(e.g. Gray, 1975). I shall speak fairly loosely of attractive and
aversive events, or attractive and aversive
emotional states, and hope that the meaning is clear from
the context. However some of the terminological difficulties arise
from genuine theoretical questions surrounding the degree of
interchangeability of reward and punishment. It is logically
possible to conceive of a single urge underlying them both; Hull
(1943) for instance based his theory on the universal biological
necessities of nourishing and preserving the bodily tissues, but
drew an analogy between pain and hunger as the mechanisms for
dealing with these needs, and was thus able to use the single
concept of drive reduction for all motivation. Behaviour in Hull’s
theory is always impelled by goads, either internal or external,
never attracted by equivalent positive goals. The best that one can
hope for in this scheme is to minimize one’s levels of irritation
and distress. Few have been optimistic enough to make quite so
thorough a job of the converse of Hull’s theory — the Pollyanna
conviction that the motive for response is always to make things
better, life consisting of degrees of happiness, with even the most
unpleasant ordeals perceived in terms of how much joyful relief the
future may bring. However, Herrnstein (1969), Gray (1975),
Dickinson (1980) and Mackintosh (1983) have all emphasized that
escape from unpleasantness can in some cases be explained in terms
of future attractions, in the context of experiments in which rats
perform responses which reduce the frequency of the electric shocks
they would otherwise receive.

Often there are problems in tying down the
subjective aspects of positive and negative emotions to measurable
behaviours, or in making even theoretical distinctions between
their effects. One can imagine building a robot in which all
desirable ends were represented as positive numbers, and all
adverse outcomes by negative numbers: the only motivational
instruction necessary for this artificial creation would be to
maximize the aggregate, and any constant, positive or negative
which was added to individual values would be irrelevant. A feature
of this idealized system is that the rewards and punishments, or
attractive and aversive reinforcers, have equal, but opposite,
effects. I shall use as a theme for this chapter the question of
whether, in practice, in the natural as opposed to the idealized
world,
reward and punishment have this sort of symmetry. To the
extent that they do not, it will clearly be necessary to say in
what ways the motivational systems for reward and punishment
differ.
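The idealized robot just described can be sketched in a few lines of code. This is my own illustration, not anything proposed in the text: the option values, and the helper `best_option`, are invented for the sake of the example.

```python
# An illustrative sketch (my own, not from the text): an idealized agent
# that scores every outcome as a signed number and maximizes the aggregate.
def best_option(options):
    """Each option is a list of signed outcome values; choose the
    option with the greatest total."""
    return max(options, key=sum)

options = [
    [5, -2],   # large reward reached through mild punishment: total +3
    [1, 0],    # small reward at no cost: total +1
]
assert best_option(options) == [5, -2]

# Adding a constant to every individual value shifts each total by the
# same amount (the options here have equal numbers of outcomes), so the
# preference ordering is unchanged.
shifted = [[v + 100 for v in opt] for opt in options]
assert best_option(shifted) == [105, 98]
```

In such a system reward and punishment are perfectly interchangeable: a reward of +5 and a punishment of -5 are the same quantity with the sign reversed, which is precisely the symmetry at issue in what follows.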

Anatomical and functional separation of attractive and
aversive mechanisms

It will be as well to start with the line of
evidence just appealed to in discussing types of association
(chapter 6): the biological facts of brain structures and the
theories of the behavioural functions which these structures serve.
It has been clear ever since Papez (1937) pointed it out that the
limbic system or ‘Papez circuit’ of the vertebrate forebrain, which
is an interconnected network of brain parts, is the place to look
for motivational mechanisms. Lesioning of different parts produces
different motivational effects. Amygdala lesions make the animal
tame and inappropriately relaxed; septal lesions make it jumpy and
aggressive; lesions of the lateral hypothalamus and pyriform cortex
make it under- or over-sexed; and lesions of the ventro-medial
versus lateral hypothalamic regions make it eat too much or too
little. There are no agreed interpretations of precisely what this
evidence means, but in the case of motivation associated with
eating, physiological theories have it in common that they are
all complicated, assuming separate mechanisms for such factors as
motivational states resulting from extreme hunger, the effects of
food palatability, and detailed control over when an animal starts
and stops a particular meal (Green, 1987). This supports the
psychological expectation that eating may occur either as a
reaction to strongly unpleasant inner sensations of hunger, or in
the absence of physiological need, in considered and sybaritic
anticipation of taste-enjoyment, or in some combination of these.

The stronger and more direct form of evidence is however that
from animals’ reactions to mild electrical stimulation of different
points in the limbic system, which gives rise to the assumption
that there are pleasure and pain centres in the brain (Olds, 1958).
Olds (1961) re-affirmed his belief that these results require the
addition of a pleasure-seeking mechanism
to any drive reduction or pain-avoidance formula such as
that proposed by Hull. In Olds’s words, pleasure has to be seen as
‘a different brand of stimulation’ from the mere absence of drive
(1961, p. 350). On the face of it, this arises simply because
there is no obvious source of a need or drive state when rats that
are not deliberately deprived of anything run to a particular
place in a maze, or repeatedly press a lever, apparently because
this results in their receiving electrical stimulation of the brain
(via electrodes implanted through their skulls). Deutsch and
Howarth (1963) proposed an ad hoc defence of drive theory,
which relied on the assumption that the same electrical stimulation
both initiated the drive and reduced it, but this cannot cope with
the findings that rats will run across an electrified grid to get
brain stimulation (Olds, 1961), will perform more or less normally
on schedules of reinforcement for bar-pressing in Skinner boxes
when long intervals intervene between successive episodes of brain
stimulation (Pliskoff et al., 1965; Benninger et al.,
1977) without any prior priming, and also appear to be
comforted by appetitive positive brain stimulation received during
illness in the taste-aversion paradigm (see p. 232 below; Len and
Harley, 1974).

The behavioural effects of rewarding brain
stimulation thus appear to support the view that there is an
attractive motivational mechanism. But few have ever doubted this;
our question is to what extent the attractive mechanism is equal
and opposite to the aversive one. Some asymmetries appear to be
present anatomically. Olds (1961) suggests that the reward system
takes up rather a large part of a rat’s brain, the punishment
system much less of it: out of 200 electrodes placed in the brain
at random, 35 per cent had rewarding effects on behaviour, 60 per
cent had no apparent motivational effects at all, and only 5 per
cent had definite punishing effects. Using standard behavioural
tests, the precise location of rewarding and punishing sites can be
plotted; in Olds and Olds’s (1963) study a point was judged
attractive if animals pressed a lever to turn electricity on, but
aversive if, when a train of stimulation was started by the
experimenter, the rat pressed a similar lever to turn this off.
The main features of the anatomical lay-out suggested by
this procedure are that:

(i) points where electrical stimulation is attractive are
centred generally on the hypothalamus and its main fibre tract
connection with the septal area, the medial forebrain bundle, with
hardly any involvement of the thalamus (a sensory relay station);

(ii) conversely, points with exclusively aversive
behavioural effects were found frequently in the thalamus, and also
in the periventricular region of the midbrain;

(iii) many points which showed both attractive and
aversive effects were found in the hypothalamus, in the medial area
for instance;

(iv) ‘pure’ effects one way or the other were most likely
in fibre bundles, while the ambivalent points, in nuclei,
demonstrate that the two systems are often brought into close
physical proximity.

These physiological results do not provide strong evidence as
to whether punishment is the mirror image of reward in terms of its
behavioural effects, but they certainly suggest that there are two
separate physiological systems, which interact, and that the
aversive system is fairly directly connected to sensory input, in
the thalamus and midbrain, as would be expected for pain and
discomfort, whereas the attractive mechanism is intimately involved
with metabolic and autonomic control, as would need to be the case
if some of the attractive systems serve purposes in connection with
bodily needs and homeostatic balances, and cyclical variations in
behaviour. This can be related to the analysis of different types
of drives (Gray, 1975) and to theoretical schemes of biological
function. Even at a rudimentary stage of examination, it
would not surprise us to find that there were motivational
imperatives of different degrees of urgency. A hungry animal being
chased by a predator should only have one choice when internal
comparisons are made between the importance of eating and the
importance of escaping, but, while keeping a watchful eye open to
the possibility of danger, a prey animal may need to make
sophisticated adjustments about its own choices of palatable but
costly versus abundant but boring items. In
terms of function, it seems unlikely that the underlying
mechanism for panic flight should have much in common with the
incentive to fill oneself with the most energy-rich food available
in times of great abundance. And in the natural world, as opposed
to the laboratory, for many species the time devoted to active
escape from danger or to immediate food-seeking may be short by
comparison with that taken up by nest construction, complicated
social interactions of several kinds, migration, and exploring and
updating of unfamiliar or familiar territorial domains. All these
various activities need some kind of psychological system to
sustain them, and it is not likely that just one or even just two
kinds of motivational apparatus would be sufficient for the whole
lot.

Similarities between reward and punishment

Having established that there are grounds for
expecting qualitative differences between attractive and aversive
motivational systems, we ought now to inspect the contrary evidence
— that the behavioural effects of the two systems are roughly equal
but opposite. That is to say, attractive stimuli attract, and thus
encourage the performance of responses which it has been learned
will bring them about, while aversive stimuli repel, and discourage
behaviours which make them more likely. The above statements may
appear to be tautologous, and thus not worth experimental
examination. This is almost the case, and perhaps would have been
had not both Thorndike (1931) and Skinner (1953) argued the
contrary. Thorndike was persuaded by some data that should have
been treated more tentatively that neither young chicks nor
undergraduates possessed any mechanism which would prevent them
from doing again things which had previously proved
disadvantageous, and from the beginning had emphasized that it was
accidental successes, rather than accidental error, that was the
engine of trial-and-error learning. Skinner was similarly sceptical
about the ability of rats to associate unfavourable outcomes with
their own behaviour, but this was linked to an idealistic and
perhaps practically sound rejection of the use of punishment by
parents and teachers to control the behaviour of children.
Skinner’s argument
seems to have been that punishing a child for wrong-doing
will produce generally counter-productive emotional upheavals,
which may become transferred even more counter-productively to
associated events by classical conditioning; but that the
punishment will not act as a deterrent for any specific response.

We may accept Thorndike’s suspicions about the
fallibility of chicks and undergraduates, and Skinner’s doubts
about the advisability of punitiveness in parents and teachers,
without discounting the symmetry that certainly exists to some
degree between the encouragement of responses by reward and their
deterrence by punishment, but without perhaps going quite as far as
to say that ‘The most important fact about punishment is that its
effects are the same as those of reward but with the sign reversed’
(Mackintosh, 1983, p. 125). The deterrent effect of aversive
stimuli on instrumental responses can in fact be readily
demonstrated in the typical Skinner box, and indeed was so
demonstrated by Estes (1944). If rats press a lever because this
delivers food pellets, they may be deterred from pressing it by the
addition of mild electric shocks, delivered to the feet at the
moment the lever is pressed. Depending on their degree of hunger,
the size of the food pellets and the strength of the shock, they
will continue to press if rewarded with little or no punishment,
and continue to refrain from pressing if the punishment is strong
enough, and is given invariably if they occasionally try to get
away with it. Moreover, if the rewards cease, the effect of the
rewards will dissipate as the animals learn that the response no
longer brings them about, and similarly if shocks cease, the effect
of punishment will disappear as the animals learn that these are
not forthcoming: these effects are symmetrical, since they can both
be construed as learning about the consequences of responding
(Mackintosh, 1974, 1983). However, there are limits to the
symmetry. First there is a logical difference between learning
about positive and negative response consequences. Since positive
consequences are sought, and, if learning has been successful,
found, then any change in the positive consequences of responding
will quickly become apparent to the responding animal. On the other
hand, since negative consequences are withdrawn from, and,
if learning has been successful, avoided, then changes in
negative consequences may not immediately present themselves to the
animal which is not responding, and this is one reason why, other
things being equal, we might expect the deterring effects of a
temporary unpleasant consequence of responding to be somewhat more
lasting than the encouraging effects of a temporary incentive with
equivalent emotional force. This may be the explanation for the
finding of Boe and Church (1967) that a very strong series of
shocks given consistently for rats’ intermittently food-rewarded
lever-pressing deterred further lever-pressing completely and
indefinitely.

It is certainly arguable that in addition to
bias against gathering new information about pain and distress, any
scale of these affective qualities will be difficult to map on to a
scale of the desirability of food pellets according to their size
or taste, even with a change of sign. However, within limits, it is
possible in behavioural experiments to construct a scale of
practical equivalences, by setting off given amounts of
attractiveness in a goal against degrees of unpleasantness
encountered in the course of achieving it. Vast amounts of evidence
were collected before the Second World War by Warner (1927, 1928a,
1928b) and Warden (1931), among others, using the ‘Columbia
obstruction box’, in which rats were required to run across an
electrified grid to obtain access to food, water or a member of the
opposite sex, under systematically varied conditions. Rats were
reluctant to run across the standard grid for food until they had
been without food for at least two or three days, but crossed with
little hesitation for water if deprived of this for 24 hours. Male
rats ran across the same grid to get to a female in heat rather
more often one day after previous sexual contact than four weeks
after, and very much less if tested within six hours of previous
copulation; females only crossed at all to males in the most
receptive half of their estrus cycle, with a peak number of
crossings confined to the estrus phase. The highest rates of
crossing the standard electrified grid were observed in maternal
rats separated from their young (Nissen, 1930; Warden, 1931).

It would be unwise to place very much emphasis on these
results, but it is clear that the animals were capable (a) of
learning that there was a desired goal object of a certain
kind in the ‘incentive compartment’ on the other side of the
electrified grid, and (b) of combining this knowledge, in however
simple a form, with the level of their current appetite, so that,
for instance, they would cross to food when very hungry, but not when
moderately so. Stone (1942) was able to get essentially similar
results, without using the somewhat artificial device of the
shocking grid, by training rats to dig through tubes filled with
sand, or to scratch their way through a succession of paper-towel
barriers blocking a runway, in order to get to goal objects. More
precise quantification was obtained by Brown (1942) and Miller
et al. (1943; see Miller, 1944), who trained rats to run down
an alley towards food while wearing a harness with a cord attached,
which allowed their movements to be carefully measured, in some
cases in terms of the force with which they pulled against a
calibrated spring. It was found that hungry rats, trained to run
towards food, pulled against the spring with almost as much force
when 2 metres away as when very much closer to the food. However
other rats, who received electric shocks in the goal box instead of
food, pulled vigorously to get away when subsequently placed close
to the goal box, but did not pull at all when placed 2 metres away,
even after extremely severe previous shocks. It would have been odd
if any other result had been obtained, since hungry rats are
presumably still hungry when far away from food, whereas shocked
rats are not necessarily afraid once they are far away from the
site of their aversive experience (see below, p. 218). The
difference between the pulling towards food and away from shock is
often referred to as the difference between approach and avoidance
gradients, and drawn as in Figure 7.1. The argument in favour of
such approach-avoidance gradients is strengthened by experiments in
which the same rats are both shocked and fed in the same goal box.
Subsequent behaviour at various points on the path to this goal can
be predicted in terms of the strength of current hunger, and the
intensity of previous shocks. With moderate values of both, animals
approach about half-way towards the goal and then stop, as would be
expected from Figure 7.1. Either stronger hunger or weaker shock
leads to closer approach; with weak hunger
or more aversive shocks animals naturally keep further away.
Although this was true on average, there was considerable variation
between and within individual animals. Some rats adopted a pattern
of consistent vacillation, of increasingly hesitant approaches
followed by abrupt retreats, while others moved forward in steps,
making long pauses before each
small approach, eventually coming to a complete halt
(Miller, 1944).

Figure 7.1 Approach and avoidance gradients.

Schematic plots of how the strength of approach and avoidance
responses may vary with distance from the goal object, based
on experiments in which rats receive food and electric shock at the
same place. In (a), it is apparent that strong approach tendencies
may result in high points of avoidance gradients being encountered.
Paradoxically, (b) demonstrates that a reduction of avoidance
tendencies in circumstances of conflict may have the effect of
raising the point at which approach and avoidance tendencies
balance out. After Miller (1944).
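The interplay of the two gradients can be illustrated numerically. The sketch below is my own: the linear slopes and intercepts are invented for the example (Miller reported the relative steepness of the gradients, not these particular values); only the qualitative shape, a shallow approach gradient crossed by a steep avoidance gradient, follows the experiments described above.

```python
# A hedged numerical sketch of approach-avoidance gradients.
# Slopes and intercepts are invented; only the shapes (shallow approach,
# steep avoidance scaled by shock intensity) follow Miller's account.
def approach(d):
    """Approach tendency: falls slowly with distance d (metres) from goal."""
    return max(0.0, 10.0 - 1.0 * d)

def avoidance(d, shock=1.0):
    """Avoidance tendency: falls steeply with distance, and is scaled
    by the intensity of the previous shock."""
    return max(0.0, shock * (12.0 - 5.0 * d))

def stopping_point(shock=1.0):
    """Distance from the goal at which the approach tendency first
    equals or exceeds the avoidance tendency: the point where a rat
    approaching from far away would halt."""
    d = 0.0
    while d < 10.0 and approach(d) < avoidance(d, shock):
        d += 0.01
    return round(d, 2)

# Stronger shock raises the avoidance gradient and pushes the balance
# point further from the goal; weaker shock permits closer approach.
assert stopping_point(shock=1.5) > stopping_point(shock=0.5)
```

With these invented numbers, a weak shock lets the animal reach the goal box itself, while a strong one halts it over a metre away, reproducing the pattern of partial approach followed by stopping that Miller observed with moderate values of hunger and shock.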

All this suggests that positive incentive, or
the attractive-ness of a goal, can be somehow weighed against the
negative incentive derived from previous aversive experiences.
Logan (1969) used precisely these terms in another claim that
‘the effects of punishment are symmetrically opposite to the
effects of reward’, based on an experimental variation on the theme
of conflict between reward and punishment, which included a more
explicit choice between alternatives. Rats were allowed to choose
between running down a black or a white alley, after having
previously done ‘forced trials’ to ensure equal experience of what
the black and white choices entailed. First, preferences were
established by such differentials as seven food pellets at the end
of the black alley, but only one in the white; or three pellets
given at the end of both alleys, but available immediately in the
white goal box, but only dropped in 12 seconds after arrival in
the black goal box. Both these procedures establish strong
preferences in hungry rats, since these behave as if they would
rather have seven than one food pellet, and a given amount of food
sooner rather than later. Logan then examined how easy it was to
reverse these preferences by the obstruction box method of making
the rats run over an electrified grid for 2 feet before reaching
the preferred goal, and varying the intensity of the shocks thus
delivered. A very orderly effect of shock intensity on percentage
of choices of the originally preferred goal was observed, and a
stronger shock was necessary to persuade the animals to choose one
instead of seven pellets than to shift the preference for immediate
versus delayed rewards of the same size. This difference was even
more pronounced when shock had to be endured on only 50 per cent of
the approaches to the preferred goal. Choice of seven versus one
pellet was very resistant to that procedure, the risk of only a
very strong shock reducing preference to just under 50 per cent,
whereas the choice of immediate over delayed reward was still
strongly determined by shock intensity, rats settling for delayed
rewards fairly frequently (on about 60 per cent of choices) even
with the risk of only a low shock intensity (Logan, 1969, p. 47).

All these results, and many others (Azrin and Holtz, 1966;
Solomon, 1964; Morse and Kelleher, 1977) seem to suggest
that reward and punishment are ‘analogous if not equivalent
processes’ (Morse and Kelleher, 1977), are symmetrical but opposite
in their effects, and so on. Gray (1975, pp. 135 and 229) has
formalized this view first with respect to the symmetry of the
possible behavioural effects of delivering or withholding
attractive and aversive events; and second by presenting a
theoretical model, shown in Figure 7.2, in which, as can be seen at
a glance, precisely comparable mechanisms are proposed for the
operation of reward and punishment, with a ‘decision mechanism’
which allows for the results quoted above, in which the attractive
effects of reward are balanced against the aversive effects of
punishment. In instrumental learning therefore, there are many
reasons for assuming that punishment may sometimes operate in more
or less the same way as reward, even though there are differences
in anatomical factors and in ecological function. In terms of
Figure 5.8, which was used to summarize the sorts of associations
possible in instrumental learning with reward, all that is
necessary is to substitute ‘unwanted’ for ‘wanted’; to interpret
‘appropriately associated with unwanted events’ to mean that such
behaviours will involve withdrawal rather than approach; and thus
to suppose that the result of learning that a response has unwanted
consequences (at (1) in Figure 5.8) will be an impulse to inhibit
such responses rather than make them. As in Gray’s model (Figure
7.2), it is necessary that rewards are automatically linked to
impulses to approach, and to the repetition of rewarded responses,
while aversive stimuli must be inherently linked to withdrawal,
behavioural inhibition, or an internal ‘stop’ command. Clearly, as
a consequence of this in general and in Figure 5.8, when the
punishment mechanism works, punished responses are suppressed, and
the relevant motivating event is notable by its absence.

Figure 7.2 Gray's symmetrical model of reward and
punishment.
The only difference between reward and punishment in this model is
in their differing effects on the motor system. After Gray
(1975).

The theory of avoidance learning

The symmetry of attractive and aversive events can certainly
be maintained in plotting the predicted effects of increases and
decreases in their frequency. Animals should behave so
as to maximize their receipt of appealing experiences and
minimize their encounters with aggravation and distress: thus they
should learn to repeat responses which either bring about or
prolong rewards or prevent or cut short punishments; and they
should also learn to inhibit responses which either prevent or
truncate pleasurable or satisfying states of affairs, or which
initiate or continue pain or distress. Both this last sentence and
Gray’s diagram (Gray,
1975, p. 135) may appear complex, but they are simply behavioural
elaborations of the pleasure/pain principle. The first step in
this is to say that responses
which bring about rewards should be repeated, and
responses that bring about punishments should be stopped. The
second step goes beyond this, to deduce that responses which
prevent otherwise available rewards should be inhibited, whereas
responses which prevent or truncate otherwise imposed punishments
should be repeated. It has often been pointed out (Mowrer, 1939;
Dickinson, 1980) that this second step is very much more demanding
of the cognitive abilities of both the animal and the learning
theorist, because the critical consequences of responding are
unobservable — what is important is that nothing happens.

There are a number of explanations for why the
absence of an event may be critical in serving as a goal or
reinforcement for instrumental learning. Perhaps the most
straightforward explanation for the theorist, if not for the system
being explained, takes the form of assuming that the behaving
system contains comparator mechanisms, which assess whether current
levels of attractive or aversive stimulation are greater or less
than expected. If an expected event does not take place, this fact
can thus be fed into the relevant motivational device — the absence
of a reward should be regarded with displeasure, but the absence of
an expected punishment clocked up as something to be sought after.
Such arrangements are included in Figure 7.2. The main problem with
this is that it takes an enormous amount of continuous cognitive
processing for granted. Whenever a normally obtained reward or
punishment is missed, the system should sit up and take notice, and
this implies some form of continual vigilance. But we have already
seen that such comparator mechanisms, albeit of varying degrees of
complexity, are a universal feature of basic learning processes. In
habituation to motivationally insignificant stimuli, it is assumed
that all stimulus input of this kind is compared to a ‘neuronal
model’ of what is expected, the distinction between novel and
familiar stimuli being between stimuli which do or do not match the
model (Sokolov, 1963; see p. 40). For classical conditioning, it is
assumed that the signalling stimulus arouses some representation of
the signalled event, subsequently compared with obtained experience
(Dickinson, 1980; see p. 105). For instrumental learning with
rewards, we assume that the
representations of wanted events are available, often
before the relevant response is made (Tinklepaugh, 1928; see p.
153). When an expected reward is not obtained for an instrumental
response, most theories assume that some process of frustration or
inhibition is aroused, which is responsible for the eventual
decline of non-rewarded responses (Rosenzweig, 1943; Amsel, 1962;
Pearce and Hall, 1980; Dickinson, 1980; Gray, 1975). On these
grounds it would seem almost an aesthetic necessity that for the
purpose of achieving harmony and symmetry, we should also assume
that when an expected punishment is omitted this is sufficient to
encourage the repetition of any response associated with the
omission. Fortunately there is behavioural evidence to suggest that
something of this sort does indeed take place (Herrnstein, 1969).
However, there is an even greater amount of evidence to suggest
that this is not the only significant process in instrumental
learning motivated by aversive stimulation.
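A comparator of this sort can be caricatured in a few lines of code. The sketch below is my own illustration, not Gray’s actual model: the signed coding of events is an invented convention, but it captures the point that the omission of an expected event generates a motivational signal of the opposite sign.

```python
# A minimal caricature (my own, not Gray's model) of a comparator that
# turns omitted events into motivational signals. Convention (invented):
# +1 = reward, -1 = punishment, 0 = nothing happened.
def comparator(expected, obtained):
    """Return the motivational effect fed to the learning mechanism:
    the discrepancy between what was expected and what occurred."""
    return obtained - expected

# An expected reward that fails to arrive yields a negative signal
# (frustration-like, discouraging the response).
assert comparator(expected=+1, obtained=0) == -1

# An expected punishment that fails to arrive yields a positive signal
# (relief-like, something to be sought after).
assert comparator(expected=-1, obtained=0) == +1

# Events that match expectation produce no new signal.
assert comparator(expected=-1, obtained=-1) == 0
```

The cost hidden in these three lines is the one noted above: for the subtraction to be performed at all, an expectation must be continuously computed and held ready for every reward or punishment that might have occurred, which is a great deal of standing cognitive machinery.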

Escape learning

Little attention has been given to what is
formally known as escape learning, in the case where electric
shocks or other localized aversive stimuli are delivered, but most
of the arguments about Thorndike’s experiments on cats which escape
from small boxes would apply — for instance, is the successful
response an automatic habit, or is it made in knowing anticipation
of its consequences? If a rat in a Skinner box is exposed to
continuous painful electrical shocks from the floor, it will
normally learn rapidly any response which makes this stimulus
cease, whether it is moving to a part of the floor that is safe,
rolling on its back to make use of the insulating properties of its
fur, or pressing a standard lever which serves as the off-switch.
It is arguable that responses made to relieve already present pain
or discomfort are more likely to be made automatically and
reflexively than topographically similar behaviours learned under
the influence of rewards which follow them. First of all painful
aversive stimuli may have greater motivational immediacy than
others (see below, p. 230), but apart from that, there is little
need to construct cognitive representations of a motivationally
significant
stimulus which is already present before the response is
made, whereas the logical structure of learning for rewards means
that there has to be an internal and cognitive representation of
the motivating event, if the reason for the response is to be known
while it is being initiated. In Hullian terms the drive stimulus
may be rather more obvious and vivid when it is externally imposed
than when it is generated by internal time-sensitive cycles (see
Gray, 1975, chapter 4). Nevertheless, one of the behavioural
phenomena reliably observed when rats press levers in Skinner boxes
to turn off electric shocks is difficult to explain purely on a
Thorndikean stamping-in basis. In all experiments of this type, it
is necessary to specify how long the shock is turned off for. It
can be for the rest of the day, or the rest of the experiment, but
more commonly it is for something like 20 seconds, after which the
shock starts again, and the lever must be pressed again (Dinsmoor
and Hughes, 1956; Davis, 1977). Under these circumstances rats have
a strong tendency to hold the lever firmly pressed down during the
shock-free intervals. In a very large part, this is due to the
species’ instinctive reaction of ‘freezing’ or crouching very
still, which is elicited by painful stimuli or signals of danger
(Davis, 1977), but it is also maintained by its utility as a
preparation for rapid execution of the next response (Dinsmoor
et al., 1958; Davis, 1977).

Apart from the emphasis on instinctive
reactions, few conclusions can be drawn from the fact that animals
learn rapidly to repeat naturally favoured responses which turn off
shocks (Davis, 1977). A great deal more theoretical interest has
been attracted to the case of avoidance learning, since by contrast
with escape learning, where the motivating stimulus occurs
conspicuously before each response, this event, when learning is
successful, is rarely, if ever, seen.

The two-process theory of avoidance learning

The two-process theory of avoidance learning
appeals to the classical conditioning of fear or anxiety as the
first process, and the instrumental reduction of this unpleasant
emotional state as the second. It is associated with the names of
Mowrer (1940, 1960) and Miller (1948), and an apparatus known as
the Miller-Mowrer shuttle box (actually used originally by
Dunlap et al., 1931). This is a box with two compartments,
each with a floor through which electric shocks can be delivered,
perhaps with a hurdle or barrier between them. Every so often, say
about once every 2 minutes, a buzzer is turned on for 10 seconds.
If the animal in the box stays in the compartment where it is when
the buzzer starts, it receives shock at the end of the 10 seconds.
However, if it jumps or otherwise shuttles to the alternative
compartment before the 10 seconds are up, the buzzer is (in most
experiments) turned off, and (in all experiments) no shocks are
given on that trial. It is clearly greatly to the advantage of the
animal concerned if it shuttles between the two compartments when
it hears the buzzer, and thus the basic result is capable of
explanation by the principle that behaviours which reduce the
frequency of unpleasant experiences should be learned (Herrnstein,
1969).
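The trial contingency just described can be sketched as a short simulation. This is an illustrative sketch only, not a model from the literature: the function names are invented, and the animal's behaviour during the 10-second buzzer is reduced to a single assumed probability of shuttling.

```python
import random

# Illustrative sketch (assumed names and parameters): the shuttle-box
# contingency, with the animal's behaviour during the 10-second buzzer
# summarised by a single probability of shuttling.

def trial(p_shuttle):
    """One trial: returns True if the animal was shocked."""
    shuttled = random.random() < p_shuttle
    # Shuttling before the 10 seconds are up turns off the buzzer and
    # cancels the shock; staying in the same compartment means shock.
    return not shuttled

def shocks_per_session(p_shuttle, n_trials=30):
    """Total shocks received over a session of n_trials."""
    return sum(trial(p_shuttle) for _ in range(n_trials))
```

A well-trained animal (p_shuttle near 1) receives almost no shocks, which is precisely the advantage to the animal that makes the basic result explicable by frequency reduction.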

Thus, instrumental learning, conceived as the
principle of reward and punishment in an abstract logic of events,
is capable by itself of explaining why avoidance learning ought
to occur. It cannot explain why, in many instances, mainly when
the required responses conflict with instinctive reactions,
avoidance learning fails to occur (Bolles, 1978; Seligman and
Johnston, 1973) and it misses out altogether the undeniable fact
that in most cases the delivery of aversive stimuli arouses
distinctive emotional states, which are highly conditionable, in
the sense that predictable though not identical emotional reactions
are quickly induced for other stimuli which are taken as signals
for impending pain or distress (see backward conditioning, p. 87).
There is thus ample reason to retain the two-process theory, to the
extent that it predicts both conditioned emotional states and
responses motivated by them, while also acknowledging that there is
evidence for more calculated forms of avoidance learning, based on
anticipation of the consequences of responding, compared to the
state of affairs that might otherwise be expected to obtain (Gray,
1975; Mackintosh, 1983).

The paper ‘Anxiety-reduction and learning’ by O.H. Mowrer
(1940) stated a simple and direct form of the two-process theory,
including the assumption that anxiety
reduction should qualify as a ‘satisfying state of
affairs’ in Thorndike’s Law of Effect. Behavioural responses which
bring relief from anxiety should therefore be stamped in or
fixated. As an experimental test of the theory, Mowrer trained
several groups of rats and guinea pigs to run around a circular
track composed of eight grid segments which could be independently
electrified. Once a minute a tone was turned on for 5 seconds, with
shock to be delivered to the segment the animal was initially
standing on at the end of the tone. But as soon as the animal moved
forward at least one segment the tone was turned off, and no shock
was delivered. After three days of this at 24 minutes per day, the
animals received only two out of the 24 potential shocks in a day,
and stable performance was reached at four shocks per day, which
counts as about 80 per cent (20/24) correct avoidance. There were
minor species differences between the rats and guinea pigs, rats
learning better with random rather than fixed intervals between
trials, and the guinea pigs the other way around. Mowrer (1940)
argued that the running response to the tone was established
because it relieved conditioned anxiety or dread, but pointed out
that learning of this sort appeared to work best when the response
that ended dread closely resembled the response which would be used
to escape from the dreaded event itself, when it was present. This
would be reasonable, even if the main process involved was the
conditioning of an emotionally loaded representation of the
stimulus event, but it leaves Mowrer’s evidence open to the
objection that no conditioned anxiety, or anxiety relief, was
necessary, since more direct ‘knee-jerk’ conditioning of the motor
response alone would account for the results.

There are many reasons why this objection
could be discounted, but the clearest demonstration that avoidance
learning involves more than motor response shift would require that
the response made to avoid is quite different from that used in
escaping from the event avoided. Such a demonstration was provided
by Miller (1948), who used a procedure in which the response
learned was separated from the response elicited by shock, both by
topography and the lapse of time. In a two-compartment shuttle box
Miller first gave 10 trials only in which rats received
intermittent shock for 60 seconds in a white compartment without being able
to escape, and were then confronted with continuous shock and a
suddenly opened door. All the animals here learned to run through
the door quickly, to the safe black compartment. Also when
subsequently put in the white compartment even without shock, they
ran to the black compartment. The next phases of the experiment
showed that this was not just a matter of automatic running. First,
the rats were left in the white compartment with the door closed,
but with a wheel by the door which, if turned, opened it. All rats
showed variable behaviour around the door which could be construed
as attempts to get through it. Half of them (13/25) moved the wheel
by accident the fraction of a turn necessary to open the door,
during the first few trials, and thereafter became more and more
adept at turning the wheel to open the door as soon as they were
placed in the white compartment. The others tended more and more to
adopt the posture of rigid crouching. These results are strong
evidence that the initial phase of shocks meant that the rats
thereafter were put into an aversive motivational state by being in
the white compartment, and that the novel behaviour of turning a
paddle-wheel device was learned by being associated with escape
from the anxiety-provoking compartment. Miller (1948) then
changed the procedure for the 13 rats that were turning the wheel,
so that the wheel no longer worked, but the door would open if a
bar, on the other side of it, was depressed. The first time this
was done, the animals turned the wheel more vigorously than usual —
although only a fractional turn had previously opened the door, the
typical (median) number of whole turns was almost 5, with one rat
making 530 whole turns. However, by the fifth attempt all but this
one quickly opened the door by pressing the bar, instead of turning
the wheel.

Problems for two-process theory of avoidance

The experiment by Miller (1948) and others in
which animals learn to shuttle back and forth between two
compartments at the signal for impending shock (Mowrer and
Lamoreaux, 1946; Kamin, 1956) appear to support the two-process
explanation of classical conditioning of what can loosely be
called fear or anxiety to external cues, and instrumental
reinforcement of responses which remove the animal from these cues,
or otherwise reduce fear. The reader may however have noticed that
there is a blatant contradiction between what is implied in this
account and the claim made in the previous chapter (pp. 184-8) that
classical conditioning is ineffective if there is only intermittent
pairing of a signal with a signalled event. It is in the nature of
successful avoidance learning that the signal for shock is no
longer a signal for shock, because responses are made which prevent
the shock happening. There are a number of ways around this
contradiction. Most directly, there is 100 per cent correlation
between the signal and the shock when no response is made, and the
fact that a response has been made can quite properly be regarded
as having altered the character of the signal — ‘signal alone means
shock’, and ‘signal plus response means no shock’ are both reliable
and consistent associations. There are other aspects of reactions
to aversive stimulation however which indicate the need to appeal
to special factors to do with strong instinctive reactions to pain
and danger, and with related emotional and physiological changes
involving stress.
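The resolution just offered, that ‘signal alone’ and ‘signal plus response’ are each perfectly reliable compound cues even though the signal as such is only intermittently paired with shock, can be illustrated with a minimal sketch. The trial sequence and function name here are invented for illustration:

```python
# Avoidance contingency: shock follows only when the signal sounds
# and no response is made.
def shock_follows(signal, response):
    return signal and not response

# An invented sequence of (signal, response) trials.
trials = [(True, False), (True, True), (True, False), (True, True)]
outcomes = [shock_follows(s, r) for s, r in trials]

# Overall, the signal precedes shock on only half of these trials...
assert sum(outcomes) == 2
# ...but each compound cue predicts its outcome perfectly:
# 'signal alone' is always followed by shock,
assert all(o for (s, r), o in zip(trials, outcomes) if not r)
# and 'signal plus response' never is.
assert not any(o for (s, r), o in zip(trials, outcomes) if r)
```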

The contradiction between the supposed
sensitivity of classical conditioning to the degree of correlation
between events, and the persistence of avoiding behaviours in the
absence of anything to be avoided, is taken to extremes in what is
called ‘traumatic avoidance learning’ (Solomon and Wynne, 1954;
Solomon et al., 1953; Solomon and Wynne, 1953). This is
simply a shuttle-box avoidance procedure used with dogs rather than
rats, and with strong shocks. With a 3-minute interval between
trials, and a 10-second signal of lights going off and the gate
between compartments being raised, Solomon and Wynne (1953)
reported that dogs received only seven or eight shocks before
reaching the criterion of 10 consecutive avoidance responses, which
involved jumping over a shoulder-high hurdle. This is not
particularly exceptional — the result that is theoretically
important is that once this criterion had been reached, dogs showed
no sign at all of ever reducing their tendency to jump when the
signal came on, even after 200 trials at 10 per day, or in one case
after 490 trials. There was evidence that
the animals became much less emotional as the extinction procedure
(without any received or potential shocks) proceeded, but that the
latency of jumping either stayed the same or decreased.

One argument is that these two reactions are
connected, and that ‘anxiety conservation’ occurs because the
continued fast jumping prevents the dog ‘testing reality’ by
discovering that shocks will no longer occur if the signal is
ignored. However, it was no simple matter to confront the dogs with
reality in a way which quickly removed the conditioned response of
jumping. If a glass barrier was inserted between the compartments
so that successful jumping was impossible, the dogs at first
appeared excited and anxious, and over 10 days quietened down, but
then, if the glass barrier was removed, they immediately began
jumping again. An alternative procedure of punishing the jumping
response was tried, by simply arranging that shock was present in
the opposite compartment to the one the dog was in, when the signal
sounded. Here it was highly disadvantageous for the dogs to
continue jumping, but this is precisely what most of them did. If
the glass-barrier procedure and the punishment procedure were
alternated, then the jumping response was finally suppressed, but
its persistence in the absence of any overt benefit was clearly not
predictable on the straightforward version of the two-process
theory (Solomon et al., 1953; Solomon and Wynne, 1954;
Seligman and Johnston, 1973).

It is therefore necessary to modify the two-process theory in
some way to account for the persistence of avoidance responding
when no further motivating events are observed. There are several
possible modifications, all of which have some merit. However, it
first has to be said that modification of the two-process theory is
not always necessary, since avoiding behaviours are not always very
persistent. For instance in Mowrer’s experiment already quoted
(1940), in which rats or guinea pigs ran around a circular maze at
the sound of a 5-second tone, omitting all shocks from the
procedure meant that many animals stopped running by the end of one
session of 24 tones. Since they had previously only been receiving
about four shocks per day, this implies that the first few times
they waited the full 5 seconds of the
tone and received no shock, this immediately led to a drop in the
tendency to run. Many other experiments with rats have found
conditioned avoidance responses quickly declining when shocks are
no longer given (Mackintosh, 1974, 1983). In most of these the
shock is not associated with the animal being in a particular
location (as it was in the experiment of Miller, 1948) but with a
light or sound signal. Thus the only modification in two-process
theory that is necessary is the one which suggests that the signal
with no response functions as the cue which is consistently
associated with shock, while the signal together with a response is
associated with not getting shock (Dickinson, 1980). It is already
the case, of course, that the instrumental part of two-process
theory should be strengthened by intermittent reinforcement (see p.
183).

Habitual responding which prevents fear

As Mowrer’s original formulation of two-
process avoidance learning (1939) was explicitly inspired by
Freud’s book The Problem of Anxiety (1936), the next
modification is no novelty. Freud of course proposed that although
anxiety was a reaction to danger, the important dangers for people
were internal personal conflicts, rather than external events, but
whatever the source of anxiety, it is a thoroughly unpleasant and
unwanted emotional state. One of Freud’s main points was that
‘symptoms are only formed in order to avoid anxiety’ (1959, p.
144). In other words, neurotic habits keep anxiety in abeyance. The
same principle appears to apply to avoidance learning. Solomon and
Wynne (1954) pointed out that overt signs of emotionality (e.g.
pupil dilation, excretion) in their dogs tended to decline after
initial experiences, partly because the avoidance response of
jumping was made so fast there was no time for the autonomic
nervous system to react to the signal. But the general reactions of
the animals between the signals also became much calmer.
Measurement of the heart rate of dogs (Black, 1959) and of the
behavioural reactions of rats (Kamin et al., 1963) support
the contention that with well-trained avoidance responding there is
little evidence of conditioned fear (Seligman and Johnston,
1973).

Thus the two-process theory has to be extended beyond its
most rudimentary formulation, which implies that fear is
classically conditioned, and that responses are only made when impelled
by high levels of this conditioned fear. It is possible that well-
learned avoidance responses may be sustained for some time purely
by habit (Mackintosh, 1983, p. 168), but it is also likely that, as
Freud suggested (1959, p. 162), there is a second kind of anxiety,
which motivates the avoidance of full-blown fear reactions. One
aspect of this ‘non-fearful’ motivation for avoidance responses
that has been put to experimental test is that responses may be
made as if they were being rewarded by safety or relief, as
‘attractive events’ (Dickinson, 1980, pp. 106-9). It is certainly
the case that explicit environmental signals which guarantee the
absence of shock facilitate performance when given as feedback for
avoidance responding by rats (Kamin, 1956; Morris, 1975). However,
this is only a relative kind of attractiveness: according to
theoretical definitions, the
absence of shock can only be attractive, indeed it can only be
noticed, if there is already some form of anticipation of shock
(Dickinson, 1980). Responding in anticipation of safety from pain
can hardly be equivalent in its emotional connotations to
responding for palatable titbits, as is suggested by the finding
that even animals responding successfully on avoidance schedules
may develop stomach ulcers (Weiss, 1968; Brady et al., 1958).
However, it seems undeniable that standard avoidance learning
procedures involve the avoidance of conditioned fear, as well as
escape from conditioned fear. Well-trained animals do not wait
until they become afraid before they respond; they respond so as to
prevent themselves becoming afraid.

Herrnstein’s theory of avoidance learning

Herrnstein (1969) proposed that the notion of
fear, or indeed of any emotional evaluation whatever, should be
eliminated from theories of avoidance learning by adapting the
hypothesis above so that it refers only to external aversive
events; animals respond so as to prevent themselves from
experiencing aversive events, or ‘the reinforcement for avoidance
behaviour is a reduction in time of aversive stimulation’
(1969, p. 67), aversive stimulation being interpreted only
in terms of events observable in the environment, outside the
animal. Thus one of the processes in two-process theory appears to
be eliminated. However, Herrnstein recognizes that, in order for
his theory to work, it must be assumed that animals first detect
the changes in the likelihood of being shocked with and without a
response, and secondly, learn to produce activities which have the
outcome of lessened exposure to disagreeable events. His explicit
alternative to ordinary two-process theory is to substitute a more
bloodless cognitive assessment of shock probabilities for internal
emotional states which have motivational properties. This is a
technical possibility, in that once it is assumed that behaviour
will be directed by the outcome of less of a certain kind of
experience, it is not absolutely necessary to add in anything else
by way of emotion. I shall conclude that independent evidence
suggests that in practice there usually are conditioned emotional
effects produced by external aversive events, but it is appropriate
to give first Herrnstein’s side of the argument. The usual two-
process theory makes most sense when there is a clear signal which
predicts an avoidable shock. Thus when the buzzer sounds in a
shuttle box it appears reasonable to assume that the buzzer will at
least initially arouse fear, which will motivate the learning of
new responses. But it is possible to study avoidance learning by
other methods, in which there is no clear signal for impending
shocks. Sidman (1953) discovered a procedure known as ‘free
operant’ or Sidman avoidance, in which rats press a lever in a
Skinner box to avoid shocks. There is no external signal. One timer
ensures that a brief shock will be delivered every x seconds if
there are no responses, and another timer over-rides this with the
specification that shocks are only delivered after y seconds since
the last response. Thus if the rats do not press the lever they
will be shocked every x seconds (say 10) but if they press the
lever at least once every y seconds (say 20) no shocks whatever
will be delivered. It is possible to adapt two-process theory by
assuming here that there is an internal timing device which serves
as a signal — the sense that (y-1) seconds have elapsed since the
last response could serve as a signal for impending shocks, and
there is some indication that this sort of thing happens, since rats
often wait for (y-1) seconds before responding. However,
Herrnstein and Hineline (1966) modified this procedure to discount
internal timing, by making all time intervals random. In their
experiment, rats were shocked on average once every 6 seconds if
they did not respond, but when they pressed a lever, they produced
a shock-free period which averaged 20 seconds. They could not
postpone shock indefinitely, but they could reduce the frequency of
shocks significantly by lever-pressing — it is as if they could
escape temporarily from trains of shocks only 6 seconds apart on
average. This procedure was in fact more successful in inducing
lever-pressing than Sidman avoidance (Herrnstein, 1969). Since all
time intervals were probabilistic, Herrnstein argues that any
internally timed process of fear would be redundant, for both the
animal and the theorist, and both would do better simply to accept
that lever-pressing reduces shock frequency, and is therefore a
behaviour worth performing (1969, p. 59). There is something to
this, as a parsimonious strategy or procedure, but there is
compelling additional reason to suppose that inner emotional states
may be aroused, whether or not they serve a useful function
in a particular experimental procedure. Monkeys which perform for
long periods on a Sidman avoidance response, but so successfully
that they hardly ever receive shocks, nevertheless are liable to
develop stomach ulceration which may prove fatal (Brady et al.,
1958). Seligman (1968) found that random and unpredictable
shocks produced ‘chronic fear’ in rats, assessed both by stomach
ulceration and by very substantial depression of food-reinforced
responding in the ‘conditioned emotional suppression’ technique.
These are precisely the sorts of data that suggest the involvement
of a central emotional state, over and above whatever cognitive
assessment of shock frequency might be sufficient grounds for
instrumental behaviour. It thus seems more likely than not that
even Herrnstein and Hineline’s (1966) rats felt relatively relieved
once they had pressed the lever, but were aversively motivated when
they did not. The best tests of emotionality in these circumstances
are undoubtedly physiological indices, but some aspects of the
behavioural data are more suggestive of strong emotion than
cognitive finesse. After experience on the schedule
described above, in which lever presses were necessary to produce
intervals between shocks averaging 20 seconds, the rats were left
to respond with all shocks programmed at this average, irrespective
of responding; in other words responding was completely without
utility, and it eventually ceased. However, this extinction process
was extremely prolonged, and therefore has something in common with
the persistent behaviour observed in Solomon et al.’s (1953)
dogs. One rat made 20,000 responses during 170 sessions of 100 minutes
in which responses were useless, before slowing to a halt. This may
be because of the strength of an automatic habit, but it is likely
that this persistence of the behaviour is related to the emotional
and motivating force of the painful aversive stimuli. With a very
similar procedure, in which rats had the opportunity of
distinguishing between circumstances in which randomly delivered
food pellets either did or did not depend on lever-pressing, no
such persistence of unnecessary behaviour was observed (Hammond,
1980). This again suggests some form of asymmetry between the
motivating effects of attractive and aversive events, with, if
anything, aversive stimuli having more powerful and long-lasting
emotional effects than attractive ones. Herrnstein (1969) was right
to point out that the behavioural evidence to define emotional
effects is often lacking, and his arguments support those of Freud
(1936), Mowrer (1939), Solomon and Wynne (1953) and Seligman and
Johnston (1973), that behaviour which is motivated by the avoidance
of aversive events can often be sustained with little overt or
covert sign of high emotional arousal. But this does not mean that
emotional states can simply be dropped from all discussions of the
motivating effects of aversive events; behavioural evidence
including posture and other instinctive forms of emotional
expression and results obtained with the ‘conditioned emotional
response’ (CER) procedure, as well as more direct physiological
indications of autonomic arousal, plus all the data not included
here on the effects of tranquillizing drugs on aversively motivated
performance (see e.g. Gray, 1982; Green, 1987), provide ample
grounds for continuing to include conditioned fear or anxiety in
theories of aversive motivation.
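The free-operant schedule described above can be made concrete with a short sketch, using the text's example values of x = 10 and y = 20 seconds. The function name and the representation of the animal's presses as a list of times are illustrative assumptions, not part of Sidman's or Herrnstein's procedures:

```python
def sidman_shocks(response_times, session_length, x=10, y=20):
    """Times at which shocks fall under a Sidman avoidance schedule.

    With no responses, shocks arrive every x seconds; each response
    postpones the next shock to y seconds after that response.
    """
    shocks = []
    deadline = x                      # first shock due x s into the session
    responses = sorted(response_times)
    i = 0
    while deadline <= session_length:
        if i < len(responses) and responses[i] < deadline:
            # A response before the deadline resets the response-shock timer.
            deadline = responses[i] + y
            i += 1
            continue
        shocks.append(deadline)       # no response in time: shock delivered
        deadline += x                 # the shock-shock timer restarts
    return shocks
```

Responding at least once every y seconds prevents all shocks, as the text states. In the Herrnstein and Hineline (1966) variant both intervals are drawn at random, so a response can only lower the average shock rate, never eliminate shocks altogether.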

Instincts and anticipation in avoidance responding

It is difficult to determine to what degree
such responding, which avoids anxiety, is based on habit, as
opposed to calculation of the undesirable consequences of not
responding. It is probable that dogs and cats, if not rats, have a
certain degree of anticipation of specific painful possibilities
which they wish to avoid. Thus dogs or cats will strongly resist
being put back into an apparatus in which they have been shocked
several days before (Solomon et al., 1953; Wolpe, 1958).
Rats do not normally do this; in fact there is often a ‘warm-up’
effect, meaning that after a 24-hour break, rats do not respond in
an avoidance procedure until they have received a number of shocks.
Although avoidance responding of a minimal kind, usually the
continuous flexing of a leg which will be shocked if it is
extended, can be obtained in cockroaches and spinal mammals
(Horridge, 1962; Chopin and Buerger, 1975; Norman et al.,
1977), it is worthy of note that decorticate rats, though they
may be trained to perform many standard food-rewarded tests, have
never been reported to have mastered any of the avoidance learning
tasks used with normal animals (Oakley, 1979a). This proves nothing
in itself, but adds to the general impression that avoidance
learning, or perhaps anxiety itself, is partly a product of the
imagination. Ordinary rats typically perform avoidance tasks at 80
per cent success, receiving one shock in five, while it is much
more common for cats and dogs, and rhesus monkeys, to perform a
response almost indefinitely after receiving just a few shocks
(Solomon and Wynne, 1954; Wolpe, 1958; Brady et al., 1958).
Much anecdotal evidence, for instance about monkeys looking for
snakes, suggests that larger mammals have specific expectancies
about precisely what aversive stimulus is to be anticipated, as
opposed to only unpleasant but inchoate inner feelings. An
experiment by Overmeier et al. (1971) supplied some measure
of support for this suggestion, since dogs trained to avoid shock
by nosing a panel to their left at one signal, but to their right
at another, did so more quickly if each signal consistently
predicted shock to a particular leg (whether on the same or
opposite side as the response needed to avoid it) compared with
animals for which either signal was followed by shock to either
leg at random. The authors of this report argue that signals
predict something specific, and not only something which is quite
generally a
bad thing (cf. Trapold, 1970, p. 155). There is evidence that rats
also, as well as dogs, may acquire reactions to a signal that are
specific to particular signalled aversive events: Hendersen
et al. (1980) found that
prolonged associations between a signal and an airblast meant that
the signal had little effect if it was turned on while animals were
responding to avoid electric shock, whereas a similar brief
association, or a prolonged association long past, meant that
responding to avoid electric shock was more vigorous. The
suggestion is that one aspect of the association, which was the
more long-lasting, was arousal or diffuse fear produced by the
signal, but a second, more ephemeral part of the association
linked the signal to reactions or representations specific to the
airblast.

However, whatever the level of cognitive representation of
specific feared events that may take part in avoidance learning,
the instinctive and built-in reaction, both to the aversive events
themselves and to the unpleasant emotional states associated with
them, is by definition to withdraw and shrink from them, and to
perform any response which lessens contact with them. It is
entirely possible also that the instinctive and built-in reaction
to aversive events includes modulation of the classical
conditioning process, so that an intermittent association between
an arbitrary signal and pain is taken more seriously than an
intermittent association between a similar signal and food reward.
This is more likely for aversive events which produce emotional
reactions of very high intensity; Solomon and Wynne (1954) for
instance, proposed that traumatic conditioning of anxiety might be
irreversible if feedback from the periphery of the autonomic
nervous system caused some kind of brain overload. This is rather
vague, but some explanation is needed for the empirical evidence
that only one or two associations between a signal and a powerfully
aversive event may produce indefinite aversion to the signal
(Wolpe, 1958; Garcia et al., 1977b). The cases where only
one pairing of a signal and an aversive event occurs make nonsense
of the otherwise well-supported theory that only a statistically
reliable correlation or contingency between the two events can
lead to the formation of psychological associations (Rescorla,
1967; Dickinson, 1980). As a
generality, it seems wisest to accept that the difference in the
anatomical systems used to process attractive and aversive events,
and, functionally, the difference between the ecological
requirements for the seeking of food and drink on the one hand, and
the requirements for not becoming some other animal’s food on the
other, will lead to asymmetries between reward and punishment
beyond those logically necessitated by the outcome-testing nature
of approach to significant objects and the outcome-assuming nature
of withdrawal.

Part of the asymmetry may lie in the central
criteria used for conditioned emotional associations, with, as it
were, more stringent internal statistical criteria required for
hope than for fear. But Solomon and Wynne (1954) may well be right
to point to the autonomic system as well. Aversive stimuli arouse
the ‘fight or flight’ or ‘behavioural inhibition’ syndromes of the
sympathetic nervous system and limbic brain system respectively
(Gray, 1982). Once this sort of physiological arousal has reached
a certain point, it may become aversive in its own right, and thus
a signal initially associated with a strong aversive stimulus may
become motivationally self-sustaining (Eysenck, 1976; Walker, 1984,
chapter 9). This is still somewhat speculative. There is little
doubt, however, that many of the peculiarities of avoidance
learning, and indeed of any form of reaction to aversive
stimulation, can be attributed to the instinctive behaviours of the
species involved, or ‘species-specific defensive reactions’ or
‘SSDRs’ (Bolles, 1970, 1978). It may be that fearful emotional
states generally produce more rigid and reflexive behaviour than
relaxed exploration or systematic foraging for nutritional
necessities. In any case, it is possible to point to many specific
responses, such as leaping and frightened running, or passive
crouching (freezing) by laboratory rats, which are automatically
elicited by particular aversive stimuli, and therefore are likely
to occur in response to associated signals for such stimuli whether
they are useful or not. This has led to assertions that many kinds
of learning induced by aversive stimuli are special cases, and not
explicable in terms of general principles, but it is not necessary
to abandon all general principles provided that included among
them are
principles which take into account instinctive behaviours. Taste-
aversion learning is a case in point.

Taste-aversion learning

It is perhaps a measure of the persisting
effects of the dust-bowl empiricism of pre-war learning theories
that the phenomena of taste-aversion learning should initially have
been found surprising, and that the concept of natural functional
relationships between stimuli in learned associations should have
taken so long to take root (Garcia, 1981). It is a fact of life
that though eating is essential, eating the wrong thing can be
disastrous, and to the extent that the process of learning is
useful in foraging and food selection, animals, especially those
with a varied diet, ought to be capable of learning from experience
foodstuffs that are best avoided. Thus young jays quickly learn not
to eat moths with a foul taste on the basis of visual cues, and it
is well known that some species of moth which are palatable have
evolved markings like those of others which are not, because of
selective predation, a phenomenon known as mimicry (Maynard-Smith,
1975). The biological advantage of taste lies entirely in
distinguishing what should be eaten, and when, and how eagerly,
from what should not be eaten; and these categories, although they
may be to some extent innate, might usefully be modified according
to the post-ingestional consequences of specific eating
experiences. There is a certain amount of evidence that, as Hull
(1943) would have predicted, metabolic usefulness of what is eaten
may lead to slight alterations in taste or smell preferences: for
example, protein-deprived rats increase their preference for the
odour of a diet associated with receipt of balanced proteins (Booth
and Simpson, 1971; Green and Garcia, 1971; Holman, 1968; see also
Booth et al., 1972; Booth et al., 1976). There is much
clearer and stronger evidence that animals very rapidly become
averse to the taste of a food eaten before they became ill.

This became apparent in studies of the effects of
radioactivity on animal behaviour. Exposure to radiation affects
the intestinal tract and makes animals ill; after they have recovered they may refuse to eat foods previously consumed
(Haley and Snyder, 1964; Rozin and Kalat, 1971). This would not
have surprised Pavlov in the least, but a considerable stir was
created when Garcia and Koelling (1966) published a careful
experiment which suggested that the effect was selective, in that
taste cues much more than visual cues appeared to acquire
associations with illness. The experimental technique was to place
thirsty rats in a small box containing a drinking spout for 20
minutes a day, measurements being taken of the number of times they
lapped at the spout. The water might be given a sweet or a salty
taste, and an attempt was made to provide audiovisual feedback of a
roughly equivalent kind by arranging that a flash of light and a
click would occur each time a rat lapped at the drinking spout.
Rats were first pre-tested to assess rate of drinking ‘bright-noisy-
tasty’ water under these conditions. During the training phase, on
most days the animals were allowed to drink plain water
undisturbed, but every three days the distinctive sight, sound and
taste feedback was given, and the animals subsequently became ill,
some because, while drinking saccharin-flavoured water, they were
exposed to a sufficiently strong dose of X-rays, and others because
lithium chloride, which tastes salty, was added to their water. For
comparison, yet other rats were allowed to drink bright-noisy-
salty water while a gradually increasing shock, which eventually
suppressed their drinking, was applied to the floor (‘delayed
shock’), and a fourth group had alternating 4-minute periods of
immediate shock when they drank with the three kinds of feedback,
but no shock when they drank plain water without audiovisual feedback.
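
The four-group design just described can be summarized in a few lines of code; the group labels and field names below are my own shorthand for exposition, not Garcia and Koelling's terminology.

```python
# A sketch of the Garcia and Koelling (1966) design. Every rat drinks
# 'bright-noisy-tasty' water (taste plus light-and-click feedback); the
# groups differ only in the aversive consequence that follows.
DESIGN = {
    "x_ray":           "illness",  # X-ray exposure while drinking
    "lithium":         "illness",  # lithium chloride in the water
    "delayed_shock":   "pain",     # gradually increasing footshock
    "immediate_shock": "pain",     # shock during drinking bouts
}

def suppressed_cue(consequence):
    """The selectivity found at test: internal consequences attach to the
    internal cue (taste), external consequences to the external cues."""
    return "taste" if consequence == "illness" else "light+click"

test_outcome = {group: c for group, c in
                ((g, suppressed_cue(c)) for g, c in DESIGN.items())}
```

The point of the sketch is simply that the consequence, not the cue compound (which was identical for all groups), determined which element of the compound later suppressed drinking.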

All the animals in this experiment (Garcia and
Koelling, 1966) had very much suppressed drinking by the compound
cues during the training procedure, the shock animals somewhat
more than those poisoned. The crucial phase occurred when no
further aversive events, either sickness or shock, were imposed,
and the rats were tested separately with water that had the
previously experienced taste, but no sight and sound feedback, or
the sight and sound feedback with plain water. These tests showed
very clearly that animals which had been shocked drank just as much
of the flavoured water as they had done of plain, but drank less when plain water had the
sight and sound cues. And by contrast, animals poisoned drank
normally under these conditions, but drank very little of water
flavoured with saccharin or with normal salt, in the absence of the
light and click feedback (the sweet and salty taste having been
used for the X-ray and lithium chloride groups respectively). Since
all the rats received both taste and audio-visual feedback during
conditioning, it would appear that there was a selective tendency
to connect the internal and visceral sensations of illness with the
cue of taste, and to connect the pain coming from the outer
environment with some aspect of the audio-visual compound. Internal
consequences were associated with the internal cue of taste, while
external effects were associated with the external modalities of
sight and sound. This kind of result has been very widely
replicated (e.g. Domjan and Wilson, 1972; Miller and Domjan, 1981;
Revusky, 1977). However, the explanation which should be given for
this fairly straightforward finding has been a matter of dispute
(Milgram et al., 1977; Logue, 1979). It is first necessary to
emphasize that the phenomenon is quantitative rather than
qualitative. Pavlov (1927) reported that, in dogs, symptoms of
illness could readily be associated with the sight of the syringe
which normally preceded their induction (see p. 73), and rats will become
averse to black or white compartments (Best et al., 1973) or
to a compound of external cues that represents the particular box
they drank in before being poisoned (Archer et al., 1979). As
noted in Chapter 3, extremely specialized metabolic reactions, such
as those which happen to prevent the analgesic effects of morphine,
are capable of being conditioned to external cues which
characterize a particular room. And on the other hand, experiments
such as that of Logan (1969), quoted above, indicate that peripheral
electric shocks (rather than only illness) can alter food
preferences. Thus there is no need to assume that certain forms of
unpleasant experience can be associated only with
biologically appropriate cues. The effects are a matter of degree:
what has to be explained is a kind of selectivity in which, when
there are several possible stimuli which could be taken as cues for
a biologically significant event, whichever stimulus is most
biologically appropriate or relevant is likely to be dominant.
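
One way to picture selectivity as a matter of degree is a toy associative model in which every cue can enter into association with every consequence, but at different rates. The relevance values below are illustrative assumptions of my own, not estimates from any experiment.

```python
# Toy delta-rule model: both cues gain strength when paired with a
# consequence, but the learning rate depends on the cue-consequence
# pairing ('relevance'). Numerical values are purely illustrative.
RELEVANCE = {
    ("taste", "illness"): 0.5, ("audiovisual", "illness"): 0.05,
    ("taste", "pain"): 0.05,   ("audiovisual", "pain"): 0.5,
}

def train(consequence, trials=10):
    """Associative strength of each cue after repeated pairings."""
    v = {"taste": 0.0, "audiovisual": 0.0}
    for _ in range(trials):
        for cue in v:
            # each cue approaches the asymptote (1.0) at its own rate
            v[cue] += RELEVANCE[(cue, consequence)] * (1.0 - v[cue])
    return v
```

Run with `train("illness")`, taste dominates; with `train("pain")`, the audiovisual compound dominates; yet in both cases the less relevant cue acquires some strength, which matches the quantitative rather than qualitative character of the effect.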

There is little disagreement as to the form
taken by the phenomenon, but a variety of views as to what should
be concluded from it. Differing amounts of emphasis are given to
the innate and built-in aspect of whatever mechanism is
responsible. Garcia and Koelling (1966) refer to ‘a genetically
coded hypothesis’ which might account for the observed
predisposition, and the phenomenon of taste-aversion learning is
usually taken to contradict tabula rasa assertions about
animal behaviour (Revusky, 1977; Logue, 1979), but less specific
forms of innate determinacy, such as a gathering together of
internal and external stimuli (Revusky, 1977), perhaps as a sub-
example of a principle favouring spatial contiguity in the
formation of associations (Mackintosh, 1983), have been
defended.

Garcia himself has tended to interpret what is now frequently
referred to as the ‘Garcia effect’ in terms of innate mechanisms,
and has drawn attention to the fact that the taste system is
neuroanatomically related to visceral stimuli in vertebrates, since
input from the tongue and the viscera both are collected in the
brainstem, and there is in fact a particular structure there, the
nucleus solitarius, which receives both taste and gastro-intestinal
input fairly directly (see Garcia et al., 1974, 1977a, 1977b).
There are thus anatomical grounds for expecting that taste should
be especially likely to be affected by visceral experiences — more
so even than smell, since the olfactory input is to the forebrain,
where it goes to the limbic system. There is behavioural evidence
(Garcia et al., 1974) to support Garcia’s theory that the
olfactory system is used for appetitive food-seeking: it supplies
information, along with vision and hearing, about objects at a
distance, but is more closely connected than they are with
motivational urges to find things which taste good but reject
things which taste bad. The taste system of the tongue, according
to Garcia et al. (1977a, p. 212), interacts with the limbic
(and olfactory system), but is also affected by visceral receptors
which ‘assess the utility of the ingested material for the needs of
the internal economy’. The evidence is that rats do not appear to
associate a smell with illness which occurs some time afterwards (while they do so
associate tastes: Hankins et al., 1973), but rats do
associate smell with pain, when a food substance is paired with
electric shocks (Hankins et al., 1976; Krane and Wagner
(1975) suppressed food intake by delayed shocks but did not assess
the relative contributions of taste and smell).

There is thus every indication that the ease
with which taste-aversions are formed reflects innately determined
mechanisms, perhaps even in the form of visible neuroanatomical
circuits. How does this affect the theoretical questions of the
symmetry of reward and punishment, and the validity or otherwise of
the general principles of learning? Garcia et al. (1977b) put
the case that there is a symmetry between the dislike of tastes
associated with bodily distress and cravings for tastes associated
with relief from distress caused by illness or nutritional
deficiencies. The onset of illness tends to be sudden, and recovery
gradual, which makes dislikes very much more frequent than likes,
but if thiamine-deficient rats are given a thiamine (vitamin B)
injection after drinking saccharin-flavoured water, they
subsequently show an increased preference for it (Garcia et
al., 1967). There is thus support for a ‘medicine’ effect,
which is in the opposite direction to the taste-aversion effect. It
is not necessarily equivalent in all other respects, but clearly
post-ingestional relief from aversive bodily
states, and associations with subsequent pleasurable internal
feelings, change preferences for ingested substances, very
strikingly so in the case of human addictions to alcohol or other
drugs.

The charge has been made that taste-aversion learning is a
specialized and circumscribed phenomenon, with little in common
with other kinds of aversively motivated change in behaviours, and
that principles of learning should be assumed to be specific both
to particular categories of events to be associated and to
particular species (Rozin and Kalat, 1971; Seligman, 1970). This
charge can however be effectively refuted, since it is possible to
point to many similarities between taste-aversion and other forms
of learning, and indeed many similarities between very different
species of animal, provided that some principle of selectivity in
the formation of associations is accepted, such as ‘preparedness’ (Seligman, 1970) or ‘relevance’ (Revusky, 1977;
Mackintosh, 1983) and provided that innate motivational mechanisms
and innately determined instinctive behaviours are included as
determinants of both learning and performance. Revusky (1977)
persuasively argues that if the errors of extreme behaviourism and
empiricism are renounced, and it is accepted that ‘from a
naturalistic point of view, all aspects of the learning process are
innate’ (1977, p. 46), then many if not all the phenomena of
learning can be subsumed under the extremely general principle that
learning ‘has evolved to process information about causal
relationships in environments’ (1977, p. 46). Similarly Mackintosh
(1983) suggests that ‘a functional view of conditioning’ would
readily accommodate any result showing that ‘a natural causal
relationship’ is easily learned, but with the rider that ‘To the
extent that the causal laws describing the world in which we and
other animals live are generally true, admitting of no exception,
so there should be general laws of conditioning’ (1983, pp. 221-2).
In a sense this is Hume’s theory of the perception of cause-and-
effect, turned on its head, since Hume’s point was that what we
believe to be causal relationships in the outer world are merely
subjective impressions based on pairings of events; whereas Revusky
and Mackintosh argue that the mechanisms which determine how and
when an individual forms associations based on the experienced
pairing of events have themselves only evolved because (more often
than not) the operation of these mechanisms will ensure that
learned behaviour will reflect biological truth. With this
principle to hand, we need not be alarmed if animals learn to
associate tastes with illness, since the mechanisms of learning
evolved in a world in which illness is in fact often caused by
ingested food. This still leaves us with the job of describing what
the mechanisms are, and exactly how they operate, but brings in at
the start not only biological function, but the assumption that
some of the details of the processes of learning in any species
have been tuned to the realities of that species’ natural life.

Taste-aversion learning has therefore been
extremely important as a theoretical cause célèbre,
requiring much more explicit acknowledgment of innate
determinants of learning than was previously thought proper. But the phenomenon
itself is readily incorporated into the newly liberated versions of
general learning theory, since the phenomenon is in fact readily
obtainable in a wide variety of animal species, and is readily
explicable as a special case of the two-process theory of avoidance
learning. A general account of taste-aversion learning in several
species, with a common form of explanation, has been provided by
Garcia himself (Garcia, 1981; Garcia et al., 1977b). An
aspect of Garcia’s biological approach is a respect for species
differences, but these appear to be less marked than one might
expect. Birds usually have few taste-buds but excellent eyes, and
one might suppose on these grounds that taste-aversion learning
should be subordinate to sight-aversion learning in birds. Wilcoxon
et al. (1971) did indeed find, in a widely quoted study
using bob-white quail, that if these birds became ill after
drinking blue-coloured and sour-tasting water, they subsequently
avoided blue water more than sour. It is clearly absurd to doubt
that birds (apart from the aberrant kiwis, which are flightless and
nocturnal, and use smell) use vision in food selection, but there
is evidence nevertheless that there is a special connection between
taste and digestive upset even in highly visual species. Extremely
hungry blue jays catch even poisonous-looking butterflies in their
beak, rejecting only those whose taste has been previously followed
by nausea. This indicates a certain general primacy of taste
(Garcia et al., 1977a) and also follows the principle of
matching learning to biological causality, since butterflies which
look dangerous but taste normal are safe (Brower, 1969). More
surprisingly, large hawks (Buteo jamaicensis, Buteo lagopus),
which have visual receptors in their eyes numbered in millions,
but taste receptors on their tongue numbered only in tens, also seem
to use taste as the main cue for aversion to poisonous bait. A hawk
used to eating white mice, and given a black mouse made bitter with
quinine, and then made ill with a lithium injection, afterwards
seized and tasted a black mouse, without eating it, and only after
that refused to approach black mice. However, hawks given black
mice which did not have a distinctive flavour, before being
poisoned in the same way, required several poisoning episodes
(instead of just one) to acquire an aversion, and this took the form of not eating either white or black mice. It thus appears that taste is more readily
associated with illness than are some readily distinguishable
visual features of food, even for the most visual of vertebrates.
Although greater persistence through time of taste cues has been
ruled out as an absolutely necessary aspect of taste-aversion
learning in rats, which do not vomit when ill (Revusky, 1977), it
is very probable that part of the salience of a strong bitter taste
for avian poisoning experiences is due either to its prolonged
after-effects in the mouth or to its presence during vomiting,
which is a reaction seen in blue jays after eating poisonous
butterflies, and in hawks after lithium injections.

Taste-aversion learning is thus not species-
specific to rats. The result with hawks also contradicts the
ecological hypothesis that rats show the phenomenon only because,
as omnivores, they are likely to sample a wide variety of possibly
dangerous substances (Revusky, 1977). There may be biological
dangers in a carnivorous diet, and this would mean that the ecology
was wrong rather than the relation between ecology and psychology,
but results obtained with captive wolves and coyotes demonstrate
that general as well as specific processes may be engaged by taste-
aversions (Gustavson et al., 1974). A pair of captive wolves
attacked and killed a sheep immediately on the two opportunities
they had before an aversion treatment of being given lithium
chloride capsules mixed with sheep flesh and wrapped in woolly
sheep’s hide. On the next occasion that a sheep was allowed into
their enclosure, they at first charged it, but never bit it. Then
they became playful, but when the sheep responded with threatening
charges, the wolves adopted submissive postures and gave way.
Similarly, wild-born captive coyotes were deterred from attacking
rabbits by being given rabbit carcasses injected with lithium
chloride, although for most of them two such poisonings were
necessary. By contrast, if laboratory ferrets were repeatedly made
ill after they had killed and eaten mice, they did not stop killing
mice, even though not only would they not eat the mice that they
killed, but their aversion to mice was apparent in retching when
mice were bitten, and rejection and avoidance of the dead carcass. Less than one in five laboratory rats kill a
mouse put in their cage (but three out of four wild rats kept in a
laboratory: Karli, 1956), but those which do will kill very
consistently, except if given aversion treatments, when they are
rather more flexible than ferrets, since they will stop eating mice
if poisoned after eating, but will also stop killing if poisoned
after killing without being allowed to eat their victim.

The theory offered to explain taste-aversion
phenomena in these various species is a variant of the theory of
classical conditioning, discussed in terms of a ‘hedonic shift’
(Garcia et al., 1977a, pp. 300—6; see this volume, chapter
3, pp. 77—80). Both specific metabolic and reflexive reactions (for
instance nausea and retching) and more general emotional evaluation
on some like-dislike dimension become shifted to the signalling
stimulus, which is usually taste in the first instance, from the
later events of illness. In some species, in particular the wild
canines, attack in a state of hunger is relatively well integrated
with expectations of eating — in the terms of Adams and Dickinson
(1981b), attack is a purposive action rather than an automatic
habit. Therefore in these species an aversion to the goal has a
relatively powerful inhibitory effect on behaviours which lead to
the goal. In other species, or at any rate in domesticated rats and
ferrets, the instinctive behaviour of killing is relatively
independent of representations of the taste of the goal, and
therefore aversion to the taste has less effect on responses which
happen to provide the opportunity for that taste. The hedonic shift
would be expected to be associated with the qualitative aspects of
the unpleasant experience it resulted from, but is not always
limited in that way, since the wolves with a single experience of
taste-aversion modified their behaviour to the extent of adopting
species-typical postures of social submission at the advance of a
now-unpalatable sheep. The other example often quoted in this
context is the positive social behaviour directed by hungry rats at
conspecifics whose presence has become the signal for food
(Timberlake and Grant, 1975). This result supports the idea that
there is a motivational good-bad dimension which is partly
independent of the type of attractive/aversive event experienced,
whether social, oral, intestinal or tactile. Garcia et al. (1977b, p.
284) include as indicators of the bad end of this scale
‘conditioned disgust responses’ which include urinating on, rolling
on or burying food associated with illness in coyotes, and a paw-
shaking gesture in a cougar.

Once a motivational shift has taken place, it
is conceivable that new motor responses could be learned
instrumentally under its influence — an animal might learn to press
a lever to allow itself to escape from close proximity to strongly
disliked food. There is little evidence to show arbitrary responses
being learned in this way, but, as is the case for many laboratory
forms of avoidance learning (Bolles, 1978), once a conditioned
motivational state has been established, certain instinctive but
sometimes goal-directed patterns of behaviour are likely to be
elicited. A superficial similarity between aversive motivational
states established by electric shock and those which result from
poisoning is that both appear to elicit species-specific responses
of burying unwanted objects, although only a limited amount of
information is available on this. Pinel and Treit (1978, 1979)
have however confirmed that rats having received only one strong
electric shock from a wire-wrapped prod mounted on the wall of
their test chamber thereafter appeared motivated to cover up this
object, either by pushing and throwing sand or bedding material
over it with the forepaws, when this was possible, or by picking up
wooden blocks with their teeth and placing them in a pile in front
of the prod, if only wooden blocks were available to them. Rats
will also bury certain objects (a mousetrap or a flashbulb) when
first exposed to them in a familiar territory, but not others (the
wire-wrapped prod or a length of plastic tubing: Terlecki et
al., 1979). Yet another set of species-specific behaviours
which may be changed when underlying motivational shifts are
induced by the artificial means of electrical shocks is seen in the
social behaviour of chickens. Dominance relationships or ‘pecking
orders’ in groups of these birds are usually stable over time.
However, when Smith and Hale (1959) rigged contests between
successive pairs of birds in four-member groups by staging a
confrontation between hungry birds over a plate of food, and
delivering shocks to the initially dominant bird whenever it ate or had social interactions with its partner, they
found that they could completely reverse the rankings initially
observed, and that the reversals lasted for at least nine weeks
without further shocks. It is thus arguable that taste-aversion
learning, and related alterations in the motivational value of
natural stimuli by pairings with other events, rather than
weakening theories of learning, add to their generality by
demonstrating that natural and instinctive behaviours are subject
to learned change, as well as arbitrary or more flexible responses
such as pressing a lever, or running through a maze.

Stress, learned helplessness and self-punishment

I have already had cause to comment on the
fact that exposure to aversive stimuli has physiological effects,
such as changes in heartbeat and in skin conductivity, which can be
used as indices of emotional response, and which may thus be useful
in assessing the degree to which emotional reactions to
aversiveness have become conditioned to prior stimuli. There are a
great many other kinds of physiological reaction, induced by
exposure to aversive stimuli, including release of adrenaline and
corticosteroids by the adrenal glands, and also changes in brain
biochemistry, for example the release of natural opioids (Maier
et al., 1983; Seligman and Weiss, 1980). Many of these
reactions, which are part of the body’s defence against damage and
disease, can usefully be subsumed under the term stress (Selye,
1950). Physiological stress is an example of an asymmetry between
the motivational systems of reward and punishment. It is possible
to consider emotional excitement representing hope, elation or
satisfaction as being physiological arousal similar to fear, though
opposite in its affective value, and conceivably corresponding
positive reactions to extreme fear can be found in sufficiently
intense cravings for food, drink, drugs and socially and sexually
attractive goals. But there is no departure from physiological
normality due to the experiencing of attractive events which serves
as a counterpart to the changes under the heading of stress which
can be produced by exposure to aversive stimuli.

Stomach ulceration and loss of weight in rats and other
mammals is a relatively indirect way of measuring stress, but serves to indicate long-term effects. Measurements of
ulceration, together with behavioural evidence, suggest that there
are psychological factors in the stress produced by externally
painful experience, even in rats. Seligman (1968) found that
unpredictable shocks, randomly interspersed with visual or auditory
stimuli, produced extensive ulceration in rats, as well as profound
suppression of food-rewarded lever-pressing. Control groups
receiving exactly the same physical intensity of shock, and the
same audio-visual stimuli, but with the shock signalled by these
cues, formed no ulcers, and kept up their usual levels of food-
rewarded behaviour, except in the presence of the shock signals.
Ulceration is also less in rats which receive shock only in the
absence of their own avoiding response, than in animals receiving
identical physical stimulation which is uncorrelated with their own
behaviour, and not otherwise predictable (Weiss, J.M., 1971). Not
surprisingly, in view of these findings, it has frequently been
observed that rats will respond so as to be exposed to signalled
and predictable, rather than to unsignalled and unpredictable
shocks, when given the choice (Lockard, 1963; Miller et al.,
1983; Badia et al., 1979).

In monkeys, severe ulceration has been
observed even when hardly any shocks are received, if this is only
because the animals are responding continually for long
periods (on a Sidman avoidance schedule, see p. 226) in order to
prevent shocks, and may therefore be assumed to be then in a state
of constant anxiety (Brady et al., 1958). The existence of
this sort of stress response shows both that there may be
distinctive physiological changes produced by aversive laboratory
procedures, and that fairly complex psychological reactions also
occur, particularly involving the predictability of aversive
events, and therefore, of course, the predictability of their
absence. Degree of predictability of events, especially their
predictability on the basis of the subject’s own behaviour, appears
to be something which can itself be learned, with a consequent
influence on more ordinary forms of learning in the future. This is
a conclusion drawn primarily from research into the phenomenon
known as ‘learned helplessness’, which has been extremely
extensive, due partly to the belief held by some that this kind of
learning is an important aspect of human depression, the most common form of mental illness
(see Maier and Seligman, 1976; Seligman, 1975; and Seligman and
Weiss, 1980, for reviews). The initial experiments were performed
on dogs, using a shuttle-box avoidance test like that of Solomon
et al. (1953; see p. 222). Normally in this apparatus, dogs
learn within 2 or 3 trials to jump over the barrier as soon as
shock is turned on, and eventually learn to jump to the signal
before the shock. However if, before this test, dogs are placed in
a harness and given at random 50 or more shocks which they cannot
escape from, most of them never make even the first escape
response, and few if any ever learn to avoid or even escape from
the shocks consistently. Seligman’s (1975) argument is that the
dogs given inescapable shocks had learned to give up trying, or had
learned that they were helpless to escape shocks. Something of this
kind may indeed occur, but it is likely that this is not the only
consequence of the large number of shocks given in the preliminary
treatment. Either a general emotional exhaustion or specific and
temporary biochemical changes which temporarily inhibit active
learning have been proposed, with some reason, as alternative
explanations (Weiss and Glazer, 1975; Seligman and Weiss, 1980).
A third possibility, for which there is strong evidence where rats
are concerned (Glazer and Weiss, 1976), though not with dogs
(Maier, 1970), is that during the supposedly helpless phase animals
are in fact learning passive motor strategies which interfere with
later tasks which require highly active behaviour.
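
The contrast between escapable and inescapable shock is often formalized as a response-outcome contingency, the difference between the probability of the outcome given a response and given no response. The sketch below computes this ΔP for hypothetical trial records; the data are invented for illustration only.

```python
def delta_p(trials):
    """Response-outcome contingency:
    P(outcome | response) - P(outcome | no response).
    `trials` is a sequence of (responded, outcome_occurred) pairs."""
    with_r = [outcome for responded, outcome in trials if responded]
    without_r = [outcome for responded, outcome in trials if not responded]
    p_with = sum(with_r) / len(with_r) if with_r else 0.0
    p_without = sum(without_r) / len(without_r) if without_r else 0.0
    return p_with - p_without

# Escapable shock: responding reliably terminates shock (here outcome
# means 'shock continues'), so the contingency is strongly negative.
escapable = [(True, False)] * 8 + [(False, True)] * 8
# Inescapable shock: shock is equally likely whatever the animal does.
inescapable = ([(True, True)] * 4 + [(True, False)] * 4 +
               [(False, True)] * 4 + [(False, False)] * 4)
```

On this description, the helpless animal has been exposed to a zero contingency and behaves afterwards as though it expected responding to remain without consequence.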

The three alternative explanations for the
inability to learn which characterizes
‘learned helplessness’ are thus: (1) some kind of physiological
debilitation; (2) an inappropriate, probably passive, response
habit; and (3) a more cognitive set, which in animals must at least
amount to a disinclination to appropriately associate response
output with desirable consequences, and in people might form part
of more elaborate attributional processes, in which helplessness
could be connected to beliefs about one’s own general or specific
inadequacies, or about the unyielding cruelties of an unjust and
uncaring external world (Abramson et al., 1978; Miller and
Norman, 1979; Peterson and Seligman, 1984).

Physiological debilitation

There are certainly temporary after-effects of
stressful experiences, which dissipate with time, and which can
depress learning. In the first experiments with dogs, Overmier and
Seligman (1967) could demonstrate ‘learned helplessness’ in shuttle-
box training given within 24 hours of the inescapable shock
treatment, but not if there was a recovery period of two days or
more. Weiss and Glazer (1975) demonstrated that either shock
treatment or exposure to very cold water (2°C) 30 minutes before a
shuttle-learning test reduced the performance levels of rats. They
attribute this to a temporary depletion of adrenalin-like chemicals
in the brain, although since relatively inactive motor tasks were
not affected, more peripheral forms of fatigue may also have
contributed to the reduction in performance on active tasks.
Temporary kinds of exhaustion may thus be important in the early
stages of learned helplessness. But they are not the only factor.
Dogs which have failed once in the shuttle test, given soon after
inescapable shocks, will fail again a month later. On the other
hand dogs allowed to learn to escape first, before being given the
usual stress of shocks in harness, are unaffected even immediately
after the stress (Maier et al., 1969).

Competing response habits

In several experiments (Maier and Testa, 1975;
Seligman and Beagley, 1975; Glazer and Weiss, 1976; Jackson et
al., 1980) rats exposed to inescapable shocks may subsequently
learn a passive response relatively well, but appear to have
difficulty in performing a task differing mainly in the degree of
activity involved. Therefore it is likely that a deficit in
readiness to perform very active responses is one of the
consequences of inescapable shock treatment. But there must be more
cognitive or more associative consequences as well. Maier (1970)
showed that even if he explicitly trained dogs to stand still to
escape shock, as a preliminary phase, there was little subsequent
disruption in their ability to learn the usual active shuttling
task. Jackson et al. (1980) observed that pre-stressed rats
were just as active as others in running through a Y maze, but nevertheless very slow to learn to turn in the
same direction every time to escape from being shocked.

Associative or cognitive changes

Since alternative explanations have limited
application, it seems necessary to include a more cognitive
explanation of the phenomenon of learned helplessness (Maier et
al., 1969). A relatively non-committal way of describing this
is to refer to the lack of an expectancy that attempts at active
responding will lessen or terminate aversive experiences. More positively,
inescapable shocks could result in an animal acquiring the
expectancy that shock termination is independent of its behaviour.
This interpretation has added weight because of the finding that
exposure to a zero correlation between tone cues and shocks delayed
the subsequent learning of an association when the tone was now
paired with shock (‘learned irrelevance’). Mackintosh (1973) also
found that a zero correlation between a tone stimulus and the
experience of drinking water, for thirsty rats, similarly retarded
the acquisition of anticipatory licking when the tone was made a
signal for impending delivery of water. There are thus grounds for
believing that an expectational, or associative, mechanism is
affected by the experience of the lack of any correlation between
events (Maier et al., 1969; Dickinson, 1980). It might be
possible to distinguish the associative aspect of this from the
related motivational deficit or ‘reduced incentive to initiate
responding’ (Rosselini et al., 1982, p. 376). One way of
doing this is to show that there are cross-motivational effects:
Goodkin (1976) showed that deficits in the usual task, shuttling
to escape shock, could be produced by exposure to the relatively
unstressful preliminary experience of receiving deliveries of food
at random, irrespective of any organized action by the animals.
Inescapable shocks did not encourage rapid learning of new
responses needed to obtain food in later tests (Rosselini, 1978;
Rosselini and DeCola, 1981), and impaired the subsequent learning
by rats of whether they should poke their nose through a left-hand
or right-hand hole to produce food, even when training was continued
at length with the correct response being changed (reversal
learning). The deficits in this case lasted long after the animals
had recovered from the temporary suppression of activity produced
by receiving the shocks (Rosselini et al., 1982).
Experience of severe and unrelievable conditions at an early
encounter with aversive stimuli may thus have long-lasting effects
on future behaviour compatible with some kind of reduced confidence
in the effectiveness of action, or in secure regularities of
events, but it is worth noting that the effects of prior shock
treatments on the subsequent behaviour of rats in the experiments
quoted above were relatively minor, compared with the complete
disruption of escape-learning in dogs observed by Seligman et
al. (1968) and others.
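
The notion of a zero correlation between behaviour and outcome, central to the expectancy account above, can be made concrete with a simple contingency calculation. The sketch below is purely illustrative and is not drawn from any of the cited experiments; the function name and the trial counts are invented. It compares the probability of a favourable outcome given a response with the probability given no response, the difference an expectancy account supposes the animal is sensitive to.

```python
# Illustrative contingency (delta-p) calculation: an expectancy account
# supposes the animal tracks the difference between
# P(outcome | response) and P(outcome | no response).
# Hypothetical example; not data from the cited studies.

def contingency(trials):
    """trials: list of (responded, favourable_outcome) boolean pairs.
    Returns P(outcome | response) - P(outcome | no response)."""
    with_response = [outcome for responded, outcome in trials if responded]
    without_response = [outcome for responded, outcome in trials if not responded]
    p_with = sum(with_response) / len(with_response)
    p_without = sum(without_response) / len(without_response)
    return p_with - p_without

# Escapable condition: responding reliably terminates the shock.
escapable = [(True, True)] * 8 + [(False, False)] * 8

# Inescapable condition: the outcome is independent of responding.
inescapable = ([(True, True)] * 4 + [(True, False)] * 4
               + [(False, True)] * 4 + [(False, False)] * 4)

print(contingency(escapable))    # 1.0: behaviour controls the outcome
print(contingency(inescapable))  # 0.0: zero correlation
```

On this description, ‘learned helplessness’ corresponds to the animal registering a contingency near zero during the inescapable-shock phase and carrying that expectancy forward into later training.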

Self-punishment, discrimination and attention

It needs to be emphasized that while an
initial experience of severe and inexorable painful events leads to
later passivity, exactly the same external trauma has quite
different consequences if dogs have already been trained in their
escape task beforehand: in that case there appear to be no adverse
consequences at all, since the dogs’ performance on the already
learned task is unaffected and they go on to escape and
avoid normally (Seligman et al., 1968). Therefore the order
of various learning experiences is crucial, and this is
particularly so when strong aversive events are involved, possibly
marking another special feature of punishment as opposed to reward.
The long-lasting and counter-productive fixation of initial
learning was apparent in the procedures of Solomon et al.
(1953), already described (p. 222). Dogs which had learned to
jump over their hurdle at a signal in order to avoid shocks in a
shuttle-box were undeterred by a new arrangement in which jumping
brought them towards an electrified floor instead of away from it.
Some dogs made anticipatory yelps while jumping, and the
experimenters concluded that the high emotionality caused by the
reintroduction of shocks after the dogs had learned to avoid had
strengthened rather than weakened the tendency to jump. It seems
plausible that in instances of this kind, where observational
evidence of autonomic arousal was described in terms of ‘symptoms
of terror’ (Solomon et al., 1953), the repetition
of a previously learned response should be regarded as
panic-stricken reliance on first impulses. However, it demonstrates
that repeated inescapable shocks (once the animals had jumped on
the electrified floor, a gate was lowered behind them) are
compatible with highly active responding, as well as with the
passivity of learned helplessness.

One argument is that both passivity and
jumping away are alternative kinds of natural and instinctive
responses to pain, one or other being selected in a very obvious
way by variations in procedure, since passive animals have been
prevented from moving away from shocks (in many cases by being
physically restrained in a harness) and active dogs have been first
trained to jump (Bolles, 1970, 1978). This is clearly a major
factor, but it is worth also bringing in the difficulty for the
animals of distinguishing precisely what might be the best option,
especially under conditions of high emotional arousal. Solomon
et al.’s (1953) animals had already learned that a signal
might be followed by shock. They alternated from one side of the
shuttle box to the other, and therefore were shocked on both sides,
both in early training and in the punishment procedure. They had
initially learned that, once in a state of fear, jumping could
reduce fear, or was otherwise advantageous in avoiding shocks. Finally,
the punishing shocks were of relatively brief duration (3 seconds),
whereas the dogs would have had experience of much longer episodes
early in training. It is therefore to some extent understandable that the
animals had difficulty in discriminating what was obvious to the
experimenters — that jumping, which had once been required in
rather similar circumstances, should now be abandoned.

The absence of discrimination between different sources of
fear was implicit in the theory of self-punitive or ‘vicious
circle’ behaviour originally put forward by Mowrer (1947),
derived from the two-process theory of the effects of aversive
events. If a response has been learned under the influence of
conditioned fear, then punishment, especially if it involves the
reinstatement of the original aversive event, may add to
conditioned fear, and thus enhance the motivation for the punished
response. But particular sources of confusion between necessary and
unnecessary activities can sometimes be identified as adding to
the likelihood of maladaptive
behaviours. Brown (1969) reviewed a number of experiments in which
rats ran towards a source of electric shocks, thus exposing
themselves to aversive events which they could avoid by not so
running. But in most cases it is clear that the activities which
are elicited by the aversive stimuli, and also the responses which
terminate them, are similar in topography or type to the behaviours
which ensure continued exposure. For instance, in the experiment by
Melvin and Smith (1967), rats first trained to run down an alley
into a safe goal box, in order to avoid receiving shock from the
floor of the alley, continued to run (or even started to run again
after a period when no shocks were given) when in reality only the
middle section of the alley was electrified, and shocks
could be avoided more completely by
freezing in the start box than by running very fast over the
electrified segment of the runway. The difficulty of distinguishing
running towards the safe goal box, after getting shock, and running
in the same direction before the shock, presumably contributed to
this result.

Attention and aversive events

Since fleeing from danger is an ecological
necessity for many species in the wild, and even civilized life may
be motivated to a substantial extent by apprehension and annoyance,
it would be very odd if all forms of learning motivated by even
mildly undesirable emotion led to helplessness, depression, or
further unnecessary disasters. It should therefore be acknowledged
that the phenomena in this section are anomalies, which may reveal
an asymmetry between rewarding and punishing motivational
mechanisms in extreme or unusual circumstances, but which do
so only over and above an underlying functional similarity between
the ability to seek out pleasure on the one hand and security and
safety on the other. In the diagram due to Gray (1975) on p. 215
(Figure 7.2) the symmetry of rewarding and punishing mechanisms is
maintained when both add to arousal of some kind. Animals ought to
be alerted by receipt of either wanted or unwanted outcomes, even
though subsequent learning should be directed at increasing such
receipts in one instance and
decreasing them in the other. The differing advantages in
maintaining alertness to relatively distant negative as opposed to
positive outcomes might mean that some species give higher
priorities to one rather than the other case. However, the shared
advantage of attention to either kind of motivating event may be
responsible for the so-called paradoxical effect of mild punishment
for correct choices, which increases rather than decreases accuracy
in certain kinds of food-reinforced discrimination learning
(Muenzinger, 1934; Drew, 1938; Fowler and Wischner, 1969). The
belief that painful stimuli should motivate learning in general,
instead of merely motivating escape, is not altogether without
foundation, even though many educational practices based on this
belief (for instance the ‘beating of the bounds’ of the City of
London, when new apprentices were ceremonially whipped at a series
of landmarks as an aid to memory) are very happily discontinued.

Reward and punishment: conclusions

The briefest possible summary is the assertion
that rewards are wanted and punishments unwanted experiences, which
implies a similarity if not an identity of motivational processes
based on attractive and aversive events. However, it would not be
surprising if the biological priorities differed between flight
from dangerous or painful stimuli on the one hand, and the pursuit
of attractive social or consumable goals on the other. There are in
fact both anatomical and behavioural grounds for assuming that
strongly aversive stimuli have a greater emotional loading, and a
less flexible connection with instinctive patterns of behaviour,
than strongly attractive stimuli used in similar ways. But for both
attractive and aversive stimuli, behavioural experiments can
demonstrate automatic emotional anticipation of significant events,
instinctive behaviours released as a result of this, and
modification of initial behaviours according to their costs and
benefits.

There are thus many similarities between rewards and
punishments used as motivating events in animal learning
experiments, and it is arguable that asymmetries between
attractive and aversive motivation can be interpreted as
matters of degree — unpleasant events merely being more likely to
produce conditioned emotional states and associated instinctive
reactions than pleasant stimuli of roughly the same motivational
weight. This approach would certainly apply to the many experiments
in which animals appear to weigh positive and negative outcomes
against each other.

However, in the context of severe anxiety and
stress, it seems necessary to appeal to special factors which apply
to aversive but not to appetitive motivation. Some of these are
undoubtedly physiological, and directly related to the reactions of
the autonomic nervous system to aversive stimuli. Others may be
more cognitive in nature, in the sense that they reflect either
instinctive defensive reactions of particular species or more
general asymmetries in the processing of attractive and aversive
information.