Abstract

When humans gamble, they choose an outcome that has high value but a low probability of occurrence over a more favourable, higher-probability outcome of lower value (not gambling). Similarly, pigeons show a preference for an alternative that occasionally provides a signal for reinforcement over a more optimal alternative that always provides a signal for a lower probability of reinforcement. Two mechanisms appear to be responsible for this suboptimal behaviour: the signal for non-reinforcement (losing) appears to result in little or no inhibition, and the probability of the signal for reinforcement is relatively unimportant. Human gambling behaviour appears to be controlled by similar mechanisms. Also, we have found that, as with human gambling, pigeons that are more motivated to win choose less optimally. Furthermore, pigeons exposed to an enriched environment choose more optimally than those normally housed. As in humans, individual differences in impulsivity predict pigeons' attraction to the suboptimal alternative. These findings may have implications for the treatment of humans who have problems with gambling behaviour.

Introduction

It is well known that humans often make bad economic decisions or suboptimal choices [1], for example, when they engage in
commercial gambling (games relying solely on chance like playing slot machines or buying lottery tickets). They do so supposedly
because gambling is thought to be a form of entertainment [2]. The entertainment value appears to be related to the prospect of
winning a large sum of money but in commercial gambling the expected value is negative (almost always a loss).

In this regard, humans are thought to be different from other animals because, according to behavioral ecologists, animals
have been selected by evolution to forage for food optimally [3,4]. Entertainment should not be a consideration. Furthermore, experimental psychologists have argued that even in somewhat unnatural laboratory contexts, such as laboratory learning tasks, if one allows for sufficient experience with the procedure and the conditions of reinforcement are discriminable, animals should be sensitive to the probability of reinforcement associated with their choices. That is, they should learn to choose optimally [5]. If this analysis is correct, animals should learn to choose optimally under conditions similar to those of commercial gambling tasks.

In contrast to prevailing thought, we and others have found that animals often choose suboptimally, just as humans do.
The implication of these findings is that the mechanism that supports suboptimal gambling may be similar in humans and
other animals: although winning has great positive value, losing does not have the negative value that it ought to have to
inhibit the suboptimal response. Evidence will show that although winning is the motivation that drives suboptimal choice,
surprisingly, the probability of winning appears to play little role in the choice. In this review, I will first demonstrate
conditions under which suboptimal choice occurs in pigeons. I will then attempt to identify the mechanism involved in the
suboptimal choice. I will then examine the relation between this behavior in pigeons and gambling in humans to conclude
that similar processes are very likely involved. Finally, I will examine several subject-related characteristics that we have
found affect suboptimal choice in pigeons and that are also thought to affect the tendency to engage in similar behavior by humans.

The Added Value of Information

Our research on suboptimal choice began with a somewhat different question. Would pigeons prefer to choose an alternative
that provides information (sometimes a cue that signals reinforcement, “good news,” sometimes a cue that indicates the absence
of reinforcement, “bad news”) over an alternative that provides no news (i.e., the probability of reinforcement was equally likely, 50%, for both alternatives)? The answer was clearly that providing pigeons with a cue that signaled reinforcement and a different
cue that signaled the absence of reinforcement was preferred over an ambiguous cue that signaled that reinforcement might
occur [6] (Figure 1). This finding is consistent with information theory [7] which proposes that any stimulus that reduces ambiguity
will provide valued information. According to information theory, however, the amount of information provided by cues that
signal the outcome should be maximum when the outcome is most ambiguous (i.e., when either outcome has a 50% chance of
occurrence). If a positive outcome is more likely to occur or less likely to occur, less information should be provided by the cues.
When the probability of reinforcement was increased equally for both alternatives (to 87.5%), the preference for the good-news/bad-news
alternative decreased. Inconsistent with information theory, however, when the probability of reinforcement was decreased equally for
both alternatives (to 12.5% reinforcement for both alternatives), the preference for the good-news/bad-news alternative actually increased.

Figure 1: Roper and Zentall's design. Choice of one alternative was followed by a stimulus (e.g., red) 50% of the time that was always followed by
reinforcement or a different stimulus (e.g., green) 50% of the time that was never followed by reinforcement. Choice of the other alternative (i.e., right)
was followed by either of two stimuli (blue or yellow), both of which were followed by reinforcement 50% of the time. Spatial location and colors
were counterbalanced over subjects.

Furthermore, when the response requirement for the good-news/bad-news alternative increased, pigeons still preferred it, in
spite of the fact that the increased response requirement increased effort and delayed the scheduled reinforcement. This finding
suggested to us that pigeons could be induced to choose suboptimally even when it meant obtaining less food.

The Suboptimal Choice Experiment

To test this idea, pigeons were asked if they would prefer the informative good-news/bad-news alternative over the
uninformative alternative even if the choice was suboptimal (resulted in less food). We found that when pigeons were provided
with a choice between a 20% chance of signaled reinforcement and a 50% chance of unsignaled reinforcement, the informative
alternative was strongly preferred (almost 100%) over the optimal 50% reinforcement alternative [8,9] (Figure 2).

Figure 2: Design of the Stagner and Zentall experiment. Pigeons chose between two alternatives. Choice of one alternative (e.g., left) was
followed sometimes (20% of the time) by a stimulus (e.g., red) that was always followed by reinforcement or, at other times (80% of the time),
by a different stimulus (e.g., green) that was never followed by reinforcement. Choice of the other alternative (i.e., right) was followed by blue or
yellow, each of which was followed by reinforcement 50% of the time. Spatial location and colors were counterbalanced.
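The contingencies in this design are simple enough to check with a short simulation (a sketch only; the 20% and 50% probabilities come from the design above, and the function names are illustrative):

```python
import random

def suboptimal_trial(rng):
    # 20% of choices produce a signal (e.g., red) always followed by food;
    # 80% produce a signal (e.g., green) never followed by food.
    return 1 if rng.random() < 0.20 else 0

def optimal_trial(rng):
    # Either signal (blue or yellow) is followed by food 50% of the time.
    return 1 if rng.random() < 0.50 else 0

rng = random.Random(0)
n = 100_000
sub = sum(suboptimal_trial(rng) for _ in range(n)) / n
opt = sum(optimal_trial(rng) for _ in range(n)) / n
print(f"suboptimal: {sub:.3f}, optimal: {opt:.3f}")  # approximately 0.20 vs 0.50
```

The simulation makes the suboptimality concrete: exclusive choice of the informative alternative yields food on only about one trial in five, versus one in two for the uninformative alternative.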

Avoidance of Optimal Alternative Ambiguity?

In the experiment, perhaps it was not so much the desirability of the signal for good news that attracted the pigeons to the
suboptimal alternative but the aversiveness of the ambiguity associated with the optimal alternative's 50% reinforcement [8]. To test this
hypothesis, the magnitude of reinforcement was manipulated rather than its probability [10]. Pigeons were given a choice between
one alternative that provided a signal for 10 pellets of food 20% of the time or a signal for no food 80% of the time and another
alternative that provided two signals that each predicted 3 pellets of food (Figure 3). In this case, in spite of the fact that both
alternatives provided a reliable signal for food, pigeons preferred the alternative that provided a “jackpot” of 10 pellets (but only
on 20% of the trials) over a consistent 3 pellets.

Figure 3: The Zentall and Stagner design. Pigeons chose between a vertical and a horizontal line. Choice of one alternative was followed
either by a stimulus (e.g., red) on 20% of the trials that was always followed by 10 pellets of reinforcement or by a different stimulus (e.g.,
green) on 80% of the trials that was followed by the absence of reinforcement. Choice of the other alternative was followed by blue or yellow stimuli,
each followed by 3 pellets of reinforcement. Colors and spatial location were counterbalanced.
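The payoff asymmetry in this magnitude design reduces to a two-line expected-value calculation (probabilities and pellet counts taken from the design above):

```python
# Suboptimal alternative: a signal for 10 pellets on 20% of trials,
# a signal for 0 pellets on the remaining 80%.
suboptimal_ev = 0.20 * 10 + 0.80 * 0
# Optimal alternative: both signals reliably predict 3 pellets.
optimal_ev = 1.00 * 3
print(suboptimal_ev, optimal_ev)  # 2.0 3.0
```

That is, the "jackpot" alternative pays an average of 2 pellets per trial against a guaranteed 3, so exclusive suboptimal choice costs the pigeon a third of the available food.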

The results of these experiments suggested that the stimulus signaling the absence of food (the S- stimulus) was ineffective
in inhibiting choice of the suboptimal alternative. Instead, it was the predictive value of the food signaling stimulus (the S+)
that determined choice. Thus, it follows that the frequency of the S+ stimulus played little role in choice of the suboptimal
alternative. Manipulation of the magnitude of reinforcement indicates the generality of the suboptimal choice phenomenon, and
the magnitude manipulation is more like human gambling behavior, in which humans wager a certain amount of
money (the optimal alternative) in the hope of gaining a larger amount (see [11,12] for similar results with monkeys).

The Role of the Frequency of the Suboptimal S+

In spite of the fact that the S- stimulus appeared 80% of the time in both experiments, it did not seem to inhibit choice of the
suboptimal alternative. Rather, it appears that it was the predictive value of the S+ stimulus, not its probability of occurrence, that
determined choice. To test this hypothesis more directly, we pitted the probability of reinforcement against the signaling value of
the stimuli that followed the choice [13]. In this experiment, pigeons chose between one alternative that provided an S+ stimulus
on 20% of the trials or an S- stimulus on 80% of the trials and a second alternative that provided an S+ on 50% of the trials or an
S- on 50% of the trials (Figure 4). In keeping with the hypothesis that the frequency of the S+ stimuli played little role in choice
between the two alternatives, the pigeons were indifferent between them [14,15].

Figure 4: Design of Stagner et al. Pigeons chose between two alternatives that were distinguished by discriminative stimuli (a vertical or
a horizontal line). Choice of one alternative was followed either by a stimulus (e.g., red) on 20% of the trials that was always followed by
reinforcement or by a different stimulus (e.g., green) on 80% of the trials that was never followed by reinforcement. Choice of the other
alternative was followed either by a stimulus (e.g., blue) on 50% of the trials that was always followed by reinforcement or by a different
stimulus (e.g., yellow) on 50% of the trials that was never followed by reinforcement. Spatial location and colors were counterbalanced.

The stimulus value hypothesis can account for the results of the preference for “information” experiment reported by Roper
and Zentall in which as the probability of reinforcement decreased, the preference for the discriminative stimulus alternative
actually increased [6]. The stimulus value hypothesis suggests that the probability of the occurrence of the signal for reinforcement
is of relatively little importance; only its predictive value matters. In their experiment, as the probability of reinforcement was reduced from
87.5% to 50% to 12.5% the predictive value of the signal for reinforcement remained the same (100%). However, the predictive
value of the signal for reinforcement that followed choice of the other alternative decreased from 87.5% to 50% to 12.5%. Thus, as
the probability of the occurrence of the signal for reinforcement decreased, the relative predictive value of that signal increased.
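The relative-value argument above can be worked through numerically (a sketch; the three probabilities are from Roper and Zentall's conditions, and the "relative value" ratio is my illustrative way of expressing the comparison):

```python
# On the informative alternative the good-news signal always predicts
# reinforcement (100%). On the uninformative alternative, a signal is only as
# predictive as the overall reinforcement probability p. The ratio of the two
# therefore grows as p drops, even though the informative signal also becomes rarer.
for p in (0.875, 0.50, 0.125):
    relative_value = 1.0 / p
    print(f"p = {p:.3f}: informative signal is {relative_value:.2f}x as predictive")
```

At 87.5% reinforcement the informative signal offers little relative advantage (about 1.14x), but at 12.5% it is eight times as predictive, which is consistent with the increase in preference as reinforcement probability falls.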

As further evidence of the relative unimportance of the frequency of the signal for reinforcement, Vasconcelos, Monteiro,
and Kacelnik repeated the Stagner and Zentall experiment, in which pigeons were given a choice between signaled 20%
reinforcement and unsignaled 50% reinforcement, using starlings as subjects and found similar results [8,16]. They then reduced
the frequency of the S+ stimulus that followed choice of the suboptimal alternative from 20% to 10%, then 5%, and finally 0%.
Reducing the frequency to 10% resulted in no reduction in choice of the suboptimal alternative. Reducing the frequency to 5%
resulted in a small drop in preference for the suboptimal alternative, to about 70%, but it was not until the S+ frequency dropped
to 0% that the starlings showed a clear preference for the optimal alternative.

A recent experiment by Smith and Zentall tested the stimulus value hypothesis under conditions involving a choice between
signaled 50% reinforcement and signaled 100% reinforcement [17] (Figure 5). Although the optimal alternative always provided
food and provided it twice as often as the suboptimal alternative, because both S+ stimuli predicted reinforcement equally, as
predicted by the stimulus value hypothesis, the pigeons were indifferent between the two alternatives. Earlier research suggested
that pigeons often showed a strong preference for the suboptimal alternative, but that research [18,19] always used alternatives
that differed only spatially, and when pigeons were indifferent between two alternatives they often adopted a spatial preference.

Figure 5: Smith and Zentall design. Pigeons chose between two alternatives. Choice of one alternative (e.g., the plus) was followed by a red
stimulus 50% of the time that was always followed by reinforcement or a green stimulus 50% of the time that was never followed by reinforcement.
Choice of the other alternative (i.e., the circle) was followed by a yellow stimulus followed by reinforcement 100% of the time.

Smith and Zentall, however, used a visual discrimination for which the discriminative stimuli changed
location randomly from trial to trial, so if the pigeons developed a spatial preference it would not show up as a preference for either
alternative [17]. Consistent with this hypothesis, in each of the previous experiments some pigeons showed a strong suboptimal
preference, whereas others showed a strong optimal preference [20]. Thus, the large preference for the suboptimal alternative
demonstrated by some subjects with the spatial design was likely influenced by a strong spatial preference unrelated to the stimuli or
reinforcement contingencies that followed.

It is unlikely, however, that all of the preference for the suboptimal alternative found when the initial-link discrimination was
spatial can be attributed to spatial preferences unrelated to actual preferences for the suboptimal alternative. First, in most of the
reported research in which a spatial discrimination was used, more pigeons chose suboptimally than chose optimally.

Second, Belke and Spetch reversed the contingencies associated with the initial stimuli for the three (of the eight) pigeons
that showed a strong preference for the suboptimal alternative and found that the spatial preferences reversed as well [18]. Third,
when a 5-s gap was inserted between the offset of the initial stimulus and the suboptimal conditioned reinforcer, it resulted in
choice of the optimal alternative; however, when a similar gap was inserted between the offset of the initial stimulus and the
optimal conditioned reinforcer, it had little effect on the preference for the suboptimal alternative. Furthermore, it is likely that variables
other than the value of the terminal link stimulus affect choice of the suboptimal alternative. For example, when the terminal
links were signaled, Dunn and Spetch found that short-duration initial links produced suboptimal choice but longer ones did not. They
found suboptimal choice only when choice involved a single peck. When it took more time to get to the terminal link (from variable
interval 10 s to variable interval 80 s), the pigeons preferred the optimal alternative. On the other hand, in general, the duration
of the terminal link did not affect the preference for the signaled terminal links [19], although Spetch found a preference for the optimal
alternative when the terminal link duration was quite short.

The Conditioned Reinforcer Value

The research described raises questions about what is responsible for the suboptimal choice effect. In most research, the delay to reinforcement on each trial following choice of an alternative is carefully controlled. That is, the outcome
scheduled to occur following each initial choice always occurs a fixed time after the choice [21]. A fixed delay was used because it
is well known that any differential delay to reinforcement can result in considerable discounting of the reinforcer. The fact that the
frequency of the signal for reinforcement is relatively inconsequential in the preference for the suboptimal alternative suggests
that the probability of reinforcement associated with each initial link (the primary reinforcing value) is relatively unimportant and
choice is determined by the secondary (or conditioned) reinforcing value of the stimulus that follows the choice [22]. Although
all primary reinforcers occur at the same time after choice, the conditioned reinforcers typically follow choice immediately. Thus,
there is little discounting of the conditioned reinforcers, whereas there is considerable discounting of the primary reinforcers.
This may explain why the predictive value of the conditioned reinforcers determines choice. If this interpretation is correct it
offers a possible explanation for the suboptimal choice. That is, when pigeons appear to prefer 20% reinforcement over 50%
reinforcement they are really showing a preference for a reliable signal for reinforcement (100%) associated with choice of the
suboptimal alternative over the less reliable signal for reinforcement (50%) associated with choice of the optimal alternative.
Similarly, when pigeons appear to prefer an average of 2 pellets over 3 pellets they are actually showing a preference for 10 pellets
over 3 pellets. Furthermore, when the S+ stimuli that follow choice have equal predictive value, this hypothesis can account for
indifference between the two alternatives, independent of the frequency of those stimuli [13,17,23].

Alternatives to the Stimulus Value Hypothesis

Mazur proposed that preference for initial-link stimuli is determined by the conditioned reinforcers that follow and the value
of the conditioned reinforcers is inversely related to the total time spent in their presence prior to primary reinforcement. In the
case of 20% signaled reinforcement vs. 50% unsignaled reinforcement (Figure 2), the suboptimal alternative would be preferred
because reinforcement would follow the unsignaled conditioned reinforcer only 50% of the time, whereas reinforcement would
follow the signaled conditioned reinforcer 100% of the time [24]. In the case of 50% signaled reinforcement vs. 100% reinforcement (Figure 5) both conditioned reinforcers are followed by reinforcement 100% of the time. Thus, pigeons should be indifferent
between them. This theory is very similar to the stimulus value hypothesis.

A somewhat different interpretation of suboptimal choice by pigeons was proposed by Stagner and Zentall. They proposed
that the preference for 20% signaled reinforcement over 50% unsignaled reinforcement resulted from the change in expected
value from the initial link (20% reinforcement) to the signal for reinforcement (100% reinforcement), a change that should produce
strong positive contrast (but little or no negative contrast between the initial link 20% reinforcement and the stimulus that
signals 0% reinforcement). Most important, for the 50% reinforcement alternative, there should be no contrast between the
initial link 50% reinforcement and the terminal link stimulus that signals 50% reinforcement. The contrast account could also
account for indifference between the suboptimal alternative and optimal alternative when the choice was between 50% signaled
reinforcement and 100% reinforcement reported by Smith and Zentall because the positive contrast that occurred following
choice of the suboptimal alternative upon presentation of the S+ stimulus (50% expected, 100% received) would be reduced by
the negative contrast found upon presentation of S- stimulus (50% expected, 0% obtained).

To account for the results found with both designs, McDevitt et al. suggested that preference for the suboptimal alternative
could be explained by the reduction in delay to reinforcement signaled by the appearance of the S+ that follows the suboptimal
choice (similar to the contrast account) [25]. They called this the Signal for Good News hypothesis. According to this hypothesis,
in the case of 20% signaled reinforcement vs. 50% unsignaled reinforcement, there would be a large reduction in the delay to
reinforcement signaled by appearance of the suboptimal S+, whereas there would be no reduction in the delay to reinforcement
signaled by appearance of the optimal S+. Thus, the suboptimal alternative would be preferred. But what about the case of 50%
signaled reinforcement vs. 100% reinforcement? In this case as well, there would be a large reduction in the delay to reinforcement
signaled by appearance of the suboptimal S+, whereas there would be no reduction in the delay to reinforcement signaled by
appearance of the optimal S+. But as noted no reliable preference was found. To account for the absence of a preference with
that design, McDevitt et al. proposed that there was also an effect of the difference in primary reinforcement between the two
alternatives and it was assumed that in this case the two effects must cancel out. The difference in delay reduction and primary
reinforcement between the two alternatives can also account for the absence of preference between 25% signaled reinforcement
(suboptimal) vs. 75% unsignaled reinforcement (optimal) and 50% signaled reinforcement (suboptimal) vs. 75% unsignaled
reinforcement (optimal) found by Zentall et al. [26]. In that case, the greater delay reduction (or contrast) associated with
the appearance of the signal for reinforcement following choice of the 25% reinforcement alternative would be offset by the greater
primary reinforcement associated with the 50% reinforcement alternative, making the 25% and 50% signaled reinforcement
alternatives comparable. The problem
with this hypothesis is that with two opposing preference inducing mechanisms, the theory can account for almost any outcome.

Is there Inhibition Associated with the Stimulus that Predicts the Absence of Food?

If the frequency of the signal for reinforcement has little effect on the choice between the two alternatives, it suggests that
the signal for the absence of reinforcement is associated with little inhibition. Laude et al. tested this hypothesis more directly
by using the magnitude of reinforcement design (Figure 3) developed by Zentall and Stagner [27,28]. To assess the development of inhibition, Laude
et al. tested for inhibition using a compound cue test either early in training or late in training, after preference for the suboptimal alternative had stabilized [29]. The compound cue test assesses inhibition by presenting a presumed inhibitory stimulus together
with a known S+ and noting the decrease in responding to the compound. Results indicated that early in training there was a
significant reduction in responding to the S+ stimulus when the S- was presented in compound with it (i.e., there was significant
inhibition) but not late in training. Thus, paradoxically, what started out as significant inhibition early in training dissipated with
further training, and Stagner, Laude, and Zentall showed that this effect did not result from the ability of the pigeons to turn away
from the S- stimulus when it appeared [10].

A theory based on the absence of conditioned inhibition to losses has also been proposed to account
for human gambling. Breen and Zuckerman, for example, reported that habitual gamblers attend to their
infrequent wins but attend much less than occasional gamblers do to their considerably more frequent losses [30]. Similarly, problem
gamblers are less sensitive to aversive conditioning, which should also serve to inhibit behavior [31].

The results of these pigeon experiments are consistent with human gambling research that has found that conditioned
reinforcers play an important role for problem gamblers, whereas conditioned inhibitors exert very little control over their decisions
to gamble [32-36]. Furthermore, problem gambling in humans is clinically recognized as an impulse control disorder in which people
show impaired behavioral inhibition and a failure to consider the long-term consequences of the decisions they make [37]. Thus,
much like pigeons, problem gamblers have a strong attraction to the signal for a large or highly probable reward without regard for
the general suboptimality of their choice.

The Bias for Certainty over Uncertainty

The Allais paradox or the certainty effect has shown that humans show paradoxical choice behavior [38,39]. For example, given
a choice between a 100% chance of earning $5 or an 80% chance of earning $10, most people choose the certain $5, although
the average return on the 80% chance of earning $10 is higher ($8). But if one reduces both of the probabilities by one half (i.e.,
a choice between a 50% chance of earning $5 and a 40% chance of earning $10), the opposite preference is typically found.
According to expected utility theory, the results of the second choice should be the same as the first choice but they are not [40].
Subjects often report that they prefer the certain $5 because they would be especially disappointed if they chose the 80% chance
of $10 and lost, whereas they prefer the 40% chance of obtaining $10 because they might almost as easily have lost had they
chosen the 50% chance of obtaining $5.
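The paradox is easy to see in the expected values themselves (a sketch using the dollar amounts and probabilities from the example above):

```python
# First choice pair: a certain $5 vs. an 80% chance of $10.
ev_certain = 1.00 * 5    # $5.00
ev_risky = 0.80 * 10     # $8.00: higher, yet most people take the certain $5

# Halving both probabilities preserves the ratio of expected values...
ev_half_certain = 0.50 * 5   # $2.50
ev_half_risky = 0.40 * 10    # $4.00
# ...but the typical preference now reverses toward the $10 option.
print(ev_certain, ev_risky, ev_half_certain, ev_half_risky)
```

Because halving both probabilities leaves the 8:5 ratio of expected values unchanged, expected utility theory predicts the same preference in both pairs, which is exactly what people fail to show.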

If avoiding the possibility of a loss is why humans choose suboptimally, it could also explain why pigeons choose the
alternative that provides the conditioned reinforcer that predicts 100% reinforcement over the alternative that provides a
conditioned reinforcer that predicts 50% reinforcement. To test this possibility, we conducted an experiment much like that of
Stagner and Zentall in which all of the reinforcement associated with the conditioned reinforcers was reduced by 20% [8]. That
is, the stimulus that predicted reinforcement occurred on only 20% of the trials; however, on those trials, reinforcement occurred
only 80% of the time. Thus, reinforcement was no longer certain. Once again, however, the pigeons showed a strong preference for
the suboptimal alternative [41]. Thus, uncertainty associated with the conditioned reinforcer that followed choice of the suboptimal
alternative did not deter the pigeons from choosing suboptimally. It may be that if the percentage of reinforcement associated with
the low probability, high payoff stimulus had been reduced still further, the pigeons would have reversed their preference and
chosen optimally. Certainty, however, does not appear to be the mechanism responsible for suboptimal choice in the experiment
in which magnitude of reinforcement was manipulated, because the conditioned reinforcers associated with both alternatives
predicted reinforcement 100% of the time [28]. One difference between what we did with pigeons and the procedures used with
humans is that the pigeons experienced the probabilities whereas the humans were told what they were. Harman and Gonzalez
found that when humans choose based on experience rather than being told the probabilities of the outcomes, they are more likely
to choose optimally [42].

The Immediacy of the Terminal Link Stimuli

According to the stimulus value hypothesis, indifference between the optimal and suboptimal alternatives results from the
similar value of the terminal link stimuli that predict reinforcement (both predict reinforcement 100% of the time). However, there
may be some differences between the two conditioned reinforcers. McDevitt et al. gave pigeons a choice between 50% signaled
reinforcement and 100% reinforcement [21]. When they inserted a dark 5-s gap prior to the onset of the S+ stimulus that followed
choice of the suboptimal alternative it resulted in a large reduction in the preference for that alternative. That is reasonable
because delaying the onset of the conditioned reinforcer diminishes its effectiveness. However, when a similar gap was inserted
prior to the onset of the S+ stimulus that followed choice of the optimal alternative, it had little effect on the preference for the
suboptimal alternative. McDevitt et al. reasoned that the resolution of uncertainty enhances the value of the stimulus that resolves
it. Although the S+ stimulus that follows the suboptimal alternative resolves uncertainty, the S+ stimulus that follows the optimal
alternative does not (the expected probability of reinforcement does not change) [21]. However, according to this hypothesis, Smith
and Zentall should have found a preference for the suboptimal alternative but instead they found indifference [17].

The Relation between Suboptimal Choice and Impulsivity

It is well accepted that the rate at which rewards are discounted with increasing delay is a measure of the impulsivity of the organism [43]. If delay discounting is the mechanism responsible for the suboptimal choice, one would expect to see a correlation between the slope of the discounting function in a delay discounting task and the development of a preference for the suboptimal alternative in the suboptimal choice task. Laude et al. fit pigeons' delay discounting data to the hyperbolic function [V = A/(1 + kD)], in which V is the value of the reinforcer, A is a measure of the magnitude of reinforcement, D is the delay between the choice response and reinforcement, and k is a free parameter that determines the rate at which V decreases with increases in D; k can also be described as the slope of the discounting function [44]. They then trained pigeons on the suboptimal choice task using the Zentall and Stagner procedure involving a choice between a 20% chance of obtaining 10 pellets and a 100% chance of obtaining 3 pellets [28]. A significant positive correlation (r = 0.84) was found when suboptimal choices for each pigeon were compared with the mean k value from the discounting task for each pigeon. That is, choice of the suboptimal alternative and the slope of the delay discounting function were highly related. Thus, although all reinforcers on a trial were equally delayed, the S+ stimuli that signaled their appearance bridged that delay to the extent that they were valid predictors of reinforcement or its magnitude, and their ability to bridge the delay determined the preference for the suboptimal alternative.
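The hyperbolic discounting function that Laude et al. fit can be sketched directly (the reward magnitude, delay, and k values below are hypothetical, chosen only to show how a larger k, i.e., a more impulsive subject, produces steeper devaluation):

```python
def hyperbolic_value(A, D, k):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return A / (1 + k * D)

# Hypothetical values: a 10-pellet reward delayed by 5 s.
A, D = 10.0, 5.0
patient = hyperbolic_value(A, D, k=0.1)    # small k: value holds up (about 6.7)
impulsive = hyperbolic_value(A, D, k=1.0)  # large k: value collapses (about 1.7)
print(patient, impulsive)
```

At D = 0 both subjects value the reward at the full A, which is why an immediately presented conditioned reinforcer escapes the discounting that erodes the delayed primary reinforcer.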

Human Performance on the Suboptimal Choice Task

Although there are differences between the procedures involved in human gambling decisions and the procedures used with
pigeons, we hypothesize that the underlying processes may be quite similar. Molet et al. tested this proposition with a modified
version of the suboptimal choice task used by Zentall and Stagner in which pigeons preferred the suboptimal choice of a 20%
chance of obtaining 10 pellets over a 100% chance of obtaining 3 pellets [28,45]. The human experiment involved a video game
in which subjects chose between two planetary systems each involving two planets that were distinguished by their color. Each
planet was being invaded by aliens and the subjects were to move the mouse over the invaders and click to fire at their space
ships. The purpose of the video game was to keep the subjects attentive during the 10 s between choice and the end of each
trial. If they chose the suboptimal alternative, 20% of the time they were sent to a planet where they could obtain 9-11 points
and 80% of the time to a planet where they could not obtain any points. If they chose the optimal alternative, they were always sent
to one of two planets where they could obtain 2-4 points. Thus, choice of the suboptimal alternative provided an average of 2
points per trial whereas choice of the optimal alternative always provided them with 3 points per trial. Subjects were instructed
to try to obtain as many points as they could. It was found that humans who reported that they regularly engaged in commercial
gambling chose the suboptimal alternative significantly more than non-gamblers. These results suggest that mechanisms found
to be involved in suboptimal choice by pigeons may also be applicable to human gambling.
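The expected values of the two alternatives in this video-game analog can be checked with a few lines of arithmetic (the point spreads of 9 - 11 and 2 - 4 are averaged to 10 and 3; the probabilities are those stated above):

```python
# Suboptimal alternative: 20% chance of a winning planet worth ~10 points.
p_subopt_win = 0.20
mean_subopt_points = 10.0   # midpoint of the 9 - 11 point range

# Optimal alternative: always a winning planet worth ~3 points.
p_opt_win = 1.0
mean_opt_points = 3.0       # midpoint of the 2 - 4 point range

ev_suboptimal = p_subopt_win * mean_subopt_points  # 2.0 points per trial
ev_optimal = p_opt_win * mean_opt_points           # 3.0 points per trial
```

The optimal alternative thus pays 50% more per trial on average, which is what makes persistent choice of the probabilistic alternative suboptimal.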

Task Differences

When humans gamble, the choice can be thought of as a go/no-go decision because humans can choose to gamble with
money that they have or they can refrain from gambling. Pigeons, however, choose between an optimal and a suboptimal outcome,
both of which involve obtaining resources that they do not already have. Although the procedures differ, the difference should make
it less likely that humans would gamble, because not only do they have a choice between a probabilistic and a sure outcome, but
the sure outcome is immediate (money in their pocket), whereas the probabilistic outcome is delayed by the time it takes to gamble
and learn the outcome. This may help explain why only a small percentage of humans are problem gamblers. In fact, when the
suboptimal outcome is delayed for pigeons relative to the optimal outcome, we have found that the pigeons begin to choose
optimally [41].

As humans choose to gamble with money they have, unlike pigeons, their losses are money they must give up, rather than
the absence of reinforcement. This distinction may be important because Kahneman and Tversky have found that although gains
that are certain are preferred over proportionally larger gains that are merely probabilistic (the certainty effect), certain losses
are avoided in favor of proportionally larger losses that are merely probabilistic (the reflection effect) [46]. That is, there is a stronger bias to win
back losses than to obtain gains, an effect that typically encourages gamblers to keep gambling when they lose [45]. Although it
would be difficult to create a task in which pigeons, like humans, can choose to gamble with a reinforcer that they already have,
as already noted, self-reported gamblers were found to be more likely to choose suboptimally than self-reported non-gamblers [45].
Thus, the go/no-go choice provided by commercial gambling and the two-alternative choice provided by our analog task appear to
be comparable, and this difference does not appear to be responsible for the suboptimal choice by pigeons. Furthermore, although
one might view human gambling losses as the loss of an investment, pigeons’ suboptimal choice represents a real opportunity
cost. The major difference is that humans can gamble only until they have no more money (though of course many gamblers then
borrow money), whereas pigeons can gamble indefinitely.

The Role of Conditioned Reinforcers in Human Gambling

The suboptimal choice task that we have used with pigeons uses the appearance of conditioned reinforcers following choice
but prior to the appearance of the outcome. The results of a simple thought experiment suggest that conditioned reinforcers are
also present when humans engage in commercial gambling. The three reels on a slot machine, for example, can be thought of as
conditioned reinforcers. The question is whether people would engage in gambling if the reels on the slot machine were obscured. That is,
if the only outcome of money inserted in the machine would be either nothing or money falling into the coin tray, gambling might
be much less likely. A similar argument can be made for other games of chance (e.g., roulette and blackjack). Thus, although there may be some procedural differences between the pigeon suboptimal choice task and human commercial gambling, the important
elements of the two are actually quite similar.

The Near-Hit Effect: When Humans and Pigeons Differ

One way in which pigeons appear to differ from humans in their preference for the suboptimal alternative is in the effect
of outcomes that indicate a loss but appear to come close to winning, a near hit (sometimes paradoxically referred to as a near
miss). An example of a near hit outcome can best be described using a three-reel slot machine. A winning outcome consists of
lining up three of the same symbols, one on each reel (e.g., three cherries). Any mixture of different symbols represents a loss
but not all losses are considered equal by human subjects. For example, two reels with cherries followed by a reel with a bell
represent a loss that to many gamblers is judged to be closer to winning than when the reel with the bell comes between the two
reels with cherries [47]. When MacLin et al. gave subjects a choice among three machines that gave near hit trials on 15%, 30%,
or 45% of trials, the subjects preferred the machine that gave near hit trials most often [48]. Griffiths proposed that near hits encourage further game play because even though subjects are still losing, they feel that they must be doing something right [49]. Langer proposed that near hit outcomes give gamblers the illusion of control [50]. That is, getting close to winning suggests that there may be skill involved in this game of chance. In games involving skill, such as shooting basketballs, near hits can provide feedback on how to modify behavior to be more successful in the future, but in games of chance such feedback has little effect on the likelihood of future winning.

Although it has been proposed that rats, like humans, show a preference for near hit trials, the effect is actually quite
different because with three successive lights signaling a win (111), the rats responded just as much to any two lights, irrespective
of their order (110, 101, and 011). For humans, on the other hand, 110 would be considered a near hit, whereas 101 and 011
would be considered clear losses [15].

Recently, Stagner et al. asked if pigeons preferred near hit trials over clear loss trials when the probability of reinforcement
was equated (Figure 6). Not only did they find that pigeons preferred a clear loss over a mixture of clear loss and near hit trials but
in a follow-up experiment they also found that the later in the trial that the near hit occurred, the more they avoided the alternative
with the near hit trials. Thus, as already noted, the preference for near hit outcomes by humans may result from a generalization
from the large number of skill tasks in which humans often engage.

Figure 6: Design of Stagner et al. Pigeons chose between two alternatives. Choice of one alternative (e.g., the plus) resulted in a red stimulus 50% of the time that was always followed by reinforcement, a red stimulus 25% of the time that after 5 s changed to green and was never followed by reinforcement (the near hit outcome), or a green stimulus 25% of the time that was never followed by reinforcement (the clear loss outcome). Choice of the other alternative (e.g., the circle) resulted in either a blue stimulus 50% of the time that was always followed by reinforcement or a yellow stimulus 50% of the time that was never followed by reinforcement (the clear loss outcome). Thus, both alternatives were associated with an equal probability of reinforcement.
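The key feature of this design, that the two alternatives are equated for overall reinforcement probability, can be verified directly. The sketch below simply encodes the contingencies described in the caption as (probability, reinforced) pairs; the variable names are ours, not from the original study.

```python
# Each alternative is a list of (probability, reinforced?) outcomes,
# encoding the Stagner et al. (Figure 6) contingencies.
near_hit_alt = [
    (0.50, True),   # red -> reinforcement
    (0.25, False),  # red changing to green after 5 s: near hit, no food
    (0.25, False),  # green: clear loss
]
clear_loss_alt = [
    (0.50, True),   # blue -> reinforcement
    (0.50, False),  # yellow: clear loss
]

def p_reinforcement(alternative):
    """Overall probability of reinforcement for an alternative."""
    return sum(p for p, reinforced in alternative if reinforced)
```

Both alternatives yield reinforcement with probability 0.5, so any preference between them must reflect the near hit outcomes rather than payoff differences.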

The Demographics of Suboptimal Choice and Human Gambling

The relation between suboptimal choice and level of food restriction

Although humans often describe gambling as a form of entertainment, the fact that people of lower socio-economic status
tend to gamble proportionally more than those of higher socio-economic status suggests that entertainment is not
the primary motivation for gambling (Lyk-Jensen) [51,52]. If the suboptimal choice by pigeons is a good analog of human gambling, then one might expect that the level of pigeons’ food restriction would be related to their degree of suboptimal choice. Consistent with this
hypothesis, Laude et al. found that pigeons that were normally food restricted showed the typical suboptimal choice, whereas
those that were minimally food restricted tended to choose optimally and thus, paradoxically, obtained more food [53,54]. Such a finding might be considered consistent with risk-sensitive foraging, in which birds on a negative energy budget may be inclined to be more risk prone when a fixed option is not sufficient for the animal to survive; but as Kacelnik and Bateson have noted, relatively large birds like pigeons trained under the present conditions are not likely to be on a negative energy budget [55]. In any case, the view of human gambling as a form of investment is analogous to the view of the pigeons’ suboptimal choice as an opportunity cost [16,56].

The relation between housing and suboptimal choice

Research with rats suggests that several extra-experimental environmental factors such as social and nonsocial enrichment
can affect a rat’s tendency to self-administer drugs [57]. Rats that are housed in an enriched environment (a large cage with other
rats and novel objects) are significantly less likely to self-administer drugs than rats that are individually (normally) housed. The
mechanism responsible for the reduced self-administration of drugs by environmental enrichment has been hypothesized to be
a reduction in impulsive behavior [58]. A similar mechanism has been suggested to be involved in the reduced effectiveness of
conditioned reinforcers [59]. Impulsivity has also been implicated in human gambling behavior and there is evidence that similar
physiological mechanisms underlie compulsive gambling and drug addiction [60,61].

Pattison et al. attempted to determine the effect of housing conditions on suboptimal choice by giving one group of pigeons
experience in an enriched environment (a large cage with four other pigeons for 4 h a day), while the control pigeons remained in
their normal one-to-a-cage housing [62]. When they exposed the pigeons from both groups to the gambling-like suboptimal choice
task they found that the enriched pigeons were much slower to learn to choose the suboptimal alternative. Thus, enriched housing
appears to retard the development of suboptimal choice, even for a relatively short 4 h a day. This finding has implications for
the treatment of humans who are problem gamblers. It suggests that one might be able to reduce the attraction of gambling by
exposing human gamblers to an environment that is socially and physically enriched.

Conclusion

The suboptimal choice task provides a reasonable analog to human commercial gambling. The mechanism responsible
for this suboptimal behavior appears to be the relative lack of effectiveness of non-reinforcement in reducing the likelihood that
pigeons will choose the suboptimal alternative, even when non-reinforcement occurs on almost every trial [44,50,61]. Furthermore,
the relative probability of reinforcement associated with the choice appears to be relatively unimportant. Instead, the predictive
values of the stimuli that follow that choice appear to be the primary determinant of the initial preference. Similarly, for most
humans who gamble, it is the potential reward rather than the odds of winning that influences the tendency to gamble. It may
also be that positive contrast between the expected probability of reinforcement and that obtained with the appearance of the S+
stimulus following choice of the suboptimal alternative plays a role as well [62].

Stagner JP, et al. Pigeons prefer discriminative stimuli independently of the overall probability of reinforcement and of the number of presentations of the conditioned reinforcer. J Exp Psychol Anim Behav. 2012;38:446–452.