Abstract

Two experimental tasks in psychology, the two-stage gambling game and the Prisoner's Dilemma game, show that people violate the sure thing principle of decision theory. These paradoxical findings have resisted explanation by classical decision theory for over a decade. A quantum probability model, based on a Hilbert space representation and Schrödinger's equation, provides a simple and elegant explanation for this behaviour. The quantum model is compared with an equivalent Markov model and it is shown that the latter is unable to account for violations of the sure thing principle. Accordingly, it is argued that quantum probability provides a better framework for modelling human decision-making.

1. Introduction

Cognitive science is concerned with providing formal, computational descriptions for various aspects of cognition. Over the last few decades, several frameworks have been thoroughly examined, such as formal logic (e.g. St Evans et al. 1991), information theory (e.g. Chater 1999), classical (Bayesian) probability (e.g. Tenenbaum & Griffiths 2001), neural networks (Rumelhart & McClelland 1986) and a range of formal, symbolic systems (e.g. Anderson et al. 1997). Being able to establish an advantage of one computational approach over another is clearly a fundamental issue for cognitive scientists. Two criteria are needed to achieve this goal: one is to establish a striking empirical finding that provides a strong theoretical challenge and the other is to provide a rigorous mathematical argument that a theoretical class fails to meet this challenge. This paper reviews findings that challenge the classical (Bayesian) probability approach to cognition, and proposes to exchange this with a more generalized (quantum) probability approach.

The empirical challenge is provided by two experimental tasks in decision-making, the Prisoner's Dilemma and the two-stage gambling task, which have had an enormous influence on cognitive psychology and economics (there are over 31 000 citations to the work of Tversky, one of the researchers who first studied these tasks; e.g. Tversky & Kahneman 1983; Shafir & Tversky 1992; Tversky & Shafir 1992). These experimental tasks are important because they show a violation of a fundamental law of classical (Bayesian) probability theory that, when applied to human decision-making, is called the ‘sure thing’ principle (Savage 1954).

The sure thing principle (Savage 1954) is fundamental to classical decision theory: if you prefer action A over B under state of the world X, and you also prefer A over B under the complementary state ∼X, then you should prefer A over B when the state is unknown. This principle was tested by Tversky & Shafir (1992) in a simple two-stage gambling experiment: participants were told that they had just played a gamble (even chance to win $200 or lose $100), and then they were asked to choose whether to play the same gamble a second time. In one condition, they knew they had won the first play; in a second condition, they knew they had lost the first play; and in a third condition, they did not know the outcome. Surprisingly, the results violated the sure thing principle: following a win or a loss, the participants chose to play again on 69% and 59% of the trials, respectively; but when the outcome was unknown, they chose to play again on only 36% of the trials. This preference reversal was observed at the individual level of analysis with real money at stake.
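The violation can be made concrete with a small check. The sketch below assumes the classical reading used later in the paper: by the law of total probability, the choice rate in the unknown condition should be a weighted average of the two known-condition rates, so it should lie between them. The function name `violates_sure_thing` is hypothetical and introduced here only for illustration.

```python
def violates_sure_thing(p_known_a, p_known_b, p_unknown):
    """Return True if p_unknown lies outside the interval spanned by
    the two known-condition rates, i.e. it cannot be written as a
    weighted average (convex combination) of them."""
    low, high = sorted((p_known_a, p_known_b))
    return not (low <= p_unknown <= high)


# Rates reported by Tversky & Shafir (1992) for the two-stage gamble:
# play again after a known win, after a known loss, and when unknown.
gamble_violation = violates_sure_thing(0.69, 0.59, 0.36)
print(gamble_violation)
```

Any unknown-condition rate between 0.59 and 0.69 would be compatible with this classical constraint; the observed 0.36 is not.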

Similar results were obtained using a two-person Prisoner's Dilemma game with pay-offs defined for each player as in table 1. The Nash equilibrium in standard game theory is for both parties to defect. Three conditions are used to test the sure thing principle: in an ‘unknown’ condition, you act without knowing your opponent's action; in the ‘known defect’ condition, you know your opponent will defect before you act; and in the ‘known cooperate’ condition, you know your opponent will cooperate before you act. According to the sure thing principle, if you prefer to defect, regardless of whether you know your opponent will defect or cooperate, then you should prefer to defect even when your opponent's action is unknown.

Once again, people frequently violate the sure thing principle (Shafir & Tversky 1992)—many players defected knowing the opponent defected and knowing the opponent cooperated, but they switched and decided to cooperate when they did not know the opponent's action. This preference reversal by many players caused the proportion of defections for the unknown condition to fall below the proportions observed under the known conditions. This key finding of Shafir & Tversky (1992) has been replicated in several subsequent studies (Croson 1999; Li & Taplin 2002; Busemeyer et al. 2006a; table 2).

Table 2. Empirically observed proportion of defections in different conditions in the Prisoner's Dilemma game.

Note that Prisoner's Dilemma is a task that has attracted widespread attention not just from decision-making scientists. For example, it has been intensely studied in the context of how altruistic and cooperative behaviour can arise in both humans and animals (e.g. Axelrod & Hamilton 1981; Stephens et al. 2002; Kefi et al. 2007). Violations of the sure thing principle specifically have not been demonstrated in animal cognition. However, both Shafir (1994) and Waite (2001) showed that transitivity, another fundamental aspect of classical probability theory, can be violated in animal preference choice (with grey jays and bees, respectively). Also, it turns out that a core element of our model for human decision-making in the Prisoner's Dilemma has an analogue in animal cognition, raising the possibility that such a model may be applicable to animal cognition as well.

We present an alternative probabilistic framework for modelling decision-making, based on quantum probability. Why consider a quantum probability model for decision-making? The original motivation for the development of quantum mechanics in physics was to explain findings that seemed paradoxical from a classical point of view. Similarly, paradoxical findings in psychology have led a growing number of researchers to seek explanations that make use of the quantum formalism in cognitive situations. For example, Aerts & Aerts (1994; see also Khrennikov 2004; La Mura in press; Mogiliansky et al. in press) modelled incompatibility and interference effects that arise in human preference judgements. Gabora & Aerts (2002; see also Aerts & Gabora 2005a,b; Aerts et al. in press) modelled puzzling findings in human reasoning with conceptual combinations. Bordley (1998; see also Franco in press) modelled paradoxical results obtained with human probability judgements. Van Rijsbergen (2004; see also Widdows 2006; Bruza et al. 2008) showed that classical logic does not appear to be the right type of logic when dealing with classes of objects and that a more appropriate representation for classes is possible with mathematical tools from quantum theory. Such approaches can be labelled ‘geometric’ (cf. Aerts & Aerts 1994) in that they use the geometric properties of Hilbert space representations and the measurement principles of quantum theory, but not the dynamical aspects of quantum theory (time evolution with Schrödinger's equation).

A much smaller number of applications have attempted to apply quantum dynamics to cognition. For example, Atmanspacher et al. (2002) modelled oscillations in human perception of impossible figures. Aerts et al. (2004) modelled how a human observer alternates between perceiving the statements in a liar's paradox situation as false and true. Busemeyer et al. (2006b) presented a quantum model for signal detection type of human decision processes. Our proposal builds on this latter work (and particularly that of Busemeyer et al. 2006a). We were interested in a model that would have implications for the time course of a decision, as well as accurately predicting choice probabilities in the Prisoner's Dilemma and two-stage gambling task. A novel aspect of our proposal is that we attempt to derive a relevant Hamiltonian a priori, on the basis of the psychological parameters of the decision-making situation.

Finally, note that the goal of models such as the above is to formulate a mathematical framework for understanding the behavioural findings from cognition and decision-making. This objective must be distinguished from related ones, such as constructing new game strategies using quantum game theory (Eisert et al. 1999), modelling the biology of the brain using quantum mechanics (Pribram 1993; Hameroff & Penrose 1996) or developing new algorithms for quantum computation (Nielsen & Chuang 2000).

The violation of the sure thing principle readily suggests that a classical probability model for Tversky and Shafir's results will fail. We go beyond intuition and develop a standard Markov model for the two-stage gambling task and the Prisoner's Dilemma, to prove that this model can never account for the empirical findings. In this way we motivate a more general model, based on quantum probability. Researchers have recently successfully explored applications of the quantum mechanics formalism outside physics (notably in computer science, e.g. Grover 1997). Explorations of how the quantum principles could apply in psychology have been slow, partly because of confusion over whether such attempts imply that the operation of the brain is quantum mechanical (e.g. Hameroff & Penrose 1996). This could be the case or not (cf. Marr 1982), but it is not the issue at stake: rather, we are asking whether quantum probability could provide an appropriate mathematical framework for understanding and modelling certain behavioural aspects of cognition. Key problems in such an endeavour are (i) what is an appropriate Hilbert space representation of the task, (ii) what is the psychological motivation for the corresponding Hamiltonian, and (iii) what is the meaning of time evolution in this context? We address all these problems in our quantum probability model of decision-making in the Prisoner's Dilemma task and the two-stage gambling task. The model is described with respect to the Prisoner's Dilemma task, but extension to the two-stage gambling task is straightforward.

2. Description of the model

(a) Step 1: representation of beliefs and actions

The Prisoner's Dilemma game involves a set of four mutually exclusive and exhaustive outcomes $\Omega=\{B_DA_D, B_DA_C, B_CA_D, B_CA_C\}$, where $B_iA_j$ represents your belief that your opponent will take the $i$th action, but you intend to take the $j$th action (D, defect; C, cooperate). For both the Markov and quantum models, we assume that the probabilities of the four outcomes can be computed from a 4×1 state vector

\[ \psi = [\psi_{DD},\ \psi_{DC},\ \psi_{CD},\ \psi_{CC}]^{T}. \]

For the Markov model, $\psi_{ij}$ = Pr[observe belief $i$ and action $j$], with $\sum_{i,j}\psi_{ij}=1$. For the quantum model, $\psi_{ij}$ is an amplitude, so that $|\psi_{ij}|^{2}$ = Pr[observe belief $i$ and action $j$], with $\sum_{i,j}|\psi_{ij}|^{2}=1$. For both models, we assume that the individual begins the decision process in an uninformed state:

\[ \psi_0 = \tfrac{1}{4}\,[1,\ 1,\ 1,\ 1]^{T} \]

for the Markov model and

\[ \psi_0 = \tfrac{1}{2}\,[1,\ 1,\ 1,\ 1]^{T} \]

for the quantum model.

(b) Step 2: inferences based on prior information

Information about the opponent's action changes the initial state ψ0 into a new state ψ1.

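If the opponent is known to defect, the state changes to $\psi_1=\tfrac{1}{2}[1,\ 1,\ 0,\ 0]^{T}$ for the Markov model, and $\psi_1=\tfrac{1}{\sqrt{2}}[1,\ 1,\ 0,\ 0]^{T}$ for the quantum model; similarly, if the opponent is known to cooperate, the state changes to $\psi_1=\tfrac{1}{2}[0,\ 0,\ 1,\ 1]^{T}$ for the Markov model and $\psi_1=\tfrac{1}{\sqrt{2}}[0,\ 0,\ 1,\ 1]^{T}$ for the quantum model. If no information is provided, then the state remains unchanged so that $\psi_1=\psi_0$.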
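Steps 1 and 2 can be sketched in a few lines. The sketch assumes the outcome order (DD, DC, CD, CC), with the belief about the opponent first and your own action second, and assumes that information about the opponent simply spreads the state uniformly over the outcomes that remain possible. The helper names `markov_state` and `quantum_state` are hypothetical.

```python
import math

# Outcome order: belief about the opponent first, own action second.
OUTCOMES = ("DD", "DC", "CD", "CC")


def markov_state(known=None):
    """Probability vector over OUTCOMES.

    known is None (opponent's action unknown), "D" or "C".
    Probability is spread uniformly over outcomes consistent with it.
    """
    support = [o for o in OUTCOMES if known is None or o[0] == known]
    return [1.0 / len(support) if o in support else 0.0 for o in OUTCOMES]


def quantum_state(known=None):
    """Amplitude vector whose squared magnitudes match markov_state."""
    return [math.sqrt(p) for p in markov_state(known)]


for condition in (None, "D", "C"):
    probs = markov_state(condition)
    amps = quantum_state(condition)
    # Markov states sum to one; quantum states have unit length.
    print(condition, probs, sum(probs), sum(a * a for a in amps))
```

The only difference between the two representations at this stage is the normalisation: probabilities sum to one, whereas squared amplitudes sum to one.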

(c) Step 3: strategies based on pay-offs

The decision-maker must evaluate the pay-offs in order to select an action, which changes the previous state ψ1 into a final state ψ2. We assume that the time evolution of the initial state to the final state corresponds to the thought process leading to a decision.

For the Markov model, we can model this change using a Kolmogorov forward equation $d\psi/dt = K_A\psi$, which has a solution $\psi_2(t)=e^{tK_A}\psi_1$. The matrix $T(t)=e^{tK_A}$ is a transition matrix, with $T_{ij}(t)$ = Pr[transiting to state $i$ from state $j$ during time period $t$]. The transition matrix satisfies $\sum_i T_{ij}=1$ to guarantee that $\psi_2$ sums to unity. Initially, we assume

\[ K_A=\begin{bmatrix} K_{Ad} & 0\\ 0 & K_{Ac}\end{bmatrix},\qquad K_{Ai}=\begin{bmatrix} -1 & \mu_i\\ 1 & -\mu_i\end{bmatrix}. \tag{2.1a} \]

The intensity matrix $K_A$ transforms the state probabilities to favour either defection or cooperation, depending on the parameters $\mu_d$ or $\mu_c$, which correspond to your gain if you defect, depending on whether you believe the opponent will defect or cooperate, respectively; these parameters depend on the pay-offs associated with different actions, and will be considered shortly. The intensity matrix requires $K_{ij}\ge 0$ for $i\neq j$ and $\sum_i K_{ij}=0$ to guarantee that $e^{tK_A}$ is a transition matrix.

For the quantum model, the time evolution is determined by Schrödinger's equation $i\,d\psi/dt = H_A\psi$ with solution $\psi_2(t)=e^{-itH_A}\psi_1$. The matrix $U(t)=e^{-itH_A}$ is unitary with $|U_{ij}(t)|^{2}$ = Pr[transiting to state $i$ from state $j$ during time period $t$]. This matrix must satisfy $U^{\dagger}U=I$ (identity matrix) to guarantee that $\psi_2$ retains unit length ($U^{\dagger}$ is the adjoint of $U$). Initially, we assume that

\[ H_A=\begin{bmatrix} H_{Ad} & 0\\ 0 & H_{Ac}\end{bmatrix},\qquad H_{Ai}=\frac{1}{\sqrt{1+\mu_i^{2}}}\begin{bmatrix} \mu_i & 1\\ 1 & -\mu_i\end{bmatrix}. \tag{2.1b} \]

The Hamiltonian $H_A$ rotates the state to favour either defection or cooperation, depending on the parameters $\mu_d$ or $\mu_c$, which (as before) correspond to your gain if you defect, depending on whether you believe the opponent will defect or cooperate, respectively. The Hamiltonian must be Hermitian to guarantee that $U$ is unitary.

For both models, the parameter $\mu_i$ is assumed to be a monotonically increasing utility function of the differences in the pay-offs for each of your actions, depending on the opponent's action: $\mu_d=u(x_{DD}-x_{DC})$ and $\mu_c=u(x_{CD}-x_{CC})$, where $x_{ij}$ is the pay-off you receive if your opponent takes action $i$ and you take action $j$. For example, given the pay-offs in table 1, the pay-off pairs (opponent, you) are (10, 10) for DD, (25, 5) for DC, (5, 25) for CD and (20, 20) for CC, so that $x_{DD}=10$, $x_{DC}=5$, $x_{CD}=25$ and $x_{CC}=20$. Assuming that utility is determined solely by your own pay-offs, then $\mu_d=u(x_{DD}-x_{DC})=u(5)=u(x_{CD}-x_{CC})=\mu_c$, and we denote this common value by $\mu$. In other words, typically, $\mu_i$ can be set equal to the difference in the pay-offs (possibly multiplied by a constant scaling factor), although more complex utility functions can be assumed.

For both models, a decision corresponds to a measurement of the state $\psi_2(t)$. For the Markov model, Pr[you defect] = Pr[D] = $\psi_{DD}+\psi_{CD}$; similarly, for the quantum model, Pr[you defect] = Pr[D] = $|\psi_{DD}|^{2}+|\psi_{CD}|^{2}$.

Inserting equation (2.1a) into the Kolmogorov equation and solving for $\psi_2(t)$ yields the following probability when the opponent's action is known:

\[ \Pr[D]=\frac{\mu}{1+\mu}+\left(\frac{1}{2}-\frac{\mu}{1+\mu}\right)e^{-(1+\mu)t}. \]

This probability grows monotonically from $1/2$ at $t=0$ to $\mu/(1+\mu)$ as $t\to\infty$. The behaviour of the quantum model is more complex. Inserting equation (2.1b) into the Schrödinger equation and solving for $\psi_2(t)$ yields:

\[ \Pr[D]=\frac{1}{2}+\frac{\mu}{1+\mu^{2}}\,\sin^{2}\!\left(\frac{\pi}{2}\,t\right). \]

For $-1<\mu<+1$, this probability increases across time from $1/2$ at $t=0$ to $1/2+\mu/(1+\mu^{2})$ at $t=1$, and subsequently it oscillates between the minimum and maximum values. Empirically, choice probabilities in laboratory-based decision-making tasks monotonically increase across time (for short decision times, see Diederich & Busemeyer 2006), and so a reasonable approach for fitting the model is to assume that a decision is reached within the interval $0<t<1$ for the quantum model ($t=1$ would correspond to approx. 2 s for such tasks).
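The two known-condition curves can be compared with a short sketch. It assumes a 2×2 intensity block [[-1, μ], [1, -μ]] for the Markov model and a 2×2 Hamiltonian block [[μ, 1], [1, -μ]]/√(1+μ²) for the quantum model, both acting on (defect, cooperate) starting from equal weights, with quantum time measured in units of π/2. Because this Hamiltonian block squares to the identity, its propagator is cos(θ)·I − i·sin(θ)·H, which the sketch uses to cross-check the closed form. All function names are hypothetical.

```python
import math


def markov_defect_prob(mu, t):
    """Closed-form Pr[D] for dp/dt = -p + mu * (1 - p), p(0) = 1/2."""
    limit = mu / (1.0 + mu)
    return limit + (0.5 - limit) * math.exp(-(1.0 + mu) * t)


def quantum_defect_prob(mu, t):
    """Closed-form Pr[D]; t is in units of pi/2."""
    return 0.5 + mu / (1.0 + mu * mu) * math.sin(math.pi / 2 * t) ** 2


def quantum_defect_prob_direct(mu, t):
    """Same quantity computed from the propagator cos(th) I - i sin(th) H."""
    theta = math.pi / 2 * t
    scale = 1.0 / math.sqrt(1.0 + mu * mu)
    h_row_defect = (mu * scale, scale)          # first row of H
    psi = (1 / math.sqrt(2), 1 / math.sqrt(2))  # equal amplitudes for D, C
    h_psi_defect = h_row_defect[0] * psi[0] + h_row_defect[1] * psi[1]
    amplitude = math.cos(theta) * psi[0] - 1j * math.sin(theta) * h_psi_defect
    return abs(amplitude) ** 2


mu = 0.5  # illustrative value, not a fitted parameter
for t in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(t, markov_defect_prob(mu, t),
          quantum_defect_prob(mu, t), quantum_defect_prob_direct(mu, t))
```

The Markov curve settles at μ/(1+μ), whereas the quantum curve peaks at t=1 and returns to 1/2 at t=2, which is why the fit restricts decisions to the first rise of the oscillation.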

Equations (2.1a) and (2.1b) produce reasonable choice models when the action of the opponent is known. However, when the opponent's action is unknown, both models predict that the probability of defection is the average of the two known cases, which fails to explain the violations of the sure thing principle. The KA and HA components of each model can be understood as the ‘rational’ components of each model, whereby the decision-maker is simply assumed to try to maximize utility. In §2d we introduce a component in each model for describing an additional influence on the decision-making process, which can lead to decisions that do not maximize expected utility (and so could lead to violations of the sure thing principle). These two components in each model have to be separate since in many cases the behaviour of decision-makers can be explained (just) by an urge to maximize expected utility. The difference between the Markov model and the quantum one relates to how the two components are combined with each other. Importantly, we prove that the Markov model still cannot produce the violations of the sure thing principle even when this second, non-rational component is added. Only the quantum model explains the result.

(d) Step 4: strategies based on evaluations of both beliefs and pay-offs

To explain violations of the sure thing principle, we introduce the idea of cognitive dissonance (Festinger 1957). People tend to change their beliefs to be consistent with their actions. In the case of the Prisoner's Dilemma game, this motivates a change of beliefs about what the opponent will do in a direction that is consistent with the person's intended action. In other words, if a player chooses to cooperate he/she would tend to think that the other player will cooperate as well. The reduction in cognitive dissonance is an intriguing, and extensively supported, cognitive process. It has been shown with monkeys (Egan et al. 2007), suggesting that the applicability of our model might extend to such animals. Shafir & Tversky (1992) explained it in terms of a personal bias for ‘wishful thinking’ and Chater et al. (2008) by considering a statistical approach based on Simpson's paradox (specifically, Chater et al. showed that, in a Prisoner's Dilemma game, the propensity to cooperate or defect would depend on assumptions about what the opponent would do, given whether the parameters of the game would encourage cooperation or defection). Such approaches may not be incompatible (for example, wishful thinking may have an underlying statistical explanation).

Although cognitive dissonance tendencies can be implemented in both the Markov and quantum models, we shall see that they do not help the Markov model, and only the quantum model explains the sure thing principle violations.

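For the Markov model, an intensity matrix that produces a change of wishful thinking is

\[ K_B=\begin{bmatrix} -1 & 0 & +\gamma & 0\\ 0 & 0 & 0 & 0\\ +1 & 0 & -\gamma & 0\\ 0 & 0 & 0 & 0\end{bmatrix}+\begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & -\gamma & 0 & +1\\ 0 & 0 & 0 & 0\\ 0 & +\gamma & 0 & -1\end{bmatrix}. \tag{2.2a} \]

The first/second matrix changes beliefs about the opponent towards defection/cooperation when you plan to defect/cooperate, respectively.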

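Note that

\[ \frac{d\psi}{dt}=K_B\,\psi=\begin{bmatrix} \gamma\,\psi_{CD}-\psi_{DD}\\ \psi_{CC}-\gamma\,\psi_{DC}\\ \psi_{DD}-\gamma\,\psi_{CD}\\ \gamma\,\psi_{DC}-\psi_{CC}\end{bmatrix}. \]

If γ>1, then the rate of increase for the first and last rows is greater than that for the middle rows, leading to an increase in the probabilities that beliefs and actions agree. For example, setting γ=10 at t=1 produces

\[ e^{tK_B}\psi_0\approx[0.45,\ 0.05,\ 0.05,\ 0.45]^{T}, \]

where it can be seen that beliefs tend to match actions, achieving a reduction in cognitive dissonance.

For the quantum model, a Hamiltonian that produces this change is

\[ H_B=\frac{-\gamma}{\sqrt{2}}\left(\begin{bmatrix} +1 & 0 & +1 & 0\\ 0 & 0 & 0 & 0\\ +1 & 0 & -1 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}+\begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & -1 & 0 & +1\\ 0 & 0 & 0 & 0\\ 0 & +1 & 0 & +1\end{bmatrix}\right). \tag{2.2b} \]

The first/second matrix rotates beliefs about the opponent towards defection/cooperation when you plan to defect/cooperate, respectively. Note that

\[ \frac{d\psi}{dt}=-i\,H_B\,\psi=\frac{i\gamma}{\sqrt{2}}\begin{bmatrix} \psi_{DD}+\psi_{CD}\\ \psi_{CC}-\psi_{DC}\\ \psi_{DD}-\psi_{CD}\\ \psi_{DC}+\psi_{CC}\end{bmatrix}. \]

If γ>0, then the rate of increase for the first and last rows is greater than that for the middle rows, so that, as before, there is an increase in the amplitudes for the states in which beliefs and actions agree. For example, setting γ=1 at t=π/2 produces $e^{-itH_B}\psi_0$, which results in a vector of squared magnitudes

\[ [\tfrac{1}{2},\ 0,\ 0,\ \tfrac{1}{2}]^{T}. \]

In both the Markov and quantum models, we can see that the above intensity matrix/Hamiltonian, respectively, makes beliefs and actions correlated.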
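The two worked examples in this step can be checked numerically. The sketch below assumes the outcome order (DD, DC, CD, CC), a Markov intensity matrix that moves probability from CD to DD and from DC to CC at rate γ (and back at rate 1), and a Hamiltonian equal to −γ/√2 times the sum of the two belief-rotation blocks. The matrix exponential is a plain scaling-and-squaring Taylor series written for this illustration; `mat_exp`, `k_b` and `h_b` are hypothetical names.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=20):
    """exp(m) by scaling-and-squaring with a truncated Taylor series."""
    n = len(m)
    small = [[x / 2 ** squarings for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def k_b(gamma):
    """Assumed wishful-thinking intensity matrix (columns sum to zero)."""
    return [[-1, 0, gamma, 0],
            [0, -gamma, 0, 1],
            [1, 0, -gamma, 0],
            [0, gamma, 0, -1]]


def h_b(gamma):
    """Assumed wishful-thinking Hamiltonian (real symmetric)."""
    c = -gamma / math.sqrt(2)
    return [[c, 0, c, 0],
            [0, -c, 0, c],
            [c, 0, -c, 0],
            [0, c, 0, c]]


# Markov example: gamma = 10, t = 1, uniform initial probabilities.
markov_probs = apply(mat_exp(k_b(10.0)), [0.25] * 4)

# Quantum example: gamma = 1, t = pi/2, uniform initial amplitudes.
t = math.pi / 2
unitary = mat_exp([[-1j * t * x for x in row] for row in h_b(1.0)])
quantum_probs = [abs(a) ** 2 for a in apply(unitary, [0.5] * 4)]

print([round(p, 3) for p in markov_probs])
print([round(p, 3) for p in quantum_probs])
```

In both cases the weight concentrates on DD and CC, the outcomes where belief and intended action agree.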

By themselves, equations (2.2a) and (2.2b) are an inadequate description of behaviour in the Prisoner's Dilemma, because they cannot explain how preferences vary with pay-offs. We need to combine equations (2.1a), (2.1b) and (2.2a), (2.2b) to produce an intensity matrix $K_C=K_A+K_B$ or a Hamiltonian $H_C=H_A+H_B$, so that the time evolution of the initial state to the final state reflects the influences of both the pay-offs and the process of wishful thinking. Note that in this combined model, both beliefs and actions are evolving simultaneously and in parallel.

Accordingly, we suggest that the final state is determined by $\psi_2=e^{tK_C}\psi_1$ for the Markov model and $\psi_2=e^{-itH_C}\psi_1$ for the quantum model. Each model has two free parameters: one that changes the actions using equations (2.1a) and (2.1b) and depends on pay-offs (i.e. μ), and another that corresponds to a psychological bias to assume the opponent will act as we do, using equations (2.2a) and (2.2b) (i.e. γ).

3. Model predictions

For the Markov model, probabilities for the unknown state can be related to probabilities for the known states by expressing the initial unknown state as a probability mixture of the two initial known states:

\[ \psi_2=e^{tK_C}\psi_0=e^{tK_C}\left(\tfrac{1}{2}\psi_D+\tfrac{1}{2}\psi_C\right)=\tfrac{1}{2}\,e^{tK_C}\psi_D+\tfrac{1}{2}\,e^{tK_C}\psi_C, \]

where $\psi_D=\tfrac{1}{2}[1,\ 1,\ 0,\ 0]^{T}$ and $\psi_C=\tfrac{1}{2}[0,\ 0,\ 1,\ 1]^{T}$ are the initial states for the known defect and known cooperate conditions. We see that the state probabilities in the unknown case must equal the average of the state probabilities for the two known cases. Therefore, the Markov model fails to reproduce the violations of the sure thing principle, regardless of what parameters, time point, initial state or intensity matrix we use. This conclusion reflects the fundamental fact that the Markov model obeys the law of total probability, which mathematically restricts the unknown state to remain a weighted average of the two known states. Note that this failure of the Markov model occurs even when we include the cognitive dissonance tendencies in the model.
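The argument depends only on linearity, which a brief sketch can illustrate. Instead of a particular intensity matrix, it draws arbitrary column-stochastic transition matrices (an assumption standing in for any e^{tK}) and checks that the unknown-condition defection probability always equals the average of the two known-condition probabilities. The outcome order (DD, DC, CD, CC) and all names are assumptions of this sketch.

```python
import random


def random_transition_matrix(n, rng):
    """Random matrix T with T[i][j] = Pr[to i from j]; columns sum to 1."""
    columns = []
    for _ in range(n):
        weights = [rng.random() for _ in range(n)]
        total = sum(weights)
        columns.append([w / total for w in weights])
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def pr_defect(state):
    """Pr[D] sums the outcomes DD and CD (indices 0 and 2)."""
    return state[0] + state[2]


KNOWN_DEFECT = [0.5, 0.5, 0.0, 0.0]
KNOWN_COOPERATE = [0.0, 0.0, 0.5, 0.5]
UNKNOWN = [0.25, 0.25, 0.25, 0.25]  # equal mixture of the two known states

rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    t_matrix = random_transition_matrix(4, rng)
    average_known = 0.5 * (pr_defect(apply(t_matrix, KNOWN_DEFECT))
                           + pr_defect(apply(t_matrix, KNOWN_COOPERATE)))
    gap = abs(pr_defect(apply(t_matrix, UNKNOWN)) - average_known)
    max_gap = max(max_gap, gap)

print(max_gap)
```

No choice of transition matrix moves the unknown-condition probability away from the average, so no parameter setting can push it below both known-condition values.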

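For the quantum model, the amplitudes for the unknown state can also be related to the amplitudes for the two known states:

\[ \psi_2=e^{-itH_C}\psi_0=e^{-itH_C}\left(\tfrac{1}{\sqrt{2}}\psi_D+\tfrac{1}{\sqrt{2}}\psi_C\right)=\tfrac{1}{\sqrt{2}}\,e^{-itH_C}\psi_D+\tfrac{1}{\sqrt{2}}\,e^{-itH_C}\psi_C, \]

where $\psi_D=\tfrac{1}{\sqrt{2}}[1,\ 1,\ 0,\ 0]^{T}$ and $\psi_C=\tfrac{1}{\sqrt{2}}[0,\ 0,\ 1,\ 1]^{T}$ are the initial states for the known defect and known cooperate conditions. We see that the amplitudes in the unknown case equal the superposition of the amplitudes for the two known cases. However, here is precisely where the quantum model departs from the Markov model: probabilities are obtained from the squared magnitudes of the amplitudes. This last computation produces interference effects that can cause the unknown probabilities to deviate from the average of the known probabilities.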
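The source of the interference term can be written out explicitly. In the following sketch, $a_j$ and $b_j$ are notation introduced here (not in the original) for the amplitudes of outcome $j$ after evolving the known-defect and known-cooperate initial states, respectively, and the sum runs over the two outcomes in which you defect.

```latex
\begin{align*}
\Pr[D \mid \text{unknown}]
  &= \sum_{j \in \{DD,\,CD\}} \left| \tfrac{1}{\sqrt{2}}\,(a_j + b_j) \right|^{2} \\
  &= \underbrace{\tfrac{1}{2} \sum_{j} |a_j|^{2} + \tfrac{1}{2} \sum_{j} |b_j|^{2}}_{\text{average of the two known conditions}}
   \;+\; \underbrace{\sum_{j} \operatorname{Re}\!\left( \overline{a_j}\, b_j \right)}_{\text{interference}} .
\end{align*}
```

When the cross-term is negative, the unknown-condition probability falls below the average of the known conditions. The Markov model has no counterpart to this term, because it averages probabilities rather than adding amplitudes.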

We initially fit the parameters of the quantum model at t=1·(π/2), which is the time when the choice probabilities produced by equation (2.1b) first reach their maximum. Note that the quantum model predicts that choice probabilities will oscillate with time. However, in modelling results from the Prisoner's Dilemma task, one needs to consider, first, that such tasks are very simple and, second, that respondents in such experiments are typically paid for their participation so that they are motivated (and indeed sometimes requested) to respond quickly. Both these considerations suggest that a decision will be made (the state vector collapses) as soon as one course of action emerges as advantageous (Diederich & Busemeyer 2006). Regarding the two-stage gambling task, setting μ=0.59 and γ=1.74 produces probabilities for choosing to gamble equal to (0.68, 0.58, 0.37) for the (known win, known loss, unknown) conditions, respectively. These predictions closely match the observed results (0.69, 0.59, 0.36, from Tversky & Shafir 1992). For the Prisoner's Dilemma game, setting μ=0.51 and γ=2.09 produces probabilities for defection equal to (0.81, 0.65, 0.57) for the (known defect, known cooperate, unknown) conditions, respectively. Again, model predictions closely reproduce the average pattern (0.84, 0.66, 0.55) in table 2. Figure 1 shows that model predictions are fairly robust as parameter values vary. An interference effect appears after t=0.75·(π/2) and is evident across a large section of the parameter space. Finally, note that we can relax the assumption that participants will make a decision at the same time point (we thank a reviewer for this observation). To allow for this possibility, we assumed a gamma distribution of decision times with a range from t=0.50·(π/2) to 2·(π/2) and a mode at t=1·(π/2). We refitted the model using the mean choice probability averaged over this distribution, and this produced very similar predicted results: (0.77, 0.64, 0.58) for the known defect, known cooperate and unknown conditions, respectively, in a Prisoner's Dilemma (with μ=0.47 and γ=2.10).
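The Prisoner's Dilemma fit at a single time point can be explored with the sketch below. It assumes the outcome order (DD, DC, CD, CC), a pay-off Hamiltonian with 2×2 blocks [[μ, 1], [1, −μ]]/√(1+μ²) on the diagonal, a wishful-thinking Hamiltonian equal to −γ/√2 times the two belief-rotation blocks, and evaluation at t=π/2. The matrix exponential is a simple scaling-and-squaring Taylor series written for this illustration, and all function names are hypothetical.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=20):
    """exp(m) by scaling-and-squaring with a truncated Taylor series."""
    n = len(m)
    small = [[x / 2 ** squarings for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def h_c(mu, gamma):
    """Assumed combined Hamiltonian: pay-off part plus wishful thinking."""
    s = 1.0 / math.sqrt(1.0 + mu * mu)
    g = -gamma / math.sqrt(2)
    h_a = [[mu * s, s, 0, 0], [s, -mu * s, 0, 0],
           [0, 0, mu * s, s], [0, 0, s, -mu * s]]
    h_b = [[g, 0, g, 0], [0, -g, 0, g],
           [g, 0, -g, 0], [0, g, 0, g]]
    return [[h_a[i][j] + h_b[i][j] for j in range(4)] for i in range(4)]


def defect_prob(mu, gamma, psi0, t=math.pi / 2):
    """Pr[D] = |psi_DD|^2 + |psi_CD|^2 after evolving psi0 for time t."""
    u = mat_exp([[-1j * t * x for x in row] for row in h_c(mu, gamma)])
    psi = [sum(u[i][j] * psi0[j] for j in range(4)) for i in range(4)]
    return abs(psi[0]) ** 2 + abs(psi[2]) ** 2


R = 1 / math.sqrt(2)
p_known_defect = defect_prob(0.51, 2.09, [R, R, 0, 0])
p_known_cooperate = defect_prob(0.51, 2.09, [0, 0, R, R])
p_unknown = defect_prob(0.51, 2.09, [0.5, 0.5, 0.5, 0.5])

# The text reports (0.81, 0.65, 0.57) for these three conditions.
print(round(p_known_defect, 2), round(p_known_cooperate, 2),
      round(p_unknown, 2))
```

Under these assumptions the unknown-condition probability falls below both known-condition probabilities, which is the pattern the Markov model cannot produce. Setting γ=0 removes the effect, because the pay-off Hamiltonian alone does not mix the two belief blocks.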

Figure 1. (a–f) The probability of defection under the unknown condition minus the average for the two known conditions, at six time points: (a) time=0.75, (b) time=1.00, (c) time=1.25, (d) time=1.50, (e) time=1.75 and (f) time=2.00 (note that time incorporates a factor of π/2). Negative values (blue) typically indicate an interference effect in the predicted direction.

Classical probability theory has been widely applied in understanding human choice behaviour. Accordingly, one can naturally wonder whether it is possible to salvage the Markov model. First, recall that the classical Markov model fails even when we allow for cognitive dissonance effects in this model. Second, the analyses above hold for any initial state, ψ0, and any intensity matrix, K (not just the ones used above to motivate the model), but they are based on two main assumptions: the same initial state and the same intensity matrix are used across both known and unknown conditions. However, we can relax even these assumptions. Even if the initial state is not the same across conditions, the Markov model must predict that the marginal probability of defecting in the unknown condition (whatever mixture is used) is a convex combination of the two probabilities conditioned on the known action of the opponent. This prediction is violated in the data of table 2. Furthermore, even if we change intensity matrices across conditions (using the KA intensity matrix for known conditions and using the KC matrix for the unknown condition), the Markov model continues to satisfy the law of total probability because this change has absolutely no effect on the predicted probability of defection (the KB matrix does not change the defection rate). Thus our tests of the Markov model are very robust.

4. Concluding comments

In this work, we considered empirical results that have been a focal point in the controversy over whether classical probability theory is an appropriate framework for modelling cognition or not. Tversky, Shafir, Kahneman and colleagues have argued that the cognitive system is generally sensitive to environmental statistics, but is also routinely influenced by heuristics and biases that can violate the prescription from probability theory (Tversky & Kahneman 1983; Tversky & Shafir 1992; cf. Gigerenzer 1996). This position has had a massive influence not only on psychology, but also on management sciences and economics, culminating in a Nobel Prize award to Kahneman. Moreover, findings such as the violation of the sure thing principle in the Prisoner's Dilemma have led researchers to raise fundamental questions about the nature of human cognition (for example, what does it mean to be rational? Oaksford & Chater 1994).

In this work, we adopted a different approach from the heuristics and biases advocated by Tversky, Kahneman and Shafir. We propose that human cognition can and should be modelled within a probabilistic framework, but classical probability theory is too restrictive to fully describe human cognition. Accordingly, we explored a model based on quantum probability, which can subsume classical probability, as a special case. The main problems in developing a convincing cognitive quantum probability model are to determine an appropriate Hilbert space and Hamiltonian. We attempted to present a satisfactory prescriptive approach to deal with these problems and so encourage the development of other quantum probability models in cognitive science. For example, the Hamiltonian is derived directly from the parameters of the problem (e.g. the pay-offs associated with different actions) and known general principles of cognition (e.g. reducing cognitive dissonance). Importantly, our model works: it is able to account for violations of the sure thing principle in the Prisoner's Dilemma and the two-stage gambling task and leads to close fits to empirical data.

Quantum probability provides a promising framework for modelling human decision-making. First, we can think of the set of basis states as a set of preference orders over actions. According to a Markov process, an individual is committed to exactly one preference order at any moment in time, although it can change from time to time. According to a quantum process, an individual experiences a superposition of all these orders, and at any moment the person remains uncommitted to any specific order. This is an intriguing perspective on human cognition, which may shed light on the functional role of different modes of memory and learning (cf. Atallah et al. 2004). Second, Schrödinger's equation predicts a periodic oscillation of the propensity to perform one action (assuming that the decision-maker can be persuaded to extend the decision time beyond the first cycle of the process), which is broadly analogous to the electroencephalography signals recorded from participants engaged in choice tasks (cf. Haggard & Eimer 1999). Only after preferences are revealed does the process collapse onto exactly one preference order. This contrasts with time development in the Markov model, whereby the system monotonically converges to its final state. Third, quantum probability models allow interference effects that can make the probability of the disjunction of two events lower than the probability of either event individually (see also Khrennikov 2004). Such interference effects are ubiquitous in psychology, but incompatible with Markov models, which are constrained by classical probability laws. Fourth, ‘back-to-back’ measurements on the same decision will produce the same result in a quantum system (because of state reduction), which agrees with what people do (Atmanspacher et al. 2004). However, back-to-back choices remain probabilistic in classical random utility models, which is not what people do. Finally, recent results in computer science have shown quantum computation to be fundamentally faster than classical computation for certain problems (Nielsen & Chuang 2000). Perhaps the success of human cognition can be partly explained by its use of quantum principles.

Acknowledgments

This research was partly supported by ESRC grant R000222655 to E.M.P. and NIMH R01 MH068346 and NSF MMS SES 0753164 to J.R.B. We would like to thank Chris Isham, Smaragda Kessari and Tim Hollowood.