Abstract

Evolutionary game theory classically investigates which behavioral
patterns are evolutionarily successful in a single game. More recently,
a number of contributions have studied the evolution of preferences
instead: which subjective conceptualizations of a game’s payoffs give
rise to evolutionarily successful behavior in a single game.
Here, we want to extend this existing approach even further by asking:
which general patterns of subjective conceptualizations of payoff
functions are evolutionarily successful across a class of games.
In other words, we will look at evolutionary competition of payoff
transformations in “meta-games”, obtained from averaging over
payoffs of single games. Focusing for a start on the class of 2×2
symmetric games, we show that regret minimization can outperform payoff
maximization if agents resort to a security strategy in case of radical
uncertainty.

This is what I aim at, because the point of philosophy is to start with something so simple as not to seem worth stating, and to end with something so paradoxical that no one will believe it.[Russ18]

In the epistemic literature, there are two major and alternative,
sometimes competing, trends in formalizing belief: a probabilistic
approach, and a non-probabilistic approach. As the names suggest,
the probabilistic approach uses probabilities in order to model agents’
beliefs, whereas the non-probabilistic approach relies on more qualitative
structures (see for example [baltsme08]).

Throughout this abstract we will assume that probabilistic and non-probabilistic
beliefs are just two different and compatible forms of belief. In
other words, we assume that agents may have probabilistic beliefs
in some circumstances and non-probabilistic beliefs in others. More
specifically, by probabilistic belief we mean in general that the
belief of an agent is expressed by a probability distribution, while
by non-probabilistic belief in this abstract we mean that the belief
is not representable by a single specific probability distribution.
Appealing to reasons of self-evidence and introspection, we hope this
sounds an uncontroversial assumption, “so simple as not to seem
worth stating”. Or, at least, we hope the reader will agree that
it is much less controversial than assuming that agents have either
only probabilistic beliefs or only non-probabilistic beliefs.

Evolutionary game theory classically investigates which behavioral
patterns, when competing against each other, are evolutionarily stable
in a single game. More recently, a number of contributions have studied
the evolution of preferences instead: which subjective conceptualizations
of a game’s payoff function give rise to evolutionarily successful
behavior in a single game ([algweib13], [DekElyYlan07],
[heifshanspieg07], [OkVega01], [RobSam11], [Sam01]).
This literature is grounded in the so-called indirect evolutionary
approach. While the classical evolutionary approach is aimed at studying
whether a certain strategy in a given game is evolutionarily
stable and robust, the indirect evolutionary approach allows one to investigate
whether certain preferences are evolutionarily successful or
not. Yet it was argued that

The indirect evolutionary approach with unobservable preferences gives us an alternative description of the evolutionary process, one that is perhaps less reminiscent of biological determinism, but leads to no new results.[RobSam11]

In this work we adopt a standpoint close to the one taken in evolution
of preferences, but we want to extend this existing approach even
further by asking: which general patterns of subjective conceptualizations
of payoff functions are evolutionarily successful across a class
of games. In other words, we will look at evolutionary competition
of general payoff transformations in meta-games, obtained from averaging
over payoffs obtained from single games. Focusing for a start on the
class of 2×2 symmetric games, we show that regret minimization
can outperform payoff maximization if agents resort to a security
strategy in case of radical uncertainty. I.e., payoff maximization
turns out to be evolutionarily unstable under simple epistemic assumptions.

The standard model for studying the evolution of preferences (in particular,
we are referring to [DekElyYlan07]) is built on a symmetric
two-player normal-form game G with finite action set A={a1,...,an}
and payoff function π:A×A→R. This is
usually called the fitness game since evolutionary selection
is driven by payoff function π. Players in the population represent
subjective preferences, or subjective utility functions,
that can diverge from the objective fitness given by π.
A subjective preference is a function θi:A×A→R,
and the set of subjective preferences is Θ≡R^(A×A).
Each player chooses the action that maximizes her subjective
preference, but receives the objective payoff defined by π.
Hence, player i’s action choice is determined by θi,
but i’s evolutionary fitness is determined by π.

Different authors enrich this basic picture with various features
(e.g., observability [DekElyYlan07], assortative matching [algweib13],
etc.) and study the resulting effects in the dynamics. We do not argue
against any of these approaches here, but adopt a different one. Firstly,
we add a meta-game perspective by studying evolution of preferences
across a class of games. Secondly, we pay attention to the epistemic
situations of the agents and include the possibility that agents play
a security strategy in case of radical uncertainty.

Instead of one fitness game G, we consider a class G of
fitness games. Here, we take G to be the class of
2×2 symmetric games. We are interested in the evolutionary
competition of player types (τ,e), conceived of as
a pair of a subjective preference type τ and an epistemic type
e. Let Λ denote the set of player types. We elaborate on each component in turn below. We
take a player type to specify action choices in each G∈G
and thus think of a player type as a choice principle. This
allows us to study the evolutionary competition between different
subjective ways of representing a game’s utilities and different ways
of using behavioral beliefs about the co-player. To keep matters
manageable, we restrict our attention to a selected subset of conceptually
relevant player types, comparing players of four different and
theoretically significant subjective preference types.

Subjective preference types

Formally, a preference type is a function
τi:G→Θ from games to subjective
preferences. Let T be the set of preference types. We can think of preference types as transformations of
π, for any G∈G: τi may then be understood as a player’s way of
thinking across games, a red thread that relates different subjective
preferences across different games.

For perspicuity, we focus here on four conceptually
relevant transformations in T: (i) an actual payoff
type, whose subjective preferences coincide with actual fitness payoffs
π, (ii) an altruistic type, whose subjective preferences
are the sum of her own fitness and that of the co-player, (iii) a
competitive type, whose subjective preferences are her own
fitness minus that of the co-player, and (iv) a regret type,
whose subjective preferences are given by each action’s regret
([loosug82],[halpass12]). Denoting the payoff function
π of game G by πG, define these types as:

actual payoff type: ∀G∈G,τπ(G)=πG;

altruistic type: ∀G∈G,∀ai,aj∈A,τalt(G)(ai,aj)=πG(ai,aj)+πG(aj,ai);

competitive type: ∀G∈G,∀ai,aj∈A,τcom(G)(ai,aj)=πG(ai,aj)−πG(aj,ai);

regret type: ∀G∈G,∀ai,aj∈A,τreg(G)(ai,aj)=−(πG(a$,aj)−πG(ai,aj)),
where a$ stands for the best reply to aj under πG.¹
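The four transformations defined above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, a symmetric game is assumed to be stored as a matrix `pi[i][j]` (the fitness of choosing action i against a co-player choosing action j), and the Prisoner's Dilemma payoffs are an assumed example, not taken from the text.

```python
# Hypothetical helpers: pi[i][j] is the fitness of playing action i
# against a co-player who plays action j (symmetric game).

def actual(pi):
    """Actual payoff type: subjective preferences equal fitness."""
    return [row[:] for row in pi]

def altruistic(pi):
    """Own fitness plus the co-player's fitness pi[j][i]."""
    n = len(pi)
    return [[pi[i][j] + pi[j][i] for j in range(n)] for i in range(n)]

def competitive(pi):
    """Own fitness minus the co-player's fitness pi[j][i]."""
    n = len(pi)
    return [[pi[i][j] - pi[j][i] for j in range(n)] for i in range(n)]

def regret(pi):
    """Negative regret: pi[i][j] minus the best-reply payoff against j."""
    n = len(pi)
    best = [max(pi[k][j] for k in range(n)) for j in range(n)]
    return [[pi[i][j] - best[j] for j in range(n)] for i in range(n)]

# Assumed example: a Prisoner's Dilemma (action 0 = cooperate, 1 = defect).
pd = [[3, 0],
      [5, 1]]

print(altruistic(pd))   # [[6, 5], [5, 2]]
print(competitive(pd))  # [[0, -5], [5, 0]]
print(regret(pd))       # [[-2, -1], [0, 0]]
```

In this example the regret representation makes defection weakly best in every column, while the altruistic representation makes mutual cooperation the most attractive outcome.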

Epistemic types

In full generality, an epistemic type is a general disposition to form
beliefs about the co-player’s behavior. As for preference types, in this abstract we limit ourselves to a small selection of epistemic types
that are particularly interesting from a theoretical point of view. Here, we just consider two epistemic types e∈{¯μ,Δ(A)}:

a uniform probabilistic belief ¯μ∈Δ(A) about
the opponent’s behavior, or

the full set Δ(A) of all possible behavioral beliefs about
the co-player’s actions.

Hence, we are mainly focusing on two extreme epistemic
types for the moment: players can either have a probabilistic (flat) belief about the co-player’s
actions, or be radically uncertain, i.e., have no specific
probability distribution on the co-player. It would also be possible
to take into account different degrees of uncertainty, and to link
our results to the literature about ambiguity and uncertainty aversion
more tightly ([GhirMar02], [GilSch89], [MacMarRus06]), but we will only consider the two extreme
cases for this abstract.
There are many reasons why agents might be radically uncertain: lack
of cognitive capabilities, lack of information,² etc. It
is to be expected that the more “correct” the beliefs of a player
are on average, the higher its fitness. Still, radical uncertainty may
well be considered a starting point for evolutionary selection, and so
we start our investigation there.

Choice principles

We take player types (τ,e) to give rise to choice
principles, i.e., systematic mappings of each game into a subset of
actions. Many possibilities
are conceivable here. To be practical, we need to, again, make
a principled selection based on theoretical relevance. For simplicity,
we assume that players apply maximin expected utility
([GilSch89]) based on their subjective
preference type and their epistemic type.

From the perspective of maximin expected utility ([GilSch89]),
the behavior of an agent corresponds to maximizing the minimal expected (subjective)
utility over the set of probability distributions that she
is holding. In our particular case, this set can either be a singleton
(a flat probability distribution ¯μ), or the full simplex Δ(A),
i.e., the set of all possible probability distributions over the co-player’s
choices (in case of radical uncertainty). In the first case, playing
an action that maximinimizes subjective expected utility is the same as maximizing subjective expected utility,
whereas playing an action that maximinimizes subjective expected utility in the second case amounts
to playing standard maximin over the game with subjectively
transformed preferences ([OsbRub94]).
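The two cases can be written out explicitly. The following is a sketch in our own notation, where θ denotes the subjective preference τ(G) and n the number of actions. For the singleton belief set the inner minimum is vacuous; for the full simplex, expected utility is linear in μ, so its minimum is attained at a vertex of Δ(A), i.e., at a pure action of the co-player.

```latex
\begin{align*}
e = \bar{\mu}: \quad
  & \operatorname*{arg\,max}_{a_i \in A} \; \sum_{a_j \in A} \bar{\mu}(a_j)\,\theta(a_i,a_j)
    \;=\; \operatorname*{arg\,max}_{a_i \in A} \; \frac{1}{n} \sum_{a_j \in A} \theta(a_i,a_j), \\[4pt]
e = \Delta(A): \quad
  & \operatorname*{arg\,max}_{a_i \in A} \; \min_{\mu \in \Delta(A)} \sum_{a_j \in A} \mu(a_j)\,\theta(a_i,a_j)
    \;=\; \operatorname*{arg\,max}_{a_i \in A} \; \min_{a_j \in A} \theta(a_i,a_j).
\end{align*}
```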

Other possible construals of choice principles are conceivable, e.g.,
maximax, the maximization of the maximal utility. Our choice of
maximin expected utility is motivated by the fact that it gives rise
to well-known decision rules. For player types (τπ,¯μ) we have standard maximization of expected utility; for
player types (τπ,Δ(A)) we have standard maximin;
for player types (τreg,Δ(A)) we have (positive)
regret minimization ([halpass12]).

It is important to notice that player types (τπ,¯μ) and (τreg,¯μ) are actually
behaviorally equivalent.

Remark 1.
Maximization of expected utility and minimization of expected
regret coincide: for any probabilistic belief, an action a∗
maximizes the expected utility if and only if action a∗ minimizes
the expected (positive) regret.
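The reasoning behind Remark 1 is short enough to spell out. As a sketch in our own notation, write r(a_i,a_j) for the positive regret of a_i against a_j. Expected regret then differs from expected utility only by a sign and by a term that does not depend on the chosen action:

```latex
\begin{align*}
r(a_i,a_j) &= \max_{a' \in A} \pi(a',a_j) - \pi(a_i,a_j), \\
\sum_{a_j \in A} \mu(a_j)\, r(a_i,a_j)
  &= \underbrace{\sum_{a_j \in A} \mu(a_j) \max_{a' \in A} \pi(a',a_j)}_{\text{independent of } a_i}
     \;-\; \sum_{a_j \in A} \mu(a_j)\, \pi(a_i,a_j).
\end{align*}
```

Hence, for a fixed belief μ, an action minimizes the left-hand side exactly when it maximizes the expected utility in the last term.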

Nonetheless, it is important from an evolutionary point of
view to distinguish player types who conceptualize a game’s
payoffs in terms of regret from those who consider the actual
payoffs π, especially when we consider evolutionary dynamics involving
mutation (see below).

In this section we present some results achievable in our set-up.
For reasons of exposition, we first focus on radical
uncertainty, and then we allow players to have both probabilistic and
non-probabilistic beliefs (i.e., to be of both epistemic types ¯μ and Δ(A)).

Consider a population where the eight player types introduced above ((τπ,¯μ), (τalt,¯μ), (τcom,¯μ), (τreg,¯μ), (τπ,Δ(A)), (τalt,Δ(A)), (τcom,Δ(A)) and (τreg,Δ(A)))
are present. We are interested in the question of which player types will
be evolutionarily successful when repeatedly playing random symmetric
2×2 games.

To address this question, we use numerical simulation to approximate
the average payoff accrued by each choice principle.
To this end, we randomly generated 50000 symmetric 2×2 games by sampling i.i.d. payoffs from the natural numbers in the set {0,1,...,10}. For each sampled
game, we let all choice principles play against each other and recorded
the payoffs obtained after each play. Finally, we took the average.
Table 1
gives the resulting payoff matrix with the row type’s average
payoffs against each of the column types.

             (τreg,Δ(A))  (τπ,Δ(A))  (τalt,Δ(A))  (τcom,Δ(A))  (τreg,¯μ)  (τπ,¯μ)  (τalt,¯μ)  (τcom,¯μ)
(τreg,Δ(A))  6.629        6.653      5.806        7.089        6.636      6.636    5.793      7.463
(τπ,Δ(A))    6.455        6.468      6.067        6.685        6.462      6.462    6.065      6.834
(τalt,Δ(A))  6.280        6.746      5.473        6.959        6.294      6.294    5.474      7.114
(τcom,Δ(A))  5.936        5.735      5.336        6.379        5.929      5.929    5.327      6.538
(τreg,¯μ)    6.633        6.658      5.810        7.081        6.634      6.634    5.802      7.454
(τπ,¯μ)      6.633        6.658      5.810        7.081        6.634      6.634    5.802      7.454
(τalt,¯μ)    6.278        6.750      5.476        6.953        6.293      6.293    5.484      7.112
(τcom,¯μ)    6.311        5.885      5.475        6.536        6.299      6.299    5.466      7.123

Table 1: Average payoff for player types in simulations of 5000
randomly generated 2×2 symmetric games.
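The simulation procedure behind Table 1 can be sketched as follows. This is a minimal illustration, not the code used for the reported numbers: all names are hypothetical, and ties between equally good actions are assumed to be broken uniformly at random (implemented by averaging over the tied actions), a detail the text does not specify.

```python
import random

PREFS = ["reg", "pi", "alt", "com"]
EPIS = ["simplex", "flat"]          # Delta(A) and the flat belief
TYPES = [(t, e) for e in EPIS for t in PREFS]   # same order as Table 1

def subjective(pi, pref):
    """Subjective 2x2 preference matrix for a given preference type."""
    if pref == "pi":
        return [row[:] for row in pi]
    if pref == "alt":
        return [[pi[i][j] + pi[j][i] for j in (0, 1)] for i in (0, 1)]
    if pref == "com":
        return [[pi[i][j] - pi[j][i] for j in (0, 1)] for i in (0, 1)]
    if pref == "reg":
        best = [max(pi[0][j], pi[1][j]) for j in (0, 1)]
        return [[pi[i][j] - best[j] for j in (0, 1)] for i in (0, 1)]
    raise ValueError(pref)

def choices(pi, player_type):
    """Actions selected by maximin expected utility (possibly tied)."""
    pref, epi = player_type
    theta = subjective(pi, pref)
    if epi == "flat":
        value = [sum(row) / 2 for row in theta]   # expected utility
    else:
        value = [min(row) for row in theta]       # worst case over Delta(A)
    top = max(value)
    return [a for a in (0, 1) if value[a] == top]

def meta_game(n_games, seed=0):
    """Average fitness of each row type against each column type."""
    rng = random.Random(seed)
    n = len(TYPES)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_games):
        # i.i.d. integer payoffs from {0, ..., 10}
        pi = [[rng.randint(0, 10) for _ in (0, 1)] for _ in (0, 1)]
        acts = [choices(pi, t) for t in TYPES]
        for r in range(n):
            for c in range(n):
                pairs = [(a, b) for a in acts[r] for b in acts[c]]
                # assumption: uniform tie-breaking, so average over ties
                total[r][c] += sum(pi[a][b] for a, b in pairs) / len(pairs)
    return [[v / n_games for v in row] for row in total]

matrix = meta_game(2000)
for t, row in zip(TYPES, matrix):
    print(t, [round(v, 3) for v in row])
```

With a different tie-breaking rule or sample size the individual entries will differ somewhat from Table 1, but the rows for the two flat-belief types `reg` and `pi` coincide by construction, in line with Remark 1.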

Radical uncertainty

To appreciate the following results, it helps to consider first a
restricted scenario. Assume that all players have epistemic type
Δ(A), and so play a security maximin strategy on their
subjective representation of the game. The relevant meta-game for this
case is the top-left 4×4 payoff matrix of the full matrix in
Table 1. Essentially, we are then considering the
evolutionary competition between subjective preference types in a world of
security players.

Notice, however, that the payoffs calculated for the “meta-game”
in Table 1 depend on details of our numerical
simulation, in particular on the implicit probability with which particular
types of games are sampled. Fortunately, we can generalize the result
to an analytic statement that is independent of frequency effects,
as long as every possible game has positive occurrence probability.

Proposition 1.
Fix Λ={(τπ,Δ(A)), (τalt,Δ(A)), (τcom,Δ(A)), (τreg,Δ(A))},
and G the class of symmetric 2×2 games with i.i.d. payoffs sampled from the set of natural numbers {0,1,...,N}. Then (τreg,Δ(A))
is the only evolutionarily stable type in the population.

Proof. See Appendix.

This is a conceptually noteworthy result: regret minimization
evolutionarily outperforms classic maximin on repeated plays of
2×2 symmetric games. In other words, when playing a security strategy it is strictly better to construe
a game in terms of regrets than in terms of actual payoffs.

Full competition

Consider next the full “meta-game” in
Table 1. A monomorphic population of regret
minimizers ((τreg,Δ(A))) is no longer
evolutionarily stable; it could be invaded by expected utility maximizers of types (τreg,¯μ) and (τπ,¯μ). Since the latter are behaviorally equivalent, neither
is an evolutionarily stable strategy, but could at best be
neutrally stable [Maynard-Smith1982:Evolution-and-t]. However, under our simulated
meta-game payoffs from Table 1 any population
consisting entirely of (τreg,¯μ) and
(τπ,¯μ) can be invaded by regret
minimizers. This suggests that all three types would persist under
standard evolutionary dynamics, in various relative proportions.

Simulation results of the (discrete time) replicator dynamics [TaylorJonker1978:Evolutionary-St] indeed show that random
initial population configurations are attracted to states with only
three player types: (τreg,Δ(A)),
(τreg,¯μ) and (τπ,¯μ). The relative proportions of these depend on the initial
population. This variability is eradicated if we add a small mutation
rate to the dynamics. We assume a fixed, small mutation rate
ϵ for the probability that a player’s preference type
or her epistemic type changes to another random preference type
or epistemic type. The probability that a player type randomly mutates
into a completely different player type with altogether different
preference type and epistemic type would then be ϵ². With
these assumptions about “local mutations”, numerical simulations of
the (discrete time) replicator mutator dynamics [Nowak2006:Evolutionary-Dy] show that for very small
mutation rates almost all initial populations converge to a single
fixed point in which the majority of players are regret types. For
instance, with ϵ=0.001, almost all initial populations are
attracted to a final distribution with proportions:

(τreg,Δ(A))  (τreg,¯μ)  (τπ,¯μ)
0.279        0.383      0.281

Again, this shows that there are plausible and simple conditions under which
agents who represent the game in terms of regret may be favored by
evolutionary selection and that, more specifically, regret minimizers
are evolutionarily supported.
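The discrete-time replicator mutator dynamics used above can be sketched on the payoffs of Table 1. The code below is an illustration under stated assumptions rather than the original simulation: the function names are hypothetical, and the mutation kernel assumes that the preference component and the epistemic component each mutate independently with probability ϵ to a uniformly chosen other value, which is one way to read the “local mutations” described in the text.

```python
PREFS = ["reg", "pi", "alt", "com"]
EPIS = ["simplex", "flat"]
TYPES = [(t, e) for e in EPIS for t in PREFS]   # same order as Table 1

# Average payoffs from Table 1 (row type against column type).
M = [
    [6.629, 6.653, 5.806, 7.089, 6.636, 6.636, 5.793, 7.463],
    [6.455, 6.468, 6.067, 6.685, 6.462, 6.462, 6.065, 6.834],
    [6.280, 6.746, 5.473, 6.959, 6.294, 6.294, 5.474, 7.114],
    [5.936, 5.735, 5.336, 6.379, 5.929, 5.929, 5.327, 6.538],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.278, 6.750, 5.476, 6.953, 6.293, 6.293, 5.484, 7.112],
    [6.311, 5.885, 5.475, 6.536, 6.299, 6.299, 5.466, 7.123],
]

def mutation_matrix(eps):
    """Q[i][j]: probability that an offspring of type i is of type j.

    Assumption: each component mutates independently with probability eps.
    """
    n = len(TYPES)
    Q = [[0.0] * n for _ in range(n)]
    for i, (ti, ei) in enumerate(TYPES):
        for j, (tj, ej) in enumerate(TYPES):
            q_pref = 1 - eps if ti == tj else eps / (len(PREFS) - 1)
            q_epi = 1 - eps if ei == ej else eps / (len(EPIS) - 1)
            Q[i][j] = q_pref * q_epi
    return Q

def replicator_mutator(M, Q, x0, steps):
    """Iterate x'_j = sum_i x_i f_i Q[i][j] / phi with f = M x, phi = x . f."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        f = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        phi = sum(x[i] * f[i] for i in range(n))
        x = [sum(x[i] * f[i] * Q[i][j] for i in range(n)) / phi
             for j in range(n)]
    return x

Q = mutation_matrix(0.001)
final = replicator_mutator(M, Q, [1 / 8] * 8, steps=5000)
for t, share in zip(TYPES, final):
    print(t, round(share, 3))
```

Because the mutation kernel, the initial population and the number of steps are all assumptions, the printed shares need not reproduce the proportions reported above exactly.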

Variable, but correlated epistemic types

The foregoing results were based on the implicit assumption that
players have a fixed epistemic type. Epistemic types were considered
under evolutionary pressure in tandem with preference types. In the
remainder we focus on the evolution of preference types, assuming that
players are of variable epistemic types.

For a clear-cut analytical corollary from the previous results in
Proposition 1 and Remark 1, let us assume that epistemic types are
correlated in random encounters: whenever two player types are matched
to play a game, they are always of the same epistemic type, and each of
the two types, ¯μ and Δ(A), occurs with positive probability.³
Under variable, yet correlated epistemic types, it is easy to see that
preference type τreg will outperform type τπ.

Corollary 1. Fix T={τπ,τreg}. If agents’ epistemic types vary between
encounters and both occur with positive probability, but are always
correlated, so that co-players are always of the same epistemic type
in any particular round of play, τreg is the only
evolutionarily stable preference type in the population.

Proof. See Appendix.

Notice, moreover, that since Remark 1 holds for any probabilistic belief μ∈Δ(A), Corollary 1 also holds for any correlated probabilistic belief μ, and not only for the flat belief ¯μ.

Variable, uncorrelated epistemic types

Finally, consider the case where epistemic types of players are
variable but uncorrelated. Each player has a fixed preference type, but
is of epistemic type Δ(A) with some probability
p. Probability p is then the average probability of a player being
a security player in the population, where that does not depend on the
player’s preference type. For simplicity, consider p fixed for the
whole population. In that case we can compute, for each p, the
average payoffs of preference types in another 4×4 meta-game,
derived from the full Table 1. It turns out
that there is a very low threshold on p above which regret types
dominate the evolutionary arms race. With only a small occurrence
probability of security players p=0.01, the derived meta-game
between preference types is:

        τreg    τπ      τalt    τcom
τreg    6.634   6.635   5.802   7.450
τπ      6.633   6.633   5.804   7.444
τalt    6.293   6.297   5.484   7.111
τcom    6.295   6.291   5.465   7.111

Regret types are the only evolutionarily stable type in this case. In
sum, even with a small probability of lacking a concrete (flat) belief
about the opponent, a subjective representation of payoffs in terms of
regret is favored by evolutionary selection.
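The derived 4×4 meta-game can be recomputed from Table 1 by mixing over the epistemic types of both players. The sketch below uses hypothetical function names and assumes that each player is independently of type Δ(A) with probability p and of type ¯μ otherwise. Since it starts from the rounded entries of Table 1, individual values can differ from the derived table in the last digit. The final check uses the standard fact that a strict symmetric Nash equilibrium is evolutionarily stable.

```python
PREFS = ["reg", "pi", "alt", "com"]

# Table 1: rows/columns 0-3 are the Delta(A) types, 4-7 the flat-belief types.
M = [
    [6.629, 6.653, 5.806, 7.089, 6.636, 6.636, 5.793, 7.463],
    [6.455, 6.468, 6.067, 6.685, 6.462, 6.462, 6.065, 6.834],
    [6.280, 6.746, 5.473, 6.959, 6.294, 6.294, 5.474, 7.114],
    [5.936, 5.735, 5.336, 6.379, 5.929, 5.929, 5.327, 6.538],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.278, 6.750, 5.476, 6.953, 6.293, 6.293, 5.484, 7.112],
    [6.311, 5.885, 5.475, 6.536, 6.299, 6.299, 5.466, 7.123],
]

def derived_meta_game(M, p):
    """Expected payoffs between preference types when each player is a
    security player with independent probability p (assumption)."""
    k = len(PREFS)
    D = [[0.0] * k for _ in range(k)]
    for r in range(k):
        for c in range(k):
            D[r][c] = (p * p * M[r][c]                          # both Delta(A)
                       + p * (1 - p) * M[r][c + k]              # row Delta(A), column flat
                       + (1 - p) * p * M[r + k][c]              # row flat, column Delta(A)
                       + (1 - p) * (1 - p) * M[r + k][c + k])   # both flat
    return D

def strict_nash_types(D):
    """Types that are strict best replies to themselves (hence ESS)."""
    k = len(D)
    return [PREFS[t] for t in range(k)
            if all(D[t][t] > D[s][t] for s in range(k) if s != t)]

D = derived_meta_game(M, 0.01)
for name, row in zip(PREFS, D):
    print(name, [round(v, 3) for v in row])
print(strict_nash_types(D))   # ['reg']
```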

The assumption that players and decision makers maximize their
preferences is central throughout the economic literature, and the
maximization of actual payoffs is often justified by appealing
to evolutionary arguments and natural selection. In contrast to the
standard view, we showed the existence of player types with subjective
utilities different from the actual payoffs that can outperform types
who have subjective utilities equal to the actual payoffs.

While the literature on evolution of preference has focused on
fixed games, or fixed types of games, we have taken a more
general approach here. We suggested that attention to “meta-games”
is interesting, because what may be a good subjective representation
in one type of game (e.g., cooperative preferences in the Prisoner’s
Dilemma class) need not be generally beneficial. In fact, it turns out
that our altruistic and competitive preference types pale in the light
of regret types.

Taken together, we presented a variety of plausible circumstances in
which evolutionary competition between choice principles on a larger
class of games can favor subjective preference transformations
focusing on regret.

References

Footnotes

1. Formally, this is the negative regret. We use this formulation
because it is the most convenient in this context.

2. Lack of information might depend for instance on the fact that players
haven’t played enough rounds to learn from experience and to form
a precise probabilistic belief; alternatively, we can imagine that
players have specific probabilistic beliefs if they have already met
and know the co-player, but they do not have a single probabilistic
belief when they meet a co-player for the first time. Similarly, we
can also think that a player has specific probabilistic beliefs when
she is facing a game that she already played before, and imprecise
beliefs otherwise.

3. This assumption of correlation of epistemic
types can be motivated by the idea that some external circumstances
of the game (or its context of presentation or occurrence)
involuntarily trigger players into a particular epistemic type.
