To cite articles published in the Journal of Artificial Societies and Social Simulation, please reference the above information and include paragraph numbers if necessary

Received: 09-May-2003
Accepted: 03-Aug-2003
Published: 31-Jan-2004

Abstract

This article describes a social simulation model based on an economic experiment about altruistic behavior. The experiment by Fehr and Gächter showed that participants made frequent use of costly punishment in order to ensure continuing cooperation in a common pool resource game. The model reproduces not only the aggregated but also the individual data from the experiment. It was based on the data rather than theory. By this approach new insights about human behaviour and decision making may be found. The model was not designed as a stand-alone model, but as a starting point for a comprehensive Adaptive Toolbox Model. This may form a framework for modelling results from different economic experiments, comparing results and underlying assumptions, and exploring whether the insights thus gained also apply to more realistic situations.

Introduction

Understanding behaviour of human beings in complex decision making
situations is of vital importance for the design of appropriate institutions
for sustainable resources management and for managing transition processes
towards more sustainable resource management regimes. Most initial efforts
have relied on game theoretical approaches and extensions thereof.
However, a number of common pool experiments and other empirical evidence
showed clearly that the assumptions on rational behaviour are not supported
by observation (Ostrom 2000). Researchers explored
for example the importance of trust, reciprocity, and reputation to introduce
and stabilize social norms of cooperation in
a group (e.g. Hayashi et al 1999).
Social simulation may play an important role in developing an improved
representation of the complex dynamics of human-environment systems.
Several approaches to building such models can be distinguished:

Start from an established formalized theoretical framework (e.g. rational actor
paradigm, game
theoretical approach) to test conditions of applications and the consequences
of relaxing certain assumptions. The extension can be based on
observation or principle considerations about the deficiencies in the framework
(cf. Lindenberg 1991). In general such
approaches remain within the boundaries of a given framework.

Combine concepts from different social sciences to develop an
interdisciplinary
approach and build a simulation framework (e.g. Jager et
al 2000, Epstein and Axtell 1995, Kottonau 2002) as experimental laboratory to explore
the implications of different assumptions on system dynamics. This is a
potentially very rewarding exercise that may help to overcome the
fragmentation of different streams of theories in a field.

Start from very simplified rule-based representations of social
behaviour that are
determined more by considerations of complex dynamics than by explicit
social science theoretical considerations. This is the approach of
socio-econo-physics or the simple models from Thomas Schelling exploring the
importance of spatial interaction for racial segregation. The recent article
by Deffuant et al (2002) and the subsequent comments
(von Randow 2003; Deffuant
et al 2003) provided an illuminating example for arguments in favour and
against the sociophysics approach.

Start from observation and extract regularities of behaviour (e.g. Todd and Gigerenzer 2003). This approach does not claim to
achieve an overall synthesis by combining existing theoretical frameworks. But
it is guided by some assumptions on human behaviour (in our case the importance
of heuristics and different basic dispositions towards cooperativeness in human
beings) and relies first of all on observation (also in our case the design of
the experiments was not free from theoretical assumptions).

Each approach has its strengths and weaknesses. The work presented in this paper
is based on the last approach since we are convinced that starting from
observation is required to promote real innovation and integration. The insights
derived from a more inductive approach should, however, be confronted with
established theories to explore possible contradictions or coincidences. In the
end a fruitful exchange among all the approaches outlined above will promote
insights and change.

In this paper we present the idea and the first steps towards the
implementation of an adaptive toolbox (cf. Gigerenzer and Selten 2001) as a multi agent system
with diverse agents
that can behave differently according to different situations and contexts. In
order to capture realistic human behaviour the model is founded in experimental
data rather than theory. In contrast to many approaches that use the data to
test and extend current disciplinary theories we use the data to extract similar
patterns and heuristics that determine human behaviour. By doing so we
apply an interdisciplinary approach in the social sciences. We expect that
such an approach will promote real innovation in our understanding of human
behaviour and its representation in models. It is our goal to contribute
to a coherent simulation framework that allows us to explore different perspectives
on human behaviour and compare their strengths and weaknesses as well as
their applicability in different situations. Hence, we make a
strong plea for a pluralistic approach.

Why an adaptive toolbox?

Representation of human behaviour in models is still surrounded
by huge uncertainties and major controversies. This may be attributed to
a lack of an overall accepted interdisciplinary approach in the social
sciences. Many different and partly contradicting approaches for explaining
human behaviour coexist within social science disciplines and even more
in the different disciplinary approaches. The most formalized theory is
the rational actor paradigm (referred to as RAP in the following) in economics.
Arguably the success of this approach can be attributed to the fact that
formalization provides a base for better communication and unification and to
the simplicity of the concept. The supporters of the RAP have refused for years,
even decades, to acknowledge criticism and scientific arguments providing
evidence for some weaknesses in their assumptions. However, the situation is
slowly changing. The rational actor paradigm of economics is being enriched by
insights from psychology and sociology (Daniel Kahneman and Vernon Smith were
awarded the Nobel Prize in economics for this,
http://www.nobel.se/economics/laureates/2002/index.html).
Some argue even for a more radical approach to abandon the RAP entirely, to move
from omniscience and perfect foresight towards more
simplified and realistic descriptions that are not due to imperfections of the
human brain but evolved to guarantee survival in complex and dynamic
environments (Gigerenzer and Selten 2001).
It must be the goal of social
simulation to come up with strong alternatives to the RAP. One should move from
the dominance of a single concept to a pluralistic approach with different
perspectives on human behaviour that take into account the importance of context
and the diversity of human beings. Social simulation should provide the base for
an interdisciplinary framework that allows us to combine different aspects of human
behaviour that are all required to fully understand the complexity of human
systems. Without imposing the constraining rigour of analytical mathematics, any
simulation approach forces the analyst to be more consistent in his/her
assumptions. Development of coherent simulation frameworks will foster
the development of more comparative and interdisciplinary approaches. This
was also highlighted by Epstein and Axtell (1995) in
their pioneering work on artificial societies, the Sugarscape model.

Progress in the representation of human
behaviour is crucial for improving
the credibility of using social simulation, in particular for real world
applications. The choice of behaviour may crucially
determine model outcomes (Hare
and Pahl-Wostl 2001). The adaptive toolbox is a step towards representing
a range of behavioural types. It will provide a base to explore which
assumptions
on human behaviour are supported by experimental and empirical evidence.
A distinction should be made between experimental and empirical data. Experimental
data are derived from controlled experiments with human beings in repeatable
settings. They allow statistical evaluation and have the advantage of
comparability. However, as is the case for any experimental approach, the
settings may be quite arbitrary and aim at reducing complexity by eliminating
subjective context as much as possible. The chosen experimental human subjects
(often students) may not always be representative of a wider sample of the
population. Empirical data are derived from observations in case studies or
other real world situations. They have the advantage of portraying the real
world and not an artificial setting. However, given the uniqueness of any real
world situation and its specific context, interpretation and comparability
are difficult. The adaptive toolbox model will support better use of
experimental observations and testing of whether they can be applied to real world
situations. It will allow us to explore what determines which kind of
approach is an appropriate representation of human behaviour in a given
context.

In the case reported in this paper data were derived from common pool games
in experimental economics. An advantage of such data sets is the availability
of numerous data which are comparable due to the standardized settings.
The games explore mainly the important aspects of fairness, trust, norms,
cooperative behaviour versus free-riding which are all crucial for understanding
management of common pool resources - our main area of interest. Previous
simulation approaches reproduced the aggregated behaviour of experiments
with different assumptions on the behaviour of agents. They pointed out
that the comparison with aggregated behaviour did not allow them to decide
what would be the more appropriate assumptions on agent behaviour. We explore
here also the behaviour of individuals and hope to get thus more insights
about behavioural processes.

After giving a brief introduction into the concept of an adaptive toolbox
in the next section, we describe our modelling
approach in detail in section 3,
"Building models from data". This article includes first results in section 4 and a discussion of this model in section 5. It concludes with section 6 on the contribution
of the described work to an overall framework and an outlook of this approach's
further perspectives.

Adaptive Toolbox

Human behaviour varies from mentally challenging, deliberate decisions to
unconscious behavioural patterns that follow adopted roles or trained routines.
In different problem environments different levels of consciousness are
employed. This has to be taken into account when these problem environments
are modelled. The adaptive toolbox described below presents a possibility
to deal with this diversity by introducing the notion of heuristics that
lie in between consciously making decisions and unconsciously following
routines.

Concept

Based on Herbert Simon's concept of "Bounded Rationality" (cf. Todd and Gigerenzer 2003) Gigerenzer developed
the notion of an "adaptive toolbox" (Gigerenzer
and Selten 2001). It captures the idea of decision making as the use
of different heuristics under different circumstances. The idea is based
on the suppositions of three concepts: psychological plausibility, domain
specificity, and ecological rationality (Gigerenzer
2001, p. 38).

Psychological plausibility is explicitly opposed to decision making
as a maximizing process with unlimited time, memory and computational
capacities.
It is also different from optimization under constraint (satisficing),
because instead of calculating an optimal stopping point, the stopping
rule is also a simple heuristic.

Domain specificity covers the idea that heuristics work in some
problem environments, while other heuristics are used in other environments.
Heuristics are composed of simple building blocks that can be re-assembled
to form other heuristics.

Finally, an ecologically rational heuristic is one
that prevails
in an adaptation process (like evolution), rather than being an "optimal"
decision making process. This leaves room for the coexistence of different
strategies.

The reference to adaptation processes, as described by
evolutionary biology,
is a strong motivation for the adaptive toolbox. Some heuristics that humans
employ, like emotions, cannot be justified by the RAP. However, they may
have originated and prevailed in an evolutionary adaptation process.

"The quest for psychological plausibility suggests looking into
the mind, that is, taking into account of what we know about cognition
and emotion in order to understand decisions and behaviour. Ecological
rationality in contrast, suggests looking outside the mind, at the structure of
environments, to understand what is inside the mind." (Gigerenzer 2001, p. 39) For implementing an adaptive
toolbox we have to do both.

The adaptive toolbox consists of building blocks that define the actual
choice (cf. Gigerenzer 2001, p.43). These
building blocks include simple rules for searching for solutions to a given
situation (search rules), stopping the search, if a satisfying (not
necessarily optimal) solution is found (stopping rules), and decision
rules to choose between alternative solutions. The prerequisite of
simplicity of these rules assures the outcome of the decision making process to
be "fast, frugal, and computationally cheap". For instance, emotions can
function as a very simple form of a stopping rule. Decisions are reached through
simple heuristics that work because they take the problem environment into
account. Information processing can be done with reference to the environment
that provides the information. If, for example, the environment is noisy and information
is scarce, then data are not reliable and therefore calculating with them is bound
to be less effective than searching for cues. Also, if the environment is a
social environment, other agents have to be taken into account, and aspects like
fairness and accountability have to be considered (Gigerenzer 2001, p.46). Of course, boundedly
rational agents often have an incomplete and faulty perception of the
environment.
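The composition of a heuristic from search, stopping and decision rules can be illustrated with a minimal Python sketch. It is not taken from the model described here: the class `Heuristic`, the aspiration level of 10 and the list of candidate options are hypothetical and serve only to show how the three building blocks can be exchanged independently of each other.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass
class Heuristic:
    """A heuristic assembled from three interchangeable building blocks."""
    search: Callable[[], Iterable[int]]           # search rule: yields candidate options
    stop: Callable[[List[int]], bool]             # stopping rule: has enough been seen?
    decide: Callable[[List[int]], Optional[int]]  # decision rule: pick among options seen

    def choose(self) -> Optional[int]:
        seen: List[int] = []
        for option in self.search():
            seen.append(option)
            if self.stop(seen):
                break
        return self.decide(seen)


# Hypothetical aspiration level: the first option reaching it ends the search.
ASPIRATION = 10

take_first_good = Heuristic(
    search=lambda: iter([4, 7, 12, 20]),           # assumed order of inspection
    stop=lambda seen: seen[-1] >= ASPIRATION,      # satisfying, not necessarily optimal
    decide=lambda seen: seen[-1] if seen else None,
)

# Re-assembling the blocks: same search rule, different stopping and decision rules.
take_best_of_two = replace(
    take_first_good,
    stop=lambda seen: len(seen) >= 2,
    decide=max,
)
```

With these assumed options, `take_first_good.choose()` stops at the third option, while `take_best_of_two.choose()` inspects only two options and picks the larger one. Neither heuristic looks at all options, which is what keeps them computationally cheap.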

In addition to the rules there is need for learning mechanisms to ensure
ecological rationality. These learning mechanisms may be routine-based learning,
reinforcement learning, or even cognitive learning (Brenner 1999, p.334, p.338).

Implementation

The implementation of an adaptive toolbox aims at a reusable agent model
for social simulation that is complex enough to cover different
kinds of human behaviour, yet simple enough to be usable for large
populations,
and that also captures realistic decision making.

Implementing adaptation through learning processes is an important aspect
of the model. There are two dimensions in which human behaviour changes
due to learning processes, according to different time scales of the
underlying processes. According to the concept of an adaptive toolbox outlined
above, the first dimension is influenced by the environment of a problem.
In some settings the environment changes rather fast. According to changes
in the physical environment the strategy choices vary. The social environment
(other humans) gives rise to a quick learning process which also influences
strategy choice. The other dimension is the internal disposition of the
individuals. This includes both the predispositions, for example the individual
propensity to behave in a fair way, and the prior assumptions about the others'
behaviour. A change in disposition can be seen as a slow learning process. So
far we have not modelled change in disposition.

In addition, we strongly believe in modelling heterogeneous agents,
so that there is room for different behavioural traits and beliefs. The
need for modelling different kinds of behaviour arises from the fact that
some experimental evidence, such as non-linearities and self-organization,
can be explained neither by an "average behaviour" nor by the rational model.
The altruistic punishment experiment (cf. Fehr and
Gächter 2002) that is the basis for our first
application is an example that cannot be explained by analyzing average
behaviour only.

The implementation itself poses a challenge because both the environmental
settings and the heuristics/behavioural traits have to be encapsulated.
This requires a thorough understanding of the decision making process to
be modelled in order to be able to come up with a valid abstraction.

Building models from data

As has already been indicated in the Introduction
there is no unifying theory of human behaviour in the social sciences. One may
question if such a state is really desirable given the richness of human
behaviour. However, currently the multitude of theoretical approaches
characterizes a state of fragmentation and not a vivid multi-perspective
approach. Numerous theories coexist, in different disciplines, but also within
single disciplines. Often, these theories contradict each other and, more often,
they are heavily disputed within and between disciplines. Examples of highly
controversial theories are structural functionalism by Talcott Parsons and the
theory of social systems by Niklas Luhmann. Each of these theories is regarded
as a conceptual breakthrough by some colleagues, and as a fanciful artifact by
others. This poses a major problem for modellers of social systems and for
modellers, who want to include the "human factor" in their interdisciplinary
models. If a modeller of climate change wants to include human reactions to
perceived or expected weather changes in the model, those reactions should be
based on some notion of how people behave in such situations. However, modellers
have to face the ambiguity of multiple representations of human behaviour.
Matters are complicated further, because usually model outcomes depend crucially
on the theoretical assumptions underlying the implementation of human behaviour.

Economics is the only social science discipline with a dominating theory
of human behaviour, the rational actor paradigm (RAP). Although originally
this was meant to explain only economic activity, it has been adapted
into other areas, and some economists view the RAP as universal. The RAP is
very useful in explaining high-cost situations, in which actors have a lot
of time and knowledge and can use computational means.
Furthermore, involved actors have to have rather clear and consistent
preferences, and the important variables have to be quantifiable and comparable.
Firms, for instance, can take their time and invest resources in finding out
about different, possible alternatives for investing in different markets
and then decide on the strategy that yields the highest expected return, all in
terms of money. In these cases, utility maximization is a sound tool that can be
used in models, because it is easy to formalize. However, even companies may
use heuristics when decisions have to be made in very uncertain and complex
situations, when investments are made in innovative products.

However, as has been shown multiple times, the RAP fails to explain many
day-to-day observations as well as experimental evidence. This is only partly
due to the difference of these low-cost situations. People simply do not have a
lot of time and computational capabilities for most of their decisions. They use
habits, when they face familiar situations, and often they act on emotions, when
the situation is new. Additionally, almost always the prerequisites are not met.
Preferences are usually neither consistent nor quantifiable; knowledge is
incomplete; etc.
(See Diekmann and Preisendörfer 2001, p. 68 and
Newig 2003 about high-cost and low-cost situations.)

The RAP model is starting to be enriched by psychological and social theory
to overcome these explanatory shortcomings (e.g.
Kahneman and Tversky 2000). The basic underlying assumption and ideal is in
general retained: humans behave in a selfishly rational and optimizing
way. However, it has been shown in many studies, that in order to capture
realistic human behaviour we have to view selfishness and optimization as
possible behavioural traits among others.

By viewing the RAP as only one alternative among different human behaviours
it is possible for us to draw on this theory where appropriate and to extend,
complement, or even replace it where necessary. This is the path of a
pluralistic approach to describing human behaviour. This path is justified by
the described shortcomings of the dominant model and the increasing need to
model human behaviour in any number of situations. This need is reflected by the
emergence of integrated assessment as a discipline (cf. Pahl-Wostl 2002).

Behaviour depends on the context of the decision environment. The most
obvious example is the difference between day-to-day situations, like buying
toothpaste, and important, novel, and single decisions, like buying a house. On
the other hand, behaviour also varies with the diversity of humans themselves.
In order to be able to implement non-linearities and self-organizational
processes, we need to be able to implement diverse human behaviour in one model.
Agent-based modelling is a suitable tool to do so. But where do we find evidence
for this multitude?

Data may be derived from observations drawn from experiments, case studies,
or mass surveys. Table 1 summarizes advantages and disadvantages of the
three approaches. Since we are interested in deviations from average
behaviour, the statistical approach does not provide appropriate information
for our problem. We note that the remaining two approaches are complementary
in their strengths and weaknesses. Hence a sound strategy should aim at
combining both.

Case studies analyze human actions in a given real world context. They
are very useful for explaining certain actions in certain situations. They
also help to support a theory about the interrelations of a given problem.
However, they do not in themselves give generalizable insights into human
behaviour, because every case is unique. Inductive reasoning still depends
highly on the underlying theory. Therefore empirical evidence derived from
case studies can be only one side of our research programme.

Experimental economics is a way of getting experimental rather than
empirical data. This has the advantage of repeatability, comparability
and statistical evaluation. Recently, a number of such laboratory experiments
have been conducted with the objective to prove the limits of the rational
model. They focus on different aspects of cooperative behaviour. The experimental
setting makes assertions replicable to some extent. By focusing on a few simple
games (for example prisoner's dilemma, common pool resource games, ultimatum and
dictator games) the experimental evidence becomes comparable between different
experiments. Comparability enlarges the data base considerably. Single
experiments can only include about 100 subjects; mostly these are undergraduate
economics students of a single university, so they form a biased sample. Only
some studies deal with comparison between different cultural biases, for
instance Henrich et al. (2001). A further constraint
is that usually only one or two aspects are covered, for instance the influence
of anonymity (Burnham 2000) or sequencing (Andreoni, Brown, and Vesterlund 2002). Together those
different studies constitute a comprehensive data base on the topic of
deviations from the RAP under different, simple game settings.

There is a small body of related research labelled "Parallel Experiments
with Real and Computational Agents" by Tesfatsion
(2002, p.16). There are a few economic studies that deal with both
experimental settings with human subjects and parallel experiments with
computational agents. However, with the exception of Duffy
(2001) these do not try to capture individual human behaviour, but rather
have (boundedly) rational computational agents evolve over time to show or
explain the observed aggregated behaviour of the human subjects. Learning is
usually implemented as a genetic algorithm. (For example see Pingle and Tesfatsion 2001, Andreoni and Miller 1995).

Duffy (2001) explicitly models individual,
heterogeneous behaviour. He uses "hypothetical reinforcement" learning and
diverse agents to reproduce an experiment that is based on the Kiyotaki-Wright
trading model. The agent based simulation is then used to design further
settings for laboratory experiments. By this the simulation results can be
compared with experimental studies that were done only after the simulation
runs.
In another interesting study Deadman and
Schlager (2002) use experimental learning, but also do not try to reproduce
individual decision making of their experimental subjects.
These kinds of models reproduce actual human behaviour better than the RAP.
However, they focus on only one single economic aspect that has been covered by
the parallel experiment. In contrast, our model aims at reproducing diverse,
individual behaviour at an abstract level so that findings from one experiment
can be used to explain those of other experiments, of case studies, and
eventually also behaviour in everyday situations.

Of course, the explanatory power of simple game settings like those of
laboratory experiments has to be mistrusted in respect to day-to-day situations.
But this is exactly the gap that our modelling approach may help to bridge. The
data base composed of data from multiple controlled experiments contains the
inhomogeneous human behaviour in simple environments. This is a fitting starting
point for our model. By taking many of these experimental studies into account
we build a model of diverse human actions in diverse environments. We plan to
test this model against empirical data taken from case studies and reconcile it
with this data. By comparing the two different approaches, not only by the
results, but also by the information needed to construct a model, we expect to
gain valuable information on human behaviour and decision theory.

First application

Altruistic Punishment Experiment

A first implementation of an adaptive toolbox was based on the data of
the altruistic punishment experiment by Fehr and Gächter. The experiment
is described in detail in (Fehr and Gächter 2002).
Here only a brief summary is given.

240 participants played an anonymous common pool resource game in groups of
four. 12 of these games were played in a row. Participants did not meet each
other more than once. Six of the games were played as simple common pool
resource games. The participants received 20 money units of assets and could
contribute between 0 and 20 money units to a common project. The common
investment of the four participants was increased by the experimenter by 60% and
divided evenly among the four. Hence, free riders who did not invest in the
common project nevertheless received an equal share from the common pool
including profits and investment made by other players. The other six games
were also common pool resource games, but now with a subsequent possibility to
punish players for their investment decisions. For every money unit (between
0 and 10) invested in the punishment, the punished player had to pay 3 money
units. There were two experimental settings, each with 120 participants,
divided into five groups of 24 subjects for each experimental session. One
started with six games with the possibility to punish and concluded with six
games without punishment. This will be referred to as setting A. The other
setting started without the possibility to punish and concluded with punishment.
This will be referred to as setting B.
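The payoff rules summarized above can be written down as a short Python sketch. The function name `payoffs` and the layout of the punishment matrix are assumptions made for illustration; only the numbers (endowment of 20, increase by 60%, groups of four, 1 unit of punishment costing the punished player 3 units) are taken from the description of the experiment.

```python
ENDOWMENT = 20      # money units each participant receives per game
MULTIPLIER = 1.6    # the common investment is increased by 60%
PUNISH_FACTOR = 3   # each unit spent on punishment costs the punished player 3 units


def payoffs(contributions, punishment=None):
    """Return the payoff of each group member for one game.

    contributions: list with one investment (0..20) per group member.
    punishment:    optional matrix (assumed layout), where punishment[i][j]
                   is the amount player i spends on punishing player j.
    """
    n = len(contributions)
    share = MULTIPLIER * sum(contributions) / n   # equal share of the common pool
    result = [ENDOWMENT - c + share for c in contributions]
    if punishment is not None:
        for i in range(n):
            for j in range(n):
                result[i] -= punishment[i][j]                   # cost to the punisher
                result[j] -= PUNISH_FACTOR * punishment[i][j]   # cost to the punished
    return result


# Three full contributors and one free rider, without and with punishment.
group = [20, 20, 20, 0]
no_punishment = payoffs(group)
one_punisher = payoffs(group, [[0, 0, 0, 2], [0] * 4, [0] * 4, [0] * 4])
```

In this example the three contributors each receive 24 money units and the free rider 44. If the first player spends 2 units on punishing the free rider, the punisher ends with 22 and the free rider with 38. Since each invested unit returns only 1.6 / 4 = 0.4 units to the investor, a purely payoff-maximizing participant would invest nothing and would never pay for punishment.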

The main results of the experiment can be summarized as follows:

Common investment increases during games with the opportunity to punish
and decreases without.

With an average investment of about ten in the first games without punishment,
the participants' behaviour is far from the prediction of 0 expected for
rational behaviour.

Almost every participant contributed more in games with punishment than
in games without.

In the first games with punishment (game 1 in setting A and game 7 in setting
B) the contribution was higher than in the first games without punishment.
The punishment threat effectively increases investment.

Although it is costly, punishment does occur quite frequently and it is
correlated to the deviation from the mean investment by the punished
player.

Punished subjects usually increased their contribution in the next game.
So, not only the punishment threat but also actual punishment increases
investment.

In games with punishment, the highest return was received by those players
who contributed an amount close to the average investment.

Data analysis

Analysis of the aggregated data does not give us clues about the individual
decisions over time and reactions to behaviour of other participants by the
subjects of the experiment. Thus, we analysed individual data rows to find
out how decisions were altered in reaction to previous experiences.
However, we analysed only data rows of setting B. Our main working hypothesis
is that most of the participants tried to invest close to the mean investment,
neither defecting nor being the sucker for others to exploit.

The observed aggregated behaviour over time cannot be explained by assuming
only one average strategy. Expectations decrease as the investment level
decreases during games without punishment. Therefore, there have to be
participants who constantly contributed less than the average. On the other
hand, in games with punishment there have to be participants who contributed
more than average and thus led to an increase in expectations and,
consequently, also in investment. This theoretical reasoning is supported by an
analysis of individual data.
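The argument can be made concrete with a deliberately simplified Python sketch. It is not the model described in this article: it assumes a well-mixed group in which some participants always invest a fixed low amount, while all others invest exactly what they expect the mean to be and adopt the last observed mean as their new expectation. The function name and all parameter values are hypothetical.

```python
def mean_investment_path(n_low, n_reciprocal, start_expectation=10.0,
                         games=6, low=0.0):
    """Mean investment per game for a mix of constant low contributors
    and participants who invest their current expectation of the mean."""
    expectation = start_expectation
    path = []
    for _ in range(games):
        total = n_low * low + n_reciprocal * expectation
        mean = total / (n_low + n_reciprocal)
        path.append(mean)
        expectation = mean   # assumed update: expect what was observed last
    return path


# Everybody follows the same "average" strategy: investment never changes.
uniform = mean_investment_path(n_low=0, n_reciprocal=4)

# One constant low contributor among four: investment declines game by game.
mixed = mean_investment_path(n_low=1, n_reciprocal=3)
```

Under these assumptions the uniform group stays at its initial level in every game, whereas a single permanent low contributor is enough to pull the mean down from game to game. This is the sense in which the observed decline requires participants who constantly contribute less than the average.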

First of all, the distribution of investment decisions in the first round has peaks
at 0, 10 and 20 with lesser peaks at 5, 8 and 15. Mean investment is 10.52.
At least three classes of "strategies" can be observed in the individual
behaviour. By "strategy" we mean the way in which the investment decision is
reached, not the investment decision itself. This corresponds to the notion of
decision heuristic. (The possible options are known, namely contributing 0 to 20 money units, so
no search and stopping heuristics are needed.) One extreme is permanent
defection throughout the games without punishment (maximizing strategy), the
other is permanent cooperation (cooperative strategy). In between are
participants who change their contribution, presumably according to their recent
experiences (reciprocal strategy). We believe that participants who play
a reciprocal strategy were trying to contribute close to the expected mean
contribution.
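A first, rough classification of individual data rows into these three classes could look like the following Python sketch. The function `classify` and in particular the tolerance of 2 money units are assumptions for illustration, not the criteria used in the analysis reported here.

```python
MAX_INVESTMENT = 20


def classify(contributions, tolerance=2):
    """Classify one participant's investments in the games without punishment.

    tolerance is a hypothetical margin: investments within this distance
    of 0 (or of 20) still count as defection (or full cooperation).
    """
    if all(c <= tolerance for c in contributions):
        return "maximizing"      # permanent defection
    if all(c >= MAX_INVESTMENT - tolerance for c in contributions):
        return "cooperative"     # permanent cooperation
    return "reciprocal"          # contribution changes with experience


labels = [
    classify([0, 0, 1, 0, 0, 0]),
    classify([20, 20, 18, 20, 20, 20]),
    classify([10, 8, 7, 5, 4, 0]),
]
```

Such a rule classifies whole data rows only. Participants who change their strategy during the games would have to be handled by classifying shorter windows of games separately.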

Apparently, there are also participants who start out as cooperators
or defectors and change to reciprocal behaviour after a number of games
and vice versa. Strategy changes can also be seen as based on heuristics, in
this case triggered by cues. The following cues for strategy changes have been
ascertained from data series of individual participants and their experiences in
the game in setting B:

If common investment is much higher than expected, the tendency to
switch from maximizing to reciprocal or from reciprocal to cooperative
behaviour increases and vice versa with a low investment.

If a defector is the only defector the likelihood of defecting again decreases.
The same is true for cooperators meeting cooperators. Likewise, if a cooperator
encounters one, two or three defectors the willingness to cooperate decreases
accordingly. The same is true for maximizers meeting cooperators. Also,
reciprocalists may imitate behaviour they encounter often.

If payoff of previous decisions has been higher than the recent payoff,
the current strategy is questioned again. This cue is highly irrational in
games where players do not meet each other again, though in other circumstances
it might be useful.

If a decision leads to a lower payoff than the individual contribution,
there is a strong force towards maximizing strategy.

Punishment may lead to reciprocal or cooperative behaviour in the following
ways: Higher punishment than expected decreases certainty about maximizing
strategies, while lower punishment increases it. A high number of punishers and
a higher punishment than total gain in that round also decrease certainty
about maximizing strategies.

These cues have been retrieved from data analysis by first classifying
individual behaviour in the three strategies mentioned above. Then, changes in
investment decisions were classified according to events that happened to the
deciding person. We tried to find a reason for each drop or rise in the
investment decision. Of course, most but not all of those changes can be explained by
the cues listed above. Also, how the size of a change depends
on the corresponding cue could only be guessed. These dependencies would have to
be determined in more detail by questionnaires.

In some cases there is probably more extensive reasoning involved than these
cues capture. In other cases there seems to be less reasoning involved. Some players seem
to give 8 or 10 money units for a few rounds and then switch their behaviour to a higher
or lower level, which they employ for another few rounds. Additionally, there is
probably a good deal of "random" or "irrational" heuristics involved
that is not captured by these cues. An example of this is that some participants
drop their investment level without provocation in the sixth round (which
they assumed to be the last game played). However, the above list of cues
is supported by data and was implemented as heuristics.

In order to ascertain the motivation behind the punishment decisions, Fehr and
Gächter had questionnaires filled out by the participants after the
experiment. Their analysis of the questionnaires led to their deduction that
anger is a major driving force for punishment acts and triggers a "willingness
to punish" (Fehr and Gächter 2002, p. 139).
By analysing individual data we could not find out more about why and when
punishment occurred than Fehr and Gächter already did (cf. Fehr and Gächter 2002, p. 139):

Most punishment acts were done by cooperative players and imposed on
defecting players.

Both the frequency of punishment and the amount of punishment seem to depend
on the size of the defection of the punished player.

Furthermore, punishment acts are expected by defecting players.

Altruistic Punishment Model

Our implementation reproduces not only aggregated but also individual data
of the experiment. Data analysis of individual behaviour led us to the
assumptions summarized in table 3, which form the basis for the model.
The assumptions are described in more detail below the table. With the terms
"Agent" or "Player agent" we refer to the entities in our multi-agent
simulation. With "participants" or "humans" we mean the individuals who took
part in the experiment.

Agents have individual inclinations to cooperate and punish. The first
is implemented as one variable cooperativeness, the latter as two
independent variables, one indicating the disposition to be annoyed at
being cheated (inclination to be annoyed), the other defining the
likelihood of spending money to punish a defector (willingness to punish).
These two variables follow the analysis of Fehr and Gächter, who complemented
their experiment by questionnaires (Fehr
and Gächter 2002, p. 139). All three are float values between
0 and 1, 0 indicating no cooperativeness and 1 indicating a high cooperativeness
(or respectively inclination to be annoyed, and willingness to punish).
In our model the initial distribution of cooperativeness is a uniform
distribution. This assumption is supported by the fact that the mean contribution
in the first game without punishment is close to 10 and there are about
as many participants giving 0 as there are giving 20. As mentioned above, the
distribution in the experiment has peaks at 0, 10 and 20 with lesser peaks at
5, 8 and 15. This may be explained by prominence theory, which states that
humans are much more likely to choose prominent numbers, like 1, 2, 5,
10, 20, ... (Albers 2001). However, this has not been
modelled.

Inclination to be annoyed and willingness to punish are also
uniformly distributed. This was decided due to a lack of knowledge of the
actual distribution of those attributes in the human participants. In
questionnaires participants stated that they would feel angry towards a
defecting individual, with increasing intensity corresponding to higher
defection. They also expected anger when they themselves were the defecting individuals. This
is also reflected in the amount of punishment, which increases with the
deviation from the group mean. However, the punishment patterns also differ
between individuals. Therefore it seems logical to assume a uniform distribution.
The way in which punishment decisions are made in the model is described below.

In addition to their own values for cooperativeness, inclination to
be annoyed, and willingness to punish, each agent has a representation
of the other agents' respective mean values, which indicates general belief
about the others. Agents start out believing that the others behave similarly
to themselves. Their experiences then alter the expectations, but
not their own values (see the discussion of time scales in section 2). By this, agents learn by
improving their beliefs about the social environment, but they do not alter
their own "character". In pseudo code for every round the learning is:

All agents believe the general contribution to be higher in games with
punishment. This offset is 3 money units in setting A and 6 money units in setting B.
These values have been taken from the aggregated data.
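The agent attributes and initial beliefs described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, converting an expected cooperativeness to money units by multiplying by 20 follows the scaling used for decisions, and capping the expectation at 20 money units is our assumption. The learning rule itself is not part of this sketch.

```python
import random
from dataclasses import dataclass, field

# Belief offsets in money units for games with punishment (values from the text).
PUNISHMENT_OFFSET = {"A": 3, "B": 6}
ENDOWMENT = 20  # money units available per game


@dataclass
class Agent:
    cooperativeness: float      # 0 = none, 1 = high
    annoyance: float            # inclination to be annoyed
    punish_willingness: float   # willingness to punish
    # Beliefs about the other agents' mean values (hypothetical field names).
    expected_cooperativeness: float = field(init=False)
    expected_annoyance: float = field(init=False)
    expected_punish_willingness: float = field(init=False)

    def __post_init__(self):
        # Agents start out believing that others are similar to themselves.
        self.expected_cooperativeness = self.cooperativeness
        self.expected_annoyance = self.annoyance
        self.expected_punish_willingness = self.punish_willingness


def make_agent(rng):
    """Draw all three traits from a uniform distribution on [0, 1)."""
    return Agent(rng.random(), rng.random(), rng.random())


def expected_contribution(agent, setting, with_punishment):
    """Expected mean contribution of the others in money units.

    Assumption: the offset is simply added and the result capped at 20.
    """
    expected = agent.expected_cooperativeness * ENDOWMENT
    if with_punishment:
        expected += PUNISHMENT_OFFSET[setting]
    return min(expected, float(ENDOWMENT))
```

For instance, an agent with an expected cooperativeness of 0.5 would expect 10 money units in a game without punishment and 16 money units in a game with punishment in setting B.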

In our model, all agents have three strategies to choose from: maximizing,
reciprocal and cooperative. In games without punishment maximizers contribute
0, cooperators contribute 15 to 20, depending on expectations, and reciprocal
strategists invest the same amount of money that they expect others to contribute.
Only maximizers change their "reasoning" in games with punishment, trying
to calculate the lowest contribution that risks no (high) punishment. In
fact, in games with punishment, the only difference in contribution between
reciprocal strategists and maximizers is that maximizers may risk a slightly
lower investment. Contribution close to the mean actually yields the highest
return. This was also true in the original experiment (Fehr
and Gächter 2002, p. 138).

In the following, the three strategies are described in pseudo code. Decisions
are calculated as values between 0 and 1 and are later multiplied by 20 to give
the number of money units that the player agent invests.

Note: Decisions are tried out and the expected outcome is calculated for each. For this, the
loop starts with an initial decision of 0 and increases it by a predefined
step, in this case 0.05 (1 money unit). The decision
remembered at the end is the one that yields the highest expected outcome. The future
games are the games that directly depend on this decision, like the "punishment
game" after an "investment game".
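The search loop described in this note can be sketched in Python as follows. The function `best_decision` and the two toy outcome functions are hypothetical: how the model actually computes the expected outcome, including expected punishment, is not specified here, so the toy functions and their numbers (a marginal return of 0.4 and an expected punishment that grows below a decision of 0.5) are purely illustrative assumptions.

```python
def best_decision(expected_outcome, step=0.05):
    """Try decisions 0, step, ..., 1 and remember the best one.

    `expected_outcome` maps a decision in [0, 1] to an expected payoff.
    On ties the lowest decision is kept (an assumption).
    """
    n_steps = round(1 / step)
    best, best_value = 0.0, expected_outcome(0.0)
    for i in range(1, n_steps + 1):
        decision = i * step
        value = expected_outcome(decision)
        if value > best_value:
            best, best_value = decision, value
    return best


def toy_outcome_without_punishment(decision):
    # Illustrative only: keep what is not invested, get 0.4 back per unit invested.
    return 20 * (1 - decision) + 0.4 * 20 * decision


def toy_outcome_with_punishment(decision):
    # Illustrative only: expected punishment grows as the decision falls below 0.5.
    expected_punishment = 30 * max(0.0, 0.5 - decision)
    return toy_outcome_without_punishment(decision) - expected_punishment
```

With these toy functions the search returns 0 for the game without punishment and 0.5 (10 money units) for the game with punishment, which mirrors the idea of looking for the lowest contribution that risks no high punishment.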

Reciprocal strategy

decision = expected cooperativeness

Cooperative strategy

if (expected cooperativeness > 0.4)
    decision = 1
else
    decision = 0.75

Note: A decision value of 1 is a contribution of 20 money units and a
decision of 0.75 is a contribution of 15 money units, which is the minimum that
we identified as cooperative behaviour.
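As a runnable counterpart to the pseudo code, the reciprocal and cooperative strategies can be written in Python as below. The function names are our own, and rounding to whole money units when converting a decision is an assumption.

```python
ENDOWMENT = 20  # money units available per game


def reciprocal_decision(expected_cooperativeness):
    """Reciprocal strategy: invest what others are expected to invest."""
    return expected_cooperativeness


def cooperative_decision(expected_cooperativeness):
    """Cooperative strategy: full or reduced cooperation, depending on expectations."""
    if expected_cooperativeness > 0.4:
        return 1.0
    return 0.75


def to_money_units(decision):
    """Convert a decision in [0, 1] to money units (rounding is an assumption)."""
    return round(decision * ENDOWMENT)
```

A cooperator expecting a cooperativeness of 0.5 thus contributes 20 money units, one expecting 0.3 contributes 15, and a reciprocalist expecting 0.4 contributes 8.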

The initial strategy of each agent depends on its value for cooperativeness.
The thresholds were taken from the data. Of the participants in setting
B 21% used maximizing strategy, and 32% used cooperative strategy in their
first game. Consequently, we used the thresholds of 0.21 and 0.68 as indicators
for the starting strategy. That is, a cooperativeness below 0.21 leads
to a maximizing strategy, above 0.68 to a cooperative strategy and in between
to a reciprocal strategy.
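The threshold rule can be sketched in Python as follows. The function names are hypothetical, and the helper `strategy_shares` only illustrates that, with uniformly distributed cooperativeness, the thresholds yield roughly the shares observed in setting B.

```python
import random

MAXIMIZING_THRESHOLD = 0.21   # below this: maximizing strategy
COOPERATIVE_THRESHOLD = 0.68  # above this: cooperative strategy


def initial_strategy(cooperativeness):
    """Map an agent's cooperativeness to its starting strategy."""
    if cooperativeness < MAXIMIZING_THRESHOLD:
        return "maximizing"
    if cooperativeness > COOPERATIVE_THRESHOLD:
        return "cooperative"
    return "reciprocal"


def strategy_shares(n_agents, seed=0):
    """Share of each starting strategy for uniformly drawn cooperativeness."""
    rng = random.Random(seed)
    counts = {"maximizing": 0, "reciprocal": 0, "cooperative": 0}
    for _ in range(n_agents):
        counts[initial_strategy(rng.random())] += 1
    return {name: count / n_agents for name, count in counts.items()}
```

For a large number of agents the shares approach 21% maximizers, 47% reciprocalists and 32% cooperators.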

It is important to note the difference between the strategy of an agent
and its investment choice in a given game. The strategy is a heuristic
and determines in which way the investment decision is made. The same contribution
can be made by player agents employing different strategies. The way in
which the strategies change according to experiences can be seen as another
form of strategy. In this case it is a heuristic that uses cues. This is
the same for every agent.

For example, in our model reciprocalists change their contribution
according to their experiences because expectations change. However, strategy
changes induced by experiences also occur and lead the reciprocalist to
employ either cooperative or maximizing strategies. The contribution does
not even have to change.

For modelling strategy changes, in addition to the actual strategy
employed, each agent has a certainty for using that strategy. Positive
experiences and expected behaviour by other agents increase certainty, while
negative experiences and unexpected behaviour decrease it. The implemented
cues are described above. In addition, employing a strategy that corresponds
to the agent's cooperativeness increases certainty, while non-compliance
decreases it. With a low certainty the probability of a strategy change
increases.

The cues for strategy changes are checked after every round; those involving
punishment are checked after punishment has been carried out. The above list is
transformed into the following checklist and corresponding cue values: 0
(cue was not encountered), 1 (cue was encountered), or, for the counting cues
(numDefectors, numCooperators, numPunishers), 2 or 3:

For each employed strategy the influence of the cues is different. For example,
if cooperation is higher than expected, certainty about the cooperative strategy is
increased, but certainty about the reciprocal and maximizing strategies is decreased. This is
modelled as an array of parameters indicating for each of the three strategies
the influence on the certainty about this strategy.

Table 5: Multipliers for the different cues according to each strategy

Cue                   Maximizing   Reciprocal   Cooperative
coopIsHigher          -1           -1           2
coopIsLower           1            -2           -2
noDefectors           -2           0            0
numDefectors          1            -2           -2
noCooperators         0            0            -2
numCooperators        0            -1           0.5
profitIsHigher        1            2            1
profitIsLower         -1           -1           -2
profitLtInvestment    0            0            -1
numPunishers          -0.5         0            0
punishmentIsHigher    -1           0            0
punishmentIsLower     1            0            0
punishmentGtGain      -1           0            0
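To illustrate how the multipliers of Table 5 combine with cue values, the following Python sketch sums the products of cue value and multiplier for the strategy currently employed. The function name `cue_influence` is hypothetical, and how this sum is scaled before it changes an agent's certainty is not covered by the sketch.

```python
STRATEGIES = ("maximizing", "reciprocal", "cooperative")

# Multipliers from Table 5, in the order (maximizing, reciprocal, cooperative).
MULTIPLIERS = {
    "coopIsHigher": (-1, -1, 2),
    "coopIsLower": (1, -2, -2),
    "noDefectors": (-2, 0, 0),
    "numDefectors": (1, -2, -2),
    "noCooperators": (0, 0, -2),
    "numCooperators": (0, -1, 0.5),
    "profitIsHigher": (1, 2, 1),
    "profitIsLower": (-1, -1, -2),
    "profitLtInvestment": (0, 0, -1),
    "numPunishers": (-0.5, 0, 0),
    "punishmentIsHigher": (-1, 0, 0),
    "punishmentIsLower": (1, 0, 0),
    "punishmentGtGain": (-1, 0, 0),
}


def cue_influence(strategy, cue_values):
    """Sum of cue value times multiplier for the given strategy.

    `cue_values` maps cue names to 0, 1, 2 or 3; missing cues count as 0.
    """
    column = STRATEGIES.index(strategy)
    return sum(MULTIPLIERS[cue][column] * value
               for cue, value in cue_values.items())
```

For example, an agent that observed higher cooperation than expected and met two cooperators gets an influence of 3 under the cooperative strategy, -3 under the reciprocal strategy and -1 under the maximizing strategy.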

As Table 5 shows, only the maximizing strategy is influenced by
the punishment cues. These parameters are used as multipliers for the cue
values. The general procedure is:

Cooperativeness below 0.25 and above 0.75 decreases and between 0.25
and 0.75 increases the agent's certainty about reciprocal strategy.

Cooperative Strategy

((-0.5) + cooperativeness) * 2

Cooperativeness below 0.5 decreases and above 0.5 increases
the agent's certainty about cooperative strategy. This is doubled because
cooperative strategy seems to depend more on conviction than the other
strategies.

certainty step = 0.2

certainty tolerance = 0.4

From maximizing strategy and cooperative strategy any change leads to
reciprocal strategy. From reciprocal strategy it depends on whether more
cooperative or defecting cues have been encountered.
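The compliance term for the cooperative strategy and the direction of a strategy change can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that a change has already been triggered (how certainty step and certainty tolerance trigger it is not modelled here), and keeping the reciprocal strategy when both kinds of cues are equally frequent is our assumption.

```python
def cooperative_compliance(cooperativeness):
    """Certainty contribution for the cooperative strategy: ((-0.5) + c) * 2."""
    return (cooperativeness - 0.5) * 2


def next_strategy(current, cooperative_cues, defecting_cues):
    """Strategy adopted once a change has been triggered."""
    if current in ("maximizing", "cooperative"):
        # Any change from the two extreme strategies leads to reciprocal.
        return "reciprocal"
    if cooperative_cues > defecting_cues:
        return "cooperative"
    if defecting_cues > cooperative_cues:
        return "maximizing"
    return "reciprocal"  # tie: stay reciprocal (assumption)
```

A cooperativeness of 0.75 therefore contributes 0.5 to the certainty about the cooperative strategy, while a value of 0.25 contributes -0.5.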

Another decision the agents have to make is the punishment decision. As
mentioned above, player agents have the attributes "inclination to be
annoyed" and "willingness to punish". The corresponding heuristic involves
not only those two attributes but also the size of the defection that is to be
punished. We argue for two dependencies. First, the higher the player agent's
inclination to be annoyed and the higher the defection, the more likely it is
that punishment occurs. Second, the higher the player agent's willingness to
punish and the higher the defection, the higher the punishment decision is.
Furthermore, there was also punishment that did not fall into the pattern of
cooperative players punishing defecting players.

defection is the difference between the investment
decision of the player agent in question and the mean of the other players;
it is 0 if no defection occurred. If there was no defection, only
irrational anger leads to a punishment decision. This can only happen if the
attribute inclination to be annoyed (plus a possible minimal defection) is
bigger than 0.8, as indicated in the first else clause.

angerlevel is a test level for the random number: the higher
the angerlevel, the greater the probability that punishment actually
occurs.

annoyance is the punishing player's inclination to be
annoyed attribute.

punishment is the punishing player's willingness to
punish attribute.

punishDecision is the decision how much to punish the player
agent in question. This is still a number between 0 and 1.

punishPoints is the number of points that the punishing player agent
decides to invest in the punishment. (The punishing player pays this number of
points in money units and the punished player pays 3 times that amount.)

Both the tolerance and the value for irrational anger were fitted rather
than taken from the data.
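Some of the quantities defined above can be illustrated with a short Python sketch. It covers only the parts that follow directly from the definitions: the defection, the threshold for irrational anger, and the cost of punishment. The function names are hypothetical, scaling the defection to the range 0 to 1 by dividing by 20 money units is an assumption, and the probabilistic punishment decision itself is not reproduced.

```python
ENDOWMENT = 20                    # money units available per game
IRRATIONAL_ANGER_THRESHOLD = 0.8  # fitted value given in the text
PUNISHMENT_FACTOR = 3             # the punished player pays 3 times the points


def defection(investment, other_investments):
    """Shortfall below the mean of the other players, scaled to [0, 1].

    Returns 0 if the player invested at least as much as the others' mean.
    """
    mean_others = sum(other_investments) / len(other_investments)
    return max(0.0, mean_others - investment) / ENDOWMENT


def irrational_anger_possible(annoyance, minimal_defection=0.0):
    """Without real defection, punishment requires annoyance above 0.8."""
    return annoyance + minimal_defection > IRRATIONAL_ANGER_THRESHOLD


def punishment_costs(punish_points):
    """Money units paid by (punisher, punished player)."""
    return punish_points, PUNISHMENT_FACTOR * punish_points
```

A player who invests 5 money units while the others invest 10, 15 and 20 has a defection of 0.5, and 2 punishment points cost the punisher 2 money units and the punished player 6.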

Results

With this implementation we have been able to reproduce both the aggregated
and individual data provided by the altruistic punishment experiment by
Fehr and Gächter.

We made model runs similar to the experimental setting: model run A
starts with games with punishment (see figure 1) and model run B starts
with games without punishment (see figure 2). Model runs have been conducted
with 1200 player agents. We did not do model runs with only 24 agents because of
a strong influence of the random number generator. Even in runs with 120
players the mean investment usually deviates considerably from the mean
investment of the experiment, in some cases not even the trend was
reproduced. In fact, this effect is interesting and needs to be analyzed in more
detail. We believe that the higher variance is due to the lack of prior
knowledge of our agents compared to the experiment's participants.

Only individual data of the experimental setting B was used for calibrating
the model. As can be seen in figure 1, the data from setting A are not
reproduced as well.
In setting B the variance of the twelve data points of the model run from those
of the experiment is 0.38. In setting A, however, the variance is 2.51. This
strong deviation may be explained by two reasons. The first reason is the drop
of investment level in the sixth investment game. Some participants defect in
that game because they think it is the last one. Whether they expect
others not to punish in the last game or simply take their chance would have to
be ascertained by questionnaires. The second reason may be that some
participants are angry at having been punished in previous games and therefore
the aggregated level of investment is lower than in setting B without
punishment. Integrating these two aspects in the model leads to figure 3, which
shows a better reproduction of the experimental data than figure 1. For this
altered setting A the variance is 0.61.
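The deviation measure can be made concrete with a small Python sketch. It assumes that the variance of the model data points "from those of the experiment" means the mean squared deviation between the two series of round means. That reading is an assumption, and the function name is hypothetical.

```python
def mean_squared_deviation(model_means, experiment_means):
    """Mean squared difference between two equally long series of round means."""
    if len(model_means) != len(experiment_means):
        raise ValueError("both series need the same number of data points")
    squared = [(m - e) ** 2 for m, e in zip(model_means, experiment_means)]
    return sum(squared) / len(squared)
```

Under this reading, a value of 0.38 over twelve rounds corresponds to a typical deviation of about 0.6 money units per round, and 2.51 to about 1.6 money units.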

For the last round effect, all agents' expected cooperativeness was reduced by
lastGameOffset = 0.1. This value corresponds roughly to the data.
This could have been modelled as an individual trait of agents, because
it seemed to be only some participants who defected without provocation in the
last round. However, so far we needed this parameter to be easily accessible and
changeable.

To incorporate increased anger due to previous punishment we increased the
percentage of player agents employing maximizing strategy from 21% to 23%.

Another difference is that mean punishment in the model is higher than in the
experiment (1.07 compared to 0.73 mean punishment decision in setting A). The
reason for this may be that participants were more risk averse than agents.

Figure 3. Mean investment in the experiment and modified model
setting A

Reproduction of individual data is harder to prove. The reason
for this is path dependency of individual data. Decisions to increase or
decrease investment depend on recent experiences and those are different for
every player. However, a few examples of participants' and agents' decisions
in setting B are given. As two examples for truly cooperative behaviour
see figures 4 and 5 for a participant of the experiment and an agent from the
simulation respectively.

Figure 4. Mean investment in the experiment and modified model
setting A

Examples for reciprocal behaviour alternating with maximizing decisions are
given in figures 6 and 7. Note that the participant's and agent's investment is
influenced strongly by the experience made in the prior game.

In addition to model runs that are similar to the experiment, we
also made longer test runs (see figure 10). Results are that it takes about
12 rounds for almost every agent to invest 20 money units in games with
punishment and 0 money units in games without punishment. Interestingly,
homogeneous investment decisions are possible with different, co-existing
strategies. In games without punishment more than 10% of the agents still use
reciprocal strategy, but since they expect others to invest nothing, they also
do not invest. In games with punishment the setting allows for all three
strategies to co-exist. About 60% of the agents are cooperators, about 20% are
reciprocalists that invest close to 20 money units because they expect others to
contribute that much, and another 20% are maximizers, who
contribute about 19 money units, because they want to avoid punishment.

Figure 10. Mean investment in the long run with and without punishment

Learning and strategy changes are crucial for model behaviour. Learning
rates and, to a lesser extent, the importance of cues are very sensitive
parameters. However, from data alone we could retrieve only limited and
unreliable information about these aspects. With questionnaires in addition
to the experiment these questions could be addressed more thoroughly.

Discussion

In the previous section we have shown that the altruistic punishment model
reproduces the experiment's data quite well. But we have not yet discussed
how it fits into the concept of an adaptive toolbox.
The adaptive toolbox is based on three principles: psychological plausibility,
domain specificity, and ecological rationality. By analysing data of individual
participants we captured the actual, individual behaviour, both for the
game decision and the reactions to the other participants' actions. For
the implementation this behaviour was classified into a set of behavioural
types, distinguished by an assumed cooperativeness. From this the actual
strategies were derived. The agents' cooperativeness defined the preferred
strategy. Strategies themselves were kept very simple. With the exception
of the maximizing strategy, no calculation is done by the player agents.
Domain specificity was ensured by linking decision strategies to the
game setting. This is done in two ways. First, the implementation allows only strategies
that fit the game currently played to be used by the players. Second,
cues for strategy changes are derived from the game setting.
For this model, strategies and cues were predefined by the modeller.
However, in principle the society of agents could learn, in an evolutionary
adaptation process, which strategies are possible and which cues are appropriate.
Since our objective is to reproduce observed behaviour, we did not choose
that path. For this reason, ecological rationality came only from data
analysis and not through an evolutionary process.

Decision making within the adaptive toolbox is done by heuristics that
are comprised of simple building blocks, so that they can be applied to
different kinds of decision environments. As outlined in section 2, heuristics are either search
rules, stopping rules, or decision rules. In the model player agents choose
from a predefined set of strategies. The strategies define not the actual
decision, but how the decision is made. That is, they refer to the choice
between different solutions. The way in which the choice between strategies
is made is also implemented as heuristics, in this case, checking for cues.
Certain cues indicate to player agents that the strategy employed is not
appropriate. These cues induce strategy changes. However, new strategies are not
(yet) searched for by agents.
Another feature not yet implemented is an evolutionary adaptation
process. Adaptation takes place to some extent, but is restricted to learning
processes about the social environment.

We have started to derive hypotheses about human behaviour from an economic
experiment. By classifying behavioural types it was possible to implement
an agent based model to represent data of the experiment. This was
done as a first module of an adaptive toolbox that is to be expanded in
the near future (see Prospects and Conclusion). The
model is derived from individual rather than aggregated behaviour. By this
the idea of an adaptive toolbox has the potential to integrate different
coexisting representations of human decision making in agent based models. At
the same time it also provides us with a framework for modelling the behaviour
and for comparing different settings.

However, our current approach has some limitations. Experimental economics
focuses on a constrained set of behavioural patterns and considers mainly
extrinsic motivation for behaviour. Optimization of a utility function is always
triggered externally. Furthermore, in economic games all context is removed.
However, it is evident from the results that people respond to emotions, so they
also have intrinsic motivation. This psychological aspect is very hard to
capture in games. In the case of the altruistic punishment experiment, anger was
ascertained as a major driving force behind punishment decisions. However, this
was only possible through corresponding questionnaires. In addition, all people
will enter the game with a personality shaped by previous experience. They will
change their strategies but not their personalities during a gaming session.

The model cannot cover the multitude of behavioural patterns of all the
participants. Classifications have the advantage of emphasizing some aspects and
general patterns, but will always lessen the variety. Additionally, the
interpretation of the individual data, that is, what reasons there were for behaving in
one way or another, depends on the modeller.

Prospects and conclusion

The heuristics explored in this paper focus
on what determines the willingness of individuals to cooperate and the
development of trust in a group. It is assumed that individuals are
characterized by their cooperativeness, which may be determined by individual character,
individual experience and the cultural context. Nooteboom (2002) suggested quite a useful conceptual
framework for sources of cooperation that incorporates many elements and
intuitions from literature. Table 8 summarizes the main points. One can make a
distinction between macro and micro sources and between egotistic and altruistic
sources.

This framework is a good base to structure future
observations from both empirical and modelling studies. An individual's
cooperativeness determines his/her expectations about the behaviour of
other players and individual and social learning effects may occur on different
time scales. The work reported in this paper provides a start to compile
a more comprehensive knowledge base.

We also need to compare insights from the model with theoretical approaches.
For instance, cooperativeness was simply modelled as a uniformly distributed
random variable, but it might have been modelled as a combination of two variables,
individualism and altruism, as proposed by Social Value Orientation (as in
Jager and Janssen 2002). Important questions to answer
in this comparison between our model and a social value orientation model are:
In what way do the results differ? How do investment choices depend on
individualism and altruism versus cooperativeness? Does the value orientation of
an agent determine what cues for strategy changes are important to it? How may
punishment acts be explained by Social Value Orientation?
The last question is important because Fehr and Gächter found that most
punishing acts were done by participants who invested more than average.
However, following Social Value Orientation theory this should not be the case.
Apparently, there is another aspect, namely anger, involved in that decision.

In order to answer these questions it is necessary to find out about the
individual value orientation of the participants by looking at the individual
data. One might assume that the classifications of cooperative, reciprocal, and
maximizing participants would be the same as a Social Value Orientation
classification of cooperative, individualistic, and competitive. This prediction
is in line with McClintock and Liebrand, who found that the choices of
individualistic players were the most variable ones among those three classes
(cf. McClintock and Liebrand 1988, p. 407). If
the classification was the same, the outcome would likely be the same also, only
the representation of cooperativeness in the model would differ. For this
experiment we did not need to distinguish between individualistic and altruistic
behaviour. For other experiments this might very well be necessary.

The altruistic punishment model is a first step towards an adaptive toolbox
that should be comprised of many different modules. For this reason the
next logical step is to extend the toolbox by more models of other experimental
games. By this it will become possible to compare the validity of the
assumptions made to other findings. This work is currently in progress.

A second step after the extension of the adaptive toolbox is to compare
the insights with results from case studies. The question arises whether
data from case studies deviate considerably from results in experimental
settings. And, if so, what are the main differences?

Major differences, already pointed out, are context dependency and
a longer time scale. Short term behaviour may be typically relevant in
negotiation processes. However, in case studies we are interested in particular
in long-term changes. In our model, these would refer to changes in the
individual types, e.g. the attitude to be cooperative. This implies that the
incentive for behaviour shifts from extrinsic motivation by sanctions through
punishment to intrinsic motivation by the internalization of social norms about
socially acceptable behaviour and about a behaviour that leads to an acceptable
pay-off in a certain social environment. Hardly any individual is cooperative to
an extent to be continuously exploited by others. Thus, on this longer time
scale institutions as another major influence on human behaviour become
relevant. Numerous studies have shown evidence for the importance of
institutions shaping human nature (Held and Nutzinger
1999). Hence it will be of interest to explore systematic differences
determined by culture and institutional contexts. In these cases more important
information can be derived from stakeholder interviews than from experimental
settings.