Baldwin effect and overcoming the rationality fetish

As I’ve mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn’t surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a “perfect Bayesian reasoner as a fixed point of Darwinian evolution”. This lets them side-step observed non-Bayesian behavior in humans by saying that we are evolving towards, but haven’t yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is vulnerable to critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin and then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three-step process (some of which can occur in parallel or partially so):

1. Organisms adapt to the environment individually.

2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.

3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan, 1896; Osborn, 1896, 1897) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.
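Rogers’ paradox in particular can be seen in a few lines of code. The sketch below iterates the standard recursion for individual versus social learners; the benefit b, learning cost c, and rate of environmental change u are illustrative values of my choosing, not parameters from any of the cited papers:

```python
# Toy model of Rogers' paradox: individual learners pay a cost c to always act
# correctly; social learners copy a random member of the previous generation
# for free, but their copied behaviour goes stale when the environment shifts.
b = 1.0   # benefit of behaving correctly
c = 0.4   # cost of individual learning
u = 0.2   # probability the environment changes each generation

p = 0.01       # initial fraction of social learners
Q = 1.0 - u    # chance a social learner's behaviour is currently correct

for _ in range(2000):
    W_ind = b - c                        # individual learners: correct, pay c
    W_soc = b * Q                        # social learners: correct w.p. Q
    W_bar = p * W_soc + (1 - p) * W_ind  # population mean fitness
    p = p * W_soc / W_bar                # discrete-time replicator step
    # Copy an individual learner (always correct) or another social learner
    # (correct w.p. Q), then the environment shifts with probability u:
    Q = (1 - u) * ((1 - p) + p * Q)

print(p, W_bar)
```

The dynamics settle at an interior mix of the two types, but the mean fitness there is exactly b - c, the same as a population of pure individual learners: the capacity for social learning persists without raising fitness. That is the paradox.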
The easiest way to get around the Simpson-Baldwin effect is to postulate that selectively significant environmental change happens on the time-scale of a single organism’s lifespan. This justification fares well in the age of industrialization and drastic human-generated environmental change; it can even apply to other animals — like fishes — where humans cause a strong coupling between the ecological and evolutionary dynamics. However, it seems less relevant to paleolithic humans, and yet we attribute the early stone tools that characterize that period to unique cognitive flexibility and learning (although the uniqueness might be more physical than mental). Further, we usually attribute our learning abilities to our bigger brains, and those are best explained by the social brain hypothesis. This suggests that we should turn our attention from external environmental pressures to internal (to the species) social pressures. In other words, we should be using evolutionary game theory to look at frequency-dependent selection.

Smead & Zollman (2009) considered general strategic plasticity (learning would be a specific type) in a game theoretic setting. Given an arbitrary symmetric evolutionary game G, they introduce a new strategy L that learns to play a best response to any strategy in G and a Nash equilibrium against itself, but carries a small cognitive cost c; the resulting game is GL. They show that for all games G without a Pareto dominant mixed strategy Nash equilibrium (i.e. for any generalized social dilemma game), L is not an evolutionary stable strategy in GL. Not being an ESS means that we will never find a monomorphic population of learners, but it doesn’t mean that learners can’t exist as a fraction of a polymorphic population — this is what we usually expect of social learners, for instance. However, they show that even this is possible only for Hawk-Dove like games where every pure strategy does poorly against itself. Specifically, if there exists a strategy s in G that is a best response to itself, then L will go to extinction. The situation does not improve even if the cost of learning is not associated with development, but instead depends on the opponent: in such a setting, for L to be an ESS it must be the case that in a world of learners, it is more costly to be a non-learner than a learner. This might be possible for imitation, where it is cheaper than innovation, but it is typically not the case for more sophisticated types of learning.
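To make the construction concrete, here is a minimal sketch of the learner-augmented game GL for a 2×2 example. The Stag Hunt payoffs, the cost c = 0.1, and the rule that two learners settle on the payoff-dominant symmetric pure equilibrium are all illustrative assumptions of mine; Smead & Zollman’s construction is more general:

```python
def learner_game(G, c):
    """Augment a symmetric game G (row player's payoffs, nested lists) with a
    learner strategy L, in the spirit of Smead & Zollman (2009).

    Against a pure strategy, L plays a best response but pays cognitive cost c;
    against another learner, both settle on the payoff-dominant symmetric pure
    Nash equilibrium (a simplifying assumption of this sketch -- mixed
    equilibria are ignored)."""
    n = len(G)
    # Symmetric pure equilibria: strategies that are best responses to themselves.
    sym_ne = [i for i in range(n) if all(G[i][i] >= G[j][i] for j in range(n))]
    ne_payoff = max(G[i][i] for i in sym_ne)

    GL = [row[:] + [0.0] for row in G] + [[0.0] * (n + 1)]
    for j in range(n):
        br = max(range(n), key=lambda i: G[i][j])  # L's learned reply to pure j
        GL[n][j] = G[br][j] - c                    # L vs pure j
        GL[j][n] = G[j][br]                        # pure j vs L
    GL[n][n] = ne_payoff - c                       # L vs L
    return GL

def is_ess(A, i):
    """Check whether pure strategy i of symmetric game A is an ESS
    against pure-strategy invaders."""
    for j in range(len(A)):
        if j == i:
            continue
        if A[j][i] > A[i][i]:
            return False
        if A[j][i] == A[i][i] and A[j][j] >= A[i][j]:
            return False
    return True

# Stag Hunt: Stag is a best response to itself, so the learner should fail.
stag_hunt = [[3.0, 0.0],   # Stag vs (Stag, Hare)
             [2.0, 1.0]]   # Hare vs (Stag, Hare)
GL = learner_game(stag_hunt, c=0.1)
print(is_ess(GL, 2))  # False: hard-wired Stag players invade the learners
print(is_ess(GL, 0))  # True: Stag resists both Hare and the costly learner L
```

Since Stag is a best response to itself, hard-wired Stag players earn the learner’s payoffs without paying c, so they invade, in line with the extinction result above. (For Hawk-Dove like games, where no pure strategy is a best response to itself, the L-vs-L rule of this sketch would need the mixed equilibrium instead.)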

As an ultimate relaxation, Smead & Zollman (2009) make the mechanism for strategic plasticity free, with the only costs coming from mistakes during learning (nobody is perfect before they see any data). In particular, let $\epsilon$ be the proportion of interactions where L makes sub-optimal decisions; then among two-strategy cooperate-defect games of the form $\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$, strategic plasticity is an ESS only for games where $U < 0$. That is part of the assurance and prisoner’s dilemma section of cooperate-defect games. Further, to achieve this ESS the learner needs a pretty sophisticated theory of mind: the strategy needs to respond not to what other learners are doing, but to what they are ‘trying’ to do.

To study the Simpson-Baldwin effect, Smead & Zollman (2009) extend beyond their analysis of evolutionary stability and consider dynamics. They simulate the discrete-time replicator equation in an inviscid population, and show that, starting from a point far from equilibrium, the proportion of strategically plastic agents increases drastically as they are initially selected for in assurance and PD games. However, once they make up a large fraction of the population, the non-learning strategies achieve higher fitness and drive the adaptable agents to extinction. This is exactly what Simpson’s three-step recipe predicts. For Hawk-Dove dynamics, the effect is much less drastic, but sometimes present. Thus, for most games and reasonable costs of cognition, frequency-dependent selection is not sufficient to generate strategic plasticity at equilibrium.
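A bare-bones version of this simulation is easy to reproduce. The prisoner’s dilemma payoffs, the learner’s cognitive cost, and the background fitness below are my own illustrative choices rather than Smead & Zollman’s exact setup, but they show the same qualitative trajectory:

```python
# Strategies: C, D, and a learner L that best-responds to its opponent (in a
# prisoner's dilemma that means defecting) at cognitive cost k. The payoffs,
# cost, and background fitness are illustrative choices.
k = 0.2
A = [[3.0, 0.0, 0.0],                 # C vs (C, D, L); L defects against C
     [5.0, 1.0, 1.0],                 # D vs (C, D, L)
     [5.0 - k, 1.0 - k, 1.0 - k]]     # L: D's payoffs minus the cost k

x = [0.98, 0.01, 0.01]                # far from equilibrium: mostly naive C
learner_share = []
for _ in range(500):
    f = [1.0 + sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
    f_bar = sum(x[i] * f[i] for i in range(3))
    x = [x[i] * f[i] / f_bar for i in range(3)]   # discrete-time replicator
    learner_share.append(x[2])

# Learners surge while naive cooperators are common, then the cheaper
# hard-wired defectors drive them extinct: Simpson's three steps in action.
print(max(learner_share), learner_share[-1])
```

The learner’s share climbs from 1% to roughly a third of the population while it exploits the cooperators, and then collapses to zero once defectors, who get the same payoffs without paying k, take over.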

What if learning is completely free, and there are many learning strategies present? What if the learner doesn’t have to respond to individual opponents but can learn a global strategy? At least in this setting, can we use evolutionary game theory to justify rational learning rules? Harley (1981) and Maynard Smith (1982) thought so, and argued that the only evolutionary stable learning rules are ones that generate ESS behaviour. Smead (2012) shows that this is not typically the case. He assumes that learning is fast relative to reproduction, and considers rules like Bayesian learning that are behaviourally persistent. A population of learners is behaviourally persistent if introducing a few mutants does not drastically change the behaviour of the focal population. Bayesian learning satisfies this because if only a few mutants are introduced then interactions with them are infrequent and thus the agents’ beliefs about the world are barely changed by these few samples. Smead (2012) shows that if a population is at an evolutionary stable state then it is not behaviourally persistent. The most interesting result for me, though, is that if you slow the rate of learning then you can make the learner stable against any invader that leads away from equilibrium (although an agent that expresses the equilibrium behavioural proportions without learning can still invade). I find it relatively surprising that a slower learning rate results in higher resilience to invasion.
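The behavioural-persistence of Bayesian learning can be made concrete with a Beta-Bernoulli belief; the sample sizes below are illustrative numbers of my choosing:

```python
# A Bayesian learner tracks the probability that a random opponent plays the
# resident action, with a Beta(a, b) prior over Bernoulli observations.
a, b = 1.0, 1.0          # uniform prior
N = 1000                 # interactions the learner observes
m = 10                   # of those, interactions with mutants who play differently

# Posterior mean belief that an opponent plays the resident action:
belief_no_mutants = (a + N) / (a + b + N)
belief_with_mutants = (a + N - m) / (a + b + N)
shift = belief_no_mutants - belief_with_mutants   # = m / (a + b + N)

# A handful of mutants moves the belief by less than their frequency, so a
# best response thresholded anywhere away from ~0.99 is unchanged: the
# learners' behaviour persists despite the invasion.
print(belief_with_mutants, shift)
```

With ten mutants in a thousand interactions the posterior mean moves by less than one percent, so any best response that isn’t balanced on a knife’s edge stays the same. This is exactly the persistence that, by Smead’s result, is incompatible with being an evolutionary stable state.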

The above invasions usually happen by neutral drift, and so the Nash equilibrium (NE) behaviour is still expressed, even if it is not evolutionary stable. Smead (2012) summarizes this nicely:

This means that there is reason to expect a population to play according to a NE, but that the learning rules used by individuals may not (on their own) take the population to a NE. Consequently, pointing to the evolutionary success of equilibrium-learning rules does not clearly support equilibrium behaviour any more than more basic evolutionary process that act on behaviour directly.

So, to answer the rationality fetish: no, Bayesian learning is not a fixed point of Darwinian evolution. At least it is not in the most basic and intuitive model of inviscid dynamics. However, in the special case of social learning and Rogers’ paradox, it is known that spatial structure can help learning strategies enhance the fitness of their bearers (Rendell et al., 2010). As such, it might be possible to achieve stability of general learning (maybe even under cognitive costs) in spatial populations. Of course, this isn’t the only way around the problem. We could also follow Valiant (2009) and express machine learning and evolution in the same framework, and show that evolution is a strict subclass of general PAC-learning and thus some behaviours are learnable but not evolvable. In terms of Simpson’s three steps, this would be equivalent to saying that step two is not always possible. My preferred option, though, would be to look at rational learning not as superior to memorized strategies, but as a make-do mechanism for when there are too many basic strategies to memorize. This would give us the perplexing view of rational learning as a consequence of constraints on our descriptive complexity and an artifact of bounded rationality.

About Artem Kaznatcheev: From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

20 Responses to Baldwin effect and overcoming the rationality fetish

The environmental complexity hypothesis is a relatively new approach that proposes humans developed large brains to “insulate” themselves from environmental change in the variable environment our ancestors occupied. It often claims not to replace the social brain hypothesis, but to be an additional force that drove encephalisation.

If external, rather than internal, forces played a significant role in the development of “rationality” would this change the results of these models?

The environmental complexity hypothesis is new? I thought that was what people thought before the SBH, that our big brains were an artifact of adaptations to better exploit our external environment: hunt better, protect ourselves from the elements better, etc. How is the new version of it different? Or am I confused in my history?

The two Smead models focused specifically on social factors, because it is relatively easy to build a variable environment model where learning is required. This is the sort of philosophy of “cut models apart to understand the essential features” instead of the “build models up to incorporate every subtlety we’ve thought of so far”. I am a big fan of the former, and often very skeptical of the latter, especially if the results of the latter models are simulations and not analytic. I dwell on this a bit in the three types of mathematical models post. In particular, if you can convince me that your model is a good insilication and its parameters can be measured in the real world and output can be quantitatively tested with real world experiments or observations then I am all for building up complex models. However, if your model is heuristic (as almost all models in this field are) then I advocate for learning their essence instead of building up.

With all that disclaimer out of the way, I do think that some interesting things could be done with looking at environmental complexity. For example, a few posts ago I looked at the work of Livnat & Pippenger, where they show how to get interesting (and qualitatively similar to what we see in humans and animals) kinds of irrationality out of a model with a static environment of higher complexity than the brains of organisms are allowed to have. I think some of their ideas can be combined with learning to show that rational learning is actually a consequence of bounded rationality. I have the bones of a model worked out for this, but for now it will remain in my giant folder of “potentially cool future research projects”. Could you point me to some references on the new environmental complexity hypothesis? Any mathematical models in particular?

ECT probably does have a history (I’m surprised you couldn’t trace it back to Aristotle, like the brain cooling); but it has undergone a recent resurgence in popularity and maybe a slight reformation as well. The short story of it now is that large brains developed to allow us to adapt our behaviours to fit a changing environment, meaning we can adapt much faster than genetics would normally allow. This is thought to have been particularly useful in Africa within the last 3 million years or so, when Milankovitch cycles prompted periods of extensive climate variability. Hence why we became encephalised during this period.

Also the SBH happened, but doesn’t explain all of our brain growth.

You make a good case for break-down modelling, so perhaps the best bet would be to create a separate model that explores how a learner/innovator would do compared to a more static (but cognitively less expensive) strategy if the fitness of various strategies was dictated by the environment. An environment that could then be varied.

I’m not aware of much of your sort of modelling that’s been done on the subject, with most of the research attempting to investigate how strong variability selection has been over the relevant time frame. The most relevant ones to you I can think of are “Variability Selection in Hominid Evolution” (which attempts to model how a versatilist would flourish compared to specialists in a variable environment) and “Change and variability in Plio-Pleistocene climates: modelling the hominin response”, which attempts to identify if the sort of variable environment modelled in the other paper occurred, and what sort of effect that may have had on the selection pressures hominins were exposed to.

There is quite a bit of modelling of the sort you mention. These models are very general: they do not distinguish between learning and developmental plasticity, and apply to plants as well (or as badly) as to humans. You could start here:
How Learning Can Guide Evolution
Geoffrey E. Hinton & Steven J. Nowlan
and then the ~1000 articles that Google scholar says cite it…

Mutual shielding (of evolution and learning) is sometimes seen as the opposite of the Baldwin effect. If a species can learn x, it doesn’t need to evolve x, and vice versa. Shielding seems important in evolution/learning interaction and relatively under-studied, although this paper by Griffiths gets pretty close: “Innateness and culture in the evolution of language” by S. Kirby, M. Dowman & T.L. Griffiths (2007), Proceedings of the National Academy of Sciences (“Genes controlling strength of bias could therefore be shielded from selection…”).
Your link between Baldwin and Rogers is interesting and echoed in the Griffiths paper.
Tom Shultz

I think this might be a matter of evolutionary biologists and cognitive scientists having differing vocabularies. Learning for biologists would be just a specific type (a very teleological, goal directed variant) of phenotypic plasticity. The ability of phenotypic plasticity to ‘slow down’ or ‘speed up’ evolution and change the population’s resilience to shocks seems to be relatively heavily studied. For example, see: