When a biologist or a layman tries to reason out the evolutionary explanation for something, they would simply use English with some math thrown in. To pick a random example from "The Selfish Gene": the reasoning for why Guillemots employ the "discriminate in favor of one's own eggs" strategy, in the "Genesmanship" chapter, page 103. (I won't quote it in full since it's a page's worth of text.)

When a biologist tries to actually model evolutionary development to see which traits would win, they need to somehow teach the computer to implement that model: what the environmental factors are, what genotype is involved, how exactly it is expressed in different phenotypic and extended-phenotypic traits, and how the environment would affect an individual with that phenotype.

My question is: is there some standard way to build such a model? A domain-specific language (DSL, in computer science terminology) that is used by many different biologists, or some standard modeling packages/software? For example, some special XML format?

Or is it always just a custom implementation hand-built by individual researchers for their current model?

Just to clarify:

I'm NOT asking what the models look like theoretically. I'm asking what language/format (if any standard one exists) is used to encode them to run simulations.

If the answer differs between types or purposes of models, the ones I'm most interested in are game-theoretical ones.

The impetus for this question: if I'm curious to see how a specific model would behave, would I have to code the model, the language, and the simulator from scratch (likely incorrectly, being a layman), or are there standard packages that do it for you as long as you describe your model in the proper language?
– DVK, Sep 9 '12 at 13:24

Unfortunately the guillemot example is a bad one - Dawkins was wrong about it (and very obviously so). He forgets to account for the fact that if a bird lays twice as many eggs as any other in the colony, his eggs are twice as likely to be excluded from care.
– Richard Smith-Unna, Sep 10 '12 at 1:40

@RichardSmith - That seems counterintuitive to me. Suppose the only eggs being excluded are the ones that no bird claims, you lay 2x eggs, and 5% of the colony uses that specific strategy. Assuming a colony of 100 with 1-egg clutches for normal birds, there are 105 eggs total, so 5 will be excluded. Each egg has a ~1% chance of being excluded, so the chance of BOTH of your eggs being excluded is 1% x 1% = 0.01%, nearly 100 times better than for a single-egg bird.
– DVK, Sep 10 '12 at 11:16

It's not the chance of both eggs being excluded that Dawkins talks about; he talks about the chance of having an egg excluded. It's still true that cheating would pay for the first few birds. The wider point about the trait not being an ESS (evolutionarily stable strategy) might be true, but his specific explanation is not. In your specific example, if there are 105 eggs and 5 are excluded, each egg has a ~5% (more precisely, 4.76%) chance of being excluded, not ~1%. In any case, your question was interesting, so I'm writing a model of the Guillemot situation. I'll report back when done :)
– Richard Smith-Unna, Sep 10 '12 at 20:40
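
The arithmetic being debated in these comments can be checked exactly. This is a minimal sketch, assuming (as in DVK's example) 95 single-egg birds, 5 two-egg birds, and 5 excluded eggs drawn uniformly at random without replacement from the 105; the function name is hypothetical. It computes the chance that a single-egg bird loses its egg, the chance that a two-egg bird loses at least one egg, and the chance that it loses both.

```python
from fractions import Fraction
from math import comb


def exclusion_odds(single_egg_birds=95, two_egg_birds=5, excluded=5):
    """Exact probabilities when `excluded` eggs are picked uniformly at random (no replacement)."""
    total = single_egg_birds + 2 * two_egg_birds
    ways = comb(total, excluded)
    # Single-egg bird: its one egg is among the excluded eggs.
    p_single = Fraction(excluded, total)
    # Two-egg bird: count selections that contain neither / both of its eggs (hypergeometric).
    p_neither = Fraction(comb(total - 2, excluded), ways)
    p_both = Fraction(comb(total - 2, excluded - 2), ways)
    return total, p_single, 1 - p_neither, p_both


total, p_single, p_at_least_one, p_both = exclusion_odds()
print(total)                           # 105
print(f"{float(p_single):.4f}")        # 0.0476
print(f"{float(p_at_least_one):.4f}")  # 0.0934
print(f"{float(p_both):.4f}")          # 0.0018
```

Under these assumptions both observations hold at once: a two-egg bird roughly doubles its chance of having some egg excluded, while its chance of losing both eggs is very small.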


The field most closely associated with game-theoretic models in biology is evolutionary game theory (EGT). If modeling is required, then the typical paradigm is agent-based modeling, and a good introductory book is:

As for actually building the model, and what to describe/how, I will take you through my usual procedure since this is a field I specialize in:

1. Define what types of strategies you think are relevant to the interactions you are modeling, and select what you expect the payoffs of those strategies to be. For instance, if you are studying the evolution of cooperation, you might select 'Cooperate' and 'Defect' as your strategies and the Prisoner's dilemma as your payoff matrix, but maybe you will pick something more general. Unfortunately, most of EGT does not draw a clear distinction between genotype and phenotype; the two are usually equated. At the end of this step you have a game matrix G. Sometimes, when mutation or innovation is explicitly necessary even in the inviscid (well-mixed) model, further analytic approaches are taken at this stage. I recommend Hofbauer & Sigmund (2003) for a broad treatment of step 1.

2. Now you need a basic intuition of what the 'default' behavior in this interaction is, so solve the replicator dynamics of G.

3. The main interest in EGT right now is structured populations, and this is where computational modeling is typically used. However, before I turn to simulation, I first try the best analytic approach I know: I use the Ohtsuki-Nowak transform on G to solve the interaction analytically for random regular graphs (Ohtsuki & Nowak, 2006).

4. If I am still interested in the question, and steps 2 and 3 didn't capture all the subtlety of the system I want to study, then I start building a multi-agent computational model. I make sure that my model has some way to scale down to the completely inviscid case of the replicator dynamics and to the simply-structured case of the Ohtsuki-Nowak transform. If my computational model disagrees with the analytic approach in these regimes, then I become worried; otherwise, I continue with standard agent-based modeling techniques. Personally, I code in MATLAB. I have never seen a modern EGT simulation that required the performance of C/Fortran. As suggested in another answer, if you don't have a programming background then you can use NetLogo. However, in my experience the models are typically simple enough to implement from scratch, and NetLogo models typically hide from you some very important subtleties (like which reproduction rule to use: death-birth, birth-death, or imitation?) and usually lead to weaker papers.
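
Steps 1 and 3 can be sketched for the simplest case: two strategies on a regular graph where every individual has k neighbors. The sketch assumes death-birth updating and weak selection, and uses the death-birth form of the transform from Ohtsuki & Nowak (2006), in which each entry a_ij of G gets an extra term b_ij = ((k+1)a_ii + a_ij - a_ji - (k+1)a_jj) / ((k+1)(k-2)); the transformed matrix is then analyzed with ordinary replicator dynamics. The donation game and all function names are illustrative, hypothetical choices. A quick sanity check: on the transformed matrix, cooperation should dominate exactly when b/c > k.

```python
from fractions import Fraction


def donation_game(b, c):
    """Step 1 (hypothetical helper): game matrix G, rows/cols ordered (Cooperate, Defect).
    G[i][j] is the payoff to strategy i when it meets strategy j."""
    return [[b - c, -c],
            [b, 0]]


def ohtsuki_nowak_db(G, k):
    """Step 3 (sketch): G + H for death-birth updating on a regular graph of degree k > 2."""
    if k <= 2:
        raise ValueError("the transform needs degree k > 2")
    n = len(G)
    return [
        [
            G[i][j] + Fraction((k + 1) * G[i][i] + G[i][j] - G[j][i] - (k + 1) * G[j][j],
                               (k + 1) * (k - 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def cooperation_dominates(M):
    """True if row 0 (Cooperate) does better than row 1 (Defect) against both opponents."""
    return M[0][0] > M[1][0] and M[0][1] > M[1][1]


# Well-mixed population: defection always dominates the donation game.
print(cooperation_dominates(donation_game(5, 1)))                       # False
# Degree-4 graph: cooperation dominates once b/c exceeds k.
print(cooperation_dominates(ohtsuki_nowak_db(donation_game(5, 1), 4)))  # True
print(cooperation_dominates(ohtsuki_nowak_db(donation_game(3, 1), 4)))  # False
```

Exact fractions are used so that the boundary case b/c = k is not blurred by rounding.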

Note the broad theme: these models are typically described as differential equations, and this approach is preferred to agent-based models (ABMs). However, if a clean differential-equations approach does not capture all the subtleties of what you are studying, then an ABM paradigm is adopted.
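
As a concrete instance of the differential-equations view (and of step 2), the two-strategy replicator equation reduces to dx/dt = x(1 - x)(f0 - f1), where x is the frequency of strategy 0 and fi is the expected payoff of strategy i against the current population. This sketch integrates it with forward Euler, chosen only to keep the example dependency-free; a proper ODE solver would be more robust. The Hawk-Dove values (V = 2, C = 4) and the Prisoner's dilemma payoffs are illustrative assumptions; for this Hawk-Dove game the hawk frequency should settle at V/C = 0.5.

```python
def replicator_rhs(x, G):
    """dx/dt for a 2-strategy replicator equation; x is the frequency of strategy 0."""
    f0 = x * G[0][0] + (1 - x) * G[0][1]  # expected payoff of strategy 0
    f1 = x * G[1][0] + (1 - x) * G[1][1]  # expected payoff of strategy 1
    return x * (1 - x) * (f0 - f1)


def integrate(x0, G, t_end=200.0, dt=0.01):
    """Forward-Euler integration of the replicator equation from x(0) = x0."""
    x = x0
    for _ in range(round(t_end / dt)):
        x += dt * replicator_rhs(x, G)
    return x


V, C = 2.0, 4.0
hawk_dove = [[(V - C) / 2, V],   # Hawk vs (Hawk, Dove)
             [0.0, V / 2]]       # Dove vs (Hawk, Dove)
prisoners_dilemma = [[3, 0],     # Cooperate vs (Cooperate, Defect)
                     [5, 1]]     # Defect vs (Cooperate, Defect)

print(round(integrate(0.1, hawk_dove), 4))        # 0.5
print(integrate(0.9, prisoners_dilemma) < 1e-6)   # True: cooperators die out
```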

The particular language a biologist uses depends on the trade-off between speed and ease of programming. Many models are written in C or Fortran if speed is paramount; when speed is less important, people write models in higher-level languages such as Python, R, or MATLAB. My own models are written mostly in Python, and I write all the classes, and then all the simulation components, from scratch by hand. Since almost all models are mathematical in nature, the language is inconsequential: algorithms should behave similarly across platforms. If you're looking for examples of easy ways to code up game-theoretic models, consider NetLogo, which has some nice examples using game theory.
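
In the spirit of writing the simulation components by hand, here is a minimal agent-based sketch: a well-mixed population of Cooperators and Defectors playing a Prisoner's dilemma, updated by a birth-death Moran process. The class name, the fitness mapping 1 - w + w * payoff, and the parameter values are illustrative assumptions, not any particular published model.

```python
import random


class MoranPopulation:
    """Hypothetical minimal ABM: a well-mixed population updated by a birth-death Moran process."""

    def __init__(self, strategies, G, w=0.1, seed=None):
        self.strategies = list(strategies)  # one entry per agent: 0 = Cooperate, 1 = Defect
        self.G = G                          # G[i][j]: payoff to strategy i against strategy j
        self.w = w                          # selection strength (assumed weak)
        self.rng = random.Random(seed)      # seeded so runs are reproducible

    def fitness(self, i):
        """Average payoff of agent i against every other agent, mapped to 1 - w + w * payoff."""
        s = self.strategies[i]
        others = self.strategies[:i] + self.strategies[i + 1:]
        payoff = sum(self.G[s][t] for t in others) / len(others)
        return 1 - self.w + self.w * payoff

    def step(self):
        """Birth: pick a parent in proportion to fitness. Death: pick a uniformly random agent."""
        n = len(self.strategies)
        weights = [self.fitness(i) for i in range(n)]
        parent = self.rng.choices(range(n), weights=weights)[0]
        dead = self.rng.randrange(n)
        self.strategies[dead] = self.strategies[parent]

    def run(self, max_steps=100_000):
        """Update until one strategy has fixed; return the number of steps taken."""
        for t in range(max_steps):
            if len(set(self.strategies)) == 1:
                return t
            self.step()
        return max_steps


pd = [[3, 0],
      [5, 1]]
pop = MoranPopulation([0] * 10 + [1] * 10, pd, w=0.1, seed=42)
steps = pop.run()
print(steps, "steps; fixed strategy:", "Cooperate" if pop.strategies[0] == 0 else "Defect")
```

The update rule lives entirely in `step`, so trying death-birth or imitation updating instead means rewriting only that method; on structured populations that choice can change the outcome.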

This. I would say most people use the language they know best: for non-programmer scientists that's Python or Perl, for mathematicians it's MATLAB, Mathematica, or Berkeley Madonna, and for programmers it's C or one of its descendants.
– Richard Smith-Unna, Sep 10 '12 at 1:44

No generally used DSLs? Is that because it's too hard to develop, or just nobody needs it?
– DVK, Sep 10 '12 at 11:10

I'd say that as far as DSLs go, Logo is a DSL for individual-based models, but most biologists can program in another, more flexible language. Also, both Python (in SciPy) and R have ODE-solving capabilities, so you don't have to use Madonna.
– emhart, Sep 10 '12 at 21:37

There is no single way to build such a model. Models can range from a simple mathematical statement like Hamilton's rule (rB > C) to the chemical reaction-diffusion models used to describe patterns in animal coloring (zebra stripes, leopard spots, and the like).

There are efforts to build molecular models of entire cells, like this model of Mycoplasma genitalium dividing, which integrates nearly 30 different mathematical models to describe different aspects of the organism. There are efforts to build such a model of an entire brain as well.

Another common kind of model in evolutionary biology uses game theory, where different strategies can be pitted against one another, as in the prisoner's dilemma competition Dawkins describes in The Selfish Gene.

It goes on and on. Basically, biological modeling is driven by the kinds of mathematical models that we know, and new models will reveal new paradigms of how biology works. They can be highly mathematical, but judging their relative importance, when they apply, and what they mean is more a matter of analogy than proof.

For instance, in the prisoner's dilemma, Axelrod's first contests showed that Tit for Tat was the strongest strategy: it cooperates on the first move and then copies whatever its opponent did on the previous move, so it generally helps others but retaliates against betrayal. The ideas at the time moved towards general cooperation in populations. More recent replays have shown that a team of entrants that make extraordinary gifts to each other (allowing teammates to betray them without retribution) can compete quite well against other strategies.
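
An Axelrod-style round robin is easy to write from scratch. This sketch assumes Axelrod's standard payoffs (T = 5, R = 3, P = 1, S = 0), 200-round matches, and that each strategy also plays a copy of itself (counting only one side's score); the entrants and function names are illustrative. Under these assumptions the reciprocating strategies come out on top, but the ranking depends on who else is in the pool.

```python
from itertools import combinations_with_replacement

# (my_payoff, opponent_payoff) indexed by (my_move, opponent_move)
PAYOFF = {
    ("C", "C"): (3, 3),  # R, R: mutual cooperation
    ("C", "D"): (0, 5),  # S, T: sucker vs temptation
    ("D", "C"): (5, 0),  # T, S
    ("D", "D"): (1, 1),  # P, P: mutual defection
}


def tit_for_tat(mine, theirs):
    return theirs[-1] if theirs else "C"  # cooperate first, then copy the opponent's last move


def grudger(mine, theirs):
    return "D" if "D" in theirs else "C"  # cooperate until the opponent defects once


def always_defect(mine, theirs):
    return "D"


def always_cooperate(mine, theirs):
    return "C"


def play_match(strat_a, strat_b, rounds=200):
    """Iterated prisoner's dilemma; each strategy sees (own history, opponent history)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pa, pb = PAYOFF[(move_a, move_b)]
        score_a += pa
        score_b += pb
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, rounds=200):
    """Every pair plays once, including each strategy against a copy of itself."""
    totals = dict.fromkeys(strategies, 0)
    for a, b in combinations_with_replacement(list(strategies), 2):
        sa, sb = play_match(strategies[a], strategies[b], rounds)
        totals[a] += sa
        if a != b:  # self-play: credit only one side
            totals[b] += sb
    return totals


entrants = {"TFT": tit_for_tat, "GRUDGER": grudger,
            "ALLD": always_defect, "ALLC": always_cooperate}
print(round_robin(entrants))
# {'TFT': 1999, 'GRUDGER': 1999, 'ALLD': 1608, 'ALLC': 1800}
```

Dropping the grudge-holder from this pool is enough to flip the result: always-defect then narrowly beats Tit for Tat (1404 vs 1399), because it can exploit the unconditional cooperator.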

One can never prove that a more successful selfish strategy for the prisoner's dilemma will not show up, though biological systems do seem to be highly cooperative. That is a model, not a proof.

Please see updates to the question. I'm not asking how the models are tested. I'm asking what language (if a standard one exists) is used to communicate the model parameters to the simulation.
– DVK, Sep 9 '12 at 15:35

I think the answer is no. R could become a standard in the coming years, but I've seen several languages touted in various courses, including Excel.
– shigeta, Sep 9 '12 at 22:39