Song. An original song written by a graduate student about graduate student-supervisor meetings. It’s a catchy tune! Click Here.

Movie. This is a movie that I took in 2009 while attending a Mathematical Biology Summer School in Botswana. Click here. (Unfortunately, I encountered some technical difficulties uploading this to YouTube, but it’s still watchable, albeit in micro-mini).

Mathematical biology takes many different forms depending on the practitioner. I take mine with one math and two biologys (the so-called “little m, big B”), but others like it stronger (“big M, little b”). Under my worldview, mechanistic models are a tool to analyze biological data; a tool that infuses our knowledge of the relevant biological processes into the analytical framework. That might sound very pie-in-the-sky, and so I’ve made up an example to illustrate what I mean. This example has been constructed so that it doesn’t require any advanced knowledge: if you know how to add and multiply – that’s all you’ll need to answer these questions.

In the example below, the relevant biological processes are described in the section what we know already. You will need to use logical thinking to relate the what we know already section to the data reported in the DATASHEET so that you can answer the questions.

If you have ever wondered ‘what is Theoretical Biology?’ this example helps to answer that question too. Specifically, the required steps to do modelling, as inspired by this example, would be: 1) to write down the information that goes in the what we know already section (you’d refer to these as the model assumptions); 2) to devise a scheme to relate what we know already with the biological quantities of interest (this is the model derivation step); and 3) to report the results of your analysis (model analysis and interpretation).

As you work through this example, think about the types of questions that you are able to answer and how fulfilling it is that careful thinking has enabled us to draw some valuable conclusions. Understand too, that a criticism of mathematical modelling is that, in reality, everything might not happen quite as perfectly as we describe it to happen in the what we know already section. These sentiments capture the good and the bad of mathematical modelling. Mathematical models enable new and exciting insights, but our excitement is tempered because these insights are only possible owing to the assumptions that have been made, and while we do our best to make sure these assumptions are good, we know that these assumptions can never be perfect.

If this sounds like fun, then have a go at the example below. If you want to email me your answers, I can email you back to let you know how you did (see here for my email address).

—————————————-

INFLUENZA X

A new and unknown disease, Influenza X, has swept through a small town (popn. 100). Your task is to describe the characteristics of the disease. Health officials want to know:

How many days are citizens infected before they recover?

What fraction of infected citizens died from the disease?

What is the rate of becoming infected?

What we know already

During the epidemic, citizens can be classified into one of these four groups:

Susceptible

Infected

Recovered, or

Dead

As is shown in the diagram:

Only Susceptible citizens can be Infected.

Infected citizens either Die or Recover.

Citizens must have been Infected before they can Recover.

Only Infected citizens die from the disease.

Once they have Recovered, citizens cannot be re-infected.

All Infected citizens take the same number of days to Die or Recover.

During the epidemic no one enters or leaves the city. No babies are born; no one dies of anything other than Influenza X.

During the epidemic all that was recorded was the number of citizens who were Susceptible, Infected or Recovered on each day and the number of people who had Died up until that point. This information is summarized in the DATASHEET provided at the end of this post. This information is also presented graphically below and you’ll get a better understanding of the data by considering how the graphs and the DATASHEET are related (Question 4).

QUESTIONS

Fill in the missing values on the DATASHEET (below).

How many days are citizens infected before they recover?

What fraction of infected citizens died from the disease?

Label the axes on the graphs.

The transmission rate of Influenza X is 0.008 (the units have deliberately been omitted). Consider the graphs above and describe how this rate was estimated.

How is the unknown quantity from the DATASHEET calculated?

DATASHEET

Some definitions

If a patient is infected on Day 1 and recovers on Day 4 that patient is infected for 3 days (i.e., Day 1-3 inclusive).

Infected (cumulative) on Day T means the total number of citizens who have been infected any time from Day 1 to Day T (inclusive). Citizens who subsequently Died or Recovered are included in this number.
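The two definitions above can be sketched as a little bookkeeping exercise. This is only a sketch: the daily counts are made up for illustration (they are not the DATASHEET values), and the names `new_infections`, `days_infected` and `population_is_consistent` are hypothetical.

```python
from itertools import accumulate

# Hypothetical number of citizens newly infected on Day 1, Day 2, ...
# Illustrative values only; these are NOT the DATASHEET values.
new_infections = [1, 2, 4, 3, 0]

# Infected (cumulative) on Day T: everyone infected on any day from Day 1 to Day T,
# including citizens who have since Died or Recovered.
cumulative_infected = list(accumulate(new_infections))


def days_infected(day_infected, day_recovered):
    """Days spent infected: the day of infection counts, the day of recovery does not."""
    return day_recovered - day_infected


def population_is_consistent(susceptible, infected, recovered, dead, total=100):
    """No one enters or leaves the town, so the four groups always sum to the town size."""
    return susceptible + infected + recovered + dead == total
```

A check like `population_is_consistent` is one way to catch arithmetic slips when filling in missing values, since every citizen must be in exactly one of the four groups on every day.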

As a WordPress blogger, I get a handy list of search terms that have led people to my blog. A particularly memorable search term that showed up on my feed was ‘how to make mathematical models at home’. What I liked about this query was that it suggests mathematical modelling as a recreational hobby: at home, in one’s spare time; just for fun. This speaks to an under-appreciated quality of mathematical modelling – that it’s really quite accessible once the core principles have been mastered.

Now, I know, you want to make your own mathematical model, not just read about other people’s mathematical models in a textbook. To start down this road, I think you should pay attention to two things:

How to make a diagram that represents your understanding of how the quantities you want to model change and interact, and;

How to develop a basic knowledge of the classic models in ecology, evolution and epidemiology, including an understanding of what these models assume.

A good way to start towards developing your own model would be to identify the ‘classic model’ which is closest to the particular problem you want to look at. If you’re interested in predator-prey interactions, this would be the Lotka-Volterra model, or if you’re asking a question about disease spread, then you need to read about Kermack and McKendrick and the SIR model. Whatever your question, it should fall within one of the basic types of biological interactions, and the corresponding classic model is then the starting point for developing your mathematical model. From there, the next step is to think about how the classic model you’ve chosen should be made more complicated (but not too complicated!) so that your extended model best captures the nuances of your particular question.
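To give a flavour of what starting from a classic model can look like, here is a minimal sketch of the SIR model, stepped forward in time with a simple forward Euler scheme. The function name `simulate_sir`, the step size and the parameter values in the usage note are my own illustrative assumptions, not values from any data set.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler sketch of the classic SIR model.

    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I

    Returns a list of (S, I, R) tuples, one per day, starting at day 0.
    """
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt  # mass-action transmission term
            new_recoveries = gamma * i * dt     # constant per capita recovery
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history
```

For example, `simulate_sir(0.002, 0.1, 99, 1, 0, 50)` follows a population of 100 for 50 days. Extending the model (say, adding a Dead class) then amounts to adding one more variable and one more flow out of the infected class.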

Remember that the classic model usually represents the simplest model that will be appropriate, and only in rare circumstances might you be able to justify using a simpler one. For example, if the level of predation or disease spread for your population of interest is very low, then you might be able to use a model for single species population growth (exponential/logistic/Ricker) instead of the Lotka-Volterra or SIR models. However, if predation and disease spread are negligible, then it arguably wasn’t appropriate to call your problem ‘predator-prey’ or ‘disease spread’ in the first place. Almost by definition, it’s usually not possible to go much simpler than the dynamics represented by the appropriate classic model.

That should get you started. You can do this at the university library. You can do this for a project for a class. And, yes, you can even do this at home!

Footnotes:

*For someone with a background in mathematics some excellent textbooks are:

but while the above textbooks will give you a better understanding of how to perform model analysis, the ‘For Biologist’s’ textbooks listed in this post are still the recommended reading to learn about model derivation and interpretation.

UPDATE: I wrote this, discussing that I don’t really know the justification for the law of mass action, however, comments from Martin and Helen suggest that a derivation is possible using moment closure/mean field methods. I recently found this article:

The paper outlines a method for testing the mass-action assumption of a model without non-linear fitting or parameter estimation. Instead, the method constructs a transformation of the model variables so that all the steady-state solutions lie on a common plane irrespective of the parameter values. The method then describes how to test if empirical data satisfies this relationship so as to reject (or fail to reject) the mass-action assumption. Sounds awesome!

One of the reasons I like this contribution is that I’ve always found mass-action to be a bit confusing, and consequently, I think developing simple methods to test the validity of this assumption is a step in the right direction. Thinking about how to properly represent interacting types of individuals in a model is hard because there are lots of different factors at play (see below). For me, mass-action has always seemed a bit like a magic rabbit from out of the hat; just multiply the variables; don’t sweat the details of how the lion stalks its prey; just sit back and enjoy the show.

Defn. Let $N_1$ be the density of species 1, let $N_2$ be the density of species 2, and let $f$ be the number of interactions that occur between individuals of the different species per unit time. Then, the law of mass-action states that $f \propto N_1 N_2$.

In understanding models, I find it much more straightforward to explain processes that just involve one type of individual – be it the logistic growth of a species residing on one patch of a metapopulation, or the constant per capita maturation rates of juveniles to adulthood. It’s much harder for me to think about interactions: infectious individuals that contact susceptibles, who then become infected, and predators that catch prey, and then eat them. Because in reality:

Person A walks around, sneezes, then touches the door handle that person B later touches; Person C and D sit next to each other on the train, breathing the same air.

There are lots of different transmission routes, but to make progress on understanding mass-action, you want to think about what happens on average, where the average is taken across all the different transmission routes. In reality, also consider that:

Person A was getting a coffee; Person B was going to a meeting; and Persons C and D were going to work.

You want to think about averaging over all of a person’s daily activities, and as such, all the people in the population might be thought of as being uniformly distributed across the entire domain. Then, the number of susceptibles in the population that find themselves in the same little region $\Delta x$ as an infectious person is probably proportional to $S(t) I(t)$.
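One way to convince yourself of the averaging argument is a small simulation. This is a sketch under a strong assumption: every individual is placed independently and uniformly at random into one of a fixed number of equally sized cells. Under that assumption each susceptible-infectious pair shares a cell with probability 1/(number of cells), so the expected number of co-located pairs is S times I divided by the number of cells, i.e. proportional to the product. The function names and the numbers in the usage note are hypothetical.

```python
import random


def expected_colocated_pairs(num_s, num_i, num_cells):
    """Expected number of (susceptible, infectious) pairs that share a cell."""
    return num_s * num_i / num_cells


def simulate_colocated_pairs(num_s, num_i, num_cells, trials=2000, seed=1):
    """Monte Carlo estimate of the mean number of co-located S-I pairs."""
    rng = random.Random(seed)
    total_pairs = 0
    for _ in range(trials):
        # Scatter the infectious individuals uniformly over the cells.
        infectious_per_cell = [0] * num_cells
        for _ in range(num_i):
            infectious_per_cell[rng.randrange(num_cells)] += 1
        # Each susceptible lands in a random cell and "meets" every
        # infectious individual that happens to be in that cell.
        for _ in range(num_s):
            total_pairs += infectious_per_cell[rng.randrange(num_cells)]
    return total_pairs / trials
```

With 50 susceptibles, 10 infectious individuals and 100 cells, the simulated mean should sit close to `expected_colocated_pairs(50, 10, 100)`, and doubling the number of susceptibles should roughly double it. Note that the sketch says nothing about how individuals move; it only shows what uniform placement implies.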

Part of it is, I don’t think I understand how I am supposed to conceptualize the movement of individuals in such a population. Individuals are going to move around, but at every point in time the density of the S’s and the I’s still needs to be uniform. Let’s call this the uniformity requirement. I’ve always heard that a corollary of the assumption of mass-action was an assumption that individuals move randomly. I can believe that this type of movement rule might be sufficient to satisfy the uniformity requirement, however, I can’t really believe that people move randomly, or for that matter, that lions and gazelles do either. I think I’d be more willing to understand the uniformity requirement as being met by any kind of movement where the net result of all the movements of the S’s, and of the I’s, results in no net change in the density of S(t) and I(t) over the domain.

That’s why I find mass-action a bit confusing. With that as a lead in:

How do you interpret the mass-action assumption? Do you have a simple and satisfying way of thinking about it?

________________________________

Related reading

This paper is relevant since the authors derive a mechanistic movement model and determine the corresponding functional response:

Mechanistic models describe the processes that relate variables to each other, attempting to explain why particular relationships emerge, rather than solely how the variables are related, as a phenomenological model would. Colleagues will ask me ‘is this a mechanistic model’ and then provide an example. Often, I decide that the model in question is mechanistic, even though the authors of these types of models may rarely emphasize this. Otto & Day (2008) wrote that mechanistic and phenomenological are relative model categorizations – suggesting that it is only productive to discuss whether one model is more or less mechanistic than another – and I’ve always thought of this as a nice way of looking at it. This has also led me to think that nearly any model, on some level, can be considered mechanistic.

But, of course, not all models are mechanistic. Here’s the definition that I am going to work from (derived from the Ecological Detective, see here):

Mechanistic models have parameters with biological interpretations, such that these parameters can be estimated with data of a different type than the data of interest

For example, if we are interested in a question that can be answered by knowing how the size of a population changes over time, then our data of interest is number versus time. A phenomenological model could be parameterized with data describing number versus time taken at a different location. On the other hand, a mechanistic model could be parameterized with data on the number of births versus time, and the number of deaths versus time; and so it’s a different type of data, and this is only possible because the parameters have biological interpretations by virtue of the model being mechanistic.
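The contrast can be sketched in code. Here I assume, purely for illustration, that the model is exponential growth, dN/dt = rN, with r equal to the per capita birth rate minus the per capita death rate. Both function names are hypothetical. The first estimates r from number-versus-time data; the second estimates the same parameter from a different type of data, counts of births and deaths.

```python
import math


def growth_rate_from_counts(times, counts):
    """Phenomenological route: slope of a least-squares line of log(count) vs time."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


def growth_rate_from_births_and_deaths(births, deaths, individual_time):
    """Mechanistic route: per capita birth rate minus per capita death rate.

    individual_time is the total observation effort (individuals x time),
    e.g. 10 individuals watched for 10 days gives 100 individual-days.
    """
    return (births - deaths) / individual_time
```

The second function only makes sense because r has a biological interpretation; there is no analogous calculation for a parameter that is just a fitted curve coefficient.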

The essence of a mechanistic model is that it should explain why. To do so, however, it is necessary to give biological interpretations to the parameters. This, then, gives rise to a test of whether a model is mechanistic or not: if it is possible to describe a different type of data that could be used to parameterize the model, then we can designate the model as mechanistic.

Validation

In mathematical modelling we can test our model structure and parameterization by assessing the model agreement with empirical observations. The most convincing models are parameterized and formulated completely independently of the validation data. It is possible to validate both mechanistic and phenomenological models. Example 1 is a description of a series of three experiments that I believe would be sufficient to validate the logistic growth model.

Example 1. The model is $\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$, which has the solution $N(t) = f(t, r, K, N_0)$, where $N_0$ is the initial condition, N(0).

Experiment 1 (Parameterization I):

1. Put 6 mice in a cage, 3 males and 3 females and of varied, representative ages. (This is a sexually reproducing species. I want a low density but not so few that I am worried about inbreeding depression). A fixed amount of food is put in the cage every day.

2. Every time the mice produce offspring, remove the offspring and put them somewhere else (i.e., keep the number of mice constant at 6 throughout Experiment 1).

3. Have the experiment run for a while, record the total time, No. of offspring and No. of the original 6 mice that died.

Experiment 2 (Parameterization II):

4. Put too many mice in the cage, but the same amount of food every day, as for Experiment 1. Let the population decline to a constant number. This is K.

Experiment 3 (Validation):

6. Put 6 mice in the cage and the same amount of food as before. This time keep the offspring in the cage and produce the time series N(t) by recording the number of mice in the cage each day. Compare the empirical observations for N(t) with the now fully parameterized equation for f(t,r,K,N(0)).
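The comparison in the validation step can be sketched as follows, assuming the model is the standard logistic equation dN/dt = rN(1 - N/K), whose closed-form solution is N(t) = K / (1 + ((K - N(0))/N(0)) e^{-rt}). The function names and the choice of mean absolute error as the measure of agreement are my own assumptions; any values of r and K passed in would come from Experiments 1 and 2.

```python
import math


def logistic_solution(t, r, K, n0):
    """Closed-form solution f(t, r, K, N(0)) of dN/dt = r*N*(1 - N/K)."""
    return K / (1.0 + ((K - n0) / n0) * math.exp(-r * t))


def mean_absolute_error(observed, r, K, n0):
    """Average gap between observed counts and the parameterized model.

    observed is a list of (day, number_of_mice) pairs from the validation cage.
    """
    errors = [abs(count - logistic_solution(day, r, K, n0)) for day, count in observed]
    return sum(errors) / len(errors)
```

The point of the design is that nothing in `observed` was used to choose r or K, so a small error is evidence for the model structure rather than a by-product of fitting.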

The Question. Defining that scheme for model parameterization and validation was done to provide context for the following question:

When scientists talk about independent model parameterization and validation – what exactly does that mean? How independent is independent enough? How is independent defined in this context?

If I was asked this, I would say that the parameterization and the validation data should be different. In the logistic growth model example (above), the validation data is taken for different densities and under a different experimental set-up. However, consider this second example.

Example 2. Another way to parameterize and validate a model is to use the same data, but to use only part of the information. As an example consider the parameterization of r (the net reproductive rate) for the equation,

(eqn 1)

The solution to Equation (1) is u(x,t), a probability density that describes how the population changes in space and time. Another result, however, is that the radius of the species range increases at a rate $c = 2\sqrt{rD}$. To validate the model, I will estimate c from species range maps (see Figure 1). To estimate r, I will use data on the change in population density taken from a core area (this approach is suggested in Shigesada and Kawasaki (1997): Biological invasions, pp. 36-41. See also Figure 1). To estimate D, I will use data on wolf dispersal taken from satellite collars.

Returning to the question. But, is this data, describing the density of wolves in the core area, independent of the species range maps used for validation? The species range maps, at any point in time, provide information on both the number of individuals and where these individuals are. The table that I used for the model parameterization is recovered from the species range maps by ignoring the spatial component (see Figure 1).

Figure 1. The location of wolves at time 0 (red), time 1 (blue) and time 2 (green). The circles are used to estimate, c, the rate of expansion of the radius of the wolves’ home range at t=0,1,2. The population size at t=0,1,2 is provided in the table. The core area is shown as the dashed line. Densities are calculated by dividing the number of wolves by the size of the core area. The reproductive rate is calculated as the slope of a regression on the density of wolves at time t versus the density at time t-1. For this example, the above table will only yield two data points, (3,5) and (5,9).
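The arithmetic described in the caption can be sketched as below. Two things here are assumptions on my part: I take the predicted rate of range expansion to be the standard reaction-diffusion result c = 2√(rD), and, following the caption literally, I treat the regression slope as the estimate of r. The function names are hypothetical, and any value of D used with `spreading_speed` would have to come from the collar data.

```python
import math


def regression_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


def spreading_speed(r, D):
    """Assumed spreading-speed formula c = 2*sqrt(r*D)."""
    return 2.0 * math.sqrt(r * D)


# The two data points named in the caption: (value at time t-1, value at time t).
points = [(3, 5), (5, 9)]
slope = regression_slope(points)
```

Dividing both coordinates by the size of the core area to convert counts to densities does not change the slope, so the estimate is the same whether counts or densities are used. The predicted c from `spreading_speed` is then what gets compared against the rate of expansion measured from the circles.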

While the data for the parameterization of r, and the validation by estimating c, seems quite related, the procedure outlined in Example 2 is still a strong test of Equation (1). Equation (1) makes some very strong assumptions, the strongest of which, in my opinion, is that the dispersal distance and the reproductive success of an individual are unrelated. If the assumptions of equation (1) don’t hold then there is no guarantee that the model predictions will bear any resemblance to the validation data. Furthermore, the construction of the table makes use of the biological definition of r, in contrast to a fully phenomenological approach to parameterization which would fit the equation u(x,t) to the data on the locations of the wolves to estimate r and D, and would then prohibit validation for this same data set.

So, what are the requirements for independent model parameterization and validation? Are the expectations different for mechanistic versus phenomenological models?

and the MPE initiative is likely to bring more attention to blogging around the topic of mathematical biology.

Here at Memorial University of Newfoundland, as part of MPE, we are proud to be hosting the AARMS Summer School on Dynamical Systems and Mathematical Biology. This summer school consists of 4 courses over 4 weeks from July 15 to August 9, 2013. These courses can often be transferred for credit at the student’s home institution and will be taught by leading experts in each of the focus areas. The city of St John’s offers a vibrant downtown, urban parks and walkways, and stunning coastlines. More information to follow.