Modeling Logistic Growth Data in R

My dog rocks. Wilson is friendly to almost everyone (mailmen excepted) and he’s very soft. We’ve had him since he was a puppy and because the wife and I are dorky scientists, we’ve collected (non-invasive) data from him since day one. So today we’ll be modeling growth data, courtesy of Wilson, using R, the “nls” function, and the packages “car” and “ggplot2”. For reference, I drew on Fox and Weisburg (2010).

Wilson’s growth looks like a logistic function. As a puppy, he put on the pounds quickly (yep, I remember that), and he has flattened out around 75 lbs (thank god). Although I will say that he still thinks he is a lap dog.

A logistic growth model can be implemented in R using the nls function. “nls” stands for non-linear least squares.

The logistic growth function can be written as

y <-phi1/(1+exp(-(phi2+phi3*x)))

y = Wilson’s mass, or could be a population, or any response variable exhibiting logistic growth
phi1 = the first parameter and is the asymptote (e.g. Wilson’s stable adult mass)
phi2 = the second parameter and there’s not much else to say about it
phi3 = the third parameter and is also known as the growth parameter, describes how quickly y approaches the asymptote
x = the input variable, in our case, days since Wilson’s birth

One important difference between “nls” and other models (e.g. ordinary least squares) is that “nls” requires initial starting parameters. This is because R will iteratively evaluate and tweak model parameters to minimize model error (hence the least squares part), but R needs a place to start. There are functions in R that obviate the need for imputing the initial parameters, these are called “self starting” functions and in our case it would be the “SSlogis” function. But for now, we’ll skip that and give R some initial parameters manually.

coef(lm(logit(mass/100)~days.since.birth,data=data))

This calls for the coefficients of a linear model (slope and intercept) using the logit transform (log of the odds) and scaling the y by a reasonable first approximation of the asymptote (e.g. 100 lbs)

(Intercept) days.since.birth
-1.096091866 0.002362622

Then, we can plug these values into the nls function as starting parameters.

We can see that our initial parameters weren’t too far off.
Next, let’s create the model predictions and plot the data. (I’ve noticed that copying and pasting this ggplot script isn’t working in R because of the quotation marks. Anybody know the solution for this? Temporarily, just substitute the quotation marks from this text with regular ones within R or R Studio.

Hey, that looks like a pretty good model! It suggests that Wilson will asymptote at 71.57 lbs (Wilson, lose some weight buddy!). The model is under predicting his weight in the later region of the data, probably because of the two data points near the inflection point. Overall, I’m pretty happy with the model though!

There are some other packages (e.g. package “grofit” looks promising) and growth functions (e.g. gompertz) worth exploring because they can streamline some of the code, but we’ll save that for a future post.

Thanks for the data Wilson!

Check this out if you want to see how temperature affects whether Wilson is panting or not.

Hey Dustin! Yes you could totally use this to distinguish groups. I am unsure of the specific syntax using ‘nls’ but I’ve done similar things with logistic regression. Off the top of my head, you could look at parameter estimates between groups and see if they differed (e.g. compare confidence intervals of the estimates). If the groups had differing phi values that did not overlap, that would be evidence for different growth models. I will try and look at the code for this….

Growth rates typically follow some pattern, assuming resources are not limited. This is why most studies evaluate growth under ad libitum resource availability. I would first ask why your data don’t have any discernible trend. Second, what is your question of interest?

Dear Brian,
I am trying to fit the predicted curve on my data, but I think there is something wrong in the curve.
Can you review my script to look what is wrong?
Thank you very much for your attention.
Sincerely, Bruna

Hi Taryn! For an approximate 95 CI you could double the standard errors around the slope coefficients. But I think there is a better method involving pulling estimates from the profiled log-likelihood, which will allow for asymmetric CIs (I’ll have to look into this). Another approach would be to bootstrap the data and generate CIs that way. So for example, simulate 999 more Wilsons (with replacement) and calculate model coefficients. Then take the 2.5 and 97.5 percentiles of those models and there is your 95% CI.

Hi Brian,
I am very interested in this code! I am trying to predict the size of a given population using the following line of code:
predict.pop <- predict(pop, data.frame(time = c(2020, 2030)))
It works very well but I wonder if it would be possible to provide some confidence intervals along with these estimates as you discuss above with Taryn?

Hi Antoine!
“confint(model)” will return the lower and upper 95% CI of the parameter estimates for your model. Try modeling both upper and lower bounds and using geom_ribbon to fill in the prediction. If this doesn’t make sense, perhaps I can generate a follow up post to highlight this. Cheers, Brian

thanks for being so responsive!
Your answer makes sense but I am struggling a little bit with my code.
By running the predict cmd, I get the expected pop size at a given year.
The LGM I used is the following:
nls(value ~ SSlogis(time, phi1, phi2, phi3), mydata)
it is based on the following webpage:
When I run confint on the model itself:
confint(nls(value ~ SSlogis(time, phi1, phi2, phi3),data=mydata)), it works but gives me the CI of phi1, phi2 and phi3.
My goal is to obtain the confint of predict…
I tried to adapt the model above with the given upper CI of phi1, phi2 and phi3 but was not successful
Thoughts?
I am happy to try an optimized/alternative model if you favor another one.

Hi Brian,
I am reaching out to you again regarding the lgm model I am using. Thanks to your advices, I was able to run my model and got CI for phi1, phi2 and phi3 with the following:
model <- nls(size ~ SSlogis(date2, phi1, phi2, phi3), data = mydata)
confint(model)
The code above gives me the CI of all phis.
Next, I tried to estimate the predicted pop size in the future by doing this:
predict.model <- predict(model, data.frame(year = c(2020,2030,2040)))

…but I am still unable to generate the CI for the 'predicted pop size'. Any chance you can help me with that? Does that make sense?

I’m not sure if plotting confidence intervals the long route is statistically sound. I’ll have to gander at the texts. Another solution would be to bootstrap simulate the data and plot the upper and lower bounds of the simulated data. We did this in a paper, let me know if you want to see that solution.

The paper is Komoroske et al 2014 Conservation Physiology (full citation on my publication page). But no code attached to the paper. I can share the code when I return to the states. Feel free to email me if you haven’t heard from me in a day or two!

Hello I am Victor, from Chile, I am glad to meet you and thank you for the explanation. Using your code for learning, I identify a little typing mistake, that blows some minutes my mind, in the 14th line phi1 does not have added the letter “-“, for the rest. I am grateful for you, because despite that spanish is my mother tongue, the spanish explanations were not clear for me. Best Regards from Southern Chile.