Model Validation: Interpreting Residual Plots

When conducting any statistical analysis, it is important to evaluate how well the model fits the data and whether the data meet the assumptions of the model. There are numerous ways to do this, including a variety of statistical tests for deviations from model assumptions. However, none of these tests has gained broad general acceptance. Generally, statisticians (which I am not, but I do my best impression) examine various diagnostic plots after running their regression models. There are a number of good sources of information on how to do this. My recommendation is Fox and Weisberg’s An R Companion to Applied Regression (Chp 6). You can refer to Fox’s book, Applied Regression Analysis and Generalized Linear Models, for the theory and details behind these plots, but the corresponding R book is more of a “how to” guide. A very brief but good introduction to checking linear model assumptions can be found here.

The point of this post isn’t to go over the details or theory but rather to discuss one of the challenges that I and others have had with interpreting these diagnostic plots. Without going into the differences between standardized, studentized, Pearson, and other residuals, I will say that most model validation centers on the residuals (essentially the vertical distance of each data point from the fitted regression line). Here is an example from Zuur and colleagues’ excellent book, Mixed Effects Models and Extensions in Ecology with R:

So these residuals appear to exhibit homogeneity, normality, and independence. Those are pretty clear, although I’m not sure whether the variation in residuals across levels of the predictor (independent) variable Month is a problem; it might indicate heterogeneity. Most books just show a few examples like this and then residuals with clear patterning, most often residual spread that increases with the fitted values (i.e., large values of the response/dependent variable come with greater variation, which can often be corrected with a log transformation). A good example of this can be seen in panel (d) below, in a fitted vs. residuals plot (like the top left plot in the figure above).
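To make that classic pattern concrete, here is a minimal sketch on simulated data. The data, the seed, and the helper spread_ratio() are hypothetical and exist only for illustration. The response has multiplicative error, so a linear model on the raw scale should show residual spread fanning out with the fitted values, while the same model on log(y) should not. The helper compares the residual standard deviation in the top and bottom thirds of the fitted values as a rough numeric companion to the plots.

```r
# Hypothetical simulated data: multiplicative error, so the spread of y
# grows with its mean.
set.seed(42)
n <- 300
x <- runif(n, 1, 10)
y <- exp(0.3 * x + rnorm(n, sd = 0.3))

fit_raw <- lm(y ~ x)       # untransformed response
fit_log <- lm(log(y) ~ x)  # log-transformed response

# Ratio of the residual SD in the top third of fitted values to that in the
# bottom third; values near 1 are consistent with constant spread.
spread_ratio <- function(fit) {
  f <- fitted(fit)
  r <- residuals(fit)
  cuts <- quantile(f, c(1/3, 2/3))
  sd(r[f >= cuts[2]]) / sd(r[f <= cuts[1]])
}

op <- par(mfrow = c(1, 2))
plot(fitted(fit_raw), residuals(fit_raw), main = "Raw response",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
plot(fitted(fit_log), residuals(fit_log), main = "Log response",
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
par(op)

c(raw = spread_ratio(fit_raw), log = spread_ratio(fit_log))
```

With the raw response the ratio should come out well above 1, and with the log response close to 1. The exact values depend on the simulated data.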

These are the types of idealized examples usually shown. I think it’s important to show these perfect examples of problems, but I wish I could get expert opinions on more subtle, realistic examples. Those figures are often challenging to interpret because the density of points also changes along the x-axis. I don’t have a good example of this but will add one when I get one. Instead, I will show some diagnostic plots that I’ve generated as part of a recent attempt to fit a Generalized Linear Mixed Model (GLMM) to problematic count data.

The assumption of normality (upper left) is probably adequately met. However, the plot of fitted vs. residuals (upper right) seems to show more variation at mid-level fitted values than at low or high fitted values. Is this pattern enough to be problematic and suggest a poor model fit? Is it driven by the greater number of points at mid-level fitted values? I’m not sure. The dense diagonal line of points is generated by the large number of zeros in the dataset, and my model does seem to have some problems fitting the zeros. I have two random effects in my GLMM. The residuals across plots (5 independent sites/subjects on which the data were repeatedly measured; salamanders were counted on the same 5 plots repeatedly over 4 years) don’t show any pattern. However, there is heterogeneity in residuals among years (bottom right). This isn’t surprising given that I collected much more data over a greater range of conditions in some years. This is a problem for the model, and this variation will need to be modeled better.
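One way to get a feel for the density question is to simulate residuals that have constant variance by construction but fitted values that are crowded into the middle. This is a hypothetical sketch, not the salamander data: the central band simply contains far more points, so the cloud of residuals there tends to look wider, even though the standard deviation is the same everywhere.

```r
# Hypothetical illustration: residuals with constant variance by
# construction, but fitted values crowded into the middle of their range.
set.seed(1)
n <- 5000
fit_vals <- rnorm(n, mean = 5, sd = 1)  # dense in the middle, sparse at the ends
res_vals <- rnorm(n, mean = 0, sd = 1)  # same spread at every fitted value

plot(fit_vals, res_vals, xlab = "Fitted values", ylab = "Residuals",
     col = rgb(0, 0, 0, 0.3))
abline(h = 0, lty = 2)

# Compare the crowded middle (within 2 units of 5) with the two sparse tails.
grp <- ifelse(abs(fit_vals - 5) <= 2, "middle", "tails")
counts <- table(grp)
ranges <- tapply(res_vals, grp, function(r) diff(range(r)))
sds <- tapply(res_vals, grp, sd)
print(data.frame(n = as.vector(counts), range = as.vector(ranges),
                 sd = as.vector(sds), row.names = names(sds)))
```

In a sketch like this the middle band usually shows the widest range, simply because it has the most points, while both standard deviations stay close to 1. Comparing a spread measure such as the SD within bins, rather than trusting the visual envelope, can help separate real heterogeneity from uneven point density.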

So I refit the model and came up with these plots (different plots for further discussion rather than direct comparison):

Here you can see considerable deviation from normality for the overall model (upper left) but okay normality within plots (lower right). The upper right plot is an okay example of what I was talking about, with changes in density making interpretation difficult. There are far more points at lower values and a sparsity of points at very high fitted values. The eye is often drawn toward the few points on the right, which makes the plot harder to read. To help with this, I like to add a loess smoother or smoothing spline (solid line) and a horizontal line at zero (broken line). The smoothing line should be approximately straight and horizontal around zero; basically, it should overlay the horizontal zero line. Here’s the code to do it in R for a fitted linear mixed model (lme1):

plot(fitted(lme1), residuals(lme1), xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(smooth.spline(fitted(lme1), residuals(lme1)))

This also helps determine whether the points are symmetrical around zero. I often also find it useful to plot the absolute values of the residuals against the fitted values. This helps visualize whether there is a trend in direction (bias), and it can also make changes in the spread of the residuals, which indicate heterogeneity, easier to see. A trend shows up as a sloping loess or smoothing-spline line. In the lower left plot, you can see little evidence of bias but some evidence of heterogeneity (change in the spread of points). Again, I am not sure whether this is bad enough to invalidate the model, but in combination with the deviation from normality I would reject the fit of this model.
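These smoother checks can be tried on simulated data where the answer is known. In this minimal sketch (the data and the helper resid_plot() are hypothetical), one model omits a needed quadratic term, so the smoother on the raw residuals bends away from zero. A second data set has error spread that grows with x, so the smoother on the absolute residuals slopes upward. The sketch uses base R’s lowess() in place of smooth.spline(); either works for this purpose. Base R’s plot() method for lm fits offers a related check, the scale-location plot (plot(fit, which = 3)), which uses the square root of the absolute standardized residuals.

```r
# Hypothetical simulated data for two common problems.
set.seed(7)
n <- 500
x <- runif(n, 0, 3)

# (1) Bias: the true mean is quadratic, but one model omits the x^2 term.
y1 <- 2 * x^2 + rnorm(n, sd = 0.5)
fit_ok  <- lm(y1 ~ x + I(x^2))
fit_bad <- lm(y1 ~ x)

# (2) Heterogeneity: the mean is linear, but the error SD grows with x.
y2 <- 2 + x + rnorm(n, sd = 0.5 * x)
fit_het <- lm(y2 ~ x)

# Residuals (or |residuals|) vs fitted, with a broken zero line for raw
# residuals and a solid lowess smoother; returns the smoother for inspection.
resid_plot <- function(fit, main, absolute = FALSE) {
  f <- fitted(fit)
  r <- residuals(fit)
  if (absolute) r <- abs(r)
  plot(f, r, main = main, xlab = "Fitted Values",
       ylab = if (absolute) "|Residuals|" else "Residuals")
  if (!absolute) abline(h = 0, lty = 2)
  sm <- lowess(f, r)
  lines(sm)
  invisible(sm)
}

op <- par(mfrow = c(1, 3))
sm_ok  <- resid_plot(fit_ok,  "x^2 included")
sm_bad <- resid_plot(fit_bad, "x^2 omitted")
sm_abs <- resid_plot(fit_het, "Spread grows with x", absolute = TRUE)
par(op)

# Largest departure from zero of each raw-residual smoother, and the rise of
# the |residual| smoother from the lowest to the highest fitted value
c(ok = max(abs(sm_ok$y)), bad = max(abs(sm_bad$y)),
  abs_rise = sm_abs$y[length(sm_abs$y)] - sm_abs$y[1])
```

The correctly specified model should give a smoother hugging zero, the misspecified one a clear U shape, and the heteroscedastic one an upward-sloping smoother on the absolute residuals.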

In a mixed model it can be important to look at variation across the values of the random effects. In my case, here is an example of fitted vs. residuals for each of the plots (random sites/subjects). I used the following code, which takes advantage of the lattice package in R:

library(lattice)
# Check for residual pattern within groups and differences between groups
xyplot(residuals(glmm1) ~ fitted(glmm1) | Count$plot,
  main = "glmm1 - full model by plot",
  panel = function(x, y) {
    panel.xyplot(x, y)
    panel.loess(x, y, span = 0.75)
    panel.lmline(x, y, lty = 2)  # least squares broken line
  }
)
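For readers without a fitted GLMM at hand, here is a self-contained version of the same lattice panel function run on simulated data. The data frame, its columns, and the ordinary lm() fit are hypothetical stand-ins for glmm1 and Count. One of the five plots (“E”) is given extra residual noise so that a difference between groups is visible, and a panel.abline() zero line is added for reference.

```r
library(lattice)

# Hypothetical data: 5 plots (A-E), with plot "E" given extra residual noise.
set.seed(3)
dat <- data.frame(plot = factor(rep(LETTERS[1:5], each = 40)),
                  x = runif(200, 0, 10))
noise_sd <- ifelse(dat$plot == "E", 3, 1)
dat$y <- 2 + 0.5 * dat$x + rnorm(200, sd = noise_sd)

fit <- lm(y ~ x, data = dat)  # stand-in for the fitted GLMM
dat$fit_val <- fitted(fit)
dat$res_val <- residuals(fit)

p <- xyplot(res_val ~ fit_val | plot, data = dat,
            main = "Residuals vs fitted by plot",
            panel = function(x, y) {
              panel.xyplot(x, y)
              panel.loess(x, y, span = 0.75)
              panel.lmline(x, y, lty = 2)          # least squares broken line
              panel.abline(h = 0, col = "grey50")  # zero reference line
            })
print(p)  # draw the plot (an assignment does not auto-print)

# Residual SD within each plot; plot "E" should stand out
sd_by_plot <- tapply(dat$res_val, dat$plot, sd)
round(sd_by_plot, 2)
```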

And here is another way to visualize a mixed model:

You can see that the variation across the two random effects (Plot and Year) looks much better in this model, but there are problems with normality and potentially heterogeneity. Since violations of normality are of less concern than violations of the other assumptions, I wonder whether this model is completely invalid or whether I could still make some inference from it. I don’t know and would welcome expert opinion.
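A simple complement to panel plots is a boxplot of the residuals for each level of a random effect. The sketch below uses simulated residuals for four hypothetical years (Y1 to Y4), two of which are given three times the spread, and reports the interquartile range per year. It is an illustration of the display, not a reconstruction of the figure above.

```r
# Hypothetical residuals for 4 survey years; years Y2 and Y4 get extra spread
set.seed(8)
year <- factor(rep(paste0("Y", 1:4), each = 100))
res <- rnorm(400, sd = c(1, 3, 1, 3)[as.integer(year)])

# One box per level of the grouping variable, with a zero reference line
boxplot(res ~ year, xlab = "Year", ylab = "Residuals")
abline(h = 0, lty = 2)

# Interquartile range of the residuals within each year
iqr_by_year <- tapply(res, year, IQR)
round(iqr_by_year, 2)
```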

Regardless, this model was fit using a Poisson GLMM, and the deviance divided by the residual degrees of freedom (df) was 5.13, which is much greater than 1, indicating overdispersion. Therefore, I tried to fit the regression using a negative binomial distribution:

# Using glmmPQL via MASS package
library(MASS)
# Recommended to run model first as non-mixed to get a starting value for the theta estimate:
# negbin
glmNB1
summary(glmNB1)
#anova(glmNB1)
#plot(glmNB1)
# Now run full GLMM with initial theta starting point from glm
glmmPQLnb1

Unfortunately, I got the following validation plots:

Clearly, this model doesn’t work for the data. This is quite surprising given the fit of the Poisson model, and given that the negative binomial is a more general distribution than the Poisson and usually handles overdispersed count data well. I’m not sure what the problem is in this case.
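For anyone who wants to try the two-step approach in the negative binomial code above end to end, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data frame dat, the columns count, temp, and plot, and the model formulas (the original formulas are not shown). It first checks a Poisson glm (with plot as a fixed factor, a simplification of the Poisson GLMM) for overdispersion using deviance divided by residual df. It then fits MASS::glm.nb() to get theta and passes that theta to glmmPQL() through the negative.binomial() family. Note that with negative.binomial() the theta value is held fixed during the glmmPQL fit rather than re-estimated.

```r
library(MASS)  # glm.nb(), negative.binomial(), glmmPQL() (glmmPQL also needs nlme)

# Hypothetical simulated counts: 5 plots x 60 visits, one covariate `temp`,
# a random plot effect, and negative binomial noise.
set.seed(21)
dat <- data.frame(plot = factor(rep(1:5, each = 60)), temp = rnorm(300))
plot_eff <- rnorm(5, sd = 0.4)
mu <- exp(0.5 + 0.4 * dat$temp + plot_eff[as.integer(dat$plot)])
dat$count <- rnbinom(300, size = 2, mu = mu)

# Step 0: overdispersion check on a Poisson fit (deviance / residual df)
glm_pois <- glm(count ~ temp + plot, family = poisson, data = dat)
disp <- deviance(glm_pois) / df.residual(glm_pois)

# Step 1: non-mixed negative binomial fit to obtain theta
glm_nb <- glm.nb(count ~ temp, data = dat)
theta0 <- glm_nb$theta

# Step 2: negative binomial GLMM by PQL; theta is held fixed at theta0
glmm_nb <- glmmPQL(count ~ temp, random = ~ 1 | plot,
                   family = negative.binomial(theta = theta0),
                   data = dat, verbose = FALSE)

c(dispersion = disp, theta = theta0)
summary(glmm_nb)
```

Because the simulated counts are overdispersed by construction, the dispersion value should come out above 1 here; with real data the same checks and residual plots apply to the resulting fit.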

Next, I tried to run the model as if all observations were random (i.e., with an observation-level random effect):

glmmObs1

Again, I ended up with more problematic validation/diagnostic plots:

So that's about it for now. Hopefully this post helps some people with model validation and interpretation of fitted vs. residual plots. I would love to hear opinions regarding interpretation of residuals and when some pattern is too much and when it is acceptable. Let me know if you have examples of other more subtle residual plots.
Happy coding and may all your analyses run smoothly and provide clear interpretations!