10.4.2. Dispersion in statistical models

For a binomial distribution, the variance np(1-p) depends on the mean np. When the variance in the
observations is larger or smaller than the variance expected under the model, the
data are said to show over- or under-dispersion, respectively. Both types of
dispersion are indicated in goodness-of-fit tests by the ratio of the residual
deviance of the fitted model to its residual degrees of freedom: values
appreciably larger than 1 indicate over-dispersion and values appreciably lower
than 1 indicate under-dispersion. Both types can strongly affect, and even
invalidate, hypothesis testing and related inference (standard errors,
confidence intervals and p-values). See
Twisk (2010), Zuur et al. (2009),
Hardin and Hilbe (2007) and Myers et al.
(2002) for examples. Under- or over-dispersion can arise from the frequency
characteristics of the data, with relatively small and large
beekeepers/operations present in different numbers (heterogeneity of the sample
population). An important assumption of the binomial distribution, namely
independence of observations (independent Bernoulli trials), may be violated
when losses are not independent but clustered through an unknown factor (e.g.
effects of a certain location or the incidence of pathogens) that cannot be
(properly) included in the model.
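
The dispersion ratio described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming the simplest possible model (a single pooled loss rate with no predictors); the function name `binomial_dispersion` and the example figures are hypothetical and not taken from any survey. It reports both the residual deviance and the Pearson chi-square statistic divided by the residual degrees of freedom.

```python
import math

def binomial_dispersion(colonies, losses):
    """Dispersion ratios for an intercept-only binomial model.

    colonies[i] : number of colonies kept by beekeeper i (trial size n_i)
    losses[i]   : number of colonies lost by beekeeper i (y_i)
    Assumes the pooled loss rate lies strictly between 0 and 1.
    """
    p_hat = sum(losses) / sum(colonies)   # pooled loss rate (maximum likelihood)
    df = len(colonies) - 1                # observations minus one fitted parameter

    def xlogy(x, y):
        # x * log(x / y), with the usual convention 0 * log(0) = 0
        return 0.0 if x == 0 else x * math.log(x / y)

    deviance = 0.0
    pearson = 0.0
    for n, y in zip(colonies, losses):
        mu = n * p_hat                    # expected losses for this beekeeper
        deviance += 2.0 * (xlogy(y, mu) + xlogy(n - y, n - mu))
        pearson += (y - mu) ** 2 / (mu * (1.0 - p_hat))

    return {
        "p_hat": p_hat,
        "df": df,
        "deviance_ratio": deviance / df,
        "pearson_ratio": pearson / df,
    }

# Hypothetical data: eight beekeepers with very different loss rates.
colonies = [10, 12, 8, 20, 15, 30, 6, 25]
losses = [0, 6, 1, 2, 9, 3, 0, 12]

result = binomial_dispersion(colonies, losses)
print(result)
```

Ratios well above 1, as for these made-up figures, would point to over-dispersion. In a real analysis the ratio would be taken from the fitted model that includes the relevant predictors, not from a pooled rate.
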
When under- or over-dispersion
is not reduced after including the most significant model factors derived from
the data and/or stratifying the available data according to binomial trial
size, a solution is to use a different distribution for the dependent variable.
A suitable candidate is the quasi-binomial distribution, in which the variance
is characterised by adding an extra parameter to the binomial distribution, so
that hypothesis testing can be corrected for the extra-binomial variance. The
form of the quasi-binomial probability distribution is:
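
Whatever the exact probability function, the correction applied in practice by common generalised linear model software depends only on the mean-variance relationship. The following is a minimal sketch of that quasi-likelihood formulation, not the probability function itself. It assumes Y losses out of n colonies with loss probability p. The symbols \phi (the additional dispersion parameter) and X^2 (the Pearson chi-square statistic of the fitted model) are introduced here for illustration and do not appear in the text above.

```latex
\begin{align}
E(Y) &= n p, \\
\operatorname{Var}(Y) &= \phi \, n p (1 - p), \\
\widehat{\phi} &= \frac{X^2}{\text{residual degrees of freedom}}, \qquad
\mathrm{SE}_{\text{quasi}} = \sqrt{\widehat{\phi}} \; \mathrm{SE}_{\text{binomial}} .
\end{align}
```

Setting \phi = 1 recovers the binomial variance; \phi > 1 corresponds to over-dispersion and \phi < 1 to under-dispersion. Standard errors are inflated or deflated accordingly.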

See the online manual by Kindt and Coe (2005) for an
excellent example of the use of a quasi-binomial distribution and how it
differs from the standard binomial distribution. An excess of zero
values (no loss) can also be a cause of over-dispersion. To investigate the
relation between predictor variables and the presence of zero values (no loss),
zero-inflation techniques can be used (for example, Hall, 2000).
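
Before turning to a zero-inflated model, a quick descriptive check is to compare the observed number of beekeepers with no loss against the number expected under a plain binomial model. The sketch below is a rough screening aid, not the zero-inflated regression of Hall (2000). It assumes a single pooled loss rate, and the helper name `expected_zero_count` and the data are hypothetical.

```python
def expected_zero_count(colonies, p_hat):
    """Expected number of beekeepers reporting no loss under a binomial model.

    For a beekeeper with n colonies and loss probability p,
    P(no loss) = (1 - p) ** n; summing over beekeepers gives the expectation.
    """
    return sum((1.0 - p_hat) ** n for n in colonies)

# Hypothetical data: colonies kept and colonies lost per beekeeper.
colonies = [10, 12, 8, 20, 15, 30, 6, 25]
losses = [0, 6, 1, 2, 9, 3, 0, 12]

p_hat = sum(losses) / sum(colonies)                 # pooled loss rate
observed_zeros = sum(1 for y in losses if y == 0)   # beekeepers with no loss
expected_zeros = expected_zero_count(colonies, p_hat)

print("observed zeros:", observed_zeros)
print("expected zeros under binomial:", round(expected_zeros, 3))
```

An observed count clearly above the expected count suggests that excess zeros may be contributing to over-dispersion and that a zero-inflation approach is worth considering.
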