In modeling claim-count data in an insurance setting, I began with a Poisson model but then noticed overdispersion. A quasi-Poisson model handled the inflated mean-variance relationship better than the basic Poisson, but I noticed that the coefficient estimates were identical in the two models.

If this isn't an error, why is this happening? What is the benefit of using Quasi-Poisson over Poisson?

Things to note:

The underlying losses are on an excess basis, which (I believe) prevented the Tweedie from working - but it was the first distribution I tried. I also examined NB, ZIP, ZINB, and Hurdle models, but still found the Quasi-Poisson provided the best fit.

I tested for overdispersion via dispersiontest() in the AER package. My dispersion parameter was approximately 8.4, with a p-value on the order of $10^{-16}$.

I am fitting the models with glm(), using family = poisson or family = quasipoisson and a log link.
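For concreteness, here is a minimal sketch of the setup described above. The data frame and column names (claims, x, exposure) are invented for illustration, and overdispersed counts are simulated from a negative binomial rather than drawn from the real excess-basis losses:

```r
## Sketch of the fitting setup; all data and names are illustrative.
set.seed(1)
claims <- data.frame(x = rnorm(500), exposure = runif(500, 0.5, 2))
mu <- claims$exposure * exp(0.3 + 0.5 * claims$x)
claims$n <- rnbinom(500, mu = mu, size = 0.5)  # overdispersed counts

fit_pois  <- glm(n ~ x + offset(log(exposure)),
                 family = poisson(link = "log"), data = claims)
fit_quasi <- glm(n ~ x + offset(log(exposure)),
                 family = quasipoisson(link = "log"), data = claims)

# Point estimates are identical across the two families:
all.equal(coef(fit_pois), coef(fit_quasi))

# Overdispersion test, if the AER package is installed:
if (requireNamespace("AER", quietly = TRUE)) {
  print(AER::dispersiontest(fit_pois))  # H0: equidispersion (dispersion = 1)
}
```

Note the offset(log(exposure)) term, per the comment below about modeling rates.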

Tried Tweedie from the get-go, but our loss data is not ground-up but rather on an excess basis. Also tried negative binomial, ZIP, and hurdle models to address the count dispersion.
– Frank H., Oct 14 '15 at 15:26


Can you explain a bit more about where the non-integer values in your data come from?
– Ben Bolker, Oct 14 '15 at 15:37


You should not model frequencies/rates by computing ratios of counts/exposure. Rather, you should add an offset term, offset(log(exposure)), to your models.
– Ben Bolker, Oct 14 '15 at 15:54


It's practical, although most important when doing Poisson (not quasi-Poisson) modeling. I don't know of a good reference offhand; if you can't find a relevant answer here on CrossValidated, it would make a fine follow-up question.
– Ben Bolker, Oct 14 '15 at 16:21

1 Answer

This is almost a duplicate; the linked question explains that you shouldn't expect the coefficient estimates, residual deviance, or degrees of freedom to change. The only thing that changes in moving from Poisson to quasi-Poisson is that a scale parameter, previously fixed at 1, is estimated from some measure of residual variability/badness of fit (usually the sum of squared Pearson residuals ($\chi^2$) divided by the residual df, although asymptotically the residual deviance gives the same result). The consequence is that the standard errors are multiplied by the square root of this scale parameter, with concomitant changes in the confidence intervals and $p$-values.
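The scale parameter and the SE inflation can be reproduced by hand. A sketch with simulated overdispersed data (all names illustrative):

```r
## Reproduce the quasi-Poisson scale parameter and SE scaling by hand.
set.seed(1)
d <- data.frame(x = rnorm(500))
d$n <- rnbinom(500, mu = exp(0.3 + 0.5 * d$x), size = 0.5)

fit_pois  <- glm(n ~ x, family = poisson,      data = d)
fit_quasi <- glm(n ~ x, family = quasipoisson, data = d)

# Pearson chi-square divided by residual df -- this is the scale
# parameter that summary(fit_quasi) reports as the dispersion:
phi <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Quasi-Poisson SEs are the Poisson SEs times sqrt(phi):
se_pois  <- sqrt(diag(vcov(fit_pois)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))
all.equal(se_quasi, se_pois * sqrt(phi))
```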

The benefit of quasi-likelihood is that it fixes the basic fallacy of assuming that the data are Poisson (= homogeneous, independent counts); however, fixing the problem in this way potentially masks other issues with the data. (See below.) Quasi-likelihood is one way of handling overdispersion; if you don't address overdispersion in some way, your coefficients will be reasonable but your inference (CIs, $p$-values, etc.) will be garbage.

As you comment above, there are lots of different approaches to overdispersion (Tweedie, different negative binomial parameterizations, quasi-likelihood, zero-inflation/alteration).

With an overdispersion factor above 5 (8.4 here), I would worry a bit about whether it is being driven by some kind of model mis-fit (outliers, zero-inflation [which I see you've already tried], nonlinearity) rather than representing across-the-board heterogeneity. My general approach to this is graphical exploration of the raw data and regression diagnostics ...
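The diagnostics mentioned above might be sketched as follows (simulated data and names are illustrative, not the asker's model):

```r
## Sketch of graphical regression diagnostics for a fitted count model.
set.seed(1)
d <- data.frame(x = rnorm(500))
d$n <- rnbinom(500, mu = exp(0.3 + 0.5 * d$x), size = 0.5)
fit <- glm(n ~ x, family = quasipoisson, data = d)

# Residuals vs fitted: systematic curvature would suggest nonlinearity
# on the log scale rather than across-the-board heterogeneity.
plot(fitted(fit), residuals(fit, type = "pearson"),
     xlab = "Fitted values", ylab = "Pearson residuals")
abline(h = 0, lty = 2)

# Residuals vs each predictor, to look for patterns:
plot(d$x, residuals(fit, type = "pearson"),
     xlab = "x", ylab = "Pearson residuals")
```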

Very helpful. I see now that the p-values for the variables and factor levels in the Poisson model are much more statistically significant than in the quasi-Poisson, due to the scaling you mentioned. I did test for outliers but did not find them to be an issue. What other issues might be masked by overdispersion, and what approaches would find them?
– Frank H., Oct 14 '15 at 15:46

Mostly non-linearity of responses on the link (log) scale; check residuals-vs-fitted and residuals-vs-predictor plots to see if there are patterns.
– Ben Bolker, Oct 14 '15 at 15:53