Re: st: Omit Constant from Count Models

On Fri, Sep 14, 2012 at 2:58 PM, Habiger, Matt wrote:
> I'm hoping somebody can inform me of what impact(s) omitting a constant term from count models, such as poisson or negative binomial, have? Does it impact t-statistics or the validity of coefficient estimates?
>
> I'm modeling the number of days a patient spends in a hospital for a given year and the constant is causing the predicted visits distribution to start at ~4 days (exp(1.46)). In the actual data, roughly 25% of days are below 4 (only those with visits are being modeled). When I drop the constant my estimates are much closer to resembling the actual distribution. Below are the outputs from two models for reference.
>
>
> Truncated negative binomial regression            Number of obs   =       1334
> Truncation point: 0                               LR chi2(5)      =      99.50
> Dispersion     = mean                             Prob > chi2     =     0.0000
> Log likelihood = -3639.7431                       Pseudo R2       =     0.0135
>
> ------------------------------------------------------------------------------
>     inpunits |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>   claims2009 |   .0047266   .0020829     2.27   0.023     .0006442    .0088091
> previnpunits |   .0290032    .007004     4.14   0.000     .0152755    .0427308
>        age09 |   .0041179   .0016468     2.50   0.012     .0008903    .0073455
>   inplow_ind |   .1627803    .080974     2.01   0.044     .0040741    .3214865
>  inphigh_ind |   .2904038    .078893     3.68   0.000     .1357764    .4450312
>        _cons |   1.469507   .0654664    22.45   0.000     1.341195    1.597818
> -------------+----------------------------------------------------------------
>     /lnalpha |   -.418825   .0693098                     -.5546696   -.2829804
> -------------+----------------------------------------------------------------
>        alpha |   .6578193   .0455933                       .574262    .7535346
> ------------------------------------------------------------------------------
> Likelihood-ratio test of alpha=0:  chibar2(01) = 2745.11 Prob>=chibar2 = 0.000

The statement that "the constant is causing the predicted visits
distribution to start at ~4 days (exp(1.46))" is not quite true. Your
results say that for a (hypothetical) observation with the value 0 on
all of the variables claims2009, previnpunits, age09, inplow_ind and
inphigh_ind you would predict that such a person would stay about 4
days in hospital. (Strictly, in a truncated model exp(_cons) is the
mean of the underlying untruncated distribution; the expected stay
given at least one day is somewhat higher.) Depending on these
variables, this can be a gross extrapolation.
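
As a quick check of that arithmetic, here is a minimal sketch in Stata.
The first line only uses the _cons estimate from the output quoted
above; the second assumes you have just run -tnbreg- with inpunits as
the dependent variable, so that the coefficient is still in memory.

```stata
* Baseline expected count implied by the constant alone, i.e. with
* every covariate set to 0 (value copied from the posted output)
display exp(1.469507)

* The same quantity from the stored results, right after -tnbreg-
* (assumes the model above is the most recent estimation command)
display exp(_b[inpunits:_cons])
```

Both lines should show a value a little over 4, which is where the
"~4 days" in the question comes from.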

In general you do not want to leave the constant out. The idea that
leaving the constant out will lead to better predictions is certainly
wrong. But don't take my word for it, try it out: estimate the model
with and without the constant, use -predict- to predict the expected
days in hospital for each of these models and plot both against the
observed days in hospital.
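
The comparison suggested above could look something like the sketch
below. It assumes the variable names from the posted output, that the
model was fit with -tnbreg- and ll(0), and that the conditional mean
(the cm option of -predict-) is the prediction of interest since only
patients with at least one day are modeled. The names yhat_cons and
yhat_nocons are made up for this illustration.

```stata
* Model with the constant
tnbreg inpunits claims2009 previnpunits age09 inplow_ind inphigh_ind, ll(0)
predict double yhat_cons, cm      // E(y | y > 0), hypothetical name

* Same model without the constant
tnbreg inpunits claims2009 previnpunits age09 inplow_ind inphigh_ind, ///
    ll(0) noconstant
predict double yhat_nocons, cm    // hypothetical name

* Compare both sets of predictions with the observed days
summarize inpunits yhat_cons yhat_nocons
twoway (scatter yhat_cons inpunits, msymbol(oh))             ///
       (scatter yhat_nocons inpunits, msymbol(x))            ///
       (function y = x, range(inpunits)),                    ///
       ytitle("Predicted days") xtitle("Observed days")      ///
       legend(order(1 "with constant" 2 "without constant"   ///
                    3 "perfect prediction"))
```

Keep in mind that the predictions are expected values, so they will
always be less spread out than the observed days; what matters is which
set of points sits closer to the 45-degree line.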

As always there are exceptions. I think the most common valid reason
for leaving out the constant is when you put it back in through the
backdoor by the way you enter categorical variables, see: M.L. Buis
(2012) "Stata tip 106: With or without reference", The Stata Journal,
12(1), pp. 162-164. You may also be modeling a physical process, where
there is very strong a priori evidence that there can be no constant.
But any process involving humans is just too random to fall in that
class of models.
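
To illustrate the "backdoor" case, here is a small sketch using the
auto data that ships with Stata as a stand-in (it is not your data, and
rep78 is treated as a count purely for illustration). Entering every
category with ibn. and dropping the constant is the same model as the
usual reference-category version, just parameterized differently.

```stata
sysuse auto, clear

* Usual parameterization: constant plus a reference category
poisson rep78 i.foreign mpg
display e(ll)

* No constant, but one indicator for every category of foreign
poisson rep78 ibn.foreign mpg, noconstant
display e(ll)
```

The two log likelihoods should agree, because the constant has not
really been removed: it has been absorbed into the category indicators.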

Hope this helps,
Maarten
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany
http://www.maartenbuis.nl
---------------------------------