st: Regression diagnostics with panel data (-xtreg-)

Dear Statalisters,
I have run into a few difficulties with regression diagnostics after a
fixed-effects regression on panel data (-xtreg, fe-).
Previous Statalist threads give some hints, but in several cases ambiguity
remains. Below, I follow the excellent structure of UCLA's Stata Web
Book on regression diagnostics
(http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter2/statareg2.htm).
Open questions are marked with "Q:".
I would very much appreciate your input!
I am not an expert in this field, so apart from the open questions there
may be some inaccuracies in what follows.
My panel data set is unbalanced and covers 26 years, 40,000 individuals,
and 315,000 observations. We cluster standard errors by region*year (-xtreg
..., fe vce(cluster region_svyyear)-). We use Stata/SE 11.2.
I have already looked at -findit test panel- and searched the Statalist
archives for "diagnostics xtreg", "diagnostics panel", and other terms.
Kind regards,
Tobias
Center for Interdisciplinary Economics
University of Muenster, Germany
**************************************
1. UNUSUAL AND INFLUENTIAL DATA
After -regress-, -predict- can calculate standardized residuals, leverage,
Cook's D, and DFITS, which can be used to identify outliers and influential
observations. None of these seems to be available after -xtreg-.
Q: How can you identify influential observations after a (fixed effects)
panel regression? Is it OK to use -regress- with the same equation and do
the diagnostics with DFITS etc. (as suggested here:
http://www.stata.com/statalist/archive/2006-05/msg00075.html)?
Remedy if influential observations are found: Exclude observations beyond
the cutoff values and check whether the results change.
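A minimal sketch of the -regress- route follows, assuming it is acceptable to work with within-transformed data. Subtract individual means from the dependent variable and the regressors, then run -regress-. This reproduces the -xtreg, fe- slope coefficients, so the usual -predict- options become available.

Stata's nlswork example data (-webuse nlswork-) and the regressor list stand in for our own variables. The names m_*, w_*, dfit, and flag are hypothetical. The leverage values omit the 1/T_i contribution of the individual effects, and the standard errors are not corrected for the degrees of freedom those effects absorb. I would therefore treat these numbers as a screening device only.

```stata
* Sketch: influence diagnostics via -regress- on within-transformed data
* (nlswork and the regressor list are stand-ins for our own data)
webuse nlswork, clear
xtset idcode year
local xvars age ttl_exp tenure not_smsa south

* keep complete cases so the demeaning uses the estimation sample
drop if missing(ln_wage, age, ttl_exp, tenure, not_smsa, south)

* subtract individual means (the within transformation behind -xtreg, fe-)
foreach v of varlist ln_wage `xvars' {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}
local wvars
foreach v of local xvars {
    local wvars `wvars' w_`v'
}

* same slopes as -xtreg, fe-; standard errors are NOT df-corrected
regress w_ln_wage `wvars'
predict double rstd, rstandard
predict double lev, leverage
predict double cook, cooksd
predict double dfit, dfits

* flag |DFITS| > 2*sqrt(k/n), the rule of thumb from the UCLA Web Book
generate byte flag = abs(dfit) > 2*sqrt((e(df_m) + 1)/e(N))
tabulate flag

* compare the fixed-effects results with and without flagged observations
xtreg ln_wage `xvars', fe vce(cluster idcode)
xtreg ln_wage `xvars' if !flag, fe vce(cluster idcode)
```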
2. NORMALITY OF RESIDUALS
The overall error component e can be predicted after -xtreg- (-predict res,
e-).
Q: I assume it is e that should be normally distributed, not the fixed
error component u?
I would then check the overall error component for normality with graphs
(-kdensity res, normal-, -pnorm res-, -qnorm res-) and formal tests (-iqr
res-, -jb res-, -sktest res-).
Remedy if assumption is violated: Transform variables, or use
bootstrapping, which does not assume normality, to calculate standard
errors and t-values, and check whether the results change.
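A sketch of these checks follows, using the nlswork example data with an arbitrary regressor list as a stand-in. -predict, e- gives e_it and -predict, u- gives the estimated fixed effects u_i, so both can be inspected. The variable and graph names are hypothetical.

With tens of thousands of observations (let alone 315,000), -sktest- is likely to reject even minor departures from normality. The plots are therefore probably more informative than the p-values.

```stata
* Sketch: normality checks on the idiosyncratic error after -xtreg, fe-
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age ttl_exp tenure not_smsa south, fe vce(cluster idcode)

* e_it = y_it - xb_it - u_i; u_i is the estimated fixed effect
predict double e_it if e(sample), e
predict double u_i if e(sample), u

* graphical checks, each kept in its own named graph window
kdensity e_it, normal name(kd_e, replace)
pnorm e_it, name(pn_e, replace)
qnorm e_it, name(qn_e, replace)

* formal test (official command); very sensitive in large samples
sktest e_it
```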
3. HOMOSCEDASTICITY
Plot residuals vs. fitted values and check for patterns. See also:
http://www.stata.com/support/faqs/stat/panel.html.
Remedy if assumption is violated: Use robust standard errors, either with
-xtreg, vce(robust)- or -xtreg, vce(cluster ...)-.
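A sketch of the residual-versus-fitted plot after -xtreg, fe- follows, again on the nlswork example data with an arbitrary regressor list. Here "fitted" is taken to include the estimated fixed effect (-predict, xbu-), which is one reasonable choice among several. The variable names are hypothetical. With 315,000 observations a scatter plot becomes dense, so small markers (or a random subsample) help.

```stata
* Sketch: residuals vs. fitted values after -xtreg, fe-
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age ttl_exp tenure not_smsa south, fe

* fitted value including the fixed effect, and the idiosyncratic residual
predict double fitted if e(sample), xbu
predict double e_it if e(sample), e

* look for a fan shape or other systematic pattern around zero
scatter e_it fitted, msize(vtiny) yline(0)

* remedy: cluster-robust standard errors (clustered by panel id here)
xtreg ln_wage age ttl_exp tenure not_smsa south, fe vce(cluster idcode)
```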
4. MULTICOLLINEARITY
Calculating variance inflation factors (VIFs) seems to be the standard
approach to checking for multicollinearity. Again, -estat vif- is available
after -regress- but not after -xtreg-.
It has been suggested to create case- and time-specific dummies, run
-regress- with all dummies as an equivalent of -xtreg, fe-, and then compute
VIFs (http://www.stata.com/statalist/archive/2005-08/msg00018.html).
However, with 40,000 individuals it does not seem practical to run -regress-
with tens of thousands of dummies.
Another thread suggests that multicollinearity does not depend on the
dependent variable or the link function
(http://www.stata.com/statalist/archive/2003-12/msg00333.html). Thus, you
could, for example, use -collin- to calculate VIFs even before running
-xtreg- or any other regression command.
Remedy if assumption is violated: Leave out variables causing
multicollinearity.
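A middle way between -collin- on the raw variables and -regress- with 40,000 dummies can be sketched as follows. It assumes the relevant collinearity is the one among the within-transformed regressors that -xtreg, fe- actually uses. Demean each regressor by individual, run -regress- on the demeaned variables, and call -estat vif-.

These VIFs are not identical to those from the full dummy-variable regression, which also count collinearity with the dummies. nlswork and the m_*/w_* names are stand-ins.

```stata
* Sketch: VIFs on within-transformed regressors (no dummies needed)
webuse nlswork, clear
xtset idcode year
local xvars age ttl_exp tenure not_smsa south
drop if missing(ln_wage, age, ttl_exp, tenure, not_smsa, south)

* demean each regressor by individual; VIFs do not depend on the outcome
local wvars
foreach v of local xvars {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
    local wvars `wvars' w_`v'
}

* VIFs based on the within variation used by -xtreg, fe-
regress ln_wage `wvars'
estat vif

* for comparison: VIFs based on the raw (pooled) regressors
regress ln_wage `xvars'
estat vif
```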
5. LINEARITY
I would think that a check for linearity is independent of the regression
method used. If so, you could test for neglected nonlinearities with
Ramsey's RESET, using -estat ovtest- (or -ivreset-, which has more options)
after -regress-. -nlcheck- after -xtreg- might give more information on the
(non)linearity of individual regressors. Graphically, you can always check
scatter plots of the dependent variable against each regressor. Another
graphical method suggested in the UCLA Web Book is the augmented
component-plus-residual plot (-acprplot-) after -regress-.
Q: However, I would think that any (graphical) analysis based on residuals
(such as -acprplot- or -rvpplot-) is sensitive to whether -regress- or
-xtreg, fe- is used. Correct?
Remedy if assumption is violated: Transform variables.
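One possible approach to the question above, as a sketch, rests on the Frisch-Waugh-Lovell theorem. The residuals from -regress- on individual-demeaned data equal the e_it residuals of -xtreg, fe-. Residual-based plots such as -acprplot- can therefore be drawn from that regression, and they then reflect the fixed-effects model rather than pooled OLS. The plot shows the within relationship for the chosen regressor.

nlswork, the regressor list, and the m_*/w_*/r_within names are assumptions. Lowess smoothing can be slow with several hundred thousand observations.

```stata
* Sketch: component-plus-residual plot based on fixed-effects residuals
webuse nlswork, clear
xtset idcode year
local xvars age ttl_exp tenure not_smsa south
drop if missing(ln_wage, age, ttl_exp, tenure, not_smsa, south)

* within transformation of the outcome and the regressors
foreach v of varlist ln_wage `xvars' {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* residuals of this regression equal e_it from -xtreg, fe-
regress w_ln_wage w_age w_ttl_exp w_tenure w_not_smsa w_south
predict double r_within, residuals

* augmented component-plus-residual plot for one continuous regressor
acprplot w_tenure, lowess
```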
6. MODEL SPECIFICATION
I would think that whether the model omits a relevant variable or includes
an irrelevant one is often more a theoretical question than something to be
tested graphically or formally. Or am I wrong?
The UCLA Web Book suggests -linktest- which works after -regress- but not
after -xtreg-.
Q: Is there a graphical or any other formal test for omitted or irrelevant
variables after -xtreg-?
Remedy if assumption is violated: Exclude or include variables.
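-linktest- is documented for -regress-. A hand-rolled analogue for -xtreg, fe- can be sketched by mimicking its logic: refit the fixed-effects model on its own linear prediction and the square of that prediction, then check whether the squared term is significant.

This is not an official Stata command. It uses the nlswork data, arbitrary regressors, and the hypothetical names hat/hatsq. The second-stage standard errors also ignore that hat is itself estimated.

```stata
* Sketch: a -linktest- style specification check after -xtreg, fe-
webuse nlswork, clear
xtset idcode year
local xvars age ttl_exp tenure not_smsa south

* step 1: fit the model and keep its linear prediction
xtreg ln_wage `xvars', fe vce(cluster idcode)
predict double hat if e(sample), xb
generate double hatsq = hat^2

* step 2: a significant hatsq would point to misspecification
xtreg ln_wage hat hatsq, fe vce(cluster idcode)
test hatsq
```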
7. INDEPENDENCE
Several commands can be used to test for autocorrelation of the error term
in panel data: -xtserial-, -xttest1-, and -pantest2- (see also:
http://www.stata.com/support/faqs/stat/panel.html).
Remedy if assumption is violated: Use -xtregar, fe- to fit fixed effects
model with first-order autoregressive error term.
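A sketch of the Wooldridge test that -xtserial- implements follows, written out by hand so the logic is visible. If e_it is serially uncorrelated, the first-differenced errors have a first-order autocorrelation of -0.5. The example then fits -xtregar, fe- with the -lbi- option, which reports the Baltagi-Wu LBI statistic and a modified Durbin-Watson statistic.

nlswork, the regressor list, and the name de are stand-ins. The hand-rolled version may differ in details from -xtserial-.

```stata
* Sketch: Wooldridge-type test for serial correlation, by hand
webuse nlswork, clear
xtset idcode year
local xvars age ttl_exp tenure not_smsa south

* first differences remove the fixed effects
regress D.ln_wage D.(`xvars'), noconstant vce(cluster idcode)
predict double de if e(sample), residuals

* under no serial correlation in e_it, the coefficient on L.de is -0.5
regress de L.de, noconstant vce(cluster idcode)
test L.de = -0.5

* the user-written -xtserial- (-findit xtserial-) automates this:
* xtserial ln_wage `xvars'

* remedy: fixed effects with AR(1) disturbances
xtregar ln_wage `xvars', fe lbi
```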