Load and summarize the data

STATA:

use "C:\Users\CRIME3.dta"
des
sum

R:

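[sourcecode language="r"]
library(foreign)  # provides read.dta() for STATA .dta files
crime <- read.dta(file = "/Users/CRIME3.dta")
sumstats(crime)  # user-written summary function (see note below)
as.matrix(sapply(crime, class))  # variable types, similar to STATA's des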
[/sourcecode]
If you haven’t yet loaded in the sumstats function, I suggest you do – you can find the code here.
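
In case the original function is not at hand, a rough stand-in can be put together in base R. The sketch below is only that: the name sumstats_sketch and the columns it reports (observations, mean, standard deviation, min, max, roughly what STATA's sum prints) are my assumptions, not the original sumstats code, and the toy data frame is made up for illustration.

```r
# Hypothetical stand-in for sumstats(); NOT the original function.
sumstats_sketch <- function(df) {
  # Keep numeric columns only
  num <- df[sapply(df, is.numeric)]
  # One row per variable, one column per statistic
  t(sapply(num, function(x) {
    c(Obs  = sum(!is.na(x)),
      Mean = mean(x, na.rm = TRUE),
      SD   = sd(x, na.rm = TRUE),
      Min  = min(x, na.rm = TRUE),
      Max  = max(x, na.rm = TRUE))
  }))
}

# Toy data, for illustration only
toy <- data.frame(a = c(1, 2, 3, NA),
                  b = c(10, 20, 30, 40),
                  g = c("w", "x", "y", "z"))
stats <- sumstats_sketch(toy)
print(round(stats, 3))
```

With the real data, the call would simply be sumstats_sketch(crime).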

A hypothesis test

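See Part 2 of this series for a primer on hypothesis testing. Here, we will do one more example of testing a linear restriction. We start from the regression equation:

$lcrime_{it} = \beta_0 + \delta_0 d78_t + \beta_1 clrprc1_{it} + \beta_2 clrprc2_{it} + a_i + u_{it}$

where $a_i$ are “district” fixed effects, and $u_{it}$ is a white noise error term.
We would like to test the following hypothesis:

$H_0: \beta_1 = \beta_2$

This can be re-written in matrix form:

$R\beta = q$

Where $R$ is a row vector with $1$ in the position of $\beta_1$, $-1$ in the position of $\beta_2$ and $0$ elsewhere, and $q = 0$.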
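
To make the matrix form concrete, here is a minimal sketch of the test computed by hand in base R. It runs on simulated data (not CRIME3), the variable names x1 and x2 are placeholders, and the model has no fixed effects, so treat it as an illustration of the mechanics rather than a replication of the regression above.

```r
# Simulated data for illustration only; the true coefficients on x1 and x2 are equal
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# H0: beta_x1 - beta_x2 = 0, written as R %*% beta = q
R <- matrix(c(0, 1, -1), nrow = 1)  # columns follow coef(fit): (Intercept), x1, x2
q <- 0

b   <- coef(fit)
V   <- vcov(fit)
gap <- R %*% b - q

# Wald F statistic: one restriction, residual degrees of freedom in the denominator
Fstat <- as.numeric(t(gap) %*% solve(R %*% V %*% t(R)) %*% gap) / nrow(R)
pval  <- pf(Fstat, df1 = nrow(R), df2 = df.residual(fit), lower.tail = FALSE)

# Cross-check: with a single restriction, F equals the squared t statistic
tstat <- as.numeric(gap) / sqrt(as.numeric(R %*% V %*% t(R)))
print(c(F = Fstat, t_squared = tstat^2, p_value = pval))
```

The same R and q objects are what a function such as glh.test expects as its contrast matrix and right-hand side.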

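[sourcecode language="r"]
# Equivalently, we can skip creating the R and q matrices
# and use this streamlined approach (linearHypothesis is in the car package):
library(car)
linearHypothesis(reg1a, "clrprc1 = clrprc2")

# Or, we can use the glh.test function in the gmodels package,
# passing it the R and q matrices
library(gmodels)
glh.test(reg1a, R, q)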
[/sourcecode]

First-Differenced model

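As a review, let’s go over two very similar models that take out individual-specific time-invariant heterogeneity in panel data analysis. Our example regression is:

$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$

where individual and time period are denoted by the $i$ and $t$ subscripts, respectively, and $a_i$ is the individual-specific, time-invariant heterogeneity.

The within estimator: a.k.a. the “fixed effects” model, wherein individual dummy variables (intercept shifters) are included in the regression. All variation driving the coefficients on the other regressors comes from deviations from individual-specific means (= individual dummy estimates). The new model is:

$y_{it} = \beta_0 + \beta_1 x_{it} + \sum_{j=2}^{N} \alpha_j D_{j,i} + u_{it}$

where $D_{j,i}$ represents the individual dummy variables (equal to 1 if $i = j$ and 0 otherwise).

The first-differenced model: this model creates new variables reflecting the one-period change in values. The regression then becomes $\Delta y_{it} = \beta_1 \Delta x_{it} + \Delta u_{it}$, where $\Delta y_{it} = y_{it} - y_{i,t-1}$ (and likewise for $\Delta x_{it}$ and $\Delta u_{it}$).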

Note: These two models are very similar because both “strip out” / “eliminate” / “control for” the variation “between” individuals in your panel data, using slightly different methods. The variation left over, which therefore identifies the coefficients on the other regressors, is the “within” variation: the variation “within” individuals.
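
To see how close the two models are, here is a small sketch on a simulated two-period panel (made-up data, not from any of the data sets used in this post). With exactly two periods and no time dummy, the slope from the dummy-variable regression and the slope from the first-differenced regression coincide; with more periods they generally differ.

```r
# Simulated two-period panel, for illustration only
set.seed(42)
N      <- 50
id     <- rep(1:N, each = 2)
period <- rep(1:2, times = N)
a      <- rep(rnorm(N), each = 2)    # time-invariant individual effect
x      <- a + rnorm(2 * N)           # regressor correlated with the individual effect
y      <- 2 + 1.5 * x + a + rnorm(2 * N)

# Fixed effects via individual dummies (lm drops one dummy automatically)
fe <- lm(y ~ x + factor(id))

# First differences: period 2 minus period 1 for each individual
dy <- y[period == 2] - y[period == 1]
dx <- x[period == 2] - x[period == 1]
fd <- lm(dy ~ dx - 1)                # no intercept: the constant and a_i difference out

print(c(fixed_effects    = unname(coef(fe)["x"]),
        first_difference = unname(coef(fd)["dx"])))
```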

There are two ways we can calculate the first-differenced model, given the variables included in CRIME3.dta. Since the data set includes differenced variables with a “c” prefix (e.g. “clcrime” = change in “lcrime”; “cavgclr” = change in “avgclr”), we can do a simple OLS regression on the differenced variables:

STATA:

reg clcrime cavgclr
outreg2 using H3_1312, word replace
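
If the “c”-prefixed variables were not already in the data set, they could be built by hand before running the regression. Below is a minimal sketch in base R on a toy panel; the values are simulated, the column names only mimic CRIME3 for readability, and the period column is a placeholder I made up.

```r
# Toy two-period panel with simulated values (not the real CRIME3 data)
set.seed(7)
n_dist <- 30
panel <- data.frame(
  district = rep(1:n_dist, each = 2),
  period   = rep(1:2, times = n_dist),
  avgclr   = runif(2 * n_dist, 10, 60)
)
panel$lcrime <- 4 - 0.02 * panel$avgclr +
  rep(rnorm(n_dist), each = 2) + rnorm(2 * n_dist, sd = 0.1)

# Sort, then difference within district; the first period has no predecessor, hence NA
panel <- panel[order(panel$district, panel$period), ]
first_diff <- function(v) c(NA, diff(v))
panel$clcrime <- ave(panel$lcrime, panel$district, FUN = first_diff)
panel$cavgclr <- ave(panel$avgclr, panel$district, FUN = first_diff)

# OLS on the differenced variables; lm() drops the NA rows by default
fd <- lm(clcrime ~ cavgclr, data = panel)
print(coef(fd))
```

With the pre-computed variables in CRIME3.dta, only the last step is needed.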

The Fixed Effects model

Another way to account for individual-specific unobserved heterogeneity is to include a dummy variable for each individual in your sample; this is the fixed effects model. Following from the regression in the previous section, our individuals in MURDER.dta are states (e.g. Alabama, Louisiana, California, Montana…). So, we will need to add one dummy variable for each state in our sample but exclude one to avoid perfect collinearity (the “dummy variable trap”).

In STATA, if your data is set up as a panel (e.g. declared with xtset, giving the individual variable first and the time variable second), it is accomplished by using xtreg and adding , fe to the end of your regression command.
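
In R, the dummy-variable version can be sketched with nothing more than lm() and factor(). The example below uses a simulated panel (not MURDER.dta) and also runs the equivalent “within” regression on demeaned data to show that the slope is the same; all names are placeholders.

```r
# Simulated panel for illustration only: 20 individuals observed for 4 periods
set.seed(123)
N  <- 20
Tn <- 4
id <- rep(1:N, each = Tn)
a  <- rep(rnorm(N, sd = 2), each = Tn)  # unobserved individual effect
x  <- 0.5 * a + rnorm(N * Tn)           # regressor correlated with the effect
y  <- 1 + 2 * x + a + rnorm(N * Tn)

# Dummy-variable version: factor(id) yields N - 1 dummies next to the intercept,
# which avoids the dummy variable trap
lsdv <- lm(y ~ x + factor(id))

# Within version: subtract individual means, then run OLS without an intercept
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
wth  <- lm(y_dm ~ x_dm - 1)

print(c(lsdv = unname(coef(lsdv)["x"]), within = unname(coef(wth)["x_dm"])))
```

The slopes agree, but the standard errors from the demeaned regression do not account for the degrees of freedom used up by the individual means, so the dummy-variable output is the one to read them from.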

The Breusch-Pagan test for Heteroskedasticity

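The Breusch-Pagan (BP) test can be done via a Lagrange Multiplier (LM) test or F-test. We will do the LM test version; this means that only one restricted model is run. The hypotheses are:

$H_0: Var(u|X) = \sigma^2 I$, where $I$ is the identity matrix, i.e. homoskedasticity

$H_1: Var(u|X) \neq \sigma^2 I$, where $I$ is the identity matrix, e.g. heteroskedasticity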

First, we will run the test manually in three stages:

Square the residuals from the original regression, $\hat{u}^2$.

Run an auxiliary regression of $\hat{u}^2$ on the original regressors.

Calculate the BP LM test statistic $LM = n \cdot R^2_{\hat{u}^2}$, where $R^2_{\hat{u}^2}$ is the r-squared fit measure from the auxiliary regression, and $n$ is the number of observations used in the regression.
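
The three stages can be sketched in base R as follows. The data are simulated with an error variance that grows with x1 by construction, and the variable names are placeholders, so this shows the mechanics only.

```r
# Simulated data for illustration only; heteroskedastic by construction
set.seed(2024)
n  <- 500
x1 <- runif(n, 1, 5)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = x1)
fit <- lm(y ~ x1 + x2)

# Stage 1: squared residuals from the original regression
u2 <- resid(fit)^2

# Stage 2: auxiliary regression of the squared residuals on the original regressors
aux <- lm(u2 ~ x1 + x2)

# Stage 3: LM = n * R-squared, compared to a chi-squared distribution
# with df = number of regressors in the auxiliary regression (here 2)
LM   <- n * summary(aux)$r.squared
pval <- pchisq(LM, df = 2, lower.tail = FALSE)
print(c(LM = LM, p_value = pval))
```

If the lmtest package is installed, bptest(fit) in its default studentized form should report the same statistic.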

White’s Test for Heteroskedasticity

White’s test for heteroskedasticity is similar to the Breusch-Pagan (BP) test; however, the auxiliary regression includes all multiplicative combinations of regressors (squares and cross-products). Because of this it can be quite bulky, and finding heteroskedasticity may simply imply model misspecification. The null hypothesis is homoskedasticity (same as BP).

So, here we will run a special case of the White test using the fitted values of the original regression:
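
A minimal sketch of this special case in base R: the squared residuals are regressed on the fitted values and their squares, and the LM statistic is again $n \cdot R^2$ from that auxiliary regression, here with 2 degrees of freedom. The data are simulated and the variable names are placeholders.

```r
# Simulated data for illustration only; heteroskedastic by construction
set.seed(99)
n  <- 400
x1 <- runif(n, 1, 5)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = x1)
fit <- lm(y ~ x1 + x2)

u2   <- resid(fit)^2   # squared residuals from the original regression
yhat <- fitted(fit)    # fitted values from the original regression

# Auxiliary regression on the fitted values and their squares
aux <- lm(u2 ~ yhat + I(yhat^2))

# LM statistic with 2 degrees of freedom (yhat and yhat^2)
LM   <- n * summary(aux)$r.squared
pval <- pchisq(LM, df = 2, lower.tail = FALSE)
print(c(LM = LM, p_value = pval))
```

Because the auxiliary regression always has just two regressors, this version stays compact no matter how many regressors the original model has.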