Testing model specification and using the program version of gmm

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The command gmm estimates the parameters of a model using the generalized method of moments (GMM). GMM can estimate the parameters of models that have more identification conditions than parameters; such models are said to be overidentified. The specification of these models can be evaluated using Hansen’s J statistic (Hansen, 1982).

We use gmm to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use estat overid to calculate Hansen’s J statistic and test the validity of the overidentification restrictions.

In this post, the Poisson regression of \(y_i\) on exogenous \({\bf x}_i\) and endogenous \({\bf y}_{2,i}\) has the form
\begin{equation*}
E(y_i \vert {\bf x}_i,{\bf y}_{2,i},\epsilon_i)= \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}) + \epsilon_i
\end{equation*}
where \(\epsilon_i\) is a zero-mean error term. The endogenous regressors \({\bf y}_{2,i}\) may be correlated with \(\epsilon_i\). This is the same formulation used by ivpoisson with additive errors; see [R] ivpoisson for more details. For more information on Poisson models with endogenous regressors, see Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and Wooldridge (2010).

Moment conditions are expected values that specify the model parameters in terms of the true moments. GMM finds the parameter values that are closest to satisfying the sample equivalent of the moment conditions. In this model, we define moment conditions using an error function,
\begin{equation*}
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = y_i - \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})
\end{equation*}

Let \({\bf x}_{2,i}\) be additional exogenous variables. These are not correlated with \(\epsilon_i\), but are correlated with \({\bf y}_{2,i}\). Combining them with \({\bf x}_i\), we have the instruments \({\bf z}_i = (\begin{matrix} {\bf x}_{i} & {\bf x}_{2,i}\end{matrix})\). So the moment conditions are
\begin{equation*}
E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}
\end{equation*}
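In sample terms, gmm chooses the parameter values that minimize a quadratic form in the sample analogues of these moment conditions,
\begin{equation*}
(\hat{\boldsymbol \beta}_1,\hat{\boldsymbol \beta}_2) = \arg\min_{{\boldsymbol \beta}_1,{\boldsymbol \beta}_2}\,
\left\{\frac{1}{N}\sum_{i=1}^{N} {\bf z}_i' u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}'
{\bf W}
\left\{\frac{1}{N}\sum_{i=1}^{N} {\bf z}_i' u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}
\end{equation*}
where \({\bf W}\) is a positive-definite weight matrix.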

The GMM estimator is efficient when the weight matrix \({\bf W}\) is optimal, that is, when it equals the inverse of the covariance matrix of the moment conditions. Here we have
\[
{\bf W}^{-1} = E\{{\bf z}_i' u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)
u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) {\bf z}_i\}
\]

The two-step and iterated estimators used by gmm provide estimates of the optimal \({\bf W}\). For overidentified models, the estat overid command calculates Hansen’s J statistic after either of these estimators is used.
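For a model with \(q\) instruments and \(k\) parameters, Hansen’s J statistic is \(N\) times the minimized quadratic form in the sample moment conditions, evaluated at the parameter estimates and at the estimated optimal weight matrix \(\hat{\bf W}\); under correct specification, it is asymptotically distributed \(\chi^2(q-k)\):
\[
J = N \left\{\frac{1}{N}\sum_{i=1}^{N} {\bf z}_i' u_i(\hat{\boldsymbol \beta}_1,\hat{\boldsymbol \beta}_2)\right\}'
\hat{\bf W}
\left\{\frac{1}{N}\sum_{i=1}^{N} {\bf z}_i' u_i(\hat{\boldsymbol \beta}_1,\hat{\boldsymbol \beta}_2)\right\}
\]
A large J indicates that some of the overidentifying restrictions fail to hold, which is evidence of misspecification.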

Moment-evaluator program

We define a program that can be called by gmm in calculating the moment conditions for Poisson models with endogenous regressors. See Programming an estimation command in Stata: A map to posted entries for more information about programming in Stata. The program calculates the error function \(u_i\), and gmm generates the moment conditions by multiplying by the instruments \({\bf z}_i\).

To minimize the weighted quadratic form in the moment conditions, gmm must take the derivatives of the moment conditions with respect to the parameters. By the chain rule, these are the derivatives of the error functions multiplied by the instruments. Users may specify these derivatives themselves, or gmm will compute them numerically. Users can gain speed and numerical stability by specifying the derivatives themselves.

When linear forms of the parameters are estimated, users may specify derivatives to gmm in terms of the linear form (prediction). The chain rule is then used by gmm to determine the derivatives of the error function \(u_i\) with respect to the parameters. Our error function \(u_i\) is a function of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).

The program gmm_ivpois calculates the error function \(u_i\) and the derivative of \(u_i\) in terms of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).
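The listing of gmm_ivpois is not reproduced in this excerpt, so the sketch below is a reconstruction based on the description that follows; the line references in the text correspond to the original listing, and the comments here indicate the matching pieces. Treat details such as the version number as assumptions rather than the author’s exact code.

```stata
program gmm_ivpois
    version 14
    // parse the error-function variable, the estimation sample, the
    // parameter matrix, and the depvar() and rhs() options (lines 3-4)
    syntax varlist if, at(name) depvar(varlist) rhs(varlist) ///
        [derivatives(varlist)]
    quietly {
        // build the linear prediction in the temporary variable m
        // (lines 6-12)
        tempvar m
        generate double `m' = 0 `if'
        local j = 1
        foreach var of varlist `rhs' {
            replace `m' = `m' + `var'*`at'[1,`j'] `if'
            local ++j
        }
        replace `m' = `m' + `at'[1,`j'] `if'    // constant term
        // error function u_i = y_i - exp(linear prediction) (line 13)
        replace `varlist' = `depvar' - exp(`m') `if'
        // exit if derivatives are not requested (lines 14-16)
        if "`derivatives'" == "" {
            exit
        }
        // derivative of u_i with respect to the linear prediction
        // (line 17)
        replace `derivatives' = -exp(`m') `if'
    }
end
```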

Lines 3–4 of gmm_ivpois contain the syntax statement that parses the arguments to the program. All moment-evaluator programs must accept a varlist, the if condition, and the at() option. The varlist corresponds to variables that store the values of the error functions. The program gmm_ivpois will calculate the error function and store it in the specified varlist. The at() option is specified with the name of a matrix that contains the model parameters. The if condition specifies the observations for which estimation is performed.

The program also requires the options depvar() and rhs(). The name of the dependent variable is specified in the depvar() option. The regressors are specified in the rhs() option.

On line 4, derivatives() is optional. The variable name specified here corresponds to the derivative of the error function with respect to the linear prediction.

The linear prediction of the regressors is stored in the temporary variable m over lines 6–12. On line 13, we store the value of the error function in the variable specified by varlist. Lines 14–16 cause the program to exit if derivatives() is not specified. Otherwise, on line 17, we store the value of the derivative of the error function with respect to the linear prediction in the variable specified in derivatives().

The data

We simulate data from a Poisson regression with an endogenous covariate, and then we use gmm and the gmm_ivpois program to estimate the parameters of the regression. We will then use estat overid to check the specification of the model. We simulate a random sample of 3,000 observations.

We generate the exogenous covariates \(x\), \(z\), and \(w\). The variable \(x\) will be a regressor, while \(z\) and \(w\) will be extra instruments. Then we use drawnorm to draw the errors \(e\) and \(u\). The errors are positively correlated.

We generate the endogenous regressor \(y2\) as a lognormal regression on the instruments. The outcome of interest \(y\) has an exponential mean in \(x\) and \(y2\), with \(e\) as an additive error. Because \(y2\) depends on \(u\) and \(u\) is correlated with \(e\), \(y2\) is correlated with \(e\) and is therefore endogenous.
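The post’s simulation code is not reproduced in this excerpt. The sketch below follows the description above; the coefficient values, the seed, and the guard against negative Poisson means are illustrative assumptions.

```stata
clear
set seed 12345
set obs 3000
// exogenous covariates: x is a regressor; z and w are extra instruments
generate x = rnormal()
generate z = rnormal()
generate w = rnormal()
// positively correlated errors e and u
matrix C = (1, .5 \ .5, 1)
drawnorm e u, corr(C)
// endogenous regressor: lognormal regression on the instruments
generate y2 = exp(.2*x + .3*z + .3*w + u)
// outcome: exponential mean in x and y2 with e as an additive error
generate double mu = exp(.5*x + .1*y2) + e
replace mu = .01 if mu <= 0      // keep the Poisson mean positive
generate y = rpoisson(mu)
```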

Estimating the model parameters

Now we use gmm to estimate the parameters of the Poisson regression with endogenous covariates. The name of our moment-evaluator program is listed to the right of gmm. The instruments that gmm will use to form the moment conditions are listed in instruments(). We specify the options depvar() and rhs() with the appropriate variables. They will be passed on to gmm_ivpois.

The parameters are specified as the linear form y in the parameters() option, while we specify haslfderivatives to inform gmm that gmm_ivpois provides derivatives of this linear form. The option nequations() tells gmm how many error functions to expect.
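Putting the pieces together, a call consistent with this description, using the variables \(x\), \(z\), \(w\), \(y2\), and \(y\) described above (the exact names are assumptions), would look like:

```stata
// moment conditions from instruments x, z, and w (plus the constant);
// parameters enter through the linear form {y: x y2 _cons}
gmm gmm_ivpois, instruments(x z w) depvar(y) rhs(x y2)   ///
    parameters({y: x y2 _cons}) haslfderivatives nequations(1)
// test the overidentifying restrictions after the two-step estimator
estat overid
```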

The J statistic equals 0.17. In addition to computing Hansen’s J, estat overid provides a test against misspecification of the model. In this case, we have one more instrument than regressor, so the J statistic has a \(\chi^2(1)\) distribution. The probability of obtaining a \(\chi^2(1)\) value greater than 0.17 is given in parentheses. This probability, the p-value of the test, is large, so we fail to reject the null hypothesis that the model is correctly specified.

Conclusion

We have demonstrated how to estimate the parameters of a Poisson regression with an endogenous regressor using the moment-evaluator program version of gmm. We have also demonstrated how to use estat overid to test for model misspecification after estimation of an overidentified model in gmm. See [R] gmm and [R] gmm postestimation for more information.