I Am Too Absolutely Heteroskedastic for This Probit Model

I’m working on a project that uses a binary choice model on panel data. Since I have panel data and am using MLE, I’m concerned about heteroskedasticity making my estimates inconsistent and biased.

Are you familiar with any statistical packages with pre-built tests for heteroskedasticity in binary choice ML models? If not, is there value in cutting my data into groups over which I guess the error variance might vary and eyeballing residual plots? Have you other suggestions about how I might resolve this concern?

I replied that I wouldn’t worry so much about heteroskedasticity. Breaking up the data into pieces might make sense, but for the purpose of estimating how the coefficients might vary—that is, nonlinearity and interactions.

Soren shot back:

I’m somewhat puzzled however: homoskedasticity is an identifying assumption in estimating a probit model: if we don’t have it all sorts of bad things can happen to our parameter estimates. Do you suggest not worrying about it because the means of dealing with it are so noisy? [I had hoped to test for it using the algorithm suggested by Davidson & MacKinnon (1993) and to correct for it using a multiplicative heteroskedasticity model.]

I recently graduated from undergrad so my concerns stem from very recent study of econometrics (the professors for whom I work at first nearly scoffed at my concern), but could you please describe (or point me to a source / paper) on why we might not be so concerned about heteroskedasticity in maximum likelihood binary choice models?

To which I replied:

If you’re worried you can always check your model fitting using some simulated data. (That’s the sort of thing I always say.)

I often encounter these sorts of students who try to use everything they learned in class. Just because you learn something doesn't mean you should use it. My data mining class is producing tons of these types of students, who automatically run decision trees and cross-validations without thinking about whether those tools are appropriate for their end goal.

If Soren is concerned with estimation and testing, that is, with specifying a priori the hypothesis/parameter of interest (a measure of association) for inference via a probit model (not modeling the data for prediction), then heteroskedasticity is of course a concern when drawing inference on the parameter (say, beta1), because the asymptotic distribution used for testing and for constructing confidence intervals depends upon the ASSUMED mean-variance relationship of the model. A 95% CI would not cover the "truth" (what beta-hat is consistent for) 95% of the time; this stems from the fact that the estimated variability of beta-hat is wrong.

The use of robust standard errors (sandwich estimator) would allow for valid inference (in the sense that 95% CI would cover the “truth” 95% of the time); that is, properly quantify the variability of beta hat.

If I understand your answer correctly, you're pointing out that with fake simulated data you can understand how well the model fits the data and whether something like heteroskedasticity is driving your model to the wrong places. Am I right?

Yes. By simulating data with, perhaps, means consistent with a given dataset but with variances specified according to guesses about the dimensions along which the heteroskedasticity persists, you can observe rather well, via Monte Carlo, how the heteroskedasticity affects the model.

Stata allows you to estimate probit models with multiplicative heteroskedasticity, so you can specify which variables (covariates or external variables) affect the variance. The routine is "hetprob". With panel data, however, it may be (much) more complicated, as you are presumably estimating a random-effects model.
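For readers without Stata, the multiplicative-heteroskedasticity probit can be sketched by direct maximum likelihood: the model is P(y=1|x,z) = Phi(x'b / exp(z'g)), which reduces to the ordinary probit when g = 0. This is a hand-rolled illustration with made-up data, not Stata's implementation; the variable names and settings are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # mean-equation covariates
Z = x[:, None]                        # variance-equation covariate (no constant)

# Simulate from the multiplicative-heteroskedasticity probit itself.
b_true, g_true = np.array([0.5, 1.0]), np.array([0.5])
p = norm.cdf(X @ b_true / np.exp(Z @ g_true))
y = rng.binomial(1, p)

def negloglik(theta):
    b, g = theta[:2], theta[2:]
    q = norm.cdf(X @ b / np.exp(Z @ g))
    q = np.clip(q, 1e-10, 1 - 1e-10)  # guard against log(0)
    return -(y * np.log(q) + (1 - y) * np.log(1 - q)).sum()

res = minimize(negloglik, np.zeros(3), method="BFGS")
print(res.x)  # estimates of (b0, b1, g1)
```

A likelihood-ratio test of g = 0 (comparing this fit against an ordinary probit) is one way to test for this form of heteroskedasticity, which is essentially what the hetprob output reports.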