Fortran must die

Not a robit

Do you have a source referencing these as concerns in logistic regression? I haven't heard of overdispersion in logistic regression; in Poisson and other count models, yes. As for heterogeneity, can you be more specific? Treatment heterogeneity is a concern when you have an exposure variable with an interaction you haven't controlled for.

Fortran must die

Thanks. I got over some of my physical problems which made it impossible to post until recently.

A reference, from a book I think you have, is Allison's "Logistic Regression Using SAS," 2nd ed., p. 98. Some authors raise more concern than he does; I list him because I think you are familiar with his book.

"Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. You cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model because of this problem. You cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them." [This is the only article I have found that says this, although in general such comparisons are difficult across samples even for linear regression.]
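To make the claim in that abstract concrete, here is a minimal sketch, not taken from the article. It assumes a true model logit P(y=1) = b*x + g*z with a binary x and an omitted z ~ N(0, 1) that is independent of x; the values b = 1 and g = 2 and all function names are illustrative assumptions. It averages over z numerically and reports the log-odds ratio for x that a model without z would be estimating.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(x, b, g, steps=4001, lim=8.0):
    """P(y=1 | x) after averaging over an unobserved z ~ N(0, 1).

    Uses a simple Riemann sum on [-lim, lim]; z is independent of x by construction.
    """
    h = 2 * lim / (steps - 1)
    total = 0.0
    for i in range(steps):
        z = -lim + i * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += sigmoid(b * x + g * z) * density * h
    return total

def marginal_log_odds_ratio(b, g):
    """Log-odds ratio for x when z is left out of the model."""
    return logit(marginal_prob(1, b, g)) - logit(marginal_prob(0, b, g))

# Hypothetical values: conditional log-odds ratio b = 1, omitted effect g = 2.
print("conditional log-odds ratio:", 1.0)
print("log-odds ratio with z omitted:", marginal_log_odds_ratio(1.0, 2.0))
print("same, when z has no effect:", marginal_log_odds_ratio(1.0, 0.0))
```

Under these assumptions the log-odds ratio with z omitted comes out smaller than b even though z is unrelated to x. In the corresponding linear model, E[y | x] = b*x + g*E[z], so the difference between x = 1 and x = 0 stays exactly b.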

Not a robit

I get your rationale, but they set it up as a rate (e.g., number of positive trials divided by the number of possible positive trials). But you always see these types of data modeled as counts. I bet a Google search of Poisson would reveal very comparable examples over and over again.

Overdispersion is a completely different issue. In logistic regression it can only happen if your DV can take on more than 2 values; it can't happen with binary 0 vs. 1 outcomes. But if you do have more than 2 DV values, then yes, you should check whether your outcome is dispersed according to your model.
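A rough simulation of that point, as a sketch under stated assumptions rather than a formal check: each unit gets its own unobserved success probability drawn from a Beta(2, 2) distribution (an arbitrary choice), and the outcome is the number of successes in n_trials. The function name and all parameter values are hypothetical. The ratio compares the observed variance with the variance a plain binomial model would imply.

```python
import random

def dispersion_ratio(n_trials, n_groups=5000, a=2.0, b=2.0, seed=0):
    """Observed variance of the counts divided by the binomial variance n*p*(1-p)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_groups):
        p = rng.betavariate(a, b)  # unobserved unit-specific success probability
        counts.append(sum(rng.random() < p for _ in range(n_trials)))
    mean = sum(counts) / n_groups
    var = sum((c - mean) ** 2 for c in counts) / n_groups
    p_hat = mean / n_trials
    return var / (n_trials * p_hat * (1 - p_hat))

# Binary 0/1 outcome: the variance is fixed by the mean, so the ratio is 1.
print("n_trials = 1: ", dispersion_ratio(1))
# Successes out of 10 trials: the same heterogeneity now shows up as extra variance.
print("n_trials = 10:", dispersion_ratio(10))
```

With a single 0/1 outcome per unit the variance is always mean*(1 - mean), so there is nothing left over for overdispersion to appear in; with several trials per unit the ratio rises well above 1.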

Fortran must die

Thanks a lot, Jake. Although now I have two statistical experts (you and Allison) disagreeing entirely. I never know quite what to do about that, since I am not an expert.

We won't be running ordinal or multinomial logistic regression, so we will never have more than two levels of the DV. I have been told that the agency we report to uses linear regression (aka linear probability models) for two-level DVs, so we may do that in the end. Crazy as that seems to me, you do what those who pay you tell you to do.

Fortran must die

"For virtually every logistic regression model that we estimate in the real world, there will be some uncorrelated covariates that are statistically associated with the binary outcome, but that we couldn’t observe to include in the model. In other words, there’s always unobserved heterogeneity in our data on covariates we couldn’t measure. But then—the argument goes—how can we interpret the slopes from any logistic regression model that we estimate, since we know that the estimates would change as soon as we included additional relevant covariates, even when there’s no confounding?"

I think in practice this problem exists in linear regression as well. In few if any real-world situations are you going to include all variables that are related to the DV. And since many of the omitted variables will also be related to variables in the model, all slopes are biased (again, in real-world data).

Which, since you don't know how large the bias is, has always seemed to me a strong argument against using regression.
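A small sketch of the linear-regression case described above, with made-up numbers: y = b*x + g*z + noise, z is left out of the model, and rho is the assumed correlation between x and z. The function name and the values b = 1, g = 2 are illustrative assumptions only.

```python
import random

def slope_omitting_z(rho, b=1.0, g=2.0, n=20000, seed=1):
    """OLS slope of y on x alone, when the true model also contains z."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # z has unit variance and correlation rho with x
        z = rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        y = b * x + g * z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Omitted z unrelated to x: the slope stays close to b.
print("rho = 0.0:", slope_omitting_z(0.0))
# Omitted z correlated with x: the slope drifts toward b + g * rho.
print("rho = 0.5:", slope_omitting_z(0.5))
```

In this setup the size of the bias is g * rho, which is exactly the quantity you cannot compute when z was never measured.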