Marginal effects significance vs original model effects significance

03 Mar 2016, 16:44

I often get questions like "This variable has significant effects in the original (logit/probit/heckprobit/whatever) model. But its marginal effect is not significant. Why?" Or vice-versa. I tend to just say that different hypotheses are being tested. But can somebody give a more elegant or complete explanation?

I think this is a multi-faceted problem and that different misunderstandings underlie this question when asked by different people. Among them:

1. Some are making a fetish out of 0.05. They see a p-value of 0.04 in one place and 0.06 in another where they expect them to be "the same" and they freak out.

2. Some have a fundamental misconception of what statistical significance is. They think that statistically significant means "there is an effect" and not statistically significant means "there is no effect." Which again leads them to freak out when they get these apparently contradictory results. For these people, what is needed is re-education about the concept of statistical significance. They need to learn that "statistical significance" is an arbitrary dichotomization of a continuum of degrees of improbability of the outcome under the null hypothesis. I try to get them to think of it differently: a statistically significant result means a combination of the effect in the sample being large enough, and the data being of adequate quantity and quality (low noise) that our estimates of the effect have enough precision that we are highly skeptical of the idea that the true effect size is zero. So our p-values are telling us only indirectly about how big the effect is, and not even actually telling us whether or not it really is zero. I generally prefer to focus them on the confidence intervals, because those are more conducive to thinking about an estimate and the precision of that estimate. Then the p-value can be understood as telling us whether our estimate is both sufficiently far from zero and sufficiently precise that it is implausible that we would get such an estimate from a truly zero effect. I also like to point out that in most real-life research situations, the null hypothesis of zero effect is a rather shabby straw man in any case. (As you can tell, I'm not a big fan of p-values.)

As an antidote to #1 and #2 I often advise those I mentor not to use the term "statistically significant" in my presence.

The above two are generic and give rise to a lot of misunderstandings and confusion about results from a wide variety of analyses.

More specific to the situations you mention are the following:

3. Failure to take into account that the coefficients and odds ratios are different metrics from the marginal effect on probability. The non-linearity of these models then produces "paradoxical" results. In fact, in many situations, if you run -margins, dydx(x) at(x = (numlist spanning a wide range of values))- you will get an interesting mix of large and small, "significant" and "non-significant" marginal effects. Drawing a graph of the logistic curve and pointing out that it has a steep section in the middle and flat sections at the ends probably makes the point better than any number of words and sentences. So a unit increment in x may correspond to a rather large increase in predicted outcome probability if we are starting out in the steep area, and a barely visible increase if we are out at the far ends. Once again, due to the noise in our data and sampling variation, we are estimating these marginal effects with a certain degree of precision, and some, but not all, marginal effects will be large enough that we can bound them away from zero at that degree of precision. Just where we are on the logistic curve isn't always obvious from looking at the regression output or the marginal effects, as it depends on the sample distributions of the predictor variables too. The predicted probability for the sample as a whole, or with all variables set at their means, can be helpful for figuring that out. Once you know where you are on the curve, it is easier to see graphically why a marginal effect might be surprisingly large or small in the face of a particular logistic regression coefficient.
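
To make the steep-middle, flat-ends point concrete, here is a minimal Python sketch (not Stata, and not from the original posts). It assumes a one-predictor logit with made-up coefficients BETA0 and BETA1, and it uses the standard result that the marginal effect on the probability is beta1 * p * (1 - p). The function names are hypothetical.

```python
import math


def logistic(z):
    """Logistic CDF: maps the linear index z to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta0, beta1, x):
    """dP/dx for a one-predictor logit: beta1 * p * (1 - p)."""
    p = logistic(beta0 + beta1 * x)
    return beta1 * p * (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
BETA0, BETA1 = 0.0, 1.0

# Walk along the curve: same coefficient, very different slopes.
for x in (-6, -3, 0, 3, 6):
    p = logistic(BETA0 + BETA1 * x)
    me = marginal_effect(BETA0, BETA1, x)
    print(f"x = {x:>2}  P(y=1) = {p:.4f}  dP/dx = {me:.4f}")
```

With a slope of 1, the derivative peaks at 0.25 where the predicted probability is 0.5, and it is close to zero when the index is far out in either tail, even though the coefficient is the same everywhere.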

Marginal effects are typically (but not always) non-linear functions of all the estimated parameters and explanatory variables. So even if a particular coefficient or OR is "statistically significant", it doesn't guarantee that the marginal effect associated with that coefficient is "statistically significant". For that reason, you can get some of the features described by Clyde in #2.
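
A rough Python sketch of why this can happen, using a first-order delta-method approximation (the same general idea as the delta-method standard errors that -margins- reports). Everything here is an assumption made for illustration: a one-predictor logit, made-up estimates B0 and B1, a made-up covariance matrix COV, and a hypothetical helper me_and_se.

```python
import math


def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def me_and_se(b0, b1, cov, x):
    """Marginal effect of x in a one-predictor logit and its delta-method SE.

    cov is the 2x2 covariance matrix of (b0, b1), given as nested lists.
    """
    p = logistic(b0 + b1 * x)
    dens = p * (1.0 - p)             # logistic density at the index
    me = b1 * dens                   # marginal effect dP/dx
    slope = dens * (1.0 - 2.0 * p)   # derivative of the density w.r.t. the index
    # Gradient of the marginal effect with respect to (b0, b1).
    g0 = b1 * slope
    g1 = dens + b1 * x * slope
    # Quadratic form g' V g.
    var = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    return me, math.sqrt(var)


# Hypothetical estimates and covariance matrix, for illustration only.
B0, B1 = -1.0, 0.8
COV = [[0.09, -0.02],
       [-0.02, 0.04]]

print(f"coefficient: z = {B1 / math.sqrt(COV[1][1]):.2f}")
for x in (0.0, 2.0, 5.0, 8.0):
    me, se = me_and_se(B0, B1, COV, x)
    print(f"x = {x:>3}: dP/dx = {me:.4f}, SE = {se:.4f}, z = {me / se:.2f}")
```

With these made-up numbers the coefficient has z = 4, yet the marginal effect evaluated far out in the upper tail (x = 8) has a z statistic below 1.96. The gradient of the marginal effect involves both parameters and the value of x, so its standard error also carries the uncertainty about where on the curve we are.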

Comment

Clyde's first point is trivial and that is not the point of the question.
The question is not really about why the odds ratio is "significant" while the corresponding marginal effect is not. It is really about what it means when the respective p-values are very 'far' apart (e.g. the p-value for the odds ratio is .002 and that for the marginal effect is .5). Is this plausible, and under what conditions? I accept Stephen's point that marginal effects are typically non-linear functions of all the estimated parameters and explanatory variables. But does this always mean less precision in their calculation? Under what conditions would you get the opposite result? Certainly if you get such large non-trivial differences in the p-values then it is important to ask what could possibly be the cause.

Comment

"Certainly if you get such large non-trivial differences in the p-values then it is important to ask what could possibly be the cause."

However, I doubt that it's possible to set out, in a general way, the conditions under which "large" divergences appear. Won't it be model-specific? Also, it's likely to be related to how non-linear the function is that relates the original statistic of interest to the marginal effect of interest. (And remember that there is no single marginal effect in a non-linear model, so "which marginal effect?" is something for Richard as well.) In this respect, Clyde's point 3 is definitely relevant here.

Comment

Excellent points, and I especially like how Clyde explains point 3. Different hypotheses are being tested. And, as Stephen adds, there is no single marginal effect. The value of the marginal effect is contingent on how the values of the other variables in the model are set. You may use atmeans or asobserved and get a single number but there are any number of other ways you could set the values of the other variables.
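
As a loose illustration of how the choice of evaluation point changes the answer, here is a small Python sketch (not Stata). It assumes a one-predictor logit with made-up coefficients and a made-up sample xs. The two summaries are only rough analogues of what atmeans and asobserved do, and the helper me_at is hypothetical.

```python
import math


def logistic(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def me_at(b0, b1, x):
    """Marginal effect dP/dx of a one-predictor logit at a given x."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1.0 - p)


# Hypothetical coefficients and a small made-up sample of x.
B0, B1 = 0.0, 1.0
xs = [-4.0, -3.0, -2.0, 2.0, 3.0, 4.0]

# Rough analogue of atmeans: evaluate once, at the sample mean of x.
x_bar = sum(xs) / len(xs)
me_at_mean = me_at(B0, B1, x_bar)

# Rough analogue of asobserved: evaluate at every observation, then average.
ame = sum(me_at(B0, B1, x) for x in xs) / len(xs)

print(f"effect at mean of x:     {me_at_mean:.4f}")
print(f"average of the effects:  {ame:.4f}")
```

Here the effect at the mean of x is 0.25, while the average of the observation-level effects is about 0.056, because every observation in this made-up sample sits on a flat part of the curve even though the mean sits at the steepest point.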

Personally, I mostly focus on the significance of coefficients, not the significance of the marginal effect. Or, if I do look at the significance of the marginal effect, it might be over a range of values. So, for example, the marginal effect of race might be very small at low values of age but much greater at higher values of age. e.g. something like

Comment

An empirical conundrum can arise when doing inference about partial effects rather than coefficients. For any particular variable, wk, the preceding theory does not guarantee that both the estimated coefficient, θk and the associated partial effect, δk will both be ‘statistically significant,’ or statistically insignificant. In the event of a conflict, one is left with the uncomfortable problem of simultaneously rejecting and not rejecting the hypothesis that a variable should appear in the model. Opinions differ on how to proceed. Arguably, the inference should be about θk, not δk, since in the latter case, one is testing a hypothesis about a function of all the coefficients, not just the one of interest.

And I suppose in the case of a bivariate probit with sample selection, the calculation of δk is much more complicated, involving parameter estimates and variables from the selection equation as well. Does this mean you are more likely to encounter such conundrums when you have a more complex model like a bivariate probit with sample selection than a simple probit?

Is this anything to do with the consistency of the estimator for the variance-covariance matrix?

Or the delta method used in the calculation?

Comment

Please provide an exact and full bibliographic citation for the quotation from Greene.

I would suggest that the issues are primarily to do with how non-linear the transformations are that relate the original parameter(s) to the partial/marginal effects. (And not the issues raised in your last 2 questions.)

Comment

I kindly ask you to put quotations inside quotation marks (as shown above), as recommended in the FAQ. Reading #7, many of us find it difficult to tell who said what; the message and the unmarked quotation have, unfortunately, become entwined.

Comment

No problem.
Quote from William Greene: "An empirical conundrum can arise when doing inference about partial effects rather than coefficients. For any particular variable, wk, the preceding theory does not guarantee that both the estimated coefficient, θk and the associated partial effect, δk will both be ‘statistically significant,’ or statistically insignificant. In the event of a conflict, one is left with the uncomfortable problem of simultaneously rejecting and not rejecting the hypothesis that a variable should appear in the model. Opinions differ on how to proceed. Arguably, the inference should be about θk, not δk, since in the latter case, one is testing a hypothesis about a function of all the coefficients, not just the one of interest."

page 12: http://archive.nyu.edu/bitstream/2451/26036/2/7-7.pdf

Also, see:
Dowd, BE, Greene, WH & Norton, EC 2014, 'Computation of Standard Errors', Health Services Research, vol. 49, pp. 731-750.
where the authors discuss the issue of consistency of the estimator of the variance-covariance matrix and the complexity involved in calculating the standard error of a marginal effect in a multiple equation model (e.g. bivariate probit with sample selection). If full information maximum likelihood is used then the estimator is consistent.

Sorry I can't figure out how to do the quotes like in the above posts!

Comment

"Sorry I can't figure out how to do the quotes like in the above posts!"

Chandra: please read the FAQ from top to bottom. (Hit the black bar at the top of the page.) Your post #11 suggests you have mastered how to do the "quote" inserts, but also read about using CODE delimiters. Also read how to use the Advanced editor and its functionality (accessed by clicking on the underlined upper-case A in the editing box for composing messages). With that you can insert hyperlinked URLs.

For Stata's capabilities for estimating marginal effects for a "bivariate probit with sample selection", see help heckprobit_postestimation. Also read the associated manual entry for information about methods and formulae. The "complexity" you refer to exists of course, but StataCorp have done a lot of work for you.

Comment

Thanks Stephen, just found the Advanced editor!
I have used the Stata capabilities you refer to for estimating my bivariate model with sample selection. Both the outcome and selection equations are important for the research questions I am investigating. Furthermore I have replicate weights, plausible values and missing data to deal with. For each plausible value I have three multiply imputed data sets, making 30 multiply imputed data sets. Incidentally the two variables for which the p-values of the coefficient and the marginal effect are radically 'far' apart are both continuous.
And thanks everybody for your contribution to this thread.

Comment

That kind of complexity makes my head hurt! I might try skipping a few of the bells and whistles (e.g. do it without the imputation, or without the replicate weights) and see if that changes anything. I wouldn't expect it to, but a mistake somewhere along the way (either by you or by Stata) might affect the results.

Comment

"+1 to the post from Clyde. Very nicely put.
Marginal effects are typically (but not always) non-linear functions of all the estimated parameters and explanatory variables. So even if a particular coefficient or OR is "statistically significant", it doesn't guarantee that the marginal effect associated with that coefficient is "statistically significant". For that reason, you can get some of the features described by Clyde in #2."

From a technical perspective, Stephen's answer is already sufficient. Coefficients and marginal effects are different quantities, and the latter are often non-linear combinations of the former. Typically, one would expect the p-values to be similar but there is no reason why they have to be close or even identical.

With regard to the discussion of whether to look at the significance of coefficients or of marginal effects, we should ask: Why are we interested in the respective test result? The coefficients themselves in non-linear models typically do not have a good interpretation when we want to examine the effect of a certain covariate on our outcome variable. When interpreting the results, we would typically look at the marginal effects. Yet, testing for statistical significance of the coefficient estimates can be relevant when we think about the model specification, that is, whether to include or exclude a certain covariate.

A variable could be statistically relevant in the sense that its coefficient estimate is statistically significant but at the same time its marginal effect might be statistically insignificant. The latter does not mean that this variable does not matter because it also affects the marginal effects of all the other covariates. In other words, including it still improves the fit of the model.

Comment

Thanks for the latest comments.
As you rightly point out Richard, the results are in the same ball park without multiple imputation and replicate weights. But I only know this after having estimated the models both ways! There were 10% missing values in aggregate across all variables. As Paul Allison noted, one should impute missing values whenever possible. Also the use of replicate weights and plausible values is what the OECD recommend for the analysis of the PIAAC data that I am using.