st: RE: identifying perfect outcome predictor

I would definitely consider "convergence to +/- infinity" to be a
feature of -glm-, rather than a bug. I sometimes fit models for a binary
disease outcome, with multiple baseline odds, corresponding to subgroups
of subjects, and a single odds ratio, corresponding to the exposure of
interest. In this case, the common odds ratio is the parameter of
interest, and the baseline odds are nuisance parameters. And, if there
are no disease cases in one subgroup, then that subgroup will have a
zero baseline odds (corresponding to a baseline log odds of minus
infinity), and a full set of zero residuals (implying a zero influence
function for all subjects in that group). The estimate and confidence
interval for the common odds ratio will then still be valid, provided
that there are enough disease cases in the other groups, and the
convergence of the baseline odds in the disease-free subgroup to zero
implies only that the disease-free subgroup will have no influence on
the common odds ratio.
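
A minimal sketch of the kind of model described above, on simulated data. The variable names (group, exposure, disease), the sample size, the true coefficients and the -iterate(50)- cap are all assumptions made for illustration, and the factor-variable notation (ibn.group) assumes Stata 11 or later. Group 3 is constructed to have no disease cases, so its baseline log odds should drift towards minus infinity while the common odds ratio is estimated from groups 1 and 2.

```stata
* Hypothetical simulated data: three subgroups, one binary exposure
clear
set seed 2008
set obs 600
generate byte group = 1 + mod(_n, 3)
generate byte exposure = runiform() < 0.5

* True model (assumed): group-specific baseline log odds, common log odds ratio 0.7
generate double xb = -1 + 0.5*(group == 2) + 0.7*exposure

* Group 3 is forced to be disease-free
generate byte disease = (runiform() < invlogit(xb)) & (group != 3)

* One baseline odds per group (no constant) and a single exposure odds ratio.
* -iterate(50)- only caps the iterations; -capture noisily- shows the output
* and keeps the do-file running whatever return code comes back.
capture noisily glm disease ibn.group exposure, noconstant ///
    family(binomial) link(logit) eform iterate(50)
```

In the output, the row for group 3 is the one to expect an extreme estimate in; the exposure row is the parameter of interest.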
Also, like Al, I sometimes fit a logit model with a view to defining a
propensity score, rather than with a view to calculating parameter
estimates that I can take seriously. In those cases, the presence of a
parameter that converges to +/- infinity will often imply only that a
handful of subjects are allocated to the top or bottom propensity group.
(If nearly all subjects are allocated to the top or bottom propensity
group, then I know to make the model a bit more "coarse" and to be less
ambitious about the level of confounder-adjustment that I can
realistically achieve with the available data.)
Roger
Roger B Newson
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: r.newson@imperial.ac.uk
Web page: www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: 05 May 2008 19:24
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: identifying perfect outcome predictor
Brilliant! All I really wanted was a linear classifier, so your
suggestion works perfectly.
Thanks, Roger.
Al
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Newson, Roger
B
Sent: Monday, May 05, 2008 1:14 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: identifying perfect outcome predictor
I personally use -glm- (with the options -link(logit) family(bin)-)
instead of -logit-. That way, the offending parameters are allowed to
"converge" to plus or minus infinity without an error message. And the
guilty parameters are then displayed for all to read.
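
A minimal sketch of the comparison, on simulated data in which one covariate separates the outcome completely. The variable names (x1, x2, outcome), the cutoff of 0.2 and the -iterate(50)- cap are assumptions made for illustration, not taken from the original simulation.

```stata
* Hypothetical simulated data: x2 separates outcome completely
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = runiform()
generate byte outcome = x2 <= 0.2

* -logit- stops with the "predicts data perfectly" message; keep its return code
capture noisily logit outcome x1 x2
scalar rc_logit = _rc
display "logit return code: " rc_logit

* -glm- with logit link and binomial family; -iterate(50)- only caps the
* iterations, and -capture noisily- keeps the do-file running in any case
capture noisily glm outcome x1 x2, link(logit) family(binomial) iterate(50)
```

In the -glm- output, the coefficient with a very large absolute value and a very large standard error points to the separating variable.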
I hope this helps.
Roger
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Feiveson,
Alan H. (JSC-SK311)
Sent: 05 May 2008 18:34
To: statalist@hsphsun2.harvard.edu
Subject: st: identifying perfect outcome predictor
Hi - I am running logistic regression on simulated data sets. However
sometimes one of the explanatory variables completely separates the
outcome variable and I get a message such as
    outcome = X2 <= .1955861 predicts data perfectly
    r(2000);
Presumably if I get a return code of 2000, I know this has occurred -
but is there information in logit postestimation to tell which variable
gives perfect separation (as opposed to checking each one "by hand")?
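
For reference, the "by hand" check can be automated with a short loop. This is only a sketch under stated assumptions: the variable names (x1, x2, outcome) and the simulated data are hypothetical, and the loop detects only separation by a single variable at a single cutoff (the ranges of the variable in the two outcome groups do not overlap), not separation by a linear combination of variables.

```stata
* Hypothetical simulated data: x2 separates outcome completely
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = runiform()
generate byte outcome = x2 <= 0.2

* For each candidate predictor, compare its range in the two outcome groups
local sep
foreach v of varlist x1 x2 {
    quietly summarize `v' if outcome == 0
    scalar n0  = r(N)
    scalar lo0 = r(min)
    scalar hi0 = r(max)
    quietly summarize `v' if outcome == 1
    scalar n1  = r(N)
    scalar lo1 = r(min)
    scalar hi1 = r(max)
    * Non-overlapping ranges mean a single cutoff predicts the outcome perfectly
    if n0 > 0 & n1 > 0 & (hi0 < lo1 | hi1 < lo0) {
        local sep `sep' `v'
    }
}
display "separating variable(s): `sep'"
```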
Thanks
Al Feiveson
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/