Question on approximations of full logistic regression model

Hi,
I am trying to construct a logistic regression model from my data (104
patients and 25 events). I built a full model consisting of five
predictors, using penalization from the rms package (lrm, pentrace,
etc.) because of the events-per-variable issue. Then I tried to approximate
the full model with a step-down technique, predicting L (the linear
predictor of the full model) from all of the component variables using
ordinary least squares (ols in the rms package), as follows. I would
like to know whether I am doing this correctly.

Re: Question on approximations of full logistic regression model

I think you are doing this correctly except for one thing. The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram but not to get standard errors. Since you are dropping only one variable, you might consider just building the nomogram from the full model.
Frank
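
For readers following along, the step-down approximation discussed in this exchange can be sketched roughly as below. This is only an illustration in plain Python, not the rms code from the original post: the data are simulated, the predictor names x1 to x5 and the coefficients are hypothetical, and the helper functions (solve, r_squared, step_down) are made up for this sketch. The linear predictor L of the "full model" is regressed on subsets of the predictors by ordinary least squares, and at each step the predictor whose removal costs the least R^2 is dropped.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def r_squared(cols, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xij for bj, xij in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def step_down(predictors, lp):
    """Backward elimination on L: drop, at each step, the predictor that costs least R^2."""
    kept = dict(predictors)
    path = [(sorted(kept), r_squared(list(kept.values()), lp))]
    while len(kept) > 1:
        drop = max(kept, key=lambda name: r_squared(
            [v for k, v in kept.items() if k != name], lp))
        del kept[drop]
        path.append((sorted(kept), r_squared(list(kept.values()), lp)))
    return path

random.seed(1)
n = 104  # same sample size as in the question; the data themselves are simulated
predictors = {f"x{j}": [random.gauss(0, 1) for _ in range(n)] for j in range(1, 6)}
coefs = {"x1": 1.0, "x2": 0.8, "x3": 0.5, "x4": 0.2, "x5": 0.05}  # hypothetical values
# L: linear predictor of the (here, pretend) full model
lp = [-1.2 + sum(coefs[k] * predictors[k][i] for k in coefs) for i in range(n)]

path = step_down(predictors, lp)
for names, r2 in path:
    print(len(names), names, round(r2, 3))
```

With all five predictors the approximation reproduces L exactly (R^2 = 1), and the printed path shows how much of L is lost as predictors are removed. As noted in the reply above, standard errors and validation should still come from the full model, not from the approximation.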

Re: Question on approximations of full logistic regression model

Thank you for your reply, Prof. Harrell.

I agree with you. Dropping only one variable does not actually help a lot.

I have one more question.
During the analysis of this model I found that the confidence
intervals (CIs) of some coefficients obtained by bootstrapping (the bootcov
function in the rms package) were narrower than the CIs obtained from the usual
variance-covariance matrix, while the CIs of other coefficients were wider. My data
have no cluster structure. I am wondering which CIs are better.
I would guess the bootstrap ones, but is that right?

Re: Question on approximations of full logistic regression model

The choice is not clear, and requires some simulations to estimate the average absolute error of the covariance matrix estimators.
Frank
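
A simulation of the kind suggested above might look roughly like the following sketch. It is written in plain Python, not with rms, and makes several assumptions that are not in the thread: a single standard-normal predictor, an unpenalized fit, hypothetical true coefficients (intercept -1.2, slope 1.0), and small replicate counts so that it runs quickly. For each simulated data set it records the slope estimate, its model-based standard error (from the inverse information matrix), and a bootstrap standard error; the standard deviation of the slope estimates across data sets serves as the "truth", and each estimator is scored by its average absolute error.

```python
import math
import random
import statistics

def fit_logistic(x, y, iters=8):
    """Newton-Raphson ML fit of logit P(y=1) = a + b*x.

    Returns (a, b, se_b), where se_b comes from the inverse information
    matrix evaluated in the final iteration.
    """
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, a + b * xi))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b, math.sqrt(h00 / det)

def simulate_data(n, a, b, rng):
    """Draw one data set from the assumed true model."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a + b * xi))) else 0 for xi in x]
    return x, y

def bootstrap_se(x, y, n_boot, rng):
    """Ordinary (non-clustered) bootstrap SE of the slope."""
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_logistic([x[i] for i in idx], [y[i] for i in idx])[1])
    return statistics.stdev(slopes)

def compare(n_rep=40, n_boot=40, n=104, a=-1.2, b=1.0, seed=2):
    """Average absolute error of model-based and bootstrap SEs for the slope."""
    rng = random.Random(seed)
    est, se_model, se_boot = [], [], []
    for _ in range(n_rep):
        x, y = simulate_data(n, a, b, rng)
        _, b_hat, se = fit_logistic(x, y)
        est.append(b_hat)
        se_model.append(se)
        se_boot.append(bootstrap_se(x, y, n_boot, rng))
    true_sd = statistics.stdev(est)  # Monte Carlo estimate of the true SD
    err_model = statistics.mean([abs(s - true_sd) for s in se_model])
    err_boot = statistics.mean([abs(s - true_sd) for s in se_boot])
    return true_sd, err_model, err_boot

true_sd, err_model, err_boot = compare()
print("Monte Carlo SD of slope:", round(true_sd, 3))
print("avg abs error, model-based SE:", round(err_model, 3))
print("avg abs error, bootstrap SE:", round(err_boot, 3))
```

The replicate counts here are far too small for a real comparison; in practice one would use many more simulated data sets and bootstrap resamples, mimic the actual design (five predictors, penalization, about 25 events), and look at every coefficient, not just one slope.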

Re: Question on approximations of full logistic regression model

Thank you for your comment, Prof. Harrell.
I would appreciate it very much if you could show me how to run such a
simulation. For reference, the following code is what I did
(bootcov, summary, and validation).

In stratified sampling, the narrowness factor depends on the
stratum sizes, not the overall n.
In regression, estimates for some quantities may be based on a small
subset of the data (e.g. coefficients related to rare factor levels).
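
As a small numerical illustration, and assuming that "narrowness factor" here means the usual sqrt((n-1)/n) by which the ordinary bootstrap standard error of a sample mean falls short of s/sqrt(n) (this definition is an assumption, since it is not spelled out in this excerpt), the sketch below checks that identity on a made-up sample and shows how the factor changes with the size of the group the estimate is actually based on. The function name narrowness_factor is hypothetical.

```python
import math
import statistics

def narrowness_factor(n):
    """Assumed definition: ratio of bootstrap SE to the usual SE for a mean of n values."""
    return math.sqrt((n - 1) / n)

# Made-up sample, used only to check the identity.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
n = len(sample)

# The bootstrap resamples from the empirical distribution, whose SD uses divisor n,
# so the ideal bootstrap SE of the mean is pstdev / sqrt(n).
boot_se = statistics.pstdev(sample) / math.sqrt(n)
usual_se = statistics.stdev(sample) / math.sqrt(n)
ratio = boot_se / usual_se

print("ratio on sample:", ratio, "formula:", narrowness_factor(n))
for size in (104, 25, 5):  # overall n, number of events, a small stratum
    print(size, narrowness_factor(size))
```

The point is that an estimate resting on a stratum or subset of 5 observations is affected much more than the overall sample size of 104 would suggest.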

This doesn't mean we should give up on the bootstrap.
There are remedies for the bootstrap biases, see e.g.
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
vs. Smoothing, Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

And other methods have their own biases, particularly in nonlinear
applications such as logistic regression.

Re: Question on approximations of full logistic regression model

I tried to make a histogram of the bootstrap distribution for my logistic
model, following "Regression Modeling Strategies" (pp. 197-200). Attached is
the histogram I made. The figure shows the bootstrap distribution of the
log odds ratio from my logistic model. The solid curve is a kernel
density estimate and the dashed curve is a normal density with the same mean
and standard deviation as the bootstrapped values. Vertical lines
indicate asymmetric 0.9, 0.95, and 0.99 two-sided confidence limits for
the log odds ratio based on quantiles of the bootstrap values.

It seems to me that the bootstrap distribution is normal and that the estimation
of the confidence interval is, ummm, accurate.
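
The comparison described in this post can also be made numerically. The sketch below is plain Python, not rms, and uses simulated normal values as a stand-in for the bootstrap log odds ratios (the actual bootstrap values are not available here). It computes two-sided confidence limits from the quantiles of the bootstrap values and from a normal approximation with the same mean and standard deviation, so the two sets of limits can be put side by side. The function names percentile_limits and normal_limits are hypothetical.

```python
import random
import statistics

def percentile_limits(values, level):
    """Two-sided limits from empirical quantiles (linear interpolation)."""
    s = sorted(values)
    alpha = (1.0 - level) / 2.0

    def quantile(p):
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return quantile(alpha), quantile(1.0 - alpha)

def normal_limits(values, level):
    """Two-sided limits from a normal density with the same mean and SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    z = statistics.NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return m - z * sd, m + z * sd

rng = random.Random(3)
# Stand-in for bootstrapped log odds ratios; replace with the real bootstrap values.
boot = [rng.gauss(1.0, 0.4) for _ in range(1000)]

limits = {}
for level in (0.90, 0.95, 0.99):
    limits[level] = (percentile_limits(boot, level), normal_limits(boot, level))
    (p_lo, p_hi), (n_lo, n_hi) = limits[level]
    print(level, "percentile:", round(p_lo, 3), round(p_hi, 3),
          "normal:", round(n_lo, 3), round(n_hi, 3))
```

If the two sets of limits agree closely at all three levels, that supports the visual impression of normality. Note that this only says the bootstrap distribution looks normal; it does not by itself show that either interval has accurate coverage.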