Reposted: st: Cytel challenge

On Wed, 21 Nov 2001, William Gould wrote:
> Lee Sieswerda <Lee.Sieswerda@tbdhu.com> wrote,
>
> > Cytel makes LogXact for doing "exact" logistic regression (www.cytel.com).
> > In their ads they have something called the "Cytel Challenge" where they ask
> > people to try to fit a logistic regression model to the following data:
> >
> > Diar AB Age Hosp
> > 0 0 0 0
> > 6 0 0 1
> > 1.9 0 1 0
> > 2.9 0 1 1
> > 100 1 1 1
> >
> > The percentage of patients with diarrhea (Diar) is the outcome and the other
> > three variables are predictors: [...] Taking the challenge using
> > -logistic- in Stata fails to produce a converged model. I really don't know
> > the details of how LogXact manages to fit this model, but my question is:
> > would it not be possible to program Stata to do "exact" logistic regression
> > and be able to fit this model? Or is there something inherently different
> > about Cytel's software that it can accomplish this and Stata cannot?
>
> I take issue with Lee's comment that "Stata fails to produce a converged
> model" -- Lee did something wrong -- but I do not take issue with Cytel's
> ad (although I have not seen it).
>
> Something evidently got left out of the ad or the posting because, to fit the
> above example, we need to know the population sizes. Nevertheless, I went to
> Cytel's web site and found a longer version of the problem on which the ad was
> obviously based. The URL is http://www.cytel.com/new.pages/LX.ex.04.html.
>
> In the longer problem, there are more observations, the population sizes are
> included, and there are five independent variables: Cephalexin, Clindamycin,
> Sex, Age, and LOS. In any case, the web site says,
>
> Challenge: Try fitting a logistic regression model to the data with
> all five covariates included.
>
> so let's do that and see exactly the point Cytel wishes to make.
>
> After loading the data, I had 18 observations and the first five looked like
> this:
>
> . list in 1/5
>
>      diarrhea   totno   cephalex   clindomy   sex   age   los
>   1.        0     174          0          0     0     0     0
>   2.        1     113          0          0     0     0     1
>   3.        0     349          0          0     0     1     0
>   4.       16     451          0          0     0     1     1
>   5.        0     213          0          0     1     0     0
>
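Each row in the listing above is a covariate pattern rather than a single patient: diarrhea counts the patients with the outcome and totno is the group size. A minimal Python sketch (not Stata code, and no claim about how -blogit- computes things internally) showing that the grouped binomial log likelihood, without the constant binomial coefficient, equals the individual-level logit log likelihood you get by expanding each row into totno 0/1 records. It uses only the five rows listed above, and the coefficient vector BETA is hypothetical, chosen for illustration rather than estimated.

```python
import math

# The five rows listed above: (diarrhea, totno, cephalex, clindomy, sex, age, los)
ROWS = [
    (0, 174, 0, 0, 0, 0, 0),
    (1, 113, 0, 0, 0, 0, 1),
    (0, 349, 0, 0, 0, 1, 0),
    (16, 451, 0, 0, 0, 1, 1),
    (0, 213, 0, 0, 1, 0, 0),
]

# Hypothetical coefficients (constant first, then the five covariates);
# chosen for illustration only, NOT estimates from the Cytel data.
BETA = [-4.0, 0.5, 2.0, -0.2, 0.9, 2.5]


def prob(x, beta):
    """Logistic probability of a positive outcome for covariate vector x."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


def grouped_loglik(rows, beta):
    """Binomial log likelihood kernel: y*ln(p) + (n - y)*ln(1 - p) per row."""
    total = 0.0
    for y, n, *x in rows:
        p = prob(x, beta)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total


def expanded_loglik(rows, beta):
    """Same model after expanding each row into n individual 0/1 records."""
    total = 0.0
    for y, n, *x in rows:
        p = prob(x, beta)
        for i in range(n):
            # the first y records are positive outcomes, the rest negative
            total += math.log(p) if i < y else math.log(1.0 - p)
    return total


print(grouped_loglik(ROWS, BETA))
print(expanded_loglik(ROWS, BETA))
```

The two printed values agree up to floating-point rounding, which is why grouped counts plus totals carry the same information as one record per patient.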
> To estimate this model, I must use the -blogit- command, since that is
> Stata's logit command for data in which the dependent variable is recorded
> as counts of positive outcomes out of a total population. I also specify
> the -or- option to obtain odds ratios. Here is the result of running the
> model:
>
> ==============================================================================
> . blogit diarrhea totno cep cli sex age los, or
> note: cephalex~=0 predicts success perfectly
> cephalex dropped and 2 obs not used
>
>
> Logit estimates                                   Number of obs   =       2488
>                                                   LR chi2(4)      =      91.48
>                                                   Prob > chi2     =     0.0000
> Log likelihood = -218.30047                       Pseudo R2       =     0.1732
>
> ------------------------------------------------------------------------------
>     _outcome | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     clindomy |   9.198602    2.89523     7.05   0.000     4.963739    17.04648
>          sex |   .8263463   .2336678    -0.67   0.500      .474751    1.438329
>          age |   2.440564   1.176263     1.85   0.064      .948947    6.276803
>          los |   11.84492   7.113316     4.12   0.000      3.65051    38.43354
> ------------------------------------------------------------------------------
> ==============================================================================
>
> What Cytel wants you to notice is Stata's message
>
> note: cephalex~=0 predicts success perfectly
> cephalex dropped and 2 obs not used
>
> That is the point of their challenge: when cephalex is nonzero, there is
> always a positive outcome:
>
> . list if ceph==1
>
>      diarrhea   totno   cephalex   clindomy   sex   age   los
>  17.        1       1          1          0     0     1     1
>  18.        4       4          1          0     1     1     1
>
> There are a total of five patients who were observed with cephalex==1, and
> all five suffered from diarrhea. How do you interpret that? Does that mean
> cephalex==1 always results in diarrhea? Well, of course it does not.
> With only five such patients, Cytel's computationally intensive methods were
> able to put a confidence interval around the result: [27.52, infinity]. Very
> nice. (I would like somebody to explain to me why the point estimate is a
> finite 207.40 rather than infinite, but I'm sure Cytel has carefully
> considered the answers they produce.)
>
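Stata's note amounts to a simple check on the grouped data: among the covariate patterns where cephalex is nonzero, every patient is a success. The sketch below runs that check in Python on the seven rows printed in this post (obs. 1-5, 17, and 18); obs. 6-16 are not shown, so the results for the other covariates are illustrative only. The helper names perfect_prediction and drop_variable are hypothetical. The second function mimics the effect described in the note, dropping the offending covariate and the observations it predicts perfectly.

```python
# Rows printed in the post: (obs, diarrhea, totno, cephalex, clindomy, sex, age, los).
# Obs. 6-16 are not shown, so this illustrates the check; it does not audit the full data.
LISTED = [
    (1, 0, 174, 0, 0, 0, 0, 0),
    (2, 1, 113, 0, 0, 0, 0, 1),
    (3, 0, 349, 0, 0, 0, 1, 0),
    (4, 16, 451, 0, 0, 0, 1, 1),
    (5, 0, 213, 0, 0, 1, 0, 0),
    (17, 1, 1, 1, 0, 0, 1, 1),
    (18, 4, 4, 1, 0, 1, 1, 1),
]
COVARIATES = ["cephalex", "clindomy", "sex", "age", "los"]


def perfect_prediction(rows, name):
    """Describe whether name != 0 predicts the outcome perfectly, else None."""
    j = COVARIATES.index(name)
    hits = [(y, n) for (_obs, y, n, *x) in rows if x[j] != 0]
    if not hits:
        return None  # never nonzero in these rows: nothing to report
    successes = sum(y for y, _ in hits)
    patients = sum(n for _, n in hits)
    if successes == patients:
        return f"{name}~=0 predicts success perfectly ({successes} of {patients} patients)"
    if successes == 0:
        return f"{name}~=0 predicts failure perfectly (0 of {patients} patients)"
    return None


def drop_variable(rows, name):
    """Keep rows where name == 0 and delete that column (hypothetical helper)."""
    j = COVARIATES.index(name)
    kept = []
    for obs, y, n, *x in rows:
        if x[j] == 0:
            kept.append((obs, y, n, *(v for k, v in enumerate(x) if k != j)))
    return kept


for name in COVARIATES:
    msg = perfect_prediction(LISTED, name)
    if msg:
        print(msg)

kept = drop_variable(LISTED, "cephalex")
print(f"{len(LISTED) - len(kept)} obs not used")
```

On these rows only cephalex is flagged (5 of 5 patients, matching the count in the text), and dropping it removes obs. 17 and 18.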
> In any case, Stata smartly recognized its limitations and estimated the model
> conditional on cephalex==0. Some other packages might not have recognized the
> problem and gotten messed up in the numerics.
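Why there is no finite estimate to report: cephalex is nonzero only in obs. 17 and 18, so adding c to its coefficient changes the fitted probabilities of those two rows alone. Both rows are all-success, so their contribution to the log likelihood is n ln(p), which keeps rising toward 0 as c grows and never reaches a maximum. A minimal Python sketch, in which the linear-predictor values eta_rest contributed by the other covariates are hypothetical:

```python
import math


def log_sigmoid(eta):
    """ln(1 / (1 + exp(-eta))), computed without overflow for either sign."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


# Obs. 17 and 18: (positive outcomes, patients, eta_rest), where eta_rest is a
# hypothetical linear predictor from the constant and the other covariates.
CEPHALEX_ROWS = [
    (1, 1, -3.0),
    (4, 4, -2.5),
]


def cephalex_loglik(c):
    """Log likelihood contribution of obs. 17-18 when cephalex's coefficient is c."""
    total = 0.0
    for y, n, eta_rest in CEPHALEX_ROWS:
        # every patient here is a success (y == n), so the
        # (n - y) * ln(1 - p) term is zero and only n * ln(p) remains
        total += n * log_sigmoid(eta_rest + c)
    return total


CS = list(range(0, 45, 5))
for c in CS:
    print(f"c = {c:2d}   contribution = {cephalex_loglik(c):.3e}")
```

The contribution is negative for every c yet strictly increasing, so the maximum likelihood estimate diverges to infinity. Any iterative fit that keeps cephalex will keep pushing its coefficient upward, which is the numerical trouble that dropping the variable and its two observations avoids.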
>
> I leave it for you to decide how important it is to put a confidence interval
> around cephalexin in this particular case, but without question there are
> problems for which doing this kind of thing is important.
>
> -- Bill
> wgould@stata.com
> *
> * Help is available at
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>