Re: st: Sample size for four-level logistic regression

Clyde Schechter <clyde.schechter@gmail.com> asks several questions about
sample-size determination for a four-level random-effects logistic model. I
will address some of Clyde's questions below.
> Our intervention would be randomized at the level of institutions, which
> have a few levels of outcome-relevant internal hierarchy themselves. The
> outcome is dichotomous and is fairly rare: around 2.5 "successes" per 1,000
> observations. (Observations within institutions will be relatively
> plentiful and inexpensive to obtain electronically, although limited by the
> number of discharges per year they handle. The limit on feasibility will be
> the number of institutions, each of which will need resources to implement
> the intervention and program their data collection.) Ultimately, the
> analysis will require a 4-level logistic regression.
>
> I need to get a sense of how many institutions would need to be recruited
> for the study: if too large, it's a dead letter.
>
> ...
>
> By any chance, will the expanded sample size calculations supported in
> Stata 13 handle this?
The new -power- command does not provide a sample-size computation for the
design that Clyde describes.
> ...
> Plan A was to do simulations.
> ...
If Clyde decides to pursue the simulation approach, he may benefit from a
forthcoming feature of the -power- command that will allow access to -power-'s
tables and graphs for user-specified power computations. I will talk more
about this feature at the upcoming Stata Conference in New Orleans. This
feature is not yet available in Stata 13, so Clyde would not be able to
benefit from it immediately.
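
For anyone planning a simulation-based power estimate, the number of
replications determines how precisely the power is estimated. The sketch below
is only an illustration, written in Python rather than Stata; the function name
power_mc_se and the assumed true power of 0.80 are hypothetical choices and are
not taken from the original question. It uses the usual binomial standard
error, sqrt(p*(1-p)/R), for a rejection proportion p estimated from R
replications.

```python
import math


def power_mc_se(power, reps):
    """Monte Carlo standard error of a simulated power estimate.

    power: assumed true rejection probability (hypothetical input)
    reps:  number of simulated data sets analyzed
    """
    return math.sqrt(power * (1.0 - power) / reps)


# Compare a few replication counts at an assumed true power of 0.80.
for reps in (100, 500, 2000):
    se = power_mc_se(0.80, reps)
    half_width = 1.96 * se  # approximate 95% half-width (normal approximation)
    print(f"reps={reps:5d}  SE={se:.4f}  half-width={half_width:.4f}")
```

Under these assumptions, 500 replications give a standard error of a little
under 0.02, which is in line with the "reasonable precision" mentioned in the
question.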
> ...
> The problem is that in the simulations, each replication (analysis of a
> single simulated sample) takes 2 hours to run on my setup, even with the
> Laplace approximation. For even one candidate number of institutions and
> set of assumptions about variance components, I will need about 500
> replications to get reasonable precision on the power. So we're talking
> months here.
> ...
> Or is its (Stata 13) speedup in runtime for xtmelogit so great that it will
> deliver me from this problem?
In general, we find the new -melogit- command to be approximately 4 to 10
times faster than the old -xtmelogit- command. There are models for which the
speed increase is greater. It is difficult, however, to comment on the speed
increase in Clyde's situation without running timings for his design.
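
To put the 4 to 10 times range in context, here is some back-of-the-envelope
arithmetic using the figures quoted from the original post (2 hours per
replication, 500 replications). It assumes that the general speedup range
carries over to this particular design, which has not been verified, and that
replications run one after another on a single machine. The helper total_days
is a hypothetical name used only for this sketch, which is written in Python
for illustration.

```python
HOURS_PER_REP = 2.0  # from the original post: one replication takes 2 hours
REPS = 500           # from the original post: replications per scenario


def total_days(hours_per_rep, reps, speedup=1.0):
    """Serial wall-clock time in days for one simulation scenario."""
    return hours_per_rep * reps / speedup / 24.0


# speedup=1 is the current setup; 4 and 10 are the ends of the general range.
for speedup in (1, 4, 10):
    days = total_days(HOURS_PER_REP, REPS, speedup)
    print(f"speedup x{speedup:<2d}: about {days:.1f} days per scenario")
```

Under these assumptions a single scenario drops from roughly 42 days to
somewhere between about 4 and 10 days. That is still a substantial cost if
several candidate numbers of institutions and sets of variance components need
to be examined.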
We ran a quick comparison of the timings from the new -melogit- command and
the old -xtmelogit- command fitting a 4-level nested random-intercept logistic
model with a rare outcome using adaptive Gaussian quadrature with 7
integration points. The model had 30 groups at the fourth level, 150 groups
at the third level, and 600 groups at the second level. The number of
observations per group varied between 4 and 60. The total number of
observations was 1800. The new command took 100 seconds to execute and the
old command took 739 seconds to execute. We also considered models with
smaller and larger total numbers of observations. We found the new command to
be approximately 5 to 7 times faster than the old command for the considered
models.
--Yulia
ymarchenko@stata.com