Re: st: event history analysis with years clustered in individuals

Yes, maybe I might do that; at least I will write it in my essay to
explain why I chose to perform a survival analysis with the -hshaz-
command. This has been very informative, and the link to the Essex
website on discrete time survival models and the suggestion of -hshaz-
model 2 were exactly what I needed. Thank you so very much for taking
the time to help me.

Regards,
Hilde
Quoting Steven Samuels <sjhsamuels@earthlink.net>:

Hilde-

You might explain to the professor that, with survival data, the
number of years of observation is itself the (possibly censored)
outcome. Therefore "year" cannot be a level 1 effect in a
multilevel model.
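A minimal sketch of that point, in notation assumed here for
illustration (T_i, C_i and d_i do not appear elsewhere in the thread):
let T_i be the year person i quits and C_i the last year person i is
observed, with everyone entering at t = 1. The person-year file is then
just an expansion of one censored duration per person:

```latex
\begin{aligned}
\tilde{T}_i &= \min(T_i, C_i), \qquad d_i = \mathbf{1}\{T_i \le C_i\},\\
y_{it} &= \mathbf{1}\{t = \tilde{T}_i \text{ and } d_i = 1\}, \qquad t = 1, \dots, \tilde{T}_i .
\end{aligned}
```

Every person contributes a run of zeros that ends in a single 1 only if
the quit is observed, so the number of rows per person is part of the
outcome rather than a sample of level-1 units.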

-Steve
On Feb 15, 2009, at 3:43 PM, Hilde Karlsen wrote:

Ah. OK, I see I have to do some serious rethinking when it comes to
this essay, then. I guess this to a certain degree explains why I
have trouble understanding what sigma_u refers to in this specific
analysis. I am wondering if I should forward this e-mail
correspondence to the professor who held the course in multilevel
techniques, because what I've learned from you today is not in
line with what we were told in the course on this matter. Anyway,
thank you so much for the advice and for answering me.

Regards,
Hilde
Quoting Steven Samuels <sjhsamuels@earthlink.net>:

I agree with Austin. Just to be clear: sigma_u is a parameter that
is meaningless for this problem. No interpretation is possible.

On Feb 15, 2009, at 9:22 AM, Austin Nichols wrote:

Hilde Karlsen <Hilde.Karlsen@hio.no>:
If you have to use a mixed model as an exercise, and you have no
compelling reason to choose a particular research question, you should
ask a different research question where a mixed model is a more
appropriate model, not apply it blindly to data you know is better
suited to a survival model. Why not use the attrition dummy you have
made as the explanatory variable in a mixed model instead? What other
variables do you have in the data?

suggest because I have to use a multilevel approach for this essay
(it is an essay for a multilevel course I followed a while ago). I
should probably have been clearer on this issue, and on what my
problem really is. What I am wondering is not which method/command I
should use, but how I am going to interpret the sigma_u estimate when
my level 1 variable is years and my level 2 variable is individuals.
As mentioned, I find it more intuitive to grasp the point of separate
variance estimates when the levels are schools, classes etc., but for
some reason I have a hard time understanding how I should interpret
the variance estimate sigma_u when the years are clustered in
individuals. How should I interpret sigma_u when years are clustered
in individuals?
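For reference, a sketch of the random-intercept logit that a call like
-xtmelogit quit x* || id:- would fit, with i indexing individuals, t
indexing years, and u_i a person-specific intercept (notation assumed
here). sigma_u is the standard deviation of u_i, and rho is the implied
intraclass correlation on the latent logistic scale, whose level-1
variance is fixed at pi^2/3:

```latex
\operatorname{logit} \Pr(y_{it} = 1 \mid x_{it}, u_i) = x_{it}'\beta + u_i,
\qquad u_i \sim N(0, \sigma_u^2),
\qquad \rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3} .
```

In the schools example, sigma_u summarizes how much schools differ
after conditioning on x; the replies in this thread explain why no
comparable reading is available for a person-year survival file.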

I asked the professor who was leading the course which command I
should use, and he told me I should use xtmelogit (my advisor told me
the same thing). As he is the one who is going to judge whether I pass
or not on this essay, it is probably best to follow his advice.
I agree that it is a survival model, and I have designed my data for
this type of analysis (i.e. all individuals in the file start out
with 0 on the dependent variable, and when/if they drop out of the
nursing occupation, they receive 1 on the dependent variable). I have
no information on the date or month people drop out; I only have
information on the year they drop out.
Regards,
Hilde
Quoting Steven Samuels <sjhsamuels@earthlink.net>:

Hilde, I agree with Austin's approach. Even if you have only months,
not days, of starting and quitting, use that time unit in a survival
or discrete-time survival model. I recommend Stephen Jenkins's -hshaz-
(get it from SSC); his "model 1" (the "Prentice-Gloeckler model") is
the same as that fit by -cloglog-. His model 2 adds unobserved
heterogeneity and so may be more realistic (Heckman and Singer, 1984).
I would not be surprised if prediction equations for early and later
quitting differed. If so, time-dependent covariates or separate
models for early and later quitting would be informative.
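A sketch of the two models, in notation assumed here rather than taken
from -hshaz- itself (see -help hshaz- for its exact parameterization):
y_it is the quit indicator, gamma_t are year-specific baseline
parameters (the coefficients on year dummies), and \tilde T_i is the
last year person i is at risk. Model 1 is a complementary log-log model
for the probability of quitting in year t given no earlier quit; model 2
adds a person-level term taking one of K values mu_1, ..., mu_K with
probabilities p_1, ..., p_K, in the spirit of Heckman and Singer (1984):

```latex
\begin{aligned}
\text{Model 1:}\quad h_{it} &= \Pr(T_i = t \mid T_i \ge t, x_{it})
  = 1 - \exp\!\left[-\exp(\gamma_t + x_{it}'\beta)\right],\\
\text{Model 2:}\quad h_{itk} &= 1 - \exp\!\left[-\exp(\gamma_t + x_{it}'\beta + \mu_k)\right],\\
L_i &= \sum_{k=1}^{K} p_k \prod_{t=1}^{\tilde{T}_i} h_{itk}^{\,y_{it}} \left(1 - h_{itk}\right)^{1 - y_{it}},
\qquad \sum_{k=1}^{K} p_k = 1 .
\end{aligned}
```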
-Steve
Prentice, R. and Gloeckler, L. (1978). Regression analysis of grouped
Heckman, J.J. and Singer, B. (1984). A method for minimizing the
impact of distributional assumptions in econometric models for
duration data. Econometrica, 52(2): 271-320.

Hilde Karlsen <Hilde.Karlsen@hio.no>:
Attrition from nursing sounds like a survival model, probably in
discrete time, using -logit- or -cloglog- with time dummies, not
-xtmelogit- (see
http://www.iser.essex.ac.uk/iser/teaching/module-ec968 for a textbook
and self-guided course on discrete time survival models). If you have
T years of data on each individual, all of whom are first-year nurses
in period 1, and some of whom quit nursing in each of the subsequent
years, with a variable nurse==1 when a nurse (and zero otherwise), an
individual identifier id, a year variable year, and a bunch of
explanatory variables x*, you can just:
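tsset id year
* quit = 1 in a year when the person was a nurse last year but not this year
bys id (year): g quit=(l.nurse==1 & nurse==0)
* after the first quit the person is no longer at risk: set later years to missing
by id: replace quit=. if l.quit==1 | (mi(l.quit)&_n>1)
* year dummies _t1, _t2, ...; drop _t1 as the reference year
tab year, gen(_t)
drop _t1
logit quit _t* x*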
and then work up to more complicated models with heterogeneous
frailty, etc. The main issues are that someone who quit nursing last
year cannot quit nursing again this year, and people who never quit
nursing might at some future point that you don't observe, which is
why you use survival models...
If you know the day they started work and the day they quit, you might
prefer a continuous-time model (help st).
I've been assuming you had data on people working as nurses, but
rereading your email, maybe you have data on breastfeeding mothers.
I suppose the same considerations apply, though with multiple years
of data on breastfeeding mothers there is probably no censoring.
On Fri, Feb 13, 2009 at 9:19 AM, Hilde Karlsen <Hilde.Karlsen@hio.no> wrote:

Dear statalisters,
This is probably a stupid question, but I've been searching around the
net and in books and articles, and I've still not grasped the concept:
When I'm performing a multilevel analysis of attrition from nursing
using xtmelogit, and time (year) is the level 1 variable and
individuals (id) are the level 2 variable (i.e. years are clustered
within individuals; I have a person-year file), how do I formulate
the expectation related to this model? Why is it important to
distinguish between these two levels?
I find it more intuitive to grasp the fact that individuals are
clustered within schools, and that variables on the school level, as
well as variables on the individual level, may influence e.g. which
grades a student gets.

I understand (at least I hope I understand) the point that when the
same individuals are followed over a period of time, the individual's
responses are probably highly correlated, and that this implies a
violation of the assumption of independent error terms. As I see it,
I could have used the cluster() option (cluster(id)) to 'avoid' this
violation; however, I have to write an essay using multilevel
analysis, so this is not an option.

I don't know if I'm being clear enough about what my problem is, but
any information regarding this topic (how to grasp the concept of
years clustered in individuals) will be greatly appreciated.
I'm really sorry for having to ask you such an infantile question. My
colleagues and friends are not familiar with multilevel analyses, so I
don't know who to turn to.
Best regards,
Hilde