RE: st: stset for grouped data

Thanks J.
I am not a statistician, but why do we need to log-transform population? Can you please help me here?
We want to compare the incidence rate (x/person-years) rather than mere incidence.
Is there any good reference on how to carry out longitudinal analysis on aggregated data? I did not find one on Google.
Thanks.
m
________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joerg Luedicke [joerg.luedicke@gmail.com]
Sent: 14 April 2011 16:04
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stset for grouped data
On Thu, Apr 14, 2011 at 6:39 AM, Dherani, Mukesh
<M.K.Dherani@liverpool.ac.uk> wrote:
> Thanks. Yes, it is aggregated data. What I actually want to do is calculate the cumulative incidence rate by country.
> Assuming that there is no censoring, and that on average each person in a particular age group has the same age of onset, can't we calculate the incidence rate? We have population by age group, so we may be able to calculate the total person-years of follow-up. I may be naive here, but I thought that if I expanded the data based on
> expand cases
> I would get data for each individual, and using the above assumptions I might be able to calculate the incidence rate.
Maybe I am missing something here, but I would think that your
cases have constant exposure ("no censoring", "same age of
onset"), and that does not change if you merely inflate your data. You
should be able to get the incidence rate from the data you have, for
example, by simply dividing "cases" by "population" (for each age
group). If you want to "test" something and need a model, you could
run, for example, a Poisson regression with (logged) population as
offset. If we take your example data and run the Poisson model:
. input region year agegp cases population
region year agegp cases populat~n
1. 1 1994 4 2 5000
2. 1 1994 9 5 2548
3. 1 1994 14 6 2547
4. 1 1994 19 15 7521
5. 1 1994 24 75 7896
6. end
. gen logpop=log( population)
. list
     +-----------------------------------------------------+
     | region   year   agegp   cases   popula~n     logpop |
     |-----------------------------------------------------|
  1. |      1   1994       4       2       5000   8.517193 |
  2. |      1   1994       9       5       2548   7.843064 |
  3. |      1   1994      14       6       2547   7.842671 |
  4. |      1   1994      19      15       7521   8.925454 |
  5. |      1   1994      24      75       7896   8.974112 |
     +-----------------------------------------------------+
. poisson cases i.agegp, offset( logpop) irr
Iteration 0: log likelihood = -19.89126
Iteration 1: log likelihood = -10.835984
Iteration 2: log likelihood = -10.235966
Iteration 3: log likelihood = -10.233162
Iteration 4: log likelihood = -10.233161
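Poisson regression                                Number of obs   =          5
                                                  LR chi2(4)      =      84.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -10.233161                       Pseudo R2       =     0.8046

------------------------------------------------------------------------------
       cases |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       agegp |
          9  |   4.905808   4.104493     1.90   0.057     .9517967    25.28581
         14  |    5.88928   4.808577     2.17   0.030     1.188664    29.17866
         19  |   4.986039   3.753353     2.13   0.033     1.140235    21.80303
         24  |   23.74619    17.0135     4.42   0.000     5.830841    96.70675
             |
      logpop |   (offset)
------------------------------------------------------------------------------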
We see, for instance, that the incidence rate in the oldest age group
is roughly 24 times as high as in the youngest age group (agegp 4, the
base category).
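On why population is logged: a Poisson model works on the log scale, so the denominator has to enter as log(population) with its coefficient fixed at 1, which is what offset() does. A sketch of the model fitted above, writing mu_i for the expected number of cases and N_i for the population in row i, a(i) for that row's age group, and taking agegp 4 as the base (beta_4 = 0). Treating N_i as the person-time denominator assumes roughly one year of follow-up per person, which is an assumption here, not something stated in the data:

```latex
\log \mu_i = \log N_i + \beta_0 + \beta_{a(i)}, \qquad \beta_4 = 0
\quad\Longleftrightarrow\quad
\frac{\mu_i}{N_i} = \exp\!\left(\beta_0 + \beta_{a(i)}\right)

\mathrm{IRR}_a = \frac{\exp(\beta_0 + \beta_a)}{\exp(\beta_0)} = \exp(\beta_a)
```

So the model describes rates (cases per person), and exponentiating a coefficient gives a rate ratio, which is what the irr option reports. Entering population unlogged would not give this rate interpretation.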
However, there still may be better solutions to your problem.
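Because the model has one indicator per age group and only one row per age group, it is saturated: the fitted counts equal the observed counts, so the IRRs should match the simple ratio of crude rates (cases divided by population) described above. A minimal cross-check, written in Python only as a neutral sketch outside Stata (in Stata itself, `gen rate = cases/population` gives the rates). It assumes, as above, that population is the person-time denominator:

```python
# Crude incidence rates from the example data in this thread.
# Assumption: "population" serves as the person-time denominator
# (roughly one year of follow-up per person in 1994).
data = [
    # (agegp, cases, population)
    (4, 2, 5000),
    (9, 5, 2548),
    (14, 6, 2547),
    (19, 15, 7521),
    (24, 75, 7896),
]

# Incidence rate = cases / population for each age group.
rates = {agegp: cases / pop for agegp, cases, pop in data}

# Rate ratios relative to the youngest group (agegp 4),
# which is the base category in the Stata model.
base = rates[4]
ratios = {agegp: rate / base for agegp, rate in rates.items()}

for agegp in sorted(rates):
    print(f"agegp {agegp:>2}: rate = {rates[agegp]:.6f}, "
          f"ratio vs agegp 4 = {ratios[agegp]:.4f}")
```

Run as a script, it prints each age group's rate and its ratio to agegp 4. The ratio for agegp 24 comes out at about 23.746, matching the IRR of 23.74619 in the Stata output up to rounding in the iterative fit.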
J.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*