RE: st: RE: coding problem: looping through a list of ID's

The observations with missing values for age and/or height are useless
for your purposes. You might as well -drop- them.
Then I'd thin your data to a minimum spacing of (say) 9 months. Then
just calculate height change from adjacent heights within each panel.
How do you thin a panel? Rajesh Tharyan asked questions on this, or at
least related subjects, in May. Subsequently I wrote a program
-panelthin- but have not hitherto made it public.
*! 1.0.0 NJC 19 May 2008
program panelthin, sort
    version 8
    syntax [if] [in] , Min(numlist max=1 >0) Generate(str)

    quietly {
        capture confirm new var `generate'
        if _rc {
            di as err "generate() invalid"
            exit _rc
        }

        marksample touse
        tsset
        local panel `r(panelvar)'
        local time `r(timevar)'
        markout `touse' `panel' `time'
        count if `touse'
        if r(N) == 0 error 2000

        tempvar t T prev
        bysort `touse' `panel' (`time') : gen `t' = _n * `touse'
        by `touse' `panel' (`time') : gen `T' = _N * `touse'
        su `T', meanonly
        local tmax = r(max)
        drop `T'

        gen byte `generate' = `t' == 1
        by `touse' `panel' : gen `prev' = `time'[1]

        forval i = 2/`tmax' {
            replace `generate' = 1 ///
                if (`time' - `prev') >= `min' & `t' == `i'
            by `touse' `panel' : ///
                replace `prev' = `time'[`i'] ///
                if (`time'[`i'] - `prev') >= `min'
        }
    }
end
Despite its name, -panelthin- doesn't thin a panel, but rather
identifies observations that would belong in a panel with at least the
minimum spacing specified. If you like the selection, follow with
-keep-.
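
The whole recipe (drop, thin, difference) might be sketched as follows.
This is only an illustration on made-up data: it assumes -panelthin- has
been defined as above, that dov is a numeric daily date, and that height
is held in pt_ht; the toy values and the 270-day (roughly 9 months)
spacing are hypothetical.

```stata
* Hypothetical toy data: pt_id, dov (days), age (years), pt_ht (cm)
clear
input pt_id dov age pt_ht
1    0 10.0 140
1  100 10.3 142
1  300 10.8 145
1  650 11.8 151
2    0  5.0 110
2  200  5.5   .
2  400  6.1 117
end

* Observations with missing age or height carry no information here
drop if missing(age, pt_ht)

* Flag observations at least 270 days apart within each patient
tsset pt_id dov
panelthin, min(270) gen(OK)
keep if OK

* Height and age change between adjacent retained visits
* (missing for the first retained visit of each patient)
bysort pt_id (dov): gen dheight = pt_ht - pt_ht[_n-1]
by pt_id: gen dage = age - age[_n-1]
```

Dividing dheight by dage would then give a rate per year, if that is
what is wanted.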
There is no help file, but here is an example:
. l id t

     +---------+
     | id    t |
     |---------|
  1. |  1    2 |
  2. |  1    3 |
  3. |  1    5 |
  4. |  1    7 |
  5. |  1   11 |
     |---------|
  6. |  1   13 |
  7. |  1   17 |
  8. |  1   19 |
  9. |  1   23 |
 10. |  1   29 |
     |---------|
 11. |  2    2 |
 12. |  2    3 |
 13. |  2    5 |
 14. |  2    7 |
 15. |  2   11 |
     |---------|
 16. |  2   13 |
 17. |  2   17 |
 18. |  2   19 |
 19. |  2   23 |
     +---------+

. tsset id t
       panel variable:  id (unbalanced)
        time variable:  t, 2 to 29, but with gaps
                delta:  1 unit

. panelthin, min(5) gen(OK)

. l

     +--------------+
     | id    t   OK |
     |--------------|
  1. |  1    2    1 |
  2. |  1    3    0 |
  3. |  1    5    0 |
  4. |  1    7    1 |
  5. |  1   11    0 |
     |--------------|
  6. |  1   13    1 |
  7. |  1   17    0 |
  8. |  1   19    1 |
  9. |  1   23    0 |
 10. |  1   29    1 |
     |--------------|
 11. |  2    2    1 |
 12. |  2    3    0 |
 13. |  2    5    0 |
 14. |  2    7    1 |
 15. |  2   11    0 |
     |--------------|
 16. |  2   13    1 |
 17. |  2   17    0 |
 18. |  2   19    1 |
 19. |  2   23    0 |
     +--------------+

The made-up example has the same times for each panel, but that's
laziness, not an assumption.
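
To make that concrete, here is a small sketch in which the two panels
have different times and different lengths. The data are invented for
illustration, and the code assumes -panelthin- has been defined as
above.

```stata
* Hypothetical data: panels of unequal length with unrelated times
clear
input id t
1  1
1  4
1  6
1 12
2  3
2  5
2 10
2 11
2 20
end

tsset id t
panelthin, min(5) gen(OK)

* Each panel is thinned relative to its own first observation
list, sepby(id)
```

Tracing the rule by hand, panel 1 should keep t = 1, 6, 12 and panel 2
should keep t = 3, 10, 20.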
Nick
n.j.cox@durham.ac.uk
Leny Mathew
Figured as much, but at this point any comments would be useful for me!!
When I got this data set each patient had a different number of visits
over the period of time and not everyone visits over the whole period. I
tried writing this code for that but couldn't get it to loop over
patient id, so had to create a max(number of visits) per person and
use that value as a break point. Then I changed the data into wide
format and when I reshaped it back to long, I ended up with
everybody having the maximum of 148 visits. This was a boon in a sense
because I didn't have to worry about the max number of visits per
person (a problem as each loop takes an awful lot of time!). Some
people would have age as missing in some rows but I figured that it
wouldn't be a problem in the loop as that difference would be set to
missing and that wouldn't be an issue.
Since I needed to create an age interval I thought that age could be
used in the same way as I would use date, just that I would set 1 year
instead of the number of days elapsed.
The way this is set up I'm trying to take the difference (within each
patient) of age[i] - age[j] where i takes all values 148 through 1 and
j does the same. So I can calculate all the age differences and if the
age difference is approximately 1 year (between 0.95 and 1.5) then I can
calculate the height difference for those ages.
The data is as follows:
pt_id  counter  num  timec   age  dov
    1        1  148      1  10.1  1 jan 01
    1        2  148      2  12.1  1 jan 03
    .
    .
    1      147  148    147     .  .
    1      148  148    148  16.1  ..
    2      149  148      1   5.0  2 Jan 98
    2      150  148      2   5.5  2 Jun 98
    .
    .
    .
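
The pairwise comparison described above does not need an explicit loop
over patients. What follows is one possible reading of it, sketched on
invented data: for each visit, look ahead within the same patient and
take the height change to the first later visit whose age is more than
0.95 and less than 1.5 years greater. The variable names follow the
thread; the toy values, the helper variable visit, and the choice of
"first later visit in the window" are assumptions.

```stata
* Hypothetical toy data using the thread's variable names
clear
input pt_id age pt_ht
1 10.0 140
1 10.5 143
1 11.1 146
1 12.2 150
2  5.0 110
2  5.5 113
2  6.5 120
end

drop if missing(age, pt_ht)

* Number visits within patient to find the longest look-ahead needed
bysort pt_id (age): gen long visit = _n
su visit, meanonly
local kmax = r(max) - 1

* Under by:, subscripts are relative to the patient, so looking past the
* last visit yields missing, and missing < 1.5 is false
gen phv = .
forval k = 1/`kmax' {
    by pt_id: replace phv = pt_ht[_n+`k'] - pt_ht ///
        if missing(phv) ///
        & (age[_n+`k'] - age) > 0.95 & (age[_n+`k'] - age) < 1.5
}
```

The loop runs over look-ahead distances, not over patients, so its
length is set by the largest number of visits and not by the number of
patients.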
************************************************************************
****************************************
On Mon, Nov 24, 2008 at 10:34 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I suggest you start again and show us the structure of the data and
> define more precisely how height increase over a year is to be defined
> from measurements irregular in time.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Leny Mathew
> I have been using Stata for a while but
> am a novice when it comes to loops and macros. I'm hoping that someone
> on the list could help me with the following problem. I have a data
> set with 74 patients each with 148 measurements of height over a
> period of 10 years. The height measurements for all cases are not
> necessarily at the same time and not everybody has 148 measurements.
> The number 148 is a result of changing the data set into the long
> form.
> I'm trying to calculate the increase in height for a one year
> (approximately) interval for each case. I developed the following code
> for this purpose, but am not able to get it to work perfectly. It
> loops through each patient, but at the end, 'phv' ends up being the same
> for everyone in the data. Also, this code might be a totally
> convoluted way of doing this and I'm hoping that someone could give me
> some pointers on how to improve/revamp this completely.
> If this posting is a violation of the list protocol on type of
> posting, please feel free to let me know and I'll take it off. I've
> spent quite some time tweaking this and am out of ideas.
> I'm using Stata 10.1. Any suggestions are much appreciated.
>
> Note: 'dov' is date of visit, used to create age.
>
> sort pt_id dov
> by pt_id: gen timec=_n
>
> gen counter=_n
>
> sort pt_id dov
> by pt_id: gen num=_N
>
> local j=counter
> local i=1
> local k=1
> while pt_id <75 {
>     display pt_id[_n]
>     while (pt_id[`j']==pt_id[`j'+1]) & timec < 148{
>         local value= max(`j'+147, num[pt_id])
>         while (`value' > max(`j',1)) & (`value' > timec[`j']) & (pt_id[`j']==`k') {
>             replace phv= (pt_ht[`value']-pt_ht[counter[`i']]) ///
>                 if ((age[`value']-age[counter[`i']]) >0.95 & ///
>                 (age[`value']-age[counter[`i']]) < 1.5) & (pt_id[`j']==`k')
>             local i= `i'+1
>             if `i'==`value' local value= `value'-1
>             if `i'==`value'+1 & pt_id[`j']==1 local i=1
>             if `i'==`value'+1 & (pt_id[`j'] >1) local i=(`k'-1)*148
>         }
>         local i=`i'+148
>         local j=`j'+148
>         display `j'
>         local k=`k'+1
>         display `k'
>         continue
>         if k >74 break
>     }
> }
>
>
> **I tried to use phv[`value'] so that it would replace the value at
> the certain row, but that gave me an error that weights are not
> allowed. (I'm sure that that must have been a violation of stata
> rules!)
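
On the "weights not allowed" message: Stata reads square brackets after
the variable name in -replace- as a weight specification, which
-replace- does not accept. To change a single observation, use the -in-
qualifier. A minimal sketch with made-up values (the macro value and the
number 5 are purely illustrative):

```stata
* Hypothetical three-observation data set
clear
set obs 3
gen phv = .

* Change only the observation whose number is held in the local macro
local value = 2
replace phv = 5 in `value'
```

Subscripts such as phv[`value'] are fine on the right-hand side of an
expression; it is only on the left of the equals sign that they are not
allowed.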