Re: st: Capturing the date at which something first occurs

The question raised by Richard involves an indicator variable for
the first occurrence of an event. At its simplest, there is a -state-
indicator and we are looking for the first time that -state == 1-.
The approach in the FAQ I wrote on this question
http://www.stata.com/support/faqs/data/firstoccur.html
is, I now think, too indirect. I would now urge focusing on finding the
date of the first occurrence; an indicator is then just given by
whether the date variable is equal to that first date.
The first date is just the minimum and we can get that easily, even
with panel data, using -egen-:

egen first_date = min(date / (state == 1)), by(id)

or

egen first_date = min(cond(state == 1, date, .)), by(id)

The expressions fed to the -min()- function of -egen- are

date / (state == 1)
cond(state == 1, date, .)

They are equivalent and are both focused on getting -egen- to ignore
everything except the times when -state == 1-. If there are no such
times for a given id, then the expression is missing in every
observation for that id and -min()- returns missing, which is the
right answer.
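
As a minimal sketch of how this plays out, the following builds a small toy dataset (three panels, values invented purely for illustration), computes the first date, and then derives the indicator from it. The variable name -is_first- is just an illustrative choice, and the sketch assumes that dates are unique within each id.

```stata
clear
input id date state
1 1 0
1 2 0
1 3 1
1 4 1
2 1 0
2 2 0
3 1 1
3 2 0
3 3 1
end

* date / (state == 1) equals date when state == 1 and is missing otherwise,
* so min() only ever sees the dates at which the state occurred
egen first_date = min(date / (state == 1)), by(id)

* indicator: 1 where date equals the first date, 0 elsewhere;
* the extra condition stops a missing date matching a missing first_date
gen byte is_first = (date == first_date) & !missing(first_date)

list, sepby(id)
```

Here id 2 never shows -state == 1-, so its -first_date- is missing and its indicator is 0 throughout.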
Although for simplicity we are in this example looking for
-state == 1-, i.e. values of 1 for an indicator of what interests us,
that is just detail. The main idea generalises easily to any
true-or-false condition:

egen first_date = min(date / (foo == 42)), by(id)

Another nice feature of this approach is that it extends easily to
giving the _last_ date, using -max()- as the -egen- function.
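
A minimal sketch of the -max()- variant, again on invented toy data and with -last_date- as an illustrative variable name:

```stata
clear
input id date state
1 1 0
1 2 1
1 3 1
1 4 0
2 1 0
2 2 0
end

* last date at which state == 1 within each id;
* missing if the state never occurred for that id
egen last_date = max(cond(state == 1, date, .)), by(id)

list, sepby(id)
```

The -cond()- form is used here, but dividing by -(state == 1)- should work equally well, as both hand -egen- missing values for the observations to be ignored.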
Nick
On Sat, Apr 7, 2012 at 7:21 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Also consider
>
> gsort id -state time
> by id: gen date_first = time[1] if state[1] == 1
> gen is_first = time == date_first
On Fri, Apr 6, 2012 at 6:41 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> When the first occurrence occurred is discussed in the same FAQ
>>
>> http://www.stata.com/support/faqs/data/firstoccur.html
>>
>> Here is another way to do it.
>>
>> egen date_first = min(time / state), by(id)
>>
>> Explanation: This is the trick that dividing by zero can be useful.
>> time / 0 is returned as missing and thus ignored in the calculation of
>> the minimum, as long as the -state- did occur.
On Fri, Apr 6, 2012 at 6:28 PM, Richard T. Campbell <dcamp@uic.edu> wrote:
>>> Suppose I have a data set like that used by Nick Cox in an FAQ which shows
>>> how to capture a record at which something first occurs. Here is his example.
>>>
>>>
>>>      +---------------------------+
>>>      | id   time   state   first |
>>>      |---------------------------|
>>>   1. |  1      1       0       0 |
>>>   2. |  1      2       0       0 |
>>>   3. |  1      3       0       0 |
>>>   4. |  1      4       1       1 |
>>>   5. |  1      5       1       0 |
>>>   6. |  1      6       1       0 |
>>>   7. |  1      7       1       0 |
>>>   8. |  1      8       1       0 |
>>>   9. |  1      9       1       0 |
>>>  10. |  1     10       1       0 |
>>>      |---------------------------|
>>>  11. |  2      2       0       0 |
>>>  12. |  2      2       0       0 |
>>>  13. |  2      3       1       1 |
>>>  14. |  2      4       1       0 |
>>>  15. |  2      5       1       0 |
>>>  16. |  2      6       1       0 |
>>>  17. |  2      7       1       0 |
>>>  18. |  2      8       1       0 |
>>>  19. |  2      9       0       0 |
>>>  20. |  2     10       0       0 |
>>>      |---------------------------|
>>>  21. |  3      1       0       0 |
>>>  22. |  3      2       1       1 |
>>>  23. |  3      3       0       0 |
>>>      +---------------------------+
>>>
>>> So, for ID 1, the first time at which state = 1 occurs is the fourth record,
>>> for ID 2 it is the third record etc. I want to assign a value within an id
>>> equal to that index. For example, for ID 1 I want a variable that equals 4
>>> for all ten cases, for ID 2 a variable equal to 3 for all cases etc. Put
>>> differently, I want to assign to all cases within an id, the value of _n
>>> when first = 1. I can't seem to get my head around how to do this.
>>>