Assigning an ID variable across missing rows

I have a dataset at the person-month level with an ID variable that is not clean, another ID variable, and a few other categorical variables.

Dataset 1:

ID1  ID2  categ vars.....
1    123
1.1  123
1.2  234
1.2  234

In ID1, 1 is a valid value, while 1.1 and 1.2 are invalid. ID2 is how I distinguish one person from another, but I need to retain as many valid values of ID1 as possible. I need to do two things. First, for every person (i.e., value of ID2) who has at least one valid value of ID1, I want to fill the rest of their rows with that valid value. Second, for every person who never has a valid value of ID1, I want a new variable, ID3, to give them a value.

if rows_w_invalid_id=1 and num_rows>rows_w_invalid_id then ID1=ID1_keep;

run;

This code correctly completes my first objective of overwriting invalid values of ID1 with valid values for each person, using ID2 as the person identifier. However, when assigning ID3, it resets to 1 every time it encounters a missing value, i.e., someone who would not be assigned a value of ID3. So it looks like this:

Dataset 2:

ID1  ID2  ID3  categ vars.....
1    123
1    123
.    234  1
.    234  2
2    345
2    345
.    456  1

What I want is for it to continue counting rather than resetting when it encounters a missing value.

Re: Assigning an ID variable across missing rows

Q2: If a person has multiple good values and some bad ones, which of the good values do you want to use to populate the bad ones?

Q3: Will the goods always appear before the bads?

Q4: Is your data presorted by ID2, or do you at least know that rows with the same ID2 are clustered together?

Anyway, I made some reasonable guesses, and here is some code to get you going (sorry, I haven't read through your code):

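data have;
    input (id1 id2) (:$8.);
    cards;
1 123
1.1 123
1.2 234
1.2 234
;
/* A double DOW loop is used because it is not known whether good values appear before bad ones; otherwise the code could be simpler. */
/* Any ID1 containing a non-digit character is deemed bad. */
data want;
    /* Pass 1: scan the ID2 group and keep the last good ID1 found */
    do until (last.id2);
        set have;
        by id2 notsorted;
        if notdigit(strip(id1))=0 then
            _id1=id1;
    end;
    /* Pass 2: re-read the same group, overwrite bad ID1 values, and write the rows out */
    do until (last.id2);
        set have;
        by id2 notsorted;
        if notdigit(strip(id1))>0 then
            id1=_id1;
        /* No good ID1 in this group: number the rows. id3 is not retained, so it restarts at 1 for each ID2 group. */
        if missing(_id1) then
            id3=sum(id3,1);
        output;
    end;
    drop _id1;
run;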
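
If the goal is for ID3 to keep counting across persons rather than restart, one possible variant of the same double DOW loop is sketched below. This rests on a guess about the requirement: that every row of a person with no good ID1 should get the same ID3, and that the next such person should get the next number. The names want_id3 and _next_id3 are made up for illustration and do not come from the original post.

```sas
data want_id3;
    /* Pass 1: scan the ID2 group and keep the last good ID1 found */
    do until (last.id2);
        set have;
        by id2 notsorted;
        if notdigit(strip(id1))=0 then
            _id1=id1;
    end;
    /* Sum statement: _next_id3 is implicitly retained, so the counter
       carries over from one ID2 group to the next instead of resetting */
    if missing(_id1) then
        _next_id3+1;
    /* Pass 2: re-read the same group and write the rows out */
    do until (last.id2);
        set have;
        by id2 notsorted;
        if notdigit(strip(id1))>0 then
            id1=_id1;
        /* Assumed rule: one ID3 per person who has no good ID1 */
        if missing(_id1) then
            id3=_next_id3;
        output;
    end;
    drop _id1 _next_id3;
run;
```

The counter is only incremented between the two passes, once per ID2 group, so rows of persons with a good ID1 keep a missing ID3. If you want a running number per row instead of per person, the increment would have to move inside the second loop.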

It may be that you need to revisit the requirements from earlier in the process and describe the whole requirement. Some problems are iterative that way. You learn more about the data, and sometimes it is appropriate to go back to the beginning, as choices made early in a process may introduce the issues you are encountering now. The knowledge gained by working this far may give you clues as to which of the options not previously selected might help.