I have two data sets, a and b, each containing three variables: hospitals, patient_id and date. For each patient_id there are multiple dates, and because of the definitions of what counts as a new entry, I need to weed out excess observations. Code-wise, that corresponds to something like:

data a;

if patient_id in a = patient_id in b and (date in b - 30 days) < date in a < (date in b + 30 days) then same=1;

else same=0;

Basically, what I am having trouble with is referencing variables across data sets, since a and b should be kept separate.

Due to the information being sensitive, I cannot provide any concrete examples of data.

Re: comparison of variable across data sets

Due to the information being sensitive, I cannot provide any concrete examples of data.

The data only needs to reflect your situation; it does not have to be the actual data. You can make fake data that includes at least one example of each situation you'll encounter, i.e. no matches in either data set, duplicates, and multiple duplicates.
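
As a rough illustration of what such fake data could look like, here is a minimal sketch. All values are invented, the variable layout (hospitals, patient_id, date) is taken from the description in the question, and the output name a_flagged is hypothetical. It also shows one possible way to flag rows in a without merging the two data sets, using a correlated subquery in PROC SQL. This is an assumption about what is wanted, not a confirmed solution.

```sas
/* Fake data: every value here is invented for illustration */
data a;
  input hospitals $ patient_id date :yymmdd10.;
  format date yymmdd10.;
  datalines;
H1 1 2020-01-15
H1 1 2020-03-20
H2 2 2020-02-01
H2 3 2020-05-05
;
run;

data b;
  input hospitals $ patient_id date :yymmdd10.;
  format date yymmdd10.;
  datalines;
H1 1 2020-01-01
H1 1 2020-01-20
H2 2 2020-06-01
H3 4 2020-04-04
;
run;

/* Flag each row of a that has a row in b with the same patient_id  */
/* and a date strictly within 30 days. a and b stay separate;       */
/* a_flagged is a hypothetical name for the flagged copy of a.      */
proc sql;
  create table a_flagged as
  select a.*,
         case
           when exists (select *
                        from b
                        where b.patient_id = a.patient_id
                          and b.date - 30 < a.date
                          and a.date < b.date + 30)
           then 1
           else 0
         end as same
  from a;
quit;
```

With this fake data, only the first row of a (patient 1, 2020-01-15) should get same=1, since it is the only one with a date in b for the same patient less than 30 days away. Patient 3 has no rows in b at all, and patient 4 appears only in b, so the no-match cases are covered too.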