SAS : Many to Many Merge

Suppose you have two data sets. You want to merge both the data sets but there are duplicate values in the common variable (ie. primary key) of any or both of the datasets.

Many to Many Merging

Data Step Merge

The DATA step Merge does not handle many-to-many matching very well. When we perform many to many merges. the result should be a cartesian (cross) product of matching observations. For example, if there are three records that match from one contributing data set to two records from the other, the resulting data set should have 3 × 2 = 6 records.

Data Step MERGE does not create a cartesian product in case of a many-to-many relationship.It will return number of records for a duplicate value equal to maximum number of the duplicate value in both the table.

SAS Code -

data dat1;

input ID Info;

cards ;

1 3123

1 1234

2 7482

2 8912

3 1284

;

run;

data dat2;

input ID Info2;

cards ;

1 4444

1 5555

1 8989

2 9099

2 8888

3 8989

;

run;

data combined;

merge dat1 dat2 ;

by ID;

run;

Output : Merge

Note : In this example, we have 2 1s in dat1 and 3 1s in dat2. The maximum number of 1s in both the tables is 3. So it would return 3 1s in the merged dataset.

PROC SQL JOINPROC SQL JOIN creates all possible combinations of matching observations in case of a many-to-many relationship. Cartesian product is a collection of all pairs of two given sets. For example, In ID variable, there are 2 1's in dat1 dataset and 3 1's in dat2 dataset, the cartesian product would be (3*2 = 6 Observations) in the final result.

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 7 years of experience in data science and predictive modeling. During his tenure, he has worked with global clients in various domains like banking, Telecom, HR and Health Insurance.

While I love having friends who agree, I only learn from those who don't.