Latest Data News

Merging data files with duplicate variable names

If you merge two files that contain variables with the same
name, the merged file keeps only one copy of each such variable,
and the values from the first file are replaced by the values
from the second. This happens even when the two files hold
different values for that variable.

To prevent this, check the variable names in both data files
before you merge. Rename any variable that has the same name in
both files but does not hold the same values.
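
One way to check for shared names before merging is to query the
DICTIONARY.COLUMNS table. The sketch below assumes the two files
are data sets named a and b in the WORK library; substitute your
own library and data set names.

```sas
/* Sketch: list variable names that appear in both WORK.A and WORK.B. */
/* The data set names A and B are assumptions for illustration.       */
/* LIBNAME and MEMNAME values are stored in uppercase.                */
proc sql;
  select x.name
    from dictionary.columns as x,
         dictionary.columns as y
    where x.libname = 'WORK' and x.memname = 'A'
      and y.libname = 'WORK' and y.memname = 'B'
      and upcase(x.name) = upcase(y.name);
quit;
```

Any name in the result other than the BY variable is a candidate
for renaming.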

For example, suppose two data files both have a variable named
'black'. If a new variable, bdummy, is created in one file to
hold that file's values, nothing is lost when the files are merged.
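
A minimal sketch of that approach follows. The sample values and
the data set names a, b, b2, and merged are hypothetical; only the
variable names black and bdummy come from the description above.

```sas
/* Hypothetical sample data: both files carry a variable named black. */
data a;
  input id black;
  datalines;
1 0
2 1
;
run;

data b;
  input id black;
  datalines;
1 1
2 1
;
run;

/* Copy b's black into bdummy, then drop the original name from b2. */
data b2;
  set b;
  bdummy = black;
  drop black;
run;

/* merged now holds id, black (from a), and bdummy (from b). */
data merged;
  merge a b2;
  by id;
run;
```

The RENAME= data set option, as in merge a b(rename=(black=bdummy)),
is an alternative that avoids the extra data step.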

Note that the file order is not determined by the
order in which the files are read by SAS. The file order is
determined by the merge statement.

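data merge1;
  merge a b; by id;   /* 'a' is the first file, 'b' is the second */
run;
data merge2;
  merge b a; by id;   /* 'b' is the first file, 'a' is the second */
run;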

In the first example, 'a' is the first data file and 'b' is
the second. In the second example, the reverse is true.

If you read your log file carefully, you can spot duplicate
variable names across data files. The log reports the number of
variables in each data set it creates. When you merge by a single
variable, the total number of variables in the merged file should
be one less than the sum of the numbers of variables in the two
data sets, because the BY variable is counted only once. If the
merged file has fewer variables than that, you have a duplicate
variable problem.
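
As a worked illustration of that count, the sketch below merges
two hypothetical three-variable files that share both id and
black. The variables age and income and all the values are
invented for the example.

```sas
/* Hypothetical data: each file has 3 variables; id and black are shared. */
data a;
  input id black age;
  datalines;
1 0 34
2 1 51
;
run;

data b;
  input id black income;
  datalines;
1 1 42000
2 1 38000
;
run;

data merged;
  merge a b;
  by id;
run;

/* Expected with no duplicates: 3 + 3 - 1 = 5 variables.          */
/* merged has only 4 (id, black, age, income), so one name other  */
/* than the BY variable is shared between the two files.          */
```

Here the log note for merged should report 4 variables rather
than 5, and black holds the values from b.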