Excerpt
Introduction: Record linkage is the science of finding matches or duplicates within or across files. Matches are typically delineated using name, address, and date-of-birth information. Other identifiers such as income, education, and credit information might also be used. Within a pair of records, identifiers might not correspond exactly. For instance, income in one record might be compared to mortgage payment size using a crude regression function. In the computer science literature, data cleaning or object identification often refers to methods of finding duplicates. In the model of record linkage due to Fellegi and Sunter (1969, hereafter FS), a product space A × B of records from two files A and B is partitioned into two sets: matches M and nonmatches U. Pairs in M typically agree on characteristics such as first name, last name, components of date-of-birth, and address. Pairs in U typically have isolated (random) agreements of the characteristics. We use g = (g1, g2, ..., gn) to denote an arbitrary agreement pat ...
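As a rough illustration of the setup described above, the following Python sketch computes an agreement pattern for every pair in the product space of two small files. The field names, the sample records, and the helper functions `agreement_pattern` and `all_patterns` are hypothetical and are not taken from FS; exact equality is assumed as the comparison rule, which is only one of many possible choices.

```python
from itertools import product

# Hypothetical comparison fields, chosen to mirror the identifiers named in the text.
FIELDS = ("first_name", "last_name", "birth_year", "address")


def agreement_pattern(rec_a, rec_b, fields=FIELDS):
    """Return a tuple (g1, ..., gn): 1 where both records carry the same
    non-missing value for a field, 0 otherwise (assumed exact comparison)."""
    return tuple(
        int(rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f))
        for f in fields
    )


def all_patterns(file_a, file_b, fields=FIELDS):
    """Map each index pair (i, j) of the product of the two files
    to the agreement pattern of the corresponding record pair."""
    return {
        (i, j): agreement_pattern(a, b, fields)
        for (i, a), (j, b) in product(enumerate(file_a), enumerate(file_b))
    }


# Illustrative records only; not real data.
file_a = [
    {"first_name": "ann", "last_name": "smith", "birth_year": 1970, "address": "12 oak st"},
]
file_b = [
    {"first_name": "ann", "last_name": "smith", "birth_year": 1970, "address": "12 oak street"},
    {"first_name": "bob", "last_name": "smith", "birth_year": 1982, "address": "9 elm ave"},
]

patterns = all_patterns(file_a, file_b)
for pair, g in sorted(patterns.items()):
    print(pair, g)
```

In this toy data the first pair agrees on most fields, as a pair in M typically would, while the second pair shows only an isolated agreement on last name, as is typical of pairs in U. The address mismatch in the first pair also shows how identifiers for the same entity might not correspond exactly.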