Deduplication of Data during Import using Data Import Wizard and Duplicate Detection Rules

Colin Maitland, 04 August 2013

When using the Data Import Wizard in Microsoft Dynamics CRM 2011, duplicate detection rules may be used to deduplicate data during the data import. In this blog I will describe how this may be done.

The following two screenshots show a sample combined set of Contacts and Accounts to be imported. The highlighting shows that some of the Accounts, e.g. Alpine Ski House, A. Datum Corporation and Coho Winery, are duplicated because they are related to more than one Contact.

Contacts and Accounts

Prior to running the data import, the combined list of Contact and Account records may be split into two separate lists.

Contacts

Accounts

Because all we have done is split the original list of records, the list of Accounts still contains duplicates for Alpine Ski House, A. Datum Corporation and Coho Winery.
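The split described above can be sketched in a few lines of Python. This is only an illustration, not part of the Data Import Wizard: the column names and the "Contact 1" style contact names are hypothetical placeholders, and only the three Account names come from the example. It shows that splitting alone leaves the duplicate Account rows in place.

```python
import csv
import io

# Hypothetical combined list: one row per Contact, with the related Account
# repeated on each row. Column names and contact names are placeholders.
COMBINED_CSV = """Contact Name,Account Name
Contact 1,Coho Winery
Contact 2,Alpine Ski House
Contact 3,Alpine Ski House
Contact 4,A. Datum Corporation
Contact 5,A. Datum Corporation
Contact 6,Coho Winery
"""


def split_combined(text):
    """Split the combined rows into a Contacts list and an Accounts list."""
    contacts, accounts = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Each Contact keeps a reference to its Account by name.
        contacts.append({"Contact Name": row["Contact Name"],
                         "Parent Customer": row["Account Name"]})
        # Every row also yields an Account row, so duplicates survive the split.
        accounts.append({"Account Name": row["Account Name"]})
    return contacts, accounts


contacts, accounts = split_combined(COMBINED_CSV)
distinct_names = {a["Account Name"] for a in accounts}
print(len(contacts), "contacts;", len(accounts), "account rows;",
      len(distinct_names), "distinct account names")
```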

When there are only a small number of records, as in this example, it is easy and quick to identify and manually remove duplicates. However, when working with a large number of records, manually identifying and removing duplicates is harder and takes considerably more time.

A simple solution is to use Duplicate Detection Rules, in conjunction with the Data Import Wizard, to prevent the duplicate records from being imported. This method only applies when importing the Accounts and Contacts as separate imports.

Select Settings, Data Management, Duplicate Detection Rules and ensure that appropriate Duplicate Detection Rules have been created and published for the record types to be deduplicated.

The most appropriate Duplicate Detection Rule for this example is the Accounts with the same Name rule.

Care should be taken to ensure that the impact of all published Duplicate Detection Rules for the selected record type is understood. In this example, retaining the ‘…same Account Number’, ‘…E-mail Address’, ‘…Phone Number’ and ‘…Website’ Duplicate Detection Rules will not cause unwanted duplicate detections. However, retaining other Duplicate Detection Rules, if any, such as ‘…same City’, which does not exist in this example, may cause unwanted duplicate detections. If required, unwanted Duplicate Detection Rules may be unpublished prior to the data import and then published again after the data import.

Step 3: Import Data Using Data Import Wizard

Use the Data Import Wizard to import the records and ensure that the Allow Duplicates option is set to No on the Review Settings and Import Data screen as shown in the sixth screenshot below.

Step 4: Review Import Failures

When the data import is completed, you may review the Import Failures log to see a list of the records that were not imported because they are duplicates. The following error will be displayed for each: “A record was not created or updated because a duplicate of the current record already exists”.

The following screenshots show that the records on rows 6 (Alpine Ski House), 11 (A. Datum Corporation) and 13 (Coho Winery) were not imported.

The highlighting in the following screenshot shows which of these records were imported (GREEN) and which were not (YELLOW), i.e. Coho Winery, Alpine Ski House and A. Datum Corporation on rows 2, 5 and 7 were imported but the duplicates of these on rows 6, 11 and 13 were not.
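The outcome above can be reproduced with a small Python simulation. This is a sketch of the behaviour, not the Data Import Wizard's actual implementation. It assumes that rows are processed in file order, that row 1 is the header, and that the "same Name" rule matches names while ignoring case and surrounding spaces. The "Placeholder" account names are invented to fill the rows not named in this post.

```python
DUPLICATE_ERROR = ("A record was not created or updated because a duplicate "
                   "of the current record already exists")

# Rows 2-13 of the Accounts list (row 1 is assumed to be the header).
# Only the three named Accounts come from the example; the rest are invented.
ROWS = {
    2: "Coho Winery",
    3: "Placeholder A",
    4: "Placeholder B",
    5: "Alpine Ski House",
    6: "Alpine Ski House",
    7: "A. Datum Corporation",
    8: "Placeholder C",
    9: "Placeholder D",
    10: "Placeholder E",
    11: "A. Datum Corporation",
    12: "Placeholder F",
    13: "Coho Winery",
}


def match_key(name):
    # Assumption: two names match when equal, ignoring case and outer spaces.
    return name.strip().casefold()


def import_accounts(rows):
    """Import rows in order, rejecting any row whose name was already seen."""
    imported, failures, seen = {}, {}, set()
    for row_number in sorted(rows):
        key = match_key(rows[row_number])
        if key in seen:
            # Equivalent of Allow Duplicates = No: the row goes to the failure log.
            failures[row_number] = DUPLICATE_ERROR
        else:
            seen.add(key)
            imported[row_number] = rows[row_number]
    return imported, failures


imported, failures = import_accounts(ROWS)
print("Rejected rows:", sorted(failures))  # Rejected rows: [6, 11, 13]
```

The first occurrence of each name is kept and every later occurrence is rejected, which is why rows 2, 5 and 7 are imported while rows 6, 11 and 13 appear in the failure log.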

The following screenshot shows the Accounts imported into Microsoft Dynamics CRM:

In this example, after the Accounts were imported, the Contacts were then also imported. These are shown in the following screenshot:

As mentioned previously, a limitation of this method is that it does not work when several import files (such as Accounts.xml and Contacts.xml) are combined into a single Zip file (such as Accounts and Contacts.zip) for import.

In this example, the six Contacts related to the three duplicate Accounts are not imported when they should be.

This is because the data import process attempts to match the Contacts to Accounts before the Accounts are deduplicated, and so an ‘A duplicate lookup reference was found’ error occurs.
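The failure can be illustrated with a short Python sketch. It is a simplified model of lookup resolution, not the actual import pipeline: the function and class names are hypothetical, and it assumes that a Contact's Parent Customer is resolved by Account name against all Account rows in the Zip file before any duplicates have been rejected.

```python
class DuplicateLookupError(Exception):
    """Raised when a lookup value matches more than one candidate record."""


def resolve_lookup(account_name, staged_accounts):
    """Return the single Account row matching the name, or None if no match."""
    matches = [a for a in staged_accounts if a["name"] == account_name]
    if len(matches) > 1:
        # The lookup is ambiguous, so the Contact row cannot be imported.
        raise DuplicateLookupError("A duplicate lookup reference was found")
    return matches[0] if matches else None


# Account rows as they sit in the combined import, still containing duplicates.
# "Placeholder A" is an invented name for an Account without a duplicate.
staged_accounts = [
    {"row": 2, "name": "Coho Winery"},
    {"row": 3, "name": "Placeholder A"},
    {"row": 13, "name": "Coho Winery"},
]

for name in ("Placeholder A", "Coho Winery"):
    try:
        match = resolve_lookup(name, staged_accounts)
        print(name, "-> resolved to row", match["row"])
    except DuplicateLookupError as error:
        print(name, "-> failed:", error)
```

With separate imports the Accounts are deduplicated first, so by the time the Contacts are imported each Account name resolves to exactly one record.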

Finally, in this example, it would be desirable to ensure that both the Primary Contact and the Parent Customer relationships are retained. This is only possible by importing the Accounts and Contacts from a single Zip file rather than as two separate imports.