OH22 data has just released a free SSIS transform task for SQL Server Data Quality Services (DQS) matching. The solution is published on codeplex. With this custom transform task, you can use the Knowledge Base (KB) created in DQS to automate data matching through SSIS.

To summarize, we have the following data set which we would like to identify duplicate and related records:

In the previous article, we created a DQS Knowledge Base, MyCustomerKB, with the following matching rules:

Match By Customer Name: Matching any records with similar CustomerName value.

Match By City State: Matching any records with similar City (50% weight) and State (50% weight)

We were using Data Quality Client tool to (manually) execute a matching project using MyCustomerKB against the data input above. In this article, we will be automating the process using the new SSIS transform task.

B. Create SSIS project and configure DQS matching transform task

Drag and drop Source Assistant and configure to connect to the customer data you would like to de-duplicate

Drag and drop DQS Matching transform task

Connect the output line from the source to the DQS Matching transform task

Double click DQS Matching to open the configuration editor window

Under the Connection Manager tab, connect to the SQL Server Data Quality Service server by clicking New:

Select the Data Quality Knowledge Base containing the matching policy – the available matching rules and domains are displayed for you to review

Click the Mapping tab to map the columns in your data source with the Knowledge Base domain

Note:

You must map all domains used by matching rules in your domain. The following error message appears if you click OK without mapping all domains: “Not all domains have been assigned or has been assigned twice. Please assign all domains properly”

When you use composite domain as part of your matching rule, you can either map the composite domain or all of the single domains that make up the composite domain. If you map a column to a composite domain, then the single domains removed as drop down options. Conversely, if you select one of the single domains that are part of the composite domain, then the composite domain disappear from the drop down option. You can redo selection by un-selecting the column checkbox from the top section.

Optionally, you can also go to the Advanced tab to specify the minimal matching rule threshold. By default, DQS match only records with matching score of 80% and above. You can raise the minimal score by changing it from the advanced tab.

Open SQL Server Management Studio and review the MyCustomersMatched table:

Note:

The ClusterID indicates the group of matched records. In our example, CustomerID=1 and CustomerID=2 has the same clusterID of 1000000

SiblingId indicates the corresponding record that is being compared for a given MatchingScore, for example: CustomerID = 6 has SiblingId of 1000004 (correspond to CustomerID=5), so the matching score of 87.5 is between CustomerID=6 & CustomerID=5

The output table also include RuleID, in order to get the Rule name, you first run a query to get an ID for the Knowledge Base you used then use the ID to query for the matching rule :

In this article, I describe how to use the SSIS DQS Matching transform task developed by OH22. Using this transform task, you can automate the DQS record matching using the Knowledge Base created from DQS client. You can download the codeplex project from https://ssisdqsmatching.codeplex.com/.

Great Blog,,,, Data matching services have provided huge database related information to improve business in different scenarios. for more information on data matching services. please visit:<a href="http://www.b2bemaillistz.com">click here</a>

Excellent blog, data matching process provides high quality of data which helps to identify replica and maintain the database. For good quality of data matching service visit<a href=” http://www.elisthunter.com/”>elisthunter</a>

The DQS Matching transform is inadequate and unstable (while the columns are mapped). Why do we need to specify minimum Matching Score on the advanced tab when Matching policy , with multiple matching rules, can be already defined in DQS. Why can't this component leverage the matching rules already defined in DQS? This component does not support multiple matching rules. How can Microsoft ignore the automated matching through an SSIS transformation and instead something untrustworthy from CodePlex is being pushed?