Data Merging

Definitions

Data: “Information that is produced or stored by a computer”1)
Merging: “To combine or unite into a single entity.”2)
Merging data therefore means combining information that is produced or stored by a computer into a single entity.
Data merging, which could also be called consolidation, is an important step in data quality processes. In a merge, data records from multiple source systems are consolidated into a single record, which can be a very complex task. 3)

When to merge data?

• When a dataset containing updates or other records has to be added to an existing dataset that holds the working data. With an XML Web service, this typically happens as the result of a method call.
• When data arrives from two different sources and has to be combined into a single dataset, e.g. company information from the intranet and order information from the SAP ERP system.4)
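The second case can be illustrated with a small, language-neutral sketch in Python. It assumes that both sources share a key; the field names (company_id, order_id, amount) and the function combine_sources are hypothetical and chosen only for this example.

```python
def combine_sources(companies, orders):
    """Combine company records and order records into one dataset.

    Both inputs are lists of dicts; company_id is the assumed shared key.
    """
    combined = {}
    for company in companies:
        # Start each combined record from the company information.
        combined[company["company_id"]] = {**company, "orders": []}
    for order in orders:
        key = order["company_id"]
        if key not in combined:
            # The order refers to a company the first source does not know;
            # keep it rather than dropping it silently.
            combined[key] = {"company_id": key, "orders": []}
        combined[key]["orders"].append(
            {"order_id": order["order_id"], "amount": order["amount"]}
        )
    return combined
```

Keeping orders whose company is unknown is a design choice made for this sketch; a real consolidation step might instead reject or flag such records.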

Merging methods

There are several merging methods, which are listed in the following table.
They make it possible to copy records or only schemas, and to update one dataset based on another, including modifications, additions and deletions.

Method | Merges … into the current DataSet | Function
DataSet.Merge Method (DataRow[]) | an array of DataRow objects | -
DataSet.Merge Method (DataSet) | a specified DataSet and its schema | -
DataSet.Merge Method (DataTable) | a specified DataTable and its schema | -
DataSet.Merge Method (DataSet, Boolean) | a specified DataSet and its schema | Depending on the arguments, it preserves or discards any changes in this DataSet.
DataSet.Merge Method (DataRow[], Boolean, MissingSchemaAction) | an array of DataRow objects | Depending on the arguments, it preserves or discards changes in the DataSet and handles incompatible schemas accordingly.
DataSet.Merge Method (DataSet, Boolean, MissingSchemaAction) | a specified DataSet and its schema | Depending on the arguments, it preserves or discards changes in the current DataSet and handles incompatible schemas accordingly.
DataSet.Merge Method (DataTable, Boolean, MissingSchemaAction) | a specified DataTable and its schema | Depending on the arguments, it preserves or discards changes in the DataSet and handles incompatible schemas accordingly.
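The effect of the Boolean argument can be sketched in a few lines of Python. This is a simplified conceptual model, not the .NET implementation: tables are plain dicts keyed by primary key, and the function name merge_tables and its preserve_changes parameter are assumptions made for this example.

```python
def merge_tables(target, source, preserve_changes=False):
    """Merge source rows into a copy of target, matching rows by key.

    target and source map a primary key to a dict of column values.
    """
    merged = {key: dict(row) for key, row in target.items()}
    for key, incoming in source.items():
        if key not in merged:
            # Rows unknown to the target are added in both modes.
            merged[key] = dict(incoming)
        elif not preserve_changes:
            # Changes in the target are discarded: incoming values win.
            merged[key].update(incoming)
        # Otherwise the values already held by the target are preserved.
    return merged
```

In this model, merging {2: {"name": "Robert"}} into a target that holds {2: {"name": "Bob"}} yields "Robert" when preserve_changes is false and keeps "Bob" when it is true.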

The Merge methods can only be used if the two objects to be merged have largely similar schemas. A merge is typically executed at the end of a process that involves changes, updates or error corrections.
For example, on an online application platform you first fill in all the data and then, usually at the bottom of the page, click a button labelled “Submit” (or something similar). The system then validates the data.
Depending on whether the changes are possible, the system returns either the former DataSet or a subset containing the changes. This DataSet is then merged back into the online application's original DataSet using the Merge method.5)
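The submit, validate and merge-back cycle described above might look roughly like the following Python sketch. All names (get_changes, validate, submit) are hypothetical, and the validation rule (a non-empty name) is an assumption chosen only to have something to check.

```python
def get_changes(original, edited):
    """Return only the rows that are new or differ from the original."""
    return {key: row for key, row in edited.items() if original.get(key) != row}


def validate(changes):
    """Stand-in for the system's validation: accept rows with a non-empty name."""
    return {key: row for key, row in changes.items() if row.get("name")}


def submit(original, edited):
    """Collect the changes, validate them, and merge the accepted subset back."""
    accepted = validate(get_changes(original, edited))
    merged = {key: dict(row) for key, row in original.items()}
    for key, row in accepted.items():
        merged[key] = dict(row)
    return merged
```

Sending only the changed subset through validation keeps the amount of transferred data small, which is the main reason for working with a subset instead of the full dataset.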

The schema is always merged first, then the data.
First, the Merge method compares the schemas of the two DataSet objects (the source DataSet and the target DataSet) to identify changes that may have occurred, for instance if an automated process added new columns to an XML schema. If the schema of the source DataSet has changed, the target DataSet has to adapt. To achieve this, a Merge overload that takes the MissingSchemaAction argument is called, in this case with MissingSchemaAction.Add. As a result, the merged DataSet contains the added schema and data.
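The schema step can be modelled with a short Python sketch. It is a conceptual illustration only: schemas are reduced to lists of column names, the enum MissingSchema mirrors three of the MissingSchemaAction values (Add, Ignore, Error) and leaves out AddWithKey, and merge_schema is a hypothetical helper, not a .NET call.

```python
from enum import Enum


class MissingSchema(Enum):
    ADD = "add"        # extend the target schema with the missing columns
    IGNORE = "ignore"  # leave the target schema as it is
    ERROR = "error"    # refuse to merge when columns are missing


def merge_schema(target_columns, source_columns, action):
    """Return the target column list after applying the chosen action."""
    missing = [col for col in source_columns if col not in target_columns]
    if not missing or action is MissingSchema.IGNORE:
        return list(target_columns)
    if action is MissingSchema.ADD:
        return list(target_columns) + missing
    raise ValueError(f"columns missing from target schema: {missing}")
```

With MissingSchema.ADD, a source that gained an extra column causes the target schema to grow by that column, which corresponds to the case described in the paragraph above.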
Second, the data is merged. Source rows that already exist in the target and have been modified, deleted or left unchanged are matched to the target rows with the same primary key values. Source rows that were newly added are matched to new target rows with the same primary key values as the new source rows.6)
In this way, all possible changes are taken into account and merged properly into the DataSet.
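The row matching in the data step can be sketched as follows. This Python model is a simplification under stated assumptions: rows are dicts keyed by primary key, the RowState enum and the function merge_rows are hypothetical names, and a deleted source row removes the target row outright instead of marking it as deleted.

```python
from enum import Enum


class RowState(Enum):
    UNCHANGED = "unchanged"
    MODIFIED = "modified"
    DELETED = "deleted"
    ADDED = "added"


def merge_rows(target, source_rows):
    """Apply source rows to a copy of target, matching on the primary key.

    source_rows is a list of (primary_key, state, values) tuples.
    """
    merged = {key: dict(row) for key, row in target.items()}
    for key, state, values in source_rows:
        if state is RowState.DELETED:
            # Simplification: drop the matched row instead of flagging it.
            merged.pop(key, None)
        elif state is RowState.ADDED:
            # Added rows become new target rows under their own key.
            merged[key] = dict(values)
        else:
            # Unchanged and modified rows update the row with the same key.
            merged.setdefault(key, {}).update(values)
    return merged
```

Because every row is addressed by its primary key, the order of the source rows does not matter as long as each key appears only once.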