103.2.6.a Handling Duplicates: An Example

Learn with Examples

Duplicate values are very common in transaction datasets. Here we use telecom bill and complaints data as an example. A single user may register the same complaint twice in a day. Such repeated records need to be found and removed from the dataset.

The following exercise on the Complaints data shows how this is done.

DataSet: “./Telecom Data Analysis/Complaints.csv”

Identify overall duplicates in complaints data

Create a new dataset by removing overall duplicates in Complaints data

Identify duplicates in complaints data based on cust_id

Create a new dataset by removing duplicates based on cust_id in Complaints data
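
The four tasks above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the inline sample stands in for Complaints.csv, and every column other than cust_id (here comp_date and comp_type) is hypothetical, as are the helper names find_duplicates and drop_duplicates. "Overall" duplicates are taken to mean rows that match an earlier row on every column, and the first occurrence is the one kept.

```python
import csv
import io

# Hypothetical sample standing in for Complaints.csv.
# Only the cust_id column name comes from the exercise; the rest is assumed.
SAMPLE = """cust_id,comp_date,comp_type
101,2023-01-05,billing
102,2023-01-05,network
101,2023-01-05,billing
103,2023-01-06,billing
101,2023-01-07,network
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def find_duplicates(rows, keys=None):
    """Return rows that repeat an earlier row on `keys` (all columns if None)."""
    seen = set()
    dupes = []
    for row in rows:
        key = tuple(row[k] for k in (keys or row.keys()))
        if key in seen:
            dupes.append(row)
        else:
            seen.add(key)
    return dupes


def drop_duplicates(rows, keys=None):
    """Keep the first row for each key and drop later repeats."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[k] for k in (keys or row.keys()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# 1. Identify overall duplicates (rows identical on every column).
overall_dupes = find_duplicates(rows)

# 2. New dataset with overall duplicates removed.
complaints_unique = drop_duplicates(rows)

# 3. Identify duplicates based on cust_id only.
cust_dupes = find_duplicates(rows, keys=["cust_id"])

# 4. New dataset with one row per cust_id.
complaints_unique_cust = drop_duplicates(rows, keys=["cust_id"])

print(len(rows), len(complaints_unique), len(complaints_unique_cust))
```

To run this on the real file, an open file handle for Complaints.csv can be passed to csv.DictReader in place of the in-memory sample. Note that removing duplicates on cust_id alone is much more aggressive than removing overall duplicates: it also drops distinct complaints from the same customer, so the two resulting datasets will usually differ in size. If pandas is available, DataFrame.duplicated() and DataFrame.drop_duplicates(subset=["cust_id"]) cover the same steps.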