Find and Delete all duplicate rows but keep one

In this post “Find and Delete all duplicate rows but keep one”, we are going to discuss that how we can find and delete all the duplicate rows of a table except one row. Assume that we have a table named tbl_sample which has four columns – EmpId, EmpName, Age, and City. This table has some duplicate data (in all the four columns) which needs to be deleted except the original one row. To demonstrate this, let’s create the dummy table with some sample data.

In above table, we can see that we have multiple duplicate rows for EmpId 1, 2, 3 and 4. To find the duplicate records from the above table, we can use this query. This query will output all the duplicate rows from the table with the duplicate rows count.

SELECT * FROM dbo.tbl_Sample GROUP BY EmpId, EmpName, Age, City HAVING COUNT(1) > 1
GO

Duplicate rows

Now, we need to write a query to find and delete all the duplicate records from the table excluding the original one record. Here is the query.

WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY EmpId, EmpName, Age, City ORDER BY (SELECT NULL)) Seq
FROM dbo.tbl_Sample
)
DELETE FROM CTE WHERE Seq > 1
GO

In the above code, we are using a CTE (Common Table Expression) to find the duplicates and then we are deleting all the records using DELETE command but keeping the only one record for each employee.

After executing this query, we have only one row for each employee as this.

About Gopal Krishna Ranjan

Gopal has 8 years of industry experience in Software development. He has a head down experience in Data Science, Database, Data Warehouse, Big Data and cloud technologies and has implemented end to end solutions.
He has extensively worked on SQL Server, Python, Hadoop, Hive, Spark, Azure, Machine Learning, and MSBI (SSAS, SSIS, and SSRS). He also has good experience in windows and web application development using ASP.Net and C#.