Ramblings about everything from Cars to Computers

MySQL Removing duplicate rows – Part II

Using ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX `UNIQUE_INDEX` ([FIELDNAME]) to remove duplicate rows in a table is a fast an efficient process, however on large tables where the physical size is larger than server memory, the ALTER statement can take a long time to run in a production environment.

If you need to remove duplicates on a very large table (we recently used this on a table of 77 million rows), try this method :

delete t1 from table t1, table t2
where t1.duplicate_field= t2.duplicate_field (add more if need ie. and t1.duplicate_field2=t2.duplicate_field2)
and t1.unique_field > t2.unique_field
and breakup into ranges to run faster

If you use an auto-incrementing ID field as the primary key, use this as your unique field, and in the Where clause to run on a range of records to break into smaller operations.