How to Remove Duplicate Rows in Calc [OOo, LibreOffice & NeoOffice]

There is no automatic function to remove duplicate rows in Calc. In order to remove duplicate rows you need to perform some comparison tests. This tutorial will show you how to

sort the data

compare the data

remove the duplicate rows

There is a tutorial here that will show you how to check for duplicates on a single column value, however often we need to compare for more than one value in a row.

Firstly open your spreadsheet and determine the values that you are going to test. In our sample spreadsheet, I am testing for duplicates that match the OrderDate, Region and Rep fields.

Select & Sort the Cells

Select all cells of the current data range, and then choose Data >> Sort.

Select the column names for the data you wish to compare. For example, I am testing to see if OrderDate, Region & Rep are replicated in this spreadsheet. I will sort according to these columns in ascending order.

Compare the Data

Click an empty cell in the first row. In this example it’s H2.

Enter the formula:
=IF(A2=A3;1;0)

This will display 1 if the current row has the same value in column A as the next row. It will display 0 if the values are different.

I labelled this column ‘Test A’ (as we are testing the values in column A).

Now because I want to also test the values in columns B and C, then I need to do similar comparisons.

So, in cell I2 I enter the formula:
=IF(B2=B3;1;0)
and I labelled this column ‘Test B’

And in cell J2 I enter the formula:
=IF(C2=C3;1;0)
and I labelled this column ‘Test C’.

Copy the formula from H2 to J2 down for all rows of the data range.

Freeze the Contents of Test Columns

Now the formulas must be replaced by their values to freeze the contents. To do this we highlight all the cells that have the test values in them. In this example, I have test values in columns, H, I and J, and so I highlight the cells from H2 to J48.

While these are highlighted, I then press Ctrl+C (or CMD+C for Macs) to copy all selected cells to the clipboard.

I then select Edit >> Paste Special… to open the ‘Paste Special’ dialog box.

In the Selection area, enable only the Numbers command (and Text – sometimes it works better if you enable both – see above image). Disable the other Selection commands. Click OK.

You may be prompted to confirm that you are overwriting existing data – click ‘Yes’.

Now in column K, I am going to add the three test values together. If all three are true (1), then we have a duplicate row.

So, in cell K2 I enter the formula:
=SUM(H2:J2)
And I label the column ‘Sum Tests’

And I copy the formula from K2 into all the other rows in column K.

Sort and Remove Data

I then select the whole data range including the new columns H, I, J & K and sort the range by column K (my ‘Sum Tests’ column). I usually sort this in a descending order.

You will notice in the ‘Sum Tests’ columns that the higher values will appear at the top. In this case, as we were testing for 3 criteria for our duplicates, if a duplicate exists, then the value in this column will be 3. All ‘3’ values represent a duplicate value. If you only test for 2 values, then returning a ‘2’ in the ‘Sum Test’ column will indicate a duplicate.

In this example, I will now delete all the rows that return a ‘3’ in my ‘Sum Test’ column.

Tidying Up

Obviously I don’t want to keep my test columns (columns H, I, J & K in the above image), so I can now delete these as well.

Select entire columns of the list
Call Data>Filter>Standard
Field1 -not empty-
Field2 -not empty-
Field3 -not empty-
Button “More Options”
“Skip Duplicates”
“Copy Output” to some free range
You get a new list of all fields with unique entries in Field1,Field2 and Field3
Repeat the steps with the new list, if nessecary

For a simple approach to removing duplicates you could:
1. Use ‘sort ascending’ then add an extra column (you can remove it later)
2. Add the branching statement (IF) as mentioned in previous posts eg: =IF(A2=A3,1,0)
copy and paste the formula into all the cells in the new column. (left click and hold, then drag down)
This will place a ‘0’ in the column where no match is found.

3. Now add auto filter to the new column (the one with the 1 or zero) and filter for zero
This will only show the those rows where no duplication has occurred.
4. Copy all the rows and paste into new spreadsheet if desired.

OR

You could on step ‘3’ filter for 1 rather than zero and delete the results, this will delete the duplicate rows in your spreadsheet.

I hope this helps someone, spreadsheets can be a real pain sometimes when you get stuck.

What’s Happening on Facebook…

My Skills

Sitemeter

Who is behind Guide2Office?

My name is Stephanie Krishnan and I'm passionate about the way that open source software and its community can help small businesses and individuals with their productivity and lives. One of the biggest arguments I get from business owners, however, is lack of support options. I decided to put together my own support blog to help people be productive at various levels with various Office software, including OpenOffice.org, LibreOffice, NeoOffice, MS Office and Apple products! Read More…