
Blog is moving

My blog is moving to http://victormendonca.com/blog/. If you are looking for a specific or older post, you are in the right place. Otherwise, check out my new page for more up-to-date content.

Monday, December 24, 2007

How to remove repeated lines in a file without changing the order

Using just uniq is not viable, as repeated lines need to be next to each other for uniq to identify them. Using sort will just mix them up, losing their original placement within the file. So here's a workaround (originally from Linux Journal).
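A quick demonstration of the problem (the sample lines here are just an illustration):

```shell
# uniq only removes *adjacent* duplicates, so the repeated "a" survives:
printf 'a\nb\na\n' | uniq
# prints: a b a (one per line)

# sort -u does remove all duplicates, but it reorders the lines:
printf 'b\na\nb\n' | sort -u
# prints: a b (one per line) - "b" no longer comes first
```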

Let's say we have a document called file:

$ cat file
a
b
c
d
a
b
c
d

Here are the steps we will take:
1- Use nl (or cat -n) to add a number to each line;
2- Use “sort -k 2” to place equal lines after each other (we have to sort by the second column);
3- “uniq -f 1” will remove the equal lines (skipping the first field, the line number, when comparing);
4- “sort -n” will put the lines back in their proper order, sorting numerically on the first field, the numbers;
5- “sed 's/[0-9]//g'” will remove the numbers.

This is what your command should look like:

$ nl file | sort -k 2 | uniq -f 1 | sort -n | sed 's/[0-9]//g'
a
b
c
d
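To see why this works, here is a rough trace of what each stage of the pipeline produces for the sample file (the exact spacing of nl's output may vary between implementations):

```shell
# Build the sample file with two repeated runs of a..d:
printf 'a\nb\nc\nd\na\nb\nc\nd\n' > file

nl file                                       # each line gets its number: "1 a" .. "8 d"
nl file | sort -k 2                           # duplicates are now adjacent: "1 a", "5 a", "2 b", "6 b", ...
nl file | sort -k 2 | uniq -f 1               # first copy of each line survives: "1 a" .. "4 d"
nl file | sort -k 2 | uniq -f 1 | sort -n     # numeric sort on the line numbers restores the original order
nl file | sort -k 2 | uniq -f 1 | sort -n | sed 's/[0-9]//g'
                                              # digits stripped, leaving a b c d (plus nl's leading whitespace)
```

Note that the final sed deletes every digit on the line, so if your data itself contains numbers they will be mangled too; stripping only the leading number field (for example with `sed 's/^[[:space:]]*[0-9]*\t//'`) is a safer variant.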

On my machine I had a problem where nl kept adding empty fields (still trying to find out why), so I had to modify my expression a little bit:

5 comments:

Great, sort of what I was looking for, I just have a problem. I need the same action but with an array. If I have, for example:

array=(1 2 3 4 1 2 3 4 1 2 3 4)

and I want to create a list for every "unique" value in that array, I tried this to no avail:

array2=( `echo ${array[@]} | sort | uniq -u` )

It doesn't work, of course, because the array values are printed all on a single line. Do you have an idea of how I can achieve this? I'd prefer it to be on the fly and not saving it to a file and reading from it later... Thanks in advance
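For the commenter's array question, one possible approach (a sketch, not from the original post): use printf to emit one element per line, then the common awk `!seen[$0]++` idiom to keep only the first occurrence of each value while preserving the original order — no temporary file needed.

```shell
#!/bin/bash
array=(1 2 3 4 1 2 3 4 1 2 3 4)

# printf prints one element per line; awk keeps only the first
# occurrence of each line, preserving the original order:
array2=( $(printf '%s\n' "${array[@]}" | awk '!seen[$0]++') )

echo "${array2[@]}"   # 1 2 3 4
```

Note that `uniq -u` would not have worked even on separate lines: it prints only lines that appear exactly once, rather than one copy of each line.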