Friday, September 4, 2015

If you have a file "input.txt" with duplicate lines and you would like to remove the duplicates and put the result in "output.txt", all you have to do is run this Python script. Be careful to keep the same indentation (spaces):

lines_seen = set()  # holds lines already seen

# "with" closes both files automatically, even if an error occurs
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)     # keep the first occurrence
            lines_seen.add(line)
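If you need this more than once, the same loop can be wrapped in a function. The sketch below is only an illustration: the name remove_duplicate_lines and its return value (the number of duplicate lines dropped) are my own choices, not part of the original script. Like the script, it keeps the first occurrence of each line in its original order.

```python
import os
import tempfile


def remove_duplicate_lines(input_path, output_path):
    """Hypothetical helper: copy input_path to output_path, keeping only the
    first occurrence of each line. Returns how many duplicate lines were dropped."""
    lines_seen = set()  # lines already written to the output
    removed = 0
    with open(input_path, "r") as infile, open(output_path, "w") as outfile:
        for line in infile:
            if line in lines_seen:
                removed += 1  # duplicate: skip it
                continue
            outfile.write(line)
            lines_seen.add(line)
    return removed


if __name__ == "__main__":
    # Usage example on a small temporary file
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "input.txt")
    dst = os.path.join(tmpdir, "output.txt")
    with open(src, "w") as f:
        f.write("apple\nbanana\napple\ncherry\nbanana\n")
    print(remove_duplicate_lines(src, dst))  # prints 2
    with open(dst) as f:
        print(f.read(), end="")  # apple, banana, cherry, one per line
```

Note that lines are compared exactly, including the trailing newline, so a last line without a newline is treated as different from the same text earlier in the file.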

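The script reads the input only once, and checking whether a line is already in a set takes constant time on average, so the running time grows linearly with the file size and it stays fast even on large files. Keep in mind that every unique line is held in memory, so a file with a huge number of distinct lines needs a correspondingly large amount of RAM. Have a nice day.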