Do you have a background in computer science? Do you know anything about algorithms?
Finding duplicate elements in a large set is a non-trivial problem.
Most practical solutions sort the set first, so that identical elements become adjacent.
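As a sketch of that sort-based approach (the file name `titles.txt` is just an assumption for illustration, standing in for whatever file the extracted titles end up in):

```shell
# Sort so duplicates become adjacent, then let uniq report them.
sort titles.txt | uniq -d          # one copy of each line that appears more than once
sort titles.txt | uniq -c | sort -rn   # or: count occurrences, most frequent first
```

`uniq` only compares adjacent lines, which is exactly why the sort step is needed first.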
Regular expressions by themselves won’t do the trick; they are only the first step of extracting the tags. To find the duplicates you will need something like this.

I don’t know awk, but extrapolating from this I think that after you extract all titles into titles.txt the following may work:

```
awk 'seen[$0]++ == 2' titles.txt
```

Correcting myself: I didn’t look closely enough at the awk solution. I thought it printed a single copy of EVERY line, but it actually already prints the 2nd instance of duplicated lines, so it will work just as is:

```
awk 'seen[$0]++ == 1' titles.txt
```
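To see why `seen[$0]++ == 1` prints exactly the second occurrence: `seen[$0]` starts at 0, and the post-increment returns the old value, so the pattern is true only the first time a line repeats. A quick demonstration with made-up input:

```shell
# Post-increment returns the previous count, so the pattern matches
# only on a line's second appearance (count was 1 before incrementing).
printf 'a\nb\na\nb\na\n' | awk 'seen[$0]++ == 1'
# prints:
# a
# b
```

Note the third `a` is not printed again, so each duplicated line is reported exactly once, with no sorting required.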

@vasile-caraus, GAWK is a very powerful Unix tool, but you’ll need some time to learn even its basic functions. For instance, the PDF reference manual is a 540-page file! But I’m sure it won’t take you much time to Google a short introduction to GAWK!