Is there any program or script that will take a group of WARC files and merge them, removing exact duplicate responses?

I realise this probably goes somewhat against good practice, but to save space I would like to remove the roughly 90% of content that is duplicated (e.g. unchanged images) while retaining the parts that vary.

But wouldn't you have to write a script that goes through all the individual WARC files, building up a list of duplicates to remove? That's the bit I was hoping to automate (laziness, I know). I suppose it's relatively easy to compare the checksums.
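
For what it's worth, the checksum comparison itself could look something like the sketch below. It is only the de-duplication step, not a complete tool: it assumes the records have already been parsed out of the WARC files (by a WARC library or otherwise) into simple (record type, target URI, payload) tuples, and the names `Record`, `payload_digest` and `dedupe` are made up for illustration. Treating two responses as duplicates only when both the URI and the payload hash match is also my assumption; keying on the hash alone would remove more.

```python
import base64
import hashlib
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for a parsed WARC record:
# (record type, target URI, payload bytes).
Record = Tuple[str, str, bytes]


def payload_digest(payload: bytes) -> str:
    """Return a SHA-1 digest of the payload as 'sha1:<base32>'."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Yield records in order, skipping repeated identical responses.

    Only 'response' records are de-duplicated; every other record
    type is passed through untouched.
    """
    seen = set()
    for rec_type, uri, payload in records:
        if rec_type != "response":
            yield rec_type, uri, payload
            continue
        key = (uri, payload_digest(payload))
        if key in seen:
            continue  # exact duplicate of an earlier response
        seen.add(key)
        yield rec_type, uri, payload


if __name__ == "__main__":
    crawl = [
        ("response", "http://example.com/logo.png", b"PNGDATA"),
        ("response", "http://example.com/index.html", b"<p>v1</p>"),
        ("response", "http://example.com/logo.png", b"PNGDATA"),
        ("response", "http://example.com/index.html", b"<p>v2</p>"),
    ]
    for rec_type, uri, payload in dedupe(crawl):
        print(rec_type, uri, payload_digest(payload))
```

Feeding the records from all the WARC files through `dedupe` in one pass means the list of duplicates never has to be built separately. Note that simply dropping a response leaves its matching request and metadata records without a partner; the WARC format has `revisit` records for recording "same content as before", which may be a cleaner way to save the space.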