1) Copy file A to file B, but skip over the parts of file A you don't want copied.

2) Invalidate the data. Write a 'bad' character code over the fields you don't want. The file size stays the same, but once your reader code is taught to ignore the 'bad' characters, the data is effectively stripped.

Write a space (0x20) or another whitespace character such as a tab, or even 0xff or 0x7f: some character code you wouldn't normally find in your data.
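Here is a rough sketch of option 2 in Python, in case it helps. The function names `invalidate_span` and `read_valid` are made up for illustration, and it assumes 0x7f never appears in the real data and that you already know the byte offset and length of the field to wipe.

```python
def invalidate_span(path, offset, length, filler=b"\x7f"):
    """Overwrite `length` bytes at `offset` in place; the file size is unchanged."""
    # "r+b" opens an existing file for reading and writing without truncating it.
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(filler * length)


def read_valid(path, filler=b"\x7f"):
    """Read the file and drop every filler byte, which strips the wiped fields."""
    with open(path, "rb") as fh:
        return fh.read().replace(filler, b"")
```

The file on disk keeps its original length; only the reader decides that the filler bytes do not count.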

What I would do is read the whole file into one list; then you can do something like:

# open it; the default mode is read
f = open("dnalookingfile.txt")
# get a list with all the lines
lines = f.readlines()
f.close()
# iterate through the lines
for line in lines:
    if "rs" in line:
        # delete this one from the lines list and the next one as well
        pass
    else:
        # it's all good, the line does not have rs numbers in it
        pass

Then you could just rewrite the file after you are done and it would be fixed :)

And make sure that your file isn't so huge that you shouldn't be loading it into memory all at once. You can go through the lines lazily instead, with the xreadlines() iterator in old Python 2 code or by looping over the file object directly.

# Opening with mode "a" appends, so the output file should be blank
# or not exist yet; it is created if it is missing.
fhi = open("input_file", "r")
fho = open("output_file", "a")
# Looping over the file object reads one line at a time
# (xreadlines() only exists in Python 2).
for line in fhi:
    if "rs" not in line:
        fho.write(line)
fhi.close()
fho.close()

But if you are loading it all into memory as a list like paulthom suggested, make sure to walk the list in a way that deleting indices as you go won't mess up the loop, as it would with paulthom's idea. Deleting an item during a for loop shifts every later item down one index, so the loop skips the item right after each deletion.
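The odd effect of deleting during a for loop can be seen with a tiny made-up list. This is only a sketch; the function names and the sample items are invented for illustration.

```python
def remove_while_iterating(items):
    """Buggy on purpose: removes items from the list it is looping over."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for item in items:
        if "rs" in item:
            # Everything after this item shifts down one index,
            # so the loop never looks at the item that followed it.
            items.remove(item)
    return items


def remove_by_rebuilding(items):
    """Safe: build a new list instead of deleting from the one being looped over."""
    return [item for item in items if "rs" not in item]


sample = ["a", "rs1", "rs2", "b"]
print(remove_while_iterating(sample))  # "rs2" survives because it was skipped
print(remove_by_rebuilding(sample))
```

Building a new list (or writing the kept lines straight to the output file) avoids the problem entirely.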

So have a variable that, when set, makes the loop drop each line it encounters until it finds one starting with a ">".
Set it to True once you find a line you want to remove, and the loop will then drop the following lines that don't start with ">" (which would be the lines with the sequence). Then set it back to False once it encounters a line starting with ">" again.
Sorry if that wasn't too clear :P This is what I meant:
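A minimal sketch of that flag idea, assuming the file looks like FASTA (header lines start with ">", sequence lines follow). Only the name `delete_seq` comes from this thread; `strip_flagged_records` and the sample lines are hypothetical.

```python
def strip_flagged_records(lines):
    """Return the lines with every record containing "rs" left out."""
    kept = []
    delete_seq = False
    for line in lines:
        if line.startswith(">"):
            # A new record begins, so stop skipping for now...
            delete_seq = False
        if "rs" in line:
            # ...unless this line itself is one we want to remove.
            delete_seq = True
        if not delete_seq:
            kept.append(line)
    return kept


sample = [">rs123\n", "ACGT\n", ">seq2\n", "TTGA\n", ">rs9\n", "GG\n"]
print(strip_flagged_records(sample))
```

Because the kept lines go into a new list, nothing is deleted from the list being looped over.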

It will set delete_seq to True if it finds an "rs" in the line, and while delete_seq is True, it'll ignore any following lines until one of them starts with ">", which will set it back to False. If you need me to clarify, just ask. Here's my output:

Like I showed in my previous post, use a boolean to track it. It's initially False, but when it encounters a line you want to remove, it becomes True. While it's True, it will disregard the lines it encounters until it finds one starting with ">". It will be set to False when it finds that again, but as it continues, any lines that have that "rs" will set it to True again, etc.

It basically starts skipping lines after it finds a line with "rs", and keeps skipping until it finds a line starting with ">". Then it checks that line to see if it's good. If it is, it stops ignoring lines; if it's a bad line, it keeps skipping, and so on.