Is the input a single-pass stream or a file? This is much easier when random access is allowed. With a file, you would just find the first foo and the last bar and print everything in between, if anything. With a stream, you would have to read until the first foo, then buffer all subsequent lines in memory until EOF, flushing the buffer every time a bar is seen. This could mean buffering the entire stream in memory.
– jw013, Sep 12 '12 at 14:25
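
The streaming approach described in the comment above (buffer from the first foo, flush on every bar) could be sketched with awk roughly as follows. This is only an illustration: the function name stream_foo_to_bar is made up here, and the patterns foo and bar are matched anywhere in a line.

```shell
# Hypothetical helper: print from the first line matching foo
# through the last line matching bar, reading the input once.
stream_foo_to_bar() {
  awk '
    # Start buffering at the first line that matches foo.
    !seen && /foo/ { seen = 1 }
    # From then on, keep every line in the buffer.
    seen { buf = buf $0 ORS }
    # Each bar flushes the buffer, so only text after the last bar is dropped.
    seen && /bar/ { printf "%s", buf; buf = "" }
  ' "$@"
}
```

It reads standard input when called without arguments, or the named files otherwise. As the comment warns, if no bar follows the first foo for a long time, the buffer grows to hold all of that input.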

5 Answers

The sed range match /first/,/second/ reads lines one by one. When a line matches /first/, sed remembers it and looks ahead for the first line matching /second/. Meanwhile it applies all the commands specified for that range to every line in between. After that the process starts over, again and again, until the end of the file.

That's not what we need. We need to look up to the last match of the /second/ pattern. Therefore we build a construct that looks only for the first occurrence of /foo/. When it is found, the cycle a starts. We append a new line to the match buffer with N and check whether it matches the pattern /bar/. If it does, we print the buffer and clear it. Either way, we jump back to the beginning of the cycle with ba.

We also need to delete the leading newline left over after the buffer is cleared, with /^\n/s/^\n//. I'm sure there is a much better solution; unfortunately it didn't come to mind.
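
The script itself is not shown here, so the following is only a reconstruction from the description above, not necessarily the exact command the answer used. The function name foo_to_last_bar_loop is made up for illustration.

```shell
# Hypothetical helper: a sed loop that follows the description above.
foo_to_last_bar_loop() {
  sed -n '
    # Do nothing until the first line matching foo.
    /foo/ {
      :a
      # Append the next input line to the pattern space.
      N
      # After the buffer was cleared, N leaves a leading newline: strip it.
      /^\n/ s/^\n//
      # On a bar, print the buffer and clear it.
      /bar/ {
        p
        s/.*//
      }
      # Jump back to the start of the cycle either way.
      ba
    }
  ' "$@"
}
```

One limitation of this sketch: an empty line that directly follows a bar line is swallowed by the newline cleanup. Lines after the last bar stay in the buffer and are never printed, because -n suppresses the final automatic print.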

If this were code golf, you could use E instead of e and -0777 instead of the $/ bit (see perlrun(1)), which would shorten it to: perl -0777 -nE 'say /(foo.*bar)/s', still sort of readable.
– Thor, Sep 12 '12 at 15:51

I didn't know about these flags! I am sure that -0[octal] especially will find its way into my workflow! Thanks for that.
– user1146332, Sep 12 '12 at 15:57

If the input file is huge, you can use csplit to break it into pieces at the first foo and at every subsequent bar then assemble the pieces. The pieces are called piece-000000000, piece-000000001, etc. Choose a prefix (here, piece-) that won't clash with other existing files.

csplit -f piece- -n 9 - '%foo%' '/bar/' '{*}' <input-file

(On non-Linux systems, you'll have to use a large number inside the braces, e.g. {999999999}, and pass the -k option. That number is the number of bar pieces.)

You can assemble all the pieces with cat piece-*, but this will give you everything after the first foo. So remove that last piece first. Since the file names produced by csplit don't contain any special characters, you can work them over without taking any special quoting precaution, e.g. with

rm $(echo piece-* | sed 's/.* //')

or equivalently

rm $(ls piece-* | tail -n 1)

Now you can join all the pieces and remove the temporary files:

cat piece-* >output
rm piece-*

If you want to remove the pieces as they are concatenated to save disk space, do it in a loop:
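
The loop is missing here, so this is a minimal sketch of what it could look like, assuming the piece- prefix and the output file name used above. The function name join_pieces is made up for illustration.

```shell
# Hypothetical helper: append each piece to "output", deleting it right after.
join_pieces() {
  : >output                       # start from an empty output file
  for piece in piece-*; do
    [ -e "$piece" ] || continue   # the glob matched nothing
    # Remove a piece only if it was appended successfully.
    cat "$piece" >>output && rm "$piece"
  done
}
```

Because csplit zero-pads the numbers, the shell glob expands the pieces in the right order. Run join_pieces in the directory that holds the pieces, after removing the last piece as described above.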

It appends each line in the /foo/,$ range to the hold space with H (lines not in this range are deleted with !d). Lines not matching bar are then deleted. On lines that match, the pattern space is emptied, exchanged with the hold space (x), and the leading empty line in the pattern space is removed.
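
The command being explained is not shown here, so the following is a reconstruction from the description above and may differ from the original in detail. The function name foo_to_last_bar_hold is made up for illustration.

```shell
# Hypothetical helper: hold-space variant following the description above.
foo_to_last_bar_hold() {
  sed '
    # Delete every line before the first foo.
    /foo/,$!d
    # Append the current line to the hold space.
    H
    # Lines without bar produce no output.
    /bar/!d
    # On a bar: empty the pattern space, then swap it with the hold space.
    s/.*//
    x
    # H put a newline in front of the first held line: remove it.
    s/\n//
  ' "$@"
}
```

Each bar line prints everything held since the previous flush, so the output ends at the last bar and anything after it is left behind in the hold space.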

With huge input and few occurrences of bar this should be (much) faster than pulling each line into pattern space and then, each time, checking the pattern space for bar.
Explained: