I realize that the question asks for lines, but if you use -c to skip characters, tail | head is instantaneous. Of course, you can't specify line numbers like "50000000" and may have to manually search out the start of the section you're looking for.
– Danny Kirchmeier Apr 25 '14 at 16:26

tail will read and discard the first X-1 lines (there's no way around that), then read and print the following lines. head will read and print the requested number of lines, then exit. When head exits, tail receives a SIGPIPE signal and dies, so it won't have read more than a buffer size's worth (typically a few kilobytes) of lines from the input file.
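This early-exit behaviour is easy to observe directly. In the sketch below (the numbers are illustrative, not from the answer), head exits after printing three lines, the producer gets SIGPIPE, and the pipeline returns immediately even though the stream would nominally contain 100 million lines:

```shell
# head exits after printing 3 lines; seq then receives SIGPIPE and dies,
# so the pipeline finishes almost instantly instead of generating 100M numbers.
seq 100000000 | head -n 3
```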

The sed solution is significantly slower though (at least for GNU utilities and Busybox utilities; sed might be more competitive if you extract a large part of the file on an OS where piping is slow and sed is fast). Here are quick benchmarks under Linux; the data was generated by seq 100000000 >/tmp/a, the environment is Linux/amd64, /tmp is tmpfs and the machine is otherwise idle and not swapping.
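A scaled-down sketch of such a benchmark (the answer's data set was seq 100000000; a smaller file is used here so it runs quickly, and the line numbers are illustrative — prefix each extraction with `time` to compare them yourself):

```shell
# Generate test data (the answer used: seq 100000000 >/tmp/a).
seq 1000000 >/tmp/a

# The two approaches being compared; both print lines 500000-500009.
tail -n +500000 /tmp/a | head -n 10
sed -n '500000,500009p;500010q' /tmp/a
```

The `500010q` in the sed version matters: without it, sed would keep scanning to the end of the file after printing the range.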

If you know the byte range you want to work with, you can extract it faster by skipping directly to the start position. But for lines, you have to read from the beginning and count newlines. To extract blocks from x inclusive to y exclusive starting at 0, with a block size of b:
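The command that sentence leads into appears to have been lost; a plausible sketch using dd (an assumption on my part, not necessarily the original command — dd's skip and count are measured in units of bs bytes, so this copies blocks x through y-1):

```shell
# Hypothetical values for illustration: block size b, range [x, y).
b=4; x=1; y=3
printf 'AAAABBBBCCCCDDDD' > /tmp/blocks.bin   # sample data: 4 blocks of 4 bytes
# Skip x blocks, then copy (y - x) blocks: blocks x inclusive to y exclusive.
dd if=/tmp/blocks.bin bs="$b" skip="$x" count="$((y - x))" 2>/dev/null
```

Because dd can seek directly to byte offset x*b, no data before the range is read at all — unlike line-based extraction.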

Are you sure that there is no caching in between? The differences between tail | head and sed seem too big to me.
– Paweł Rumian Sep 8 '12 at 12:03

@gorkypl I took several measurements and the times were comparable. As I wrote, this is all happening in RAM (everything is in the cache).
– Gilles Sep 8 '12 at 12:06

@Gilles "tail will read and discard the first X-1 lines" seems to be avoided when the number of lines is counted from the end: in that case, tail appears to read backwards from the end, judging by the execution times. Please read: http://unix.stackexchange.com/a/216614/79743.
– user79743 Jul 17 '15 at 4:52

@BinaryZebra Yes, if the input is a regular file, some implementations of tail (including GNU tail) have heuristics to read from the end. That improves the tail | head solution compared to other methods.
– Gilles Jul 17 '15 at 7:08

The head | tail approach is one of the best and most "idiomatic" ways to do this:

X=57890000
Y=57890010
< infile.txt head -n "$Y" | tail -n +"$X"

As pointed out by Gilles in the comments, a faster way is

< infile.txt tail -n +"$X" | head -n "$((Y - X + 1))"

The reason this is faster is that, compared to the head | tail approach, the first X - 1 lines don't have to travel through the pipe.

Your question as phrased is a bit misleading and probably explains some of your unfounded misgivings towards this approach.

You say you have to calculate A, B, C, D, but as you can see, the line count of the file is not needed and at most one calculation is necessary, which the shell can do for you anyway.

You worry that piping will read more lines than necessary. In fact, this is not true: tail | head is about as efficient as you can get in terms of file I/O. First, consider the minimum amount of work necessary: to find the Xth line in a file, the only general way is to read every byte and stop after counting X newline characters, since there is no way to divine the file offset of the Xth line. Once you reach the Xth line, you have to read all the lines in order to print them, stopping at the Yth line. Thus no approach can get away with reading fewer than Y lines.

Now, head -n "$Y" reads no more than Y lines (rounded up to the nearest buffer unit, but buffers, used correctly, improve performance, so there is no need to worry about that overhead). In addition, tail never reads more than head does, so head | tail reads the fewest lines possible (again, plus some negligible buffering that we are ignoring). The only efficiency advantage of a single-tool approach that avoids pipes is fewer processes (and thus less overhead).
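For comparison, single-process alternatives can be sketched as follows (the line numbers and file name are illustrative; both commands print lines X through Y inclusive, and the early-quit in each is what keeps them from scanning the rest of the file):

```shell
X=5; Y=8                                  # illustrative line range
seq 20 > /tmp/sample.txt                  # sample input

# sed: print the range, then quit at line Y.
sed -n "${X},${Y}p;${Y}q" /tmp/sample.txt

# awk: a bare `NR >= x` pattern prints the line; exit at line y.
awk -v x="$X" -v y="$Y" 'NR >= x; NR == y { exit }' /tmp/sample.txt
```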

I expected sed and tail | head to be about on par, but it turns out that tail | head is significantly faster (see my answer).
– Gilles Sep 8 '12 at 11:31

I dunno, from what I've read, tail/head are considered more "orthodox", since trimming either end of a file is precisely what they're made for. In those materials, sed only seems to enter the picture when substitutions are required — and to quickly be pushed out of the picture when anything much more complex starts to happen, since its syntax for complex tasks is so much worse than AWK's, which then takes over.
– underscore_d Oct 8 '16 at 11:40

Note that times change drastically depending on whether the selected lines are near the start or near the end of the file. A command which appears to work nicely at one end of the file may be extremely slow at the other end.
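A sketch of this effect (file size and line numbers are illustrative; prefix each pipeline with `time` to see the difference). Near the start, head exits after only a couple of dozen lines have been read; near the end, tail -n +N must scan almost the whole file forward to count newlines:

```shell
seq 1000000 > /tmp/big.txt

# Near the start: only ~20 lines are read before head exits.
tail -n +11     /tmp/big.txt | head -n 10

# Near the end: nearly the entire file is scanned to count newlines.
tail -n +999990 /tmp/big.txt | head -n 10
```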

You're answering a question that wasn't asked. Your answer is 10% tail|head, which has been discussed extensively in the question and the other answers, and 90% determining the line numbers where specified strings/patterns appear, which wasn't part of the question. P.S. you should always quote your shell parameters and variables; e.g., "$3" and "$4".
– G-Man Oct 8 '14 at 22:51