Ruby's ARGF

Many Unix utilities accept input both in the form of filenames passed as
command-line arguments and as data sent to the program’s standard input stream.
If filenames are passed, the corresponding files will be read in sequence. If
not, the standard input stream will be read instead. This behavior makes
utilities like cat, grep, and sed versatile and easy to use.

In Ruby, a subset of cat’s features can be re-implemented with the following
code:
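A minimal sketch of such an implementation could look like this. The filename cat.rb, the method name cat, and its input and output keyword parameters are assumptions made for illustration, so that the method can be exercised without touching the real standard streams:

```ruby
# cat.rb (hypothetical filename)
# Prints the contents of the given files, or of standard input when no
# filenames are given. `args` defaults to the command-line arguments.
def cat(args = ARGV, input: $stdin, output: $stdout)
  if args.length > 0
    # Arguments were passed: treat each one as a filename.
    args.each { |filename| output.puts File.read(filename) }
  else
    # No arguments: fall back to the standard input stream.
    output.puts input.read
  end
end
```

Calling `cat` with no arguments at the bottom of the script makes it behave like the utility: `ruby cat.rb a.txt b.txt` prints both files, while `echo hello | ruby cat.rb` echoes standard input.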

The implementation inspects the length of the ARGV array, which contains all
command-line arguments passed to the program. If any arguments are passed, they
are interpreted as filenames, and the corresponding files are read and output.
If no arguments are passed, the standard input stream is read and output
instead.

It does what it’s supposed to do, but the implementation is very concerned with
where its input is coming from. It also duplicates the output functionality in
both branches of the conditional. To solve both of these problems, Ruby provides
the ARGF stream.

Using the ARGF stream, the cat clone can be re-implemented like so:

    # argf.rb
    puts ARGF.read

This implementation is oblivious to where its input is coming from and can
instead focus on what to do with it.

So what is the ARGF stream? The Ruby standard library documentation describes
it as such:

ARGF is a stream designed for use in scripts that process files given as
command-line arguments or passed in via STDIN.

ARGF will interpret all elements of the ARGV array as filenames and when
read will produce a concatenation of the contents of these files. If ARGV is
empty, then ARGF reads from standard input.

This means that if a program also accepts flags like --color or
--line-buffered, these flags will have to be shifted off the ARGV array
before ARGF is read in order to avoid unexpected “No such file or directory”
errors.
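One way to do this is sketched here, assuming that all flags start with "--" and come before the filenames. The helper name shift_flags! is hypothetical:

```ruby
# Shifts leading "--" flags off `args` (ARGV by default) and returns them.
# Assumes every flag starts with "--" and precedes the filenames.
def shift_flags!(args = ARGV)
  flags = []
  flags << args.shift while args.first.to_s.start_with?("--")
  flags
end
```

A script would call `flags = shift_flags!` before its first read from ARGF, so that only filenames remain in ARGV by the time ARGF looks at it.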

Filenames that are manually added to the ARGV array will also be read by
ARGF.

After a file has been read using ARGF, its filename is automatically shifted
off the ARGV array.
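Both behaviors can be observed in a small experiment. This is a sketch that assumes the script is started without command-line arguments; the scratch file and its contents are made up for the demonstration:

```ruby
require "tempfile"

# Create a scratch file to read (hypothetical content).
file = Tempfile.new("argf-demo")
file.write("added by hand\n")
file.close

ARGV << file.path     # manually queue the file for ARGF
content = ARGF.read   # ARGF reads the file that was just added

puts content
puts ARGV.empty?      # the filename is no longer in ARGV

file.unlink           # remove the scratch file
```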

Many Unix utilities, like cat, also support another helpful feature: a single
invocation can read input from both the standard input stream and from files
named on the command line. This is done by passing the special filename - as a
command-line argument:
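ARGF follows the same convention. The sketch below starts a child Ruby process that runs a one-line ARGF script with - and a file as its arguments, and pipes a string to its standard input. The scratch file and the strings are made up for the demonstration:

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Create a scratch file to pass as the second argument.
file = Tempfile.new("argf-dash")
file.write("from file\n")
file.close

# Equivalent to: echo "from stdin" | ruby -e 'print ARGF.read' - <file>
output, status = Open3.capture2(
  RbConfig.ruby, "-e", "print ARGF.read", "-", file.path,
  stdin_data: "from stdin\n"
)

puts output   # standard input first, then the file, in argument order
file.unlink   # remove the scratch file
```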

To get the name of the file currently being read, we can use the
#filename method.
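As a sketch, the following prefixes every line with the name of the file it came from. The two scratch files are created only so that the example is self-contained; in a real script the filenames would come from the command line:

```ruby
require "tempfile"

# Create two scratch files (hypothetical content) and queue them for ARGF.
files = %w[first second].map do |name|
  Tempfile.new(name).tap do |file|
    file.write("#{name} line\n")
    file.close
  end
end
ARGV.replace(files.map(&:path))

labelled = []
ARGF.each_line do |line|
  # ARGF.filename reflects the file the current line was read from.
  labelled << "#{File.basename(ARGF.filename)}: #{line}"
end

puts labelled
```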

If our program only processes part of each file, for example the YAML front
matter of blog posts written in Markdown format, the #close method can be used
to close the current file and skip to the next file:
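A minimal sketch of this idea, assuming that every file starts with front matter delimited by two `---` lines. The two posts are made-up scratch files that stand in for filenames passed on the command line:

```ruby
require "tempfile"

# Two hypothetical posts, each starting with YAML front matter.
posts = [
  "---\ntitle: First\n---\nBody of the first post.\n",
  "---\ntitle: Second\n---\nBody of the second post.\n"
].map do |content|
  Tempfile.new(["post", ".md"]).tap do |file|
    file.write(content)
    file.close
  end
end
ARGV.replace(posts.map(&:path))

front_matter = []
inside = false
while (line = ARGF.gets)
  if line.chomp == "---"
    # The closing delimiter ends the front matter: skip the rest of the file.
    ARGF.close if inside
    inside = !inside
  elsif inside
    front_matter << line
  end
end

puts front_matter
```

Because the file is closed as soon as the closing delimiter is seen, the body of each post is never read.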