Filters: Doing It Your Way

A look at several of the more flexible filters: programs that read some input, perform some operation on it, and write the altered data as output.

One of the basic philosophies of Linux
(as with all flavours of Unix) is that each program does one
particular task, and does it well. Often you combine several
programs to achieve something, either at the shell prompt or in a
script, by piping the output of one program into the next. I'm
talking about things like

ls -l | more

and

ps auxw | \
grep netscape >> people.who.should.be.working

But what if the output of one program isn't in the format
needed for the next? We need some way of processing the output of
one program so that it is ready for the next.

Fortunately, there are many Linux programs that do this job:
read some input, perform some operations on it, and write the
altered data as the output. These programs are called filters. Some
filters do quite limited tasks, such as head, grep and sort,
whereas others are more flexible, such as sed and awk. In this
article, we're going to look at several of these more flexible
filters, and give several examples of what can be done with
them.

The name “sed” is a contraction of stream
editor; sed applies editing commands to a stream of
data. A common use for sed is to replace one text pattern with
another, as in

sed 's/Fred/Barney/g' foo

This command takes the file foo, changes every occurrence of
Fred to Barney, and writes
the modified version to standard output.
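
To see the effect for yourself, here is a small sketch you can paste into a shell. The file contents and the scratch directory are made up for illustration; the second command drops the trailing g so that you can compare the two results.

```shell
# Create a small sample file (hypothetical contents) in a scratch directory.
workdir=$(mktemp -d)
printf 'Fred met Wilma.\nFred and Fred again.\n' > "$workdir/foo"

# With the trailing g, every occurrence of Fred on each line is replaced.
all=$(sed 's/Fred/Barney/g' "$workdir/foo")

# Without the g, only the first occurrence on each line is replaced.
first=$(sed 's/Fred/Barney/' "$workdir/foo")

printf '%s\n' "$all"
printf '%s\n' "$first"

# The original file is untouched; sed only wrote to standard output.
rm -r "$workdir"
```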

Note that in this example we have placed the actual sed
commands inside single quotes. Sed doesn't require that commands be
quoted this way, but you will need to use quotes if the sed command
includes characters that are special to the shell, such as
$ or *. This example doesn't
have any special characters, so we could just as easily have left
out the quotes. Try it and see.
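
As a rough illustration of when the quotes do matter, consider a command that contains spaces, which the shell also treats specially (the names below are just sample text). This is a sketch, and the exact error message varies from one sed to another, so it is discarded here.

```shell
# Quoted: the whole sed command reaches sed as a single argument.
quoted=$(printf 'Fred Flintstone\n' | sed 's/Fred Flintstone/Barney Rubble/')

# Unquoted: the shell splits the command at the spaces, so sed is handed
# the incomplete command "s/Fred" and exits with an error.
if printf 'Fred Flintstone\n' | sed s/Fred Flintstone/Barney Rubble/ 2>/dev/null
then
    unquoted_ok=yes
else
    unquoted_ok=no
fi

printf '%s\n' "$quoted"
printf 'unquoted command worked: %s\n' "$unquoted_ok"
```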

Without the input file foo, sed reads from standard input, so
we could achieve the same result with the command

sed 's/Fred/Barney/g' < foo

or

cat foo | sed 's/Fred/Barney/g'

Note that the first two versions are generally preferred to
the third. Using cat just to send input into a pipe creates an
extra process which can often be avoided.

We also have to consider the output. By default, the results
appear on standard output, but this isn't always what we want. One
option is to pipe the output through a pager, for example

sed 's/Fred/Barney/g' foo | more

or to redirect it to a file

sed 's/Fred/Barney/g' foo > bar

While it is often tempting to write

sed 's/Fred/Barney/g' foo > foo

the only thing this achieves is to delete the contents of the
file foo! Why? Because the first thing the shell does with this
command is to open the file foo for output, destroying what was
there already. When it tries to read from foo, there is nothing
there to read. The result is an empty file. This is an easy mistake
to make when redirecting output in this way, so do be
careful.
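
One common way around this, sketched below, is to write the result to a temporary file and then move it over the original. The name foo.tmp is an arbitrary choice for this example, and the sample file is made up.

```shell
# Set up a sample file (hypothetical contents) in a scratch directory.
workdir=$(mktemp -d)
printf 'Fred was here.\n' > "$workdir/foo"

# Write the edited text to a temporary file first, then replace the
# original only if sed finished successfully.
sed 's/Fred/Barney/g' "$workdir/foo" > "$workdir/foo.tmp" &&
    mv "$workdir/foo.tmp" "$workdir/foo"

contents=$(cat "$workdir/foo")
printf '%s\n' "$contents"
rm -r "$workdir"
```

GNU sed also has a -i option that edits a file in place, but it is not part of POSIX sed, so the temporary-file approach is the more portable of the two.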

Awk is a bit more flexible than sed; it is a full-fledged
programming language in its own right. However, don't let that put
you off. Writing simple programs in awk is surprisingly easy, and
it often doesn't feel like a programming language [See page 46 of
Linux Journal issue 25, May 1996—ED]. For
example, the command

awk '{print NR, $0}' foo

prints the file foo, numbering each line as it goes. Awk can
also read its input from a pipe or from standard input, exactly
like sed, and also writes on standard output, unless you redirect
it. The bit between the quotes (which are necessary, since the
{}, $ and space characters are all special to the
shell) is the awk program. I said they can be simple, didn't I? An
awk program is simply a sequence of one or more pattern-action
statements, in the form

pattern { action }

Each input line is tested against each pattern in turn. When
an input line matches a pattern, the corresponding action is
performed. Either part may be omitted: if the pattern is empty, every
line matches; if the action is empty, the default
action is to print the line.

In the example above, the pattern was empty, so every line
matched. The action was to print NR, which is a
built-in awk variable containing the number of lines read so far,
and then print $0, which is the current
line.
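
The three possible forms can be tried side by side. The sketch below uses a made-up three-line file; length is a standard awk function that returns the number of characters in a string.

```shell
# Set up a sample file (hypothetical contents) in a scratch directory.
workdir=$(mktemp -d)
printf 'alpha\nbeta Fred\ngamma\n' > "$workdir/foo"

# Pattern only: the default action prints each line that matches /Fred/.
matched=$(awk '/Fred/' "$workdir/foo")

# Pattern and action: print the line number of each matching line.
numbered=$(awk '/Fred/ { print NR }' "$workdir/foo")

# Action only: the empty pattern matches every line.
lengths=$(awk '{ print NR, length($0) }' "$workdir/foo")

printf '%s\n' "$matched" "$numbered" "$lengths"
rm -r "$workdir"
```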

Going On

Now that we've seen the basic idea behind sed and awk, we're
going to look at some examples. The best way to learn something is
to actually do it, and I recommend that you try out some of these
examples yourself as you go along, possibly even with one eye on
the man pages. We certainly aren't going to cover everything that
sed and awk can do, but you will, it is hoped, have more confidence
to try things out yourself once you've finished reading this
article.

Our first example is to remove all the spaces from a
document. This is easily achieved using sed:

sed 's/ *//g' foo

This is like the earlier example with Fred and Barney, only
here we have used a regular expression: ' *'
(the quotes are included so that you can see the space that is part
of the regular expression). sed's s (for
substitute) command uses regular expressions just like grep. The
regexp ' *' matches zero or more spaces, so each run of spaces is
replaced with nothing; in other words, it is deleted.
This command doesn't deal with tabs, as it stands, but you could
modify it to match any run of tabs and
spaces:
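
One way to write this, as a sketch rather than the only possible form, uses the POSIX character class [[:blank:]], which matches either a space or a tab. The input line is made up for the example, and it assumes your sed supports POSIX character classes.

```shell
# A sample line containing spaces and a tab (printf turns \t into a tab).
input=$(printf 'a b\tc  d')

# Delete every run of blanks (spaces or tabs) from the line.
stripped=$(printf '%s\n' "$input" | sed 's/[[:blank:]]*//g')

printf '%s\n' "$stripped"
```

To apply the same command to a file, give the file name as before: sed 's/[[:blank:]]*//g' foo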
