For many system administrators, Awk is used only as a way to print specific
columns of data from programs that generate columnar output, such as netstat
or ps. For example, to get a list of all the IP addresses and ports with open
TCP connections on a machine, one might run the following:

# netstat -ant | awk '{print $5}'

This works, but along with the data you actually wanted, the output also
includes the fifth word of the opening explanatory note and the fifth word of
the column heading line.

Matching patterns

One common way to filter these out is to pipe the output further through a
call to grep, perhaps to keep only lines containing at least one digit:

# netstat -ant | awk '{print $5}' | grep '[0-9]'

In this case, awk can do the filtering itself. Put a regular expression,
delimited by the standard / characters, in front of the action, and the field
is printed only for lines that match it. This eliminates the need for the call
to grep:

# netstat -ant | awk '/[0-9]/ {print $5}'

We can further refine this so that the regular expression is matched only
against the fifth column of the output, using the ~ operator:

# netstat -ant | awk '$5 ~ /[0-9]/ {print $5}'
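
To try this without a live netstat, the sketch below feeds awk a few lines in the shape of netstat -ant output. The sample_netstat function and the addresses in it are invented for illustration. Real output will differ from system to system.

```shell
# Hypothetical stand-in for `netstat -ant`; the addresses are made up.
sample_netstat() {
    cat <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:22           198.51.100.7:51234      ESTABLISHED
EOF
}

# No pattern: the note and heading lines leak through as "and" and "Address".
sample_netstat | awk '{print $5}'

# Pattern applied to the fifth field only: just the two address:port values remain.
sample_netstat | awk '$5 ~ /[0-9]/ {print $5}'
```

Matching against $5 rather than the whole line means a digit elsewhere on a line cannot cause an unwanted field to be printed.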

Skipping lines

Another approach you could take to strip the headers out might be to use sed
to skip the first two lines of the output:

# netstat -ant | awk '{print $5}' | sed 1,2d

However, this can also be incorporated into the awk call, using the NR
variable, which holds the number of the current line, in a condition that
checks that the line number is greater than two:

# netstat -ant | awk 'NR>2 {print $5}'
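
As a rough illustration of how NR behaves, the following sketch prints the line number next to each fifth field and then applies the NR condition. The sample_netstat function and its addresses are made up for the example and stand in for real netstat output.

```shell
# Hypothetical stand-in for `netstat -ant`; the addresses are made up.
sample_netstat() {
    cat <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:22           198.51.100.7:51234      ESTABLISHED
EOF
}

# Show the number awk assigns to each input line beside its fifth field.
sample_netstat | awk '{print NR ": " $5}'

# Print the fifth field only for lines after the two header lines.
sample_netstat | awk 'NR > 2 {print $5}'
```

This approach relies on the header always being exactly two lines long, so it is worth checking that assumption against the netstat on your own system.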

Combining and excluding patterns

Another common idiom on systems that don’t have the special pgrep command is
to filter ps output for a string, but exclude the grep process itself from
the output with grep -v grep:

# ps -ef | grep apache | grep -v grep | awk '{print $2}'

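If you’re using Awk to get columnar data from the output, in this case the
second column containing the process ID, both calls to grep can instead be
incorporated into the awk call. The process to exclude is now awk itself,
since its own command line contains the string being searched for: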

# ps -ef | awk '/apache/ && !/awk/ {print $2}'

Again, this can be further refined if necessary to ensure you’re only matching
the expressions against the command name by specifying the field number for
each comparison:

# ps -ef | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
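
The difference between the two forms can be seen with some made-up process data. In the sketch below, the sample_ps function, the process IDs, and the commands are all hypothetical. It assumes the eight-column layout that ps -ef prints on many Linux systems, where the command name is the eighth field.

```shell
# Hypothetical stand-in for `ps -ef`; every process listed here is made up.
sample_ps() {
    cat <<'EOF'
UID        PID  PPID  C STIME TTY          TIME CMD
root      1001     1  0 09:00 ?        00:00:01 /usr/sbin/apache2 -k start
www-data  1002  1001  0 09:00 ?        00:00:00 /usr/sbin/apache2 -k start
tom       2001  1900  0 09:05 pts/0    00:00:00 awk /apache/ && !/awk/ {print $2}
tom       2002  1900  0 09:05 pts/0    00:00:00 vim apache.conf
EOF
}

# Whole-line match: also catches the editor, whose argument mentions apache.
sample_ps | awk '/apache/ && !/awk/ {print $2}'

# Match on the command name only: just the two apache2 processes.
sample_ps | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'
```

With this sample data, the whole-line version prints three process IDs and the field-specific version prints two.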

If you’re used to using Awk purely as a column filter, the above might help to
increase its utility for you and allow you to write shorter and more efficient
command lines. The Awk Primer on Wikibooks is a really good reference for
using Awk to its fullest for the sorts of tasks for which it’s especially
well-suited.