How to Use Awk in Linux - Most Common Scenarios

Last updated on July 10, 2018

Text manipulation is a large part of many administration tasks. Linux commands and log files often produce reams of output, and the same output may need to be read in different ways depending on who is reading it and why. Earlier, I looked at commands that parse the contents of large files and make changes to them, using tools like sed. Awk is a command that allows us to do some of the same things, with a focus on columns and pattern matching.

In this tutorial, I'm going to show you how to use awk in Linux using some extremely common examples, namely selecting columns and matching patterns to get the output that you want.

Column Based Output

Many Linux commands list their output in columns. Take a basic command like "ls -l" for example. It displays information about each file or directory on its own line, and each column is a property of that file. So, we have a column for the name, the date, the permissions etc. However, this output can quickly become overwhelming when you're searching for something specific.
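As a small illustration of treating command output as columns, the sketch below lists a single file and prints only two of its properties. The throwaway directory and the file name demo.txt are hypothetical, and the example assumes a file name without spaces, since awk splits lines on whitespace.

```shell
# Work in a throwaway directory so the listing is predictable.
workdir=$(mktemp -d)
touch "$workdir/demo.txt"

# $1 is the first column (the permissions) and $NF is the last column
# (the file name, as long as it contains no spaces).
listing=$(cd "$workdir" && ls -l demo.txt | awk '{print $1, $NF}')
printf '%s\n' "$listing"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

The permissions shown will vary from system to system, but the last column should always be demo.txt.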

Another example is the package history commands. In CentOS/RHEL for example, you can use "yum history" to get a complete list of all the commands relating to package management since the beginning. I personally choose to use "dnf", but the end result is the same. Here's the output of "dnf history" on my machine:

Here you can see that the output is rendered in columns. Now what if you want to extract only the actual commands themselves? These are highlighted above in red. It looks as if it's the second column, but here appearances can be deceiving. It's important to know the structure of the output before you manipulate it. By default awk splits each line on whitespace, so column 1 is the ID and column 2 is the demarcation symbol "|". In reality, the commands here are a combination of columns 3 and 4!

We can use awk to print out columns 3 and 4 using this command:

dnf history | awk '{print $3, $4}'

As you can see, we pipe the output of "dnf history" to awk. Inside the awk program, a dollar sign ($) followed by a number refers to that column, so $3 and $4 are the third and fourth columns. This will give us the following output:
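The real listing depends on each machine's package history. As a rough stand-in, the sketch below runs the same awk program over a single made-up line laid out like a "dnf history" row. The line itself, including the package name and date, is an assumption for illustration and not real output.

```shell
# One made-up line in the same layout as a "dnf history" row (not real output).
line='3 | install httpd | 2018-07-09 10:12 | Install | 9'

# awk splits on runs of whitespace, so the "|" separator is a field of its own.
second=$(printf '%s\n' "$line" | awk '{print $2}')
command_text=$(printf '%s\n' "$line" | awk '{print $3, $4}')

printf 'field 2: %s\n' "$second"
printf 'fields 3 and 4: %s\n' "$command_text"
```

The comma in "print $3, $4" places a single space between the two columns. A command consisting of only one word would leave the next "|" in $4, so it is worth checking the result against your own output.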

If you want to export this output into a text file, you can also remove the first 3 lines which contain junk data by telling awk to only print lines if the line number is greater than 3. Like this:

dnf history | awk '{if (NR>3) {print $3, $4}}'

You can see that we've added a conditional statement and nested the original "print" command inside it. NR here stands for "Number of Records": it holds the number of records awk has read so far, which by default is the current line number. This gives the following output:

As you can see, the first three lines have been removed.
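To make the effect of NR concrete, here is a minimal sketch that runs the command over a made-up five-line sample (three header-like lines followed by two rows; none of it is real "dnf history" output). It also shows an equivalent, shorter spelling in which the condition is written as an awk pattern in front of the action instead of an if statement inside it.

```shell
# Made-up sample: three lines to skip, then two rows (not real dnf output).
sample='made-up notice line
ID | Command line | Date and time | Action(s) | Altered
-------------------------------------------------------
2 | install httpd | 2018-07-09 10:12 | Install | 9
1 | remove nano | 2018-07-08 09:01 | Removed | 1'

# Form used above: an if statement inside the action block.
with_if=$(printf '%s\n' "$sample" | awk '{if (NR>3) {print $3, $4}}')

# Equivalent form: the condition is the pattern that selects the lines.
with_pattern=$(printf '%s\n' "$sample" | awk 'NR > 3 {print $3, $4}')

printf '%s\n' "$with_if"
printf '%s\n' "$with_pattern"
```

Both variables end up holding the same two lines, "install httpd" and "remove nano". Which form to use is a matter of taste; the pattern form is common in short one-liners.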

Pattern Matching with Awk

Let's say you want to filter the above output to only print those lines which are related to the installation of software. To do this, we want to identify lines whose 3rd column contains the word "install". Here's the command to accomplish that:

dnf history | awk '$3 ~ /install/ {print $3, $4}'

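To create a pattern that matches the 3rd column, we use '$3 ~ /install/'. The tilde (~) means "matches", and the text between the slashes is a regular expression, so any 3rd column that contains "install" qualifies. If you don't care about matching a specific column, you can omit the column specification along with the tilde (~) sign and write just '/install/'. This will then match the pattern across the entire line, and not just a specific column. Here's the output of the above command: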
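The difference between matching one column and matching the whole line is easiest to see side by side. The sketch below uses three made-up rows (the package name install-tools is hypothetical, chosen only so that "install" appears outside the 3rd column).

```shell
# Made-up rows in the layout of "dnf history" (not real output).
sample='3 | install httpd | 2018-07-09 10:12 | Install | 9
2 | remove install-tools | 2018-07-08 09:01 | Removed | 1
1 | update vim | 2018-07-07 18:30 | Upgrade | 1'

# Match only when the 3rd column contains "install".
by_column=$(printf '%s\n' "$sample" | awk '$3 ~ /install/ {print $3, $4}')

# Match when "install" appears anywhere on the line.
whole_line=$(printf '%s\n' "$sample" | awk '/install/ {print $3, $4}')

printf 'column match:\n%s\n' "$by_column"
printf 'whole-line match:\n%s\n' "$whole_line"
```

The column match keeps only "install httpd", while the whole-line match also keeps "remove install-tools" because the pattern is found in the 4th column. Matching is case-sensitive, so the capitalised "Install" in the last-but-one column does not count on its own.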

You can now export this output into a text file and feed it to another system to replicate the packages you have installed on the origin server. This is just one of the many uses of awk: the ability to filter the output of commands with pattern and column matching is incredibly useful. While there's a lot more to awk than just this, it's easily the most common use case. I hope you found it useful!