This blog is about the Linux Command Line Interface (CLI), with an occasional foray into GUI territory.
Instead of just giving you information like some man page, I hope to illustrate each command in real-life scenarios.

Search This Blog

Saturday, April 26, 2008

awk is not an obvious choice as a tool for strictly extracting rows from a text file. It is better known for its column/field manipulation capabilities in a text file. More obvious choices are sed and perl. You can see how sed does it in my earlier entry.
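For comparison, here is a minimal sed sketch of the same single-line extraction (the earlier entry covers it in detail):

```shell
# -n suppresses automatic printing; the 2p command prints only line 2
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' | sed -n '2p'
```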

If you opt for awk, you can use its NR variable which contains the number of input records so far.

Suppose the text file is somefile.txt:

$ cat > somefile.txt
Line 1
Line 2
Line 3
Line 4

To print a single line by number, say line 2:

$ awk 'NR==2' somefile.txt
Line 2

If the text file is huge, you can cheat by exiting the program on the first match. Note that this hack will not work if multiple lines are being extracted.

$ awk 'NR==2 {print;exit}' somefile.txt
Line 2

To extract a section of the text file, say lines 2 to 3:

$ awk 'NR==2,NR==3' somefile.txt
Line 2
Line 3
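For a huge file, the early-exit trick can still be combined with a range by adding a second rule that quits once the last wanted line has been printed (a sketch, using a pipe in place of somefile.txt):

```shell
# Print lines 2-3; the second rule exits so awk never reads the rest
printf 'Line 1\nLine 2\nLine 3\nLine 4\n' | awk 'NR==2,NR==3; NR==3 {exit}'
```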

A more interesting task is to extract every nth line from a text file. I showed previously how to do it using sed and perl.

Using awk, to print every second line counting from line 0 (so the first printed line is line 2):

$ awk '0 == NR % 2' somefile.txt
Line 2
Line 4

To print every second line counting from line 1 (so the first printed line is line 1):

$ awk '1 == NR % 2' somefile.txt
Line 1
Line 3
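The modulo pattern generalizes to any stride; a sketch for every third line (using a six-line sample so the pattern is visible):

```shell
# NR % 3 == 0 matches lines 3, 6, 9, ...
printf 'Line 1\nLine 2\nLine 3\nLine 4\nLine 5\nLine 6\n' | awk 'NR % 3 == 0'
```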

If I ever need to revisit that level4 subdirectory, re-running those same commands in the same order from the command history can be a real chore. You have to keep finding your way back in history to execute the next command.

bash has a shortcut, Ctrl-O (as in Control-Oh), that is your friend here.

Simply go back to the first command in the series (cd /home/peter). Hit Ctrl-O (instead of Enter), and it will run that command and automatically display the next command for you (cd level1).

You can do one of several things at this point:

You can hit Ctrl-O to run the currently displayed command and display the next one (cd level2).

You can hit Enter to run the current command and terminate the sequence (the next command is NOT displayed).

You can hit Ctrl-C; the current command is NOT executed, and the next command is NOT displayed.

Saturday, April 19, 2008

Enough about lines for now. Let's turn our attention to extracting columns and delimited fields in a text file. For instance, one task is to extract columns 5 to 7 in a file. Sometimes, the data you want reside in variable-length fields that are delimited by some character, say ",". A sample task is to extract the second field in a comma-delimited file.

As usual, there is more than one way to accomplish these tasks. The tools that we will use are cut, awk, and perl.

The text file is somefile.

$ cat > somefile
1234567890
1234567890
1234567890
1234567890

To extract fixed columns (say columns 5-7 of a file):

$ cut -c5-7 somefile
567
567
567
567
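awk was listed among our tools, and it can do the same fixed-column extraction with its built-in substr() function (note that awk is 1-based, unlike perl):

```shell
# substr(string, start, length): start at column 5, take 3 characters
printf '1234567890\n1234567890\n1234567890\n1234567890\n' | awk '{print substr($0, 5, 3)}'
```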

$ perl -pe '$_ = substr($_, 4, 3) . "\n"' somefile
567
567
567
567

The current line ($_) is replaced with substr($_, 4, 3), the 3-character substring starting at offset 4 (perl is 0-based, so offset 4 is column 5).

To illustrate extracting a particular field, let's use /etc/passwd, a colon-delimited file. Say we extract the 6th field (home directory of users).

$ cut -d: -f6 /etc/passwd

$ awk -F : '{print $6}' /etc/passwd

$ perl -p -e '$_ = (split(/[:\n]/))[5] . "\n"' /etc/passwd

Here, I used the split function to separate the fields delimited by the colon and the newline. split returns a list, and we assign element 5 (the 6th field; perl is 0-based) to the current line. The \n is necessary in the delimiter [:\n]; otherwise, extracting the last field would leave an extra newline.

If you can think of a simpler way to do this, please share it with us in the comments.