Episodes about using Awk, the text manipulation language. It comes in various forms called awk, nawk, mawk and gawk, but the version found on many Linux distributions is GNU Awk (gawk). It's a programming language optimised for the manipulation of delimited text.

Gnu Awk - Part 4

Introduction

This is the fourth episode of the series that b-yeezi and I are doing. These shows are now collected under the series title “Learning Awk”.

Recap of the last episode

Logical Operators

We have seen the operators ‘&&’ (and) and ‘||’ (or). These are also called Boolean Operators. There is also one more operator ‘!’ (not) which we haven’t yet encountered. These operators allow the construction of Boolean expressions which may be quite complex.
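To make this concrete, here is a small sketch that uses all three operators in patterns. The name/age data is invented purely for illustration, and the rules are just examples of the kind of Boolean expression you can build:

```shell
# Made-up sample data: a name and an age on each line
printf 'alice 34\nbob 17\ncarol 52\ndave 15\n' |
awk '
    $2 >= 18 && $1 != "carol" { print "adult (not carol):", $1 }
    $2 < 16 || $1 == "carol"  { print "under 16 or carol:", $1 }
    !($2 >= 18)               { print "minor:", $1 }
'
```

Every record is tested against every rule in turn, so a line such as "dave 15" matches both the second and the third rule and produces two lines of output.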

If you are used to programming you will expect these operators to have a precedence, just like operators in arithmetic do. We will deal with this subject in more detail later since it is relevant not only in patterns but also in other parts of an Awk program.
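As a quick preview, Awk follows the usual convention that ‘!’ binds more tightly than ‘&&’, which in turn binds more tightly than ‘||’. This sketch shows how grouping changes the result; the variable names a, b and c are arbitrary:

```shell
awk 'BEGIN {
    a = 1 || 0 && 0      # parsed as 1 || (0 && 0), giving 1
    b = (1 || 0) && 0    # explicit grouping forces || first, giving 0
    c = !1 || 1          # parsed as (!1) || 1, giving 1
    print a, b, c
}'
```

It should print "1 0 1". The first result would be 0 if ‘||’ were applied before ‘&&’, and the third would be 0 if ‘!’ applied to the whole of "1 || 1".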

The next statement

We saw this statement in the last episode and learned that it causes the processing of the current input record to stop. No more patterns are tested against this record and no more actions in the current rule are executed. Note that “next” is a statement like “print”, and can only occur in the action part of a rule. It is also not permitted in BEGIN or END rules (more of which anon).
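Here is a minimal sketch of a common use of "next": skipping comment lines so that later rules never see them. The input lines are invented for the example:

```shell
printf '# a comment\nfirst line\n# another comment\nsecond line\n' |
awk '
    /^#/ { next }          # stop processing this record; no later rule sees it
    { print NR ": " $0 }   # NR still counts the skipped records
'
```

This should print "2: first line" and "4: second line". The record numbers show that "next" only stops the remaining rules being applied; the skipped records were still read and counted.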

The BEGIN and END rules

The BEGIN and END elements are special patterns which, in conjunction with actions enclosed in curly brackets, make up rules in the same sense that the ‘pattern {action}’ sequences we have seen so far are rules. As we saw in the last episode, BEGIN rules run before any input file is read and before the main ‘pattern {action}’ rules are processed, whereas END rules run after all the input files have been processed.
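A minimal sketch showing the order in which the three kinds of rule run, using a few made-up input lines:

```shell
printf 'apple\nbanana\ncherry\n' |
awk '
    BEGIN { print "Starting" }                # runs before any input is read
    { count++ }                               # runs once for each input record
    END   { print "Read", count, "lines" }    # runs after all input is consumed
'
```

The output should be "Starting" followed by "Read 3 lines". The variable count needs no declaration; an unset variable behaves as zero when used as a number.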

It is permitted to write more than one BEGIN rule and more than one END rule. These are just concatenated together in the order they are encountered by Awk.
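To see this concatenation in action, the following sketch interleaves two BEGIN rules and two END rules. Input is redirected from /dev/null so that Awk has an empty input to read rather than waiting on the terminal:

```shell
awk '
    BEGIN { print "first BEGIN" }
    END   { print "first END" }
    BEGIN { print "second BEGIN" }
    END   { print "second END" }
' < /dev/null
```

Both BEGIN rules run first, in the order they appear, and then both END rules, giving "first BEGIN", "second BEGIN", "first END", "second END".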

Awk will complain if either BEGIN or END is not followed by an action since this is meaningless.

Variables, arrays, loops, etc

Learning a programming language is never a linear process, and sometimes reference is made to new features that have not yet been explained. A number of new features were mentioned in passing in the last episode, and we will look at these in more detail in this episode.
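As a taste of these features, here is a sketch that counts words with an associative array and then reports the totals with a "for (key in array)" loop. The fruit names are only sample data. The order in which such a loop visits array elements is not defined, so the output is piped through sort to make it predictable:

```shell
printf 'apple\nbanana\napple\ncherry\nbanana\napple\n' |
awk '
    { seen[$1]++ }                  # array indexed by the word itself
    END {
        for (word in seen)          # visiting order is unspecified
            print word, seen[word]
    }
' | sort
```

This should print "apple 3", "banana 2" and "cherry 1", one per line.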

Long notes

With a view to making portable notes for this series I have included ePub and PDF versions with this episode. Feedback is welcome to help decide which version is preferable, as are any suggestions on the improvement of the layout.

Comments

Comment #1 posted on 2016-11-23T08:13:17Z by Otto

I always shied away from awk - yet another scripting language, but now I see how associative indexing ("hashes") may be useful.

Comment #2 posted on 2016-11-27T13:58:23Z by Dave Morriss

Thanks

Glad you found it useful. Keep listening, b-yeezi and I will be talking more about such arrays as we proceed with the series.

Comment #3 posted on 2017-12-09T17:17:19Z by Ron Strelecki

GNU AWK, part four

Love the episode, and the series.

I think that in your hello world example that demonstrates the FS built-in variable, the character used should not be a comma, but rather something distinct like a pipe (or some other character that does not have a different meaning in the language). I understand that typically FS will be switched to a comma, if anything, but as the print statement uses a comma for a different function, it can be confusing.

$ awk -F "," 'BEGIN{print "FS is",FS}'
FS is ,

$ awk -F "|" 'BEGIN{print "FS is",FS}'
FS is |
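With a pipe as the separator, as suggested here, a sketch that actually splits some sample data looks like this. When FS is a single character other than a space, each occurrence of that character separates fields, so the pipe needs no escaping:

```shell
# Print the second field and the number of fields (NF)
echo 'alpha|beta|gamma' | awk -F '|' '{ print $2, NF }'
```

This should print "beta 3".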

Comment #4 posted on 2017-12-10T12:52:13Z by Dave Morriss

Thanks Ron

Thanks for the comment.

When I wrote this example it never occurred to me that it could be confusing, but now that you point it out, yes it is. I think I was keen to show that -F on the command line sets the variable FS in the script, and having just shown an example of -F "," I simply continued to use it!

I was also keen to make it clear that the comma in a print statement is where Awk puts the contents of OFS, so I guess I lost sight of the example a little in my enthusiasm :-)

I will consider modifying these notes in the light of your suggestion.

Comment #5 posted on 2017-12-18T15:21:42Z by Ron Strelecki

GNU Awk, part four

I think if you put what you suggested in the notes (that inside a print statement, Awk interprets a comma as OFS) that would be perfect! When learning any language, context variation is a consistent bugaboo. Wait, why does a semi-colon mean one thing here, and something else entirely there? So doing it deliberately, and then pointing it out is definitely beneficial, and points out the internal workings of the language.
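A small sketch of the point made in this exchange: a comma between items in a print statement is replaced by the value of OFS (a single space by default), whereas items written side by side are simply concatenated. The choice of ":" for OFS is arbitrary:

```shell
# Fields are split on "|"; OFS is set to ":" for output
echo 'alpha|beta|gamma' | awk -F '|' -v OFS=':' '{ print $1, $3; print $1 $3 }'
```

The first print should give "alpha:gamma" and the second "alphagamma".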
