ls [remote-directory [local-file]]
Print a listing of the contents of a directory on the remote
machine. The listing includes any system-dependent information
that the server chooses to include; for example, most
UNIX systems will produce output from the command `ls -l'.
If remote-directory is left unspecified, the current working
directory is used. If interactive prompting is on, ftp will
prompt the user to verify that the last argument is indeed
the target local file for receiving ls output. If no local
file is specified, or if local-file is `-', the output is
sent to the terminal.

With a local copy of the package directory listing, I would then need to extract the month, day and time fields and write them to a temporary file. The last step is sorting with 'sort -u' to produce a listing of the unique dates.
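The extract-and-sort steps could be sketched roughly as follows. The sample listing, the variable names and the use of mktemp are assumptions made for illustration. The field positions assume 'ls -l' style output with month, day and time in fields 6, 7 and 8.

```sh
#!/bin/sh
# Hypothetical sample of a saved remote listing; normally this file would be
# produced by ftp's 'ls remote-directory local-file'.
listing=$(mktemp)
dates=$(mktemp)
cat > "$listing" <<'EOF'
-rw-r--r--  1 root  wheel  1024 Nov 10 14:05 foo-1.0.tgz
-rw-r--r--  1 root  wheel  2048 Nov 10 14:05 bar-2.1.tgz
-rw-r--r--  1 root  wheel  4096 Jan 21 09:30 baz-0.9.tgz
EOF

# Extract the month, day and time fields into a temporary file.
awk '{ print $6, $7, $8 }' "$listing" > "$dates"

# Sort and drop duplicate lines to get the unique dates.
unique_dates=$(sort -u "$dates")
echo "$unique_dates"

rm -f "$listing" "$dates"
```

The two files dated Nov 10 14:05 collapse into a single line, so three input lines yield two unique dates.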

3 Using 'awk'

In an interview with the Australian Computer World magazine, Alfred V. Aho, one of the three architects of the 'awk' programming language, summarizes the language as follows:

"AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed."

When 'awk' reads a line, its words or fields are automatically split and stored in numbered variables such as $1, $2, $3 and so on. This applies, for example, to each line of a directory listing.
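A small sketch of this field splitting, using a made-up line in 'ls -l' format (the file name, owner and sizes are invented for the example):

```sh
#!/bin/sh
# Hypothetical line from an 'ls -l' style listing.
line='-rw-r--r--  1 root  wheel  1024 Nov 10 14:05 foo-1.0.tgz'

# NF holds the number of fields; in this layout $6, $7, $8 and $9 are the
# month, day, time and file name.
fields=$(printf '%s\n' "$line" |
    awk '{ print NF; print $6; print $7; print $8; print $9 }')
echo "$fields"
```

Runs of blanks between the columns count as a single separator, so the line yields nine fields.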

Unfortunately these arrays only allow a numeric index. While normal arrays take only a numeric subscript or index, 'awk' supports what is called an associative array or hash table. These data structures accept non-numeric keys to look up the corresponding value. Thus our month-to-number mapping problem can be solved like this:

The two lines "Nov 10" and "Jan 21" are fed to a short 'awk' script. The 'awk' initialization block BEGIN { month["Nov"]="11" ; month["Jan"]="01" } defines and stores mappings of the text strings "Nov" and "Jan" to "11" and "01" in the hash table 'month'.

The { print month[$1], $2 } block will be executed for each line fed to 'awk'.

For the first line, the 'Nov' field will be assigned to the variable $1 and the second field '10' to $2. The command print month[$1] will be expanded to print month["Nov"], which will produce the corresponding value of "11".

This same procedure will be repeated for the "Jan 21" line, resulting in the "01 21" output.
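Put together, the script described above might be run like this. The BEGIN block and the print block are the ones quoted in the text. Feeding the input with the shell's printf and capturing the output in a variable are choices made only for this sketch.

```sh
#!/bin/sh
# Feed the two sample lines to the 'awk' script described above.
result=$(printf 'Nov 10\nJan 21\n' |
    awk 'BEGIN { month["Nov"]="11" ; month["Jan"]="01" }
         { print month[$1], $2 }')
echo "$result"
```

A month name that is missing from the table would simply produce an empty string, so a complete script needs all twelve entries.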

Both 'awk' and 'sh' know a printf function similar to the one found in the standard C library.

To shorten the examples, we will first use the shell's printf. See printf(1).

To format an unsigned decimal number we use the %u conversion specification. Between the '%' and 'u' we insert the '0' flag for zero padding and the field width '2', so we get %02u.

Code:

$ printf "%02u\n" 1
01
$ printf "%02u\n" 12
12

A test using 'awk':

Code:

$ echo Jan 1 | awk '{ printf "%s %02u\n", $1, $2 }'
Jan 01

The name of the month in $1 is printed with a %s conversion specifier, while the '1' assigned to $2 is formatted according to %02u.

The other problem was that the same field could hold either a time string of 5 positions or a year using only 4 positions. By choosing a field width of 5 and a blank for padding we can solve this last issue.
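A quick check of this padding, as a sketch. It uses plain %5s, which pads with blanks on the left by default. The square brackets are added here only to make the padding visible, and the two sample values are invented.

```sh
#!/bin/sh
# A time string (5 characters) and a year (4 characters) in the same field.
padded=$(printf '14:05\n2008\n' | awk '{ printf "[%5s]\n", $1 }')
echo "$padded"
```

The time fills the field exactly, while the year gets one leading blank, so both occupy five positions.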

Merging all the conversion specifications into one statement, the final printf for 'awk' is:

Code:

printf "%s-%02u % 5s %-40s %10u\n", month[$6], $7, $8, $9, $5

%s    : prints the number produced by month[$6]
-     : prints a literal '-' to separate month from day
%02u  : day of the month as found in $7
' '   : a space
% 5s  : the time or year from $8, right-aligned in a field of 5
' '   : a space
%-40s : file name from $9, left-aligned in a field 40 positions wide
' '   : a space
%10u  : file size in $5, right-justified in a field of 10 positions
\n    : a newline or linefeed
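To see the combined format at work, here is a sketch that applies it to one invented 'ls -l' line. Only two months are loaded into the table to keep it short, and %5s is written without the blank flag, which should give the same right-aligned result.

```sh
#!/bin/sh
# Hypothetical listing line: size in $5, month in $6, day in $7,
# time in $8 and file name in $9.
line='-rw-r--r--  1 root  wheel  123456 Nov 10 14:05 foo-1.0.tgz'

formatted=$(printf '%s\n' "$line" |
    awk 'BEGIN { month["Nov"]="11" ; month["Jan"]="01" }
         { printf "%s-%02u %5s %-40s %10u\n", month[$6], $7, $8, $9, $5 }')
echo "$formatted"
```

The output starts with "11-10 14:05 foo-1.0.tgz", followed by blanks that fill the 40-position name field, and ends with the size right-justified in its 10 positions.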

We follow the recommended practice of using the -w option, in order to have perl issue warnings about questionable constructs.

The 'use strict' pragma forces the programmer to declare all variables before using them, and makes perl reject barewords that are not known subroutine names.

Lines 5-6 declare and initialize the associative array or hash '%month'. To print the number of the month of July you would use: print $month{Jul}. This is not much different from print month["Jul"] in 'awk'.

Unlike 'awk', 'perl' does not automatically split lines into variables. One advantage is that we can give meaningful names to the fields. We declare them as local variables to the 'reformat' subroutine in line 14.

An alternative would be to store the fields in an array, e.g. @fields, so we could retrieve the month field with $fields[5]. Perl arrays are indexed from zero, so the sixth field has index 5.

Code:

my @fields = split;

The variable '$len' is used in the 'printf' statement to allow easy change of the width of the file name field.

Lines 17-23 contain a loop statement which reads from standard input. Each line is assigned to the variable $_. This variable, unless overridden, is used as the default in many 'perl' constructs.

For example, the skipping of empty or whitespace-only lines in line 18 is actually a shortcut for next if ( $_ =~ /^\s*$/ ).

In program line 19, lines starting with the text 'total' are also skipped. Here again $_, holding the current line read from standard input, is the implicit variable. The 'i' modifier makes the pattern match a case-insensitive one.

The variables to receive the fields are emptied in line 20.

The splitting of the line into fields is done with the function 'split'. By default, just like in 'awk', this splitting uses whitespace as separator. It is also another instance where $_ is assumed to hold the text to be split.

The 'perl' printf has a syntax similar to the one in 'libc' from the C programming language. A notable difference is that 'perl' interpolates variables in a double-quoted format string, so a variable can be used inside a conversion specifier, as is done with the variable $len in line 22.