Shell Functions and Path Variables, Part 3

Suppose you log on to your UNIX system and discover, for
reasons beyond your control, that PATH is full of duplicate
entries. (Humour me. It does happen. Maybe your system
administrator modified /etc/PATH inadvisedly.) Let's assume these
duplicates are making your PATH undesirably long. Is there anything
you can do to clean things up? Yes, you can type at the
prompt:

$ uniqpath

This will remove any duplicate entries from your path,
leaving the order of the remaining pathels intact. For example:
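
A minimal sketch of such a session follows. It is not the listing from this series: listpath and makepath are simplified stand-ins written for this example, the pathvar name is taken as a plain first argument in place of the real option handling, and NEWP is just a scratch variable.

```shell
# Hypothetical stand-in: print each pathel of the named variable on its own line.
listpath() {
    eval "printf '%s\n' \"\$$1\"" | tr ':' '\n'
}

# Hypothetical stand-in: join the lines on standard input with colons.
makepath() {
    paste -s -d: -
}

uniqpath() {
    pathvar=$1    # stand-in for the option handling
    npath=$(listpath "$pathvar" | awk '{ seen[$0]++; if (seen[$0] == 1) { print } }')
    eval "$pathvar=\$(printf '%s\n' \"\$npath\" | makepath)"
}

NEWP=fred:bill:steve:fred:dave:bill
uniqpath NEWP
echo "$NEWP"    # fred:bill:steve:dave
```

The first occurrence of each pathel survives, so the relative order of fred, bill, steve and dave is unchanged.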

As usual, $pathvar contains the name of the
pathvar we want to modify. The code is rather similar to that of
delpath. The first line generates a variable
(npath) containing the unique path elements, and
the second line rebuilds the pathvar from those elements using
makepath. We don't use an external file to store the pathels, but
keep everything in shell variables. This is done in order to
demonstrate an alternative technique—there is no deeper reason.

The first line runs listpath to break the pathvar into
separate lines and pipes them through an
awk filter which removes duplicate
pathels. You may be wondering why we don't just use the
uniq program instead of awk's
magic. It's because uniq will remove duplicate lines from its input
only if they happen to be adjacent. In our case, the duplicate
pathels will generally not be adjacent, so
uniq won't work. “Aha,” you say,
“why not use sort -u? That will sort the lines
and remove duplicates.” True enough, but it would also change
the directory search order if we ran uniqpath on
PATH. Usually, people care about the order in
which their PATH directories are searched, and it's a bad idea to
modify it.
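
The difference is easy to check at the prompt. The sketch below feeds the same six lines (the pathels of the NEWP example, used here purely as test data) through uniq and through sort -u; the variable names are made up for the illustration.

```shell
pathels='fred
bill
steve
fred
dave
bill'

# uniq only collapses adjacent duplicates, so all six lines survive.
uniq_out=$(printf '%s\n' "$pathels" | uniq)

# sort -u removes the duplicates but reorders the lines.
sort_out=$(printf '%s\n' "$pathels" | sort -u)

printf '%s\n' "$uniq_out"
echo "--"
printf '%s\n' "$sort_out"
```

The first listing is identical to the input, and the second comes out as bill, dave, fred, steve, which is no longer the original search order.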

Thus, we have the awk solution. This uses a powerful feature
of awk known as an associative array (or a hash, if you have a Perl
background). If you're a C programmer, you'll know what an array
is: a group of objects of the same type, indexed by an integer. The
contents of an array can be accessed by expressions like values[0]
or values[20], which refer to the first and twenty-first elements,
respectively. A hash is rather like an array which can be indexed
by an arbitrary string of characters. So, in awk notation, we could
write

age["bill"]=27

to assign 27 to the hash element indexed by the string
bill in the hash called age.
Let's look at the awk code shown above.
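
As a brief aside before returning to uniqpath, the hash notation can be tried on its own. This standalone snippet is not part of the uniqpath filter; the second entry and the variable name hash_demo are invented for the illustration.

```shell
hash_demo=$(awk 'BEGIN {
    age["bill"] = 27      # the assignment shown in the text
    age["steve"] = 34     # hypothetical second entry
    print age["bill"]
    if ("dave" in age) {
        print "dave known"
    } else {
        print "dave unknown"
    }
}')
printf '%s\n' "$hash_demo"
```

This prints 27 followed by "dave unknown": the index is an ordinary string, and the in operator tests for an index without creating it.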

Between the single quotes of the awk filter, we have a block of code run each
time awk reads a new line from its standard input. When awk reads a
line, it is stored in a special variable called
$0, and we use $0 as an index
into a hash called seen. (We haven't declared
this anywhere; that's okay in awk. Variables spring into existence,
with numerical value 0, when they first appear in the code.) We use the
seen hash to tell us whether awk has already
seen an identical line of input since it started executing. Let's
see what happens in the NEWP example shown
above.

First, listpath splits NEWP into lines
containing the following strings: “fred”, “bill”, “steve”,
“fred”, “dave” and “bill”, which are read in that order by
awk. awk stores each line it reads
in $0, so $0 takes on the
values “fred”, “bill” and so on, in turn. Each time a line is
read, the corresponding element of the seen hash
is incremented (by the statement seen[$0]++), and the line is
printed only if it has now been seen exactly once (by the print
statement in the if block, which prints
$0 to standard output by default). Take
the hash element seen["fred"]: it is initially
0, is set to 1 when awk reads the first “fred” line,
remains at 1 for the next two lines, and is set to 2 when awk reads
the second “fred” line. The “fred” line is therefore printed only the
first time it is seen. C programmers should note how syntactically elegant
this solution is and how little code is required when compared to
the equivalent in C.
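
The walk-through above can be reproduced with a small tracing variant of the filter. This is a sketch for illustration only: instead of printing the surviving lines, it reports each input line, its running count in seen, and whether the real filter would print or skip it.

```shell
trace=$(printf '%s\n' fred bill steve fred dave bill |
awk '{
    seen[$0]++
    verdict = (seen[$0] == 1) ? "print" : "skip"
    printf "%s %d %s\n", $0, seen[$0], verdict
}')
printf '%s\n' "$trace"
# fred 1 print
# bill 1 print
# steve 1 print
# fred 2 skip
# dave 1 print
# bill 2 skip
```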

edpath

The final pathvar function we're going to see is
edpath. This breaks the pathels in
a pathvar into separate lines, writes them to a temporary file and
runs an editor on that file. You can edit the pathels to your
heart's content and quit from the editor when you're finished. The
pathvar is then reconstructed from the modified lines in the file.
edpath allows you to perform
arbitrary modifications on a pathvar. I use it most often when I
wish to swap the order of directories in
PATH.

The code for edpath is fairly straightforward (ignoring once
again the boring details of option handling):

Let's skip the first three lines for now. The real work is
done by the block of code starting with listpath. This follows a
similar pattern to delpath and uniqpath. First, we separate the
pathels in the pathvar using listpath, but this time, we redirect
the output into a temporary file. The next line edits that file.
The expression ${EDITOR:-vi} may be unfamiliar;
it means “Use the value of the EDITOR variable
if it is non-null, else use vi.” This allows the user to specify
his favourite editor by setting the EDITOR
environment variable (to Emacs, perhaps) but uses vi if he has not
done so. Note that the edit
command is run in the foreground, so the shell will wait until the
editor process terminates before running any more commands from the
shell function. When this occurs, the modified pathvar will be
reconstructed by the line starting with eval. If
you read the description of delpath given above, you'll know how
this line works.
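
The steps just described can be sketched as follows. This is an approximation under stated assumptions, not the listing from this series: listpath and makepath are simplified stand-ins, the pathvar name is a plain first argument, the temporary file comes from mktemp, and the safety-net lines are left out. The drop_bill function is a made-up, non-interactive "editor" so that the example runs without opening vi.

```shell
# Hypothetical stand-in: print each pathel of the named variable on its own line.
listpath() {
    eval "printf '%s\n' \"\$$1\"" | tr ':' '\n'
}

# Hypothetical stand-in: join the lines on standard input with colons.
makepath() {
    paste -s -d: -
}

edpath() {
    pathvar=$1                      # stand-in for the option handling
    tmp=$(mktemp) || return 1       # temporary file holding one pathel per line
    listpath "$pathvar" > "$tmp"
    ${EDITOR:-vi} "$tmp"            # runs in the foreground; falls back to vi
    eval "$pathvar=\$(makepath < \"\$tmp\")"
    rm -f "$tmp"
}

# Scripted stand-in for an editor: delete the line "bill" from the file.
drop_bill() {
    grep -v '^bill$' "$1" > "$1.new" && mv "$1.new" "$1"
}

NEWP=fred:bill:steve
EDITOR=drop_bill
edpath NEWP
echo "$NEWP"    # fred:steve
```

With EDITOR unset or empty, the same call would open the temporary file in vi and rebuild NEWP from whatever lines remain when the editor exits.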

Lines 2 and 3 of the code are a safety net. They store the
initial value of the pathvar to be edited in a new environment
variable. If the user is editing PATH, for
example, then the code creates a variable called
OLDPATH. If the user makes unwanted
modifications to her PATH, she can simply
type:
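
Given that the saved copy lives in OLDPATH, the restore is presumably a plain assignment, PATH=$OLDPATH. The sketch below shows one way the two safety-net lines could be written and then undoes an unwanted change. The save_old function is a hypothetical wrapper used only to make the example self-contained, and NEWP stands in for PATH.

```shell
save_old() {
    pathvar=$1
    eval "OLD$pathvar=\$$pathvar"    # for PATH this becomes OLDPATH=$PATH
    export "OLD$pathvar"             # make the copy an environment variable
}

NEWP=fred:bill:steve
save_old NEWP

NEWP=fred                # an unwanted modification
NEWP=$OLDNEWP            # the restore; for PATH: PATH=$OLDPATH
echo "$NEWP"             # fred:bill:steve
```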
