[[ When I sent out this week's quiz, I forgot to mention that it had
been contributed by Geoffrey Rommel, who also contributed the
discussion below. Thank you, Pr. Rommel! - MJD ]]
This quiz is phrased for Unix systems. If it makes sense to
write a solution for Windows or other systems, feel free to do
so.
The usual way to look for a character string in files in Unix
is to use grep. For instance, let's say you want to search for
the word 'summary' without regard to case in all files in a
certain directory. You might say:
    grep -i summary *
But if there is a very large number of files in your
directory, you will get something like this:
    ksh: /usr/bin/grep: arg list too long
Now, you could just issue multiple commands, like this:
    grep -i summary [A-B]*
    grep -i summary [C-E]*
    etc.
... but that's so tedious. Write a Perl program that allows
you to search all files in such a directory with one command.
You're probably wondering:
- Should I use grep? egrep? fgrep? Perl's regex matching?
- Should there be an option to make the search case-sensitive or not?
- Should we traverse all files under all subdirectories?
You can decide for yourself on these questions. There is one
other requirement, though: the program must not fail when it
finds things for which grepping does not make sense
(e.g. directories or named pipes).
----------------------------------------------------------------
This quiz was suggested to me by a directory on one of my servers where all
of our executable scripts are stored. This directory now has over 4200
scripts and has gotten too big to search with a single grep command.
The solution shown here works for my purposes, but I do not wish to
depreciate the ingenious solutions found on the discussion list. I will try
to evaluate and discuss them in a separate message.
As MJD mentioned, Perl regex matching is clearly superior to the
alternatives. Since the original purpose was to search a directory of
scripts, the search is not case-sensitive; that option could be added
easily enough. We search only files (-f) in the specified directory, not in
lower directories. I also test for "text" files (-T) because my Telnet
client gets hopelessly confused if you start displaying non-ASCII
characters.
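The two file tests can be tried on their own before reading the full
program. This is only a minimal sketch: the helper name is_greppable is
hypothetical and is not part of the program that follows. It assumes the
behaviour Perl documents for the file tests, namely that -f is true only
for plain files and that -T is a heuristic guess that a file holds text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): true only for
# plain files that Perl's -T heuristic classifies as text.
sub is_greppable {
    my ($path) = @_;
    # -f performs the stat; the special "_" handle reuses that result
    # for the -T test instead of stat-ing the file a second time.
    return (-f $path && -T _) ? 1 : 0;
}

# Report on each path named on the command line.
foreach my $path (@ARGV) {
    printf "%-30s %s\n", $path, is_greppable($path) ? "search" : "skip";
}
```

Directories and named pipes fail the -f test, so they are skipped
without ever being opened. Note that -T also reports an empty file as
text, which is harmless here because there is nothing in it to match.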
    #!/usr/bin/perl
    # The bin directory is too large to search all at once, so this does
    # it in pieces.

    ($PAT, $DIR) = @ARGV[0,1];
    $DIR ||= "";
    die "Syntax: q16 pattern directory\n" unless $PAT;

    open(LS, "ls -1 $DIR |") or die "Could not ls: $!";
    @list = ();
    while (<LS>) {
        chomp;
        push @list, (($DIR eq "") ? $_ : "$DIR/$_");
        # Hand the names over in batches of 800.
        if (@list >= 800) {
            greptext($PAT, @list);
            @list = ();
        }
    }
    greptext($PAT, @list);    # search whatever is left over
    close LS;
    exit;

    sub greptext {
        my ($pattern, @files) = @_;
        foreach $fname (@files) {
            # Plain text files only; skips directories, pipes, binaries.
            next unless -f $fname && -T _;
            open(FI, '<', $fname) or next;    # skip unreadable files
            while (<FI>) {
                chomp;
                print "$fname [$.]: $_\n" if m/$pattern/oi;
            }
            close FI;
        }
    }
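
The case-sensitivity option mentioned above could be added along the
following lines. This is a sketch under stated assumptions, not the
author's program: the -c switch, the use of Getopt::Std, and the helper
names compile_pattern and grep_file are all hypothetical, and the
directory is read with opendir rather than by piping from ls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: compile the pattern once, case-insensitive
# unless $case_sensitive is true.
sub compile_pattern {
    my ($pat, $case_sensitive) = @_;
    return $case_sensitive ? qr/$pat/ : qr/$pat/i;
}

# Hypothetical helper: return the matching lines of one file, formatted
# like the output of the original program.
sub grep_file {
    my ($re, $fname) = @_;
    my @hits;
    open(my $fh, '<', $fname)
        or do { warn "Cannot open $fname: $!\n"; return };
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, "$fname [$.]: $line" if $line =~ $re;
    }
    close $fh;
    return @hits;
}

my %opt;
getopts('c', \%opt);            # assumed switch: -c = case-sensitive
my ($pat, $dir) = @ARGV;

if (defined $pat && length $pat) {
    $dir = '.' unless defined $dir && length $dir;
    my $re = compile_pattern($pat, $opt{c});

    # readdir never builds a shell command line, so the number of
    # files in the directory does not matter.
    opendir(my $dh, $dir) or die "Could not open $dir: $!\n";
    my @names = readdir $dh;
    closedir $dh;

    foreach my $name (sort @names) {
        my $path = "$dir/$name";
        next unless -f $path && -T _;   # plain text files only
        print "$_\n" for grep_file($re, $path);
    }
}
else {
    warn "Syntax: q16 [-c] pattern [directory]\n";
}
```

With this sketch, "q16 summary /some/dir" would match Summary and
SUMMARY as before, while "q16 -c summary /some/dir" would match only
the lower-case form.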
----------------------------------------------------------------
[[ Administrative note: So far very few people have contributed
quizzes. Right now we have one expert and one regular quiz ready
to go. We need more, because unless more are contributed, we will
run out in two weeks.
This mailing list has 1257 people subscribed to it. If each person
contributed just one quiz, we would be all set for the next 24
years.
Please send quizzes, or even just quiz ideas, to perl-qotw-submit.
Thanks, - MJD ]]