to be used when searching Perl code that
determines the subroutine and package contexts of the match by
defining the environment variables C and C to
be respectively C</^sub\s+\w/> and C</^package\b/>.
=item B
Explicitly set the secondary context PERLEXPR.
=back
=item B I
=over 4
=item B
Color the output.
The coloring scheme can be configured by the B
environment variable. Its expected format is a comma separated
list of color specifiers whose format is:
TYPE = d? COLOR (o d? COLOR)?
The TYPE and COLOR are single letters whose possible values are
listed below. If the color is prefixed with a B then this
indicates to use the bold version of that color. The optional
B part indicates a background color (mnemonic: on).
    TYPEs:                               COLORs:
    f  filename                          r  red
    c  colon                             g  green
    l  line number                       b  blue
    b  byte offset                       c  cyan
    n  non matching part of the line     m  magenta
    m  matching part of the line         y  yellow
    z  -z context line                   k  black
    y  secondary -z context line         w  white
For example, if B is C, then the
filename will be colored in green, and the matching text
as bold yellow on blue.
This option is disabled if the output is not directed to a terminal.
On the Windows platform, this option requires that the B
module C is installed.
=item B
Same as B, but do not disable the coloring if the output is
not directed towards a terminal.
=back
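The color specifier grammar described above can be sketched in plain Perl.
This is only an illustration and not B's own parser: the name
C<parse_color_spec> is hypothetical, the C<=> between TYPE and COLOR is
assumed to be optional, and the mapping to ANSI escape sequences is an
assumption about how the colors are finally emitted.

```perl
use strict;
use warnings;

# ANSI foreground codes for the COLOR letters (background = foreground + 10).
my %ansi = (k => 30, r => 31, g => 32, y => 33,
            b => 34, m => 35, c => 36, w => 37);

# Parse a comma separated specifier list into { TYPE => escape sequence }.
# Hypothetical helper: the optional "=" and the ANSI output are assumptions.
sub parse_color_spec {
    my ($spec) = @_;
    my %seq;
    for my $item (split /\s*,\s*/, $spec) {
        # TYPE, optional "d" (bold), COLOR, then optional "o" + background.
        my ($type, $bold, $fg, $bg_bold, $bg) =
            $item =~ /^([fclbnmzy])\s*=?\s*(d?)([rgbcmykw])(?:o(d?)([rgbcmykw]))?$/
            or die "bad color specifier: $item\n";
        my @codes;
        push @codes, 1 if $bold;                  # bold foreground
        push @codes, $ansi{$fg};                  # foreground color
        push @codes, $ansi{$bg} + 10 if defined $bg;   # background color
        # A "d" on the background ($bg_bold) is accepted but ignored here.
        $seq{$type} = "\e[" . join(';', @codes) . 'm';
    }
    return \%seq;
}
```

Calling C<parse_color_spec('fg,mdyob')> yields a green sequence for the
filename and a bold yellow on blue sequence for the matching text.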
=item B
Show filenames using the Windows convention of separating
the parts using a backslash.
=item B
Specify an encoding for the output eg. C.
To output the ENCODING's appropriate BOM, add a C eg. C.
=back
=head2 B specials
=over 4
=item B
Prints out useful debugging information, including the
internally used Perl search routine.
Additional details are displayed if B is specified.
=item B
Ensure each printed line ends in a (native) newline.
=item B I
Each of these options runs some Perl code, but at a different
stage of B's processing. The Perl code is specified by
its argument. If the argument is a single word, then it is
assumed to be an I where the actual B is the
value of the environment variable named BI.
If the argument is not a single word, then it is assumed to be
the B. Note that the Perl code specified can be any
Perl statement(s) ie. it's not limited to just an expression.
=over 4
=item B, B
The Perl code is run for each line of input before it is tested
by PERLEXPR for a match. For example, to count all occurrences
of a word: C.
=item B, B
The Perl code is run before each file is opened.
=item B, B
The Perl code is run after each file has been processed.
=item B, B
The Perl code is run after all the files have been processed.
=item B, B
The Perl code is run after the file has been opened,
but before it is read.
=back
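The alias rule described above (a single word names an environment
variable, anything else is taken as the Perl code itself) can be sketched
as follows. This is an illustration only: the helper names and the
C<$prefix> parameter are hypothetical, and the real environment variable
prefix is whatever B documents for each option.

```perl
use strict;
use warnings;

# Return the Perl code for an option argument.
# $prefix stands in for the documented environment variable prefix
# (hypothetical parameter, used here for illustration only).
sub resolve_perl_code {
    my ($arg, $prefix) = @_;
    if ($arg =~ /\A\w+\z/) {
        # A single word is treated as an alias.
        my $name = $prefix . $arg;
        defined $ENV{$name}
            or die "alias '$arg' is not defined (no environment variable $name)\n";
        return $ENV{$name};
    }
    # Anything else is the Perl code itself.
    return $arg;
}

# Compile statement(s) into a subroutine that can be run at a later stage.
sub compile_perl_code {
    my ($code) = @_;
    my $sub = eval "no strict; sub { $code }";
    die "cannot compile '$code': $@" if $@;
    return $sub;
}
```

Compiling once and calling the resulting subroutine at each stage avoids
re-parsing the code for every line or file.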
=item B I
=over 4
=item B
Write the output to a file named CIC.
=item B
Same as B, but also include a header at the top of the
output file giving details on the search performed.
=back
=item B
This allows for a user defined Perl subroutine to process each
FILE and give B an alternative input stream to search.
Usage examples include searching non text based file formats
such as B, or searching the individual files inside
a C archive.
The Perl subroutine can be written in any of the B
customization files. The subroutine used is determined according
to the file extension of the input file. The hash variable
C is used to map a file extension to a reference of the
subroutine that will handle files with that extension. For archive
files such as B files, prefix the file extension with a C so
that B knows to recurse into them to look for further matches.
    # peg_ini.pl
    # Define a mapping between the file extension and subroutine:
    %Peg_S = (
        'pdf'  => \&process_pdf,
        '*zip' => \&process_zip,
    );
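To make the mapping concrete, here is a rough sketch of how a handler
could be looked up from a filename. It is not B's actual lookup code:
C<find_handler> is a hypothetical name, the handlers are stand-ins, and
lower-casing the extension before the lookup is an assumption.

```perl
use strict;
use warnings;

# Stand-in handlers, keyed as in the mapping above.
our %Peg_S = (
    'pdf'  => sub { 'pdf handler' },
    '*zip' => sub { 'zip handler' },
);

# Return (handler, is_archive) for a filename, or an empty list.
# Hypothetical helper; matching case-insensitively is an assumption.
sub find_handler {
    my ($filename) = @_;
    return unless $filename =~ /\.([^.\/\\]+)\z/;
    my $ext = lc $1;
    for my $key ($ext, "*$ext") {
        next unless exists $Peg_S{$key};
        # A leading "*" marks an archive that should be recursed into.
        return ($Peg_S{$key}, $key =~ /^\*/ ? 1 : 0);
    }
    return;
}
```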
When B processes a file whose file extension is in C,
then it calls the appropriate subroutine with 2 arguments: the
relative filename (needed to C the file) and its full pathname
(needed for error messages). The subroutine should perform its
processing, and then call the subroutine C with 2 or 3 arguments:
a new Perl filehandle for B to process, a filename to use for
this input, and optionally a boolean flag that indicates if this
stream is within an archive. If the subroutine returns false, then
B will continue to process the file as usual, otherwise it will
continue with the next file.
The following code shows how this mechanism can be used to make
B process the individual files within a B archive, and
text within a B. Note that it relies on the availability of
the F and F programs.
    # peg_ini.pl
    sub process_zip {
        my ($file, $fullpath) = @_;
        # Determine the files within the .zip file
        my @filelist = `unzip -Z1 \"$file\" 2>&1`;
        if ($?) {
            print STDERR "unzip failed with $fullpath: $?\n";
            return 0; # signal to process the file as usual
        }
        # Extract each file in turn
        foreach my $f (sort @filelist) {
            $f =~ s/\015?\012\z//;
            # Avoid extracting files which will be skipped due to "-pp"
            next unless pp($f);
            my $cmd = qq(unzip -p "$file" "$f");
            open(my $fh, "$cmd|")
                or die "can't extract $f from $fullpath: $!\n";
            # Make peg process this.
            S($fh, "$fullpath -> $f", 1);
        }
        return 1; # signal to continue with next file
    }

    sub process_pdf {
        my ($file, $fullpath) = @_;
        my $tempfile = "_tempfile.$$";
        system "pdftotext \"$file\" $tempfile";
        if ($?) {
            print STDERR "pdftotext failed: $?\n";
            unlink $tempfile;
            return 0;
        }
unless (open($fh, "
Buffer output. This sets C to zero.
The default behaviour is to flush each line of output. This ensures
output is shown immediately on the terminal, and that output piped
to another command is processed immediately. However, this is slow
when redirecting massive amounts of output to a file. For example,
C<< peg -r foo > ../log.txt >> may run significantly faster by
using B.
=item B
This restores C to its original contents prior to printing.
Only useful if C is changed within PERLEXPR. For example,
C.
=item B I
=over 4
=item B
Print the value of C at EOF. If C is a reference to an array
then the elements in that array are printed; else if C is a
reference to a hash then the keys of the hash are printed in
C<sort>ed order, with the values displayed if defined.
=item B
Same as B, but display the value of C once, after all
files have been processed.
=item B and B
Same as B and B respectively, but uses C to
display the value of C.
=back
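The display rules above (array elements, or sorted hash keys with their
values when defined) can be sketched like this. It is an illustration
only: C<format_result> is a hypothetical name and the C<< => >> separator
between a key and its value is an assumption about the output layout.

```perl
use strict;
use warnings;

# Turn an accumulated value into a list of output lines.
# Hypothetical helper; the "key => value" layout is an assumption.
sub format_result {
    my ($value) = @_;
    return () unless defined $value;
    if (ref($value) eq 'ARRAY') {
        # One line per array element.
        return map { "$_\n" } @$value;
    }
    if (ref($value) eq 'HASH') {
        # Sorted keys, with the value shown only when it is defined.
        return map {
            defined $value->{$_} ? "$_ => $value->{$_}\n" : "$_\n"
        } sort keys %$value;
    }
    # Plain scalars are printed as they are.
    return "$value\n";
}
```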
=item B
When using B, show a continually updated progress message
showing which file is currently being processed.
=item B
Specify an alternative input record separator string
(Perl's C variable). See L.
For example, to process files with MAC style line endings of a
single linefeed character on a Windows machine, you would need
C.
=item B
Print the names of the files matched by the last successful run.
This option does not use, nor require, a PERLEXPR. Additionally,
the format can be modified by any of the following options:
=over 4
=item B
Show the full path names.
=item B
Show the files using backslashes instead of forward slashes.
=item B
Show each file's last modification time and size.
=item B
Show each file's I.
=item B, B, B & B
Sort the files respectively oldest first, oldest last,
smallest first or smallest last.
=back
=back
=head2 Miscellaneous
=over 4
=item B
Immediately exit with status 0 when a match is found.
=item B I
=over 4
=item B
Suppress all B's error/informational messages to STDERR.
=item B
Suppress a subset of B's error/informational messages.
In particular, those for unreadable directories emitted
when either B or B is used, and those for the assumed
encoding of files when using B.
=back
=item B I
When called as C, display version information for B
and the interpreting B.
Otherwise it makes B print out progress messages to STDERR.
Also sets the Perl variable C to true which is
available for use by subroutines in the customization files.
=item B I
This disables any previously specified options.
Useful for disabling options set in B,
or when modifying a previous command.
If it is used as C, then only the options listed after the
comma are disabled. For example, C will disable B.
Additionally, if it is double specified as the first argument,
ie. C, then the ini files will not be loaded.
=item B
Display the approximate time taken in seconds to complete the search.
=item B
Ignore the results files created by B.
=item B
Explicitly end options.
Allows filenames beginning with a C to not be interpreted
as options. Also used by the B, B and B options to
determine which arguments are PERLEXPRs and which are files.
=item B
Show this I<help>ful documentation! If an option is specified,
then just the documentation for that particular option is shown.
=back
=head1 AUTOVARS
Peg automatically provides the following variables that can be
used within the PERLEXPR:
=over 4
=item C
When B is used, this contains the current line of context.
=item C and C
For each file processed, C is its I as output by B,
while C is the path used to C it.
=item C
This is the result of a split applied to the input line.
That is, it contains a list of the whitespace delimited strings
in the current line.
=item C
This contains the lines of the current file up to the current line
so that C is the first line of input and C is the
previous line of input. This array can be used to test for matches
over consecutive lines.
=item C and C
C contains as keys all the alphanumeric words (matches C)
encountered so far in the file. The values are the number of times
that word has been seen.
C contains in order all the alphanumeric words (matches C)
in the current line.
=back
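As a rough picture of what these variables hold, the following
self-contained sketch computes equivalent values for each line of an
in-memory list. It is not how B builds them internally: C<scan_lines>
and its callback arguments are hypothetical, C<\w+> is assumed to be the
word pattern, and counting the current line's words before the callback
runs is also an assumption.

```perl
use strict;
use warnings;

# Call $callback for each line with values that mimic the autovars:
# the whitespace split fields, the previous lines, the word counts so
# far, and the words of the current line.
sub scan_lines {
    my ($lines, $callback) = @_;
    my (@previous, %count);
    for my $line (@$lines) {
        my @fields = split ' ', $line;      # whitespace delimited strings
        my @words  = $line =~ /\w+/g;       # alphanumeric words, in order
        $count{$_}++ for @words;
        $callback->($line, \@fields, \@previous, \%count, \@words);
        push @previous, $line;              # now visible as the previous line
    }
    return;
}

# Example: sum the last column, and spot A, B, C on consecutive lines.
my ($total, $sequence_at) = (0, undef);
my @input = ('A 1', 'B 2', 'C 3');
scan_lines(\@input, sub {
    my ($line, $fields, $previous) = @_;
    $total += $fields->[-1];
    $sequence_at = $line
        if @$previous >= 2
        and $previous->[-2] =~ /A/ and $previous->[-1] =~ /B/ and $line =~ /C/;
});
```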
=head1 AVAILABLE SUBROUTINES
B provides some subroutines that can be used in the PERLEXPR:
=over 4
=item * C
This can be used to check for a match in the preceding lines.
It returns 1 on a match and 0 otherwise.
Its first argument is either a regular expression I or a
subroutine reference which is called with each preceding line set
to C. An optional second argument specifying how many lines back
to check (the default is 10). Additionally, if the second argument is
prefixed with a C then this indicates not to check the current
line; and if the second argument is a C then this will check all
the previous lines up to that point.
For example, C
will return lines containing an C where there is also a C or
C in the preceding 5 lines. This could also be written as
C.
=item * C
This can be used to check for where a line matches one of a list of
regular expression patterns and also one of the other patterns
matches in the preceding few lines. If the first argument is a
reference to a number, then this will be used as the number of
preceding lines B to check against (the default is 10).
It returns 1 if it finds a match and 0 otherwise.
=item * C
This highlights each string found to match the given PATTERN.
It returns the number of matches found. For example,
C.
The optional color specifier argument takes the same format as used
by B. For example, to highlight all occurrences of I
and I in a file with bold magenta and blue on yellow:
C.
=item * C
This returns the list of files matched by the last successful run.
If it is called with a true argument (eg. C) then
the full pathnames are returned; otherwise the files are relative to
the current directory. For an example of its use, see the B
option defined in the example F below.
=back
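The preceding-lines check described first in this list can be
approximated in a few lines of Perl. The sketch below is an assumption
about the general technique, not B's implementation: C<seen_recently>
is a hypothetical name, the history is passed in explicitly, and the
special prefixes for the second argument are not modelled.

```perl
use strict;
use warnings;

# Return 1 if the current line or any of the last $n previous lines
# satisfies $test (a compiled regex or a code reference), else 0.
sub seen_recently {
    my ($test, $previous, $current, $n) = @_;
    $n = 10 unless defined $n;              # default: look back 10 lines
    # Most recent first: the current line, then the previous lines.
    my @candidates = ($current, reverse @$previous);
    $#candidates = $n if $#candidates > $n; # keep current line + $n lines
    for my $line (@candidates) {
        local $_ = $line;                   # code references test $_
        return 1 if ref($test) eq 'CODE' ? $test->() : /$test/;
    }
    return 0;
}
```

Keeping the history as an array of earlier lines mirrors the array of
previous lines described under AUTOVARS.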
=head1 CUSTOMIZATION
At startup, B will load the following Perl files if they exist:
=over 4
=item * A site wide customization file.
F in the same directory that the B script resides.
=item * A user specific customization file.
F in the user's B directory.
=item * A current directory customization file.
F in the current working directory.
=back
These files should contain valid Perl code. They are primarily
intended for the setting of B's configuration environment variables,
but can also process C and so be used to extend B's command
line functionality.
=head2 Defining named command line options
Named command line options can be defined by adding I/I
pairs to C. If B is given on the command line,
the relevant code is called with two arguments: a reference to the array
containing the remaining command line arguments, and a reference to the
array containing the file list. The general syntax is:
    $::Peg_longopt{'option-name'} = sub {
        my ($argv_ref, $filelist_ref) = @_;
        # Define functionality for "--option-name".
    };
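To show how such a handler receives its two arguments, here is a minimal
dispatcher sketch. It only illustrates the calling convention described
above and is not B's option parser: C<dispatch_longopts> and the
C<--add-file> option are hypothetical.

```perl
use strict;
use warnings;

our %Peg_longopt;

# Hypothetical option: "--add-file NAME" moves NAME onto the file list.
$Peg_longopt{'add-file'} = sub {
    my ($argv_ref, $filelist_ref) = @_;
    push @$filelist_ref, shift @$argv_ref;
};

# Call the handler for each "--name" argument that has one defined.
# Handlers may consume further arguments or extend the file list.
sub dispatch_longopts {
    my ($argv_ref, $filelist_ref) = @_;
    my @unhandled;
    while (@$argv_ref) {
        my $arg = shift @$argv_ref;
        my ($name) = $arg =~ /\A--([\w-]+)\z/;
        my $handler = defined $name ? $Peg_longopt{$name} : undef;
        if ($handler) {
            $handler->($argv_ref, $filelist_ref);
        }
        else {
            push @unhandled, $arg;   # leave everything else untouched
        }
    }
    @$argv_ref = @unhandled;
    return;
}
```

Because the handler is given references, any change it makes to the
remaining arguments or to the file list is seen by the caller.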
=head2 Example B code
    # Make "peg # ..." a shortcut for "peg -l +1 ..."
    if (@ARGV and $ARGV[0] eq '#') {
        splice @ARGV, 0, 1, '-l', '+1';
    }

    # Establish some useful default options:
    $ENV{PEG_OPTIONS} ||= q{ -ssJJ+#_ -p "$File !~ m#(^|/)(\.svn|CVS)/#" };

    # Configure some -p & -z aliases:
    $ENV{PEG_P_P} ||= '/\.(pm|pl)$/i';
    $ENV{PEG_Z_P} ||= '/^(\s*sub\s+\w|=head|__(END|DATA)__)/';

    # Define a "--pager" option that pipes output through "less".
    $::Peg_longopt{pager} = sub {
        my $argv_ref = shift;
        unshift @$argv_ref, '-Y,J#', '-JJJ##';
        $! = $? = 0;
        open(PAGER_OUT, '|-', "less -mR") && !$! && !$?
            or die "unable to pipe STDOUT via less\n";
        *STDOUT = \*PAGER_OUT;
    };

    # Define a "--vim NUM" option that opens the NUM-th match in vim.
    $::Peg_longopt{vim} = sub {
        my $argv_ref = shift;
        my $n = shift @$argv_ref or die "Usage: --vim NUM";
        my @matches = last_matches();
        my $file = $matches[$n-1];
        system "vim \"$file\"";
        exit;
    };

    1;
=head1 EXIT STATUS
The following exit values are returned:
    0    one or more matches were found
    1    no matches were found
    2    peg did not complete normally
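A calling script can tell these cases apart by decoding Perl's C<$?>
after running B with C<system>. The helper below is a sketch under that
assumption; C<describe_peg_status> is a hypothetical name and the wording
of the returned strings is illustrative only.

```perl
use strict;
use warnings;

# Translate a wait status (the value of $? after system) into a
# description based on the exit values listed above.
sub describe_peg_status {
    my ($wait_status) = @_;
    return 'could not be run' if $wait_status == -1;
    return 'killed by signal ' . ($wait_status & 127) if $wait_status & 127;
    my $exit = $wait_status >> 8;           # the exit value proper
    return 'one or more matches were found' if $exit == 0;
    return 'no matches were found'          if $exit == 1;
    return 'peg did not complete normally'  if $exit == 2;
    return "unexpected exit value $exit";
}
```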
=head1 EXAMPLES
=over 4
=item 1. Search recursively for all VHDL constant definitions:
    % peg "/^\s*constant\s.*:=/i"
=item 2. Find the instance names of CTS buffers in a verilog netlist:
    % peg -N "/^\s*CTS\w*\s+(\w+)\s*\(/ and $_ = $1" foo.v
=item 3. Extract the entity declaration section from a VHDL file:
    % peg "s/\s*--.*$//, /\bentity\b/i .. /\bend\b/i" bar.vhd
=item 4. Search for the sequence A,B,C split over 3 consecutive lines:
    % peg -B2n "$P[-2]=~/A/ and $P[-1]=~/B/ and /C/" bam
=item 5. Sum up the entries in the last column of a file:
    % peg -Z "$Z += $F[-1]" report.txt
=item 6. Search for "main" in C files below the current directory:
    % find . -name "*.c" | peg -Xw main
    % peg -wp .c main
=back
=head1 ENVIRONMENT
Options can be set via the environment variable B.
The colors used when running with B can be configured with B.
Aliases for B's file extension matching regular expressions are
set via BI.
Aliases for B's context matching regular expressions are
set via BI.
The environment variable B can be used to provide
a list of possible encodings to test for when using B.
The filename "-" indicates to read from standard input only if no other
file is listed; otherwise it is treated as an ordinary filename.
=head1 PLATFORM ISSUES
Filenames constructed while traversing the directory structure during
B or B searches are output in the UNIX B</> separated style.
On Windows, command line filenames containing either a C or a C</>
undergo glob expansion. Note that B replaces consecutive Cs with
a C</> separated list of Cs. For example, the filename C is
treated as C.
On Windows, you should install the B module C.
This will enable B to work, and will also ensure that the correct
output code page is used.
=head1 COREQUISITES
Win32::Console::ANSI
=head1 SCRIPT CATEGORIES
Search
=head1 README
Yet another imitation of the UNIX B program,
but with the power of Perl expressions.
=head1 HISTORY
=over 4
=item v0.1 summer 1996.
Born as "pgrep".
=item v1.00 September 1999.
Released to CPAN.
=item v2.00
Use File::Find to traverse directories.
Better support for running on Windows.
Now in color!
Lots of new options.
=back
=head1 SEE ALSO
L, L, L