Hi. As an exercise, I have written a little script that reads through a directory looking for a string in each file. Because I use a regex to do the find, I need to escape those characters that have a special meaning in a regular expression. I am attempting to do this in the code, but am not having much luck. The code I'm working with is --

The first thing I discovered is that it is impossible to code if ($input eq "?") {$input = "\$input";} presumably because the leading backslash is interpreted as an attempt to create a reference to $input.

And while this code works fine with search arguments like "Chapter" or "Chapter 6", when I enter an argument of "???" the print "Now scanning..." command prints out each file in the directory whose name is 3 characters long. Searching for "??" prints those files whose names are two characters long, and so on.

If, when I invoke the script, I enter \?\?\?, the script works as expected, although I would have thought I would get a final argument of \\?\\?\\?. If I change the code to read if ($_ eq "A") {$_ = '\A';} and run the script with a search argument of "AAA", it correctly searches for \A\A\A. Thus, it seems that the problem has to do with the fact that the search is for a question mark. Except that, if I code my $input = "?"; if ($input eq "?") {$input = '\?';} print "INPUT = $input\n"; the print statement prints "INPUT = \?".

Thanks in advance, s660117


Thanks for your reply, FishMonger. OK... I commented out the code that checks for special characters and coded $str = quotemeta($str); instead. In this case, the print "Now scanning..." line now prints Now scanning . for "INC\ map\ ref\ use", where INC, map, etc. are the names of the three-character members in the directory and the spaces are escaped. So, I think two problems remain: an argument of "???" still results in the member names, and I don't need to escape a blank. s660117

-d $dir || die "Directory not found. Terminating $!";
print "Now scanning $dir for \"$str\"\n";

--------------------------------------------------------------------------

This code is the start of a larger script, but is self-contained and is meant to be executed with three arguments: a switch (which is only relevant later on), a search argument, and a directory to be searched. The rest of the script reads through each file in the directory, looking for the search string. It uses a regex to find the string, hence the need to escape the special characters listed in the if clauses.

This code works if I enter a character string for the search arguments, even when the string contains a space. Thus, "perl myscript -s AAA . " causes the 'print "Now scanning..."' line to display "Now scanning . for AAA" in the current directory, and "perl myscript -s AAA BBB ." causes the line to print "Now scanning . for AAA BBB". If I enter "perl my_script -s +++ .", I get "Now scanning for \+\+\+"; if I enter "perl my_script -s ... .", I get "Now scanning for \.\.\.".

The problem occurs when I enter either a question mark or an asterisk as the search argument. Entering "perl my_script -s ??? ." causes the print line to write that it is now scanning for each of the files in the current directory whose name is 3 characters long: "Now scanning for INC, map, ref, use."; entering "perl my_script -s ?? ." causes the print line to list those files whose names are two characters in length. Entering "perl my_script -s * ." causes the "Now scanning" line to list all files in the current directory.

FishMonger, Thanks for your reply. Actually, the code I gave you functions as a complete script. Input is accepted from the command line, is parsed, and the second argument is processed. In my script, I have an exit statement immediately following the "print Now Scanning" statement. Try it and see. s660117

Thanks for the reply, Fishmonger. I have a couple of questions and concerns.

1) Did you have the same experience as me with the code as written?
2) Why does the question mark elicit the names of members in the directory instead of being taken as a simple character?
3) What does \E do? I can't find it in Programming Perl and taking it out appears to have no effect.
4) Why code $pattern =~ s/\./\./; if all this does is substitute a character for the same character? Again, taking it out has no effect.
5) I am having trouble integrating your code into my program since at no time does the later code know whether metacharacters might be in the search string, and, if so, which ones are present.

Thanks, s660117

If you want me to test or review your code, you must first post it within code tags, which retain the formatting (indentation) and make your code readable (assuming you actually formatted it correctly).

Programming Perl is a great resource book, but not the best choice for learning about regular expressions. If you want a book for that purpose, you should get "Mastering Regular Expressions, 3rd Edition". http://shop.oreilly.com/product/9780596528126.do

You can also learn about regex's by reading the documentation that comes with perl. The program that is used to access the docs is 'perldoc'. If you issue the command 'perldoc perl', you'll get an index of the documentation. There are several related to regex's, such as 'perldoc perlretut', and 'perldoc perlre'.

Your code and question(s) are telling me that this is your class homework assignment. With that in mind, I will critique portions of your code making a few suggestions, but I won't provide a complete solution.

Nowhere in this thread do you indicate that you're using the strict and warnings pragmas. Those pragmas should be in every Perl script you write, so if you're not using them, please add them to your script.

Fishmonger, I am not in a class. I am teaching myself Perl and coded my routine as an exercise. I cut and pasted your code, but could not get it to work. To begin with, it chokes on the usage subroutine. At first I thought I had to include Pod::Usage, but that doesn't help. If I comment out the "die" line and execute the code, it refuses to recognize a directory of '.', telling me instead that the variable is undefined. Finally, your updates do nothing to help me with the need/desire to backslash the metacharacters within the script, which was the reason for my original post. Thanks again, s660117

The usage() sub, which should demonstrate how to execute the script, was intended to be written by you. Typically when using the Getopt::Long module I also use the Pod::Usage module and write the POD documentation in the script. In this case I chose not to do that because I didn't want to cover how to write POD documentation.

I did make an error in that example. The -f and -s options should be booleans, not string options.

Here's an updated version of that example, which only handles the parsing of the script arguments.

Code

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;
use File::Find;
use Data::Dumper;

My prior post showed how to use \Q...\E to escape the metacharacters in a regex and I referred you to the perl documentation for additional explanations. Do you need more info in that area?
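As a small illustration of the \Q...\E point, here is a minimal sketch. The variable names and the sample line are made up for the example and are not taken from the script in this thread. It shows the two usual ways to make a user-supplied string match literally: \Q...\E inside the regex, and quotemeta() to build an escaped copy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, only for illustration
my $str  = '???';                    # search string as typed by the user
my $line = 'Is this chapter 6???';   # a line read from some file

# \Q...\E quotes metacharacters inside the regex only; $str itself is unchanged
my $found_literal = $line =~ /\Q$str\E/ ? 1 : 0;

# quotemeta() returns an escaped copy, useful when the pattern is built earlier
my $escaped = quotemeta($str);

print "literal match: $found_literal\n";   # prints "literal match: 1"
print "escaped copy : $escaped\n";         # prints "escaped copy : \?\?\?"
```

\E only marks where the quoting stops. When it is the last thing in the pattern it can be left out, which would explain why removing it appears to have no effect. quotemeta() also backslashes blanks, because it escapes every non-word character; in a regex the escaped blank still matches a plain blank, so that is harmless.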

I noticed that you're making recursive calls to your data_for_path() sub to process a directory tree. It would be much better/cleaner to use the File::Find module for that purpose.

As I previously mentioned, I'd split your sub into 2 separate subs, one to handle the -f option and one to handle the -s option. Then it would be a simple if conditional to decide which one of those gets executed as the "wanted" function for File::Find.
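A rough sketch of that layout, under some assumptions: the sub name find_matches and its arguments are hypothetical, the pattern is treated as a literal string via \Q...\E (drop that if you want a real regex), and the example values at the bottom stand in for the parsed options.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $dir and return the paths that match.
# $by_name true  -> match against the file name    (the -f case)
# $by_name false -> match against the file contents (the -s case)
sub find_matches {
    my ($dir, $pattern, $by_name) = @_;
    my $re = qr/\Q$pattern\E/;   # assumption: the pattern is a literal string
    my @hits;

    # File::Find sets $_ to the bare file name and $File::Find::name to the path
    my $match_name = sub {
        push @hits, $File::Find::name if -f $_ && $_ =~ $re;
    };

    my $match_contents = sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;   # skip unreadable files quietly
        while (my $line = <$fh>) {
            if ($line =~ $re) { push @hits, $File::Find::name; last }
        }
        close $fh;
    };

    # One conditional picks the "wanted" function
    find($by_name ? $match_name : $match_contents, $dir);
    return @hits;
}

# Example values only; a real script would pass the parsed options here
print "$_\n" for find_matches('.', 'ZZZ', 1);
```

File::Find does the directory recursion, so neither sub needs to call itself.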

1) Now when I execute perl my_script -p -s -d '.' -s "ZZZ" with $str and $file defined as booleans, the Print Dumper line prints -- $VAR1 = '-s'; $VAR2 = '.'; $VAR3 = undef; $VAR4 = '1'; Now scanning . for "1", where the value of $str is '1'. If I change those GetOptions to strings, I get -- $VAR1 = '-s'; $VAR2 = '.'; $VAR3 = undef; $VAR4 = 'ZZZ'; Now scanning . for "ZZZ", and the value of $str is available for use.

2) When I execute perl mygrep_pg -o -s -d '.' -s ???, without quotes around the string, I still see the first member in the directory whose name is three characters long -- $VAR1 = '-s'; $VAR2 = '.'; $VAR3 = undef; $VAR4 = 'INC'; Now scanning . for "INC". I would love to understand what is happening here.

First, you need to understand that anything passed to the script which is not configured as an option (i.e., is the value of that option) needs to be quoted.
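The reason quoting matters: assuming a Unix-like shell, an unquoted ? or * is a filename wildcard, and the shell replaces it with the matching file names before perl is even started, so the script never sees the original characters. A minimal sketch to check this (the sub name describe_args and the file name show_args.pl are made up for the example) just prints what actually arrives in @ARGV:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one line per argument, bracketed so that blanks are visible
sub describe_args {
    my @args = @_;
    return map { "arg[$_] = <$args[$_]>" } 0 .. $#args;
}

print "$_\n" for describe_args(@ARGV);
```

Saved as show_args.pl, compare perl show_args.pl -s ??? . with perl show_args.pl -s '???' . in a directory that contains files with three-character names. In the first case the shell has already substituted those names; in the second the script receives the three question marks untouched.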

You seem to have altered the meaning of the -s option and added a new -o option, and I'm not sure what you intend the -o option to do. We need to keep these things consistent and clear so that we know what they're used for.

The meanings of the options I used are:
-d = the top level directory to start the search
-p = the regex pattern to be used when matching the filename or file contents
-f = boolean to indicate that the pattern is to match filenames
-s = boolean to indicate that the pattern is used in searching the file contents

The -d and -p options require a quoted string for their value. The -f and -s options are booleans which do not get values passed to them on the command line.

If what you need differs from those definitions, then please provide the list of options you want to use and a clear definition for each.

For now, let's not be concerned with quoting the meta characters. Let's get the options figured out.

FishMonger, Maybe I'm being dense. The behavior of the script depends on an option that is equal to either -s or -f. When the option is equal to -s, the script expects as input the name of the directory to be searched and the string that it will attempt to find in each of the directory's files. When the option is -f, the script functions like File::Find: it will look to see if the specified string is the name of a file in the specified directory. Given this, I don't see how the input string and input directory can be coded as boolean values. s660117

You can approach it in 2 different ways. You could use a single -o option where its value is 's' or 'f'. Or, IMO, the better approach would be to separate those as two boolean flags -s ($str) and -f ($file). The script could then use a simple conditional test to see which one is set to 1 (i.e., true) and then execute the corresponding subroutine.

example: (I'm assuming both cases require the directory and pattern to use in the search.)
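A minimal sketch of the two-flag approach (this is not the code from the original post; the sub name parse_args, the hash keys, and the demo argument list are hypothetical). It uses GetOptionsFromArray from Getopt::Long so the parsing can be tried on a fixed list; a real script would pass @ARGV instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments; returns a hash ref, or nothing on a usage error
sub parse_args {
    my @args = @_;
    # Both flags start at 0, so neither is left undef after parsing
    my %opt = (dir => undef, pattern => undef, file => 0, str => 0);

    GetOptionsFromArray(
        \@args,
        'd=s' => \$opt{dir},       # -d takes a string value
        'p=s' => \$opt{pattern},   # -p takes a string value
        'f'   => \$opt{file},      # -f is a boolean flag
        's'   => \$opt{str},       # -s is a boolean flag
    ) or return;

    return unless defined $opt{dir} && defined $opt{pattern};
    return unless $opt{file} xor $opt{str};   # exactly one mode must be chosen
    return \%opt;
}

# Demo argument list; replace with @ARGV in a real script
my @demo = ('-d', '.', '-p', 'ZZZ', '-s');
my $opt  = parse_args(@demo)
    or die "usage: $0 -d DIR -p PATTERN (-f | -s)\n";

if ($opt->{str}) {
    print "Would search file contents in $opt->{dir} for \"$opt->{pattern}\"\n";
}
else {
    print "Would match file names in $opt->{dir} against \"$opt->{pattern}\"\n";
}
```

With the demo list this prints: Would search file contents in . for "ZZZ". The string to search for always travels in -p, which is why -s and -f can be plain booleans.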

Ok... got it. Now the only problem is that I check both $str and $file in the course of the script and GetOptions will set one of them to undef. I suppose the only way around this is to set $file equal to '0' when $str is equal to '1' and vice versa.