A text file contains lines of fields separated by some delimiter (a space, for example). Some lines, but not all, have a value like an initial or suffix (e.g., A. or B. or Jr.) in the second field. For those lines, we wish to have that value deleted. We also wish to do this as a one-liner. Here's what we have so far, but it omits all the other lines that have no period in any field. What say you?

The following lines illustrate the issue:

Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.
Williams, Jr., Michael A. C Programmer New York N.Y.
Silver, Susan Java Programmer San Jose CA
Altips, Alvin C. Tcl Programmer Chicago IL.

The challenge is to remove the A. and the Jr., from the first two records, leaving the remaining fields with periods intact in all lines, and moreover keeping all lines of in.file in the output. Thank you.

That's odd. When I run it, it takes care of the Jr., and III., cases, though not the P.J., or A.B.C. cases (they didn't appear in your example data). I made some changes to cover more cases. As a script it looks like this:

Code

#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>) {
    s/
        (\w+,)      # one or more word characters followed by a comma (capture group 1)
        \s+         # one or more whitespace characters
        (?:         # non-capturing group
            \w      # word character
            \.?     # 0 or 1 period
        )+          # one or more of the "word character, maybe period" pattern
        [,.]+       # one or more periods or commas
        \s+         # one or more whitespace characters
        (.*)        # everything else (capture group 2)
    /$1 $2/x;       # keep only what capture groups 1 and 2 matched
    print;
}

__DATA__
Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.
Williams, Jr., Michael A. C Programmer New York N.Y.
Silver, Susan Java Programmer San Jose CA
Altips, Alvin C. Tcl Programmer Chicago IL.

Here are examples from the real data (and that's the ballgame). The remaining Jr. entry is Park; notice that Park is not followed by a comma. Notice also that the only III line is Mastando, which likewise has no comma in the first field:

The pattern-matching approach is great. Can Perl identify the fields and then test whether a given field (whether $2, $4, or $6) contains a period, and delete it if so? Or delete the field if it contains any number, etc.?
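Working by field is possible. Here is a rough sketch, assuming the fields are whitespace-delimited as in the sample lines. The name drop_field_if_period is a made-up helper, not anything built in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop field number $n (1-based) from a
# whitespace-delimited line, but only if that field contains a period.
sub drop_field_if_period {
    my ($line, $n) = @_;
    my @fields = split ' ', $line;          # split on runs of whitespace
    if (@fields >= $n && $fields[$n - 1] =~ /\./) {
        splice @fields, $n - 1, 1;          # remove just that one field
    }
    return join ' ', @fields;
}

# Two of the sample lines from the question.
print drop_field_if_period($_, 2), "\n" for (
    'Thomas, A. Alexandria S. Perl Programmer Las Vegas NV.',
    'Silver, Susan Java Programmer San Jose CA',
);
```

As a one-liner, the same idea for field 2 might look like perl -lane 'splice @F, 1, 1 if @F > 1 && $F[1] =~ /\./; print "@F"' in.file, though I have not run that against the real data. Note that splitting and rejoining collapses runs of whitespace into single spaces.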

The /\d/ has the effect of removing any field that contains a digit, including fields that mix letters and digits. Is it possible to be more surgical and remove only a chosen field, leaving the others intact?
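An unanchored /\d/ matches a digit anywhere in a field, which is why mixed fields are caught too. One possible fix, sketched below, is to anchor the pattern so the whole field must consist of digits, and to test only the chosen field. The helper name and the two sample lines are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove field $n (1-based) only when that one
# field matches $pattern; every other field is left alone.
sub drop_field_if_matches {
    my ($line, $n, $pattern) = @_;
    my @fields = split ' ', $line;
    splice @fields, $n - 1, 1
        if @fields >= $n && $fields[$n - 1] =~ $pattern;
    return join ' ', @fields;
}

my $any_digit   = qr/\d/;        # matches "42" and also "B52"
my $digits_only = qr/\A\d+\z/;   # matches "42" but not "B52"

# Invented sample lines, not taken from the real data.
for my $line ('Smith, 42 Perl Programmer', 'Jones, B52 Perl Programmer') {
    print 'any digit:   ', drop_field_if_matches($line, 2, $any_digit),   "\n";
    print 'digits only: ', drop_field_if_matches($line, 2, $digits_only), "\n";
}
```

With the anchored pattern, B52 should survive in field 2 while a bare 42 is removed. A digit in any field other than the chosen one is never examined.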

The above code has the unfortunate side effect of removing the 2nd field if the line contains a period in any other field (for example, when the 3rd field contains A., or B., or C., then the 2nd field is deleted).
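That symptom is what one would expect if the period test is applied to the whole line ($_) rather than to field 2 alone. The code in question is not shown here, so the first sub below is only a guess at its logic. The second sub tests the field itself. Both sub names are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess at the problem: the condition looks at the whole line,
# so a period in any field triggers removal of field 2.
sub drop_second_linewise {
    my ($line) = @_;
    my @f = split ' ', $line;
    splice @f, 1, 1 if @f > 1 && $line =~ /\./;
    return join ' ', @f;
}

# Possible fix: the condition looks only at field 2.
sub drop_second_fieldwise {
    my ($line) = @_;
    my @f = split ' ', $line;
    splice @f, 1, 1 if @f > 1 && $f[1] =~ /\./;
    return join ' ', @f;
}

# Sample line from the question: field 2 (Alvin) has no period,
# but later fields (C. and IL.) do.
my $line = 'Altips, Alvin C. Tcl Programmer Chicago IL.';
print drop_second_linewise($line),  "\n";   # field 2 is removed by mistake
print drop_second_fieldwise($line), "\n";   # line comes back unchanged
```

If your code tests $_ or the whole line anywhere in the condition, switching that test to the specific field should stop the unwanted deletions.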