There are a lot of really nice scripting languages available to Unix admins, but Perl is still one of my favorites for doing any work that involves regular expressions -- any text that you can describe with a pattern. If you want to locate or change chunks of text that match some particular specification, you can probably throw together a script in Perl that will do the work fairly easily. In this week's post, we're going to examine some easy fixes for common problems that take advantage of Perl's versatile nature.

Removing blank lines

The first trick is removing blank lines. In Perl, recognizing a blank line is easy. Since ^ represents the beginning of a line and $ represents its end, ^$ represents a line that begins and ends and has nothing in between. You can expand this to also match lines that contain only white space by changing the expression to ^\s*$. The \s means "white space", so \s* matches zero or more characters of white space.

To skip over blank lines in a Perl script, you have several choices. You could use a "next if /^$/" command (skip if empty) or a "next if /^\s*$/" command (skip if empty or only white space). Alternatively, you could take the approach of printing only lines that contain text with "print if (/\S/)" or only lines that aren't empty with "print if (!/^$/)".

This script nugget shows only lines containing text by skipping blank lines:

while (<>) {    # read from files named on the command line, or from standard input
    next if /^\s*$/;    # skip blank lines
    print;
}

This one would only print lines containing text:

while (<>) {
    print if (/\S/);    # print only lines with at least one non-whitespace character
}

This code displays only lines that aren't empty:

while (<>) {
    print if (!/^$/);    # print only if NOT empty
}

This handy one-liner removes blank lines, but also saves the original file with a .old extension. The new file (the one without blank lines) keeps the original filename.

Removing whitespace at the beginnings and ends of lines

Removing whitespace at the beginnings or ends of lines can facilitate later processing by reducing the options that you need to consider.

To remove leading whitespace:

$string =~ s/^\s+//

To remove trailing whitespace:

$string =~ s/\s+$//

Removing non-ASCII characters

Removing characters that don't fall within the range of the traditional ASCII character set is a little tricky. In the command below, we're using Perl's tr operator to select the range of characters from hex 80 (decimal 128) to hex FF (decimal 255) and delete them (the d modifier). This isn't going to be useful to you if you're using an extended character set.

$string =~ tr/\x80-\xFF//d;

Removing carriage returns

In Perl, the expression \r represents a carriage return while \n is a linefeed. You can easily remove carriage returns from a string variable as shown below.

$str =~ s/\r//g;

To remove both carriage returns and linefeeds, combine \r and \n in that order, of course, since that's the order in which they appear in text files like those created on Windows systems. Note that this strips the line endings entirely, so the lines in a multi-line string end up joined together.

$str =~ s/\r\n//g;

This one-liner removes carriage returns in a file using the same logic:

perl -p -i -e 's/\r//g' dosfile

This can be especially handy if you don't have a tool like dos2unix available. Plus, I really like the "in place" (no temporary files or copies for you to clean up) nature of Perl commands with the "pie" (-p -i -e) arguments. And it easily turns into an alias:

alias rmCR='perl -p -i -e '\''s/\r//g'\'''

Adding line numbers

To add line numbers to the contents of a file, try this:

perl -p -i -e '$_ = "$. $_"' myfile

This one takes a bit of explanation. The $. variable contains the current line number and $_ holds the current line, so the command reads "change the line to the line preceded by the line number".

Replacing text

Perl pie commands make replacing text within a file fairly easy, though you should always be careful that you're not changing text you didn't mean to change along with the text you hoped to target.

perl -p -i -e 's/2011/2012/g' filename

Renaming files

This command renames files to their lowercase character equivalents. Read this as "for every argument provided (i.e., the file list "*"), rename the file to lowercase unless a file by that name already exists".

perl -e 'for (@ARGV) { rename $_, lc($_) unless -e lc($_); }' *

And, of course, you could do the same thing with uppercase by replacing lc with uc in both locations.

Sandra Henry-Stocker has been administering Unix systems for over 25 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She currently works for TeleCommunication Systems -- a company that builds innovative technologies to make critical connections happen -- where no one else necessarily shares any of her opinions.

The opinions expressed in this blog are those of the author and do not necessarily represent those of ITworld, its parent, subsidiary or affiliated companies.