Robert's Perl Tutorial

THIS DOCUMENT IS COPYRIGHTED.
Reproduction in whole or part is prohibited. Please email me at robert@netcat.co.uk if you want to use
this information anywhere.

The location of this document is
http://www.sthomas.net/oldpages/roberts-perl-tutorial.htm
mirrored from
http://www.netcat.co.uk/rob/perl/win32perltut.html

Introduction

This tutorial is...

A basic Perl course primarily for use on Win32 platforms. It assumes that
the reader knows nothing of programming whatsoever, but needs a solid grounding
for further work. After you finish this course you'll be ready to specialise in
CGI, sysadmin or whatever you want to do with Perl.

This tutorial is not...

A reference manual. You won't find all the regex stuff under
Regex. I think it's more fun to learn the basics then add little extras along
the way. Keeps you awake, and it is a good excuse for not organising it better.

A FAQ.

Politically correct.

Complete. Please don't finish the course and assume you know
all there is to know about Perl. There is certainly enough here to get you
started, but consider the contents of this course as the tip of the iceberg.
Except maybe a little warmer.

What you need to know

You need to be able to differentiate between a PC and a toaster. No
programming experience is necessary. You do need to understand the basics of PC
operation. If you don't understand what directories and files are then you'll
find this difficult. You might find it difficult even if you do :-)

You do need to exercise the brain cells, and you need time.

What you need to have

A PC which can run a Win32 operating system. That's Windows NT 3.5,
3.51, 4.0 or later, or Windows 95 or Windows 98. Not Windows 3.1. Sorry. Now,
you finally have a reason to upgrade.

You need to get hold of a copy of Perl, so for that you might need
an Internet connection. But if you can get it some other way, you don't.

Note: You don't even need a Win32 PC if you are comfortable
installing Perl under other operating systems like Linux, but not all the
information here will be relevant.

You don't need a compiler. Perl is an interpreted language, which means you
run the code directly rather than compiling it first and then running it.

How to use this tutorial...

Just work through from start to finish.

Generally, the explanation follows the code sample. Before you read the
explanation, try and work out what the code does. Then check if you're right.
In this way, you'll derive maximum value from the tutorial and exercise the old
grey cells a little.

When you finish, please send me a critique. In fact, send one even if you
don't finish. I appreciate all feedback! Please note -- I am not
a source of free technical support. Do not email me your general Perl problems.
If you want support, ask on Usenet or the ActiveState mailing lists. That said,
I welcome problems related to the tutorial itself.

Conventions used in this Tutorial

The humour is non-conventional. I think. Of more importance, the text is
coloured strangely in places. My intention is to aid your comprehension, not
attempt beautification. The meaning of the colours:

Sometimes you'll need to type something in on the command line.
These commands will be in green, for example: perl changeworld.pl parm1 datafile.txt

Code that you should load into your editor and run is in blue
(don't run this now, it's just an example):

while (<DATFILE>) {
    printf "%2s : $_",$.;
}

When functions are referred to in the text, their names are
highlighted in red. For example, later we discover an interesting function
called split.

All the code examples have been tested, and you can just cut'n'paste (brave
statement). I haven't listed the output of each example. You need to run it
and see for yourself. Consider this course interactive. Consider it any which
way you like.

Use of this document

Personal Printouts

Fine by me, feel free to print a copy for your own use.

Intranet usage

Just email me and let me know.

Mirroring

Again, all I ask is an email.

Translations

Every so often someone offers to translate the tutorial. Nobody has actually
done so. If you want to, the conditions are:

You don't change the text other than what can be reasonably expected
during a translation;

The content, format and authorship notices remain the same;

You can add a 'translated by' notice in the intro and at the end,
plus your own message;

Version numbers are respected but the ISO code for your country is
added, eg 3.3.2.ES;

and you need to email me to discuss.

Remember this document is copyrighted and all associated
rights are strictly reserved.

A Short Introduction To Perl

If you already understand what Perl is designed to do, know its features
and limitations then you can skip this very small but highly informative
section, over which I laboured long and hard for those that didn't know. If you
are really sure, jump to the Setup Section.

What is Perl?

Perl is a programming language. Perl stands for Practical Extraction and
Report Language. You'll notice people refer to 'perl' and "Perl". "Perl" is
the programming language as a whole whereas 'perl' is the name of the core
executable. There is no language called "Perl5" -- that just means "Perl
version 5". Versions of Perl prior to 5 are very old and very unsupported.

Some of Perl's many strengths are:

Speed of development. You edit a text file, and just run it.
You can develop programs very quickly like this. No separate compiler needed. I
find Perl runs a program quicker than Java, let alone compare the complete
modify-compile-run-oh-no-forgot-that-semicolon sequence.

Power. Perl's regular expressions are some of the best
available. You can work with objects, sockets...everything a systems
administrator could want. And that's just the standard distribution. Add the
wealth of modules available on CPAN and you have it all. Don't equate
scripting languages with toy languages.

Usability. All that power and capability can be learnt in
easy stages. If you can write a batch file you can program Perl. You don't have
to learn object oriented programming, but you can write OO programs in Perl. If
autoincrementing non-existent variables scares you, make perl refuse to let
you. There is always more than one way to do it in Perl. You decide your style
of programming, and Perl will accommodate you.

Portability. On the Superhighway to the Portability Panacea,
Perl's Porsche powers past Java's jaded jalopy. Many people develop Perl
scripts on NT, or Win95, then just FTP them to a Unix server where they run. No
modification necessary.

Editing tools. You don't need the latest Integrated Development
Environment for Perl. You can develop Perl scripts with any text editor.
Notepad, vi, MS Word 97, or even direct off the console. Of course, you can
make things easy and use one of the many freeware or shareware programmer's
file editors.

Price. Yes, 0 guilders, pounds, dmarks, dollars or whatever.
And the peer to peer support is also free, and often far better than you'd ever
get by paying some company to answer the phone and tell you to do what you just
tried several times already, then look up the same reference books you already
own.

What is ActivePerl? Are the other Perls inactive?

A company named ActiveState exists to provide Perl tools for the Win32
environment. ActiveState used to be ActiveWare, and before that it was sort of
a part of Hip Communications. It now appears to be happy with its current name,
having not changed it for over a year. Win32 means, at the time of writing,
Windows 95, Windows 98 and Windows NT. It does not mean Windows 3.11,
even with Win32s installed.

Prior to Perl version 5.005, there was one version of Perl for Win32, and
another for all the other systems. The other version was known as the "native
version".

The Win32 version was developed by ActiveState, called "Perl for Win32" and
typically lagged slightly behind the native version. As of the 5.005 release,
Perl for Win32 and the native version have merged -- the native version now
supports Win32 directly and doesn't need any tweaking by ActiveState.

ActiveState have dropped "Perl for Win32" and renamed their distribution,
which comes with an InstallShield installer, "ActivePerl".

Incidentally, a few months before the 5.005 merge the native Perl version was
changed so it would run on Win32 directly. This version was best known by the
creator's name, "Gurusamy Sarathy". However, there were still quite a few
differences between it and Perl for Win32, so many people ran both. The merge
brought the best of both worlds together.

Can I run Perl on my computer?

Probably. Perl runs on everything from Amigas to Macintoshes to Unix boxen.
Perl also runs on Microsoft operating systems, namely Windows 95, Windows 98
and Windows NT 3.51 and later. There are versions of Perl that run on earlier
versions of these operating systems but they are no longer developed or
supported. See http://www.perl.com/ for full
details.

What can I do with Perl ?

Just two popular examples :

The Internet

Go surf. Notice how many websites have dynamic pages with .pl
or similar as the filename extension? That's Perl. It is the most popular
language for CGI programming for many reasons, most of which are mentioned
above. In fact, there are a great many more dynamic pages written with perl
that may not have a .pl extension. If you code in Active Server
Pages, then you should try using ActiveState's PerlScript. Quite frankly,
coding in PerlScript rather than VBScript or JScript is like driving a car as
opposed to riding a bicycle. Perl powers a good deal of the Internet.

Systems Administration

If you are a Unix sysadmin you'll know about sed, awk and shell scripts.
Perl can do everything they can do and far more besides. Furthermore, Perl does
it much more efficiently and portably. Don't take my word for it, ask around.

If you are an NT sysadmin, chances are you aren't used to programming. In
which case, the advantages of Perl may not be clear. Do you need it? Is it
worth it?

After you read this tutorial you will know more than enough to start using
Perl productively. You really need very little knowledge to save time. Imagine
driving a car for years, then realising it has five gears, not four. That's the
sort of improvement learning Perl means to your daily sysadminery. When you are
proficient, you find the difference like realising the same car has a reverse
gear and you don't have to push it backwards. Perl means you can be lazier.
Lazy sysadmins are good sysadmins, as I keep telling my boss.

A few examples of how I use Perl to ease NT sysadmin life:

User account creation. If you have a text file with the
user's names in it, that is all you need. Create usernames automatically,
generate a unique password for each one and create the account, plus create and
share the home directory, and set the permissions.

Event log munging. NT has great Event Logging. Not so great
Event Reading. You can use Perl to create reports on the event logs from
multiple NT servers.

Anything else that you would have used a batch file for, or
wished that you could automate somehow. Now you can.

What can't I do with Perl ?

The question is, "what shouldn't I do with Perl". Write office suites is one
answer. Perl, like most scripting languages, is a glue language designed for
short and relatively simple tasks. Just don't equate this philosophy with a
lack of power or "serious" features.

Support

See the FAQs at www.perl.com. Of course there are Usenet groups, but also
many mailing lists. Microsoft Windows users will be interested in those hosted
by http://www.activestate.com/ which
discuss all things Perl and Windows.

Please, before you ask any question, anywhere:

Make sure you read the group charter. Many people put time and effort into the
creation of those charters in the interests of efficient discussion, so don't degrade the
discussion quality and insult us by ignoring the guidelines.

Read the FAQs at least twice. Try and find related FAQs. Try hard. You won't be
popular if you post a question starting "I've looked at all the FAQs..." and
then ask something that actually is in the FAQs. Or the manual for that matter.
Believe me, it will be patently obvious to all on the list if you haven't done your
homework.

Carefully phrase the questions and provide source code because if you do that,
you may well end up solving the problem yourself because you have thought it through a
little more.

Think to yourself -- honestly -- if I was a busy Perl Professional, would
I want to answer my own question?

Does it clearly state what I want an answer to? Preferably just one question
at a time. Am I being unreasonable, for example asking for someone to code it
for me? Have I shown evidence that I have tried to help myself? Have I made any
mistakes in grammar? Is it polite? Is there enough information in there for the
answer to be given?

Why should you care? Well, if you ask poorly-formed questions or those
already answered in the FAQ...let's just say you won't get the answers you
want. If you care about your online reputation and wasting other people's time
-- two more reasons.

Setup

There are four stages:

Get the software.

Install it.

Run a test Script.

Celebrate or troubleshoot.

1. Getting the Software

An old version of Perl for Win32 is included with the Windows NT Resource
Kit. It is sadly out of date. Follow the steps below to get a newer version.
Having said that, you can complete the tutorial with the Resource Kit version
but you should upgrade as soon as you can.

Go to http://www.activestate.com/
and follow the links to download ActivePerl. It will be a single file, and the
name will be something like api508e.exe. The i stands
for Intel. If you have an Alpha, download apaXXXe.exe. If you're
not sure, download the Intel version.

The 508e is the version number, so expect this to change quite
rapidly. The file size will be just over 5Mb, so it will take a while to
download via modem. If you know how to use FTP, try
ftp.activestate.com/activeperl/.

When you find ActivePerl, save the file into any directory you please. I
like to organise my downloads into c:\downloads but that is just
personal preference. As long as ActivePerl ends up on your hard disk somewhere
it doesn't matter.

2. Installation

So you now have apixxxx.exe. If you forget where you saved it,
don't panic, just run Windows Explorer and search for api*e.exe

Double-click the apixxxx.exe. You'll see the fantastic ActivePerl
graphic and be advised to close all open applications before proceeding. The lizard thing
is a gecko, which adorns the famous O'Reilly book "Learning Perl on Win32
Systems". This tutorial is aimed at a more basic level than that book, in terms of
the author's knowledge, intended audience and quality of humour.

Agree to the license agreement or cancel the install, stop this tutorial and deny
yourself any hope of hackership.

Destination directory is whatever you want. I usually
install Perl in c:\progs\perl rather than c:\program
files\perl because many Win32 programs don't properly handle long
filenames, let alone those with spaces in. Or you could accept the default.
Your choice.

Select Components. All you'll need for this tutorial is "Perl for Win32
Core", but installing the "Online Help and Documentation" and "Example
Files" is highly recommended. If you run Internet Information Server (IIS) 3 or
later, or Personal Web Server (PWS), then install "Perl for ISAPI" and
"PerlScript" too, although don't try either of these until you are proficient
with the basics. The phrase running before walking comes to mind.

Select Options.

"Associate '.pl' with Perl.exe". If you select this
option then you can just type in the name of a script at the command line, or
double-click it and the script will run. If you don't, then in order to get a
script to execute you'll need to type: perl myscript.pl to
execute myscript.pl. Personally, I prefer double-clicking to allow
me to edit the file so I do not select this option. Also, perl has a plethora
of command line arguments which are difficult to pass to a script if you run it
by association. For the purposes of this tutorial I'm assuming that you haven't
associated .pl with perl.

"Add the Perl bin directory to your path". Do this,
otherwise you'll have to specify the full path to perl.exe every time you use
it. Not fun.

"Standard I/O redirection for IIS". If you run IIS
or PWS, select this. It is a Good Thing. Understand it later.

IIS Options. If you use IIS or PWS you'll have this screen -- just accept both
options.

Program Folder. Whatever your preference is. This is just a link to the
documentation, to the perl.exe itself.

Confirmation. Make sure that what is displayed is what you have selected...

The install program will now copy files. At the end it will run a few perl scripts
itself, which briefly appear as DOS boxes. Don't worry, it is all quite normal.

Release notes. Well worth a read.

Reboot! Just so the path statement takes effect. In any case, it is always good
practice to reboot after a new install.

3. Testing - Your First Perl Script

So you know what this tutorial is designed to do. You know what Perl is
designed to do, and you have even installed it. It is now time to start the
tutorial proper, and actually hack some code.

The Tutorial: The Journey Begins

Your First Time

Assuming all has gone to plan, you can now create your first Perl script.
Follow these instructions, but before you start read them through once, then
begin. That's a good idea with any form of computer-related procedure. So, to
begin:

Create a new directory for your perl scripts, separate to your data files and the perl
installation. For example, c:\scripts\, which is what I'll assume you are
using in this tutorial.

Start Notepad (or your editor of choice), type in the single line print "My first Perl script\n"; and
save the file to c:\scripts\myfirst.pl. Be careful! Notepad may save files
with a .txt extension, so you will end up with myfirst.pl.txt by
default. Perl won't mind, it'll still execute the file. If your version of Notepad does
this, select "All files" before saving or rename the file then load it again.
Better yet, use a decent text editor!

You don't need to exit Notepad -- keep it open, as we'll be making changes very soon.

Switch to your command prompt. If you don't know how to start a command prompt, click
'Start' and then 'Run'. If using Windows 9x, type in 'command' and press enter. If using
NT, type in 'cmd' and press Enter.

Change to your perl scripts directory, for example cd \scripts .

Hold your breath, and execute the script: perl myfirst.pl

and you'll see the output. Welcome to the world of Perl ! See what I mean
about it being easy to start? However, it is difficult to finish with Perl once
you begin :-)

What if it doesn't...?

So you typed in perl myfirst.pl and you didn't see
My first Perl script on the screen. If you saw "bad command or
filename" then either you haven't installed Perl or perl.exe is not in your
path. Probably the latter. Reboot, then try again.

If you saw Can't open perl script "xxxx.pl": No such file or
directory then perl is definitely installed, but you have either got the
name of the script wrong or the script is not in the same directory as where
you are trying to run it from. For example, maybe you saved the script in
c:\windows and you are in c:\scripts so of course
Perl complains it can't find the script. Could you? Well, don't expect Perl to
then. You don't have to run the script from the directory in which it resides,
but it is easier.

Assuming it's now all right...

We need to analyse what's going on here a little. First note that the line
ends with a semicolon ; . Almost all lines of code in
Perl have to end with semicolons, and those that don't have to will accept
semicolons anyway. The moral is -- use semicolons. Sorry; the moral is; use
semicolons.

Oh, one more thing -- if you haven't already done so, continue breathing.

Also note the \n . This is the code to tell Perl
to output a newline. What's a newline? Delete the \n
from the program and run it again:

print "My first Perl script";

and all should become clear. You have now written your
first Perl script.

Shebang

Almost every Perl book is written for UN*X, which is a problem for Win32.
This leads to scripts like:

#!c:/perl/perl.exe
print "I'm a cool Perl hacker\n";

The function of the 'shebang' line is to tell the shell how to execute the
file. Under UNIX, this makes sense. Under Win32, the system must already know
how to execute the file before it is loaded so the line is not needed.

However, the line is not completely ignored, as it is searched for any
switches you may have given Perl (for example -w to
turn on warnings).

You may also choose to add the line so your scripts run directly on UNIX
without modification, as UNIX boxes probably do need it. Win32 systems
do not. We shall continue with the lesson.
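
As a small sketch of the point about switches: the script below carries a shebang line whose only useful job on Win32 is to pass -w to perl. The variable names $count and $total are made up for illustration, and $count is deliberately never given a value so that there is something to warn about.

```perl
#!perl -w
# On Win32 the path part of this line is not needed, but the -w switch is
# still picked up, so warnings are switched on for the whole script.

$total = $count + 1;    # $count was never set: perl warns, then treats it as 0
print "total is $total\n";
```

Run it with perl and you should see warnings about $count on top of the normal output. Take the -w off the first line and the warnings go away, although the bug is still there.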

Variables

Scalars

So Perl is working, and you are working with Perl. Now for something more
interesting than simple printing. Variables. Let's take simple scalar variables
first. A scalar variable is a single value. Like $var=10 which sets the variable $var
to the value of 10. Later, we'll look at lists like arrays and hashes,
where @var refers to more than one value. For the
moment, remember that Scalar is Singular. If weird metaphors help, think
of lots of scaly snakes at a singles bar. If that didn't help, I apologise for
putting the thought into your mind.

$ % @ are Good Things

If you have any experience with other programming languages you might be
surprised by the code $var=10. With most languages,
if you want to assign the value 10 to a variable
called var you'd write var=10.

Not so in Perl. This is a Feature. All variables are prefixed with a symbol
such as $ @ % . This has certain advantages, like
making programs easier to read. Honestly, I'm serious! It just takes some
getting used to. The prefixes mean that you can see where the variables
are quite easily. And not only that, what sort of variable it is.
The human language German has a similar principle (except nouns are
capitalised, not prefixed with $ and Perl is easier
to pronounce). You'll agree later, I think.

So, ever onwards. Time to try some more variables:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";

Typing

A closer look...notice you don't have to say what type of variable
you are declaring. In other languages you need to say if the variable is a
string, array, what sort of number it is and so on. You might even have to
declare what type of number it is. As an example, in Java you'd be saying
things like int var=10 which defines the variable var as an
integer, with the value 10.

So, why do these other programming languages force you to declare exactly
what your variables are? Wouldn't it be easier if we could just not bother?

For short programs, yes. For really big projects with many programmers
working on the same application, no. That's because forcing variable type
declaration also forces a certain discipline and rigour which is what you need
on big projects.

As you know, Perl is not designed for gigantic
software engineering efforts. It is all about small, quick programs. For these
purposes you don't need the rigour of variable controls as much, so Perl
doesn't bother.

This idea of forcing a programmer to declare what sort of variable is being
created is called typing. As Perl doesn't by default enforce any rules
on typing, it is said to be a loosely typed language, as opposed to
something like C++ which is strongly typed.
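
A minimal sketch of what loose typing looks like in practice. The variable names are only examples. The same scalar holds a number and then a string, and a string that looks like a number is quietly used as one.

```perl
$var = 10;          # $var holds a number...
print "var is $var\n";

$var = "ten";       # ...and now a string, with no declaration either time
print "var is $var\n";

$sum = "5" + 5;     # the string "5" is treated as the number 5
print "sum is $sum\n";
```

The last line prints 10: Perl converts between strings and numbers as the operation requires.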

Variable Interpolation

We still haven't finished learning from that humble bit of code. To refresh
your memory, here it is again:

$string="perl";
$num1=20;
$num2=10.75;
print "The string is $string, number 1 is $num1 and number 2 is $num2\n";

Notice the way the variables are used in the string. Sticking variables inside of
strings has a technical term - "variable interpolation". Now, if we didn't have the
handy $ prefix for variables we'd have to do something like the example
below, which is pseudocode. Pseudocode is code to demonstrate a concept, not designed to
be run. Like certain Microsoft software.

print "The string is ".string." and the number is ".num."\n";

which is much more work. Convinced about those prefixes yet ?

Try running the following code:

$string="perl";
$num=20;
print "Doubles: The string is $string and the number is $num\n";
print 'Singles: The string is $string and the number is $num\n';

Double quotes allow the aforementioned variable interpolation. Single quotes
do not. Both have their uses as you will see later, depending on whether you
wish to interpolate anything.

Changing Variables

Auto(de|in)crements

If you want to add 1 to a variable you can, logically, do this; $num=$num+1 . There is a shorter way to do this, which is
$num++. This is an autoincrement. Guess what this is;
$num-- . Yes, an autodecrement.
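
A short sketch of these operators at work. The starting value of 10 and the step of 3 are just assumptions for the sake of the example.

```perl
$num = 10;
print "\$num is $num\n";
$num++;         # autoincrement: add 1
print "\$num is $num\n";
$num--;         # autodecrement: take 1 away again
print "\$num is $num\n";
$num += 3;      # add 3 instead of 1
print "\$num is $num\n";
```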

The last example demonstrates that it doesn't have to be just 1 you can add
or decrease by.

Escaping

There's something else new in the code above. The \
. You can see what this does -- it 'escapes' the special meaning
of $ .

Escaping means that just the $ symbol is printed
instead of it referring to a variable.

Actually \ has a deeper meaning -- it escapes
all of Perl's special characters, not just $ .
Also, it turns some non-special characters into something special. Like what?
Like n . Add the magic \
and the humble 'n' becomes the mighty NewLine ! The \
character can also escape itself. So if you want to print a single \ try:

print "the MS-DOS path is c:\\scripts\\";

Oh, '\' is also used for other things like references. But that's not even
covered here.

There is a technical term for these 'special characters' such as @ $ %. They are called metacharacters. Perl uses
plenty of metacharacters. In fact, you'll wear your keyboard pretty evenly
during a night's perl hacking. I think it is safe to say that Perl uses every
possible keystroke and shifted keystroke on a standard US PC keyboard.

You'll be working with all sorts of obscure characters in your Perl hacking
career, and I also mean those on your keyboard. This has earned perl a
reputation for being difficult to understand. That's entirely true. Perl
does have such a reputation, no doubt about it.

Is the reputation justified? In my opinion, Perl does have a short but steep
learning curve to begin with simply because it is so different. However, once
you learn the character meanings, reading perl code becomes much easier
precisely because of all these strange characters.

Context: About Perl and @^$%&~`/?

Perl uses so many weird characters that there aren't enough to go round. So
sometimes the same character has two or more meanings, depending on its
context. As an example, the humble dot . can join
two variables together, act as a wildcard or become a range operator if there
are two of them together. The caret ^ has different
effects in [^abc] as opposed to [a^bc] .
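
Here is a rough sketch of the dot wearing its three hats. It borrows a regular expression and an array, neither of which has been explained yet, so treat it as a preview rather than something to memorise. The variable names are invented for the example.

```perl
$first  = "Perl";
$second = "hacker";

# 1. Between two values, the dot joins them together.
$joined = $first . " " . $second;
print "$joined\n";

# 2. Inside a regular expression, the dot is a wildcard for any one character.
$matched = ($first =~ /P.rl/) ? "yes" : "no";
print "Wildcard match: $matched\n";

# 3. Two dots together form the range operator.
@range = (1 .. 5);
print "Range: @range\n";
```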

If this sounds crazy, think about the English language. What do the
following mean to you ?

MEAN

POLISH

LIKE

Mean, in one context, is a word used to describe the purpose of
something. It is also another word for average. Furthermore, it describes a
nasty person, or a person who doesn't like spending money, and is used in slang
to refer to something impressive and good.

That's five different uses for 'mean', and you don't have any trouble
understanding which one I mean due to context.

Polish can either mean pertaining to the country Poland or, when not
capitalised, the act of making something shiny. And 'like' can mean similar to, or
affection for.

So, when you speak or write English (think of two, to and too) you know what
these words mean by their context. It is exactly the same way with Perl. Just
don't assume a given metacharacter always means what you first thought it did.

To finish off this section, try the following:

Strings and Increments

$string="perl";
$num=20;
$mx=3;
print "The string is $string and the number is $num\n";
$num*=$mx;
$string++;
print "The string is $string and the number is $num\n";

Note the easy shortcut *= meaning 'multiply $num
by $mx' or, $num=$num*$mx . Of course Perl supports
the usual + - * / ** % operators. The last two are
exponentiation (to the power of) and modulus (remainder of x divided by y).
Also note the way you can increment a string ! Is this language flexible or
what?
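
If you'd like to see the arithmetic operators side by side, here is a minimal sketch. The values 7 and 2 are arbitrary.

```perl
$x = 7;
$y = 2;

$power     = $x ** $y;   # exponentiation: 7 to the power of 2
$remainder = $x % $y;    # modulus: what is left over after 7 divided by 2

print "$x ** $y is $power\n";
print "$x % $y is $remainder\n";

$x += 3;                 # same shortcut idea as *= : $x = $x + 3
print "\$x is now $x\n";
```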

Print: A List Operator

The print function is a list operator.
That means it accepts a list of things to print, separated by commas. As an
example:
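
A minimal sketch of print being handed a list, assuming a string variable $var set to "Perl" and a number $num set to 10 (both values are only examples):

```perl
$var = "Perl";
$num = 10;

# Every comma-separated item is one element of the list given to print.
print "Two \$nums are ", $num * 2, " and adding one to \$var makes ", $var++, "\n";
```

Run it and look closely at what is printed for $var.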

You might have been slightly surprised by the result of that last
experiment. In particular, what happened to our variable $var? It should have been incremented by one, resulting in
Perm. The reason being that 'm' is the next letter after 'l' :-)

Actually, it was incremented by 1. We are
postincrementing the variable ($var++), rather
than preincrementing it.

The difference is that with
postincrements, the value of the variable is returned, then the operation is
performed on it. So in the example above, the current value of $var was returned to the print
function, then 1 was added. You can prove this to yourself by adding the line
print "\$var is now $var\n"; to the end of the example
above.

If we want the operation to be performed on $var before the value is returned to the print function,
then preincrement is the way to go. ++$var will do
the trick.
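
The two forms side by side, as a small sketch (the starting value "Perl" is an assumption carried over for illustration):

```perl
$var = "Perl";

# Postincrement: the old value is handed to print, then $var is incremented.
print "Postincrement gives ", $var++, "\n";
print "\$var is now $var\n";

# Preincrement: $var is incremented first, then the new value is handed to print.
print "Preincrement gives ", ++$var, "\n";
```

The first line shows Perl, the second shows Perm, and the third shows Pern.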

Subroutines -- A First Look

Let's take another look at the example we used to show how the
autoincrement system works. Messy, isn't it ? This is Batch File Writing
Mentality. Notice how we use exactly the same code four times. Why not just put
it in a subroutine?
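
A sketch of how that might look, assuming a subroutine called print_results and a starting value of 10. The particular operations are only examples.

```perl
$num = 10;          # sets $num to 10
&print_results;     # prints variable $num
$num++;
&print_results;
$num *= 3;
&print_results;
$num /= 3;
&print_results;

sub print_results {
    print "\$num is $num\n";
}
```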

Easier and neater. The subroutine can go anywhere in your script, at the
beginning, end, middle...makes no difference. Personally I put all mine at the
bottom and reserve the top part for setting variables and main program flow.

A subroutine is just some code you want to use more than once in the same
script. In Perl, a subroutine is a user-defined function. There is no
difference. For the purposes of clarity I'll refer to them as subroutines.

A subroutine is defined by starting with sub then
the name. After that you need a curly left bracket { ,
then all the code for your subroutine. Finish it off with a closing brace
} . The area between the two braces is called a
block. Remember this. There are such things as anonymous
subroutines but not here. Everything here has a name.

Subroutines are usually called by prefixing their name with an ampersand,
that is one of these -- & , like so
&print_results; . It used to be cool to omit the
& prefix but all perl hackers are now encouraged
to use it to avoid ambiguity. Ambiguity can hurt you if you don't avoid it.

If you are worrying about variable visibility, don't. All the variables we
are using so far are visible everywhere. You can restrict visibility quite
easily, but that's not important right now. If you weren't worrying about
variable visibility, please don't start. I'd tell you it's not important but
that'll only make you worried. (paranoid ?) We'll cover it later.

Comments

Did you see that a # crept in there? That's a comment.
Everything after a # is ignored. You can't continue
it onto a newline however, so if your comment won't fit on one line start a new
one with # . There are ways to create Plain Old
Documentation (POD) and more ways to comment but they are not detailed here.

Comparisons

An iffy start

An if statement is simple. if the day is
Sunday, then lie in bed. A simple test, with two outcomes. Perl
conversion (don't run this):

if ($day eq "sunday") {
    &lie_in_bed;
}

You already know that &lie_in_bed is a
call to a subroutine. We assume $day is set earlier
in the program. If $day is not equal to "sunday",
&lie_in_bed is not executed (pity). You don't
need to say anything else. Try this:

$day="sunday";
if ($day eq "sunday") {
    print "Zzzzz....\n";
}

Note the syntax. The if statement requires
something to test for Truth. This expression must be in (parens), then you have
the braces to form a block.

The Truth According to Perl

There are many Perl functions which test for Truth. Some are if,
while, unless . So it is important you know what truth is, as
defined by Perl, not your tax forms. There are three main rules:

Any string is true except for "" and "0".

Any number is true except for 0. This includes
negative numbers.

Any undefined variable is false. An undefined variable is one which doesn't have a value,
i.e. has not been assigned to.

Some example code to illustrate the point:

&isit; # $test1 is at this moment undefined
$test1="hello"; # a string, not equal to "" or "0"
&isit;
$test1=0.0; # $test1 is now a number, effectively 0
&isit;
$test1="0.0"; # $test1 is a string, but NOT effectively 0 !
&isit;
sub isit {
if ($test1) { # tests $test1 for truth or not
print "$test1 is true\n";
} else { # else statement if it is not true
print "$test1 is false\n";
}
}

The first test fails because $test1 is undefined.
This means it has not been created by assigning a value to it. So according to
Rule 3 it is false. The last two tests are interesting. Of course, 0.0 is the
same as 0 in a numeric context. But it is not the
same as 0 in a string context, so in that case it is true.

So here we are testing single variables. What's more useful is testing the
result of an expression. For example, this is an expression; $x * 2
and so is this; $var1 + $var2 . It is the
end result of these expressions that is evaluated for truth.
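
The example that belongs here is not reproduced in this copy. A sketch that fits the discussion which follows, assuming two variables $x and $y that both hold 5, might be:

```perl
$x = 5;
$y = 5;
if ($x - $y) {
    print '$x-$y is ', $x - $y, " which is true\n";
} else {
    print '$x-$y is ', $x - $y, " which is false\n";   # this branch runs: 5-5 is 0
}
```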

The test fails because 5-5 of course is 0, which is false. The
print statement might look a little strange. Remember
that print is a list operator? So we hand it a list.
First item, a single-quoted string. It is single quoted because we do not
want to perform variable interpolation on it. Next item is an expression
which is evaluated, and the result printed. Finally, a double-quoted string is
used because we want to print a newline, and without the doublequotes the
\n won't be interpolated.

What is probably more useful than testing a specific variable for truth is
equality testing. For example, has your lucky number been drawn?
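
A small sketch of an equality test. The variable names and the numbers are invented for the occasion, so treat it as one possible way to write it and nothing more.

```perl
$lucky = 15;     # hypothetical lucky number
$drawn = 15;     # hypothetical number that came out of the hat
if ($lucky == $drawn) {          # == compares the two values as numbers
    print "Congratulations !\n";
} else {
    print "Guess who hasn't won !\n";
}
```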

The lt operator compares in a string context, and
of course < compares in a numeric context.
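
To see the two contexts disagree, here is a sketch using the numbers 291 and 30 mentioned in the next paragraph. The variable names $first and $second are assumptions.

```perl
$first  = 291;
$second = 30;
if ($first < $second) {          # numeric comparison: 291 is not less than 30
    print "$first is less than $second (numeric)\n";
}
if ($first lt $second) {         # string comparison: '2' sorts before '3'
    print "$first is less than $second (string)\n";
}
```

Only the second print should fire.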

Alphabetically, that is in a string context, 291 comes before 30. It is
actually decided by the ASCII value, but alphabetically is close enough. Change
the numbers around a little. Notice how Perl doesn't care whether it uses a
string comparison operator on a numeric value, or vice versa. This is
typical of Perl's flexibility.

Bondage and discipline are pretty much alien concepts to Perl (and the
author). This flexibility does have a drawback. If you're on a programming
precipice, threatening suicide by jumping off, Perl won't talk you out of your
decision but will provide several ways of jumping, stepping or falling to your
doom while silently watching your early conclusion. So be careful.

An interlude -- The Perl Motto

The Perl Motto is; "There is More Than One Way to Do It" or
TIMTOWTDI. Pronounced 'Tim-Toady'. This tutorial doesn't try and mention all
possible ways of doing everything, mainly because the author is far too lazy.
Write your Perl programs the way you want to.

The Comparison Operators Listed

The rest of the operators are:

Comparison                 Numeric   String

Equal                      ==        eq
Not equal                  !=        ne
Greater than               >         gt
Less than                  <         lt
Greater than or equal to   >=        ge
Less than or equal to      <=        le

The Golden Rule of Comparisons

They may be odious, but remember the following:

if you are testing a value as a string there should be only
letters in your comparison operator.

if you are testing a value as a number there should only be
non-alpha characters in your comparison operator.

note 'as a' above. You can test numbers as strings and vice versa.
Perl never complains, at least not unless you have asked it to warn you with the -w switch.

More About If: Multiples

It is easy to see what else does. If the
expression is false then whatever is in the else
block is evaluated (or carried out, executed, whatever term you choose
to use). Simple. But what if you want another test ? Perl can do that too.

elsif

If the first test fails, the second is evaluated. This carries on until
there are no more elsif statements, or an
else statement is reached. An else
statement is optional, and no elsif statements
should come after it. Logical, really.
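
A sketch of an if / elsif / else chain. The ages and limits are made-up values; only the shape of the statement matters.

```perl
$age = 25;        # hypothetical input
$max = 30;        # hypothetical upper limit
$min = 18;        # hypothetical lower limit
if ($age > $max) {
    $verdict = "Too old";
} elsif ($age < $min) {          # only tested if the first test failed
    $verdict = "Too young";
} else {                         # only reached if every test above failed
    $verdict = "Just right";
}
print "$verdict !\n";
```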

If you run it, it will return the same result - in this case. However, it is
Bad Programming Practice. In this case we are testing a number, but suppose we
were testing a string to see if it contained R or S. It is possible that a
string could contain both R and S. So it would pass both 'if' tests.
Using an elsif avoids this. As soon as the first
statement is true, no more elsif statements (and no
else statement) are executed.

I added some whitespace there for aesthetic beauty. There are other
operators that you can use instead of if and
unless , but that's for later on.

Incidentally, the two lines of code above do not do exactly the same thing.
Consider a maximum age of 50 and input age of 50. Therefore, you should be very
careful about your logic when writing code (nice obvious statement there).
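
The two lines being discussed are not reproduced in this copy. One pair that behaves as described, assuming variables called $age and $max, could look like this:

```perl
$max = 50;
$age = 50;
$with_if     = 0;
$with_unless = 0;
$with_if     = 1 if     $age > $max;   # 50 > 50 is false, so nothing happens
$with_unless = 1 unless $age < $max;   # 50 < 50 is false, so unless does fire
print "if fired: $with_if, unless fired: $with_unless\n";
```

With both values at 50 the if version stays quiet and the unless version fires, which is exactly the sort of boundary case worth checking.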

For those that were wondering, Perl has no case statement. This is all
explained in the FAQ, which is located at http://www.perl.com/.

User Input

STDIN and other filehandles

Sometimes you have to interact with the user. It is a pain, but sometimes
necessary, especially for the live ones. To ask for input and do something with
it try this:
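
The original example is missing here. A minimal sketch of reading a line from the user, assuming variable names $name and $reply, might be:

```perl
print "Please tell me your name: ";
$name = <STDIN>;                 # reads one line, including the newline at the end
$reply = "Thanks for making me happy, $name !\n";
print $reply;
```

Notice that whatever was typed still has its newline attached, so the exclamation mark ends up on a line of its own.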

Chop

and that fails with a syntax error. Can you spot why? Look at the error
code, look at the line number and see where the syntax is wrong. The answer is
a missing semicolon ( ; ) on the end of the last two lines.

If you add a ; to the end of line 3, but not to
the last line, then the program works as it should. This is because Perl
doesn't need a semicolon to end the last statement of a block. However, I'd
advise ending all your statements with semicolons because you may well be
adding more code to them and it is only one little keystroke.

When you add the semicolon(s), the program runs correctly. The
chop function removes the last character of whatever
it is given to chop, in this case removing the newline for us. In fact, that
can be shortened:

Aside from demonstrating the native English speaker's linguistic talents,
this script also introduces the or logical operator.
We'll cover or and its associates in more detail
later on. First, a word of warning.

Chopping is dangerous, as my friend One Hand Harold will tell you. Everyone
is concerned about various forms of safety these days, and your perl code
should be no exception.

Safe Chopping with Chomp

Rather than just wantonly remove the last character regardless of whatever
it is, without a care in the world, just simply consigning the poor little
thing to the Great Bit Bucket in the Sky, you can remove the last character
only if it is a newline with chomp :

chomp ($name=<STDIN>);
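
To see the difference between the two functions without typing anything, here is a sketch that uses fixed strings instead of <STDIN>. All the variable names are invented for the demonstration.

```perl
$with_newline    = "Harold\n";
$without_newline = "Harold";
chop  ($chopped1 = $with_newline);      # removes the newline
chop  ($chopped2 = $without_newline);   # removes the final 'd', ouch
chomp ($chomped1 = $with_newline);      # removes the newline
chomp ($chomped2 = $without_newline);   # no newline, so nothing is removed
print "chop : '$chopped1' '$chopped2'\n";   # chop : 'Harold' 'Harol'
print "chomp: '$chomped1' '$chomped2'\n";   # chomp: 'Harold' 'Harold'
```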

At this point the perl gurus are screaming "I found an error !". Well,
chomp doesn't always remove the last character if it
is a newline but if it doesn't, you have set a special variable, namely
$/ , to something different. I presume that if you do
set $/ you know what it does. It is explained later
in this very document. Of course, being a good pupil, you wouldn't experiment
with the unknown, blindly changing things just for the hell of it to see what
happens.

If you don't, you'll never learn anything useful.

Arrays

Lists, herds -- what are arrays?

Perl has two types of array, associative arrays (hashes) and arrays. Both
types are lists. A list is just a collection of variables referred to as the
collection, not as individual elements.

You can think of Perl's lists as a herd of animals. List context refers to
the entire herd, scalar context refers to a single element. A list is a herd of
variables. The variables don't have to be all of the same type -- you might
have a herd of ten sheep, three lions and two wolves. It would probably be just
three lions and one wolf before long, but bear with me. In the same way, you
might have a Perl list of three scalar variables, two array elements and ten
hash elements.

Certain types of lists are known by certain names. Just as a herd of sheep
is called a flock, a herd of lions is called a pride, a herd of wolves is
called a pack and a herd of managers a confusion, some types of Perl list have
special names.

Basic Array Work

For example, an array is an ordered list of scalar variables. This
list can be referred to as a whole, or you can refer to individual elements in
the list. The program below defines an array, called
@names . It puts five values into the array.

@names=("Muriel","Gavin","Susanne","Sarah","Anna");
print "The elements of \@names are @names\n";
print "The first element is $names[0] \n";
print "The third element is $names[2] \n";
print 'There are ',scalar(@names)," elements in the array\n";

Firstly, notice how we define @names . As it is
in a list context, we are using parens. Each value is comma separated,
which is Perl's default list delimiter. The double quotes are not
necessary, but as these are string values it makes it easier to read and change
later on.

Next, notice how we print it. Simply refer to it as a whole, that is in
list context. List context means referring to more than one element of
a list at a time. The code print @names; will work
perfectly well too. But....

I usually learn something about Perl every time I work with it. When running
a course, a student taught me this trick which he had discovered:

When a list is placed inside doublequotes, it is space delimited when
interpolated. Useful.
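
A quick sketch of the difference, using the same @names array as above. The extra variable $quoted is only there to show that the spaces are part of the interpolated string.

```perl
@names = ("Muriel","Gavin","Susanne","Sarah","Anna");
print @names, "\n";        # MurielGavinSusanneSarahAnna
print "@names\n";          # Muriel Gavin Susanne Sarah Anna
$quoted = "@names";        # the interpolated string, spaces and all
```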

If we want to do anything with the array as a list, that is doing
something with more than one value, then refer to the array as
@array . That's important. The
@ prefix is used when you want to refer to more than
one element of a list.

When you refer to more than one, but not all elements of an array that is
known as a slice . Cake analogies are appropriate. Pie analogies are
probably healthier but equally accurate.

Elements of Arrays

Arrays are not much use unless we can get to individual elements. Firstly,
we are dealing with a single element of the list, so we cannot use
@ which refers to multiple elements of the array.
It is a single, scalar variable, so $ is used.
Secondly, we must specify which element we want. That's easy -
$array[0] for the first,
$array[1] for the second and so forth. Array indexes
start at 0, unless you do something which is so highly deprecated ('deprecated'
means allowed, usually for backwards compatibility, but disapproved of because
there are better ways) I'm not even going to mention it.

Finally, we force what is normally list context (more than one element) into
scalar context (single element) to give us the number of elements in the array.
Without the scalar , print would be handed the array in list context and would
print the elements themselves, run together with no spaces between them.

How to refer to elements of an array

Please understand this:

$myvar="scalar variable";
@myvar=("one","element","of","an","array","called","myvar");
print $myvar; # refers to the contents of a scalar variable called myvar
print $myvar[1]; # refers to the second element of the array myvar
print @myvar; # refers to all the elements of array myvar

The two variables $myvar and
@myvar are not, in any way, related. Not even
distantly. Technically, they are in different namespaces.

Going back to the animal analogy, it is like having a dog named 'Myvar' and
a goldfish called 'Myvar'. You'll never get the two mixed up because when you
call 'Myvar !!!!' or open a can of dog food the 'Myvar' dog will come running
and goldfish won't. Now, you couldn't have two dogs called 'Myvar' and in the
same way you can't have two Perl variables in the same namespace called
'Myvar'.

More ways to access arrays

The element number can be a variable.

print "Enter a number :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah","Anna");
print "You requested element $x who is $names[$x]\n";
print "The index number of the last element is $#names \n";

This is useful. Notice the last line of the example. It returns the index
number of the last element. Of course you could always just do this
$last=scalar(@names)-1; but this is more efficient.
It is an easy way to get the last element, as follows:

print "Enter the number of the element you wish to view :";
chomp ($x=<STDIN>);
@names=("Muriel","Gavin","Susanne","Sarah","Anna","Paul","Trish","Simon");
print "The first two elements are @names[0,1]\n";
print "The first three elements are @names[0..2]\n";
print "You requested element $x who is $names[$x-1]\n"; # starts at 0
print "The elements before and after are : @names[$x-2,$x]\n";
print "The first, second, third and fifth elements are @names[0..2,4]\n";
print "a) The last element is $names[$#names]\n"; # one way
print "b) The last element is $names[-1]\n"; # different way

It looks complex, but it is not. Really. Notice you can have multiple values
separated by a comma. As many as you like, in whatever order. The range
operator .. gives you everything between and
including the values. And finally look at how we print the last element -
remember $#names gives us a number ? Simply
enclose it inside square brackets
and you have the last element.

Do also note that because element accesses such as
[0,1] are more than one variable, we cannot use the
scalar prefix, namely the $ symbol.
We are accessing the array in list context, so we use the
@ symbol. Doesn't matter that it is not the
entire array. Remember, accessing more than one element of an array but not the
entire array is called a slice. I won't go over the food analogies again.

For Loops

A for Loop demonstrated

All well and good, but what if we want to load each element of the array in
turn ? Well, we could build a for loop like this:
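
The loop itself is missing from this copy. A sketch that matches the description which follows, assuming the @names array from the earlier examples, is:

```perl
@names = ("Muriel","Gavin","Susanne","Sarah","Anna");
for ($x = 0; $x <= $#names; $x++) {    # initialise; test; modify
    print "$names[$x]\n";
}
```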

which sets $x to 0, runs the loop once, then adds
one to $x , checks it is less than or equal to
$#names , and if so carries on. By the way, that was your
introduction to for loops. Just to go into a little
detail there, the for loop has three parts to it:

Initialisation

Test Condition

Modification

In this case, the variable $x is initialised to 0.
It is immediately tested to see if it is smaller than, or equal to
$#names . If that is true, then the block is executed
once. Critically, if it is not true the block is not executed at
all.

Once the block has been executed, the modification expression is evaluated.
That's $x++ . Then, the test condition is checked to
see if the block should be executed or not.

For loops with .. , the range operator

There is another version:

for $x (0 .. $#names) {
print "$names[$x]\n";
}

which takes advantage of the range operator ..
(two dots together). This simply gives $x the value of 0, then
increments $x by 1 until it is equal to
$#names.

foreach

For true beauty we must use foreach .

foreach $person (@names) {
print "$person";
}

This goes through each element ('iterates', another good technical word to
use) of @names , and assigns each element in turn to
the variable $person . Then you can do what you like
with the variable. Much easier. You can use

for $person (@names) {
print "$person";
}

if you want. Makes no difference at all, aside from a little clarity.

The infamous $_

In fact, that gets shorter. And now I need to introduce you to
$_ , which is the Default Input and Pattern
Searching Variable.

foreach (@names) {
print "$_";
}

If you don't specify a variable to put each element into,
$_ is used instead as it is the default for this
operation, and many, many others in Perl. Including the
print function :

foreach (@names) {
print ;
}

As we haven't supplied any arguments to print ,
$_ is printed as default. You'll be seeing
a lot of $_ in Perl. Actually, that statement is not
exactly true. You will be seeing a lot of places where
$_ is used, but quite often when it is used, it is
not actually written. In the above example, you don't actually see
$_ but you know it is there.

A Premature End to your loop

A loop, by its nature, continues. If that didn't make sense, start reading
this sentence again.

The old jokes are the best, aren't they?

The joke above is a loop. You continue re-reading the sentence until you
realise I'm trying to be funny. Then you exit the loop. Or maybe somebody
doesn't exit it. Whatever, loops always run until the expression they are
testing returns false. In the case of the examples above, a false value is
returned when all the elements of the array have been cycled through, and the
loop ends.

If you want an everlasting loop, just test a condition you know will always
be true:

while (1) {
$x++;
print "$x: Did you know you can press CTRL-C to interrupt a perl program?\n";
}

Another way to exit a loop is a simple foreach
over the elements, as we have seen. But what if we don't know when we want to exit a
loop? For example, suppose we want to print out a list of names but stop when
we find one with a particular title? You are throwing a huge party, someone is
allergic to vodka, and this person has drunk from the punch bowl despite being
assured by someone holding two empty bottles of Absolut that he was just using
the bottles to convey yet more orange juice into said punch bowl. So you need a
doctor, and so you write a Perl script to find one from the list of attendees,
wanting the doctor's name to be the last item printed:
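
The script is not reproduced in this copy. A sketch along the lines described, with an invented guest list and an extra variable $last_printed added only so the result can be checked afterwards:

```perl
@names = ('Mrs Smith', 'Mr Jones', 'Ms Samuel', 'Dr Jansen', 'Sir Philip');
foreach $person (@names) {
    print "$person\n";
    $last_printed = $person;          # remember what we printed most recently
    last if $person =~ /Dr /;         # leave the loop as soon as a 'Dr ' turns up
}
```

Sir Philip is never printed, because the loop has already ended at Dr Jansen.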

The last operator is our friend. Don't worry about
the /Dr / business -- that is a regular expression which we cover
next. All you need to know is that it returns true if the name contains
'Dr '. When it does return true, last is
executed and the loop ends early.

A little more control over the premature ending: Labels

So that's easy enough. But wait! We need a medical, human-fixer type doctor,
not just anyone with a PhD. So, the same principle applies in this example
here:
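
The example is missing from this copy. The following is a guess at its shape, based on the names mentioned in the discussion; the other names and the counter $outer_runs are assumptions added for illustration.

```perl
@names  = ('Mrs Smith', 'Mr Jones', 'Ms Samuel', 'Dr Jansen', 'Sir Philip');
@medics = ('Dr Black', 'Dr Waymour', 'Dr Jansen', 'Dr Pettle');
$outer_runs = 0;                          # counts trips round the outer loop
foreach $person (@names) {
    $outer_runs++;
    print "$person\n";
    if ($person =~ /Dr /) {
        foreach $doc (@medics) {
            print "\t$doc\n";
            last if $doc eq $person;      # only leaves the inner loop
        }
    }
}
```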

Aside from showing one way to indent your code, this also demonstrates a
nested loop. A nested loop is a loop within a loop. What happens is that the
@names array is searched for a 'Dr ', and if it is found then the
@medics array is searched to make sure the doctor is a
human-fixing doctor not a professor of physics or something. The regular
expression has been shifted into an if statement,
where it works nicely as it only returns true or false.

The problem with the code is that after we find our medical doctor we want
it to stop. But it doesn't. It only stops the loop it is in, so Dr Pettle
never gets printed. However, the code just carries on with Sir Philip who is
terribly sorry old chap, but can't be of any bally use at all, what ho! What we
need is a way to break out of the entire loop from within a nest. Like so:
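
A sketch of a nested loop that uses a label to leave both loops at once. The guest list, the list of medics and the counter $outer_runs are assumptions; the label name LBL is the one used in the text.

```perl
@names  = ('Mrs Smith', 'Mr Jones', 'Ms Samuel', 'Dr Jansen', 'Sir Philip');
@medics = ('Dr Black', 'Dr Waymour', 'Dr Jansen', 'Dr Pettle');
$outer_runs = 0;                          # counts trips round the outer loop
LBL: foreach $person (@names) {           # the label sits on the outer loop
    $outer_runs++;
    print "$person\n";
    if ($person =~ /Dr /) {
        foreach $doc (@medics) {
            print "\t$doc\n";
            last LBL if $doc eq $person;  # leaves the loop labelled LBL
        }
    }
}
```

This time the outer loop never gets as far as Sir Philip.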

Only two changes here. We have defined a label, namely LBL.
Instead of breaking out from the current loop, which is the default, we specify
a label to break out to, which is in the outer loop. This works with as many
nested loops as your brain can handle. You don't have to use uppercase names
but for namespace reasons it is recommended, and you can call your labels
whatever you please. I was just being unimaginative with the name of LBL, feel
free to invent labels called DORIS or MATILDA if that's what floats your
personal boat.

This is worth looking at in more detail. It appears there is no fifth
element of @cities , as referred to by
@cities[2..4] .

Actually, there is a fifth element. Add this to the end of the example :

print "There are ",scalar(@names)," elements in \@names\n";

There appear to be 8 elements in @names . However,
we have just proved there are in fact 9. The reason there are 9 is that we
referred to non-existent elements of @cities , and
Perl has quite happily extended @names to suit. The array
@cities remains unchanged. Try
popping the array if you don't believe me.

Now we have two arrays. The pop function removes
the last element of an array and returns it, which means you can do something
like assign the returned value to a variable. The
unshift function adds a value to the beginning of the
array. Hope you didn't forget that
&subroutinename calls a subroutine. Presented
below are the functions you can use to work with arrays:

A table of array hacking functions

push      Adds value to the end of the array
pop       Removes and returns value from end of array
shift     Removes and returns value from beginning of array
unshift   Adds value to the beginning of array

Now, accessing other elements of arrays. May I present the
splice function ?

Splice

The first argument for splice is an array. The
second is the offset. The offset is the index number of the list element to
begin splicing at. In this case it is 1. Then comes the number of elements to
remove, which is sensibly 1 or more in this case. You can set it to 0 and perl,
in true perl style, won't complain. Setting to 0 is handy because
splice can add elements to the middle of an array,
and if you don't want any deleted 0 is the number to use. Like so:
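
A sketch of splice adding elements without removing any. Hamburg is mentioned in the text; the other city names are invented for the example.

```perl
@cities = ("Brussels", "Hamburg", "London", "Breda");
splice (@cities, 1, 0, "Paris", "Rome");   # offset 1, remove 0 elements, insert two
print "@cities\n";                         # Brussels Paris Rome Hamburg London Breda
```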

would be appropriate. Certainly Hamburg is removed. Shame, such a great
lake. But note, the array element still exists. There are still four elements
in @cities. So what we need is the appropriate
splice function, which removes the element entirely.

splice (@cities, 1, 1);

Now that's all well and good for arrays. What about ordinary variables, such
as these:

It looks like we have deleted the $car variable.
Pity. But think about it. It is not deleted, it is just set to the null string
"". As you recall (hopefully) from previous ramblings, the null string
evaluates to false so the if test fails.

False values versus Existence: It is, therefore...

Just because something is false doesn't mean to say it doesn't exist. A wig
is false hair, but a wig exists. Your variable is still there. Perl does have a
function to test if something exists. Existence, in Perl terms, means defined.
So:

print "Car is defined !\n" if defined $car;

will evaluate to true, as the $car variable does
in fact exist.

This begs the question of how to really wipe variables from the face of the
earth, or at least your Perl script. Simple.
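
The example that followed is missing from this copy. One way to do it, which may or may not be the one originally shown, is the built-in undef function. The value given to $car and the variable $state are assumptions.

```perl
$car = "Porsche 911";      # hypothetical value
undef $car;                # $car now has no value at all, not even ""
if (defined $car) {
    $state = "defined";
} else {
    $state = "not defined";
}
print "Car is $state\n";   # Car is not defined
```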

Basic Regular Expressions

An introduction

Or regex for short. These can be a little intimidating. But I'll bet
you have already used some regex in your computing life so far. Have you ever
said "I'll have any Dutch beer ?" That's a regex which will match a Grolsch or
Heineken, but not a Budweiser, orange juice or cheese toastie. What about
dir *.txt ? That's a regular expression too, listing any files ending
in .txt.

Perl's regex often look like this:

$name=~/piper/

That is saying "If 'piper' is inside $name, then
True."

The regular expression itself is between
/ / slashes, and the =~
operator tells Perl which string to search.

An example is called for. Run this, and answer it with 'the faq'. Then try
'my tealeaves' and see what happens.
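
The interactive script is missing from this copy. As a stand-in, this sketch feeds the two suggested answers through the same kind of test without asking you to type anything. The messages and the counters $right and $wrong are invented.

```perl
$right = 0;
$wrong = 0;
foreach $answer ('the faq', 'my tealeaves') {   # canned answers instead of typed input
    $_ = $answer;
    if ($_ =~ /the faq/) {
        print "Right ! ($_)\n";
        $right++;
    } else {
        print "Wrong ! ($_)\n";
        $wrong++;
    }
}
```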

So here $_ is searched for 'the faq'. Guess
what we don't need ! The =~ .
This works just as well:

if (/the faq/) {

because if you don't specify a variable, then perl searches
$_ by default. In this particular case, it would be
better to use

if ($_ eq "the faq") {

as we are testing for exact matches.

Sensitivity -- regexes in touch with their inner child

But what if someone enters 'The FAQ' ? It fails, because the regex is case
sensitive. We can easily fix that:

if (/the faq/i) {

with the /i switch, which specifies
case-insensitivity. Now it works for all variations, such as "the Faq" and
"the FAQ".

Now you can appreciate why a regular expression is better in this situation
than a simple test using eq . As the regex searches
one string for another string, a response of "I would read the FAQ first !"
will also work, because "the FAQ" will match the regex.

Study this example just to clarify the above. Tabs and spaces have been
added for aesthetic beauty:

$_="perl for Win32"; # sets the string to be searched
if ($_=~/perl/) { print "Found perl\n" }; # is 'perl' inside $_ ? $_ is "perl for Win32".
if (/perl/) { print "Found perl\n" }; # same as the regex above. Don't need the =~ as we are testing $_
if (/PeRl/) { print "Found PeRl\n" }; # this will fail because of case sensitivity
if (/er/) { print "Found er\n" }; # this will work, because there is an 'er' in 'perl'
if (/n3/) { print "Found n3\n" }; # this will work, because there is an 'n3' in 'Win32'
if (/win32/) { print "Found win32\n" }; # this will fail because of case sensitivity
if (/win32/i) { print "Found win32 (i)\n" }; # this will *work* because of case insensitivity (note the /i)
print "Found!\n" if / /; # another way of doing it, this time looking for a space
print "Found!!\n" unless $_!~/ /; # both these are the same, but reversing the logic with unless and !
print "Found!!\n" unless !/ /; # don't do this, it will always never not confuse nobody :-)
# the ~ stays the same, but = is changed to ! (negation)
$find=32; # Create some variables to search for
$find2=" for "; # some spaces in the variable too
if (/$find/) { print "Found '$find'\n" }; # you can search for variables like numbers
if (/$find2/) { print "Found '$find2'\n" }; # and of course strings !
print "Found $find2\n" if /$find2/; # different way to do the above

As you can see from the last example, you can embed a variable in the
regex too. Regular expressions could fill entire books (and they have done,
see the book critiques at http://www.perl.com/) but here are some useful
tricks:
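
The program that belongs here is not reproduced in this copy. A sketch that fits the discussion below, with an invented list of names and a counter $matches, arranged so that the if test sits on line 4:

```perl
@names = qw(Karlson Carleon Karla Carla Karin Carina Needanotherword);
$matches = 0;
foreach (@names) {                 # each element is put into $_ in turn
    if (/[KC]arl/) {
        print "Match !  $_\n";
        $matches++;
    } else {
        print "Sorry.   $_\n";
    }
}
```

The first four names should match; Karin, Carina and the last one should not.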

This time @names is initialised using whitespace
as a delimiter instead of a comma. qw refers to
'quote words', which means split the list by words. A word ends with whitespace
(like tabs, spaces, newlines etc).

The square brackets enclose single characters to be matched. Here
either Karl or
Carl must be in each element. It doesn't have to be
two characters, and you can use more than one set. Change Line 4 in the above
program to:

if (/[KCZ]arl[sa]/) {

matches if something contains K, C, or Z, then arl, then either s or a.
It does not match KCZarl. Negation is possible too, so try this :

if (/[KCZ]arl[^sa]/) {

which returns things containing K, C or Z, then
arl, and then anything EXCEPT s or a. The caret
^ has to be the first character, otherwise it doesn't
work as the negation. Having said [ ] defines single
characters only, I should mention that these two are the same :

/[abcdeZ]arl/;
/[a-eZ]arl/;

if you use a hyphen then you get the list of characters including the
start and finish characters. And if you want to match a special character
(metacharacter), you must escape it:

/[\-K]arl/;

matches Karl or -arl. Although the
- character is represented by two characters,
it is just the one character to match.

Matching at specific points

If you want to match at the end of the line, make sure a
$ is the last character in the regex. This one pulls
out all those names ending in a. Slot it into the example above :

if (/a$/) {

And there is a corresponding character, the caret
^ , which in this context matches at the
beginning of the string. Yes, the caret also negates a character class
like this [^KCZ]arl but in this case it anchors
the match to the beginning of the string.

if (/n/i) {
if (/^n/i) {

The first one is true if the word contains an 'n' anywhere in it. The
second specifies that the 'n' must be at the beginning of the string to be
matched. Use this anchor where you can, because it makes the whole regex
faster, and safer if you know what the first character must be.

Negating the regex

Returning the Match

Now things get interesting. What if we want to pull something out of a string ?
So far all we have done is test for truth, that is say yea or nay if a string
matches, but not return what we found. Run this:
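
The example is missing from this copy. A sketch consistent with the explanation that follows, assuming a string containing an address written with capitals (the exact string is a guess):

```perl
$_ = 'My email address is <Robert@NetCat.co.uk>.';
/(<robert\@netcat.co.uk>)/i;        # parens around the whole regex, /i for any case
print "Found it ! $1\n";            # Found it ! <Robert@NetCat.co.uk>
```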

Firstly, note the single quotes when $_ is
assigned. If there were double quotes, we'd need \@
instead of @ . Remember, double quotes "" allow
variable interpolation, so Perl looks for an array called
@NetCat which does not exist.

Secondly, look at the parens around the entire regex. If you use parens, a
side effect is that the first match is put into a variable called
$1 . We'll get to the main effect later. The second
match goes into $2 and so on. Also note that the
\@ has been escaped, so perl doesn't think it is an
array. Remember \ either escapes a special character,
or gives a special meaning. Think of it as Superman's telephone box. Imagine
Clark Kent walking around with his magic partner Back Slash.

Notice how we specify in the regex case-insensitivity with
/i and the regex returns the case-sensitive
string - that is, exactly what it found.

See, you can have more than one ! Look at the above regex. Looks
easy now, don't you think ? What about five minutes ago ? It would have looked
like a typing mistake ! Well, there are some hairier regex to come, but you'll
have a good barber.

* + -- regexes become line noise

When you see an if statement like this, read it right to left. The print
statement is only executed if the code on the right of the expression is
true.
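
The statement in question is not reproduced in this copy. Going by the walk-through that follows, it was probably something close to this sketch; the address in the string is an assumption pieced together from the later discussion.

```perl
$_ = 'My email address is <webslave@work.com>.';
print "Found it ! :$1:\n" if /(<.*>)/i;     # Found it ! :<webslave@work.com>:
```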

We'll discuss this. Firstly, we have the opening parens ( . So everything
from ( to ) will be put into $1 if the match is successful. Then comes the
first character of what we are searching for, < . Then we have a dot, or
period . . For this regex, we can assume . matches any character at all.

So we are now matching < followed by any character. The * means 0 or more
of the previous character. The regex finishes by requiring > .

This is important. Get the basics right and all regex are easy (I read somewhere once).
An example best illustrates the point. Slot this regex in instead:
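
The replacement regex is missing from this copy. Judging by the step-by-step explanation below, it drops the dot so that the * applies to < instead. A sketch, with the same assumed string as before:

```perl
$_ = 'My email address is <webslave@work.com>.';
print "Found it ! :$1:\n" if /(<*>)/i;      # Found it ! :>:
```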

The regex starts, logically, at the start of the string. This doesn't mean it starts at
'M'; it starts just before 'M'. There is a 'nothing' between the string start and 'M'.

The regex is searching for <* , which is 0
or more < .

The first thing it finds is not < , but
the nothing in between the start of the string and the 'M' from 'My email...'. Does
this match ?

As the regex is looking for "0 or more" < ,
we can certainly say that there are 0 < at the
start of the string. So the match is, so far, successful. We have dealt with
<* .

However, the next item to match is > .
Unfortunately, the next item in the string is 'M', from 'My email...'. The match
fails at this point. Sure, it matched < without
any problem, but the complete match has to work.

The only two characters that can match successfully at this point are
< or > .
The 'point' being that <* has been matched
successfully, and we need either > to
complete the match or more of < to continue
the '0 or more' match denoted by * .

'M' is neither of them, so it fails at this point, when it has matched

Quick clarification - the regex cannot successfully match < , then skip on
ahead through the string until it matches > . The characters in the string
between < > also need to match the regex, and they don't in this case.

All is not lost. Regexes are hardy little beasts and don't give up easily. An attempt
is made to match the regex wherever possible. The regex system keeps trying the match at
every possible place in the string, working towards the end.

Let's look at the match when it reaches the 'm' in 'work.com'.

Again, we have here 0 < . So the match
works as before. After success on <* the next
character is analysed - it is a > , so the
match is successful.

But, be warned. The match may be successful but your job is not done. Assuming the
objective was to return the email address within the angle brackets then that regex is
a miserable failure. Watch for traps of this nature when regexing.

Pretty much the same as the above, except the parens are moved so
we return what's only inside the tags, not including the tags themselves. Also
note how / is escaped like so; \/
otherwise Perl thinks that's the end of the regex.

Now, suppose we change $_ to :

$_='HTML <I>munging</I> time is here <I>again</I> !.';

and run it again. Interesting effect, eh ? This is known as Greedy
Matching. What happens is that when Perl finds the initial match, that is <I> ,
the .* grabs everything up to the end of the string and then
works back from there to find </I> , so the longest possible string
matches. This is fine unless you want the shortest string. And there is a
solution:

/<I>(.*?)<\/I>/i;

Just add a question mark and Perl does stingy matching. No
nationalistic jokes. I have Dutch and Scottish friends I don't want to offend.
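A side-by-side sketch of greedy and stingy matching on the same string. The variable names are mine; the string is kept on one line because . does not match a newline by default.

```perl
use strict;
use warnings;

my $html = 'HTML <I>munging</I> time is here <I>again</I> !.';

# Greedy: .* takes as much as it can and still lets </I> match.
my ($greedy) = $html =~ /<I>(.*)<\/I>/i;

# Stingy: .*? takes as little as it can.
my ($stingy) = $html =~ /<I>(.*?)<\/I>/i;

print "Greedy: $greedy\n";
print "Stingy: $stingy\n";
```

The greedy version runs from the first <I> to the last </I> , while the stingy one stops at the first </I> it meets.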

The Difference Between + and *

You know what * means, namely match 0 or
more. If you want to match 1 or more, then use + .
The difference is important.

$_='The number is 2200 and the day is Monday';
($star)=/([0-9]*)/;
($plus)=/([0-9]+)/;
print "Star is '$star' and Plus is '$plus'\n";

You'll note that $star is empty. The match was successful
though. It managed to match 0 or more characters from 0 to 9 at the very start
of the string, where there are exactly 0 of them.

The second regex with $plus worked a little better, because we
are matching one or more characters from 0 to 9. Therefore, unless one 0 to 9
is found the match will fail. Once a 0-9 is found, the match continues as long
as the next character is 0-9, then it stops.

Now we know this, there is another way to remove an email address from
within angle brackets:

This regex matches <. Then the capturing parens
start. They have no effect on this regex other than to capture the match.
After that, there is a character class, containing one character. As ^
is the first character in the class, it negates the class. That's why
we are using a character class with only one character in it, because it can
be negated.

So far we have matched < and anything that is not
>. The + ensures we match as many characters that
are not >'s as we can. Here this has the same effect as
.*? but is more efficient. It may also suit your purposes, as
.*? relies on you knowing what you want to match up to, whereas
[^>]+ simply continues matching until it finds something that
fails its criteria. Just make sure you understand the difference because it is
a crucial part of regexery.
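A regex of the kind described above might look like the sketch below. This is my reconstruction from the description, with a made-up sample string, not necessarily the exact code the text refers to.

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $text = 'My email address is <webslave@work.com> !.';

# < , then capture one or more characters that are not > , then > .
my ($address) = $text =~ /<([^>]+)>/;

print "Found $address\n";
```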

Re-using the match -- \1, $1...

Suppose we didn't know what HTML tag we had to match ? It could be B, I, EM
or whatever, and we want everything that is in between. Well, HTML container
tags like B and EM have end tags which are the same as the start tag, except
for the / . So what we could do is:

find out what is inside < >

search for exactly the same tag, but with the closing /

return whatever is in between.

Can this be done ? Of course. This is perl, all things are possible. Now, remember the
side effect of parens. I promise I'll explain the primary effect at some point. If
whatever is in (parens) matches, the result is stored in a variable called $1 . So we can use <(.*?)>
which will find us < then as
many anythings (the . and *
) up to the next, not last > (the
? forces stingy matching).

The result is stored in $1 because we used
parens. Next, we need everything up to the closing tag. That's easy : (.*?) matches everything up until the next character or set of characters. And how exactly do we define where to stop ?

We can use $1 even in the same regex it was
found in. However, it is not referred to within a regex as $1 ,
but \1 .

So we want to match </$1> which in perl
code is <\/\1> . The /
must be escaped because otherwise it would be taken as the end of the regex, and 1 is
escaped so it refers to $1 instead of matching the number 1.
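Putting those pieces together gives something like this sketch. The sample string and variable names are assumptions of mine.

```perl
use strict;
use warnings;

my $html = 'HTML <B>munging</B> time is here <I>again</I> !.';

# $1 is the tag name, \1 insists the closing tag uses the same name,
# and $2 is whatever sits between the two tags.
my ($tag, $contents) = $html =~ /<(.*?)>(.*?)<\/\1>/;

print "Tag '$tag' contains '$contents'\n";
```

Only the first container is returned here, because the regex stops after its first successful match.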

If you want to know how to return all the matches above, read on.
But before that:

How to Avoid Making Mountains while Escaping Special Characters

You want to match this; http://language.perl.com/faq/ . That's a real
(useful) URL by the way. Hint. To match it, you need to do this:

/http:\/\/language\.perl\.com\/faq\//;

which should make the awful metaphor above clearer, if not
funnier. The slash, / , is not
normally a metacharacter but as it is being used for the regular expression
delimiters, it needs to be escaped. We already know that .
is special.

Fortunately for our eyes, Perl allows you to pick your delimiter if you
prefix it with 'm' as this example shows. We'll use a #:

m#http://language\.perl\.com/faq/#;

Which is a huge improvement, as we change /
to # . We can go further with readability by quoting
everything:

m#\Qhttp://language.perl.com/faq/\E#;

The \Q escapes everything
up until \E or the regex delimiter (so
we don't really need the \E above). In this case #
will not be escaped, as it delimits the regex.
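A small sketch of why the quoting matters. The lookalike URL is invented for the demonstration: an unescaped . happily matches any character, while \Q makes it a literal dot.

```perl
use strict;
use warnings;

my $url       = 'http://language.perl.com/faq/';
my $lookalike = 'http://languageXperl.com/faq/';   # made-up near miss

# Unescaped dots: the first . matches the X, so the lookalike gets through.
my $plain_hits_lookalike = $lookalike =~ m#http://language.perl.com/faq/# ? 1 : 0;

# \Q ... \E quotes the metacharacters, so only the real thing matches.
my $quoted_hits_lookalike = $lookalike =~ m#\Qhttp://language.perl.com/faq/\E# ? 1 : 0;

# \Q also works on an interpolated variable.
my $quoted_hits_real = $url =~ m#\Q$url\E# ? 1 : 0;

print "plain: $plain_hits_lookalike, quoted: $quoted_hits_lookalike, real: $quoted_hits_real\n";
```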

Someone once posted a question about this to the Perl-Win32-Users mailing
list and I was so intrigued about this apparently undocumented trick I spent
the next twenty minutes figuring it out by trial and error, and posted a reply.
Next day I found lots of messages telling the poster to read the manual because
it was clearly documented. <face colour='red' intensity='high'> My excuse
was I didn't have the docs to hand....moral of the story - RTFM and RTF FAQs !

Substitution and Yet More Regex Power

Basic changes

Suppose you want to replace bits of a string. For example, 'us' with 'them'.

$_='Us ? The bus usually waits for us, unless the driver forgets us.';
print "$_\n";
s/Us/them/; # operates on $_, otherwise you need $foo=~s/Us/them/;
print "$_\n";

What happens here is that the string 'Us' is searched for, and
when a match is found it is replaced with the right side of the expression, in
this case 'them'. Simple.

You'll notice that only one substitution was made. To substitute globally use
/g which runs through the entire string,
changing wherever it can. Try:

s/Us/them/g;

which fails to change anything else. This is because regexes are, by default,
case-sensitive. So:

s/us/them/ig;

would be a better bet. Now, everything is changed. A little too much, but
one problem at a time. Everything you have learned about regex so far
can be used with s/// , like parens, character
classes [ ] , greedy and stingy matching and much
more. Deleting things is easy too. Just specify nothing as the replacement,
like so s/Us//; .

So we can use some of that knowledge to fix this problem. We need to make
sure that a space precedes the 'us'. What about:

s/ us/them/g;

A small improvement. The first 'Us' is now no longer changed, but one
problem at a time ! We'll first consider the problem of the regex changing
'usually' and other words with 'us' in them.

What we are looking for is a space, then 'us', then a comma, period or
space. We know how to specify one of a number of options - the character class.

s/ us[. ,]/them/g;

Another tiny step. Unfortunately, that step wasn't really in the right
direction, more on the slippery slope to Poor Programming Practice. Why?
Because we are limiting ourselves. Suppose someone wrote ' send it to us;
when we get it'.

You can't think of all the possible permutations. It is often easier, and
safer, to simply state what must not follow the match. In this case, it
can be anything except a letter. We can define that as a-z. So we can add that
to the regex.

s/ us[^a-z]/ them/g;

The caret ^ negates the character class, and
a-z represents every letter from a to z
inclusive. A space has been added to the substitution part - as the original
space was matched, it should be replaced to maintain readability.

\w

What would be more useful is to use a-zA-Z
instead. If we weren't using /i we'd need
that. As a-zA-Z is such a common construct, Perl
provides an easy shorthand:

s/ us[^\w]/ them/g;

The \w construct actually means 'word' -
equivalent to a-zA-Z_0-9 .
So we'll use that instead.

To negate any construct, simply capitalise it:

s/ us[\W]/ them/g;

and of course we don't need the negating caret now. In fact, we
don't even need the character class !

s/ us\W/ them/g;

So far, so good. Matching the first 'us' is going to be difficult though.
Fortunately, there is an easy solution. We've seen Perl's definition of a
word - \w . Between each word is a boundary. You
can match this with \b .

s/\bus\W/ them/g;

that's \b followed by 'us', not 'bus' :-) Now, we
require a word boundary before 'us'. As there is a 'nothing' at the start of
the string, we have a match. There is a space after the first 'Us', so the
match is successful. You might notice an extra space has crept in - that's
the space we added earlier. The match doesn't include the space any more - it
matches on the word boundary, that is just before the word begins. The space
doesn't count.

Replacing with what was found

Did you notice the final period and the comma are replaced ? They are part
of the match - it is the \W that matches them. We can't avoid that. We can
however put back that part of the match.

s/\bus(\W)/them\1/g;

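We start with capturing whatever the \W
matches, using parens. Then, we add it to the replacement string. The
capture is of course in $1 , and \1 refers to the same thing. Inside the
matching half of a regex you must use \1 ; in the replacement half \1 works
too, but $1 is the preferred spelling there and perl -w will tell you so.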
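Here is the whole thing as a runnable sketch. I have used $1 in the replacement and added /i so the leading 'Us' is caught too; both choices are mine and are not what the one-liner above does.

```perl
use strict;
use warnings;

my $text = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# \b demands a word boundary before 'us'; (\W) captures whatever
# non-word character follows so that it can be put back.
(my $fixed = $text) =~ s/\bus(\W)/them$1/gi;

print "$fixed\n";
```

'bus', 'usually' and 'unless' are left alone, and the comma and period survive. The capital letter at the start is lost, which is the final problem discussed next.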

The final problem is of course capitalising the replacement string when
appropriate. Which in old versions of the tutorial I left as an exercise to
the reader, having run out of motivation. A reader by the name of Paul Trafford
duly solved the problem, and I have just inserted his excellent explanation for
the elucidation of all concerned:

# Solution to the us/them problem...
#
# The program works through the text assigning the
# variable $1 to 'U' or 'u' for any words where this
# letter is followed by 's' and then by non 'word'
# characters. The latter is assigned to variable $2.
#
# For each such matching occurrence, $1 is replaced by
# the letter that precedes it in the alphabet using
# operations 'ord' and 'chr' that return the ASCII value
# of a character and the character corresponding to a
# given natural number. After this 'hem' is tacked on
# followed by $2, to retain the shape of the original
# sentence. The '/e' switch is used for evaluation.
#
# NOTES
# 1. This solution will not replace US (short for
# United States) with Them or them.
#
# 2. If a 'magical' decrement operator '--' existed for
# strings then the solution could be simplified for we
# wouldn't need to use the 'chr' and 'ord' operators.
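The comments above describe the approach well enough to sketch it. What follows is my reconstruction from that description, not necessarily the code as originally submitted.

```perl
use strict;
use warnings;

my $text = 'Us ? The bus usually waits for us, unless the driver forgets us.';

# $1 is 'U' or 'u', $2 is the non-word character after the 's'.
# chr(ord($1) - 1) steps back one letter: U becomes T, u becomes t.
# /e evaluates the right hand side as Perl code rather than as a string.
(my $fixed = $text) =~ s/\b([Uu])s(\W)/chr(ord($1) - 1) . 'hem' . $2/eg;

print "$fixed\n";
```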

this checks that there are no non-number characters in
$x . It's not perfect because it'll choke on
decimal points, but it's just an example. Writing your own number-checker is
actually quite difficult, but it is an interesting exercise. Try it, and see
how accurate yours is.

x

I hope you trusted me and typed the above in exactly as it is shown (or
pasted it), because the x is not a mistake,
it is a feature. If you were too smart and changed it to a
* or something, change it back and see what it does.

except it returns 1, and there were definitely two matches. The match
operator returns true or false, not the number of matches. So you can test
it for truth with functions like if, while,
unless. Incidentally, the s///
operator does return the number of substitutions.

To return what is matched, you need to supply a list.

($match) = /<i>(.*?)<\/i>/i;

which handily puts the first match into
$match . Note that an = is used (for
assignment), as opposed to =~ (to point the regex at a variable
other than $_ ).

The parens force a list context in this case. There is just the one element
in the list, but it is still a list. The entire match will be assigned to the
list, or whatever is in the parens. Try adding some parens:

In the example above notice /g has been added
so a global match is done - this means perl carries on matching
even after it finds the first match. Of course, you might not know how many
matches there will be, so you can just use an array, or any other type of list:

and @words will be grown to the appropriate
size for the matches. You really can supply what you like to be assigned to:

($word1, @words[2..3], $last) = /<i>(.*?)<\/i>/ig;

you'll need more italics for that last one to work. It was only a
demonstration.
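The different return values can be seen side by side in a sketch like this one. The sample string and variable names are assumptions of mine.

```perl
use strict;
use warnings;

my $html = 'HTML <i>munging</i> time is here <i>again</i> !.';

# Scalar context: just true (1) or false.
my $truth = $html =~ /<i>(.*?)<\/i>/i;

# List context, no /g: the captures of the first match.
my ($first) = $html =~ /<i>(.*?)<\/i>/i;

# List context with /g: the captures of every match.
my @words = $html =~ /<i>(.*?)<\/i>/ig;

# s/// returns the number of substitutions it made.
my $changes = (my $copy = $html) =~ s/<\/?i>//ig;

print "truth=$truth first=$first words=@words changes=$changes\n";
```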

There is another trick worth knowing. Because a regex returns true each time
it matches, we can test that and do something every time it returns true. The
ideal function is while which means 'do something as
long as the condition I'm testing is true'. In this case, we'll print out the
match every time it is true.

So the while operator runs the regex, and if it is true, carries
out the statements inside the block.

Try running the program above without the /g .
Notice how it loops forever ? That's because the expression always evaluates to
true. By using the /g we force the match to move on
until it eventually fails.
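A minimal sketch of that loop, using a sample string and an array of my own to collect the results:

```perl
use strict;
use warnings;

my $html = 'HTML <i>munging</i> time is here <i>again</i> !.';

my @seen;
# With /g in scalar context each pass resumes where the last match ended,
# so the condition finally becomes false and the loop stops.
while ($html =~ /<i>(.*?)<\/i>/ig) {
    push @seen, $1;
    print "Found: $1\n";
}
```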

Firstly, notice the subtle introduction of the
or operator, in this case |
, the pipe. What I really want to explain however, is that this regex
matches o followed by rd, ne or ld. Without the parens it would be
/ord|ne|ld/ which is definitely not what we want.
That matches just plain ord, or ne or ld.
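The difference the parens make shows up quickly with a few test words. The word list here is my own, chosen so that two of the words only match the version without parens.

```perl
use strict;
use warnings;

my @words = qw(sword done gold need held);

# o followed by rd, ne or ld
my @grouped = grep { /o(rd|ne|ld)/ } @words;

# ord, or ne, or ld, anywhere at all
my @ungrouped = grep { /ord|ne|ld/ } @words;

print "With parens:    @grouped\n";
print "Without parens: @ungrouped\n";
```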

(?: OR Efficiency)

The code above functions correctly. If you were wondering what a good
name is, Petra, Peter and Penny qualify. The regex is not as efficient as it
could be though. Think about what Perl is doing with the regex, that you are
just ignoring. Simply throwing away casually. Without consideration as to
the effort that has gone into creating it for you. The resources squandered.
The little bytes of memory whose sole function in life is to store this
information, which will never be used.

What's happening is that because parens are used, perl is creating
$1 for your usage and abusage. While this may not seem important,
a fair amount of resources go into creating $1, $2
and so on. Not so much the memory used to store them, more the CPU effort
involved. So, if you aren't going to use the parens for capturing purposes, why
bother capturing the match?

The second print statement demonstrates that nothing is captured
this time. You get the benefits of the paren's precedence-changing
capabilities, but without the overhead of the capturing. This benefit is
especially worthwhile if you are writing CGI programs which use parens in
regex -- with CGI, every little bit of efficiency counts.
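A sketch of the two kinds of parens. The regex is a stand-in of mine that accepts the three names mentioned above; it is not necessarily the one from the original example.

```perl
use strict;
use warnings;

my $name = 'Peter';

# Capturing parens: list context returns what the parens caught.
my @with_capture = $name =~ /^Pe(tra|ter|nny)$/;

# Non-capturing (?: ) parens: same grouping, nothing stored, so a
# successful match in list context simply returns (1).
my @without_capture = $name =~ /^Pe(?:tra|ter|nny)$/;

print "Capturing returned:     @with_capture\n";
print "Non-capturing returned: @without_capture\n";
```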

If you are wondering what the difference between the match variables and using parens is
you should remember that you can move the parens around, but you can't vary
what $& and its ilk return. Also, using any of
the three match variables does slow your entire program, whereas using parens
will just slow the particular regex you use them for. However, once you've used
one of the three you might as well use them all over the place as
you've paid the speed penalty. Use parens where possible.

RHS Expressions

/e

RHS means Right Hand Side. Suppose we have an HTML file, which contains:

<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>

and we wish to double the size of each font so 2 becomes 4 and 4 becomes 8
etc. What about :

which doesn't really work out. What this does is match size=x,
where x is any digit. The first match, size=, goes
into $1 and the second match, whatever the digit
is, goes into $2 . The second part of the regex
simply prints $1 and $2
(referred to as \1 and
\2 ), and attempts to multiply $2
by 2. Remember /i means case insensitive
matching.

What we need to do is evaluate the right hand side of the regex as an
expression - that is not just print out what it says, but actually evaluate it.
That means work it through, not blindly treat it as string. Perl can do this:

$data=~s/(size=)(\d)/$1.($2 * 2)/eig;

A little explanation....the LHS is the same as before. We add
/e so Perl evaluates the RHS as an expression. So
we need to change \1
into $1 and so on. The parens are there to ensure
that $2 * 2 is evaluated, then joined to
$1 . And that's it !
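Both behaviours can be compared in one run. This is a sketch; the variable names and the use of copies are mine.

```perl
use strict;
use warnings;

my $data = '<FONT SIZE=2> <FONT SIZE=4> <FONT SIZE=6>';

# Without /e the right hand side is only a string, so '* 2' is just text.
(my $plain = $data) =~ s/(size=)(\d)/$1$2 * 2/ig;

# With /e the right hand side is code: do the sum, then glue it to $1.
(my $evaluated = $data) =~ s/(size=)(\d)/$1.($2 * 2)/eig;

print "$plain\n$evaluated\n";
```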

/ee

It is even possible to have more than one /e .
For example:

$data='The function is <5funcA>';
$funcA='*2+4';
print "$data\n";
$data=~s/<(\d)(\w+)>/($1+2).${$2}/; # first time
# $data=~s/<(\d)(\w+)>/($1+2).${$2}/e; # second time
# $data=~s/<(\d)(\w+)>/($1+2).${$2}/ee; # third time
print "$data\n";

To properly appreciate this you need to run it three times, each time
commenting out a different line. Only one regex line should be uncommented
when the program is run.

The first time round the regex is a dumb variable interpolation. Perl just
searches the string for any variables, finds $1 and
$2, and replaces them.

Second time round the expression is evaluated, as opposed to just plain
variable-interpolated. This means that $1+2 is evaluated.
$1 has a value of 5, so 5 + 2 == 7. The other part of the
replacement, ${$2} is evaluated only so far as working out that
the variable named by $2 should be placed in the string.

Third time round and Perl now makes a second pass through the string,
looking for things to do. After the first pass, and just before that second
pass the string looks like this; 7*2+4 . Perl evaluates this, and
prints the result.

So the more /e 's you add on the end of the regex,
the more passes Perl makes through the replacement string trying to evaluate
the code.
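If you would rather see all three results in one run, a sketch like this works on copies of the string. The copies, the our declaration and the no strict 'refs' line are my additions; they are only needed because this version runs under use strict.

```perl
use strict;
use warnings;
no strict 'refs';     # ${$2} is a symbolic reference

our $funcA = '*2+4';  # must be a package variable to be found by name
my $data = 'The function is <5funcA>';

(my $none  = $data) =~ s/<(\d)(\w+)>/($1+2).${$2}/;    # plain interpolation
(my $once  = $data) =~ s/<(\d)(\w+)>/($1+2).${$2}/e;   # one evaluation
(my $twice = $data) =~ s/<(\d)(\w+)>/($1+2).${$2}/ee;  # two evaluations

print "$none\n$once\n$twice\n";
```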

This is fairly advanced stuff here, and it is probably not something you
will use every day. But knowing it is there is handy.

A Worked Example: Date Change

Imagine you have a list of dates which are in the US format of month, day,
year as opposed to the rest of the world's logical notion of day, month, year.
We need a regex to transpose the day and month. The dates are:

@dates=(
'01/22/95',
'05/15/87',
'8-13-96',
'5.27.78',
'6/16/1993'
);

The task can be split into steps such as:

Match the first digit, or two digits. Capture this result.

Match the delimiter, which appears to be one of / - .

Match the second two digits, and capture that result

Rebuild the string, but this time reversing the day and month.

That may not be all the steps, but it is certainly enough for a start.
Planning regex is important. So, first pass:

Hmm. This hasn't worked for the dates delimited with - . , and
the last date hasn't worked either. The first problem is pretty easy; we are
just matching / , nothing else. The second problem arises
because we are matching two digits. Therefore, 5/15/87 is matched on the 15
and 87, not the 5 and 15. The date 6/16/1993 is matched on the 16 and the 19 of
1993.

We can fix both of those. First, we'll match either 1 or 2 digits. There are
a few ways of doing this, such as \d{1,2} which means either 1 or
two of the preceding character, or perhaps more easily \d\d? which
means match one \d and the other digit is optional, hence the
question mark. If we used \d+ then that would match 19988883 which
is not a valid date, at least not as far as we are concerned.

Secondly, we'll use a character class for all the possible date delimiters.
Here is just the loop with those amendments:

which fails. Examine the error statement carefully. The key word is 'range'.
What range? Well, the range between / and . because - is the range
operator within a character class. That means it is a special character, or a
metacharacter. And to negate the special meaning of metacharacters we have to
use a backslash.

But wait! I don't hear you cry. Surely . is a metacharacter
too? It is, but not within a character class so it doesn't need to be escaped.

so that fixes that. In case you were wondering, the . dot does
not act as '1 of anything' inside a character class. It would defeat the object
of the character class if it did. So it doesn't need escaping. There is a
further improvement you can make to this regex:

which is good practice because you are bound to want to change your
delimiters at some point, and putting them inside the regex is hardcoding, and
we all know that ends in tears. You can also re-use the $m
variable elsewhere, which is good practice.

Did you notice the difference between what we assign to $m and
what we had before?

/\-.
$m='/.-';

The difference is that the - is no longer escaped. Why not?
Logic. Perl knows - is the range operator. Therefore, there must
be a character to the immediate left and immediate right of it in order for it
to work, for example e-f. When we assign a string to
$m, the range operator is the last character and therefore has no
character to the right of it, so Perl doesn't interpret it as a range operator.
Try this:

The two invalid dates at the end are let through. If you wanted to check
the validity of every possible date since the start of the modern calendar then
you might be better off with a database rather than a regex, but we can do some
basic checking. The important point is that we know the limitations of what
we are doing.

What we can do is make sure of two things; that there are three sets of
digits separated by our chosen delimiters, and that the last set of digits is
either two digits, eg 99, 98, 87, or four digits, eg 1999, 1998, 1987.

How can we do this? Extend the match. After the second digit match we need
to match the delimiter again, then either 2 digits or four digits. How about:

We are re-using the second match, which is the delimiter, further on
in the regex. That's what the \2 is. This ensures the
second delimiter is the same as the first one, so 5/7-98 gets rejected.

The $ on the end means end of string. Nothing allowed
after that. So the regex now has to find either 2 or 4 digits at the end
of the string, or it fails.

Added the match of the year ($4) to the rebuild section
of the regex.

Regex can be as complex as you need. The code above can be improved still
further. We could reject all years that don't begin with either 19 or 20 if
they are four-digit years. The other problem with the code so far is that it
would reject an otherwise valid date like 02/24/99 if there are any
characters after the year. Both can be fixed:

We have now got a nested OR, and the inner OR is non-capturing for
reasons of efficiency and readability. At the end we alternate between
letting the regex match either an end of line or any non-digit, symbolised
with \D.
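Pulling the steps of this worked example together gives something like the sketch below. It is my reconstruction from the description, so treat the details as assumptions: the ^ anchor, the two extra test dates at the end and the variable names are mine.

```perl
use strict;
use warnings;

my $m = '/.-';   # delimiters; the - is last, so it is not a range operator
my @dates = ('01/22/95', '05/15/87', '8-13-96', '5.27.78', '6/16/1993',
             '5/7-98', '12/25/1899');   # the last two should be rejected

my (@swapped, @rejected);
for my $date (@dates) {
    # $1 month, $2 delimiter, $3 day, \2 the same delimiter again,
    # $4 a two-digit year or a four-digit year starting 19 or 20,
    # then either the end of the string or a non-digit.
    if ($date =~ m{^(\d\d?)([$m])(\d\d?)\2(\d\d|(?:19|20)\d\d)(?:$|\D)}) {
        push @swapped, "$3$2$1$2$4";
    }
    else {
        push @rejected, $date;
    }
}

print "Swapped:  @swapped\n";
print "Rejected: @rejected\n";
```

Note that this still makes no attempt to check that the month is between 1 and 12, or that the day is plausible.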

We could go on. It is often very difficult to write a regex that matches
anything of even minor complexity with absolute certainty. Think about IP
addresses for example. What is important is to build the regex carefully, and
understand what it can and cannot do. Catching anything supposedly invalid is
a good idea too. Test your regex with all sorts of invalid data, and you'll
understand what it can do.

Split and Join

Splitting

While you are in the regex mood, a quick look at split
and join . Destruction is always easier (just
ask your car mechanic), so let's start with split .

Here we give split two arguments. The
first one is a regex specifying what to split on. The next is what to split.
Actually, I could leave $_ out because as usual it
is the default if nothing is specified.

The assignment can either be a scalar variable or a list like an array (or
hash, but at this time 'hash' to you means what you think the Dutch do or a
silly drinking event spoilt by some running). If it's a scalar variable you get
the number of elements the split has splut. Should that be 'the split has
splittered' or 'the split has splat'. Hmmm. Probably 'the split has split'. You
know what I mean. I think I just generated a Fatal Error in English.dll.
Whoops. In any case, splitting to a scalar variable is not always a Good Thing,
as we'll see later.

If the assignment is an array, then as you can see in the above example the
array is created with the relevant elements in order. You can also assign to
scalars, for example :

$_='Piper:PA-28:Archer:OO-ROB:Antwerp';
($maker,$model,$name,$reg,$location) = split /:/, $_;
(@aircraft[0..1],$aname,@regdetails) = split /:/, $_;
$number=split /:/ ; # not bothering with the $_ at the end, as it is the default
print "Using the first 'split'\n";
print "$reg is a $maker $model $name based in $location\n";
print "There are $number details available on this aircraft\n\n";
print "Using the second 'split'\n";
print "You can find $regdetails[0], an $aircraft[1], $regdetails[1]\n";

This demonstrates that a list can be a list of scalar variables (which is
basically what an array is anyway), and that you can easily see how many
elements the expression can be split into.

The example below adds a third parameter to split, which is how many
elements you want returned. If you don't want the extra stuff at the end
pop it.
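A sketch of the third parameter at work, re-using the aircraft string from earlier. The variable names are mine.

```perl
use strict;
use warnings;

my $line = 'Piper:PA-28:Archer:OO-ROB:Antwerp';

# Ask for at most 3 elements; the third gets everything that is left.
my @parts = split /:/, $line, 3;
print "Element: $_\n" for @parts;

# Throw away the leftovers if you don't want them.
my $rest = pop @parts;
print "Kept '@parts', discarded '$rest'\n";
```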

In the example below we split on
whitespace. Whitespace, in perl terms, is a space, tab, newline,
formfeed or carriage return. Instead of writing
\t\n\f\r for each of the above, you can simply use
\s , or the negated version
\S which means anything except
whitespace. Think of whitespace as anything you know is there, but you can't
see.

The whitespace split is specially optimised for
speed. I've used spaces, double spaces, a tab and a newline in the list below.
Also note the + , which means one or more of the
preceding character, so it will split on any
combination of whitespace. And I think the final split
is useful to know. The split function does not
return the delimiter, so in this case the whitespace will not be returned.
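A minimal sketch of a whitespace split. The sample text is mine and mixes a space, a double space, a tab and a newline.

```perl
use strict;
use warnings;

my $text = "Cessna 172  Skyhawk\tOO-ABC\nAntwerp";

# \s+ is one or more whitespace characters of any kind.
my @fields = split /\s+/, $text;

print scalar(@fields), " fields: ", join('|', @fields), "\n";
```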

The effect is to split each character. The
| is returned. As it is the delimiter,
| should be ignored, not returned.

At this point you should be thinking 'metacharacter'. A little research
(looking at the documentation) will reveal that | is
indeed a metacharacter, which means 'or', when inside a regex. So, in effect,
the regex /|/ means 'nothing, or nothing'. The
split is therefore performed on 'nothings', and there
are 'nothings' in between each character. The solution is easy ;
/\|/ .
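The difference is easy to see with a short string. This is a sketch with sample data of my own.

```perl
use strict;
use warnings;

my $record = 'a|b|c';

# Unescaped: 'nothing or nothing', so every character is split off,
# and the pipes come back as elements.
my @wrong = split /|/, $record;

# Escaped: split on a literal pipe.
my @right = split /\|/, $record;

print scalar(@wrong), " elements the wrong way: @wrong\n";
print scalar(@right), " elements the right way: @right\n";
```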

Join takes a 'glue' operator, which is not a regular expression.
It can be a scalar variable however. In this case it is a space. Then it
takes a list, which can either be a list of scalar variables, an array or
whatever as long as it's a list. And you can see what the result is. You
could assign it to an array, but you'd end up with everything in the first
element of the array.

The example below adds an array into the list, and demonstrates use of a
variable as the delimiter.
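A sketch of the kind of thing meant here, with an array in the middle of the list and the glue held in a variable. The names and data are my own.

```perl
use strict;
use warnings;

my @cities = ('Antwerp', 'Brussels', 'Ghent');
my $glue   = ', ';

# The list can mix plain scalars and arrays; join flattens the lot.
my $joined = join $glue, 'London', @cities, 'Paris';

print "$joined\n";
```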

A few things to explain. Firstly, while (1) { .
We want an everlasting loop, and this is one way to do it. 1 is always true, so
round it goes. We could test $input directly, but
that wouldn't allow last to be demonstrated.

Everlasting loops aren't useful unless you are a politician being
interviewed. We need to break out at some point. This is done by the
last function. When
$input is between 1 and the number of elements in
@cool then out we go. (You can also break out to
labels, in case you were wondering. And break out in a sweat. Don't start now
if you weren't.)

The srand operator initialises the random number
generator. Works ok for us, but CGI programmers should think of something
different because their programs are so frequently run (they hope :-).

rand generates a random number between 0 and 1, or
0 and a number it is given. In this case, the number of elements of
@cool -1, so from 0 to 7. There is no point
generating numbers between 1 and 8 because the array elements run from 0 to 7.

The int function makes sure it is an integer, that
is no messy bits after the decimal point.

The splice function removes the printed element
from the array so it won't appear again. Don't want to stress the point.

Concatenation

There is another joining operator, this time the humble dot, or period:
. . This concatenates (joins) variables:
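A minimal sketch, with variable names of my own. The .= form, which appends to an existing variable, is thrown in as it is the dot's close relative.

```perl
use strict;
use warnings;

my $first  = 'Perl';
my $second = 'rocks';

# The dot glues strings together, literal or variable.
my $sentence = $first . ' ' . $second . '!';

# .= appends to what is already there.
$sentence .= ' Really.';

print "$sentence\n";
```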

Files

Opening

Perl is very good at handling files. Create, in your perl scripts directory
c:\scripts, a file called stuff.txt. Copy the
following into it :

The Main Perl Newsgroup:comp.lang.perl.misc
The Perl FAQ:http://www.perl.com/faq/
Where to download perl:http://www.activestate.com/

Now, to open and do things with this file. First, we must open the file
and assign it to a filehandle. All operations will be done on the
file via the filehandle. Earlier, we used
<STDIN> as a filehandle - we read from it.

What this script does is fail. What it should do is open the file
defined in $stuff , assign it to the filehandle
STUFF and then, while there are still lines left in
the file, print the line number $. and the current
line.

An unforgivable error

It fails. That's not so bad, everything fails sometimes. What is
unforgivable is NOT CHECKING THE ERROR CODE !

This is a better version:

open STUFF, $stuff or die "Cannot open $stuff for read :$!";

If the open operation fails, the or means that the code on the RHS (right hand side) is
evaluated. Perl dies. This means it exits the script, performs a post-mortem
which it writes up into $! and tells you the line
number at which it died. Just because $! contains
useful information doesn't mean to say it is automagically printed, in true
perl fashion. Usually you will wish to avail yourself of the information inside
as it is of great help when working out why something is not going according
to plan. The moral of the chapter is:

Always check your return codes !
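To see what the post-mortem looks like without actually killing your script, you can trap the die with eval. This is a sketch; the path is made up and assumed not to exist, and the eval wrapper is my addition.

```perl
use strict;
use warnings;

my $stuff = 'no-such-dir/stuff.txt';   # hypothetical path, assumed not to exist

# eval catches the die so we can look at the message it would have printed.
my $opened = eval {
    open STUFF, $stuff or die "Cannot open $stuff for read :$!";
    close STUFF;
    1;
};
my $error = $opened ? '' : $@;

print "The script would have died with: $error" unless $opened;
```

The message ends with the operating system's reason (that is $! ) followed by the line number of the die.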

\\ or / in pathnames -- your choice

The problem should now be apparent. The backslashes, being escape
characters, are not displayed. There are two ways to fix this:

The forward slashes are the preferred option, even under Win32, because you
can then port the script direct to Unix or other platforms (assuming you don't
use drive letters), and it is less typing. If you wish to use Perl to start
external processes then you must use the \\ method,
but this variable will be used only in a Perl program, not as a parameter to
start an external program. Changing the $stuff
variable results in a working script. Always check your return codes
!
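A quick sketch of what happens to each spelling of the path inside double quotes. The no warnings line is mine; without it perl -w grumbles about the unrecognised escapes in the first string.

```perl
use strict;
no warnings;   # keep perl quiet about the deliberately broken string

my $broken  = "c:\scripts\stuff.txt";     # backslashes are eaten as escapes
my $doubled = "c:\\scripts\\stuff.txt";   # each \\ becomes one real backslash
my $forward = "c:/scripts/stuff.txt";     # nothing to escape

print "$broken\n$doubled\n$forward\n";
```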

Reading a file

A little more detail on what is happening here. The file is opened for read.
You can append and write too. You don't have to use a variable, but I
always do because it is then easy to change later on and easy to insert into
the or die section. Hardcoding things is not the best way to write a maintainable and
flexible program. Just ask the Year 2000 people about code that lived a
little longer than the authors imagined :-).

open STUFF, "c:/scripts/stuff.txt" or die "Cannot open stuff.txt for read :$!";

is just as good but more work if you want to change anything.

The line input operator (that's the angle brackets <> ) reads from the beginning of the file up
until and including the first newline. The read data goes into
$_ , and you can do what you want with it there. On
the next iteration of the loop data is read from where the last read left off,
up to the next newline. And so on until there is no more data. When that
happens the condition is false and the loop terminates. That's the default
behaviour, but we can change this.

This means that you can open a 200Mb file in perl and run through it without
having to load the entire file into memory. 200Mb of memory is quite a bit. If
you really want to load the entire 200Mb file into one variable, Perl lets you.
Limits are not the Perl Way.
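Here is a sketch of both approaches. It writes a small scratch file with File::Temp (a standard module) so that it runs anywhere; the scratch file and the variable names are my own stand-ins for stuff.txt. Undefining $/ , the input record separator, is what makes one read return the whole file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file standing in for stuff.txt.
my ($tmp, $path) = tempfile();
print $tmp "line one\nline two\nline three\n";
close $tmp;

# Line by line: only one line is in memory at a time.
open STUFF, $path or die "Cannot open $path for read :$!";
my $count = 0;
$count++ while <STUFF>;
close STUFF;

# All at once: with $/ undefined a single read returns the entire file.
open STUFF, $path or die "Cannot open $path for read :$!";
my $whole = do { local $/; <STUFF> };
close STUFF;
unlink $path;

print "Read $count lines one at a time, then ", length($whole), " characters in one go\n";
```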

This saves a little bit of typing, but does tie your filehandle to the
variable name. In fact, that entire program could be compressed further, but
that's for later.

If you are really into shortness, try this:

$STUFF="c:/scripts/stuff.txt";
open STUFF or die "Cannot open $STUFF for read :$!";
print "Line $. is : $_" while (<STUFF>);

Writing to a File

A simple write

$out="c:/scripts/out.txt";
open OUT, ">$out" or die "Cannot open $out for write :$!";
for $i (1..10) {
print OUT "$i : The time is now : ",scalar(localtime),"\n";
}

Note the addition of > to the filename. This
opens it for writing. If we want to print to the file we now just specify the
filehandle name. You print to the filehandle, which is a gateway to the file.

Filehandles don't have to be capitalised, but it is wise. All Perl functions
are lowercase, and Perl is case-sensitive. So if you choose uppercase names
they are guaranteed not to conflict with current or future function words.

And a neat way to grab the date sneaked in there too. You should be aware
that writing to a file overwrites the file. It does not append
data! However, you may append:

Appending

$out="c:/scripts/out.txt";
&printfile;
open OUT, ">>$out" or die "Cannot open $out for append :$!";
print OUT 'The time is now : ',scalar(localtime),"\n";
close OUT;
&printfile;
sub printfile {
open IN, $out or die "Cannot open $out for read :$!";
while (<IN>) {
print;
}
close IN;
}

This script demonstrates subroutines again, and how to append to a file,
that is write additional data at the end. The close
function is introduced here. This, well, closes a filehandle. You don't
have to close a filehandle - just leave it open until the script finishes, or
the next open command to the same filehandle will close it for you.

@ARGV: Command Line Arguments

Perl has a special array called @ARGV . This is
the list of arguments passed along with the script name on the command line.
Run the following perl script as:

perl myscript.pl hello world how are you

foreach (@ARGV) {
print "$_\n";
}

Another useful way to get parameters into a program -- this time
without user input. The relevance to filehandles is as follows. Run the
following perl script as:

perl myscript.pl stuff.txt out.txt

while (<>) {
print;
}

Short and sweet ? If you don't specify anything in the angle brackets,
the files named in @ARGV are opened and read in turn (or STDIN is read, if
@ARGV is empty). After it finishes with the first file, it carries on with
the next and so on. You'll need to remove non-file elements from @ARGV
before you use this.

It can be shorter still:

perl myscript.pl stuff.txt out.txt

print while <>;

Read it right to left. It is possible to shorten it even further!

perl myscript.pl stuff.txt out.txt

print <>;

This takes a little explanation. As you know, many things in Perl, including
reads from filehandles, can be evaluated in list or scalar context. The result
that is returned depends on the context.

If a filehandle is read in scalar context, it returns the next line of
whatever file it is reading from. If it is read in list context, it
returns a list, the elements of which are all the remaining lines of the files
it is reading from.

The print function is a list operator, and
therefore evaluates everything it is given in list context. As the
filehandle is evaluated in list context, it is given a list !
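A short sketch of the two contexts. It reads from an in-memory file (a string opened as if it were a file, which needs Perl 5.8 or later), an assumption made so that no file on disk is required:

```perl
use strict;
use warnings;

# Hypothetical data; opening a reference to a string gives an in-memory file.
my $text = "first\nsecond\nthird\n";

open IN, '<', \$text or die "Cannot open in-memory file: $!";
my $one  = <IN>;    # scalar context: reads the next line only
my @rest = <IN>;    # list context: reads every remaining line
close IN;

print "Scalar context gave: $one";
print "List context gave ", scalar(@rest), " more lines\n";
```

The second read starts where the first left off, which is why the list holds two lines rather than three.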

Who said short is sweet? Not my girlfriend, but that's another story. The
shortest scripts are not usually the easiest to understand, and not even always
the quickest. Aside from knowing what you want to achieve with the program from
a functional point of view, you should also know whether you are coding for
maximum performance, easy maintenance or whatever -- because chances are those
goals will be to some extent mutually exclusive.

Modifying a File with $^I

One of the most frequent Perl tasks is to open a file, make some changes and
write it back to the original filename. You already have enough knowledge to do
this. The steps would be:

Make a backup copy of the file

Open the file for read

Open a new temporary file for write

Go through the read file, and write it and any changes to the temp file

When finished, close both files

Delete the original file

Rename the temp file to the original filename
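The steps above can be sketched as follows. The file names are hypothetical and live in a scratch directory made by File::Temp, the edit itself (a simple substitution) is only a placeholder, and File::Copy is one of several ways to make the backup:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);

# Hypothetical file names in a scratch directory, so nothing real is touched.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";
my $temp = "$file.tmp";

# Create something to edit.
open OUT, ">$file" or die "Cannot create $file: $!";
print OUT "Hello World\nGoodbye World\n";
close OUT;

copy($file, "$file.bk") or die "Cannot back up $file: $!";    # backup copy
open IN,  $file    or die "Cannot open $file for read: $!";   # open for read
open TMP, ">$temp" or die "Cannot open $temp for write: $!";  # temp file for write
while (<IN>) {                       # go through the read file...
    s/World/Perl/;                   # ...make a change...
    print TMP;                       # ...and write the line to the temp file
}
close IN;                            # close both files
close TMP;
unlink $file        or die "Cannot delete $file: $!";         # delete the original
rename $temp, $file or die "Cannot rename $temp: $!";         # temp becomes original
```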

If you have managed to get this far and assiduously work through the
examples, the above will be child's play. Play if you want, but there is a
Better Way.

Make sure you have data in c:\scripts\out.txt
then run this:

@ARGV="c:/scripts/out.txt";
$^I=".bk"; # let the magic begin
while (<>) {
tr/A-Z/a-z/; # another new function sneaked in
print; # this goes to the temp filehandle, ARGVOUT,
# not STDOUT as usual, so don't mess with it !
}

So, what's happening? First, we load up @ARGV with the name
of a file. It doesn't matter how @ARGV is loaded. We could just as well have
supplied the filename on the command line.

The $^I is a special variable. You knew that just
by looking at it. Its name is the Inplace Edit variable, and when it has a
value the effects are:

The name of the file to be edited in place is taken from the first
element of @ARGV. In this case, that is
c:/scripts/out.txt. The file is renamed to its existing
name plus the value of $^I, ie
out.txt.bk.

The file is read as usual by the diamond operator <>, placing a line at a time into
$_.

A new filehandle is opened, called ARGVOUT, and no prizes for guessing it is
opened on a file called out.txt. The original
out.txt is renamed.

The print prints automatically to
ARGVOUT, not STDOUT as it would usually.

At the end of the operation you have neatly edited the file and made a
backup. If you don't want a backup, assign a null string to $^I but don't go crying on any mailing lists if you
lose data.

The usual method of in-place editing would involve just printing everything
back where it came from until your regex finds whatever needs changing. You
could of course slurp the whole file into memory and play with it there, which
could be a lot easier but if you are dealing with files of more than a few
megabytes this is probably not a feasible approach.

Now take a look at out.txt . Notice how all
capital letters have been transliterated into lowercase. This is the
tr operator at work, which is more efficient than
regex for changing single characters. But that's only a small part
of the tr function's value to the world. More later.

You should also have an out.txt.bk file. And
finally, notice the way @ARGV has been created. You
don't have to create it from the command line arguments -- it can be treated
like an ordinary array, for that is what it is.

$/ -- Changing what is read into $_

On a different note, what if your input file holds groups of lines, each group
separated from the next by TWO newlines, not one (that is, by a blank line)?
The examples below assume such a file saved as shop.txt. You don't have to
save yours under that name, but if you don't, the examples will be
difficult to follow.

Now, if you want each set of items as elements in an array you'll have to do
something like this:

$SHOP="shop.txt";
$x=0;
open SHOP or die "Can't open $SHOP for read: $!\n";
while (<SHOP>) {
if (/^\n/) { # does line begin with newline ?
$x++; # if so, increment $x. Rest of if statement not executed.
} else {
$list[$x].=$_; # glue $_ on the end of whatever is in $list[$x], using a .
}
}
foreach (@list) {
print "Items are:\n$_\n\n";
}

which works, but there is a much easier way to do it. You knew I was
going to say that.

The $/ variable is a special variable (it even
looks special). It is the Default Input Record Separator. Remember the
operation of the angle brackets being to read a file in up until the next
newline? Time to come clean. What the angle brackets actually do is read up
until whatever $/ is set to. It is set
to a newline by default.

So if we set it to two newlines, as above, then it reads up until it finds
two consecutive newlines, then puts the data into $_.
This makes the program a lot shorter and quicker. You can set $/ to just
about anything, not just a newline. If you want
to hack this list for example:

Tea:Beer:Wine:Pizza:Catfood:Coffee:Chicken:Salmon:Icecream

you could just leave $/ as a newline and slurp it
into memory in one go, but imagine the above items are a list of clothes that
your girlfriend wants to buy or a list of clothes your boyfriend should have
thrown away by now. Either are going to be really big files, and you might not
want to read it all into memory in one go. So set
$/=":"; and all will be well. There are also
read and seek functions,
but they aren't covered here. Those are useful for files where you read in a
precise number of bytes.
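A minimal sketch of reading colon-separated records. The short list is hypothetical and is held in memory rather than in a big file, and local is used so that $/ goes back to a newline once the block ends:

```perl
use strict;
use warnings;

# Hypothetical data held in memory instead of a big file on disk.
my $data = "Tea:Beer:Wine:Pizza";
my @items;

open LIST, '<', \$data or die "Cannot open in-memory file: $!";
{
    local $/ = ":";      # a record now ends at a colon; restored after this block
    while (<LIST>) {
        chomp;           # chomp removes whatever $/ is currently set to
        push @items, $_;
    }
}
close LIST;

print "Item: $_\n" foreach @items;
```

Only one item is held in $_ at a time, so the size of the whole list no longer matters.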

We'll go back to the last example for a moment. It is useful to know how to
read just one line (well, up to $/ ) at a time:

$SHOP="shop.txt";
$/="\n\n";
open SHOP or die "Can't open $SHOP for read: $!\n";
$clothes=<SHOP>; # everything up until the first occurrence of $/ into $clothes
$food=<SHOP>; # everything from first occurrence of $/ to the second into $food
print "We need...\n",$clothes,"...and\n",$food;

And now we know that, there is an even quicker way to achieve the
aim of the original program:
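One plausible form of that shortcut, sketched with a hypothetical stand-in for shop.txt (two groups of items separated by a blank line) held in memory:

```perl
use strict;
use warnings;

# Hypothetical stand-in for shop.txt: two groups separated by a blank line.
my $shop = "Shirt\nShoes\n\nBread\nMilk\n";
my @list;

open SHOP, '<', \$shop or die "Can't open in-memory shop list: $!\n";
{
    local $/ = "\n\n";   # a record now ends at two consecutive newlines
    @list = <SHOP>;      # list context: every record read in one go
}
close SHOP;

foreach (@list) {
    print "Items are:\n$_\n\n";
}
```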

We haven't mentioned list context for a while. Whether the line input
operator <> returns a single value or a list
depends on the context you use it in. When you supply @xxxxx
then this must be a list. If you supply $xxxxx
then that's a scalar variable. You can force it into list context by
using parens.

The two lines below are provided so you can paste them into the above
program. They demonstrate how parens force list context. Remember to replace
the foreach with something that prints the variables.

($first, $second) = <SHOP>;
$first, $second = <SHOP>;

HERE Docs

The problem:

print "This is a long line of text which might be too long to fit on just one line\n";
print "and I was right, it was too long to fit on one line. In fact, it looks like it\n";
print "might very well take up to FOUR, yes FOUR lines to print. That's four print\n";
print "statements, which takes up even more room. But wait! I'm wrong! It will take\n";
print "FIVE lines to print this statement! Or is that six lines? I'm not sure....\n";

The solution:

$var='variable interpolated';
print <<PRT;
This is a long line of text which might be too long to fit on just one line
and I was right, it was too long to fit on one line. In fact, it looks like
it might very well take up to FOUR, yes FOUR lines to print.
That's four print statements, which takes up even more room. But wait! I'm
wrong! It will take FIVE lines to print this statement! Or maybe six lines?
I'm not sure....but anyway, just to prove this can be $var.
PRT

That's called a 'here' document and you don't need to use PRT,
you can use whatever you like within reason. You don't need to put in explicit
newlines, although if you do they perform as usual. Now you know about here
docs you can stop wearing the print function
out by calling it every couple of lines. You don't have to use here docs just to
print to files; they work anywhere you'd normally put more than one print statement.

Reading Directories

Globbing

For this exercise, I suggest creating another directory where you have at
least two text files and two or more binary files. Copy a couple of .dll files
from your WINDIR directory if you need to, those will do for the binaries, and
save a couple of random text files. Size doesn't matter, in this case.

Then run this, giving the directory as the command line argument:

$dir=shift; # shifts @ARGV, the command line arguments after the script name
chdir $dir or die "Can't chdir to $dir:$!\n" if $dir;
while (<*>) {
print "Found a file: $_\n" if -T;
}

The chdir function changes perl's working
directory. You should, as ever, test to see if it worked or not. In this case
we only try and change directory if $dir is true.

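The <*> construct returns, one at a time, the name of every file in the
current directory, and the name is printed if it passes the file test -T , which
returns true if the file is a non-binary, ie text, file. You can be more
specific: <*.txt>, for example, returns only the names ending in .txt.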

The first difference is the first line, which essentially says if
shift is false, then $dir =
., which is of course the current directory. Then, the directory is
opened and we have the chance to trap the error. It is assigned a filehandle.
The readdir function reads each file into
$file. There is no while
(<WDIR>) { construct.
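A sketch matching that description. The handle name WDIR comes from the text; the @seen array is a hypothetical extra that simply collects the names, and defined() is added so that a file called 0 does not end the loop early:

```perl
use strict;
use warnings;

my $dir = shift || '.';     # no argument given? use the current directory
my @seen;

opendir WDIR, $dir or die "Can't open directory $dir: $!\n";
while (defined(my $file = readdir WDIR)) {
    print "Found an entry: $file\n";
    push @seen, $file;
}
closedir WDIR;
```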

We can also apply the text file test. Run this, once without entering a
directory and the second time with entering a directory path other than the one
the script is in:

Firstly, because the filename is now not in $_ we have to
explicitly apply the -T test to it with
-T $file.

Why did this not work the second time? Look at the code carefully. You are
testing $file. If perl doesn't get a fully qualified pathname, it
assumes you are still in the directory the script was run from, or that of the
last successful chdir . Not necessarily where you
are readdir'ing from. So, to fix it:

print "Found a file: $dir/$file\n" if -T "$dir/$file" ;

where we now specify the pathname, both in the printout and in the file test
itself. The "" are used because otherwise perl tries to divide
$file by $dir.

Notice that two files are found which have interesting names, namely .
and .. . These two entries are the current and parent
directory respectively. Nothing new, they have always been there -- run the DOS
command dir if you don't believe me. You don't usually want to
know about them, so:

but that includes the . files, so it is best to ensure they
aren't included:

@files=grep !/^\./,
readdir(DIR);

We haven't met grep yet, but for the moment just
remember it searches a list and if its test returns true, lets the element pass. In
this case, if it doesn't begin with . then that's true so it goes into
@files.

There are other commands associated with reading directories, which tell you
where in a directory you are, and then where to go to return. You should be
aware of their existence, because you never know when you might need them. The
one other command of use is closedir , which closes a
directory. Optional, but recommended for clarity.

Associative Arrays

The Basics

Very, very useful. First, a quick recap on arrays. Arrays are an ordered
list of scalar variables, which you access by their index number starting at 0.
The elements in arrays always stay in the same order.

Hashes are a list of scalars, but instead of being accessed by index number,
they are accessed by a key. The tables below illustrate the point:

@myarray

Index No.   Value
0           The Netherlands
1           Belgium
2           Germany
3           Monaco
4           Spain

%myhash

Key   Value
NL    The Netherlands
BE    Belgium
DE    Germany
MC    Monaco
ES    Spain

So if we want 'Belgium' from @myarray and also
from %myhash , it'll be:

print "$myarray[1]";
print "$myhash{'BE'}";

Notice that the $ prefix is used,
because it is a scalar variable. Despite the fact it is part of a list, it
is still a scalar variable. The hash syntax is simply to use braces { } instead of square brackets.

So why use hashes ? When you want to look something up by a keyword. Suppose
we wanted to create a program which returns the name of the country when given
a country code. We'd input ES, and the program would come back with Spain.

You could do it with arrays. It would be messy however. One possible
approach:

Create @countries , and give it values such
as 'ES,Spain'

Iterate over the entire array

split each element of the array, and
check the first result to see if it matches the input

Complex and slow. We could also store a reference to another array in each
element of @countries , but that is not
efficient. Whatever way we choose, you still need to search the whole thing.
And what if @countries is a big array ? See how much
easier a hash is:
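A minimal sketch of such a lookup, assuming a %countries hash built from the table above. The fixed input 'es' is hypothetical and stands in for whatever the user types:

```perl
use strict;
use warnings;

# Keys and values alternate in one list: key, value, key, value ...
my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
                 'MC', 'Monaco', 'ES', 'Spain');

my $in = 'es';           # hypothetical input; a real script might read <STDIN>
$in =~ tr/a-z/A-Z/;      # make sure the code is in uppercase

my $answer = exists $countries{$in} ? $countries{$in} : 'not in the list';
print "$in is $answer\n";
```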

Very easy. All we need to do is make sure everything is in uppercase with
tr and we are there. Notice the way %countries is defined - exactly the same as a normal array,
except that the values are put into the hash in key/value pairs.

When you should use hashes

So why use arrays ? One excellent reason is because when an array is
created, its variables stay in the same order you created them in. With a hash,
perl reorders elements for quick access. Add print
%countries; to the end of that program above and run it. See
what I mean ? No recognisable sequence at all. It's like trying to herd cats.
If you were writing code that stored a list of variables over time and you
wanted it back in the order you found it in, don't use a hash.

Finally, you should know that each key of a hash must be unique.
Stands to reason, if you think about it. You are accessing the hash via keys,
so how can you have two keys named 'NL' or something ? If you do define a
certain key twice, the second value overwrites the first. This is a feature,
and useful. The values of a hash can be duplicates, but never the keys.

If you want to assign to a hash, there is of course no concept of
push , pop and
splice etc. Instead:

Hash Hacking Functions

Assigning

$countries{PT}='Portugal';

Deleting

delete $countries{NL};

Accessing Your Hash

Assuming you keep the same %countries hash as
above, here are some useful ways to access it:

All the keys

print keys %countries;

All the values

print values %countries;

A Slice of Hash :-)

print @countries{'NL','BE'};

How many elements ?

print scalar(keys %countries);

Does the key exist ?

print "It's there !\n" if exists
$countries{'NL'};

Well, that last one is not an access as such but useful anyway.

More Hash Access: Iteration, keys and values

You may have noticed that keys and
values return a list. And we can iterate over a list,
using foreach :

The each function returns each key/value pair of
the hash, and is slightly faster. In this example we assign them to a list (you
spotted the parens ?) and away we go. Eventually there are no more pairs, which
returns false to the while loop and it stops.
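Both loops, sketched with a shortened, hypothetical %countries. The two arrays only exist so the results can be compared afterwards:

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'ES', 'Spain');
my (@by_keys, @by_each);

# keys returns a list, so foreach can walk over it
foreach (keys %countries) {
    push @by_keys, "$_ is $countries{$_}";
}

# each hands back one key/value pair per call, and an empty list when done
while (my ($code, $name) = each %countries) {
    push @by_each, "$code is $name";
}

print "$_\n" foreach @by_keys;
```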

If you are into brevity, both the above can be accomplished in a single line:

Perl is just so difficult at times, don't you think ? This works because:

keys returns a list

sort expects a list -- and gets one
from keys , and sorts it

reverse also expects a list, so it
gets one and returns it

then the whole list is foreach'd over.
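Put together, the single line that the list describes might read as follows (a sketch with a hypothetical three-entry %countries; @order is only there to record the sequence):

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'ES', 'Spain');
my @order;

# keys feeds sort, sort feeds reverse, and foreach walks the result
foreach (reverse sort keys %countries) {
    push @order, $_;
    print "The key $_ contains $countries{$_}\n";
}
```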

This is a quick example to make sure the meaning of reverse is clear:

print "Enter string to be reversed: ";
$input=<STDIN>;
@letters=split //,$input; # splits on the 'nothings' in between each character of $input
print join ":", @letters; # joins all elements of @letters with :, prints it
print reverse @letters; # prints all of @letters, but sdrawkcab )-:

Perl's list operators can just feed directly to each other, saving many
lines of code but also decreasing readability to those that aren't
Perl-literate:

You might want to sort numerically. In that case, you need to understand
how Perl's sort function works.

The sort function compares two variables, $a and $b. They must be
called $a and $b otherwise
it won't work. One chap published a book with stolen code, and he changed $a and $b to $x and $y. He obviously
didn't test the program because it would have failed and he would have noticed.
And this book was really published ! Don't believe everything you read in books
-- but web tutorials are always 100% truthful :-)

Back to sorting. $a and $b are compared, and the result is:

1 if $a is greater than $b

-1 if $b is greater than $a

0 if $a and $b are equal

So as long as the sort function gets one of those
three values back it is happy. This means we can write our own sort routines,
and feed them to sort. For example, we know the default sort is alphabetical.
But if we write this:

then it works correctly. Of course, there is an easier way. The 'spaceship'
operator <=> . It does exactly what
the supersort subroutine does, namely return 1, -1 or 0 depending on the
comparison of two given values.
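A sketch of both approaches. The name supersort comes from the text, but its body here is an assumption, as is the short list of numbers:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# Hypothetical supersort: hands back 1, -1 or 0, which is all sort asks for
sub supersort {
    if    ($a > $b) { return 1;  }
    elsif ($a < $b) { return -1; }
    else            { return 0;  }
}

my @default   = sort @numbers;                # default: compared as strings
my @numeric   = sort supersort @numbers;      # our own numeric comparison
my @spaceship = sort { $a <=> $b } @numbers;  # the same thing, much shorter

print "default:   @default\n";
print "supersort: @numeric\n";
print "spaceship: @spaceship\n";
```

The default sort puts 100 before 9 because it compares character by character, which is exactly the problem the numeric versions avoid.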

Notice the { } braces, which define the contents as the subroutine sort must
use. Pretty short subroutine. There is a companion operator to <=> , namely cmp , which does
exactly the same thing but of course compares the values as strings, not
numbers. Remember, if you are comparing numbers, your comparison operator
should contain non-alphas; if you are comparing strings, the operator should
contain alphas only. And don't talk to strangers.

Anyway, you now have enough knowledge to sort a hash by value instead of
keys. Suppose your pointy haired manager bounced up to you and demanded a hash
sorted by value ? What would you do ? OK, what should you do ?

Well, we could just sort the values.

foreach (sort values %countries) {

But Pointy Hair wants the keys too. And if you have a value you can't find
the key.

So we have to iterate over the keys. But just because we are iterating over
the keys doesn't mean to say we have to hand the keys over to
sort . What about:

The example also demonstrates that you can foreach
over more than one list value -- each list is processed in turn. How I
discovered that particular trick with Perl is instructive. I just tried it. If
you think you should be able to do something with Perl, try it. Adhere to the
syntax and conventions you will be familiar with from experience, in this case
delimiting a list with commas, and try it. I'm always finding new shortcuts
just by experimentation.
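Both points can be sketched as follows. The comparison block is one common way to sort keys by the values behind them, and the two short lists at the end are hypothetical:

```perl
use strict;
use warnings;

my %countries = ('NL', 'The Netherlands', 'BE', 'Belgium', 'DE', 'Germany',
                 'MC', 'Monaco', 'ES', 'Spain');
my @codes;

# sort is handed the keys, but the block compares the values behind them
foreach (sort { $countries{$a} cmp $countries{$b} } keys %countries) {
    push @codes, $_;
    print "$countries{$_} has the code $_\n";
}

# foreach over two lists separated by a comma: each is processed in turn
my @first  = ('NL', 'BE');
my @second = ('DE');
my @seen;
foreach (@first, @second) {
    push @seen, $_;
}
print "Saw: @seen\n";
```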

Grep and Map

Grep

If you want to search a list, and create another list of things you found,
grep is one solution. This is an example, which also
demonstrates join again :

Remember qw means 'quote words', so word
boundaries are used as delimiters instead. The grep
function must be fed a list on the right hand side. On the left side,
you may assign the results to a list or a scalar variable. Assigning to a list
gives you each actual element, and to a scalar gives you the number of matches
found:
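A sketch of both cases. The six-word @stuff list is an assumption, chosen to agree with the counts quoted later in this section:

```perl
use strict;
use warnings;

# Assumed list: six words, five of which contain 'ing'
my @stuff = qw(flying gliding skiing dancing parties racing);

my @new   = grep /^[gsp]/, @stuff;   # list on the left: the matching elements
my $count = grep /^[gsp]/, @stuff;   # scalar on the left: how many matched

print join(":", @new), "\n";
print "Matches found: $count\n";
```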

Try removing the braces and you'll get an error. Notice that the comma
before the list has gone. It is now obvious where the expression ends, as it
is inside a block delimited with { }. The regex says if the element begins
with g, s or p, then remove ing. The result is only assigned to
@new if the expression is completely true - 'parties'
does begin with p, so that works, but s/ing// fails
so the overall result is false, and the value is not assigned to
@new .
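The block form described above might look like this (a sketch, again assuming the six-word @stuff). Note that grep works on the real elements, so the substitution also changes @stuff itself:

```perl
use strict;
use warnings;

my @stuff = qw(flying gliding skiing dancing parties racing);  # assumed list

# No comma after the block. An element passes only if it begins with
# g, s or p AND the substitution actually removed 'ing'.
my @new = grep { s/ing// if /^[gsp]/ } @stuff;

print "New list: @new\n";
print "Original: @stuff\n";
```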

Map

Map works the same way as grep , in that they both
iterate over a list, and return a list. There are two important differences
however:

grep returns the value of
everything it evaluates to be true;

map returns the results of
everything it evaluates.

As usual, an example will assist the penny in dropping, clear the fog and
turn on the light (if not make my metaphors easier to understand):

You can see that @mapped is just a list
of 1's. Notice that there are 5 ones whereas there are six
elements in the original array, @stuff. This is because
@mapped contains the true results of
map -- in every case the expression
/ing/ is successful, except for 'parties'.
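A sketch that produces those five ones, assuming the six-word @stuff used for the grep examples:

```perl
use strict;
use warnings;

my @stuff  = qw(flying gliding skiing dancing parties racing);  # assumed list
my @mapped = map /ing/, @stuff;   # a successful match gives 1, a failed one gives nothing

print "There are ", scalar(@stuff),  " elements in \@stuff\n";
print "There are ", scalar(@mapped), " elements in \@mapped: @mapped\n";
```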

In that case the expression is false, so the result is discarded.
Contrast this action with the grep function, which
returns the actual value, but only if it is true. Try this:

This uses the ord function to change each
letter into its ASCII equivalent, then the chr
function to convert the ASCII numbers back to characters. If you change
map to grep in the example
above, you can see that nothing appears to happen. What is happening is that
grep is trying the expression on each
element, and if it succeeds (is true) it returns the element, not the result.
The expression succeeds for each element, so each element is returned in turn.
Another example:

Recapping on regex, what that does is match any element beginning with g, s
or p, and replace it with the same element twice. The caret
^ forces a match at the beginning of the string, the
[square brackets] denote a character class, and /e
forces Perl to evaluate the RHS as an expression.

The output from this is a mixture of 1 and nothing for map , and a three-element array called
@grepped from grep. Yet another example:

@mapped = map { chop } @stuff;
@grepped = grep { chop } @stuff;

The chop function removes the last character from
a string, and returns it. So that's what you get back from
map , the result of the expression. The
grep function gives you the mangled remains
of the original value.

The subroutine isit first grabs
everything up until 'ing', puts it into $word , then
returns 'ok' if there are three characters in $word .
If not, it returns the false value 0. You can make these subroutines
(think of them as functions) as complex as you like.

Sometimes it is very useful to have map return the
actual value, rather than the result. The answer is easy, but not obvious.
Remember that subroutines return the value of the last expression evaluated?
So, in this case, do blocks. What if the expression was, very simply:

and there you have it. Now you understand that you can go and impress your
friends, but please don't count on success.

External Commands

Some ways to...

Perl can start external commands. There are five main ways to do this:

system

exec

Command Input, also known as `backticks`

Piping data from a process

Quote execute

We'll compare system and exec first.

Exec

Poor old exec is broken on Perl for Win32. What
it should do is stop running your Perl script and start running whatever you
tell it to. If it can't start the external process, it should return with an
error code. This doesn't work properly under Perl for Win32. The
exec function does work properly on the standard Perl
distribution.

System

This runs an external command for you, then carries on with the script. It
always returns, and the value it returns goes into
$? . This means you can test to see if the program
worked. Actually you are testing to see if it could be started, what the
program does when it runs is outside your control if you use
system .

This example demonstrates system in action. Run
the 'vol' command from a command prompt first if you are not familiar with it.
Then run the 'vole' command. I'm assuming you have no cute furry executables
called vole on your system, or at least in the path. If you do have an
executable called 'vole', be creative and change it.

As you can see, a successful system call returns 0. An unsuccessful one
returns a value which you need to divide by 256 to get the real return value.
Also notice you can see the output. And because
system returns, the code after the first
system call is executed. Not so with
exec, which will terminate your perl script if it
is successful. Perl's usual use of single and double quotes applies as per
variable interpolation.
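A portable sketch of the same idea. Instead of vol and vole it runs the perl that is already running (its path is in $^X) as the external command, an assumption made so the example behaves the same on any system:

```perl
use strict;
use warnings;

# $^X is the path of the running perl, used here as a stand-in external command
my $ok  = system($^X, '-e', 'exit 0');   # a command that succeeds
my $bad = system($^X, '-e', 'exit 3');   # a command that exits with code 3

print "First call returned $ok\n";
print "Second call returned $bad, real exit code ", $bad >> 8, "\n";
```

Shifting right by eight bits is the same as dividing by 256 and throwing away the remainder.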

Backticks

These `` are different again to system and exec.
They also start external processes, but return the output of the process.
You can then do whatever you like with the output. If you aren't sure where
backticks are on your keyboard, try the top left, just left of the 1 key. Often
around there. Don't confuse single quotes '' with
backticks `` .

As you can see here, the Win32 vol command is executed. We just print it
out, escaping the $ in the variable name. Then a
simple regex, using # as a delimiter just in case you'd forgotten delimiters
don't have to be / .
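A portable sketch of capturing output with backticks. The command is the running perl ($^X) doing a small sum, a hypothetical stand-in for vol, and the regex uses # as its delimiter:

```perl
use strict;
use warnings;

# Backticks interpolate like a doublequoted string, so $^X becomes the perl path
my $output = `"$^X" -e "print 6*7"`;
print "The command said: $output\n";

my $answer;
if ($output =~ m#^(\d+)$#) {     # # as the delimiter instead of /
    $answer = $1;
    print "Captured the number $answer\n";
}
```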

When to use external calls

Before you get carried away with creating elaborate scripts based on the
output from NT's net commands, note there are plenty
of excellent modules out there which do a very good job of this sort of thing,
and that any form of external process call slows your script. Also note there
are plenty of built in functions such as readdir
which can be used instead of `dir` . You
should use Perl functions where possible rather than calling external
programs because Perl's functions are:

portable (usually, but there are exceptions). This means you can
write a script on your Mac PowerBook, test it on an NT box and then use
it live on your Unix box without modifying a single line of code;

faster, as every external process significantly slows your program;

don't usually require regexing to find the result you want;

don't rely on output in a particular format, which might be changed
in the next version of your OS or application;

are more likely to be understood by a Perl programmer -- for
example, $files=`ls`; on a Unix box means
little to someone that doesn't know that ls is the Unix
command for listing files, as dir
is in Windows.

Don't start using backticks all over the place when system will do. You might get a very large return
value which you don't need, and will consequently slurp lots of memory. Just
use them when you actually want to check the returned strings.

Opening a Process

The problem with backticks is that you have to wait for the entire process
to complete, then analyse the entire output. This is a big problem if you
have large amounts of output or slow processes. For example, the DOS command
tree. If you aren't familiar with this command, run a DOS/command
prompt, switch to the root directory (C:\ ) and type
tree. Examine the wondrous output.

We can open a process, and pipe data in via a filehandle in exactly the same
way you would read a file. The code below is exactly the same as opening a
filehandle on a file, with two exceptions:

We use an external command, not a filename. That's the process
name, in this case, tree.

As soon as $. hits 10 we shut the process
off by exiting the loop. Easy.
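A portable sketch of the technique. In place of tree it starts the running perl ($^X) printing the numbers 1 to 100, which is an assumption made so the example runs anywhere; the trailing | tells open to read from a process:

```perl
use strict;
use warnings;

my @lines;

# The trailing | means: run this command and let us read its output
open PROC, qq("$^X" -le "print for 1..100" |) or die "Cannot start process: $!";
while (<PROC>) {
    push @lines, $_;
    last if $.==10;      # ten lines is enough, so stop reading
}
close PROC;

print "Read ", scalar(@lines), " lines, the last being $lines[-1]";
```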

Except, maybe it won't. What if this was a long program, and you forgot
about that particular line of code which exits the loop? Suppose that $.
somehow went from 9 to 11, or was assigned to? It would never reach 10.
So, to be safe, exit your loops in a paranoid manner (test for $. being 10 or
more rather than exactly 10), unless you really mean only
to exit when at line ten. For maximum safety, maybe you should create your own
counter variable because $. is a global variable. I'm not
necessarily advocating doing any of the above, but I am suggesting these things
be considered.

You might notice the presence of a new keyword -
printf . It works like
print , but formats the string before
printing. The formatting is controlled by such parameters as
%3s , which means "pad out to a total width of three
characters". After the doublequoted string comes whatever you want to be printed in
the format specified. Some examples follow. Just uncomment each line in turn to
see what it does. There is a lot of new stuff below, but try and work out what
is happening. An explanation follows after the code.

$windir=$ENV{'WINDIR'}; # yes, you can access the environment variables !
$x=0;
opendir WDIR, "$windir" or die "Can't open $windir !!! Panic : $!";
while ($file= readdir WDIR) {
next if $file=~/^\./; # try commenting this line to see why it is there
$age= -M "$windir/$file"; # -M returns the age in days
$age=~s/(\d*\.\d{3}).*/$1/; # hmmmmm
#### %4.4d - must take up 4 columns, and pad with 0s to make up space
#### and minimum width is also 4
#### %10s - must take up 10 columns, pad with spaces
# printf "%4.4d %10s %45s \n", $x, $age, $file;
#### %-10s - left justify
# printf "%4.4d %-10s %-45s \n", $x, $age, $file;
#### %10.3d - use 10 columns, pad with 0s if less than 3 digits used
# printf "%4.4d %10.3d %45s \n", $x, $age, $file;
$x++;
last if $x==15; # we don't want to go through all the files :-)
}

There are some intentionally new functions there. When you start hacking
Perl (actually, you already started if you have worked through this far) you'll
see a lot of example code. Try and understand the above, then read the
explanation below.

Firstly, all environment variables can be accessed and set via Perl. They
are in the %ENV hash. If you aren't sure what
environment variables are, refer to your friendly Microsoft documentation or
books. The best known environment variable is path, and you can
see its value and that of all other environment variables by simply typing
set at your command prompt.

The regex /^\./ bounces out invalid entries before
we bother to do any processing on them. Good programming practice. What it matches
is "anything that begins with '.'". The caret anchors the match to the
beginning of the string, and as . is a metacharacter
it has to be escaped.

Perl has several tests to apply on files. The -M
test returns the age in days. See the documentation for similar tests.
Note that the calls to readdir return just the file,
not the complete pathname. As you were careful to use a variable for the
directory to be opened rather than hardcoding it (horrors) it is no trouble to
glue it together by using doublequotes.

Try commenting out $age=~s/(\d*\.\d{3}).*/$1/ and
note the size of $age . It could do with a trim.
Just for regex practice, we make it a little smaller. What the regex does is:

start capturing with (

look for 0 or more digits \d*

then a . (escaped)

followed by three digits \d{3}

and that's all we want to capture so the parens are closed. )

Finally, everything else in the string is matched by .* where . is
any character (almost) and * means 0 or more. This
is pretty much guaranteed to match to the end of the line.

Having matched the entire string (and put part of it into
$1 by using parens) we simply replace the
string with what we have matched.

Easy !

Mention should also be made of sprintf , which is
exactly like printf except it doesn't print. You just
use it to format strings, which you can do something with later. For example :
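A minimal sketch with hypothetical values, showing that the formatted string is kept in a variable rather than printed straight away:

```perl
use strict;
use warnings;

my $counter = sprintf("%4.4d", 7);                       # zero-padded to four digits
my $line    = sprintf("%-10s|%5.2f", 'total', 3.14159);  # left-justified text, rounded number

print "$counter\n";
print "$line\n";
```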

Quote execute

Anything within qx( ) is executed, and
duly variable interpolated. This sample also demonstrates
qw which is 'quote words', so the elements of
@opts are delimited by whitespace, not the usual commas.
You can also use for instead of
foreach if you want to save typing four characters
for the sake of legibility.

You may have noticed that system outputs the
result of the command to the screen whereas qx does
not. Each to its own.
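
A small sketch of qx and qw working together. It assumes an echo command is available, which is the case at both the Windows and the Unix command prompt; the words in @opts are made up:

```perl
@opts=qw(alpha beta gamma);      # same list as ('alpha','beta','gamma')
for (@opts) {                    # for and foreach are interchangeable
    $out=qx(echo $_);            # $_ is interpolated, then the command is run
    chomp $out;                  # drop the trailing newline that echo adds
    print "qx captured '$out'\n";   # nothing is shown until we print it ourselves
}
```

Replacing qx(echo $_) with system("echo $_") would send the words straight to the screen, and $out would hold the exit status instead of the text.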

Oneliners

A short example

You'll have noticed Perl packs a lot of power into a small amount of code.
You can feed Perl code directly on the command line. This is known as a
oneliner, for obvious reasons. An example:

perl -e"for (55..75) { print chr($_) }"

The -e switch tells Perl that a command is
following. On Windows the command must be enclosed in doublequotes, not singles as on
Unix. The command itself in this case simply prints the characters whose ASCII
codes are 55 to 75 inclusive.
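
The same loop written out as an ordinary script, as a sketch for comparison. Collecting the characters in $chars is an addition made here so the result can be inspected:

```perl
$chars='';
for (55..75) {
    $chars.=chr($_);       # chr turns a code number into its character
}
print "$chars\n";          # prints 789:;<=>?@ABCDEFGHIJK
```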

File access

This is a simple find routine. As it uses a regex, it is infinitely superior
to NT's findstr :

perl -e"while (<>) {print if /^[bv]/i}" shop.txt

Remember, the while (<>) construct will open
whatever is in @ARGV . In this case, we have supplied
shop.txt so it is opened and we print lines that begin with either
'b' or 'v'.

That can be made shorter. Run perl -h and you'll
see a whole list of switches. The one we'll use now is
-n , which puts a while (<>) { } loop around whatever code you
supply with -e . So:

perl -ne"print if /^[bv]/i" shop.txt

which does exactly the same as the previous program, but uses the
-n switch to put a while
(<>) loop around whatever other commands are supplied.

A slightly more sophisticated version:

perl -ne"printf \"$ARGV : %3s : $_\",$. if /^[bv]/i" shop.txt

which demonstrates that doublequotes must be escaped.

Modifying files with a oneliner and $^I

If you don't remember $^I then please review the
section on Files before proceeding. When you're ready, copy shop.txt
to shop2.txt .

perl -i.bk -ne"printf \"%4s : $_\",$." shop2.txt

The -i switch primes the inplace edit operator. We
still need -n .

If you had a typical quoted email message such as:

>> this is what was said
>> blah blah
> blaaaaahhh
The new text

and you wanted to remove the >, then:

perl -i.bk -pe"s/^>+ ?//" email.txt

does the trick. Regex recap -- the caret anchors what follows to the
beginning of the string, the + means one or more (no, we do not
use * which means 0 or more), then we match a single literal
space, but it is not necessary for the space to be there for the
match to be successful, hence ? .

What is new in terms of oneliners is the use of -p ,
which does exactly the same thing as -n
except that it adds a print statement too. In
case you were wondering why the previous example used
-n and this one uses -p --
the previous example prints its data with
printf, whereas this example doesn't have an explicit
print statement so we provide one with -p .
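
To make the difference concrete, here is a rough sketch of what Perl does for the -p oneliner, written as a script. The lines are held in an array rather than read from email.txt, and the output is collected in $result, both of which are simplifications added here:

```perl
@lines=(">> this is what was said\n",">> blah blah\n",
        "> blaaaaahhh\n","The new text\n");
$result='';
foreach (@lines) {       # -n and -p read each line of the files in @ARGV instead
    s/^>+ ?//;           # this is the code supplied with -e
    $result.=$_;         # -p prints $_ at this point; -n leaves this step out
}
print $result;
```

With -n alone the substitution would still happen, but nothing would ever be printed.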

Some other useful oneliners -- a calculator and an ASCII number lookup:

perl -e"print 50/200+2"
perl -e"for (50..90) { print chr($_) }"

There are plenty more oneliners, and they are an essential part of any
sysadmin's toolbox. The two examples below are functionally equivalent but the
lower one is perhaps a little more readable:

Whatever follows qq is used as a delimiter,
instead of having to escape the doublequotes with a backslash. I learnt this from the
Perl-Win32-Users mailing list (see top) - I think it was Lennart Borgman who
pointed it out. He also mentioned that you don't need the closing doublequote.
Saves a little typing.
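
The effect of qq is easiest to see inside a script. This is only a sketch, with a made-up filename, and uses parens as the delimiter:

```perl
$name  ="shop.txt";                                    # made-up value
$first ="File \"$name\" has been \"processed\"\n";     # doublequotes escaped by hand
$second=qq(File "$name" has been "processed"\n);       # qq( ) does the quoting for us
print $second;
print "Identical\n" if $first eq $second;
```

Variables and \n are still interpolated inside qq, exactly as they are between doublequotes.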

Subroutines and Parameters

In Perl, subroutines are functions are subroutines. If you like, a
subroutine is a user defined function. It's a bit like calling a script a
program, or a program a script. For the purposes of this tutorial we'll refer
to functions as subroutines, except when we call them functions. Hope that's
made the point.

For the purposes of this section we will develop a small program which, by
the end, will demonstrate how subroutines work. It also serves to demonstrate
how many programs are built, namely a little at a time, in manageable sections.
At least, that method works for me.

The chosen theme is gliding. That's aeroplanes without engines. A subject
close to every glider pilot's heart is how far they can fly from the altitude
they are at. Our program will calculate this. To make it easy we'll assume the
air is perfectly calm. Wind would be a complication we don't need, especially
when in a crowded lift.

What we need in order to calculate the distance we can fly is:

How high we are (in metres, to start with)

How many metres we travel forward for every metre we drop. This is
the glide ratio, for example 24:1 would mean travelling 24 metres
forward for every 1 metre of height lost.

Obviously input is needed. We can either prompt the user or grab the input
from the command line. The latter is easier so we'll just look at
@ARGV for the command line parameters. Like so:

($height,$angle)=@ARGV; # @ARGV is the command line parameters
$distance=$height*$angle; # an easy calculation
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

The above should be executed thus:

perl yourscript.pl 5000 24

or whatever your script is called, with whatever parameters you choose to
use. I'm a poet and I don't even know it.

That works. What about a slight variation? The pilot does have some control
over the glide ratio, for example he can fly faster but at a penalty of a
lesser glide ratio. So we should perhaps give a couple of options either side
of the given parameters:

($height,$angle)=@ARGV;
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle++; # add 1 to $angle
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle-=2; # subtract 2 from $angle so it is 1 less than the original
$distance=$height*$angle;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";

That's cumbersome code. We repeat exactly the same statement. This wastes
space, and if we want to change it there are three changes to be made. A better
option is to put it into a subroutine:
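
A minimal sketch of how that might look. The default values are an assumption added here so that the script also runs without command line parameters:

```perl
($height,$angle)=@ARGV ? @ARGV : (5000,24);   # defaults are an added assumption
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
sub howfar {                    # the calculation now lives in one place
    $distance=$height*$angle;
}
```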

This is a basic subroutine, and you could stop here and have learnt a very
useful technique for programming. Now, when changes are made they are made in
one place. Less work, less chance of errors. Improvements can always be made.
For example, pilots outside Eastern Europe generally measure height in feet,
and glider pilots are usually concerned with how many kilometres they travel
over the ground. As a first step we can adapt our program to accept height in
feet (the switch to kilometres comes a little later):

($height,$angle)=@ARGV;
$height/=3.2; # divide feet by 3.2 to get metres
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle++;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
$angle-=2;
&howfar;
print "With a glide ratio of $angle:1 you can fly $distance from $height\n";
sub howfar {
$distance=$height*$angle;
}

When you run this you'll probably get a result which involves a fair few
digits after the decimal point. This is messy, and we can fix this with the
int function, which in Perl and most other languages
returns a number as an integer, ie without those irritating numbers after the
decimal point.

You might have also noticed a small bit of Bad Programming Practice slipped
into the last example. It was the evil Constant, the '3.2' used to convert feet
to metres. Why, I don't hear you ask, is this bad? Surely the conversion will
never change?

It won't change, but our use of it might. We may decide that it should be
3.281 instead of 3.2. We may decide to convert from feet to nautical miles
instead. You don't know what could happen. Therefore, code with flexibility in
mind and that means avoiding constants.

The new improved version with int and constant
removed:

($height,$ratio)=@ARGV;
$cnv1=3.2; # now it is a variable. Could easily be a cmd line
# parameter too. We have the flexibility.
$height =int($height/$cnv1); # divide feet by 3.2 to get metres
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
$ratio++;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
$ratio-=2;
&howfar;
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
sub howfar {
$distance=int($height*$ratio);
}

We could of course build the print statement into
the subroutine, but I usually separate output presentation from the
calculation. Again, that means it is easier to modify later on.

Something else we can improve about this code is the use of the
$ratio variable. We are having to keep track of what
we do to it -- first add one, then subtract two in order to subtract one from
the original input. In this case it is fairly easy, but with a complex program
it can be difficult, and you don't want to be creating lots of variables just
to track one input, for example $ratio1 , $ratio2
etc.

Parameters

One solution is to pass the subroutine parameters. In the best tradition of
American columnists, who seem to have a particular affection for this phrase,
'Here's how:'

($height,$ratio)=@ARGV;
$cnv1=3.2;
&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";
&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";
sub howfar {
print "The parameters passed to this subroutine are @_\n";
($ht,$rt)=@_;
$ht =int($ht/$cnv1);
$distance=int($ht*$rt);
}

Quite a few things have changed here. Firstly, the subroutine is being
called with parameters. These are a comma-delimited list in parens after the
subroutine call. The two parameters are $height and
$ratio.

The parameters end up in the subroutine as the @_
array. Being an array, they are in the same order as passed. All the
usual array operations work. All we will do is assign the contents of the
array to two variables.

We have also moved the conversion function into the subroutine, because we
want to put all the code for generating the distance into one place.

Namespaces

We cannot use the variable names $height and $ratio
because we modify them in the subroutine and that will affect the main
program. So we choose new ones to do the operation on. Finally, a small change
is made to the print output.

This approach works well enough for our small program here. For larger
programs, having to think of new variable names all the time is difficult. It
would be even more difficult if different programmers were working on different
sections of the program. It would be impossible if a program were written, then
an extension created by another person somewhere else, and that same extension
had to be used by many people in many different programs. Obviously, the risk
of using the same variable name is too great. There are only so many logical
names out there.

There is a solution. Imagine you own a house with two gardens. You have two
identical dogs, one in the front garden, one in the back garden. Bear with me,
this is relevant. Both dogs are called Rover, because their owner lacks
imagination.

When you go to the front garden and call 'Rover!!!' or open a can of dog
food, the dog in the front garden comes running. Similarly, you go to the back
garden, call your dog and the other dog bounces up to you.

You have two dogs, both called Rover, and you can change either one of them.
Wash one, neuter the other -- it doesn't matter, but both are dogs and both
have the same name. Changes to one won't affect the other. You don't get them
confused because they are in different places, in two different
namespaces.

Variable Scope

To bring things back to Perl, a short diversion is necessary to illustrate
the point with actual Perl code instead of canine metaphors:

$name='Rover';
$pet ='dog';
$age =3;
print "$name the $pet is aged $age\n";
{
my $age =4; # run this again, but comment this line out
my $name='Spot'; # and this one
$pet ='cat';
print "$name the $pet is aged $age\n";
}

print "$name the $pet is aged $age\n";

This is pretty straightforward until
we get to the { . This marks the start of a
block. One feature of a block is that it can have its own namespace.
Variables declared, in other words initialised, within that block are just
normal variables, unless they are declared with
my .

When variables are declared with my they are
visible inside the block only. Also, any variable which has the same name
outside the block is ignored. Points to note from the example above:

The two my variables appear to overwrite
the variables of the same name from outside the block.

The two original variables aren't really overwritten because as we
prove after the block has ended, they haven't been touched.

The variable $pet is accessible inside and outside the
block as usual. Of course, if we declare it with
my then things will change.

my Variables

So there we have it. Namespaces. They work for all the other types of
variable too, like arrays and hashes. This is how you can write code and not
care about what other people use for variable names -- you just declare
everything with my and have your own private party.
Our original program about gliding can be improved now:
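
A sketch of that improvement, based on the previous listing. The default values are an assumption added here so the script runs without command line parameters:

```perl
($height,$ratio)=@ARGV ? @ARGV : (5000,24);   # defaults are an added assumption
$cnv1=3.2;
&howfar($height,$ratio);
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";
&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";
sub howfar {
    my ($height,$ratio)=@_;          # private copies, visible only inside this block
    $height  =int($height/$cnv1);    # changes the private $height only
    $distance=int($height*$ratio);
}
```

The $height printed by the main program stays in feet, because the subroutine only ever touches its own copy.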

The only change is that the parameters to the subroutine, ie the contents of
the array @_ , are declared with
my . This means they are now only visible within
that block. The block happens to also be a subroutine. Outside of the block,
the original variables are still accessible. At this point I'll introduce the
technical term, which is lexical scoping. That means the variable is
confined to the block -- it is only visible within the block.

We still have to be concerned with what variables we use inside the
subroutine. The variable $distance is created in the subroutine
and used outside of it. With larger programs this will cause exactly the same
problem as before -- you have to keep track of which variables the subroutine
shares with the code outside it. For all the same reasons as
before, like two different people working on the code and use of custom
extensions to Perl, that can be difficult.

The obvious solution is to declare $distance with
my , and thus lexically scope it. If we do this, then
how do we get the result of the subroutine? Like so:

($height,$ratio)=@ARGV;
$cnv1=3.2;
$distance=&howfar($height,$ratio); # run this again and delete '$distance='
print "With a glide ratio of $ratio:1 you can fly $distance from $height\n";
$distance=&howfar($height,$ratio+1);
print "With a glide ratio of ",$ratio+1,":1 you can fly $distance from $height\n";
$distance=&howfar($height,$ratio-1);
print "With a glide ratio of ",$ratio-1,":1 you can fly $distance from $height\n";
sub howfar {
my ($height,$ratio)=@_;
my $distance;
$height =int($height/$cnv1);
$distance=int($height*$ratio/1000); # output result in kilometres not metres
}

First change -- $distance is declared with my . Secondly, the result of the subroutine is assigned to
a variable, which is also named $distance. However, it is a
$distance in a different namespace. Remember the two gardens. You
may wish to delete the $distance= from the first assignment and
re-run the code. The only other change is to convert the output from metres
to kilometres.

We have now achieved a sort of Black Box effect, where the subroutine is
given input and creates output. We pass the subroutine two numbers, which may
or may not be variables. We assign the output of the subroutine to a variable.
We care not what goes on inside the subroutine, what variables it uses or
what magic it performs. This is how subroutines should operate. The
only exception is the variable $cnv1. This is declared in the main
body of the program but also used in the subroutine. This has been done in case
we need to use the variable elsewhere. In larger programs it would be a good
idea to pass it to subroutines along with the other parameters too.

Multiple Returns

That's all the major learning out of the way. The next step is relatively
easy, but we need to add new functionality to the program in order to
demonstrate it. What we will do is work out how long it will take the glider
pilot to fly the distance. For this calculation, we need to know his airspeed.
That can be a third parameter. The actual calculation will be part of
howfar. An easy change:
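
A sketch of such a change. The airspeed unit (knots), the conversion factor in $cnv2 and the default values are all assumptions made for this illustration:

```perl
($height,$ratio,$airspeed)=@ARGV ? @ARGV : (5000,24,60);   # defaults are assumed
$cnv1=3.2;      # feet to metres, as before
$cnv2=1.8;      # knots to km/h; unit and factor are assumptions
($distance,$time)=&howfar($height,$ratio,$airspeed);
print "With a glide ratio of $ratio:1 you can fly $distance km from $height ";
print "feet, taking $time minutes\n";
sub howfar {
    my ($height,$ratio,$airspeed)=@_;
    my ($distance,$time);
    $height  =int($height/$cnv1);
    $distance=int($height*$ratio/1000);
    $time    =int($distance/($airspeed*$cnv2)*60);   # the last expression evaluated
    # print "distance $distance km, time $time minutes\n";
}
```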

This doesn't work correctly. First, the changes. The result from
howfar is now assigned to two variables. Subroutines return a
list, and so assigning to some scalar variables between parens separated by
commas will work. This is exactly the same as reading the command line
arguments from @ARGV .

We are also passing a new parameter, $airspeed. There is
another conversion and a one-line calculation to provide the number of minutes
it will take to fly $distance.

If you look carefully, you can perhaps work out what the problem is. There
was a clue in the Regex section, when /e was
explained.

The problem is that Perl returns the result of the last expression
evaluated. In this case, the last expression is the one calculating
$time, so the value $time is returned, and it is the
only value returned. Therefore, the value of $time is assigned to
$distance, and $distance itself doesn't actually get
a value at all.

Re-run the program but this time uncomment the line in the subroutine which
prints $distance and $time. You'll notice the value
is 1, which means that the expression was successful. Perl is faithfully
returning the value of the last expression evaluated.

This is all well and good, but not what we need. What is required is a
method of telling Perl what needs to be returned, rather than what Perl thinks
would be a good idea:
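
A sketch of the corrected program. As before, the knots unit, the factor in $cnv2 and the default values are assumptions:

```perl
($height,$ratio,$airspeed)=@ARGV ? @ARGV : (5000,24,60);   # defaults are assumed
$cnv1=3.2;
$cnv2=1.8;      # knots to km/h; unit and factor are assumptions
($distance,$time)=&howfar($height,$ratio,$airspeed);
print "With a glide ratio of $ratio:1 you can fly $distance km from $height ";
print "feet, taking $time minutes\n";
sub howfar {
    my ($height,$ratio,$airspeed)=@_;
    my ($distance,$time);
    $height  =int($height/$cnv1);
    $distance=int($height*$ratio/1000);
    $time    =int($distance/($airspeed*$cnv2)*60);
    return ($distance,$time);      # say exactly what goes back, and in what order
}
```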

A simple fix. Now, we tell Perl what to return, with the aptly named
return function. With this function we have complete
control over what is returned and when. It is quite usual to use
if statements to control different return values, but
we won't bother with that here.

There is a subtle flaw in the program above. It is not backwards compatible
with the old method of calling the subroutine. Run this:
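
A sketch of such a test, with fixed numbers instead of command line parameters. The eval wrapper around the third call is an addition made here so that the error can be reported without stopping the script; $cnv2 is an assumed factor as before:

```perl
$cnv1=3.2;
$cnv2=1.8;                                    # assumed knots to km/h factor
($distance,$time)=&howfar(5000,24,60);
print "$distance km in $time minutes\n";
($distance,$time)=&howfar(5000,25,60);
print "$distance km in $time minutes\n";
$distance=eval { &howfar(5000,23) };          # old-style call: no airspeed given
print "Third call failed: $@" if $@;          # $@ holds the error message from eval
sub howfar {
    my ($height,$ratio,$airspeed)=@_;
    my ($distance,$time);
    $height  =int($height/$cnv1);
    $distance=int($height*$ratio/1000);
    $time    =int($distance/($airspeed*$cnv2)*60);
    return ($distance,$time);
}
```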

A division by 0 results third time around. This is of course because
$airspeed doesn't exist, so of course it will effectively be 0.
Making your subroutines backwards compatible is important in large programs, or
if you are writing an add-in module for other people to use. You can't expect
everyone to retrofit additional parameters to their subroutine calls just
because you decided to be a bit creative one day.
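
One way to stay backwards compatible, sketched with fixed numbers. The factor in $cnv2 is again an assumption:

```perl
$cnv1=3.2;
$cnv2=1.8;                                     # assumed knots to km/h factor
($distance,$time)=&howfar(5000,24,60);
print "New-style call: $distance km in $time minutes\n";
$distance=&howfar(5000,23);                    # the old-style call works again
print "Old-style call: $distance km\n";
print "No variables at all: ",join(' ',&howfar(10000,30,50)),"\n";
sub howfar {
    my ($height,$ratio,$airspeed)=@_;
    my ($distance,$time);
    $height  =int($height/$cnv1);
    $distance=int($height*$ratio/1000);
    if ($airspeed) {                           # true only for a non-zero airspeed
        $time=int($distance/($airspeed*$cnv2)*60);
        return ($distance,$time);
    } else {
        return $distance;                      # old-style callers get just the distance
    }
}
```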

Here we just test the $airspeed to ensure we won't be doing any
divisions by 0. It also affects what we return. There is also a new
print statement, which shows that you don't need to
assign to intermediate variables, or even pass variables as parameters.
Constants, evil things that they are, work just as well. I already mentioned
this, but a demonstration doesn't hurt. Unless you work for an electric chair
manufacturer.

The astute reader.....:-) Every time I read that I wonder what I've missed.
Usually something obscure which the author knows nobody will ever notice, but
likes to belittle the reader. No exception here! Anyway, you may be wondering
why this would not have sufficed instead of the if
statement:

After all, the first item returned is $distance, so therefore
it should be the first one assigned via:

$distance=&howfar($height,$ratio-1);

and $time should just disappear into the bit bucket.

The answer lies with scalars and lists. We are returning a list, but
assigning it to a scalar. What happens when you do that? The scalar takes on
the last value of the list. The last value of the list being returned
is of course $time, which has been declared but not otherwise
touched. Therefore, it is nothing and appears as such on the printed statement.
A small program to demonstrate that point:

$word=&wordfunc("Greetings");
print "The word is $word\n";
(@words)=&wordfunc("Bonjour");
print "The words are @words\n";
sub wordfunc {
my $word=shift; # when in a subroutine, shifts @_ if no target specified
my @words; # how to my an array
@words=split //,$word; # splits on the nothings between each letter
($first,$last)=($words[0],$words[$#words]); # see section on Arrays if required
return ($first,$last); # Returns just the first and last
}

As you can see, the first call prints the letter 's', which is the last
element of the list that is returned. You could of course use a list consisting
of just one element:

($word)=&wordfunc("Greetings");

Now we are assigning a list to a list, so perl starts at the first element
and keeps assigning till it runs out of elements. The parens turn a lonely
scalar into an element of a list. You might consider always assigning the
results of subroutines this way, as you never know when the subroutine might
change. I know I've just evangelised about how subroutines shouldn't change,
but if you take care and the subroutine writer takes care, there definitely
won't be any problems!

That's about it for good old my . There is a lot
more to learn about it but that's enough to get started. You now know a little
about variable visibility, and I don't mean changeable weather.

Local

There is one more function that I'd like to draw to your attention, and
we'll launch straight into the demonstration:

The special variable $, defines what Perl should
print in between lists it is given. By default, it is nothing. So the first two
prints should have no spaces between the words. Then we assign '_' to
$, so the next prints have underscores between the
words.

If we want to use a different value for $, in the
change subroutine, and not disturb the main value, we have a
little problem. This problem cannot be solved by my
because global variables like $, cannot at
this time be lexically scoped. So, we could manually do it:
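
A sketch showing the manual approach next to the local function. The subroutine names and the words printed are made up for this illustration:

```perl
$,='_';                          # the main program wants underscores
print 'One','Two',"\n";
&change_manual;
&change_local;
print 'One','Two',"\n";          # underscores again: $, was put back both times

sub change_manual {
    $save=$,;                    # remember the current value by hand
    $,='*';
    print 'Three','Four',"\n";
    $,=$save;                    # and restore it afterwards
}
sub change_local {
    local $,='*';                # temporary value, undone when the block ends
    print 'Five','Six',"\n";
}
```

The manual version needs an extra variable and breaks if you forget the last line; local does the same bookkeeping for you.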

You can try it with my instead but it
won't work. I'm sure you'll try it anyway, I know you learn things the hard way
otherwise you a) wouldn't be programming computers and b) wouldn't be using
this tutorial to do it.

The local function works in a similar way to
my , but assigns temporary values to global
variables. The my function creates new variables
that have the same name. The distinction is important, but the reasons require
perl proficiency beyond the scope of this humble tutorial. In practice, the
difference is:
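
A small sketch of that difference, using an ordinary global variable rather than $, so that both versions can run. All the names here are made up:

```perl
$sep='-';                                        # an ordinary global variable
sub show       { return join($sep,@_); }         # always looks at the global $sep
sub with_local { local $sep='_'; return &show(@_); }   # changes the global for a while
sub with_my    { my    $sep='_'; return &show(@_); }   # creates a new, private $sep
print &with_local('a','b'),"\n";                 # prints a_b
print &with_my('a','b'),"\n";                    # prints a-b
```

The subroutine show sees the temporary value set by local, but it never sees the private variable created by my.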

Returning arrays

So that's the end of subroutines and parameters. Would you believe I have
only scratched the surface? There are closures, prototypes, autoloading and
references to learn. Not, however, in this tutorial. At least not yet. I'll
finish with one last demonstration. You may have noticed that Perl returns one
long list from subroutines. This is fine, but suppose you want two separate
lists, for example two arrays? This is one way to do it:
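
A sketch of the technique. The subroutine, which sorts numbers into odd and even, is a made-up example:

```perl
($odd_ref,$even_ref)=&split_numbers(1..10);   # two scalars, each holding a reference
# print "$odd_ref $even_ref\n";               # uncomment to see the raw references
print "Odd:  @$odd_ref\n";                    # prints Odd:  1 3 5 7 9
print "Even: @$even_ref\n";                   # prints Even: 2 4 6 8 10

sub split_numbers {                           # hypothetical example subroutine
    my (@odd,@even);
    foreach (@_) {
        if ($_ % 2) { push @odd,$_ } else { push @even,$_ }
    }
    return (\@odd,\@even);                    # a list of two references, not the arrays
}
```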

There is a lot going on there. It should be clear up until the
return statement. As we know, Perl only returns a
single list. So, we make Perl return a list of the arrays it has just created.
Not the actual arrays themselves, but references to the arrays. A bit like a
shopping list is just a bit of paper, not the actual goods themselves. The
reference is created by use of the \ backslash.

Having returned two array references they are assigned to scalar variables.
If you uncomment the second print line you'll see two references to arrays.

The next problem is how to dereference the references, or access the arrays.
The construct @$xxx does that for us. I know I said I
wouldn't cover references, and I haven't -- that is just a useful trick.

This little section is not designed as a complete guide, it is just a taster
of things to come. Perl is immensely powerful. If you think something can't be
done, the problem is likely to be it is beyond your ability, not that of Perl.

Modules

An introduction

Subroutines are oft-used pieces of code. They exist so you can re-use the
code and not have to constantly rewrite it.

A module is, in principle, similar to a subroutine. It is also an oft-used
piece of code. The difference is that modules don't live in your program, they
are their own separate script outside your code. For example, you might write a
routine to send email. You could then use this code in ten, a hundred, a
thousand different programs just by referencing the original program.

As you would expect, the basic Perl package includes a large number of
modules. These have been written by people who had a need for the code, made it
a module and released it into the big wide world. Many of these modules have
been debugged, improved and documented by yet more people. To quote the
OpenSource mantra, all bugs are shallow under the scrutiny of every programmer.

Aside from the many modules included with Perl there are hundreds more
available on CPAN, the Comprehensive Perl Archive Network. Refer to your
documentation for details.

File::Find -- using a module

An example of a module included with Perl is File::Find. There
are several modules under the File:: namespace, such as
File::Basename, File::Compare and
File::stat.
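
A minimal sketch of File::Find in use. Searching the current directory and collecting the matches in @found are choices made for this illustration:

```perl
use File::Find;                        # load the module
find(\&wanted,'.');                    # search the current directory and everything below it
print "$_\n" foreach @found;

sub wanted {
    # $_ holds the bare filename, $File::Find::dir the directory it was found in
    push @found,"$File::Find::dir/$_" if /^[a-d]/i;
}
```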

The first line is the most important. The use
function loads the File::Find module. Now, all the power
and functionality of File::Find is available for use. Such as the
find function. This accepts two basic parameters:

A reference to a subroutine, usually named wanted, which defines
what you want to do with each file that is found. The filename
will be in $_.

A list of directories to be searched. Subdirectories will also be
searched.

The subroutine wanted simply prints the directory the file was
found in if the filename begins with a,b,c or d. Make your own regex to suit.
The expression $File::Find::dir means the $dir variable in
the module File::Find. This is explained further in the next
section.

Note -- the \&wanted parameter is a reference to a
subroutine. Essentially, this means that the code in File::Find
knows where to find the &wanted subroutine. It is basically
like shortcuts under Windows 9x and NT4, instead of actual files (but the UNIX
Perl people would slaughter me for that, so be quiet).

ChangeNotify

Another example is Win32::ChangeNotify. As you might expect
there are a number of Win32-specific modules, and ChangeNotify is one of them.
It waits until something changes in a directory, then acts. What it waits for
and what it does are up to you, for example:

Again, the module is incorporated into the program with use . An object referred to by the variable
$notify is created. The parameters passed are the path to be
watched, whether we want to watch subtrees, and what sort of events we want to
be notified about, in this case only filename changes.

Then, we enter a loop which continues while 1 is true -- which will be
forever.

The program pauses when the wait method of the
$notify object is called. Only when there is a change to
the directory does the rest of the subroutine complete, launching the
browser. We then have to reset the $notify object.

There is some pretty frightening stuff about objects in the explanation. But
you don't actually need to understand anything about objects. Just read the
documentation, and experiment.

You can use as many modules as you like in one program. As they are all
written with carefully scoped variables you need not worry about programmers
using the same variable names in different modules. Now you *really* appreciate
scoping!

Your Very Own Module

You too can write your own modules. It is easy. First, we will create the
fantastic bit of code that we want to re-use everywhere, starting with a
normal Perl program:

The bit that has been added is the 1 at the bottom. Why? Perl
requires that all modules return true. We know that a subroutine always returns
the value of the last expression evaluated. As 1 evaluates to true, that'll do.

You need to save this as logon.pm in your newly created
directory under lib. The pm stands for Perl Module.

That's it. A module created. To use, just make a normal Perl script such as:

use RMP::logon;
$name=shift;
print logname($name);

and hey presto! Module power is yours!

You don't have to create your own subdirectory within lib but I
would advise it for the sake of neatness. And as you might expect, there is a
lot more to learn about modules but this is supposed to be a basic tutorial, so
that's enough for the time being.

Bondage and Discipline

Perl is a very flexible language. It is designed as a hacking tool, for
quick sysadmin magic. It can do quite a bit more besides, but being small and
powerful is a core Perl feature. Earlier on I said Perl is not a bondage and
discipline language -- to qualify that, it doesn't have to be. However,
there is a time and place for everything.

For tiny scripts you don't want to be declaring variables, typecasting and
generally spending more time obeying rules than you do getting the job done.
So, Perl doesn't force you to do all of these good programming practices.
However, not all your programs are going to be five-minute hacks. Some will be
pretty large. Therefore, some Discipline is in order.

It doesn't do much. Just prints out the first argument supplied, and
demonstrates the uninspiring sleep function. The
program itself is full of holes, and it is only a few lines. How many errors
can you spot? Try and count them. When you are finished, execute the program
with error-checks enabled:

perl -w script.pl hello

Perl finds quite a few errors. The -w switch
finds, among other heinous sins:

Variables used only once. In the example, $input2 is
used only once. It is a useless variable.

Filehandles used incorrectly. With print OUY I'm
trying to print to a non-existent filehandle. With -w an alarm is raised, as it would be if I
tried to write to a filehandle which was read-only.

Use of uninitialised variables. The variable $delay is
uninitialised if 'sleep' is not the first parameter. Making variables
spring into the air on demand is not good programming practice. They
should be defined carefully first.

So, generally, -w is a Good Thing. It forces you
to write cleaner code. So use it, but don't be afraid not to for very short
programs.

Shebang

You know that you can turn warnings on with -w on
the command line. You can also turn them on within the script itself. For that
matter, you can give perl any command line option within the script itself. For
example:

It may be more convenient for you to put the flag inside the script. It
doesn't have to be just -w , it can be any argument
Perl supports. Run

perl -h

for a full list.

The first line, #!perl -w is the shebang line. This is derived
from UNIX, where Perl was first developed. UNIX systems make a script
executable by changing an attribute. The operating system then loads the file
and works out how to execute it -- in this case by looking at the first line,
then loading the perl interpreter. Windows systems know that all files with a
certain extension must be passed to a certain program for execution, eg all
.bat files are passed to command.com, and all
.xls files are passed to Excel. The point of all this being that
you don't need a shebang line, but it doesn't hurt.

use strict;

So what's strict and how do you use it? The module strict
restricts 'unsafe constructs', according to the perldocs. The
strict module is a pragma, which is a hint that must be
obeyed. Like when your girlfriend says 'oh, that ring is *far* too expensive'.

There is no need to be frightened about unsafe code if you don't mind
endless hours of debugging unstructured programs. When you enable the
strict module, the three things that Perl becomes strict about
are:

Variables 'vars'

References 'refs'

Subroutines 'subs'

This tutorial doesn't presently cover references (and let's hope I remember
to remove this sentence if I do cover it in later versions) so we won't worry
about refs.

Strict variables are useful. Essentially, this means that all variables must
be declared, that is, defined before use rather than springing into existence
as required. Furthermore, each variable must be declared with
my or fully qualified. This is an example of a
program that is not strict, and should be executed something like this:

perl script.pl "Alain James Smith"

where the "" enclose the string as a single parameter as opposed to three
separate space-delimited parameters.
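As a rough sketch of what such a non-strict program could look like (the
subroutine and variable names are assumptions, and a default name is supplied
so it also runs without a parameter):

```perl
# Deliberately NOT strict: every variable springs into existence on first use.
# use strict;     # remove the leading # and perl will refuse to compile this

$name = shift;                                      # first command line parameter
$name = "Alain James Smith" unless defined $name;   # fallback for a bare run

$inis = initials($name);
print "The name is $name\n";
print "The initials are $inis\n";

sub initials {
    my $full = shift;
    $letters = '';                  # an undeclared global, not a 'my' variable
    $letters .= substr($_, 0, 1) for split /\s+/, $full;
    return $letters;
}
```

With the use strict line enabled, perl objects to $name, $inis and $letters
because none of them has been declared.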

With use strict enabled, perl refuses to run that program and complains that
each undeclared variable requires an explicit package name. These messages mean
Perl is not exactly clear about what the scope of
variables is. If Perl is not clear, you might not be either. So you need to be
explicit about your variables, which means either declaring them with
my so they are restricted to the current block, or
referring to them with their fully qualified name. An example, using both
methods:
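A sketch along these lines shows both methods side by side. Apart from
$MAIN::name, the names are assumptions, and a default name is supplied so the
script also runs without a parameter.

```perl
use strict;
use warnings;

# Method 1: a fully qualified name (package MAIN, variable name).
$MAIN::name = shift;
$MAIN::name = "Alain James Smith" unless defined $MAIN::name;  # fallback for a bare run

# Method 2: lexical variables declared with my.
my $inis = initials($MAIN::name);
print "The name is $MAIN::name\n";
print "The initials are $inis\n";

sub initials {
    my $full    = shift;
    my $letters = '';               # now a properly declared lexical
    $letters .= substr($_, 0, 1) for split /\s+/, $full;
    return $letters;
}
```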

The my variables in the subroutine are nothing
new. The my variables outside the subroutine are. If
you think about it, the main program itself is also a kind of block, and
therefore variables can be lexically scoped to be visible only within the
block.

The other interesting bit is the $MAIN::name business. This, as
you might expect, is the fully qualified name of the variable. The first part
is the package name, in this case MAIN. (Perl's default package is actually
spelt in lowercase, main, so $main::name is the more usual form.) The second
part is the actual variable name. Personally, I've never needed to refer to a
variable this way. I'm not saying you'll never use the syntax, but I would
suggest that knowing this is not on a Perl student's Top 10 list of Things to
Master.

The important thing about use strict is that it
does enforce more discipline than you have been used to, and for all but the
smallest of programs, that is most definitely a Good Thing.

Debugging

Sooner or later you'll need to do some fairly hairy debugging. It will be
later if you are using strict, -w
and writing your subroutines properly, but the moment will come.

When it does you'll be poring over code, probably late at night, wondering
where the hell the problem is. Some techniques I find useful are

Print your variables and other information out at frequent
intervals.

Split difficult components of the program out into small, throwaway
scripts. Get these working, then copy the results back into the main
program.

# Comment frequently.

Eventually, you'll be stuck. Such is the price of progress. In this case,
Perl's own debugger can be invaluable. Run this code as normal first:
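A small script with a subroutine is enough to practise on. This one is only a
sketch: $name and @namebits are the variables mentioned in this section, the
rest is assumed.

```perl
# A short program to step through. The fallback name is an assumption so
# that the script also runs without a parameter.
$name=shift;
$name = "Alain James Smith" unless defined $name;
@namebits = split /\s+/, $name;             # one element per word of the name

print "The name is $name\n";
print "The initials are ", initials(@namebits), "\n";

sub initials {
    my $letters = '';
    $letters .= substr($_, 0, 1) for @_;    # first letter of each word
    return $letters;
}
```

Saved as script.pl, it can then be started under the debugger by adding the
-d flag: perl -d script.pl "Alain James Smith"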

Type s for a single step and press enter. The code
$name=shift; will be executed, and perl waits for your next
command. Keep inputting s until the program terminates.

This by itself is useful as you see the subroutine flow, but if you enter
h for help you'll see a bewildering range of debug options. I
won't detail them all here, but some of the ones I find most useful are:

n

Executes main program, but skips subroutine calls. The subroutine
is executed, but you aren't stepped through it. Try using
n instead of s .

/xx/

Searches through program for xx

p

Prints, for example p @namebits, p $name

Enter

Pressing the Enter key (inputting a carriage return) repeats the last n or
s command.

perlcode

You can type any perl code in and it will be evaluated, and have an
effect on your program. In the example below I remove spaces
from $name. Inputs in bold:

The output is interesting. The variable $name2 has been
created, albeit with the fallback value rather than a real one. However,
$name1 has no defined value. The reason is all about precedence. The or
operator has a lower precedence than ||.

This means or looks at the entire expression on
its left hand side. In this case, that is $name1 = $list[4] . The assignment
is carried out first, and its result is then tested. If it is true, or stops
there. If it is false, the right hand side is evaluated, but nothing is done
with its value, so $name1 keeps the undefined value it was assigned. In the
example above, once the left side is found to be false, then all the right side
evaluates is "1-Unknown" which may be true but doesn't produce any
output.

In the case of ||, which has a higher precedence,
the code immediately on the left of the operator is evaluated. In this case,
that is $list[4]. This is false, so the code immediately to the
right is evaluated. But, the original code on the left which was not evaluated,
$name2 = is not forgotten. Therefore, the expression evaluates to
$name2 = "2-Unknown".
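The difference can be sketched like this, assuming @list holds just three
elements so that $list[4] is undefined. The comments show how perl groups each
statement.

```perl
use strict;
use warnings;

my @list = qw(a b c);       # assumed contents; only elements 0..2 exist
my ($name1, $name2);

{
    no warnings 'void';     # silence "Useless use of a constant in void context"
    $name1 = $list[4] or "1-Unknown";   # grouped as ($name1 = $list[4]) or "1-Unknown"
}
$name2 = $list[4] || "2-Unknown";       # grouped as $name2 = ($list[4] || "2-Unknown")

print "Name1 is ", (defined $name1 ? $name1 : "undefined"), "\n";
print "Name2 is $name2\n";
```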

The two failure codes are both printed, but for different reasons. The first
is printed because we are assigning $ele1 a false value, so the
result of the operation is false. Therefore, the right hand side is evaluated.

The second is printed because $list[4] is itself false. Yet, as
you can see, $ele2 exists. Any idea why?

The reason is that the result of print "2-Failed\n" has been
assigned to $ele2. This is successful, and therefore returns 1.
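A sketch of the two statements being discussed, again assuming a three element
@list, with the grouping shown in the comments:

```perl
use strict;
use warnings;

my @list = qw(a b c);       # assumed contents; $list[4] is undefined

# Grouped as (my $ele1 = $list[4]) or print ...
# The assignment yields a false value, so the print runs.
my $ele1 = $list[4] or print "1-Failed\n";

# Grouped as my $ele2 = ($list[4] || print ...)
# $list[4] is false, so print runs and its return value lands in $ele2.
my $ele2 = $list[4] || print "2-Failed\n";

print "ele1 is ", (defined $ele1 ? $ele1 : "undefined"), "\n";
print "ele2 is $ele2\n";
```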

In the first example, the error message is not printed. This is because
|| only looks at $file, which is evaluating to true. However, in the second
example, or takes action on the result of evaluating the entire
left hand side, not just the expression immediately to its left.
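The same trap can be sketched with a file that does not exist. The file name
and the @errors array are assumptions, and the messages are collected in the
array so the effect is easy to inspect.

```perl
use strict;
use warnings;

my $file = 'not-there-for-this-sketch.txt';   # assumed not to exist
my @errors;                                   # collects the failure messages

# || binds to $file alone. $file is true, so the push never runs,
# even though open itself fails.
open my $fh1, '<', $file || push @errors, "1-File open failed";

# or tests the result of the whole open call, so the failure is caught.
open my $fh2, '<', $file or push @errors, "2-File open failed";

print "$_\n" for @errors;
```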

Now, ($name2 = $list[4]) is evaluated as a complete expression,
not just as $list[4], so we get exactly the same result as if we used
or.
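In code, the bracketed form might look like the sketch below (assuming the
same three element @list). The brackets force || to test the whole assignment.

```perl
use strict;
use warnings;

my @list = qw(a b c);       # assumed contents; $list[4] is undefined
my $name2;

{
    no warnings 'void';     # the string on the right is evaluated and thrown away
    ($name2 = $list[4]) || "2-Unknown";
}

print "Name2 is ", (defined $name2 ? $name2 : "undefined"), "\n";
```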

And

now for something similar. And. Logical AND operators evaluate two
expressions, and return true only if both are true. Contrast this with
OR, which returns true if one or more of the two expressions are
true. Perl has a few AND operators.

The output here is false. It is clear that $list[0] does not
equal x . As AND statements can only return true if both
expressions being evaluated are true, then as the first statement is false this
is an obvious non-starter and perl decides it need not continue to the second
statement. Entirely sensible.
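A sketch of this short-circuit behaviour, assuming @list holds a, b and c. The
second test would change $list[2] if it were ever run.

```perl
use strict;
use warnings;

my @list = qw(a b c);       # assumed contents

# The first test fails, so && never evaluates the second one
# and the ++ on $list[2] does not happen.
my $both = ($list[0] eq 'x' && $list[2]++ eq 'c') ? "true" : "false";

print "Logical AND is $both\n";
print "Element 2 is $list[2]\n";
```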

The second type of AND statement is &. Strictly speaking this
is the bitwise AND operator rather than a logical one, but used like this it
is similar to &&. See if you can work out
what the difference is using this example:

The difference is that the second part of the expression is evaluated no
matter what the result of the first part is. Despite the fact that the AND
statement cannot possibly return true, perl goes ahead and evaluates the second
part of the statement anyway, hence $list[2] ends up as d.
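The same test written with & can be sketched as follows, again assuming @list
holds a, b and c. Both sides are always evaluated, so the ++ happens this time.

```perl
use strict;
use warnings;

my @list = qw(a b c);       # assumed contents

# & has no short-circuit: the right hand test runs even though the
# left hand one has already failed, so $list[2] is incremented to d.
my $both = (($list[0] eq 'x') & ($list[2]++ eq 'c')) ? "true" : "false";

print "AND with & is $both\n";
print "Element 2 is $list[2]\n";
```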

The third AND which we will look at is and. This
behaves in the same way as && but is lower
precedence. Therefore, all the guidelines about ||
and or apply.
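A small sketch of the precedence difference. The two check subroutines are
hypothetical stand-ins for any pair of tests.

```perl
use strict;
use warnings;

sub first_check  { return 1 }   # hypothetical test that succeeds
sub second_check { return 0 }   # hypothetical test that fails

my ($tight, $loose);

$tight = first_check() && second_check();    # grouped as $tight = (1 && 0)
$loose = first_check() and second_check();   # grouped as ($loose = 1) and 0

print "With && the variable holds $tight\n";
print "With and the variable holds $loose\n";
```

So with and, the assignment is finished before the second test is even looked
at, just as with or.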

Other Logical Operators

Perl has not, which works like
! except that it has low precedence. If you are wondering
where you have seen ! before, what about:

$x !~/match/;
if ($t != 5) {

as two examples. There is also Exclusive OR, or XOR. This means:

If exactly one expression is true, XOR returns true

If both expressions are false, XOR returns false

If both expressions are true, XOR returns false (the crucial difference from OR)

This needs an example. Jane and Sonia are two known troublemakers, with a
reputation for throwing good beer around, going topless at inappropriate
moments and singing out of tune to the karaoke machine. You only want to let
one of them into your party, and instead of a big muscle-bound bouncer you have
this perl script on the door:
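A hedged sketch of such a doorman follows. The subroutine name, the messages
and the extra guest names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical doorman: admits the pair only if exactly one of the
# two tests is true.
sub door_check {
    my ($name1, $name2) = @_;
    if ($name1 eq 'Jane' xor $name2 eq 'Sonia') {
        return "OK, allowed";
    }
    return "Sorry, not allowed";
}

print door_check('Jane', 'Karen'), "\n";    # only the first test is true
print door_check('Jane', 'Sonia'), "\n";    # both tests are true
print door_check('Anna', 'Karen'), "\n";    # neither test is true
```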

Well, the script is not perfect as a doorman, as all Jane and Sonia have to
do is type their names in lowercase, but hopefully it demonstrated xor.

One thing to beware of is:

$_=shift;
print "OK\n" unless not(!/r/i || /o/i & /p/ or /q/);

over-complication, and believe me the above is not as complicated as it
could be. Take the time to understand what you want to do. Perl provides a
plethora of logical operators so you really don't have any excuse for not
writing legible code. The above can be written a lot more concisely and
clearly. As well as a lot more obscurely :-)
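One possible clearer version is sketched below, next to the original test so
the two can be compared. The subroutine names and sample words are assumptions.

```perl
use strict;
use warnings;

# The original condition, wrapped in a subroutine so it can be compared.
sub original {
    local $_ = shift;
    return 1 unless not(!/r/i || /o/i & /p/ or /q/);
    return 0;
}

# The same logic spelt out: no r at all, or both o and p, or a q.
sub clearer {
    local $_ = shift;
    return (!/r/i or (/o/i and /p/) or /q/) ? 1 : 0;
}

for my $word (qw(perl pop rq ROPE rat Prop)) {
    printf "%-5s original=%d clearer=%d\n", $word, original($word), clearer($word);
}
```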


Last words

I hope you have enjoyed this tutorial and learnt something from it. I would
appreciate an email letting me know how it could be improved. What you have
learnt is just a fraction of Perl's functionality, but you'll find skills like
regexes can be applied in many other places than Perl.

Good luck.

--
Robert

Thanks to...

Everyone that helped in the development of this tutorial. I do read all the
feedback emails, but don't always action them the same year. What you have just
read is better because of the people below. They fix the bugs, scream when they
don't understand and I rewrite whole sections. Documents like this are written
by the authors, but polished by the readers.

The roll of honour is, in a semi-chronological order:

Mark Miller for his long email suggesting improvements and
highlighting typos. I cringed when I realised what I'd let through :-(

Roland to whom I am eternally grateful for sending in many
typo reports, and pointing out where he didn't understand an explanation.

Katya de Vries for finding HTML errors and problems with the
example code.

Steven Ham for being picky about spelling errors. Good going,
considering English is his second language!

Carlos Jaramillo Uribe for pointing out where I could have
explained postincrements and regex a little better and for pointing out a
typo or two.

Sergio Polini who brought an interesting aspect of Perl's
behaviour with arrays to my attention, and helped to improve parts of
the Regex section.

Leo Durocher for telling me he had trouble with the regex
section. If he did, I'm sure many others did too.

Paul Trafford for solving the Them/Us problem I was too lazy
to bother with, and doing it so elegantly.

Eric Smith who was one of many people who made me a
table of contents rather than just tell me I should include one. I never
used any of them, and the one you see now is auto-generated by a program
written in Java (only kidding, it's not auto-generated :-)

Mike Conkin who said he didn't understand $^I. Good point. I'd
forgotten to explain it at all. Mike went on to list several other areas I
could do with improving in one of the most amusing and useful missives
I've had on the tutorial. Thanks.

Vasile Calamuti who
picked up on my use of join before I'd explained it, and a
couple more oversights.

Didier Owono for pointing out my original explanation of
/ee didn't make sense. Hopefully the second version does.

Keen Meng Lew and Ever Olano who, independently (I
assume) picked up exactly the same two typos. Which are now fixed.

Anna in Ohio who sent a polite email with a few errors she
picked up on.

Ken Teuchler for knowing the difference
between = and =~, and for his long list of
improvements which varied from grammar errors to style suggestions to
oversights. A huge help.

cookie, firstly for his Win9x experiments and error checks
about my explanation of scoping. Secondly for his many subsequent emails
pointing out minor problems which elevated him to the status of #1 bugfixer.
Appreciated.

Ginny for spotting an errant ; which in the best tradition of
teachers I have changed into an exercise for debugging. Of course I meant
to leave it out in the first place. I should also point out that a major
motivation for me to put the effort into this tutorial is the
appreciation of the userbase, and Ginny sent me a particularly
motivational missive.

Jeffery Jackson for noticing my error about 0-based arrays.

Kevin Haskins for pointing out Notepad's limitations and
an equality issue.

Pat McCarthy for picking up a small typo.

Bob Kauten who noticed that I hadn't explained the range
operator properly. I blame....well, me really.

Ayhan Tuncer for picking up a mistake where I'd carelessly cut
and pasted pasted pasted. The next day Michael Kersey found the
exact same error, before I'd had a chance to fix it. Ayhan also found
quite a few more errors after that one during her work on the Turkish
translation.

Ray Price who was another one who found the above error, and a
couple more typos as well.

Henry Vermeulen, a Dutch chap who noticed I'd misspelled
Heineken. Nothing to do with Perl, just one of my outlandish examples.

Everyone that has ever worked on perl, all the hackers on the
perl-win32* mailing lists, ActiveState and the netizens of
clpm.

This tutorial is copyright 1997, 1998, 1999 by
Robert Pepper. Reproduction in whole or part is prohibited.
Please contact me if you want to use this information anywhere.
Thank you.