The first steps

Instead of greeting the world, you would like to greet whatever is
passed in on the command line. To do that, you should know that the
command-line arguments are stored in the @ARGV array. To get
at the first element, we could index it using $ARGV[0]. The
usual idiom, however, is to remove the element from the array and then
deal with it and/or throw it away. shift removes and returns the first
element of an array, thus we would write something like:
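A minimal sketch of the idea follows. The greeting sub is a hypothetical wrapper, added only so the logic can be exercised on its own; in a plain script you would simply write my $thing = shift; at file level, where shift works on @ARGV rather than @_.

```perl
use strict;
use warnings;

# Return a greeting for the first item in the list passed in.
# Inside a sub, shift works on @_; at file level it works on @ARGV.
sub greeting {
    my $thing = shift;    # remove and return the first element
    return "Hello, $thing\n";
}

print greeting(@ARGV);
```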

This will be familiar to people used to Unix shell programming. This
is all well and good; the script's behaviour is controlled by
the parameters appearing on the command line. There is a problem,
however, in that now a parameter must be supplied, for if it is
omitted and warnings are enabled, the script will cough up a Use of
uninitialized value in concatenation warning, and all that will be
printed is "Hello, ".

Using default values

It would be nice to be able to provide the script with a sensible
default value, so that should no parameter be supplied, it will be able
to continue and do something reasonable. For this we can use the ||
operator:
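As a sketch, assuming the same hypothetical greeting wrapper as a stand-in for the body of the script:

```perl
use strict;
use warnings;

# Fall back to 'world' when nothing (or any false value) is passed in.
sub greeting {
    my $thing = shift || 'world';
    return "Hello, $thing\n";
}

print greeting(@ARGV);
```

Note that any false value, not just a missing one, triggers the default.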

What this does is assign $thing the value of the first
parameter on the command line or 'world', should the command line be
empty. Of course, sometimes an empty command line is not reasonable, in
which case the best thing is to stop the script and print out a message
so that the user can take corrective action:
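One way this could look is sketched below. The wording of the message is made up for illustration, and the eval loop is only there so that both outcomes can be seen in a single run; a real script would just let the die end the program.

```perl
use strict;
use warnings;

# Return a greeting, or die with a message when no (true) argument is given.
sub greeting {
    my $thing = shift
        or die "Nothing specified on the command line.\n";
    return "Hello, $thing\n";
}

# eval traps the die so that both cases can be shown side by side.
for my $args ( ['harry'], [] ) {
    my $out = eval { greeting(@$args) };
    print defined $out ? $out : "died: $@";
}
```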

Beware that the || idiom will not do the right thing if you pass 0 to the script.
It boils down to what Perl considers truth. It just so
happens that 0 is treated as false, so the left hand side of the
|| operator as a whole is false, and thus $thing
winds up being assigned the value of 'default'.

There is a simple two-step way around this. First assign what
comes out of shift. Then, depending on whether $thing
is defined (not whether it is true or false, thus side-stepping the
issue), use the wonderful-but-cryptic ||= to possibly
assign to $thing, based on the outcome of the conditional.
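The two steps can be sketched like this, again with a hypothetical greeting wrapper standing in for the script body:

```perl
use strict;
use warnings;

# Step 1: take whatever shift hands back.
# Step 2: assign the default only when nothing at all was passed.
sub greeting {
    my $thing = shift;
    $thing ||= 'default' unless defined $thing;
    return "Hello, $thing\n";
}

print greeting(@ARGV);
```

With this version a literal 0 on the command line survives, since the test is on definedness rather than truth.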

The above code is difficult to understand. It does, however, work
according to spec. The main problem is that it will fail to consider
-t or -anything as a switch, and complain that the
switch has no effect on the program. For example, consider what greet
-h harry will print out. Even worse, the code will become horribly
obfuscated should the script have to deal with two, three or more command
line switches.

Obviously, a better approach is required. Above all, it would
be nice not to have to write it oneself, but rather use something that
exists already. That must mean that packages exist to do what we need.

What we are looking for is something that will look for switch-like
instances on the command line, set some corresponding Perl variables and
above all remove them from @ARGV so that we don't have to bother
with them.

What can perl offer?

Note the distinction between Perl the language and perl the
interpreter. As it turns out, perl, the Perl interpreter, can do some rudimentary
command line processing all by itself. Sometimes this is sufficient. All
you have to do is feed the interpreter the -s switch:
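Here is a rough sketch of what such a script might look like. The switch names -h and -g, the usage text and the message helper are assumptions made for illustration; the only piece that comes from perl itself is the -s on the shebang line, which turns leading switches into package variables of the same name.

```perl
#!/usr/bin/perl -s
use strict;
use vars qw($h $g);    # -s stores its switches in package variables

# Hypothetical helper: pick the greeting word based on the -g flag.
sub message {
    my ( $goodbye, $thing ) = @_;
    return ( $goodbye ? 'Goodbye' : 'Hello' ) . ", $thing\n";
}

if ($h) {
    print "usage: $0 [-h] [-g] [thing]\n";
    exit;
}

# perl has already removed the switches, so only the plain
# arguments are left in @ARGV.
print message( $g, shift || 'world' );
```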

<aside>Do not get confused by perl's switches
and your script's switches. Remember that with a shebang line of
#! /usr/local/bin/perl -sw and the switches -xy,
the shell actually runs /usr/local/bin/perl -sw script -xy. Perl
sees -sw script -xy. It processes the -sw, sees that 'script'
looks like a file name and opens it and starts interpreting. Your
script only sees -xy (although to a certain extent it can
detect what switches were passed to perl, such as by reading the value
of $^W).</aside>

Now we have a much smaller script that should be easier to
understand. There is, however, a small problem due to interactions with
the use strict pragma. The -s functionality harks back to
before the age of lexical variables. It refers to package variables that
have to be explicitly declared in a use vars pragma when
strict is in use. This is not really a problem, except that
if the script is run with the -h switch and warnings are switched on,
the program will complete but it will spit out a Name "main::h"
used only once: possible typo. warning message.

Before turning away from -s as a viable solution, consider
the other feature that Perl provides. If the above script is run with
-g, the package variable $g is set to 1. Alternatively,
the script could be run with -g=foo, in which case instead of
being set to 1, $g would contain 'foo'. Sometimes this limited
functionality is enough to get the job done, and the fact that you don't
have to drag around an external package file can be a win in certain
circumstances.

<update date="2001/11/15"> It appears that
-s has some rather nasty side effects, which means that scripts that use it should only be used in safely controlled environments (if such a thing exists). For more information, read the thread "perl -s is evil?".</update>

getopt: the heavy artillery

More Unix culture: the traditional way to parse command line arguments
in C was through a library call named getopt or getopts,
short for get options. This has been carried over to Perl in the
form of Getopt::Std and Getopt::Long which are bundled
in the core distribution.

Getopt::Std

Getopt::Std performs command line processing and pulls
out anything that resembles a -letter switch and its value, leaving
the remaining values in @ARGV. It offers two interfaces,
getopt and getopts. You almost always want to use the
second variant. Let's see why:

Before going any further, the first thing to point out is that
Getopt::Std has been retrofitted to get around the uncomfortable
use of package variables. If you pass a reference to a hash as the
second parameter to the getopt call, it will populate the
hash, instead of using package variables, which allows the script to be
rewritten as:
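A sketch of the hash-based call, assuming a single -g switch that selects "Goodbye" instead of "Hello". The greet wrapper and the local @ARGV are only there so that the parsing can be tried on arbitrary lists; getopt itself always works on @ARGV.

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse a list of arguments the way the script would parse @ARGV.
sub greet {
    local @ARGV = @_;        # getopt always works on @ARGV
    my %arg;
    getopt( 'g', \%arg );    # for getopt, 'g' is a switch that takes a value
    my $thing = shift(@ARGV) || 'world';
    return ( $arg{g} ? 'Goodbye' : 'Hello' ) . ", $thing\n";
}

print greet(@ARGV);
```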

This script will silently ignore a non-specified switch, which
is usually A Good Thing. There is, however, a serious bug lurking in
this code. Try to get the script to print "Goodbye, foo". It's rather
difficult to do because getopt is greedy. When it sees
a specified switch, it tries hard to assign that switch a meaningful
value, which means either the characters following the switch (as in
-gparam) or the next parameter on the command line
(as in -g param). Which means if you run the above script as
script -g foo, $arg{g} will contain 'foo', but there
will be nothing left on the command line, so $thing will be
assigned the default value of 'world'.

In order to get around this "feature", the second interface, via
getopts should be used instead. In this case, the specification
string ('g' in the above) is interpreted differently. By default, all
letters specify boolean parameters. To force a parameter to pick up
a value (i.e. to get the behaviour we so much wanted to avoid
above), a ':' (colon) is appended. Therefore, to make -g
greedy, it should be specified as 'g:'.

This means that all we have to do in the above script is to call
getopts instead of getopt and the job is done.
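The same sketch with getopts swapped in, under the same assumptions as before (a single boolean -g switch and a hypothetical greet wrapper):

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse a list of arguments the way the script would parse @ARGV.
sub greet {
    local @ARGV = @_;         # getopts always works on @ARGV
    my %arg;
    getopts( 'g', \%arg );    # 'g' alone is boolean; 'g:' would take a value
    my $thing = shift(@ARGV) || 'world';
    return ( $arg{g} ? 'Goodbye' : 'Hello' ) . ", $thing\n";
}

print greet(@ARGV);
```

Now -g no longer swallows the argument that follows it, so passing -g foo leaves foo in @ARGV for the greeting.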

If you want to look at a real-life example of code that
uses Getopt::Std, you can look at a script I uploaded
here named pinger, a little tool designed
to scan a range of IP addresses via ping.

Getopt::Long

That is all well and good, but what happens when you reimplement tar in
Perl? How do you remember what all those pesky single character switches
do in the string -cznTfoo? It's much easier to understand what's
going on with --create --gzip --norecurse --files-from foo
instead. Enter Getopt::Long.

This module lets you build up a specification that adheres to the
POSIX syntax for command line options, which generally introduces
switches with the double-dash notation. Unfortunately, this precludes
the use of single-dash switches (bikeNomad points out that this is not true. My bad for not paying closer attention
to the documentation). Even worse, you cannot include both
Getopt::Std and Getopt::Long in the same program,
as they will fight over @ARGV and the results will be... undefined.
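To give a feel for the module, here is a small sketch using its GetOptions function. The option names --goodbye and --name are invented for this example, and greet is a hypothetical wrapper so that the parsing can be tried on arbitrary lists.

```perl
use strict;
use warnings;
use Getopt::Long;

# Parse a list of arguments the way the script would parse @ARGV.
sub greet {
    local @ARGV = @_;    # GetOptions works on @ARGV
    my $goodbye = 0;
    my $name    = 'world';
    GetOptions(
        'goodbye' => \$goodbye,    # --goodbye is a boolean flag
        'name=s'  => \$name,       # --name harry takes a string value
    ) or die "Bad options\n";
    return ( $goodbye ? 'Goodbye' : 'Hello' ) . ", $name\n";
}

print greet(@ARGV);
```

Each key in the specification names a switch, and the =s suffix says that the switch takes a mandatory string value.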

Since I originally wrote this tutorial, I have used Getopt::Long a bit more (figured that I had to since I wrote this). Once you understand Getopt::Simple, Getopt::Long is pretty easy to pick up, and has much sophistication to offer, once you scratch below the surface.

That said, all of the processing goes on behind the scenes. You can attach a callback to deal with the processing of individual options, but this can become unwieldy. Sometimes you need more fine-grained control of the parsing of the switches, as they come in one by one.

While the following module is no longer being actively developed, it is just what you need in some instances, because it deals with parsing options only, and lets you deal with the rest. It turns the parsing inside out, lets you act on options on the fly, and therefore just feels more cooperative. Try it, you might like it.

Getopt::Mixed

This module should cover all your command line
processing needs. It's quite simple to set up. First of all you need to
call init with a format string (akin to pack and unpack). This sets up
what command line switches are defined, and what values they can take on.
Here's a real life example hoisted from some code I have lying around:

Pretty straightforward stuff. The next step is to call nextOption
repeatedly until it fails. Once that is done, you have processed all the
switches. Unlike Getopt::Std you set your defaults beforehand. If the
switch isn't specified, the value isn't touched. Also note that just because a
switch has a mandatory argument doesn't mean that the script will abort if
the switch doesn't appear on the command line... it's not the switch itself
that is mandatory. If this is required then you test the corresponding
variable after the loop and if its value is undefined then you yank the rug
out from under the script.
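The init and nextOption steps might be sketched as follows. This is not the author's original example: the switches -j and -v, their long aliases and the parse wrapper are all invented for illustration, and the code assumes Getopt::Mixed has been installed from CPAN, since it is not a core module.

```perl
use strict;
use warnings;
use Getopt::Mixed 'nextOption';    # CPAN module, not in core

# Parse a list of arguments the way the script would parse @ARGV.
sub parse {
    local @ARGV = @_;
    # Defaults are set beforehand; untouched if the switch is absent.
    my %opt = ( jobs => 'none', verbose => 0 );

    # j takes a mandatory string, v is a plain flag;
    # --jobs and --verbose are declared as synonyms for them.
    Getopt::Mixed::init('j=s v jobs>j verbose>v');
    while ( my ( $option, $value ) = nextOption() ) {
        $opt{jobs}    = $value if $option eq 'j';
        $opt{verbose} = 1      if $option eq 'v';
    }
    Getopt::Mixed::cleanup();
    return \%opt;
}

my $opt = parse(@ARGV);
print "jobs=$opt->{jobs} verbose=$opt->{verbose}\n";
```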

as all being valid syntaxes for assigning foo to the
-j switch. Remember the last variant. It's the
easiest way of passing in a negative number on the command
line. After all, how should --offset -30 be
interpreted?

Another real-life example of code, this time using
Getopt::Mixed can be found at nugid, a
script I wrote to manage large scale modifications
of uids and gids of Unix filesystems.

Where to from here

This should be enough for 95% of your basic command line processing
needs. But everyone has a different itch to scratch, and you should be
aware that there is a boatload of getoptish packages hanging out on
CPAN, as a search will reveal. Once you have the hang of a couple it's
pretty simple to pick up another.

The most sophisticated of all, Getopt::Declare comes,
naturally enough, from the Damian.
This module has an advanced method for specifying exactly what are the
legal values that a switch may take, as well as providing poddish descriptions
so that you don't have to write sub usage { ... } that explains
how to use the program correctly.

Switch name idioms

Over the years, a number of conventions have arisen over the best letters
to assign to common operations that crop up again and again in program
design. This list attempts to codify existing practices (updates welcomed).
Use these conventions and people will find your programs easy to learn.

-a

Process everything (all).

-d

Debug mode. Print out lots of stuff.

-h

Help. Print out a brief summary of what the script does and what it expects.

-i

Input file, or include file

-l

Name of logfile

-o

Name of output file

-q

Quiet. Print out nothing.

-v

Verbose. Print out lots of stuff.

And now you know all you need to know about command line processing.
Have fun!

update: Tip o' the hat to petral for pointing out
the node on Getopt::Declare, -h and a better
Damian link. Tip o' the hat to Albannach for reminding
me about the "passing 0 on the command line" bugaboo, and
to OeufMayo regarding passing negative numbers.

That statement only applies to GNU/Linux systems, or at least systems
that use the GNU Binutils. Most commercial UNIX vendors don't have
long option support in their Binutils. Sun and HP's versions of tar don't
support long arguments, but FreeBSD and Linux support both forms of
arguments.

While the choice of which arguments your programs accept is your choice,
many people choose to follow the standard their OS vendor uses.
I think that the table showing only the short arguments is good
because short arguments are more of a standard than GNU-style arguments,
but I think that including a link to the GNU Coding Standards Option Table,
with an explanation that these are used mainly on GNU/Linux or GNU Binutils
systems, would be useful.

This is because $thing could be defined but zero and we don't want to overwrite a perfectly valid zero from the command line.

As Not_a_Number points out, the syntax EXPR unless defined $thing will do nothing at all * if $thing is defined, whether it's true or false. That is, if we are executing EXPR, then $thing is guaranteed undefined, hence false; so $thing ||= 'default' is guaranteed to be the same as $thing = 'default'.

It seems reasonable to guess that what happened is that the coder originally had $thing ||= 'default' in some old code, discovered (as you mention) that it doesn't work when $thing is false-but-defined, and added the defined check without realising that it made ||= redundant.

UPDATE (the *'d statement above—sorry, I don't know how to do footnotes): On further thought, it's not quite true that EXPR unless defined $thing will do nothing if $thing is defined. Among what I suppose are many other subtle cases, if the unless is the last line in a subroutine, it'll make the subroutine return 1 when $thing is defined. For example, after

Very nice tutorial. Helped me a lot. Especially since I didn't see a note in the CPAN Getopt::Std docs that said "getopts strips off all parameters from @ARGV and leaves the rest".
But that's really important if you want to use <> to process some files specified on the command line.
So thanks for the work =)
Greetings From Munich,
Grand Apeiron