PERLREQUICK

NAME

perlrequick - Perl regular expressions quick start

DESCRIPTION

This page covers the very basics of understanding, creating
and using regular expressions ('regexes') in
Perl.

The Guide

Simple word matching

The simplest regex is simply a word, or more generally, a
string of characters. A regex consisting of a word matches
any string that contains that word.

In a statement such as "Hello World" =~ /World/, World is a regex and the // enclosing /World/ tells perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match. In this case, World matches the second word in "Hello World", so the expression is true. This idea has several variations.
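As a minimal sketch of the match described above, the following stores the result of each match in a variable so that the true and false outcomes can be inspected. The variable names $found and $missing are illustrative only.

```perl
use strict;
use warnings;

# =~ binds the string on its left to the regex on its right.
# In scalar context the match yields true on success, false otherwise.
my $found   = "Hello World" =~ /World/;   # the regex occurs in the string
my $missing = "Hello World" =~ /Perl/;    # the regex does not occur

print "World: ", ($found   ? "match" : "no match"), "\n";
print "Perl: ",  ($missing ? "match" : "no match"), "\n";
```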

Expressions like this are useful in conditionals:

print "It matches\n" if "Hello World" =~ /World/;

The sense of the match can be reversed by using the !~ operator:

print "It doesn't match\n" if "Hello World" !~ /World/;

The literal string in the regex can be replaced by a variable:

$greeting = "World";
print "It matches\n" if "Hello World" =~ /$greeting/;

If you're matching against $_, the $_ =~ part can be omitted:

$_ = "Hello World";
print "It matches\n" if /World/;

Finally, the // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front, as in m!World! or m{World}.
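The following sketch tries a few alternative delimiters with the 'm' prefix. The variable names are illustrative; the point is that once a different delimiter is chosen, '/' inside the regex becomes an ordinary character.

```perl
use strict;
use warnings;

my $bang  = "Hello World"   =~ m!World!;   # delimited by '!'
my $brace = "Hello World"   =~ m{World};   # note the matching '{}'
my $quote = "/usr/bin/perl" =~ m"/perl";   # '/' is an ordinary character here

print "all three matched\n" if $bang && $brace && $quote;
```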

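Regexes must match a part of the string exactly in order for the statement to be true. Matching is case-sensitive and a space is an ordinary character, so /world/ and /World / both fail to match "Hello World", while /o W/ matches.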
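A short sketch of exact matching, assuming the string "Hello World" as the target. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $wrong_case = "Hello World" =~ /world/;    # false: matching is case-sensitive
my $with_space = "Hello World" =~ /o W/;      # true: ' ' is an ordinary character
my $trailing   = "Hello World" =~ /World /;   # false: no ' ' after 'World'

print "only /o W/ matched\n" if $with_space && !$wrong_case && !$trailing;
```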

perl will always match at the earliest possible point in the string. In "That hat is red", for example, /hat/ matches the 'hat' in 'That'.
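One way to observe the earliest-match rule is to read $-[0], the offset at which the last successful match started. This is only a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($o_offset, $hat_offset);

# $-[0] holds the start offset of the last successful match.
if ("Hello World" =~ /o/)       { $o_offset   = $-[0] }   # the 'o' in 'Hello'
if ("That hat is red" =~ /hat/) { $hat_offset = $-[0] }   # the 'hat' in 'That'

print "o at $o_offset, hat at $hat_offset\n";   # prints "o at 4, hat at 1"
```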

Not all characters can be used 'as is' in a match. Some characters, called metacharacters, are reserved for use in regex notation. The metacharacters are {}[]()^$.|*+?\ and a metacharacter can be matched literally by putting a backslash before it.

In a regex such as /\/usr\/bin\/perl/, the forward slash '/' is also backslashed, because it is used to delimit the regex.
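The effect of backslashing a metacharacter can be sketched as follows. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $unescaped = "2+2=4" =~ /2+2/;     # false: '+' is a metacharacter here
my $escaped   = "2+2=4" =~ /2\+2/;    # true: '\+' matches a literal '+'
my $backslash = 'C:\WIN32' =~ /C:\\WIN/;                  # true: '\\' matches one backslash
my $slashes   = "/usr/bin/perl" =~ /\/usr\/bin\/perl/;    # true: '\/' matches a literal '/'

print "escaped forms matched\n" if $escaped && $backslash && $slashes;
```

When a regex contains many slashes, choosing another delimiter with m{...} avoids the backslashes.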

Non-printable ASCII characters are
represented by escape sequences. Common examples are
\t for a tab, \n for a newline, and
\r for a carriage return. Arbitrary bytes are
represented by octal escape sequences, e.g., \033,
or hexadecimal escape sequences, e.g.,
\x1B.
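A brief sketch of escape sequences inside a regex, assuming an ASCII platform. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $tab = "1000\t2000" =~ m(0\t2);      # true: \t matches the tab character
my $cat = "cat" =~ /\143\x61\x74/;      # true in ASCII: octal 143 is 'c', hex 61 is 'a', hex 74 is 't'
my $esc = "\x1B" =~ /\033/;             # true: octal 033 and hex 1B are the same byte

print "escape sequences matched\n" if $tab && $cat && $esc;
```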

Regexes are treated mostly as double quoted strings, so variable substitution works:

$foo = 'house';
'cathouse' =~ /cat$foo/; # matches
'housecat' =~ /${foo}cat/; # matches

With all of the regexes above, if the regex matched anywhere in the string, it was considered a match. To specify where it should match, we would use the anchor metacharacters ^ and $. The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string, or before a newline at the end of the string. For example, /^keeper/ does not match "housekeeper", but /keeper$/ does.
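The anchors can be sketched against the string "housekeeper" as follows. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $anywhere = "housekeeper"   =~ /keeper/;          # true: matches anywhere
my $at_start = "housekeeper"   =~ /^keeper/;         # false: 'keeper' is not at the start
my $at_end   = "housekeeper"   =~ /keeper$/;         # true: 'keeper' is at the end
my $newline  = "housekeeper\n" =~ /keeper$/;         # true: $ also matches before a final newline
my $whole    = "housekeeper"   =~ /^housekeeper$/;   # true: the whole string matches

print "anchored matches behaved as described\n"
    if $anywhere && !$at_start && $at_end && $newline && $whole;
```

Using both ^ and $ together, as in the last line, requires the regex to match the entire string.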

Using character classes

A character class allows a set of possible
characters, rather than just a single character, to match at
a particular point in a regex. Character classes are denoted
by brackets [...], with the set of characters to be
possibly matched inside. Here are some examples:

/cat/; # matches 'cat'
/[bcr]at/; # matches 'bat', 'cat', or 'rat'
"abc" =~ /[cab]/; # matches 'a'

In the last statement, even though 'c' is the first character in the class, the earliest point at which the regex can match is 'a'.

/[yY][eE][sS]/; # match 'yes' in a case-insensitive way
/yes/i; # also match 'yes' in a case-insensitive way

The last example shows a match with an 'i' modifier, which makes the match case-insensitive.

Character classes also have ordinary and special characters,
but the sets of ordinary and special characters inside a
character class are different than those outside a character
class. The special characters for a character class are
-]\^$ and are matched using an escape. The special character
'-' acts as a range operator within character classes, so
that [0123456789] can be written as [0-9]:

/[0-9a-fA-F]/; # matches a hexadecimal digit

If '-' is the first or last character in a character class, it is treated as an ordinary character.
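The following sketch exercises escapes, interpolation, and ranges inside a character class. The variable $x and the other names are illustrative only.

```perl
use strict;
use warnings;

my $x = 'bcr';

my $bracket = "]def"  =~ /[\]c]def/;    # true: '\]' is a literal ']' inside the class
my $interp  = "rat"   =~ /[$x]at/;      # true: $x interpolates, giving [bcr]at
my $dollar  = '$at'   =~ /[\$x]at/;     # true: '\$' is a literal '$', so the class is '$' or 'x'
my $range   = "item7" =~ /item[0-9]/;   # true: '-' forms the range 0 through 9
my $dash    = "a-b"   =~ /a[-+]b/;      # true: a leading '-' is an ordinary character

print "character class examples matched\n"
    if $bracket && $interp && $dollar && $range && $dash;
```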

The special character ^ in the first position of a
character class denotes a negated character class,
which matches any character but those in the brackets. Both
[...] and [^...] must match a character,
or the match fails. Then