PHP Basic Tutorial

PHP Regular Expressions

Regular expressions are patterns that describe sets of strings. They can be used to search, replace and otherwise work with strings, but are most commonly used to validate form input, since forms let users enter arbitrary data. Here we will learn how to determine whether or not an email address uses the correct syntax.

In the past, PHP supported two families of pattern-matching functions: POSIX and PCRE. The POSIX functions (such as ereg()) were deprecated in PHP 5.3 and removed in PHP 7.0, leaving PCRE (Perl-Compatible Regular Expressions) as our subject matter for this tutorial.

A regular expression is built from two types of characters. “Literal” characters are those that match themselves, and “metacharacters” are those that have a special meaning.

A “literal” can be a single character, a word, a phrase, etc. that is taken literally, at face value. Using the preg_match() function, let’s look at an example of a literal regular expression. (Note: Every PCRE pattern, whether it contains literals, metacharacters or both, must be enclosed in delimiters. The slash “/” is the most common choice, as seen in our example below.)

<?php
$string = "Which side of a chicken has the most feathers?";
// preg_match() returns 1 when the pattern is found in the string
if (preg_match("/chicken/", $string)) {
    echo $string . "<br>";
    echo "The outside.";
}
?>

Since a match is found in our example, both the question and the answer are echoed. If you change the word “chicken” in either the pattern or the string, there will no longer be a match, and nothing will be echoed.
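To see this behaviour directly, here is a minimal sketch that runs three literal patterns against the same string. The variable names and the word "turkey" are illustrative choices, not part of the example above.

```php
<?php
$string = "Which side of a chicken has the most feathers?";

// preg_match() returns 1 on a match and 0 on no match (false on error).
$found   = preg_match("/chicken/", $string);  // the literal text is present
$missing = preg_match("/turkey/", $string);   // the literal text is absent
$caseOff = preg_match("/Chicken/", $string);  // matching is case-sensitive by default

var_dump($found, $missing, $caseOff);
```

Checking the return value with === 1 rather than relying on truthiness also lets you tell "no match" (0) apart from an error (false).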

Metacharacters are understandably more involved than their literal counterparts, but with a little study, they can be interpreted. Let’s begin with an easy subject: character classes.

Character classes specify which characters are acceptable at a given position in a pattern. [a-z], [A-Z] and [0-9] are all examples of character classes covering a range of characters, but you can build your own by enclosing the acceptable characters in square brackets. Examples: [dgefgh] matches any one of d, e, f, g or h, and [l-p] matches any one of l, m, n, o or p.
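The classes described above can be tried out with preg_match(). This is a small sketch; the test strings and variable names are made up for illustration.

```php
<?php
// [l-p] accepts any single character from l through p.
$inRange  = preg_match("/[l-p]/", "m");   // "m" lies between l and p
$outRange = preg_match("/[l-p]/", "z");   // "z" is outside the range

// A hand-built class: any one of d, e, f, g or h.
$custom = preg_match("/[dgefgh]/", "flag");  // "f" and "g" are in the class

// Each class matches one character, so three digits need three classes.
$digits = preg_match("/[0-9][0-9][0-9]/", "Room 101");

var_dump($inRange, $outRange, $custom, $digits);
```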

Negated character classes specify which characters are not acceptable at a given position in a pattern. These classes are also written in square brackets, but begin with ^. Examples: [^dgefgh] matches any character except d, e, f, g and h, and [^l-p] matches any character except l, m, n, o and p.
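A negated class is handy for spotting unwanted characters. The sketch below assumes two sample words chosen only for illustration: the pattern matches as soon as the string contains a character outside l through p.

```php
<?php
// [^l-p] matches any single character that is NOT l, m, n, o or p.
$allInside  = preg_match("/[^l-p]/", "moon");  // every letter is within l-p
$oneOutside = preg_match("/[^l-p]/", "moan");  // "a" falls outside l-p

var_dump($allInside, $oneOutside);
```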

And now let’s take a brief look at the basic possibilities that regular expressions allow.

Symbol  Example   Meaning
\                 Escapes special characters so they can be used in a pattern to represent themselves
()      (cat)     Groups options together by capturing subpatterns
[]      [abcde]   Groups options together by forming classes
+       cat+      One or more occurrences of the preceding character or expression
*       cat*      Zero or more occurrences of the preceding character or expression
?       cat?      Zero or one occurrence of the preceding character or expression
{}      cat{2}    An exact number (2) of occurrences of the preceding character or expression
{}      cat{5,7}  A number within a range (between 5 and 7) of occurrences of the preceding character or expression
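The symbols listed above can be tried out one at a time. The following is a rough sketch with invented test strings; note that in a pattern such as cat+ the quantifier applies only to the item just before it (the letter "t") unless parentheses group a longer subpattern.

```php
<?php
$plus     = preg_match("/cat+/", "cattt");   // "ca" then one or more "t"
$plusNone = preg_match("/cat+/", "ca");      // fails: at least one "t" is required
$star     = preg_match("/cat*/", "ca");      // zero "t" is allowed
$optional = preg_match("/cat?s/", "cas");    // "t" may appear once or not at all
$exact    = preg_match("/cat{2}/", "catt");  // "ca" followed by two "t"
$tooFew   = preg_match("/cat{2}/", "cat");   // fails: only one "t"

// Parentheses group a subpattern so the quantifier covers the whole word.
$grouped  = preg_match("/(cat){2}/", "catcat");

// A backslash escapes a metacharacter so that it matches itself.
$escaped  = preg_match("/3\+4/", "3+4");

var_dump($plus, $plusNone, $star, $optional, $exact, $tooFew, $grouped, $escaped);
```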

Now that you have all of that memorized, let’s jump into our long-awaited example, where we will determine whether or not an email address uses the correct syntax. Consider the following by looking up each symbol on the chart above, and see what you make of it.

As you can see, this regular expression defines a series of rules that express an acceptable pattern to be followed in order to validate the proper syntax of an email address. It can be used in the following manner:
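As a minimal sketch of such a check, assume a simplified pattern of our own rather than a complete email grammar. The pattern, the helper name looksLikeEmail() and the sample addresses are all hypothetical, and the ^ and $ anchors (start and end of the string) go beyond the symbols in the chart.

```php
<?php
// Hypothetical, simplified pattern (an assumption, not a full email grammar):
//   ^ and $             anchor the match to the start and end of the string
//   [a-zA-Z0-9._-]+     one or more permitted characters before the @
//   @                   a literal at sign
//   [a-zA-Z0-9-]+       the first domain label
//   (\.[a-zA-Z0-9-]+)*  zero or more further dot-separated labels
//   \.[a-zA-Z]{2,}      a dot and a top-level domain of two or more letters
$pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}$/';

// Hypothetical helper: true when the address fits the pattern above.
function looksLikeEmail(string $email, string $pattern): bool
{
    return preg_match($pattern, $email) === 1;
}

$good = looksLikeEmail("user@example.com", $pattern);
$bad  = looksLikeEmail("user@example", $pattern);

echo $good ? "Valid syntax" : "Invalid syntax";
```

A pattern like this only checks the shape of the address, not whether the mailbox exists. PHP also ships a built-in check, filter_var($email, FILTER_VALIDATE_EMAIL), which may be preferable to a hand-written pattern in production code.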