
Functions of Strings and Regular Expressions

In this second part of a five-part series on strings and regular expressions in PHP, you’ll learn about regular expression functions and a variety of string-specific functions. This article is excerpted from chapter nine of the book Beginning PHP and Oracle: From Novice to Professional, written by W. Jason Gilmore and Bob Bryla (Apress; ISBN: 1590597702).

Note that the output array preserves the keys of the input array. If the value at a given index position matches, it’s included under that same index in the output array. Otherwise, that index is simply absent. If you want consecutive indexes starting from zero, filter the output array through the function array_values(), introduced in Chapter 5.

The optional input parameter flags was added in PHP version 4.3. It accepts one value, PREG_GREP_INVERT. Passing this flag will result in retrieval of those array elements that do not match the pattern.
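As a minimal sketch of the behavior described above, the following uses a made-up list of foods (the data and variable names are assumptions for illustration) to show the preserved keys, the effect of array_values(), and the PREG_GREP_INVERT flag:

```php
<?php
// Sample data chosen for illustration only.
$foods = array("pasta", "steak", "fish", "potatoes");

// Elements starting with "p"; the original keys (0 and 3) are preserved.
$matches = preg_grep("/^p/", $foods);

// array_values() reindexes the result consecutively from zero.
$reindexed = array_values($matches);

// PREG_GREP_INVERT returns the elements that do NOT match the pattern.
$others = preg_grep("/^p/", $foods, PREG_GREP_INVERT);

print_r($matches);
print_r($reindexed);
print_r($others);
```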

Searching for a Pattern

The preg_match() function searches a string for a specific pattern, returning 1 if it exists and 0 otherwise (or FALSE if an error occurs). Its prototype follows:

The optional input parameter pattern_array, if supplied, is filled with the text that matched the complete pattern followed by the text matched by each parenthesized subpattern, if applicable. Here’s an example that uses preg_match() to perform a case-insensitive search:
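For reference, the PHP manual gives the signature as preg_match(string $pattern, string $subject, array &$matches = null, int $flags = 0, int $offset = 0). A minimal sketch of a case-insensitive search follows; the sample sentence is an assumption made for illustration:

```php
<?php
// Sample text is an assumption made for this sketch.
$line = "Vim is the greatest word processor ever created!";

// The i modifier makes the search case insensitive;
// \b restricts the match to whole words.
$found = preg_match('/\bvim\b/i', $line, $pattern_array);

if ($found === 1) {
    // $pattern_array[0] holds the text that matched the complete pattern.
    echo "Match found: " . $pattern_array[0] . "\n";
}
```

Comparing the result against 1 rather than testing it loosely keeps a FALSE error return from being mistaken for "no match."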

PREG_PATTERN_ORDER is the default if the optional order parameter is not included. PREG_PATTERN_ORDER specifies the order in the way that you might think most logical: $pattern_array[0] is an array of all complete pattern matches, $pattern_array[1] is an array of all strings matching the first parenthesized regular expression, and so on.

PREG_SET_ORDER orders the array a bit differently than the default setting. $pattern_array[0] contains the first complete match followed by the text matched by each of its parenthesized regular expressions, $pattern_array[1] contains the same set for the second match, and so on.

Here’s how you would use preg_match_all() to find all strings enclosed in bold HTML tags:
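A minimal sketch of such a search, assuming a short piece of sample markup (the names and text are made up for illustration). It runs the same pattern with both ordering flags so the two array layouts can be compared:

```php
<?php
// Sample markup is an assumption made for this sketch.
$userinfo = "Name: <b>Jane Doe</b> <br> Title: <b>PHP Developer</b>";

// Default PREG_PATTERN_ORDER: [0] holds all complete matches,
// [1] holds everything captured by the first parenthesized subpattern.
// The U modifier makes .* ungreedy so each <b>...</b> pair matches separately.
$count = preg_match_all('/<b>(.*)<\/b>/U', $userinfo, $byPattern);

// PREG_SET_ORDER: one entry per match, each holding the complete
// match followed by its subpattern captures.
preg_match_all('/<b>(.*)<\/b>/U', $userinfo, $bySet, PREG_SET_ORDER);

print_r($byPattern[1]);
print_r($bySet[0]);
```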

The function preg_quote() inserts a backslash before every character of special significance to regular expression syntax. These special characters include . \ + * ? [ ^ ] $ ( ) { } = ! < > | : and -. Its prototype follows:

string preg_quote(string str [, string delimiter])

The optional parameter delimiter specifies what delimiter is used for the regular expression, causing it to also be escaped by a backslash. Consider an example:

<?php
    $text = "Tickets for the bout are going for $500.";
    echo preg_quote($text);
?>

This returns the following:

——————————————–
Tickets for the bout are going for \$500\.
——————————————–
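The optional delimiter parameter matters when the quoted text is embedded in a pattern. The following is a sketch only, with a hypothetical search string, showing the slash being escaped along with the special characters so the result can sit safely between / delimiters:

```php
<?php
// Hypothetical user input that should be matched literally.
$needle = '1/2 (half) costs $5.00';

// Passing "/" as the delimiter escapes it along with the special characters.
$quoted = preg_quote($needle, "/");

// The quoted string can now be embedded in a /.../ pattern.
$matched = preg_match("/" . $quoted . "/", 'Today 1/2 (half) costs $5.00 only');

echo $quoted . "\n";
```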

Replacing All Occurrences of a Pattern

The preg_replace() function operates identically to ereg_replace(), except that it uses a Perl-based regular expression syntax, replacing all occurrences of pattern with replacement, and returning the modified result. Its prototype follows:
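For reference, the PHP manual gives the signature as preg_replace($pattern, $replacement, $subject, int $limit = -1, int &$count = null). As a minimal sketch, assuming a sample sentence that contains a bare URL, the following wraps that URL in an anchor tag; the pattern is one of several that would work and is not necessarily the one the authors used:

```php
<?php
// Sample sentence is an assumption made for this sketch.
$text = "This is a link to http://www.wjgilmore.com/.";

// ${0} in the replacement refers to the complete match.
// Using # as the delimiter avoids having to escape the slashes in the URL.
$linked = preg_replace('#http://\S+/#', '<a href="${0}">${0}</a>', $text);

echo $linked;
```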

——————————————–
This is a link to <a href="http://www.wjgilmore.com/">http://www.wjgilmore.com/</a>.
——————————————–

Interestingly, the pattern and replacement input parameters can also be arrays. This function will cycle through each element of each array, making replacements as they are found. Consider this example, which could be marketed as a corporate report filter:
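A minimal sketch of such a filter follows. The draft sentence and both word lists are assumptions chosen for illustration; each pattern is paired with the replacement at the same array position:

```php
<?php
// Draft text and word lists are assumptions made for this sketch.
$draft = "In 2007 the company faced plummeting revenues and scandal.";

$keywords = array("/faced/", "/plummeting/", "/scandal/");
$replacements = array("celebrated", "skyrocketing", "expansion");

// Each pattern is replaced by the entry at the same index in $replacements.
$report = preg_replace($keywords, $replacements, $draft);

echo $report;
```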

——————————————–
In 2007 the company celebrated skyrocketing revenues and expansion.
——————————————–

Creating a Custom Replacement Function

In some situations you might wish to replace strings based on a somewhat more complex set of criteria beyond what is provided by PHP’s default capabilities. For instance, consider a situation where you want to scan some text for acronyms such as IRS and insert the complete name directly following the acronym. To do so, you need to create a custom function and then use the function preg_replace_callback() to temporarily tie it into the language. Its prototype follows:

The pattern parameter determines what you’re looking for, while the str parameter defines the string you’re searching. The callback parameter defines the name of the function to be used for the replacement task. The optional parameter limit specifies the maximum number of replacements to perform. Failing to set limit or setting it to -1 will result in the replacement of all occurrences. In the following example, a function named acronym() is passed into preg_replace_callback() and is used to insert the long form of various acronyms into the target string:
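For reference, the PHP manual gives the signature as preg_replace_callback($pattern, callable $callback, $subject, int $limit = -1, int &$count = null, int $flags = 0). The sketch below is one possible version of acronym(); the lookup table, the sample sentence, and the pattern (runs of two or more capital letters) are all assumptions made for illustration:

```php
<?php
// Callback: appends the long form after a recognized acronym.
// $matches[0] is the complete match passed in by preg_replace_callback().
function acronym($matches) {
    // Lookup table chosen for illustration only.
    $acronyms = array(
        "WWW" => "World Wide Web",
        "IRS" => "Internal Revenue Service",
        "PDF" => "Portable Document Format",
    );

    if (isset($acronyms[$matches[0]])) {
        return $matches[0] . " (" . $acronyms[$matches[0]] . ")";
    }

    // Unknown acronyms are returned unchanged.
    return $matches[0];
}

$text = "The IRS offers tax forms in PDF format on the WWW.";

// \b[A-Z]{2,}\b matches whole words made of two or more capital letters.
$newtext = preg_replace_callback('/\b[A-Z]{2,}\b/', 'acronym', $text);

echo $newtext;
```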

Note: Later in this chapter, the section titled “Alternatives for Regular Expression Functions” offers several standard functions that can be used in lieu of regular expressions for certain tasks. In many cases, these alternative functions actually perform much faster than their regular expression counterparts.

Other String-Specific Functions

In addition to the regular expression–based functions discussed in the first half of this chapter, PHP offers more than 100 functions collectively capable of manipulating practically every imaginable aspect of a string. To introduce each function would be out of the scope of this book and would only repeat much of the information in the PHP documentation. This section is devoted to a categorical FAQ of sorts, focusing upon the string-related issues that seem to most frequently appear within community forums. The section is divided into the following topics:

Determining string length

Comparing two strings

Manipulating string case

Converting strings to and from HTML

Alternatives for regular expression functions

Padding and stripping a string

Counting characters and words

Determining the Length of a String

Determining string length is a repeated action within countless applications. The PHP function strlen() accomplishes this task quite nicely. This function returns the length of a string in bytes, so each single-byte character is equivalent to one unit (a multibyte character, such as an accented letter encoded in UTF-8, counts once per byte). Its prototype follows:

int strlen(string str)

The following example verifies whether a user password is of acceptable length:
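A minimal sketch of such a check, assuming a sample password of exactly ten characters and a minimum length of ten (both values are illustrative):

```php
<?php
// Sample password, exactly ten characters long.
$pswd = "secretpswd";

// Reject anything shorter than ten characters.
if (strlen($pswd) < 10) {
    $message = "Password is too short!";
} else {
    $message = "Password is valid!";
}

echo $message;
```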

In this case, the error message will not appear because the chosen password consists of ten characters, whereas the conditional expression checks whether the target string consists of fewer than ten characters.

Comparing Two Strings

String comparison is arguably one of the most important features of the string-handling capabilities of any language. Although there are many ways in which two strings can be compared for equality, PHP provides four functions for performing this task: strcmp(), strcasecmp(), strspn(), and strcspn(). These functions are discussed in the following sections.

Comparing Two Strings Case Sensitively

The strcmp() function performs a binary-safe, case-sensitive comparison of two strings. Its prototype follows:

int strcmp(string str1, string str2)

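It will return one of three possible values based on the comparison outcome:

0 if str1 and str2 are equal

-1 if str1 is less than str2

1 if str2 is less than str1

Note that PHP versions before 8.2 may return any negative or positive integer rather than exactly -1 or 1, so it is safest to test the sign of the result rather than its exact value.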

Web sites often require a registering user to enter and then confirm a password, lessening the possibility of an incorrectly entered password as a result of a typing error. strcmp() is a great function for comparing the two password entries because passwords are often case sensitive:
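A minimal sketch of such a confirmation check. The two variables stand in for submitted form fields, and their names and values are assumptions made for illustration:

```php
<?php
// Hypothetical values standing in for two submitted form fields.
$pswd = "supersecret";
$pswd2 = "supersecret2";

// strcmp() returns 0 only when the two strings are identical.
if (strcmp($pswd, $pswd2) != 0) {
    $message = "Passwords do not match!";
} else {
    $message = "Passwords match!";
}

echo $message;
```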

Note that the strings must match exactly for strcmp() to consider them equal. For example, Supersecret is different from supersecret. If you’re looking to compare two strings case insensitively, consider strcasecmp(), introduced next.

Another common point of confusion regarding this function surrounds its behavior of returning 0 if the two strings are equal. This is different from executing a string comparison using the == operator, like so:

if ($str1 == $str2)

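While both accomplish the same goal, which is to compare two strings, keep in mind that the values they return in doing so are different: strcmp() returns 0 for equal strings, which evaluates to FALSE in a conditional, whereas the == operator evaluates to TRUE.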
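The difference can be sketched as follows; the variable names are illustrative, and the last line shows the kind of mistake this confusion tends to produce:

```php
<?php
$str1 = "supersecret";
$str2 = "supersecret";

// strcmp() signals equality with 0, so compare the result against 0 explicitly.
$equalViaStrcmp = (strcmp($str1, $str2) === 0);

// The == operator signals equality with TRUE.
$equalViaOperator = ($str1 == $str2);

// A common mistake: using strcmp() directly as a condition.
// Because 0 is falsy, equal strings fall through to the "different" branch.
$buggy = strcmp($str1, $str2) ? "equal" : "different";

var_dump($equalViaStrcmp, $equalViaOperator, $buggy);
```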

Comparing Two Strings Case Insensitively

The strcasecmp() function operates exactly like strcmp(), except that its comparison is case insensitive. Its prototype follows:

int strcasecmp(string str1, string str2)

The following example compares two e-mail addresses, an ideal use for strcasecmp() because case does not determine an e-mail address’s uniqueness:
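A minimal sketch of such a comparison, using two made-up addresses that differ only in letter case:

```php
<?php
// Sample addresses are assumptions made for this sketch.
$email1 = "admin@example.com";
$email2 = "ADMIN@example.com";

// strcasecmp() ignores case, so these two addresses compare as equal.
if (strcasecmp($email1, $email2) === 0) {
    $message = "The e-mail addresses are identical!";
} else {
    $message = "The e-mail addresses differ.";
}

echo $message;
```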

Note that if O’Malley was accidentally written as O’malley, ucwords() would not catch the error, as it considers a word to be defined as a string of characters separated from other entities in the string by a blank space on each side.
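The behavior can be sketched as follows. The second call relies on the optional delimiters argument that ucwords() accepts in later PHP versions (5.4.32 and up), which is not covered in the text above and is shown here only as one possible workaround:

```php
<?php
$name = "O'malley";

// By default only characters following whitespace start a new word,
// so the "m" after the apostrophe is left in lowercase.
$default = ucwords($name);

// Treating the apostrophe as an additional delimiter capitalizes the "m".
$withDelims = ucwords($name, " '");

echo $default . "\n" . $withDelims . "\n";
```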