preg_match

Description

Searches subject for a match to the regular
expression given in pattern.

Parameters

pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, then it is filled with
the results of the search. $matches[0] will contain the
text that matched the full pattern, $matches[1]
will have the text that matched the first captured parenthesized
subpattern, and so on.

flags

flags can be the following flag:

PREG_OFFSET_CAPTURE

If this flag is passed, the string offset of every match is returned
along with the match itself. Note that this changes the value of
matches into an array where every element is an
array consisting of the matched string at offset 0
and its string offset into subject at offset
1.

offset

Normally, the search starts from the beginning of the subject string.
The optional parameter offset can be used to
specify the alternate place from which to start the search (in bytes).

Note:

Using offset is not equivalent to passing
substr($subject, $offset) to
preg_match() in place of the subject string,
because pattern can contain assertions such as
^, $ or
(?<=x). Compare:
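As a rough sketch of the difference (the subject "abcdef" and the pattern /^def/ are made up for illustration): with an offset, ^ still refers to the real start of the subject, so the search fails, while the same pattern succeeds on the substr() copy.

```php
<?php
$subject = "abcdef";
$pattern = '/^def/';

// Search from byte 3 of the original subject: "^" still means the real start, so no match.
$withOffset = preg_match($pattern, $subject, $matchesOffset, PREG_OFFSET_CAPTURE, 3);

// Search a copy that begins at byte 3: "^" now matches right before "d".
$withSubstr = preg_match($pattern, substr($subject, 3), $matchesSubstr, PREG_OFFSET_CAPTURE);

var_dump($withOffset, $matchesOffset);   // int(0) and an empty array
var_dump($withSubstr, $matchesSubstr);   // int(1); $matchesSubstr[0] is ["def", 0]
```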

Return Values

preg_match() returns 1 if the pattern
matches given subject, 0 if it does not, or FALSE
if an error occurred.

Warning

This function may
return Boolean FALSE, but may also return a non-Boolean value which
evaluates to FALSE. Please read the section on Booleans for more
information. Use the ===
operator for testing the return value of this
function.
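A minimal sketch of testing the three possible results with ===. The helper name describe_match() is hypothetical, and the @ operator is used here only to silence the warning PHP raises for a malformed pattern.

```php
<?php
// Hypothetical helper: turns preg_match()'s three possible results into a label.
function describe_match(string $pattern, string $subject): string
{
    // "@" suppresses the warning emitted when the pattern cannot be compiled.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'error';          // e.g. the pattern is malformed
    }
    return $result === 1 ? 'match' : 'no match';
}
```

A loose check such as if (!preg_match(...)) would treat 'error' and 'no match' the same way, which is why the strict comparison matters.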

Changelog

Version   Description

5.3.6     Returns FALSE if offset is higher than subject length.

5.2.2     Named subpatterns now accept the syntax (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

User Contributed Notes 65 notes

Regex quick reference

[abc]       A single character: a, b or c
[^abc]      Any single character but a, b, or c
[a-z]       Any single character in the range a-z
[a-zA-Z]    Any single character in the range a-z or A-Z
^           Start of line
$           End of line
\A          Start of string
\z          End of string
.           Any single character
\s          Any whitespace character
\S          Any non-whitespace character
\d          Any digit
\D          Any non-digit
\w          Any word character (letter, number, underscore)
\W          Any non-word character
\b          A word boundary (a position, not a character)
(...)       Capture everything enclosed
(a|b)       a or b
a?          Zero or one of a
a*          Zero or more of a
a+          One or more of a
a{3}        Exactly 3 of a
a{3,}       3 or more of a
a{3,6}      Between 3 and 6 of a

Sometimes it's useful to negate a string. The first method that comes to mind is [^(string)], but this of course won't work, because a character class only ever matches a single character. There is a solution, but it is not very well known. This is the simple piece of code that negates a string:

(?:(?!string).)

?: makes a non-capturing subpattern (see http://www.php.net/manual/en/regexp.reference.subpatterns.php) and ?! is a negative lookahead. You put the negative lookahead in front of the dot because you want the regex engine to first check whether the string you are negating occurs at this position. Only if it does not do you want to match an arbitrary character.
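On its own the construct consumes a single character, so in practice it is repeated. A small sketch, with made-up subject strings, of two common uses:

```php
<?php
// Capture everything up to, but not including, the first occurrence of "string".
// Each step of the repetition first checks (?!string), then consumes one character.
preg_match('/^((?:(?!string).)*)/', 'a long string here', $m);
$before = $m[1];   // "a long "

// Anchored at both ends, the same construct tests that "string" never occurs.
$clean = preg_match('/^(?:(?!string).)*$/', 'nothing to see');       // 1
$dirty = preg_match('/^(?:(?!string).)*$/', 'a long string here');   // 0
```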

Here is a small tool for someone learning to use regular expressions. It's very basic, and allows you to try different patterns and combinations. I made it to help me, because I like to try different things to get a good understanding of how things work.

I noticed that in order to deal with UTF-8 texts, without having to recompile PHP with the PCRE UTF-8 flag enabled, you can just add the following sequence at the start of your pattern: (*UTF8)

For instance: '#(*UTF8)[[:alnum:]]#' will match 'é' where '#[[:alnum:]]#' will not.

I found this very useful tip after hours of research over the web, directly on the PCRE website: http://www.pcre.org/pcre.txt
There is a lot more information about UTF-8 support in the library there.
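A short sketch of what UTF-8 mode changes. It also uses the u pattern modifier, which the note does not mention but which switches on the same mode from outside the pattern; the test character is an arbitrary choice.

```php
<?php
$char = "\xC3\xA9";   // "é" encoded as UTF-8 (two bytes)

// Without UTF-8 mode "." matches exactly one byte, so a two-byte character fails.
$byteMode = preg_match('/^.$/', $char);    // 0

// With the u modifier the subject is read as UTF-8 and "." matches one character.
$utf8Mode = preg_match('/^.$/u', $char);   // 1

// The same idea, switched on from inside the pattern as the note describes.
$inlineMode = preg_match('/(*UTF8)^.$/', $char);
```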

I see a lot of people trying to put together phone regexes and struggling (hey, no worries... they're complicated). Here's one that we use that's pretty nifty. It's not perfect, but it should work for most non-idealists.

There does not seem to be any mention of the PHP version of switches (pattern modifiers) that can be used with regular expressions.

preg_match_all('/regular expr/sim', $text).

The s, i and m after the closing delimiter are the switches (the ones I know about):
The i ignores letter case (this is commonly known, I think).
The s makes the dot (.) match \n (line break) as well, so a .* does not stop at the end of a line. This is important with multi-line entries, for example text from an editor that needs to be searched.
The m treats the subject as multiple lines, so ^ and $ match at the start and end of every line instead of only at the start and end of the whole string.

I am hoping this will save someone from the 4 hours of torture that I endured trying to work out this issue.
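A small sketch of the three switches on a made-up two-line string, each compared with the same pattern without the switch:

```php
<?php
$text = "First line\nsecond LINE";

$plainDot  = preg_match('/First.*second/', $text);    // 0: "." stops at "\n"
$dotAll    = preg_match('/First.*second/s', $text);   // 1: s lets "." match "\n"

$plainAnch = preg_match('/^second/', $text);          // 0: "^" only matches at the very start
$multiline = preg_match('/^second/m', $text);         // 1: m lets "^" match after each "\n"

$caseless  = preg_match('/line$/i', $text);           // 1: i ignores case, so "LINE" matches
```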

Note that with named groups you get each match under its name as well as under its numerical key, so if you use them and you're counting array elements, be aware that your array might be bigger than you initially expect it to be.
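A quick sketch of the effect, with a made-up date pattern. Filtering on string keys is one possible way to keep only the named entries, not the only one.

```php
<?php
preg_match('/(?<year>\d{4})-(?<month>\d{2})/', 'Released 2011-03', $m);

// $m holds five elements for two groups:
// [0 => '2011-03', 'year' => '2011', 1 => '2011', 'month' => '03', 2 => '03']
$total = count($m);

// Keep only the entries whose key is a name.
$named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
```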

Sometimes an attacker disguises a PHP file or shell script as an image in order to compromise your website. So if you use move_uploaded_file() as in the example to let users upload files, you must check whether the file contains malicious code. This function does that check with preg_match().

<?php
// a typical URL_query validity-checker (the pattern's function does not matter for this example)
$pattern = '/^(?:[;\/?:@&=+$,]|(?:[^\W_]|[-_.!~*\()\[\] ])|(?:%[\da-fA-F]{2}))*$/';

var_dump( preg_match( $pattern, $text ) );

?>

Possible bug (1):
=============
On one of our Linux-Servers the above example crashes PHP-execution with a C(?) Segmentation Fault(!). This seems to be a known bug (see http://bugs.php.net/bug.php?id=40909), but I don't know if it has been fixed yet.

If you are looking for a work-around, the following code-snippet is what I found helpful. It wraps the possibly crashing preg_match call by decreasing the PCRE recursion limit in order to result in a Reg-Exp error instead of a PHP-crash.

<?php
// reset the PCRE recursion limit to its original value
ini_set( "pcre.recursion_limit", $former_recursion_limit );

// if the reg-exp fails due to the decreased recursion limit we may not make any statement, but PHP-execution continues
if ( PREG_RECURSION_LIMIT_ERROR === preg_last_error() )
{
    // react on the failed regular expression here
    $result = [...];

    // do logging or email-sending here
    [...]
} //if

?>
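The fragment above starts after the limit has already been lowered. A fuller sketch of the same idea might look like the following; the function name preg_match_guarded(), the limit of 10000 and the choice to return null on any PCRE error are all assumptions for illustration, not the note author's code.

```php
<?php
// Hypothetical wrapper: runs preg_match() under a lowered recursion limit and
// returns 1 or 0 on success, or null if PCRE reported any error.
function preg_match_guarded(string $pattern, string $subject, int $limit = 10000): ?int
{
    // ini_set() returns the previous value, which is restored afterwards.
    $former = ini_set('pcre.recursion_limit', (string) $limit);

    $result = @preg_match($pattern, $subject);
    $error  = preg_last_error();

    if ($former !== false) {
        ini_set('pcre.recursion_limit', $former);
    }

    // preg_last_error() catches failures such as PREG_RECURSION_LIMIT_ERROR.
    if ($result === false || $error !== PREG_NO_ERROR) {
        return null;
    }
    return $result;
}
```

Checking preg_last_error() in addition to the return value means a limit error is reported as null whether preg_match() itself returned FALSE or 0.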

Possible bug (2):
=============
On one of our Windows-Servers the above example does not crash PHP, but (directly) hits the recursion-limit. Here, the problem is that preg_match does not return boolean(false) as expected from the description in the manual above.

In short, preg_match seems to return an int(0) instead of the expected boolean(false) if the regular expression could not be executed due to the PCRE recursion-limit. So, if preg_match results in int(0), you seem to have to check preg_last_error() to see whether an error occurred.

I have been working on an email system that will automatically generate a text email from a given HTML email by using strip_tags(). The only issue I ran into, for my needs, was that the anchors would not keep their links. I searched for a little while and could not find anything to pull the links out of the tags, so I wrote my own little snippet. I am posting it here in the hope that others may find it useful, and for later reference.

A note to keep in mind: I was primarily concerned with valid HTML, so if attributes do not use ' or " to contain the values then this will need to be tweaked. If you can edit this to work better, please let me know.

<?php
/**
 * Replaces anchor tags with text
 * - Will search string and replace all anchor tags with text (case insensitive)
 *
 * How it works:
 * - Searches string for an anchor tag, checks to make sure it matches the criteria
 *   Anchor search criteria:
 *   - 1 - <a (must have the start of the anchor tag)
 *   - 2 - Can have any number of spaces or other attributes before and after the href attribute
 *   - 3 - Must close the anchor tag
 *
 * - Once the check has passed it will then replace the anchor tag with the string replacement
 * - The string replacement can be customized
 *
 * Known issue:
 * - This will not work for anchors that do not use a ' or " to contain the attributes.
 *   (i.e.- <a href=http: //php.net>PHP.net</a> will not be replaced)
 */
function replaceAnchorsWithText($data) {
    /**
     * Had to modify $regex so it could post to the site... so I broke it into 6 parts.
     */
    $regex  = '/(<a\s*'; // Start of anchor tag
    $regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist
    $regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link
    $regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag
    $regex .= '(?P<name>\S+)'; // Grab the name
    $regex .= '\s*<\/a>)/i'; // Any number of spaces between the closing anchor tag (case insensitive)

    if (is_array($data)) {
        // This is what will replace the link (modify to your liking)
        $data = "{$data['name']}({$data['link']})";
    }
    return preg_replace_callback($regex, 'replaceAnchorsWithText', $data);
}

echo replaceAnchorsWithText($input).'<hr/>';
?>

Will output:

Test 1: PHP.NET1(http: //php.net1).
Test 2: PHP.NET2(HTTP: //PHP.NET2).
Test 3: php.net3 (is still an anchor)
This last line had nothing to do with any of this

Posting to this site is painful...Had to break up the regex and had to break the test links since it was being flagged as spam...

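Matching a backslash character can be confusing, because double escaping is needed in the pattern: first for PHP, second for the regex engine.

<?php
// match a newline control character:
preg_match("/\n/", "\n");    // double quotes: PHP stores the control character 0x0A in the pattern string
preg_match('/\n/', "\n");    // very same match, but stored escaped as 0x5C,0x6E; the regex engine reads it as a newline

// match a literal backslash followed by "n":
preg_match('/\\\\n/', '\n'); // PHP reduces the four backslashes to two, which the regex engine reads as one literal backslash
?>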

When using a 'bad words reject string' filter, preg_match() is MUCH faster than strpos() / stripos(), because with those you need a foreach over every word in the list. Even with efficient programming, the foreach is ONLY faster when the first word in the ban list is the one found.
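One way such a filter could be written is to join the word list into a single alternation so that one preg_match() call checks every word. This is a sketch only: the function name contains_banned_word() and the whole-word (\b) matching are assumptions, and no timing claim is made here.

```php
<?php
// Hypothetical filter: true if any banned word occurs as a whole word (case-insensitive).
function contains_banned_word(string $text, array $banned): bool
{
    if ($banned === []) {
        return false;   // an empty alternation would match everywhere
    }

    // preg_quote() escapes regex metacharacters and the "/" delimiter in each word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $banned);

    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    return preg_match($pattern, $text) === 1;
}
```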

Just an interesting note. I was updating code to replace ereg() with strpos() and preg_match(), and the thought occurred that preg_match() could be optimized to quit early when only checking whether a string begins with something, for example:

<?php
if (preg_match("/^http/", $url)) {
    // do something
}
?>

vs

<?php
if (strpos($url, "http") === 0) {
    // do something
}
?>

As I guessed, strpos() is always faster (about 2x) for short strings like a URL, but for very long strings of several paragraphs (e.g. a block of XML) that do not start with the needle, preg_match() is twice as fast as strpos() because it doesn't scan the entire string.

So, if you are searching long strings and expect the check to normally be true (e.g. validating XML), strpos() is much faster, BUT if you expect it to often fail, preg_match() is the better choice.
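A third option that the note does not benchmark is strncmp(), which compares only the first few bytes and so never scans the rest of the string. A minimal sketch, with a hypothetical helper name:

```php
<?php
// Hypothetical helper: prefix test that looks at the first 4 bytes only.
function starts_with_http(string $url): bool
{
    return strncmp($url, 'http', 4) === 0;
}
```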

If you are from a country that accepts decimal numbers in both formats 9.00 and 9,00 (point or comma), number validation could look like this:

<?php
$number_check = "9,99";
// at most one separator, either a point or a comma
if (preg_match('/^[-+]?[0-9]*[.,]?[0-9]+$/', $number_check)) {
    echo "valid number";
}
?>

However, if the number will be written to the database, the comma most probably needs to be replaced with a dot. This can be done with str_replace(), i.e.:

<?php
$number_database = str_replace(",", ".", $number_check);
?>
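The two steps can be combined into one helper. A sketch, assuming a pattern that allows at most one separator (point or comma); the name normalize_decimal() and the null return for invalid input are choices made for this example.

```php
<?php
// Hypothetical helper: validates a decimal written with "." or "," and
// returns it with a dot as separator, or null if the input is not a number.
function normalize_decimal(string $input): ?string
{
    if (preg_match('/^[-+]?[0-9]*[.,]?[0-9]+$/', $input) !== 1) {
        return null;
    }
    return str_replace(',', '.', $input);
}
```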

preg_match() and preg_replace_callback() don't match up in the structure of the array that they fill for a match. preg_match(), as the example shows, supports named patterns, whereas preg_replace_callback() doesn't seem to support them at all. It seems to ignore any named pattern matched.
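Current PHP releases do pass named groups to the callback, so the behaviour described may have been specific to an older version. A small sketch with a made-up key=value pattern:

```php
<?php
// Swap key and value, reading both by name inside the callback.
$result = preg_replace_callback(
    '/(?<key>\w+)=(?<value>\w+)/',
    function (array $m): string {
        return $m['value'] . '=' . $m['key'];
    },
    'a=1 b=2'
);
// $result is "1=a 2=b"
```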

preg_match() returns an empty result when trying to validate a $subject containing line breaks (\r\n), because by default the dot does not match a newline. To solve it, use the /s modifier in the $pattern string.

<?php
$pattern = '/.*/s';
$valid = preg_match($pattern, $subject, $match);
?>

If you need to check for .com.br and .com.au and .uk and all the other crazy domain endings, I found the following expression works well if you want to validate an email address. It's quite generous in what it will allow.

The most accurate IPv4 function. It will not allow leading zeros and supports the full address range of 0.0.0.0 - 255.255.255.255

<?php
function is_ipv4($string)
{
    // The regular expression checks for any number between 0 and 255 followed by a dot (repeated 3 times),
    // then another number between 0 and 255 at the end. The equivalent to an IPv4 address.
    return (bool) preg_match('/^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])'.
        '\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|[0-9])$/', $string);
}
?>

I spent a while replacing all my ereg() calls with preg_match(), since ereg() is deprecated (it was eventually removed in PHP 7.0).

Just a warning regarding the conversion: the two functions behave very similarly, but not exactly alike. Obviously, you will need to delimit your pattern with '/' or '|' characters.

The difference that stumped me was that preg_match() overwrites the $matches array regardless of whether a match was found. If no match was found, $matches is simply empty.

ereg(), however, would leave $matches alone if a match was not found. In my code, I had repeated calls to ereg, and was populating $matches with each match. I was only interested in the last match. However, with preg_match, if the very last call to the function did not result in a match, the $matches array would be overwritten with a blank value.
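A sketch of the difference in practice, with made-up input lines: because preg_match() empties $matches on a failed call, the last successful match has to be copied somewhere else if it is needed later.

```php
<?php
$lines = ['id=1', 'no id here', 'id=7', 'still nothing'];

$last = null;
foreach ($lines as $line) {
    // $m is overwritten on every call, including calls that find nothing.
    if (preg_match('/id=(\d+)/', $line, $m) === 1) {
        $last = $m;   // keep a copy of the most recent successful match
    }
}

// After the loop $m is empty (the final line did not match),
// while $last still holds the match from "id=7".
```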

I do a fair bit of HTML scraping in conjunction with curl. I always need to know if I have reached the right page or if the curl request failed. The main problem I have encountered is HTML tags having unexpected spaces or other characters (especially the &nbsp; character sequence) between them. For example, when requesting a page with a certain set of POST or GET variables the response might be

<a href='blah'><span>data data data</span></a>

but requesting the same page with different post/get variables might give the following result:

<a href='blah'> &nbsp;<span>data data data</span></a>

To match both of these tag sequences with the same pattern I use the [\S\s]*? wildcard, which basically means 'match anything at all... but not if you can help it'.
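A sketch of such a pattern applied to the two responses above. The # delimiter and the capture group around the span contents are choices made for this example.

```php
<?php
$pattern = '#<a href=\'blah\'>[\S\s]*?<span>(.*?)</span></a>#';

$tight = "<a href='blah'><span>data data data</span></a>";
$loose = "<a href='blah'> &nbsp;<span>data data data</span></a>";

// [\S\s]*? lazily skips whatever sits between <a ...> and <span>, including nothing at all.
$tightFound = preg_match($pattern, $tight, $tightMatch);
$looseFound = preg_match($pattern, $loose, $looseMatch);
```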

I noted that PCRE_ANCHORED (the pattern modifier A) works fine when using an offset. If you use the escape sequence \A or even the caret "^" in the regex instead, it does not work (even in multiline mode)...
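A small sketch of that behaviour on a made-up subject. The \G escape is included as an addition to the note: it anchors at the position where the search starts, which is the given offset.

```php
<?php
$subject = 'abcdef';

// "^" and "\A" refer to the real start of the subject, so they fail at offset 3.
$caret    = preg_match('/^def/',  $subject, $m, 0, 3);   // 0
$escapeA  = preg_match('/\Adef/', $subject, $m, 0, 3);   // 0

// The A modifier and \G both anchor the match at the offset itself.
$anchored = preg_match('/def/A',  $subject, $m, 0, 3);   // 1
$escapeG  = preg_match('/\Gdef/', $subject, $m, 0, 3);   // 1
```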

Just a note about my last post. The regex expression for the function I posted contains a question mark at the end. Technically this doesn't need to be there but it will work with or without it. Just remove it if you don't want it. Enjoy!

Simple function to return a sub-string following the preg convention. Kind of expensive, and some might say lazy but it has saved me time.

# preg_substr($pattern,$subject,[$offset]) function
# @author aer0s
# return a specific sub-string in a string using
# a regular expression
# @param $pattern regular expression pattern to match
# @param $subject string to search
# @param [$offset] zero based match occurrence to return
#
# [$offset] is 0 by default which returns the first occurrence,
# if [$offset] is -1 it will return the last occurrence
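As a rough sketch of the contract described in that comment block (this is not aer0s's original code; the name preg_substr_sketch(), the handling of other negative offsets and the null return for a missing occurrence are assumptions):

```php
<?php
// Hypothetical re-implementation: returns the $offset-th full match of $pattern
// in $subject (0 = first, -1 = last), or null if there is no such occurrence.
function preg_substr_sketch(string $pattern, string $subject, int $offset = 0): ?string
{
    // preg_match_all() collects every full match in $all[0].
    if (preg_match_all($pattern, $subject, $all) === false) {
        return null;
    }
    $hits = $all[0];

    // Negative offsets count from the end of the list of matches.
    if ($offset < 0) {
        $offset += count($hits);
    }
    return $hits[$offset] ?? null;
}
```

Collecting every match first is what makes this "kind of expensive", as the note says, but it keeps the last-occurrence case simple.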

This is a really simple script made for beginners! If you'd like, you could add a restriction to the numbers. The code above will accept all kinds of numbers, and we know that an IP address can be at most 255.255.255.255, while the example accepts up to 999.999.999.999.
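One way to add that restriction is to capture the four numbers and compare each with 255. A minimal sketch; the function name ipv4_in_range() is hypothetical, and leading zeros such as "01" are deliberately still accepted here.

```php
<?php
// Hypothetical check: four dot-separated groups of 1 to 3 digits, each at most 255.
function ipv4_in_range(string $ip): bool
{
    // The D modifier makes "$" match only at the very end of the subject.
    if (preg_match('/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/D', $ip, $m) !== 1) {
        return false;
    }

    // $m[1] to $m[4] hold the four captured numbers.
    foreach (array_slice($m, 1) as $octet) {
        if ((int) $octet > 255) {
            return false;
        }
    }
    return true;
}
```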