"The problem here is that there is no way to detect run-away regular expressions here without huge performance and memory penalties. Yes, we could build PCRE in a way that it wouldn't segfault and we could crank up the default backtrack limit to something huge, but it would slow every regex call down by a lot. If PCRE provided a way to handle this in a more graceful manner without the performance hit we would of course use it."

Here's some fleecy code to 1. validate RCF2822 conformity of address lists and 2. to extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for input form email checking, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers. The total length of the resulting Regex is about 30000 bytes. That because it accepts comments. You can remove that by setting $cfws to $fws and it shrinks to about 6000 bytes. Conformity checking is absolutely and strictly referring to RFC2822. Have fun and email me if you have any enhancements!

Recently I had to write search engine in hebrew and ran into huge amount of problems. My data was stored in MySQL table with utf8_bin encoding.

So, to be able to write hebrew in utf8 table you need to do<?php$prepared_text = addslashes(urf8_encode($text));?>

But then I had to find if some word exists in stored text. This is the place I got stuck. Simple preg_match would not find text since hebrew doesnt work that easy. I've tried with /u and who kows what else.

I found simpleXML to be useful only in cases where the XML was extremely small, otherwise the server would run out of memory (I suspect there is a memory leak or something?). So while searching for alternative parsers, I decided to try a simpler approach. I don't know how this compares with cpu usage, but I know it works with large XML structures. This is more a manual method, but it works for me since I always know what structure of data I will be receiving.

Essentially I just preg_match() unique nodes to find the values I am looking for, or I preg_match_all to find multiple nodes. This puts the results in an array and I can then process this data as I please.

I was unhappy though, that preg_match_all() stores the data twice (requiring twice the memory), one array for all the full pattern matches, and one array for all the sub pattern matches. You could probably write your own function that overcame this. But for now this works for me, and I hope it saves someone else some time as well.

I had been crafting and testing some regexp patterns online using the tools Regex101 and a `preg_match_all()` tester and found that the regexp patterns I wrote worked fine on them, just not in my code.

As I intended to create for my own purpose a clean PHP class to act on XML files, combining the use of DOM and simplexml functions, I had that small problem, but very annoying, that the offsets in a path is not numbered the same in both.

That is to say, for example, if i get a DOM xpath object it appears like:
/ANODE/ANOTHERNODE/SOMENODE[9]/NODE[2]
and as a simplexml object would be equivalent to:
ANODE->ANOTHERNODE->SOMENODE[8]->NODE[1]

So u see what I mean? I used preg_match_all to solve that problem, and finally I got this after some hours of headlock (as I'm french the names of variables are in French sorry), hoping it could be useful to some of you:

<?php
function decrease_string($string)
{
/* retrieve all occurrences AND offsets of numbers in the original string: */

;Please note that if you set this value to a high number you may consume all;the available process stack and eventually crash PHP (due to reaching the;stack size limit imposed by the Operating System).

I have written this example mainly to demonstrate the power of PCRE LANGUAGE, not the power of it's implementation :)

Here is a way to match everything on the page, performing an action for each match as you go. I had used this idiom in other languages, where its use is customary, but in PHP it seems to be not quite as common.

// Update offset to the end of the match$offset = $match_start + $match_length; }

return $match_count;}?>

Note that the offsets returned are byte values (not necessarily number of characters) so you'll have to make sure the data is single-byte encoded. (Or have a look at paolo mosna's strByte function on the strlen manual page).I'd be interested to know how this method performs speedwise against using preg_match_all and then recursing through the results.