Sadly, the POSIX regexp evaluator (PHP 4.1.2) does not seem to support multi-character collating sequences, even though such sequences are included in the man-page documentation.

Specifically, the man-page discusses the expression "[[.ch.]]*c" which matches the first five characters of "chchcc". Running this expression in ereg_replace generates the error "Warning: REG_ECOLLATE". (Running an equivalent expression with only one character between the periods does work, however.)

Multi-character collating sequences are not supported!

This is really too bad, because it would have provided a simple way to exclude words from the target.

Tip: Metacharacters in regular expressions are useful and easy to use.

The following is a set of special values that denote certain common ranges. They have the advantage that they also take the 'locale' into account, i.e. any variant of the local language/coding system.

[:digit:] Only the digits 0 to 9.
[:alnum:] Any alphanumeric character: 0 to 9, A to Z or a to z.
[:alpha:] Any alpha character: A to Z or a to z.
[:blank:] Space and TAB characters only.
[:xdigit:] Hexadecimal digits: 0 to 9, A to F or a to f.
[:punct:] Punctuation symbols such as . , " ' ? ! ; :
[:print:] Any printable character, including space.
[:space:] Any whitespace character.
[:graph:] Any printable character except space.
[:upper:] Any alpha character A to Z.
[:lower:] Any alpha character a to z.
[:cntrl:] Control characters.

I was having a ton of issues with other people's phone number validation expressions, so I made my own. It works with most US phone numbers, including those with extensions. It matches any of the following formats:

^ Start of line
$ End of line
n? Zero or only one single occurrence of character 'n'
n* Zero or more occurrences of character 'n'
n+ At least one or more occurrences of character 'n'
n{2} Exactly two occurrences of 'n'
n{2,} At least 2 or more occurrences of 'n'
n{2,4} From 2 to 4 occurrences of 'n'
. Any single character
() Parentheses to group expressions
(.*) Zero or more occurrences of any single character, i.e. anything!
(n|a) Either 'n' or 'a'
[1-6] Any single digit in the range between 1 and 6
[c-h] Any single lower case letter in the range between c and h
[D-M] Any single upper case letter in the range between D and M
[^a-z] Any single character EXCEPT any lower case letter between a and z.

Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the very first character inside a range, and it denies the entire range including the ^ symbol itself if it appears again later in the range. Also remember that if it is the first character in the entire expression, it means "start of line". In any other place, it is always treated as a regular ^ symbol. In other words, you cannot deny a word with ^undesired_word or a group with ^(undesired_phrase). Read more detailed regex documentation to find out what is necessary to achieve this.

[_4^a-zA-Z] Any single character which can be the underscore or the number 4 or the ^ symbol or any letter, lower or upper case
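The three roles of ^ described in the pitfall above can be checked with a small sketch. It uses Python's re module as a stand-in for the POSIX evaluator, which is an assumption on my part; for these particular constructs the two behave the same way. The variable names are made up for illustration.

```python
import re

# ^ is NOT the first character inside the brackets, so it is a literal ^.
literal_caret = re.compile(r"[_4^a-zA-Z]")
# ^ IS the first character inside the brackets, so it negates the range.
negated_range = re.compile(r"[^a-z]")
# ^ at the start of the expression anchors the match to the start of the line.
anchored = re.compile(r"^cat")

caret_is_literal = literal_caret.fullmatch("^") is not None      # True
five_is_rejected = literal_caret.fullmatch("5") is None          # True
lower_is_excluded = negated_range.fullmatch("q") is None         # True
upper_is_accepted = negated_range.fullmatch("Q") is not None     # True

# ^cat does not mean "anything but cat": it only matches lines starting with cat.
starts_with_cat = anchored.search("cat food") is not None        # True
dog_is_not_matched = anchored.search("dog food") is None         # True
```

The last line is the point of the pitfall: a line that does not contain "cat" is simply not matched, so ^cat cannot be used to deny a word.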

?, +, * and the {} count parameters can be appended not only to a single character, but also to a group() or a range[].

Therefore, ^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$ would mean:

^.{2} = A line beginning with any two characters,
[a-z]{1,2} = followed by either 1 or 2 lower case letters,
_? = followed by an optional underscore,
[0-9]* = followed by zero or more digits,
([1-6]|[a-f]) = followed by either a digit between 1 and 6 OR a lower case letter between a and f,
[^1-9]{2} = followed by any two characters except digits between 1 and 9 (0 is possible),
a+$ = followed by at least one or more occurrences of 'a' at the end of a line.
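To see the breakdown in action, here is a minimal sketch that runs the pattern against a few sample strings. It assumes Python's re module as a stand-in for the POSIX evaluator (every construct in this pattern is handled the same way by both); the helper name and the sample strings are invented for illustration.

```python
import re

PATTERN = re.compile(r"^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$")

def matches(line):
    """Return True if the whole line fits the pattern."""
    return PATTERN.search(line) is not None

# "XY" + "ab" + "_" + "123" + "5" + "zz" + "a"
ok_full = matches("XYab_1235zza")
# "12" + "c" + (no underscore, no extra digits) + "6" + two spaces + "a"
ok_minimal = matches("12c6  a")
# Fails: the two characters before the final 'a' are "z9", and 9 is excluded.
bad_digit = matches("XYab_1235z9a")
# Fails: the line does not end with 'a'.
bad_ending = matches("XYab_1235zzb")
```

Note that in the first sample [0-9]* first grabs "1235" and then has to give back the "5" so that ([1-6]|[a-f]) has something to match.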

Follow-up to my previous post: Some simple optimization allowed me to realize that excluding a word at the beginning of a string has a degree of complexity O(n) rather than O(n^2). I only had to follow the logic:

It's easy to exclude characters, but excluding words with a regular expression is a bit more tricky. For parentheses there is no equivalent to the ^ for brackets. The only way I've found to exclude a string is to proceed by inverse logic: accept all the words that do NOT correspond to the string. So if you want to accept all strings except those _beginning_ with "abc", you'd have to accept any string that matches one of the following:
^(ab[^c])
^(a[^b]c)
^(a[^b][^c])
^([^a]bc)
^([^a]b[^c])
^([^a][^b]c)
^([^a][^b][^c])
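As a rough sketch of this inverse logic, the seven alternatives can be joined into a single expression. Python's re module is assumed here as a stand-in for the POSIX evaluator, and the function name is hypothetical.

```python
import re

# The seven alternatives listed above, joined with | inside one group.
ALTERNATIVES = [
    "ab[^c]", "a[^b]c", "a[^b][^c]",
    "[^a]bc", "[^a]b[^c]", "[^a][^b]c", "[^a][^b][^c]",
]
NOT_ABC_AT_START = re.compile("^(" + "|".join(ALTERNATIVES) + ")")

def does_not_begin_with_abc(text):
    """True if the first three characters exist and are not 'abc'."""
    return NOT_ABC_AT_START.search(text) is not None

accepted = [s for s in ["abd", "xbc", "axc", "xyz"] if does_not_begin_with_abc(s)]
rejected = [s for s in ["abc", "abcdef"] if not does_not_begin_with_abc(s)]
too_short = does_not_begin_with_abc("ab")
```

One thing to watch: every alternative consumes three characters, so a string shorter than three characters (like "ab") is rejected too, even though it does not begin with "abc".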

Note that this won't work to detect the word "abc" anywhere in a string. You need to have some way of anchoring the inverse word match, like:
^(a[^b]|[^a]b|[^a][^b]) ;"ab" not at beginning of line
or:
(a[^b]|[^a]b|[^a][^b])$ ;"ab" not at end of line
or:
123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"
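The three anchoring styles can be tried out with a short sketch, again assuming Python's re module as a stand-in for the POSIX evaluator; the variable names are made up for illustration.

```python
import re

# Two characters that are anything except the pair "ab".
NOT_AB = r"(a[^b]|[^a]b|[^a][^b])"

not_at_start = re.compile("^" + NOT_AB)    # "ab" not at beginning of line
not_at_end = re.compile(NOT_AB + "$")      # "ab" not at end of line
not_after_123 = re.compile("123" + NOT_AB) # "ab" not after "123"

start_ok = not_at_start.search("acx") is not None
start_blocked = not_at_start.search("abx") is None

end_ok = not_at_end.search("xac") is not None
end_blocked = not_at_end.search("xab") is None

after_ok = not_after_123.search("123ac") is not None
after_blocked = not_after_123.search("123ab") is None
```

Without one of these anchors, the evaluator is free to slide along the string until it finds some pair of characters that is not "ab", so nearly every string would match.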

I don't know why "(abc){0,0}" is invalid syntax. It would've made all this much simpler.

Something that really got me: I'm used to using Perl's regexps, and so I used \s to check for a whitespace character in a password on a website. My PHP book (Wrox Press, Professional PHP Programming) agreed with me that this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it did was keep anyone from joining the site if they put an 's' in their password! So beware, check for subtle differences between what you're used to and PHP.