It has been reported, well, more than enough times to the PHP team (I made another attempt today, hoping this will get fixed in some time soon.. if at all). This issue affects all PHP versions Mario Heiderich and me could test, and endangers practically all PHP programs that use the utf8_decode() function for decoding (as recommended by OWASP guidelines).

A filter (such as addslashes, htmlentities, escapeshellarg, etc.) will NOT be able to detect&escape such byte sequences, and so an application that relies on them for security checks wont be protected at all. Because it allows an attacker to encode "dangerous" chars, such as ', ", <, ;, &, \0 in different ways:

Note that htmlpurifier does a utf8_decode function call at the end of the decoding, BUT they are safe because of a pre-encoding made by htmlpurifier.. other codes that do the same wont be so lucky.

Bogdan Calin from Acunetix WVS described a couple of other potential attack scenarios:

Where an attacker could fool the filter by doing a request like:vuln.php?input=%F6%3Cimg+onmouseover=prompt(/xss/)//%F6%3E

And:

Where an attacker could fool the filter by doing a request like:index.php?username=test%FC%27%27+or+1=1+–+&password=a

3.- Integer overflow:
Unsigned short has a size of 16 bits (2 bytes), that is UNCAPABLE of storing unicode characters of 21 bits, and represented on UTF with 4 bytes (1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx). PHP attempts to sum a 21 bits value to a 16 bits-size variable, and then makes no checks on the value.

This besides the already mentioned problems, and the possibility of bypassing quite a lot of WAFs and Filters.. demonstrate the problem of a bad unicode implementation on PHP.

I hope the PHP development team acknowledges all this issues that have been reported before, and were explained some months ago on Blackhat USA (and the developers were noticed to check the ppt more than once), and now are explained yet another time.

This was fixed on 5.2.11 :) on my birthday!! Sept 17

Anyway.. that's not all, now to finish this post I want to publish a overlong utf-8 exception on Firefox (actually, Mozilla's).

The firefox one

Firefox is supposed to consider the non-shortest form exception (point #1 in the PHP vulnerabilities), and section 3.1 of the Unicode Technical Report #36 but apparently there's a flaw on it. This is specially problematic for the reasons that an overlong unicode sequence not taken into consideration may allow several types of filter bypasses.

Anyway, the severity of this vulnerability is not as high as the PHP ones, but is worth mentioning. The following non-shortest form for the char U+1000: