WAF Normalization and I18N

Web application firewalls must handle internationalization (I18N) and therefore must correctly process various data encodings, including Unicode and UTF-8, both to prevent evasion and to minimize false positives. In an earlier blog post, we highlighted ModSecurity's new support for Unicode mapping and decoding. This capability helps us decode characters from different Unicode code points more accurately. While this improves our accuracy, we still had the issue of UTF-8 encodings. This is a challenge for any WAF, as it must be able to handle UTF-8 encodings of characters for different languages such as Portuguese. So, if you are running ModSecurity to protect a non-English language website, then this blog post is for you! We introduce a new transformation function called utf8toUnicode that helps normalize data for inspection.

When non-ASCII characters are UTF-8 encoded, they use multiple bytes. As an example, the "ę" character is encoded as "%c4%99". If ModSecurity only applies the standard t:urlDecodeUni, it will decode each byte individually, which results in an impedance mismatch. In this case, the incorrect decoding resulted in a false positive match against some SQL Injection rules in the OWASP ModSecurity CRS. While this is a bit of a pain, it is not as bad as the false negative bypass that this type of incorrect decoding may cause. Let's look at this type of SQL Injection evasion issue. What if we send the following request:

http://172.16.51.132/index.php?foo='úníón+séléct+data+fróm+námés
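As a rough illustration of the byte-level view (a Python sketch, not ModSecurity's own decoder), the snippet below percent-decodes the "ę" example and the request parameter above into raw bytes. The helper name url_decode_bytes is made up for this example.

```python
from urllib.parse import unquote_to_bytes


def url_decode_bytes(value: str) -> bytes:
    """Percent-decode a query value and turn '+' into a space, keeping raw bytes.

    Hypothetical helper for illustration only; it is not a ModSecurity function.
    """
    return unquote_to_bytes(value.replace("+", " "))


# "ę" (U+0119) occupies two bytes when UTF-8 encoded.
e_ogonek = url_decode_bytes("%c4%99")
print(e_ogonek)                         # b'\xc4\x99'
print(len(e_ogonek.decode("latin-1")))  # 2: one character per byte
print(len(e_ogonek.decode("utf-8")))    # 1: a single character

# The request parameter; non-ASCII characters are sent as UTF-8 bytes.
payload = url_decode_bytes("'úníón+séléct+data+fróm+námés")
print(payload)

# A byte-level search for the SQL keyword finds nothing.
print(b"select" in payload)             # False
```

Treating each byte as its own character yields two unrelated characters for "ę", while decoding the pair as UTF-8 yields one. For the request parameter, every accented letter becomes a two-byte sequence, so the plain keyword never appears in the decoded data.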

Let's see how ModSecurity will decode this data when checking for an example SQL Injection keyword:

As you can see above, the character sequence "úníón+séléct+data+fróm+námés" was handled by the engine as "\xc3\xban\xc3\xad\xc3\xb3ns\xc3\xa9l\xc3\xa9ct data fr\xc3\xb3m n\xc3\xa1m\xc3\xa9s". However, the rule was looking for the pattern "select". So, what happens if the application applies best-fit mapping conversions and removes the accents before sending the data to the database? This may allow the payload to bypass our signatures.
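The bypass scenario can be sketched in Python. This is only an approximation: it assumes the back end strips accents in a way similar to Unicode NFKD decomposition, whereas real best-fit mappings depend on the platform and database. The names strip_accents and SQLI_KEYWORD are hypothetical and exist only for this example.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate an accent-removing conversion (assumption: NFKD-like behavior)."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left over after decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Stand-in for a signature that looks for the SQL keyword.
SQLI_KEYWORD = re.compile(r"select", re.IGNORECASE)

seen_by_waf = "'úníón séléct data fróm námés"  # what the inspection sees
seen_by_db = strip_accents(seen_by_waf)        # what may reach the database

print(bool(SQLI_KEYWORD.search(seen_by_waf)))  # False: signature misses
print(seen_by_db)                              # 'union select data from names
print(bool(SQLI_KEYWORD.search(seen_by_db)))   # True: payload is now live SQL
```

The gap between the two views is the problem: the inspection point and the database disagree about what the data is, so normalization has to happen before the signatures are applied.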

UTF-8 to Unicode Mapping

To handle this data better, we should first map the UTF-8 encoded data to Unicode and then use the Unicode code point mapping capabilities mentioned at the beginning of this blog post. This configuration is achieved by first setting the SecUnicodeCodePage and SecUnicodeMapFile directives in your main ModSecurity configuration file: