Thursday, 24 November 2011

JSF Ajax Encoding

The whole story started when we got a request to encode everything in ISO-8859-15, including the Ajax cycle. At first I thought it would be easy: just change the encoding on the JavaScript side and let the server handle the rest. The encoding was easy to detect on the JavaScript side by simply checking the XHTML document's encoding (the meta tag in the head would have been another option, but since Facelets ties us to XHTML anyway, we have an easier way).
Easy, I thought, but then I ran into browser hell.
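As a sketch of that detection step, the hypothetical helper `detectDeclaredEncoding` reads the `encoding` attribute from the XML declaration at the top of the XHTML source and falls back to UTF-8, the XML default, when no encoding is declared. It works on the source text so it can run outside a browser; inside a page, the standard `document.characterSet` property is the simpler route. The name and the regex are assumptions for illustration, not MyFaces code.

```typescript
// Hypothetical helper (name and regex are assumptions, not MyFaces code):
// read the declared encoding from an XHTML document's XML declaration, e.g.
// <?xml version="1.0" encoding="ISO-8859-15"?>
function detectDeclaredEncoding(xhtmlSource: string): string {
  // The declaration must sit at the very start, optionally after a BOM.
  const declaration = /^\uFEFF?<\?xml\s[^?]*encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/;
  const match = declaration.exec(xhtmlSource);
  // Without an encoding declaration, fall back to UTF-8, the XML default.
  return match ? match[1].toUpperCase() : "UTF-8";
}

console.log(detectDeclaredEncoding('<?xml version="1.0" encoding="iso-8859-15"?><html/>'));
```

The result is normalized to upper case so it can be compared directly with charset names such as `ISO-8859-15`.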

Normally the problem would be easy to solve: the XHR object lets you set the Content-Type request header, including a charset.
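A minimal sketch of setting that header, assuming a form-encoded POST body. `setFormContentType` and the `HeaderTarget` interface are hypothetical; a real `XMLHttpRequest` satisfies the interface structurally, and `setRequestHeader` has to be called after `open()`.

```typescript
// Minimal structural type: a real XMLHttpRequest satisfies it, and so does a test stub.
interface HeaderTarget {
  setRequestHeader(name: string, value: string): void;
}

// Hypothetical helper: declare the charset of a form-encoded Ajax body.
// In a browser: xhr.open("POST", url); setFormContentType(xhr, "ISO-8859-15");
function setFormContentType(xhr: HeaderTarget, charset: string): string {
  const value = "application/x-www-form-urlencoded; charset=" + charset;
  xhr.setRequestHeader("Content-Type", value);
  return value;
}
```

Declaring the charset is only a request to the browser; as the browser test in this post shows, some browsers still sent UTF-8.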

The problem is the browsers themselves. Testing the dynamically set encoding on various browsers gave the following results:

Browser: actual encoding sent

Mozilla 7: UTF-8
Chrome: UTF-8
IE: ISO-8859-15
Opera: ISO-8859-15

So what does this mean? Only Opera and IE got it right, which means that the path of allowing non-UTF-8 submits is blocked for now.

However, JSF deals with the problem properly on its own. Although I implemented most of the Ajax part of MyFaces, I have to admit that the actual encoding part was provided by another project, j4fry, whose implementors worked on that part, so I never gave it a second thought. Both implementations, MyFaces and Mojarra, deal with the issue the same way.

First, Ajax submits are UTF-8 encoded. At first glance this could pose problems with non-UTF-8 pages, but it turns out there are none.

The solution both implementations follow is to encode the actual key/value parameter pairs into a UTF-8 URL-encoded representation.

Both implementations seem to apply JavaScript's encodeURIComponent function. No matter what content type the page has, a proper UTF-8 representation of the original content is always passed down.
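The principle can be sketched as follows. `encodeParams` is a hypothetical helper, not the actual MyFaces or j4fry code, and the component id `form:name` is made up. Because `encodeURIComponent` always percent-encodes the UTF-8 bytes of a character, the resulting body is pure ASCII and identical whatever the page encoding is.

```typescript
// Hypothetical helper, not the actual MyFaces/j4fry source: build an
// application/x-www-form-urlencoded body from ordered key/value pairs.
function encodeParams(params: Array<[string, string]>): string {
  return params
    // encodeURIComponent percent-encodes the UTF-8 bytes of every character
    // outside A-Z a-z 0-9 - _ . ! ~ * ' ( ), whatever the page encoding is.
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

// "form:name" is a made-up component id; javax.faces.partial.ajax is a JSF 2 request parameter.
const body = encodeParams([
  ["form:name", "Müller €"],
  ["javax.faces.partial.ajax", "true"],
]);
console.log(body);
// prints "form%3Aname=M%C3%BCller%20%E2%82%AC&javax.faces.partial.ajax=true"
```

Since `ü` becomes `%C3%BC` and `€` becomes `%E2%82%AC`, the server only has to decode the parameters as UTF-8 to get the original text back.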

Given that the response type is then also UTF-8, what happens to the response? After all, the page needs to handle the response properly in its own encoding.
Here MyFaces and Mojarra differ slightly. Both, in compliance with the spec, put the response content into XML CDATA blocks. However, MyFaces additionally escapes non-ASCII characters with their Unicode representation, while Mojarra simply pushes the content byte by byte into the CDATA block.

Here is an example:

Mojarra:

Here it is clearly visible that the CDATA block has a different encoding than the outer UTF-8 encoded XML. In the final page representation all the special characters are visible again, as they should be.

MyFaces, however, goes one step further and additionally escapes the content to get rid of the non-UTF-8 representation of the characters.

This, however, is simply because MyFaces also escapes special characters on a full page refresh; the core behavior regarding the partial response is the same.
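The two response styles can be sketched like this. `escapeNonAscii` replaces every non-ASCII character with a numeric character reference, which is one plausible form of the "Unicode representation" described above (the exact MyFaces output format is an assumption here), and `wrapCdata` places content in a CDATA block, splitting any `]]>` so the section is not closed early. Both names are hypothetical. The XML parser hands CDATA content through literally; a reference such as `&#252;` is only resolved when the client inserts the markup into the page as HTML.

```typescript
// Hypothetical helper, not the actual MyFaces writer: replace every non-ASCII
// character with a numeric character reference, leaving a pure ASCII payload.
function escapeNonAscii(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      out += text.charAt(i);
      continue;
    }
    // Combine a UTF-16 surrogate pair into one code point (e.g. emoji).
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        out += "&#" + ((code - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000) + ";";
        i++;
        continue;
      }
    }
    out += "&#" + code + ";";
  }
  return out;
}

// Wrap content in a CDATA section. A literal "]]>" would end the section,
// so it is split across two adjacent CDATA sections.
function wrapCdata(content: string): string {
  return "<![CDATA[" + content.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

const fragment = "<span>Grüße</span>";
console.log(wrapCdata(fragment)); // Mojarra-style: raw characters, bytes depend on the writer's encoding
console.log(wrapCdata(escapeNonAscii(fragment))); // MyFaces-style: ASCII only
```

The escaped variant is valid in any ASCII-compatible encoding, which is why it sidesteps the mismatch between the UTF-8 envelope and the page encoding.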

So what happens if you just tamper with the UTF-8 header?
You will automatically run into problems because the parameters are URI-encoded UTF-8. In the worst case you will trigger a server error because of undecodable parameters; in the best case, if you pass only ASCII characters, you will get through; in the normal case you will get junk because the parameters are decoded wrongly.
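To illustrate the best and the normal case, the hypothetical `decodeAsIso885915` simulates a server that believes a tampered `charset=ISO-8859-15` header and decodes a form-encoded value byte by byte in that charset, although the browser produced UTF-8 via `encodeURIComponent`. ISO-8859-15 matches ISO-8859-1 except for eight byte positions, which the override table covers. The worst case, a decoding error, depends on the server's charset handling and is not modelled here.

```typescript
// Bytes where ISO-8859-15 differs from ISO-8859-1; every other byte
// maps to the Unicode code point with the same value.
const ISO_8859_15_OVERRIDES: { [byte: number]: string } = {
  0xa4: "\u20ac", // Euro sign
  0xa6: "\u0160", // S with caron
  0xa8: "\u0161", // s with caron
  0xb4: "\u017d", // Z with caron
  0xb8: "\u017e", // z with caron
  0xbc: "\u0152", // OE ligature
  0xbd: "\u0153", // oe ligature
  0xbe: "\u0178", // Y with diaeresis
};

// Hypothetical simulation of a server decoding a form-urlencoded value as
// ISO-8859-15, although the client percent-encoded UTF-8 bytes.
function decodeAsIso885915(formValue: string): string {
  let out = "";
  for (let i = 0; i < formValue.length; i++) {
    const ch = formValue.charAt(i);
    const hex = formValue.substring(i + 1, i + 3);
    if (ch === "+") {
      out += " "; // in form encoding '+' stands for a space
    } else if (ch === "%" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      const byte = parseInt(hex, 16);
      out += ISO_8859_15_OVERRIDES[byte] || String.fromCharCode(byte);
      i += 2; // skip the two hex digits
    } else {
      out += ch;
    }
  }
  return out;
}

console.log(decodeAsIso885915(encodeURIComponent("abc"))); // ASCII: best case
console.log(decodeAsIso885915(encodeURIComponent("ä"))); // non-ASCII: junk
```

ASCII-only input such as `abc` survives unchanged, while `ä`, sent as `%C3%A4`, comes back as `Ã€`: each of its two UTF-8 bytes is read as a separate ISO-8859-15 character.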

Here is an example on IE9 with Mojarra:

The question remains: are both safe from a client point of view? Theoretically they should be, since everything is embedded in a CDATA block.
However, I cannot say whether the browser engines really swallow everything embedded in a CDATA block (that is, every byte combination outside of their encoding).
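One way to look at this question: CDATA does not change how the document's bytes are decoded, and the XML 1.0 specification treats byte sequences that are not legal in the document's encoding as a fatal error. Raw ISO-8859-15 bytes inside a UTF-8 declared response are therefore not well-formed for a strict parser; whether browser engines tolerate them anyway is exactly the open point. The hypothetical `isValidUtf8` sketch checks a byte sequence against the UTF-8 rules of RFC 3629; `ä` in ISO-8859-15 is the single byte 0xE4, which is not valid UTF-8 on its own.

```typescript
// Hypothetical check: is a byte sequence well-formed UTF-8 (RFC 3629)?
// Rejects stray continuation bytes, overlong forms, surrogates and
// code points above U+10FFFF.
function isValidUtf8(bytes: number[]): boolean {
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let needed = 0; // number of continuation bytes
    let cp = 0; // decoded code point
    let min = 0; // smallest code point allowed for this length
    if (b0 <= 0x7f) {
      i++;
      continue;
    } else if (b0 >= 0xc2 && b0 <= 0xdf) {
      needed = 1; cp = b0 & 0x1f; min = 0x80;
    } else if (b0 >= 0xe0 && b0 <= 0xef) {
      needed = 2; cp = b0 & 0x0f; min = 0x800;
    } else if (b0 >= 0xf0 && b0 <= 0xf4) {
      needed = 3; cp = b0 & 0x07; min = 0x10000;
    } else {
      return false; // continuation byte, 0xC0/0xC1 or 0xF5..0xFF as lead byte
    }
    if (i + needed >= bytes.length) return false; // truncated sequence
    for (let k = 1; k <= needed; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) return false; // must be 10xxxxxx
      cp = (cp << 6) | (b & 0x3f);
    }
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) return false;
    i += needed + 1;
  }
  return true;
}

console.log(isValidUtf8([0xc3, 0xa4])); // "ä" encoded as UTF-8
console.log(isValidUtf8([0xe4])); // "ä" encoded as ISO-8859-15
```

A lenient HTML-oriented engine may replace such bytes or guess, so this only shows what a strict reading of the response sees.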

It comes down once again to golden rule #1 of web programming: use UTF-8 and never bother with other encodings, if you can. If you have to, be prepared to enter the valley of tears; after all, UTF-8 has been the fix for all the encoding issues in the web space for almost a decade now.