Our company is exploring the idea of using Unicode in our web pages.
We have run into a problem for which, despite two weeks of research, we
have not been able to find an answer. The problem concerns passing text from
an HTML form to the web server.

From the user's perspective:
. we present the user with a web page containing a form
. the user fills in the form
. the user clicks "Submit"
. the browser posts the entered data to the server

From what I can gather so far, the data flow is as follows:
. when the user clicks the submit button, the browser URL-encodes the
data using the following algorithm:

The ASCII characters 'a' through 'z', 'A' through 'Z', and '0' through '9'
remain the same.
The space character ' ' is converted into a plus sign '+'.
All other characters are converted into the 3-character string "%xy", where
xy is the two-digit hexadecimal representation of the lower 8 bits of the
character.
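
To make sure I am stating the rules correctly, here is a minimal sketch of
them in Python. The function name naive_urlencode is my own invention, and
the choice of uppercase hex digits is an assumption; this illustrates the
algorithm as I understand it, not what any particular browser actually runs.

```python
import string

# Characters that rule 1 leaves unchanged: a-z, A-Z, 0-9.
UNCHANGED = set(string.ascii_letters + string.digits)


def naive_urlencode(text: str) -> str:
    """Encode text with the three rules described above (illustrative only)."""
    out = []
    for ch in text:
        if ch in UNCHANGED:
            out.append(ch)            # rule 1: ASCII letters and digits stay
        elif ch == " ":
            out.append("+")           # rule 2: space becomes '+'
        else:
            # rule 3: "%xy" from the lower 8 bits of the character code
            # (uppercase hex is an assumption)
            out.append("%{:02X}".format(ord(ch) & 0xFF))
    return "".join(out)


if __name__ == "__main__":
    print(naive_urlencode("a b&c"))   # a+b%26c
```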

The last rule clips each Unicode character to an 8-bit representation, so
the data entered in the HTML form does not make it back to the web server
intact.
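
A small Python sketch of the loss I am worried about, assuming the escape
really is built from only the lower 8 bits (low_byte_escape is a hypothetical
helper of mine, not a library function). Two different characters end up with
the same escape, so the server cannot tell them apart:

```python
def low_byte_escape(ch: str) -> str:
    """Return "%xy" built from the lower 8 bits of ch (assumed rule)."""
    return "%{:02X}".format(ord(ch) & 0xFF)


hyphen = low_byte_escape("-")       # U+002D HYPHEN-MINUS
cjk = low_byte_escape("\u4e2d")     # U+4E2D, a CJK ideograph

# Both characters share the low byte 0x2D, so the escapes collide.
print(hyphen, cjk, hyphen == cjk)   # %2D %2D True
```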

Do you have any experience in this area? How does one capture the data in
an HTML form as Unicode and send it along when the user clicks the "Submit"
button?