At 11:24 AM -0700 4/13/01, Jason Hunter wrote:
>The general "right" solution is probably to check each character as it
>goes out and if it's not in the chosen encoding's character set then
>output a char entity. The problem is that for many encodings such a
>check isn't fast at all (less than this, greater than that, less than
>this, greater than that), nor is the information about which chars are
>in which character set easily available (to my knowledge).
>
JDK 1.4 should make this information available and a lot easier to get
at. It is available now, but you either need to use some undocumented
classes in the sun.* packages or use some very inconvenient and probably
slow APIs.
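
For illustration, here is roughly what the per-character check could look like with the java.nio.charset API that JDK 1.4 introduces. CharsetEncoder.canEncode(CharSequence) reports whether a character, or a surrogate pair, is representable in the target charset. This is a minimal sketch, not JDOM code. The class name EntityEscaper is hypothetical. It uses the Java 5 conveniences codePointAt and StringBuilder for brevity, and it assumes &, < and > have already been escaped elsewhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of JDOM: replaces every code point the
// target charset cannot represent with a decimal character reference.
public class EntityEscaper {

    // Assumes markup characters (&, <, >) were already escaped by the caller.
    public static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            // Test a surrogate pair as one unit so a supplementary
            // character becomes a single reference, not two broken ones.
            CharSequence unit = text.subSequence(i, i + len);
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "café β" written to a US-ASCII stream
        System.out.println(escape("caf\u00e9 \u03b2", Charset.forName("US-ASCII")));
    }
}
```

Run as is, main prints `caf&#233; &#946;`. With ISO-8859-1 as the target, only the beta would be escaped.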
As to slowness, there are some strong optimizations we can do for the
most common cases; e.g. ASCII, Latin-1, UTF-8, and all other Unicode
variants. We'd only need to take the performance hit on non-Latin-1
characters in non-Unicode environments.
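
A sketch of how those fast paths might be arranged. This is again hypothetical: EncodabilityCheck is not a JDOM or JDK class. Unicode encodings accept every code point, ASCII and ISO-8859-1 reduce to a single comparison, and only other charsets fall back to CharsetEncoder.canEncode. For those other charsets it assumes the charset is an ASCII superset, which is not true of EBCDIC, for example. It also treats only ASCII, rather than all of Latin-1, as free there, because not every legacy charset contains every Latin-1 character.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper: decides encodability with cheap range checks for
// the common charsets and asks the encoder only when it must.
public class EncodabilityCheck {

    private enum Mode { UNICODE, LATIN1, ASCII, OTHER }

    private final Mode mode;
    private final CharsetEncoder encoder; // slow path, used only in OTHER mode

    public EncodabilityCheck(Charset charset) {
        String name = charset.name(); // canonical name, e.g. "UTF-8"
        if (name.startsWith("UTF-")) {
            mode = Mode.UNICODE;
        } else if (name.equals("ISO-8859-1")) {
            mode = Mode.LATIN1;
        } else if (name.equals("US-ASCII")) {
            mode = Mode.ASCII;
        } else {
            mode = Mode.OTHER;
        }
        encoder = charset.newEncoder();
    }

    public boolean canEncode(int codePoint) {
        switch (mode) {
            case UNICODE:
                return true;                  // every code point is representable
            case LATIN1:
                return codePoint < 0x100;     // one comparison
            case ASCII:
                return codePoint < 0x80;
            default:
                // Assumption: the charset is an ASCII superset, so only
                // non-ASCII code points pay for the encoder lookup.
                if (codePoint < 0x80) {
                    return true;
                }
                return encoder.canEncode(new String(Character.toChars(codePoint)));
        }
    }
}
```

A serializer would build one EncodabilityCheck per output charset and call canEncode for each code point. Only text in the OTHER case ever reaches the encoder.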
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
|http://metalab.unc.edu/xml/books/bible/ |
|http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+