>
>The workaround I usually suggest is to represent control characters
>with (references to) characters from the Unicode private use range.
>This makes the necessary transformation a simple character
>substitution (which can even be just a subtraction - no need for a
>table).
>
> -- Richard
Actually, as someone has already pointed out, 0x007F - 0x009F are fair game
for XML documents, and Unicode has these defined as control character
aliases.
Mapping 0x0000 - 0x001F to the private use area sounds like the "correct"
Unicode thing to do, but for US-ASCII/UTF-8 documents I would map to 0x0080
- 0x009F instead.
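This way you preserve the deprecated, Anglo-centric, English-only, bigoted
assumption of 1 character == 1 byte, at least in a single-byte encoding such
as ISO-8859-1 (UTF-8 encodes 0x0080 - 0x009F as two bytes each).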
The only downside is that someone might actually have data in this range. I
think this is about as likely as someone having data in the private use
area.
XSLT will not _ALWAYS_ give you a perfect output format.
XML --> XSLT --> simple_text_filter seems like a win to me.
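
A minimal sketch of what that simple text filter could look like, in Python.
The function names and the decision to leave tab, LF and CR alone (they are
already legal in XML 1.0) are my own assumptions, not anything from the
thread; the 0x80 offset is just the 0x0000 - 0x001F to 0x0080 - 0x009F
mapping described above.

```python
# Hypothetical helpers illustrating the mapping; names are assumptions.
C0_TO_C1_OFFSET = 0x80             # 0x00-0x1F  <->  0x80-0x9F
XML_LEGAL_C0 = {0x09, 0x0A, 0x0D}  # tab, LF, CR need no escaping in XML 1.0


def escape_controls(text: str) -> str:
    """Before building the XML: shift forbidden C0 controls up by 0x80."""
    return "".join(
        chr(ord(ch) + C0_TO_C1_OFFSET)
        if ord(ch) < 0x20 and ord(ch) not in XML_LEGAL_C0
        else ch
        for ch in text
    )


def restore_controls(text: str) -> str:
    """After XSLT: shift everything in 0x80-0x9F back down by 0x80."""
    return "".join(
        chr(ord(ch) - C0_TO_C1_OFFSET) if 0x80 <= ord(ch) <= 0x9F else ch
        for ch in text
    )


if __name__ == "__main__":
    original = "bell:\x07 escape:\x1b tab:\t"
    escaped = escape_controls(original)
    assert restore_controls(escaped) == original
```

The round trip only holds if the input has no real data in 0x0080 - 0x009F;
anything already in that range gets shifted down by the filter, which is the
downside mentioned above.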
-Wayne Steele
***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@xml.org&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************