At 00/10/09 23:30 -0400, Yves wrote:
>Hello Martin,
>
>Thanks, I think I understand better now:
>
>There is nothing special to do to encode surrogates for XML; we just apply
>the UTF encodings. But *once parsed*, the XML text (or tags) cannot include
>the high or low part of a surrogate as a single 'character'. The XML Char
>definition talks about scalar values (UCS as coded character set), not
>encoded ones (encodings of UCS).
>
>And now I assume it also means we cannot have a surrogate pair coded as 2
>NCRs. For example: <U+D801,U+DC05> would be written "&#x10405;" not
>"&#xD801;&#xDC05;"?
Yes, exactly!
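
For illustration only, here is a rough Python sketch of the arithmetic behind
your example. The function names (combine_surrogates, to_ncr, is_xml_char) are
made up for this message and are not part of any XML library; the ranges in
is_xml_char follow the Char production of XML 1.0.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the scalar value it encodes."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_ncr(code_point: int) -> str:
    """Format a scalar value as a hexadecimal numeric character reference."""
    return f"&#x{code_point:X};"


def is_xml_char(code_point: int) -> bool:
    """Check a code point against the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


scalar = combine_surrogates(0xD801, 0xDC05)
print(to_ncr(scalar))        # the single NCR for the pair
print(is_xml_char(0xD801))   # a lone surrogate is not an XML Char
print(is_xml_char(scalar))   # the scalar value is
```

The point of the sketch is that the NCR is built from the scalar value, so
the surrogate code points themselves never appear in the reference.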
Regards, Martin.