3 Answers

Just elaborating on David's answer: XML doesn't rule out any character in a text node (apart from a few reserved characters), as long as it is valid in the document's encoding.

There are a few missing facts from your question:

Are you producing this XML using a text editor?
If so, check which encoding you are using when saving the file. Try UTF-8. If your documents are produced using a "Windows" encoding, then try adding an encoding attribute to the XML declaration, e.g., <?xml version="1.0" encoding="iso-8859-1"?>.
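To make this concrete, here is a minimal sketch in Python (chosen only for illustration; a conforming parser in any language should behave the same way). It assumes the item element from the question, and the helpers make_doc and description are hypothetical names. The point it shows is that the bytes in the file must match the encoding named in the declaration.

```python
import xml.etree.ElementTree as ET

BODY = '<Item Id="1" Description="90\u00b0 Hinge"/>'  # \u00b0 is the degree sign


def make_doc(declared, actual):
    """Build a document that declares one encoding and is saved in another."""
    text = f'<?xml version="1.0" encoding="{declared}"?>' + BODY
    return text.encode(actual)


def description(doc):
    """Parse raw bytes and return the Description attribute."""
    return ET.fromstring(doc).get("Description")


utf8_doc = make_doc("UTF-8", "utf-8")              # degree sign stored as C2 B0
latin1_doc = make_doc("iso-8859-1", "iso-8859-1")  # degree sign stored as B0
mislabeled = make_doc("UTF-8", "iso-8859-1")       # claims UTF-8, bytes are Latin-1

print(description(utf8_doc))
print(description(latin1_doc))

# The mislabeled document is rejected: a lone B0 byte is not valid UTF-8.
error = None
try:
    description(mislabeled)
except ET.ParseError as err:
    error = err
print("mislabeled document:", error)
```

The first two documents parse to the same attribute value even though their bytes differ; only the one whose declaration disagrees with its bytes fails.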

Are you producing this XML using Delphi String functions?
If this is the case, note that Delphi strings are UTF-16 by default (UnicodeString, since Delphi 2009), but you can inadvertently mix encodings if you are reading fragments from external sources. There is no silver bullet for this problem, other than using your XML library's built-in functions to create the XML.
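As a rough illustration of letting the library do the work, here is a sketch in Python using xml.etree.ElementTree rather than Delphi (an assumption made purely so the example is short and runnable; the element and attribute values are made up). The library escapes reserved characters and writes bytes that agree with the encoding you ask for.

```python
import xml.etree.ElementTree as ET

# Build the element through the API instead of concatenating strings.
item = ET.Element("Item", {"Id": "1", "Description": '90\u00b0 Hinge & "Bracket"'})

# Serialize twice, asking for a different encoding each time.
utf8_bytes = ET.tostring(item, encoding="utf-8")
latin1_bytes = ET.tostring(item, encoding="iso-8859-1")

print(utf8_bytes)
print(latin1_bytes)

# Both serializations parse back to the original attribute value.
round_trips = [ET.fromstring(data).get("Description")
               for data in (utf8_bytes, latin1_bytes)]
print(round_trips)
```

The degree sign ends up as two bytes in the UTF-8 output and one byte in the ISO-8859-1 output, and the ampersand and quotes are escaped in both, with no manual handling.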

When I have had to deal with these things (for XML signatures, no less!), I resorted to using wrappers for every string and to using explicit encodings (I use type Latin1String = type AnsiString(28591)).

Thanks, this is the correct answer. I originally produced the file from Delphi using whatever defaults it gave me. However I then opened and edited the file using Notepad++, which seems to revert it back to ANSI. So I went to the Format menu, selected the UTF-8 option and then made sure my ° character looked correct and BAM - it worked! Thanks a lot, learnt something new already and it's only 10:00am on Monday morning!
– Rick Wheeler, Feb 24 '13 at 23:11

Here you have directly encoded the character. Whether or not your parser can read this depends on the charset used by your XML document. So, if your XML document uses UTF-8 and is correctly encoded, then your XML parser will be able to parse it.

V2: <Item Id="1" Description="90&deg; Hinge"/>

This uses a named entity, deg. In XML there are only five pre-defined named entities: quot, amp, apos, lt, gt. It is possible for an XML document to define other named entities, however that is unusual. So, it would seem that deg is not a valid named entity for your document.
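The behaviour described here can be checked with a small sketch. It uses Python's xml.etree.ElementTree rather than MSXML (an assumption for the sake of a runnable example), and try_parse is a hypothetical helper. The third document declares deg itself in an internal DTD subset, which is the unusual case mentioned above.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the Description attribute, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).get("Description")
    except ET.ParseError:
        return None


# amp is one of the five predefined entities, so no declaration is needed.
predefined = try_parse('<Item Id="1" Description="Nuts &amp; Bolts"/>')

# deg is not predefined, so this document is not well-formed.
undefined = try_parse('<Item Id="1" Description="90&deg; Hinge"/>')

# Declaring deg in the document itself makes the same reference legal.
declared = try_parse(
    '<!DOCTYPE Item [<!ENTITY deg "&#176;">]>'
    '<Item Id="1" Description="90&deg; Hinge"/>'
)

print(predefined, undefined, declared)
```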

As to what you should do going forwards, we can immediately rule out the named entity. I would also recommend avoiding wholesale use of NCRs for all non-ASCII characters. That just leads to unreadable documents. Of course, if you must use a non-Unicode aware tool to process the document then using NCRs is the only approach.

So that leaves us with directly encoding non-ASCII characters. You should make sure that your XML is properly encoded using the UTF-8 charset and that approach will work well, and lead to readable and clean documents.
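The trade-off between NCRs and direct encoding can be sketched as follows, again in Python for illustration only. The variable names are made up, and the documents are assembled by string formatting just to keep the comparison visible; real code should build XML through a library.

```python
import xml.etree.ElementTree as ET

description = "90\u00b0 Hinge"  # contains the degree sign directly

# Replace every non-ASCII character with a numeric character reference.
# Note: this does not escape reserved characters such as & or <.
ncr_form = description.encode("ascii", "xmlcharrefreplace").decode("ascii")

direct_doc = f'<Item Id="1" Description="{description}"/>'
ncr_doc = f'<Item Id="1" Description="{ncr_form}"/>'

print(direct_doc)  # readable, but the file must be saved in a suitable encoding
print(ncr_doc)     # pure ASCII, survives non-Unicode-aware tools, harder to read

# A parser sees the same attribute value either way.
same = (ET.fromstring(direct_doc).get("Description")
        == ET.fromstring(ncr_doc).get("Description"))
print(same)
```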

I checked my XML document and it was not encoded using UTF-8, so I changed the XML header to <?xml version="1.0" encoding="UTF-8"?>, but this did not seem to have any effect at all. I'm using MSXML and it still seems to raise an error with the ° character in UTF-8.
– Rick Wheeler, Feb 24 '13 at 22:50

Changing the header doesn't change the encoding of the actual file. If the file is properly encoded you can directly encode any Unicode character.
– David Heffernan, Feb 24 '13 at 22:58

It seems my text editor Notepad++ switched my encoding back to ANSI (see below), so no matter what I put in the XML header it was not actually encoded that way at all.
– Rick Wheeler, Feb 24 '13 at 23:14

No. Notepad++ doesn't change your encoding. Your file was ANSI all along. As I said, saying in the header that the file is utf8 doesn't make it utf8. I already told you that if your file is correctly encoded then you can directly encode any Unicode character apart from reserved chars like < > & and so on.
– David Heffernan, Feb 24 '13 at 23:27

OK, so does that mean that Delphi doesn't generate the file correctly encoded in UTF-8? I created the file using TXMLDocument, created my nodes and then saved it to file. All I did after that was open it in Notepad++ and make some minor edits. Somewhere along the way it became ANSI, either directly from Delphi or after saving in Notepad++. Any idea when that would happen?
– Rick Wheeler, Feb 25 '13 at 0:12

Delphi itself doesn't parse the XML at all. A third party XML engine does, whether it be MSXML, OpenXML, AtomXML, etc. The TXMLDocument component and supporting interfaces are just a wrapper framework, the bulk of the parsing is done by someone else.

V1 may or may not be malformed. It depends on the XML document's actual charset.

V2 is actually not standard. Not all XML engines support it. Clearly, the one you are using with Delphi does not.

Regarding V3, I would say "all the above XML engines support that syntax" (or at least should support it).
– kobik, Feb 22 '13 at 13:45

@Kobik, if there's something that doesn't support numeric character references, I'd struggle to classify it as an XML parser.
– Rob Kennedy, Feb 22 '13 at 13:54

@kobik NCRs are part of the XML standard so if a parser can't handle them, it's not an XML parser.
– David Heffernan, Feb 22 '13 at 13:55
