Decodes any XML character references and standard XML entity references in
the text and removes any carriage returns. It's intended to be used on the
text fields of element tags and on the values of start tag attributes.

There are a number of characters that either can't be directly represented
in the text fields or attribute values in XML or which can sometimes be
directly represented but not always (e.g. an attribute value can contain
either a single quote or a double quote, but it can't contain both at the
same time, because one of them would match the opening quote). So, those
characters have alternate representations in order to be allowed (e.g.
"&lt;" for '<', because '<' would normally be the beginning of an entity).
Technically, they're entity references, but the ones handled by decodeXML
are the ones explicitly defined in the XML standard and which don't require
a DTD section.

Ideally, the parser would transform all such alternate representations to
what they represent when providing the text to the application, but that
would make it impossible to return slices of the original text from the
properties of an Entity.
So, instead of having those properties do the transformation themselves,
decodeXML and asDecodedXML do that so that the application can choose to do
it or not (in many cases, there is nothing to decode, making the calls
unnecessary).

Similarly, an application can choose to encode a character as a character
reference (e.g. "&#65;" or "&#x41;" for 'A'). decodeXML will decode such
character references to their corresponding characters.
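
As a small illustration of character references, the sketch below assumes
that dxml is available as a dependency and uses only decodeXML from
dxml.util. It shows a decimal and a hexadecimal reference to the same
character.

```d
import dxml.util : decodeXML;

void main()
{
    // Decimal 65 and hexadecimal 41 are both the code point for 'A'.
    assert(decodeXML("&#65;") == "A");
    assert(decodeXML("&#x41;") == "A");

    // "&#xA;" is a character reference for a line feed.
    assert(decodeXML("a&#xA;b") == "a\nb");
}
```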

However, decodeXML does not handle any entity references beyond the five
predefined ones listed below. All others are left unprocessed. Processing
them properly would require handling the DTD section, which dxml does not
support. The parser considers any entity references other than the
predefined ones to be invalid XML, so text that comes from dxml's parser
cannot contain any entity references other than the predefined ones; only
text from another source can. Similarly, invalid character references are
left unprocessed, as is any character that is not valid in an XML
document. decodeXML never throws on invalid XML.
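
The following sketch illustrates the pass-through behavior described above,
again assuming dxml is available as a dependency. The entity names "&nbsp;"
and "&copy;" are only examples of references that would need a DTD; they are
not special to dxml.

```d
import dxml.util : decodeXML;

void main()
{
    // Not one of the five predefined entity references, so left as-is.
    assert(decodeXML("&nbsp;") == "&nbsp;");

    // Predefined references are still decoded next to unknown ones.
    assert(decodeXML("&copy; 2018 &amp; beyond") == "&copy; 2018 & beyond");

    // An invalid character reference is left untouched rather than
    // causing an exception.
    assert(decodeXML("&#   ;") == "&#   ;");
}
```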

Also, '\r' is not supposed to appear in an XML document
except as a character reference unless it's in a CDATA section. So, it
really should be stripped out before being handed off to the application,
but again, that doesn't work with slices. So, decodeXML also handles that.
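
A minimal sketch of the carriage return handling, assuming dxml is available
as a dependency:

```d
import dxml.util : decodeXML;

void main()
{
    // Windows-style line endings lose their '\r'; the '\n' is kept.
    assert(decodeXML("one\r\ntwo\r\n") == "one\ntwo\n");

    // Entity references are decoded in the same pass.
    assert(decodeXML("if(foo &amp;&amp; bar)\r\n    left = right;") ==
           "if(foo && bar)\n    left = right;");
}
```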

Specifically, what decodeXML and asDecodedXML do is:

    convert &amp; to &
    convert &gt; to >
    convert &lt; to <
    convert &apos; to '
    convert &quot; to "
    remove all instances of \r
    convert all character references (e.g. &#xA;) to the characters that
        they represent

All other entity references are left untouched, as is any '&' that is not
part of one of the constructs listed above and any malformed construct
(e.g. "&Amp;" or "&#xGGA2;").
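
The rules above can be exercised together with a sketch like the following,
which assumes dxml is available as a dependency. The malformed inputs are
illustrative: "&Amp;" has the wrong case, and "&#xGGA2;" contains characters
that are not hexadecimal digits.

```d
import dxml.util : decodeXML;

void main()
{
    // All five predefined entity references in one string.
    assert(decodeXML("&amp; &gt; &lt; &apos; &quot;") == `& > < ' "`);

    // A bare '&' that does not start a recognized construct is kept.
    assert(decodeXML("AT&T") == "AT&T");

    // Malformed constructs are left untouched.
    assert(decodeXML("&Amp;") == "&Amp;");
    assert(decodeXML("&#xGGA2;") == "&#xGGA2;");
}
```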

The difference between decodeXML and asDecodedXML is that decodeXML returns
a string, whereas asDecodedXML returns a lazy range of code
units. In the case where a string is passed to decodeXML, it
will simply return the original string if there is no text to decode
(whereas in other cases, decodeXML and asDecodedXML are forced to return
new ranges even if there is no text to decode).
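
To make the difference concrete, here is a hedged sketch that assumes dxml
is available as a dependency. It relies on the statement above that a string
with nothing to decode is returned unchanged, and it uses
std.algorithm.comparison.equal to consume the lazy range.

```d
import dxml.util : asDecodedXML, decodeXML;
import std.algorithm.comparison : equal;

void main()
{
    // Nothing to decode: the very same string is handed back, so no
    // allocation is needed.
    string plain = "nothing to decode";
    assert(decodeXML(plain) is plain);

    // asDecodedXML does its work lazily as the range is iterated.
    auto lazyRange = asDecodedXML("1 &lt; 2");
    assert(equal(lazyRange, "1 < 2"));
}
```

The lazy variant is a reasonable choice when the decoded text is only going
to be iterated once or fed into another range-based algorithm, since it
avoids building an intermediate string.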

Parameters:

R range

The range of characters to decode.

Returns:
The decoded text. decodeXML returns a string, whereas
asDecodedXML returns a lazy range of code units (so it could be a
range of char or wchar and not just dchar; which it
is depends on the code units of the range being passed in).

Deprecated

normalize has been renamed to decodeXML, and asNormalized has been
renamed to asDecodedXML. It was pointed out that there's a fairly
high chance that std.uni.normalize would be used in
conjunction with dxml, making conflicts annoyingly likely. Also, there was
no good opposite for normalize for the functions that became
encodeAttr and encodeText. denormalizeAttr and
denormalizeText would arguably have been a bit ugly.

These aliases have been added to avoid code breakage when upgrading from
dxml 0.2.*. They will be removed in dxml 0.4.0.

This parses one of the five predefined entity references mentioned in the
XML spec from the front of a range of characters.

If the given range starts with one of the five predefined entity
references, then it is removed from the range, and the corresponding
character is returned.

If the range does not start with one of those references, then the return
value is null, and the range is unchanged.

Std Entity Ref    Converts To
&amp;             &
&gt;              >
&lt;              <
&apos;            '
&quot;            "

Any other entity references would require processing a DTD section in order
to be handled and are untouched by parseStdEntityRef as are any other types
of references.
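
The behavior can be sketched as follows, assuming dxml is available as a
dependency and that the "null" return value is exposed through isNull and
get, as with Phobos' Nullable:

```d
import dxml.util : parseStdEntityRef;

void main()
{
    // A predefined entity reference at the front is consumed.
    auto range = "&amp;foo";
    auto result = parseStdEntityRef(range);
    assert(!result.isNull);
    assert(result.get == '&');
    assert(range == "foo");

    // A character reference is not a standard entity reference, so the
    // result is null and the range is left unchanged.
    auto other = "&#x42;foo";
    assert(parseStdEntityRef(other).isNull);
    assert(other == "&#x42;foo");
}
```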

Parameters:

R range

A range of characters.

Returns:
The character represented by the predefined entity reference that
was parsed from the front of the given range or null if the range
did not start with one of the five predefined entity references.

Strips the indent from a character range (most likely from Entity.text).
The idea is that if the XML is formatted to be human-readable, and it's
multiple lines long, the lines are likely to be indented, but the
application probably doesn't want that extra whitespace. So, stripIndent
and withoutIndent attempt to intelligently strip off the leading
whitespace.

For these functions, whitespace is considered to be some combination of
' ', '\t', and '\r'
('\n' is used to delineate lines, so it's not considered
whitespace).

Whitespace characters are stripped from the start of the first line, and
then that same number of whitespace characters is stripped from the
beginning of each subsequent line (or up to the first non-whitespace
character if the line starts with fewer whitespace characters).
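
A short sketch of this rule, assuming dxml is available as a dependency; the
input strings are made up for illustration:

```d
import dxml.util : stripIndent;

void main()
{
    // The first line has four leading spaces, so four are removed from
    // every line; deeper indentation keeps its extra spaces.
    assert(("    foo\n" ~
            "    bar\n" ~
            "        baz").stripIndent() ==
           "foo\n" ~
           "bar\n" ~
           "    baz");

    // A line with fewer leading whitespace characters than the indent is
    // only stripped up to its first non-whitespace character.
    assert(("    foo\n" ~
            "  bar").stripIndent() ==
           "foo\n" ~
           "bar");
}
```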

If the first line has no leading whitespace, then the leading whitespace on
the second line is treated as the indent. This is done to handle the case
where there is text immediately after a start tag and then subsequent lines
are indented rather than the text starting on the line after the start tag.

If neither of the first two lines has any leading whitespace, then no
whitespace is stripped.
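
The two cases above can be sketched like this, assuming dxml is available as
a dependency:

```d
import dxml.util : stripIndent;

void main()
{
    // No leading whitespace on the first line, so the second line's
    // four spaces are treated as the indent.
    assert(("foo\n" ~
            "    bar\n" ~
            "        baz").stripIndent() ==
           "foo\n" ~
           "bar\n" ~
           "    baz");

    // Neither of the first two lines is indented, so nothing is
    // stripped, even from later lines.
    assert(("foo\n" ~
            "bar\n" ~
            "    baz").stripIndent() ==
           "foo\n" ~
           "bar\n" ~
           "    baz");
}
```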

So, if the text is well-formatted, then the indent should be cleanly
removed, and if it's unformatted or badly formatted, then no characters
other than leading whitespace will be removed, and in principle, no real
data will have been lost. Of course, it's up to the programmer to decide
whether it's better for the application to try to cleanly strip the indent
or to leave the text as-is.

The difference between stripIndent and withoutIndent is that stripIndent
returns a string, whereas withoutIndent returns a lazy range
of code units. In the case where a string is passed to
stripIndent, it will simply return the original string if there is no
indent (whereas in other cases, stripIndent and withoutIndent are forced to
return new ranges).
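
As a hedged illustration of the difference, the sketch below assumes dxml is
available as a dependency and relies on the statement above that a string
without an indent is returned unchanged. std.algorithm.comparison.equal is
used to consume the lazy range.

```d
import dxml.util : stripIndent, withoutIndent;
import std.algorithm.comparison : equal;

void main()
{
    // No indent to strip: the original string is handed back.
    string flat = "no indent here";
    assert(stripIndent(flat) is flat);

    // withoutIndent strips the indent lazily while being iterated.
    auto lazyRange = withoutIndent("    foo\n    bar");
    assert(equal(lazyRange, "foo\nbar"));
}
```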

Parameters:

R range

A range of characters.

Returns:
The text with the indent stripped from each line. stripIndent
returns a string, whereas withoutIndent returns a lazy range
of code units (so it could be a range of char or wchar
and not just dchar; which it is depends on the code units of
the range being passed in).