Ignorable White Space

One of the more obscure parts of the XML 1.0 specification is
the perhaps misleadingly named “ignorable white space”.
This is white space that occurs between tags inside elements whose
DTD declarations allow only child elements, not mixed content.
For example,
consider the XML-RPC document in
Example 6.13:

Example 6.13. A document that uses
ignorable white space to prettify the XML

This example has quite a bit of white space just for indenting.
In particular, the spaces, carriage returns, and line feeds between
<methodCall> and <methodName>, </methodName> and <params>,
<params> and <param>, <param> and <value>, </value> and </param>,
</param> and </params>, and </params> and </methodCall>
exist only for indentation.
Furthermore, the DTD says that these elements cannot contain #PCDATA,
and therefore it’s known that this white space is ignorable.
Thus a validating parser will not pass these
white space characters to the
characters() method.
Instead it passes them to the
ignorableWhitespace() method.
A non-validating parser might do the same, or it might pass the
ignorable white space to the characters() method
instead. If this matters to you, make sure you use a validating
parser.
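The following is a minimal sketch of how a handler can observe this split, not a definitive implementation. It assumes the parser bundled with the JDK (obtained through SAXParserFactory) and a simplified stand-in for the XML-RPC DTD placed in the internal DTD subset. The class name WhiteSpaceCounter, the SAMPLE document, and the method name example.method are hypothetical and are not taken from Example 6.13.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WhiteSpaceCounter extends DefaultHandler {

    // Hypothetical sample document with a simplified, assumed DTD
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE methodCall [\n"
        + "  <!ELEMENT methodCall (methodName, params)>\n"
        + "  <!ELEMENT methodName (#PCDATA)>\n"
        + "  <!ELEMENT params (param*)>\n"
        + "  <!ELEMENT param (value)>\n"
        + "  <!ELEMENT value (string)>\n"
        + "  <!ELEMENT string (#PCDATA)>\n"
        + "]>\n"
        + "<methodCall>\n"
        + "  <methodName>example.method</methodName>\n"
        + "  <params>\n"
        + "    <param>\n"
        + "      <value><string>Red\n"
        + "      Hat</string></value>\n"
        + "    </param>\n"
        + "  </params>\n"
        + "</methodCall>\n";

    // Number of characters reported through ignorableWhitespace()
    int ignorableChars = 0;
    // Everything reported through characters()
    final StringBuilder content = new StringBuilder();

    @Override
    public void characters(char[] text, int start, int length) {
        content.append(text, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] text, int start, int length) {
        ignorableChars += length;
    }

    // Parses the document with validation turned on and returns the handler
    static WhiteSpaceCounter count(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        WhiteSpaceCounter handler = new WhiteSpaceCounter();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

With validation on, the indentation between the element-only elements should be counted in ignorableChars, while the line break and spaces between Red and Hat end up in content, because the string element is declared as #PCDATA.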

The space and line break characters in the string
element are not ignorable because the DTD
allows this element to contain #PCDATA.
This white space is passed to the characters() method
along with the words Red and
Hat. White space is considered
ignorable only where #PCDATA is invalid.

For the purposes of the ignorableWhitespace() method, white space
consists exclusively of the ASCII space (&#x20;), tab (&#x9;),
carriage return (&#xD;), and line feed (&#xA;).
Unicode includes many more space characters,
including next line (&#x85;), em space (&#x2003;),
and en space (&#x2002;). However, these characters
are never ignorable.
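As a small illustration of that definition, the check below accepts only the four characters listed above. The class and method names (XmlWhiteSpace, isXmlWhiteSpace) are hypothetical helpers, not part of SAX.

```java
final class XmlWhiteSpace {

    private XmlWhiteSpace() {}

    // True only for space, tab, carriage return, and line feed
    static boolean isXmlWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // True if every character in text[start] .. text[start+length-1]
    // is one of the four XML white space characters
    static boolean isAllXmlWhiteSpace(char[] text, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (!isXmlWhiteSpace(text[i])) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this is deliberately narrower than java.lang.Character.isWhitespace(), which also returns true for characters such as the em space.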

The
ignorableWhitespace() method has the
same arguments and the same caveats as the
characters() method.
For instance, there’s no guarantee that each call to this
method will contain the maximum contiguous run of ignorable
white space.
However, its text[] argument
should contain nothing except
space characters, tabs, carriage returns, and line feeds,
at least in the sub-array that runs from
start up to, but not including, start+length.
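Because one run of ignorable white space may arrive in several calls, a handler that cares about complete runs has to buffer them itself. The sketch below shows one way to do that, reading only the sub-array that belongs to each call. The class name IgnorableRunCollector and its runs list are hypothetical, and for brevity the sketch treats only element boundaries as the end of a run, although comments and processing instructions can also interrupt one.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class IgnorableRunCollector extends DefaultHandler {

    // White space seen since the last element boundary
    private final StringBuilder run = new StringBuilder();
    // Completed runs, in document order
    final List<String> runs = new ArrayList<>();

    @Override
    public void ignorableWhitespace(char[] text, int start, int length) {
        // Only text[start] .. text[start+length-1] belongs to this call;
        // the rest of the array may hold unrelated data.
        run.append(text, start, length);
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Stores the buffered run, if any, and resets the buffer
    private void flush() {
        if (run.length() > 0) {
            runs.add(run.toString());
            run.setLength(0);
        }
    }
}
```

The same buffering approach applies to the characters() method, since it comes with the same caveat.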