OutputFormat

The detailed behavior of a serializer is controlled by
an OutputFormat object.
This class can configure almost any aspect of serialization,
including setting the maximum line length,
changing the indenting, specifying which
elements have their text escaped as CDATA
sections, and more. There are even a
few options that have the potential to make your documents
malformed. For instance, if you add an element to the list of non-escaping
elements, then any reserved
characters like < and & that appear
in its text content will be output as themselves rather than
escaped as &lt; and
&amp;.
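
To see why unescaped output is dangerous, here is a minimal sketch. It does not use OutputFormat at all. It simply hands a standard JAXP parser the same text content in its escaped and unescaped forms, which is roughly what a downstream consumer would receive. The class name EscapingCheck, the helper isWellFormed(), and the formula element are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EscapingCheck {

    // Hypothetical helper: returns true if the text parses as well-formed XML.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints fatal errors to System.err
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text as malformed
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not expected in this sketch
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What a serializer normally writes: reserved characters escaped
        String escaped = "<formula>a &lt; b &amp;&amp; b &lt; c</formula>";
        // What would be written if formula were a non-escaping element
        String raw = "<formula>a < b && b < c</formula>";

        System.out.println("escaped: " + isWellFormed(escaped));
        System.out.println("raw:     " + isWellFormed(raw));
    }
}
```

The escaped form parses; the raw form does not. Only add an element to the non-escaping list when you are sure its text will never contain reserved characters, or when the output is not meant to be parsed as XML.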

One of the most frequent requests for serializers is pretty
printing data with extra line breaks and indentation.
Within reasonable limits, the OutputFormat
class can provide this. Simply pass true to
setIndenting(),
pass the number of spaces you want each level to be indented
to
setIndent(), and pass the maximum
line length to
setLineWidth().
Example 13.1 demonstrates.

I think you’ll agree that this looks much more attractive
than the smushed together output from the bare serialization
without any extra white space. One warning, however: white
space is significant in XML. Adding this white space has
changed the document. This is not the same document as
existed before it was pretty printed. For this application
the extra white space is insignificant. However, this is not true
in all XML applications.
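
The difference is easy to observe from the DOM. The following sketch, which assumes nothing beyond the standard JAXP parser, parses a compact document and a hand-indented copy of it, then counts the child nodes of the root element. The class name WhiteSpaceCount, the helper rootChildCount(), and the small mrow fragment are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class WhiteSpaceCount {

    // Hypothetical helper: parses the text and counts the root element's child nodes.
    public static int rootChildCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            // Malformed input or parser misconfiguration; not expected in this sketch
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same three elements, first without and then with extra white space
        String compact = "<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>";
        String pretty = "<mrow>\n  <mi>a</mi>\n  <mo>+</mo>\n  <mi>b</mi>\n</mrow>";

        System.out.println("compact: " + rootChildCount(compact));
        System.out.println("pretty:  " + rootChildCount(pretty));
    }
}
```

Without a DTD the parser cannot tell which white space is ignorable, so every line break and indent in the pretty-printed copy shows up as a text node between the elements. Code that walks children by position will behave differently on the two documents.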

White space is just the beginning of what
the OutputFormat class can control.
Other features include the MIME media type, the XML declaration,
the system and public IDs for the document type,
which elements’ content should be escaped
as CDATA sections, and more. Here are the various properties you can control by
invoking methods on OutputFormat.
In some cases the default is document dependent.
When it’s not, the default value is given in parentheses.

Method

This is normally set to one of the three values
xml, html, or
text, indicating the type of output that is desired.
The serializer uses this value to configure itself.
The default value is determined by the type of document being serialized.

MediaType

The MIME media type for the output, such as application/xml
or application/xhtml+xml. This will not be included in the document itself,
but may be used as part of the stream's metadata if the output is written into a file system
or onto an HTTP connection or some such.

Doctype

This specifies the system and public IDs of the external DTD
subset given in the document type declaration.
These values are used only if the Document
being serialized does not contain a
DocumentType object of its own.

OmitDocumentType

If true, then no document type declaration is output.
If false, a document type declaration is written.
If the document does not have a document type declaration
and none
has been set with setDoctype(),
then no document type declaration will be written,
regardless of the value of this property.

Indenting

If true, then the serializer will add indents at each level
and wrap lines that exceed the maximum line width. If false, it won't.
The number of spaces to indent is set by the indent property,
and the column to wrap at is set by the line width property.

Example 13.2 uses these methods to create a valid
MathML document encoded in ISO-8859-1 with a document type declaration,
an XML declaration, no comments, a 65-character maximum line width,
a two-space indent, a standalone declaration with the value yes,
and the MIME media type application/xml:

You can imagine other requests for the serializer. For
example, maybe you want a line break after
each </mrow> end-tag but no line
breaks inside mrow elements.
OutputFormat doesn’t
give you enough control to arrange serialization at this
level of detail, but you could write a custom subclass of
XMLSerializer that accomplishes this.