The XML specification grants parsers a sometimes confusing
amount of leeway in processing XML documents.
Parsers are allowed to validate or not, resolve external
entities or
not, treat non-deterministic content models as errors or not,
support non-standard encodings or not, check for namespace
well-formedness or not, and much more.
Depending on exactly which choices two parsers make for all
these options, they can actually produce quite different
pictures of the same XML document. Indeed, in a few cases one
parser may even report a document to be well-formed while
another reports that the document is malformed.

To support the wide range of capabilities of different
parsers, the XMLReader
interface that represents parsers in SAX
is quite deliberately non-specific. It can be
instantiated in a variety of different ways. It can
read XML documents
stored in a variety of media. It can be configured with
features and properties both known and unknown.
This chapter explores in detail the configuration and use of
XMLReader objects.

Building Parser Objects

Since XMLReader is an
interface, it has no constructors. Instead you use the
static factory method XMLReaderFactory.createXMLReader()
to retrieve an instance of
XMLReader.
In fact, there are two such methods in the
XMLReaderFactory class:
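As declared in org.xml.sax.helpers.XMLReaderFactory, the two overloads are public static XMLReader createXMLReader() throws SAXException and public static XMLReader createXMLReader(String className) throws SAXException. The following is a minimal sketch that lists them by reflection, so you can check them against whichever SAX classes are on your class path. The class name FactoryMethods and its describe() helper are hypothetical, chosen only for this illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; only XMLReaderFactory comes from SAX itself.
// Recent JDKs mark XMLReaderFactory as deprecated, hence the annotation.
@SuppressWarnings("deprecation")
class FactoryMethods {

    // Collects the public createXMLReader() overloads declared by the
    // XMLReaderFactory class found on the current class path.
    static List<String> describe() {
        List<String> signatures = new ArrayList<>();
        for (Method method : XMLReaderFactory.class.getDeclaredMethods()) {
            if (method.getName().equals("createXMLReader")
                    && Modifier.isPublic(method.getModifiers())) {
                signatures.add(method.toString());
            }
        }
        return signatures;
    }

    public static void main(String[] args) {
        for (String signature : describe()) {
            System.out.println(signature);
        }
    }
}
```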

The first one returns an instance of the default XMLReader
implementation class.
This is specified by the org.xml.sax.driver Java
system property.
Parser vendors are supposed to modify this method to return an
instance of their own parser in the event that this property is not set,
though in practice few do this. Consequently when running a
program that relies on XMLReaderFactory.createXMLReader()
you may want to set the org.xml.sax.driver Java
system property at the command line using the -D
flag to the
interpreter like this:
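A command of roughly this shape would do it, assuming Xerces is on the class path and using the hypothetical main class shown below: java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser DefaultParserDemo. The sketch reports the value of the property and the class of the parser the factory actually returned, which is a quick way to see whether the setting took effect. DefaultParserDemo and defaultParserClassName() are names invented for this example.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class. XMLReaderFactory is deprecated in recent
// JDKs but still available, hence the annotation.
@SuppressWarnings("deprecation")
class DefaultParserDemo {

    // Asks the factory for its default parser and returns the name of
    // the implementation class that was actually instantiated.
    static String defaultParserClassName() throws SAXException {
        XMLReader parser = XMLReaderFactory.createXMLReader();
        return parser.getClass().getName();
    }

    public static void main(String[] args) throws SAXException {
        // Prints null for the property if -Dorg.xml.sax.driver=... was not given
        System.out.println("org.xml.sax.driver = "
                + System.getProperty("org.xml.sax.driver"));
        System.out.println("default parser     = " + defaultParserClassName());
    }
}
```

If the property names a class that is not on the class path, createXMLReader() throws a SAXException instead of returning a parser.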

If there are multiple
versions of the SAX classes in your class path, then
whichever one the virtual machine finds first gets to choose
which XMLReader
implementation
class to give you. However, if you know you want a specific
class (e.g.
org.apache.xerces.parsers.SAXParser)
then you can ask for it by fully package-qualified name using the second
XMLReaderFactory.createXMLReader()
method. For example, this code asks for the Xerces parser by
name:
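A minimal sketch of such a request follows. It assumes the Xerces jar may or may not be on the class path, so the call is wrapped in a try block. NamedParserDemo and isAvailable() are hypothetical names used only for illustration.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical demo class; the annotation silences the deprecation
// warning that recent JDKs issue for XMLReaderFactory.
@SuppressWarnings("deprecation")
class NamedParserDemo {

    static final String XERCES = "org.apache.xerces.parsers.SAXParser";

    // Returns true if the named class can be loaded and instantiated
    // as an XMLReader, false if the factory rejects it.
    static boolean isAvailable(String className) {
        try {
            XMLReader parser = XMLReaderFactory.createXMLReader(className);
            return parser != null;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        try {
            // Ask for Xerces by its fully package-qualified class name
            XMLReader parser = XMLReaderFactory.createXMLReader(XERCES);
            System.out.println("Loaded " + parser.getClass().getName());
        } catch (SAXException e) {
            System.out.println("Could not load " + XERCES + ": " + e.getMessage());
        }
    }
}
```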

If the class you request can’t be located, createXMLReader()
throws a SAXException.
Since there’s no guarantee that any particular parser is
installed on any given system where your code may run,
you should be prepared to catch and respond to this.
Normally the correct response is to fall back to the default
parser, like this:
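One way to write that fallback is sketched here. The class name ParserFallback and the method load() are hypothetical; the two createXMLReader() calls are the factory methods described above.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; XMLReaderFactory is deprecated in recent
// JDKs but still works, hence the annotation.
@SuppressWarnings("deprecation")
class ParserFallback {

    // Tries Xerces first; if it is not installed, falls back to the
    // default parser. A SAXException escapes only if the default
    // parser cannot be created either.
    static XMLReader load() throws SAXException {
        try {
            return XMLReaderFactory.createXMLReader(
                    "org.apache.xerces.parsers.SAXParser");
        } catch (SAXException e) {
            return XMLReaderFactory.createXMLReader();
        }
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = load();
        System.out.println("Using " + parser.getClass().getName());
    }
}
```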

Alternatively, you can try multiple known parser classes until you find one
that’s available. This code searches for several of the major
parsers in my personal order of preference, only falling back
on the default parser if none of these can be found:
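The sketch below shows the general shape of such a search. It uses a loop over class names, which is more compact than spelling out one try block per parser. The candidate list is only illustrative and is not meant to reproduce any particular order of preference, and ParserSearch and findParser() are hypothetical names.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; XMLReaderFactory is deprecated in recent
// JDKs but still works, hence the annotation.
@SuppressWarnings("deprecation")
class ParserSearch {

    // Illustrative candidates only; substitute the parsers you prefer.
    static final String[] CANDIDATES = {
        "org.apache.xerces.parsers.SAXParser",      // Xerces
        "org.apache.crimson.parser.XMLReaderImpl",  // Crimson
        "com.bluecast.xml.Piccolo"                  // Piccolo
    };

    // Returns the first candidate that can be instantiated, or the
    // default parser if none of the candidates is available.
    static XMLReader findParser(String... classNames) throws SAXException {
        for (String className : classNames) {
            try {
                return XMLReaderFactory.createXMLReader(className);
            } catch (SAXException e) {
                // This parser is not installed; try the next one.
            }
        }
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = findParser(CANDIDATES);
        System.out.println("Using " + parser.getClass().getName());
    }
}
```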

I use this technique in my working code, and you’re
more than welcome to
copy it. However, because it’s quite
long and repetitive, I’ll mostly stick to one named parser and
a fallback to the default in the examples in this book.

I also occasionally see programs that use a constructor to
retrieve an instance of a particular class. For example,

XMLReader parser = new SAXParser();

Or, worse yet,

SAXParser parser = new SAXParser();

This doesn’t let you do anything you can’t do with
XMLReaderFactory.createXMLReader().
However, it does tie your code tightly to one particular
parser and makes it a little more difficult to change parsers
at a later date. At an absolute minimum, swapping in a
different parser will require an edit and a recompile.
If you use
XMLReaderFactory.createXMLReader() instead,
you can change parsers without even having access to the source
code.