The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases.

Handling Lexical Events

At this point, you have digested many XML concepts, including DTDs and external entities. You have also learned your way around the SAX parser. The remainder of this lesson covers advanced topics that you will need to understand only if you are writing SAX-based applications. If your primary goal is to write DOM-based applications, you can skip ahead to
Document Object Model.

You saw earlier that if you are writing text out as XML, you need to know whether you are in a CDATA section. If you are, then angle brackets (<) and ampersands (&) should be output unchanged. But if you are not in a CDATA section, they should be replaced by the predefined entities &lt; and &amp;. But how do you know whether you are processing a CDATA section?

Then again, if you are filtering XML in some way, you want to pass comments along. Normally the parser ignores comments. How can you get comments so that you can echo them?

This section answers those questions. It shows you how to use org.xml.sax.ext.LexicalHandler to identify comments, CDATA sections, and references to parsed entities.

Comments, CDATA tags, and references to parsed entities constitute lexical information-that is, information that concerns the text of the XML itself, rather than the XML's information content. Most applications, of course, are concerned only with the content of an XML document. Such applications will not use the LexicalEventListener API. But applications that output XML text will find it invaluable.

Note - Lexical event handling is an optional parser feature. Parser implementations are not required to support it. (The reference implementation does so.) This discussion assumes that your parser does so.

How the LexicalHandler Works

To be informed when the SAX parser sees lexical information, you configure the XmlReader that underlies the parser with a LexicalHandler. The LexicalHandler interface defines the following event-handling methods.

comment(String comment)

Passes comments to the application.

startCDATA(), endCDATA()

Tells when a CDATA section is starting and ending, which tells your application what kind of characters to expect the next time characters() is called.

startEntity(String name), endEntity(String name)

Gives the name of a parsed entity.

startDTD(String name, String publicId, String systemId), endDTD()

Tells when a DTD is being processed, and identifies it.

To activate the Lexical Handler, your application must extend DefaultHandler and implement the LexicalHandler interface. Then, you must configure your XMLReader instance that the parser delegates to, and configure it to send lexical events to your lexical handler, as shown below.

Here, you configure the XMLReader using the setProperty() method defined in the XMLReader class. The property name, defined as part of the SAX standard, is the URN, http://xml.org/sax/properties/lexical-handler.

Finally, add something like the following code to define the appropriate methods that will implement the interface.