Java programs commonly need to process HTML. Although there are several
third-party tools that do this for Java programs, Java itself includes HTML
processing as part of Swing. In this article, I will show you how to make use
of the HTML processing capabilities that are built into Java.

Although Swing contains HTML processing capabilities, it is not obvious how to
use them. Swing needs HTML processing internally to display HTML text, but
using that processing outside of Swing takes a little more work. In the
following sections, I will show you the classes that Swing makes available and
how you can access them.

Using HTMLEditorKit.Parser

The Parser class, which is a nested class of the HTMLEditorKit class, is
provided by Swing to facilitate the parsing of HTML. Obtaining an instance of
this class, however, is not an easy task. It almost appears that the HTML
parsing facilities of Swing were not meant to be used externally; instead,
their availability is more a side effect than a feature. This is particularly
evident in the way in which you must obtain an instance of
HTMLEditorKit.Parser.

The way to obtain an HTMLEditorKit.Parser object used in this article is to
override the getParser method of HTMLEditorKit in a subclass and make it
public. A class that does this is shown in Listing 1.
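
A minimal sketch of such a subclass, using the HTMLParse name that this article
refers to, might look like the following. It relies only on the standard
javax.swing.text.html package; the exact layout of the class is an assumption
made for illustration.

```java
import javax.swing.text.html.HTMLEditorKit;

// Sketch of a subclass that exposes the parser (layout is illustrative).
class HTMLParse extends HTMLEditorKit {

    // getParser is protected in HTMLEditorKit; overriding it with a public
    // method lets code outside the class hierarchy obtain a Parser.
    @Override
    public HTMLEditorKit.Parser getParser() {
        return super.getParser();
    }
}
```

With a class like this in place, calling new HTMLParse().getParser() should
return a Parser object that is ready to use.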

Parser objects are obtained by calling the getParser method of HTMLEditorKit.
Unfortunately, this method is declared protected rather than public. To call
getParser from your own code, you override it with a public method in a
subclass. This is exactly what the HTMLParse class is used for. After you have
obtained a Parser object, you should call its parse method and pass it a
callback object.

The call below assumes that you have just retrieved an HTML page as a string.
The page is used to create a StringReader, r, which is then passed to the parse
method of the Parser object, parse. The variable callback is assumed to hold a
valid callback object.

parse.parse(r, callback, true);

The parser calls the methods of this callback object repeatedly, once for each
tag or other item that it encounters in the HTML stream. (The structure of a
ParserCallback class is discussed in the next section.)
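
The steps in this section can be put together in a short, self-contained
sketch. The ParseDemo class, the collectStartTags helper, and the sample page
are hypothetical names introduced only for illustration. The callback here
overrides just handleStartTag, and the third argument to parse (true) tells the
parser to ignore any character set declared in the document.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; not part of Swing or of the original listings.
class ParseDemo {

    // Subclass that widens the protected getParser method to public.
    static class HTMLParse extends HTMLEditorKit {
        @Override
        public HTMLEditorKit.Parser getParser() {
            return super.getParser();
        }
    }

    // Illustrative helper: returns the name of every start tag reported
    // by the parser for the given page.
    static List<String> collectStartTags(String page) throws IOException {
        final List<String> tags = new ArrayList<>();

        // The parser calls handleStartTag once for each opening tag.
        HTMLEditorKit.ParserCallback callback =
            new HTMLEditorKit.ParserCallback() {
                @Override
                public void handleStartTag(HTML.Tag t,
                                           MutableAttributeSet a,
                                           int pos) {
                    tags.add(t.toString());
                }
            };

        // Wrap the page in a Reader, obtain a parser, and parse.
        Reader r = new StringReader(page);
        HTMLEditorKit.Parser parse = new HTMLParse().getParser();
        parse.parse(r, callback, true);
        return tags;
    }

    public static void main(String[] args) throws IOException {
        // Sample page (an assumption for the demo).
        String page = "<html><body><p>Hello</p></body></html>";
        System.out.println(collectStartTags(page));
    }
}
```

Because the Swing parser may also report tags that it infers from the document
structure, the list can contain more entries than the tags written literally
in the page.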