I am using the internal WST DOM model
(org.eclipse.wst.xml.core.internal.provisional.document.IDOMModel) to
parse XML documents, because it is the only one I could find that is able
to retrieve the original textual offsets of an XML node in the source
document. I couldn't figure out a way to do this with w3c.dom or jdom.

Is there maybe a SAX parser with the same functionality hidden somewhere
in the WST packages? I don't need the XML context, and it would make the
parsing much faster.

Gerrit wrote:
> I am using the internal WST DOM model
> (org.eclipse.wst.xml.core.internal.provisional.document.IDOMModel) to
> parse XML documents, because it is the only one I could find that is
> able to retrieve the original textual offsets of an XML node in the
> source document. I couldn't figure out a way to do this with w3c.dom or
> jdom.
>
> Is there maybe a SAX parser with the same functionality hidden somewhere
> in the WST packages? I don't need the XML context, and it would make the
> parsing much faster.

WST does implement its own SAX parser API, which keeps track of the
beginning and ending column numbers of a tag. It's buried within the
XML validation routines as an internal class, so it isn't callable
directly.

As you noticed, w3c.dom and jdom themselves don't keep track of this
information.

David Carver wrote:
> WST does implement its own SAX parser api, that does keep track of the
> beginning column and ending column number of a tag. It's buried within
> the XML validation routines as an internal class. So isn't callable
> directly.
>
> As you noticed the w3c.dom and jdom themselves don't keep track of this
> information.

I wouldn't call it SAX exactly, since it doesn't even try to
implement that API, and it actually has more in common with StAX
(which came along a lot later). If you use the platform's file
buffer APIs, the documents in the ITextFileBuffers for XML files
implement IStructuredDocument. Each document breaks down into
IStructuredDocumentRegions, which for XML represent the start and
end tags individually. Inside those are ITextRegions marking the
positions of the various syntactic tokens that make up the tag.
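
For comparison with the StAX style of pull parsing mentioned above, here is a minimal sketch that uses only the JDK's javax.xml.stream API, not WST. It asks the reader's Location for a character offset at each start tag. The class and method names (StaxOffsetSketch, startTagOffsets) are made up for illustration, and exactly which position the offset refers to (or whether it is reported at all, since the API allows -1) depends on the StAX implementation, so treat the numbers as approximate.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxOffsetSketch {

    /**
     * Hypothetical helper: returns one "name@offset" entry per start tag.
     * The offset is whatever the StAX implementation reports for the
     * current parse position; it may be -1 if offsets are not tracked.
     */
    static List<String> startTagOffsets(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // Pull events one at a time instead of receiving callbacks.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    Location location = reader.getLocation();
                    found.add(reader.getLocalName() + "@" + location.getCharacterOffset());
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : startTagOffsets("<root>\n  <child/>\n</root>")) {
            System.out.println(entry);
        }
    }
}
```

This only gives one position per event, so it is much coarser than the per-token ITextRegions, but it needs no Eclipse runtime.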

Nitin Dahyabhai wrote:
> David Carver wrote:
>> WST does implement its own SAX parser api, that does keep track of
>> the beginning column and ending column number of a tag. It's buried
>> within the XML validation routines as an internal class. So isn't
>> callable directly.
>>
>> As you noticed the w3c.dom and jdom themselves don't keep track of
>> this information.
>
> I wouldn't say it's SAX exactly, since it doesn't even try to implement
> that API, and actually has more in common with StAX (which came along a
> lot later).

I was referring to the Xerces validation routines, which override and
implement a SAX parser that keeps track of line numbers. They extend the
SAX pieces when instantiating the Xerces parser for validation against
a grammar.

The problem with the IStructuredDocument approach is that, while it can
be done, it's not one of the typical APIs that most XML programmers are
going to be familiar with. This is again one of the reasons I keep
strongly urging WTP to leverage more of the existing XML APIs wherever
possible instead of reinventing the wheel. It makes things much easier
for adopters who are familiar with these technologies. The result may
be just a shell that wraps the underlying Eclipse API, but such wrapper
classes can make adoption easier, as has been partially done with the
DOM implementation of the SSE.
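
To illustrate the kind of line tracking a plain SAX parser can do, here is a minimal sketch using only the standard org.xml.sax.Locator callback from the JDK. It is not the internal WST validation class. The names SaxLocatorSketch and locateStartTags are invented for this example. Note that a Locator reports the position where the event ends (just after the start tag), and it gives line and column rather than a character offset, so it only partly answers the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class SaxLocatorSketch {

    /** Hypothetical helper: returns one "name@line:column" entry per start tag. */
    static List<String> locateStartTags(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over a locator before parsing starts.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (locator == null) {
                    // Parsers are not required to supply a locator.
                    found.add(qName + "@unknown");
                } else {
                    found.add(qName + "@" + locator.getLineNumber()
                            + ":" + locator.getColumnNumber());
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : locateStartTags("<root>\n  <child/>\n</root>")) {
            System.out.println(entry);
        }
    }
}
```

Turning line and column into a document offset would still require a separate pass over the source text, which is one reason the region-based model is attractive despite being unfamiliar.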