This is a small XML parser based purely on the STL. There are two main classes,
XmlStream and XmlParser. XmlParser.h contains most of the parsing code.
It has several state variables, which can be split into two categories:

Buffer state - Shows where we are parsing from

Parsing state - Shows what we have found

XmlParser makes extensive use of offsets to keep track of its state. This is
done by design. In order to maximize speed it does not do any string copies.
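As a rough illustration of the offset idea (not the actual code in XmlParser.h), the sketch below records a pair of offsets into the caller's buffer instead of copying text. The names Span, findFirstTag and toString are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration: a span is a pair of offsets into the caller's
// buffer, so locating a piece of text never copies it.
struct Span {
    std::size_t begin;  // offset of the first character
    std::size_t end;    // offset one past the last character
    std::size_t length() const { return end - begin; }
};

// Locate the text between the first '<' and the next '>' in buf[0, len).
// Returns false if no complete tag is found.
bool findFirstTag(const char* buf, std::size_t len, Span& out) {
    if (buf == nullptr || len == 0) return false;
    const char* lt = static_cast<const char*>(std::memchr(buf, '<', len));
    if (!lt) return false;
    const std::size_t start = static_cast<std::size_t>(lt - buf) + 1;
    const char* gt =
        static_cast<const char*>(std::memchr(buf + start, '>', len - start));
    if (!gt) return false;
    out.begin = start;
    out.end = static_cast<std::size_t>(gt - buf);
    return true;
}

// Copy only at the moment a caller actually needs an owning string.
std::string toString(const char* buf, const Span& s) {
    assert(s.end >= s.begin);
    return std::string(buf + s.begin, s.length());
}
```

The scan itself only moves offsets; a copy happens solely when toString is called, which is the trade-off the design above aims for.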

To start parsing, declare an instance of the class XmlStream and set up the buffer
that you want to parse. An example is included in Parser.cpp. Call parse
in XmlStream, passing in a pointer to the buffer and the buffer's length.
You will see screen output showing what has been found. This is simple
debug output and can be turned off.

XmlNotify is used as an interface class to notify a subscriber
of nodes and elements being found. There is a pointer to a subscriber
in the XmlStream class. The subscriber can be set using setSubscriber.
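The subscriber arrangement might look roughly like the sketch below. It uses stand-in classes rather than the real XmlStream and XmlNotify: only setSubscriber and a parse call taking a pointer and a length come from the description above, while NotifySketch, StreamSketch, Collector and the foundElement callback are assumptions made for this example. The toy scan only understands flat <name>value</name> pairs and, unlike the real parser, copies strings for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the XmlNotify idea; the method name and signature are assumed.
class NotifySketch {
public:
    virtual ~NotifySketch() {}
    virtual void foundElement(const std::string& name,
                              const std::string& value) = 0;
};

// Stand-in for XmlStream: holds a subscriber pointer and reports what it finds.
class StreamSketch {
public:
    StreamSketch() : subscriber_(nullptr) {}
    void setSubscriber(NotifySketch* s) { subscriber_ = s; }

    // Toy scan over flat <name>value</name> pairs (copies for brevity).
    bool parse(const char* buffer, std::size_t length) {
        assert(buffer != nullptr);
        const std::string text(buffer, length);
        std::size_t pos = 0;
        while ((pos = text.find('<', pos)) != std::string::npos) {
            const std::size_t nameEnd = text.find('>', pos);
            if (nameEnd == std::string::npos) return false;
            const std::string name = text.substr(pos + 1, nameEnd - pos - 1);
            const std::string closeTag = "</" + name + ">";
            const std::size_t close = text.find(closeTag, nameEnd + 1);
            if (close == std::string::npos) return false;
            if (subscriber_)
                subscriber_->foundElement(
                    name, text.substr(nameEnd + 1, close - nameEnd - 1));
            pos = close + closeTag.size();
        }
        return true;
    }

private:
    NotifySketch* subscriber_;
};

// Example subscriber that records each notification it receives.
class Collector : public NotifySketch {
public:
    std::vector<std::string> seen;
    void foundElement(const std::string& name,
                      const std::string& value) override {
        seen.push_back(name + "=" + value);
    }
};
```

A caller would create a Collector, hand its address to setSubscriber, call parse with the buffer and its length, and then read the recorded entries from seen.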

Note that the parser handles neither an XML document declaration nor
a schema, so this is a non-validating parser. If those exist in your buffer, don't
send them to the parser. The ability to step through them will be added to the code
later. There is one known bug in the parser:
when an empty node is encountered, it is reported as an element.
This will be fixed later. An example of this is included in the sample code.
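Until the parser can step through these itself, one way to avoid sending them is to compute an offset past them before calling parse. The helper below is a minimal sketch and not part of the article's code; skipProlog is a hypothetical name, and it assumes any DOCTYPE has no internal subset (no "[ ... ]" block).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Return the offset of the first byte after any leading whitespace,
// "<?...?>" declarations and simple "<!DOCTYPE ...>" lines.
// Assumption: the DOCTYPE contains no internal subset.
std::size_t skipProlog(const char* buf, std::size_t len) {
    assert(buf != nullptr);
    std::size_t pos = 0;
    for (;;) {
        while (pos < len && std::isspace(static_cast<unsigned char>(buf[pos])))
            ++pos;
        if (len - pos >= 2 && buf[pos] == '<' && buf[pos + 1] == '?') {
            // Scan forward to the closing "?>".
            std::size_t i = pos + 2;
            while (i + 1 < len && !(buf[i] == '?' && buf[i + 1] == '>')) ++i;
            if (i + 1 >= len) return pos;  // unterminated: leave it alone
            pos = i + 2;
        } else if (len - pos >= 9 &&
                   std::memcmp(buf + pos, "<!DOCTYPE", 9) == 0) {
            const void* gt = std::memchr(buf + pos, '>', len - pos);
            if (!gt) return pos;           // unterminated: leave it alone
            pos = static_cast<std::size_t>(
                      static_cast<const char*>(gt) - buf) + 1;
        } else {
            return pos;                    // first real content starts here
        }
    }
}
```

With off = skipProlog(buf, len), the caller would pass buf + off and len - off to parse.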

If you have any suggestions or improvements let me know.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

Has anyone tried to make this code run on VC2005?
I get a lot of "cannot convert type string to iterator" errors and the like. I finally got around the problems
by not using this code, but I find myself wondering why the new compiler is so strict about converting pointers to iterators.
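A likely cause, offered as an assumption rather than a diagnosis of this code: older compilers implemented std::string::iterator as a plain char*, so assigning one to the other compiled by accident, whereas newer library versions make the iterator a class type. A portable way to move between the two is to go through an offset, as in this sketch (toIterator and toPointer are hypothetical helper names):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pointer -> iterator: convert via the offset instead of assigning directly.
// p must point into s (or one past its end).
std::string::const_iterator toIterator(const std::string& s, const char* p) {
    assert(p >= s.data() && p <= s.data() + s.size());
    return s.begin() + (p - s.data());
}

// Iterator -> pointer: same idea in the other direction.
const char* toPointer(const std::string& s, std::string::const_iterator it) {
    return s.data() + (it - s.begin());
}
```

Neither helper depends on how the library represents its iterators, so the same code should build on old and new compilers alike.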

Perhaps this has been discussed a lot, but I cannot stop myself from writing this.
One of the reasons for XML is connecting platforms and apps through text-encoded messages, and to achieve this you need compliance. Implementing a fully (95%-98%) compliant XML parser is really hard. Usually an average dev can get to 80% pretty quickly and an experienced one can get to 90%. The rest is all edge cases, and getting them right requires extensive test suites to ensure compatibility between different platforms. I know this from experience. There are quite a few XML parser implementations out there. Users are usually best served by implementations that conform to the W3C Recommendations. Other implementations cannot go much further than academic studies, because the cost of making them interoperate with the rest of the world skyrockets as devs try to reach 95% compatibility.

If I have XML like the one below and I would like to get an attribute value such as 22805, SA287430035 or ENV_LAN_1, how would I do it? Thanks in advance.

<?xml version="1.0" encoding="utf-8"?>
<LOADBOX>
<row BoxID="22805" BoxSerialNumber="SA287430035" BoxTest="ENV_LAN_1">
<row BoxID="77688" BoxSerialNumber="SA287460015" BoxTest="ENV_LAN_5">
<row BoxID="15141" BoxSerialNumber="HK184700609" BoxTest="ENV1">
</LOADBOX>
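The thread does not show whether this parser exposes attributes at all, so the following is a standalone sketch rather than a use of its API. getAttribute and collectAttribute are hypothetical names; the sketch assumes double-quoted values, no whitespace around '=', and no entity decoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

static bool isSpaceChar(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Find attribute `name` in the text of one tag, e.g. row BoxID="22805".
// Requires whitespace before the name so "ID" does not match "BoxID".
bool getAttribute(const std::string& tag, const std::string& name,
                  std::string& value) {
    assert(!name.empty());
    const std::string key = name + "=\"";
    std::size_t pos = 0;
    while ((pos = tag.find(key, pos)) != std::string::npos) {
        if (pos > 0 && isSpaceChar(tag[pos - 1])) {
            const std::size_t start = pos + key.size();
            const std::size_t end = tag.find('"', start);
            if (end == std::string::npos) return false;
            value = tag.substr(start, end - start);
            return true;
        }
        ++pos;
    }
    return false;
}

// Collect the value of `attr` from every <element ...> tag in the document.
std::vector<std::string> collectAttribute(const std::string& xml,
                                          const std::string& element,
                                          const std::string& attr) {
    std::vector<std::string> out;
    const std::string open = "<" + element;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t after = pos + open.size();
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) break;
        std::string value;
        // Only accept an exact element name followed by whitespace.
        if (after < xml.size() && isSpaceChar(xml[after]) &&
            getAttribute(xml.substr(pos + 1, end - pos - 1), attr, value))
            out.push_back(value);
        pos = end + 1;
    }
    return out;
}
```

Calling collectAttribute(xml, "row", "BoxID") on the document above would gather the BoxID of each row; the same call with "BoxSerialNumber" or "BoxTest" gathers the other values.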

In this example the <HOST> and <IP> tags both have multiple entries. The XMLDialog displays the host code for both entries, and only 1 IP address. It would be nice to be able to address these in the read/write methods like:

I found the same problem, and believe it is related to this code in XmlStream.cpp:

    // if new parse cur position is in the last parser
    // last tag position we are done with the node
    char * curPos = parserNode.getCurPos();
    char * lastCurPos = parser.getLastTagPos();
    if ( curPos >= lastCurPos )
    {
        break;
    }

This code seems to assume that if you hit the end of a node at a certain level of the XML hierarchy, you must also be done with that entire hierarchical level. (In other words, once the first </HOST> is hit, it assumes no more <HOST> entries will follow.)

I simply removed that code and allowed the above parser.parse() method to do what it does, and that seemed to fix the problem.