Mastering XML under Windows CE

Short Preface

To tell you the truth, I'm not an expert in XML at all, and before Pocket PC 2002 I did not need to be. Those days are gone: Microsoft now delivers a powerful XML parser as part of its mobile platforms. It is still not as state-of-the-art as the desktop one, but it has become useful enough that you, as a programmer, may consider using XML as a storage layer for your application. XML today is a wide area, hardly coverable in one article, so here we will discuss only some basic aspects of XML usage in mobile applications.

XML may become a player in the mobile game for several reasons. The Windows CE world is built on Unicode. This trivial fact often leads to an unpleasant issue: you have to convert data from ASCII to Unicode and back. Not a big deal really, but the standard API functions MultiByteToWideChar and WideCharToMultiByte don't work so well with languages other than English. Thus, you must either implement your own converter to be sure all is okay, or use Unicode throughout. If you have to support several languages in your application, this may turn into a real headache, to say nothing of data maintenance issues. You will surely discover more reasons in real projects.
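To give an idea of what "your own converter" might look like, here is a minimal sketch in portable C++ that decodes UTF-8 into UTF-16 code units. The function name utf8ToUtf16 is made up for this illustration, and the code assumes a C++11 compiler, which the Windows CE toolchains of that era did not offer. It replaces malformed or truncated sequences with U+FFFD, but it does not reject overlong encodings or out-of-range values, so treat it as a starting point rather than a finished converter.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch only: decode UTF-8 bytes into UTF-16 code units.
// Malformed or truncated sequences become U+FFFD. Overlong forms and
// out-of-range values are NOT rejected.
std::u16string utf8ToUtf16(const std::string& in) {
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // code point being assembled
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }  // invalid lead byte

        // Input ends before the sequence is complete.
        if (i + extra >= in.size()) { out.push_back(kReplacement); break; }

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(kReplacement); ++i; continue; }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Outside the BMP: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

On a real device, the result would still have to be copied into whatever wide-character type the platform API expects.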

XML gives us a nice opportunity to use ASCII-compatible text files almost anywhere and anytime, even the same files on both desktop and WinCE. All you need to do is use UTF-8 encoding and, probably, additional fonts for those code pages that are not included in the predefined set; for example, Hebrew or Arabic. If you port your application to Windows CE, XML will give you a consistent solution. Data size may be a painful point because XML can hardly be called a 'lightweight' technology, but a reasonably balanced XML structure can always be found. We will discuss this issue later in the article.

On early Pocket PC devices, there was (and still is) an XML parser that is part of the HTML control. More recent Windows CE versions come with a DOM XML parser of at least version 2.0. A SAX parser is supported only in Windows CE 4.0 and later. As with many other APIs, these parsers are not as rich as their desktop counterparts, but they offer enough nice features. XPath queries are also supported, even though the documentation often states the opposite.

How It Works in Native Code

Simple parser

First of all, let's consider the worst case: Pocket PC 2000, where the only option is a simple SAX-like XML parser. It was deprecated in MSXML 3.0 (under WinCE 4.0 and later), but you may still find it on Pocket PC 2000/2002. I was surprised that I failed to find any understandable examples of how to use it. So, let's take a quick look at the general flow.

Actually, it all seems pretty simple. In short, the theory is as follows:

Create an instance of IXMLParser

Tell the parser object where to read its data from

Call the SetFactory method

Call the Run method to start parsing

Release the parser

The key trick here is that you need to implement IXMLNodeFactory to be able to parse anything. The parser then calls the IXMLNodeFactory methods for each document node, just like SAX or expat. The next sample illustrates all said above:

As you may see, the parser can take its input from several sources: a URL, a file (via the IStream interface), or a memory buffer. Please refer to WinCE help for additional details. After the parser's input and node factory are assigned, the Run method does its job. Now, let's focus on the node factory implementation. The header file is listed below:

As a matter of fact, that's about all the important technical stuff about this type of parser. All additional details about the interfaces involved can be found in WinCE help. The pros and cons of this parser are similar to those of SAX versus DOM parsers. Obviously, it's faster; you may stop parsing at any time you want; memory usage is lower; and so forth. On the other hand, DOM has a lot of nice features too.
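The callback flow described above can be modeled in a few lines of portable C++. This is only a sketch of the idea, not the WinCE API: NodeHandler, runToyParser, and NameCollector are hypothetical names invented here. NodeHandler plays the role that IXMLNodeFactory plays for the real parser, and the toy parser understands nothing beyond plain start tags, end tags, and text (no attributes, comments, or entities). It does show the two properties mentioned above: nodes are delivered one at a time without building a tree, and the handler can stop parsing whenever it wants.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the role IXMLNodeFactory plays:
// the parser calls these methods as it meets each node.
struct NodeHandler {
    virtual ~NodeHandler() {}
    virtual bool beginElement(const std::string& name) = 0;  // return false to stop parsing
    virtual void text(const std::string& data) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy parser: handles only <tag>, </tag>, and character data.
// Returns true if the whole input was consumed, false if the handler
// asked to stop or a tag was left unterminated.
bool runToyParser(const std::string& xml, NodeHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return false;  // unterminated tag
            const std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/') {
                handler.endElement(tag.substr(1));
            } else if (!handler.beginElement(tag)) {
                return false;  // handler requested an early stop
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.text(xml.substr(pos, next - pos));
            pos = next;
        }
    }
    return true;
}

// Example handler: records element names in document order and
// optionally stops as soon as it sees the element named in stopAt.
struct NameCollector : NodeHandler {
    std::vector<std::string> names;
    std::string stopAt;  // empty means "never stop"
    bool beginElement(const std::string& name) override {
        names.push_back(name);
        return stopAt.empty() || name != stopAt;
    }
    void text(const std::string&) override {}
    void endElement(const std::string&) override {}
};
```

A handler that only needs one value can return false from beginElement once it has found it, which is exactly the kind of early exit a DOM load cannot offer.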

DOM parser

As a practical programmer, I tend to think of DOM as based on three main interfaces: IXMLDOMNode, IXMLDOMNodeList, and IXMLDOMNamedNodeMap. There are, of course, several important interfaces derived from these; IXMLDOMDocument and IXMLDOMElement are among the most useful. With them, loading an XML document is typically pretty simple:

As you see, all you need to do is call the load method and, well, use ATL's smart pointers to make your life easier. I'd like to mention several things here.

First of all, pay close attention to the three 'put' calls, which disable additional processing. This is the fastest way to load an XML document.

Another side effect of this approach is that you may save on memory usage. It's well known that a DOM parser is expensive from a memory perspective. You should always keep this constraint in mind when working under Windows CE, even though recent devices have enough available memory to satisfy almost all possible needs. For large XML files, however, it may be a significant issue. If you're in such a situation, you may consider using SAX-like parsers. Another option is to balance the data/tags ratio, which may immediately reduce the file size to an acceptable value. Sometimes, using UCS-2 instead of UTF-8 may help too, when your app works with languages whose characters need three bytes each in UTF-8.
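The encoding trade-off is easy to check with a little arithmetic. The sketch below (helper names utf8Bytes and ucs2Bytes are made up for this illustration) counts how many bytes a piece of text needs in each encoding. It assumes the text stays within the Basic Multilingual Plane, which is all UCS-2 can represent anyway, and it ignores byte-order marks and the XML declaration.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store BMP-only text as UTF-8:
// 1 byte below U+0080, 2 bytes below U+0800, 3 bytes otherwise.
std::size_t utf8Bytes(const std::u16string& text) {
    std::size_t n = 0;
    for (char16_t u : text) {
        if (u < 0x80)       n += 1;
        else if (u < 0x800) n += 2;
        else                n += 3;
    }
    return n;
}

// Bytes needed to store the same text as UCS-2: always 2 per character.
std::size_t ucs2Bytes(const std::u16string& text) {
    return text.size() * 2;
}
```

Three CJK characters take 9 bytes in UTF-8 and 6 in UCS-2, but wrapping them in a short ASCII tag pair such as <n>...</n> gives 16 bytes against 20, because every markup character doubles in size. So UCS-2 pays off only when the character data clearly outweighs the tags, which is one more reason to watch the data/tags ratio.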

Next, speaking frankly, you should use smart pointers with XML 'smartly': they don't provide casting between related interfaces, so you won't be able to use, for example, IXMLDOMNodePtr and IXMLDOMElementPtr (inherited from IXMLDOMNode) interchangeably. Also, the XML parser sometimes behaves weirdly and just removes the XML file it is going to load. A workaround is to read the file manually into a buffer, and then use the

IXMLDOMDocument::loadXML(BSTR bstrXML,VARIANT_BOOL *isSuccessful)

method instead. Keeping this in mind, all the rest is a piece of cake. Suppose we have the following simple XML:

To obtain some data from a loaded XML document, you may use either XPath queries or 'direct' walking through the document tree using IXMLDOMNode::get_firstChild and IXMLDOMNode::get_nextSibling (or their 'last' and 'previous' analogues for the opposite direction). Tree walking is the fastest method of traversing the whole document. Nevertheless, if you need to find some data in a document tree (I guess it's the most common case), XPath works much better. IXMLDOMDocument has two methods to use with XPath: selectSingleNode and selectNodes, to get one or several nodes, respectively. So, to get all contacts in the "business" category, you run the following query:

You'll find other examples of XPath queries in WinCE Help. By the way, many programmers are lazy and use the "//" operator instead of the full path in queries. You should know that it doesn't come for free; it may cost you up to 15% of your performance. In reality, it's hard to compare XPath and tree walking precisely. get_firstChild and the others lead to a huge number of COM calls. selectSingleNode does the job in a single shot and does much less walking because it may skip a lot of text nodes. On the other hand, selectSingleNode and the like need to compare each node with a matching pattern. So, the actual performance depends on the XML document's structure. In general, though, XPath queries give you better performance.
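To make the cost of tree walking more tangible, here is a small portable model of it. This is not the MSXML API: Node is a hypothetical in-memory structure whose firstChild and nextSibling links mirror the shape of IXMLDOMNode::get_firstChild and get_nextSibling, and the element name "contact" and the category field are assumptions made for this illustration. The function collects all contacts of a given category and counts every node it touches along the way.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory node; only the navigation shape
// (first child, next sibling) mirrors the DOM interfaces.
struct Node {
    bool isText;           // true for text nodes, including whitespace between tags
    std::string name;      // element name (unused for text nodes)
    std::string category;  // stand-in for a "category" attribute
    Node* firstChild;
    Node* nextSibling;
};

// Depth-first walk over a node and all of its following siblings.
// Every hop below would be a separate COM call in the real DOM,
// and text nodes cost a visit even though they are skipped.
void collectByCategory(const Node* node, const std::string& category,
                       std::vector<const Node*>& out, int& visited) {
    for (const Node* n = node; n != nullptr; n = n->nextSibling) {
        ++visited;
        if (n->isText) continue;
        if (n->name == "contact" && n->category == category) {
            out.push_back(n);
        }
        collectByCategory(n->firstChild, category, out, visited);
    }
}
```

The visited counter is the interesting part: in a document formatted with line breaks and indentation, a good share of the hops land on whitespace text nodes that the caller never wanted to see.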

The next factor that influences parsing performance is validation. The bottom line here is that, by skipping validation, you may load an XML document two to three times faster.

If you're developing some kind of Web application, or an application that needs to create different reports based on XML data, you have one more option to think about: XSL. In some situations, it builds the output 5-7 times faster than the trivial sequential XPath approach. We will not dive into too many details here; a simple example is enough to illustrate the idea:

XSL itself is a topic for many books, so here let me just note that it gives you a nice opportunity to modify the look and feel of the output without any changes in the application logic. So once again, separating data from logic works just fine.

New Times

Windows CE 4.x is where C# comes to the mobile world. It still has some performance troubles, but it's the only way to use managed code under WinCE for now. C# has powerful support for XML, so the preceding DOM examples may easily be rewritten in C# because nothing has changed in terms of XML. Following is an example of simple XML parsing. This sample does not pretend to be a model of good programming; it just illustrates the technique. The code calls a Web service and then fills a list box with the retrieved data.

Conclusion

It does not matter which version of WinCE your application targets. XML is becoming a common technique everywhere, so there are fewer and fewer reasons not to use it.

About the Author

Alex Gusev started to play with mainframes at the end of the 1980s, using Pascal and REXX, but soon switched to C/C++ and Java on different platforms. When mobile PDAs seriously raised their heads in the IT market, Alex did too. Now, he works at an international retail software company as a team leader of the Mobile R&D department, making programmers' lives in the mobile jungles a little bit simpler.
