Tuesday, May 20, 2008

In this post a quick way to process XML files using a SAX-like method in SNOBOL4 is presented.

I couldn't find a tool for XML parsing or processing in SNOBOL4, so I decided to try to create one as a way to learn more about the language. I chose a SAX-style method because it is simpler to implement and lets me focus on the text-processing part of the code.

Since SNOBOL4 consumes its input line by line, a function was created to flatten all the input into a single string. This removes the need to handle line breaks in the patterns, but it also makes the code very inefficient, since it builds one big line holding the entire contents of the XML file.
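The flattening step can be illustrated with a short Python sketch. This is not the SNOBOL4 function from the post; the name `flatten` and the choice to join lines with a single space are assumptions made for the illustration.

```python
import io


def flatten(stream):
    """Read a text stream line by line and join it into one string.

    Hypothetical helper: the whole document ends up in memory at once,
    which is the inefficiency mentioned above.
    """
    parts = []
    for line in stream:
        parts.append(line.rstrip("\r\n"))
    # Assumption: a single space replaces each line break so that words
    # on adjacent lines do not fuse together.
    return " ".join(parts)


sample = io.StringIO("<uno>\n  <dos id='3'>asdf</dos>\n</uno>\n")
flat = flatten(sample)
print(flat)
```

After this step the matching code never sees a line break, at the cost of holding the full file in one string.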

This code uses the iPos variable to keep track of the position in the string where the last XML structure was matched. The '@' operator followed by a variable name stores the current cursor position in the input string into that variable at that point in the match.

Each part of this function, marked by the labels XmlDirectiveL, TagStartL, EndTagL, BlanksL and TextL, matches one kind of XML structure and, where appropriate, calls one of the callback functions passed in the fTStart, fTEnd and fText parameters. The call is made with the APPLY function.
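As a rough illustration of the same idea in Python (not the SNOBOL4 code itself), the sketch below keeps a cursor `pos` in the role of iPos, tries one pattern per kind of structure, and calls the callbacks `on_start`, `on_end` and `on_text`, which stand in for fTStart, fTEnd and fText. All Python names are hypothetical. Comments, CDATA sections, DOCTYPE declarations and entities are not handled, and reporting a self-closing tag as a start followed by an end is a choice made for this sketch.

```python
import re

# A quoted attribute value, with either kind of quote.
_VALUE = r"""(?:"[^"]*"|'[^']*')"""

# One pattern per kind of structure, named after the labels in the post.
DIRECTIVE = re.compile(r"<\?.*?\?>", re.DOTALL)
START_TAG = re.compile(
    r"<([A-Za-z_][\w.:-]*)((?:\s+[\w.:-]+\s*=\s*" + _VALUE + r")*)\s*(/?)>"
)
END_TAG = re.compile(r"</([A-Za-z_][\w.:-]*)\s*>")
BLANKS = re.compile(r"\s+")
TEXT = re.compile(r"[^<]+")
ATTRIBUTE = re.compile(r"([\w.:-]+)\s*=\s*(" + _VALUE + r")")


def parse_attributes(raw):
    """Turn the raw attribute text of a start tag into a dict."""
    attrs = {}
    for m in ATTRIBUTE.finditer(raw):
        attrs[m.group(1)] = m.group(2)[1:-1]  # drop the surrounding quotes
    return attrs


def scan(text, on_start, on_end, on_text):
    """Walk the flattened input, calling one callback per structure."""
    pos = 0  # plays the role of iPos: where the last match ended
    while pos < len(text):
        m = DIRECTIVE.match(text, pos)
        if m:
            pos = m.end()
            continue
        m = END_TAG.match(text, pos)
        if m:
            on_end(m.group(1))
            pos = m.end()
            continue
        m = START_TAG.match(text, pos)
        if m:
            on_start(m.group(1), parse_attributes(m.group(2)))
            if m.group(3):  # self-closing tag such as <cinco id="42"/>
                on_end(m.group(1))
            pos = m.end()
            continue
        m = BLANKS.match(text, pos)
        if m:
            pos = m.end()
            continue
        m = TEXT.match(text, pos)
        if m:
            on_text(m.group(0))
            pos = m.end()
            continue
        raise ValueError("unrecognized input at position %d" % pos)


events = []
scan(
    "<?xml version='1.0'?><uno><dos id='3'>asdf</dos></uno>",
    lambda name, attrs: events.append(("start", name, attrs)),
    lambda name: events.append(("end", name)),
    lambda s: events.append(("text", s)),
)
print(events)
```

Passing the callbacks as parameters is what keeps the scanning loop independent of any particular use of the data.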

XML Test
Into uno id=
Into dos id=3
Text: asdf
Into tres id=4
Text: h hh
Out of tres
Into cuatro id=
Text: iasdl
Out of cuatro
Into cinco id=42
Out of dos
Out of uno

The benefit of using a SAX-like approach is that the code can be reused in other programs. For example, the following program prints all the links and titles from an OPML file exported from Google Reader.
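To show the shape of such a program without the SNOBOL4 listing, here is a minimal Python sketch that uses the standard `xml.sax` module instead of the parser from this post. It assumes each feed is an `outline` element carrying `xmlUrl` and `title` (or `text`) attributes, and that folders are `outline` elements without `xmlUrl`. The class name, function name and sample data are hypothetical.

```python
import xml.sax


class OutlineHandler(xml.sax.ContentHandler):
    """Collect (title, link) pairs from the outline elements of an OPML file."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def startElement(self, name, attrs):
        # Assumption: folders are outline elements without an xmlUrl
        # attribute, so only real feeds are recorded.
        if name == "outline" and "xmlUrl" in attrs:
            title = attrs.get("title") or attrs.get("text", "")
            self.feeds.append((title, attrs["xmlUrl"]))


def list_feeds(opml_text):
    handler = OutlineHandler()
    xml.sax.parseString(opml_text.encode("utf-8"), handler)
    return handler.feeds


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline title="Languages" text="Languages">
      <outline text="Example Blog" title="Example Blog" type="rss"
               xmlUrl="http://example.com/feed.xml"
               htmlUrl="http://example.com/"/>
    </outline>
  </body>
</opml>"""

for title, link in list_feeds(SAMPLE):
    print(title, link)
```

Only the start-tag callback is needed here, since all the interesting data lives in attributes.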

Here the multiple results generated by the %match construct are used to fill the list with all the rows.

Something that might be confusing is that the %match pattern seems to look for a single item with title as its only child, because no _* constructs appear in it. This is specific to the Xml literal syntax: as the documentation says, implicit _* constructs are added between Xml literals.
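The effect of those implicit _* constructs can be mimicked in plain Python. The sketch below is not Tom code and only illustrates the semantics: at each level, any number of siblings may come before or after the element being matched, and every successful match yields one row. The element names item and title come from the description above; the root element name and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET


def match_rows(root):
    """Yield the title text of every item child that has a title child."""
    for item in root:            # like _* <item> ... </item> _*
        if item.tag != "item":
            continue
        for child in item:       # like _* <title> ... </title> _*
            if child.tag == "title":
                yield child.text


doc = ET.fromstring(
    "<rows>"
    "<other/>"
    "<item><id>1</id><title>A</title></item>"
    "<item><title>B</title><extra/></item>"
    "<item><id>3</id></item>"
    "</rows>"
)
rows = list(match_rows(doc))
print(rows)
```

The third item produces no row because it has no title child, while the first two match even though title is not their only child.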

Mapping categories

The categories method maps each category and is equivalent to an XSLT template that matches a category.

Although I'm not a big fan of Xml literals, they are a nice way to create a new tree compared to using the W3C DOM classes. Xml literals are available in several languages today, such as Scala and Visual Basic 9. A nice alternative is Groovy Builders, which provide a clean syntax for creating tree structures that is independent of the backend.

One thing that was missing (at least from the documentation) was direct support for Xml namespaces, which are useful when working with multiple Xml Schemas from different sources.

It is possible to use Tom object mappings (explained here) to get a similar effect.

Example

For this example, an alternative representation for a Complex number will be created. A complex number can be represented either by Cartesian coordinates (real and imaginary parts) or by Polar coordinates (angle and modulus).

Finally, a mapping for the Polar representation is defined. Since the Complex class stores the number in Cartesian coordinates, a conversion must be applied to get the angle and modulus slots. The polar2Complex method is used to create a Complex instance from the Polar symbol.
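The conversions behind such a mapping are the usual ones: modulus = sqrt(re^2 + im^2) and angle = atan2(im, re) in one direction, re = modulus * cos(angle) and im = modulus * sin(angle) in the other. The Python sketch below only illustrates that arithmetic and is not the Tom mapping. It uses Python's built-in complex type in place of the Complex class, and the function names are hypothetical, with `polar_to_complex` playing the role of polar2Complex.

```python
import math


def get_angle(z):
    """Angle slot, computed from the stored Cartesian parts."""
    return math.atan2(z.imag, z.real)


def get_modulus(z):
    """Modulus slot, computed from the stored Cartesian parts."""
    return math.hypot(z.real, z.imag)


def polar_to_complex(angle, modulus):
    """Build a Cartesian complex number from polar coordinates."""
    return complex(modulus * math.cos(angle), modulus * math.sin(angle))


z = complex(3.0, 4.0)
angle, modulus = get_angle(z), get_modulus(z)
print(angle, modulus)
print(polar_to_complex(angle, modulus))
```

Because the conversion goes through floating-point trigonometry, a round trip returns a value close to the original rather than a guaranteed exact copy.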