Tag: XML

I recently received a bug report about the free space calculation in gnome-vfs-obexftp. At the moment, the code exposes a single free space value for the whole OBEX connection. However, some phones expose multiple volumes through the virtual file system they present over OBEX.

It turns out my own phone does this, which was useful for testing. The Nokia 6230 can store things either in the phone’s memory (named DEV in the OBEX capabilities list) or on the Multimedia Card (named MMC). So the fix would be to show the DEV free space when browsing folders on DEV and the MMC free space when browsing folders on MMC.

Doing a bit of investigation, I found that the information I wanted was in the folder listings:

I took a look through the OBEX specification, and the mem-type attribute wasn’t defined there. So it looked like a Nokia extension. Doing a quick search, the closest I came to a description of it was US patent application #20060095537.

So Nokia is effectively trying to patent an XML attribute. I’d seen a lot of bad patents, but this seemed particularly weak. The patent application even comes with a useful diagram explaining the invention:

As far as I can tell, using the information returned from the phone wouldn’t be covered by the patent (if it gets issued, that is). So it should be fine to use the information to calculate free space more accurately.
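As a rough illustration of the per-volume lookup described above, here is a minimal Python sketch. Only the mem-type attribute and the DEV and MMC volume names come from the post. The listing layout, the folder names, the free space figures and both function names are assumptions made up for the example, and this is not the actual gnome-vfs-obexftp code.

```python
import xml.etree.ElementTree as ET

# Hypothetical folder listing: the mem-type attribute is the one discussed
# in the post, but the folder names and overall layout are assumed.
LISTING = """<folder-listing version="1.0">
  <folder name="Images" mem-type="DEV"/>
  <folder name="Memory card" mem-type="MMC"/>
</folder-listing>"""

# Hypothetical free space per volume, in bytes (made-up numbers).
FREE_SPACE = {"DEV": 2_500_000, "MMC": 120_000_000}


def folder_volumes(listing_xml):
    """Map each folder name in the listing to its mem-type value."""
    root = ET.fromstring(listing_xml)
    return {f.get("name"): f.get("mem-type") for f in root.iter("folder")}


def free_space_for(folder, volumes, free_space, default=None):
    """Return the free space of the volume that holds the given folder."""
    return free_space.get(volumes.get(folder), default)


volumes = folder_volumes(LISTING)
print(free_space_for("Images", volumes, FREE_SPACE))
print(free_space_for("Memory card", volumes, FREE_SPACE))
```

A folder with no mem-type, or one that is not in the listing at all, falls back to the default, which is where a single connection-wide value could still be used.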

Started playing with nxml-mode, which makes editing XML much nicer in Emacs (psgml-1.3 does an okay job, but the indenter and tag closer sometimes get confused by empty elements). There is a nice article about nxml-mode on xmlhack which gives an introduction to the mode.

The first thing that struck me about nxml in comparison to psgml was the lack of syntax highlighting. It turned out that the reason for this was that colours were only specified for the light background case, and I was using a dark background. After setting the colours appropriately (customise faces matching the regexp ^nxml-), I could see that the highlighting was a lot better than what psgml did.

One of the big differences between nxml and psgml is that nxml uses RELAX-NG schemas rather than DTDs. It comes with schemas for most of the common formats I want to edit (XHTML, DocBook, etc), but I also wanted to edit documents in a few custom formats (the module description files I use for jhbuild being a big one).

Writing RELAX-NG schemas in the compact syntax is very easy to do (the tutorial helps a lot). I especially like the interleave feature, since it makes certain constraints much easier to express (in a lot of cases, your code doesn’t care what order the child elements occur in, as long as particular ones appear). While it is possible to express the same constraint without the interleave operator, you end up with a combinatorial explosion (I guess that’s why XML Schema people don’t like RELAX-NG people making use of it). For example, A & B & C would need to be expressed as:

(A, B, C) | (A, C, B) | (B, A, C) | (B, C, A) | (C, A, B) | (C, B, A)

(for n interleaved items, you’d end up with n! groups in the resulting pattern).
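To make the growth concrete, the expansion can be generated mechanically. The following is a small Python sketch (the function name expand_interleave is made up for the example) that rewrites an interleave of named patterns as a choice between every ordering:

```python
from itertools import permutations


def expand_interleave(items):
    """Rewrite an interleave of patterns as a choice of ordered groups."""
    groups = ["(" + ", ".join(order) + ")" for order in permutations(items)]
    return " | ".join(groups)


# Three items give the six groups listed above.
print(expand_interleave(["A", "B", "C"]))

# The number of groups grows factorially: 1, 2, 6, 24, 120, ...
for n in range(1, 6):
    names = [chr(ord("A") + i) for i in range(n)]
    print(n, len(list(permutations(names))))
```

This only covers the simple case where each interleaved item is a single required pattern; optional or repeated children would make the expanded form larger still.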

After writing a schema, it was a simple matter of dropping a schemas.xml file in the same directory as my XML documents to associate the schema with the documents. This is required because RELAX-NG doesn’t specify a way to associate a schema with a document, so nxml has its own method. Matching rules can be based on file extensions, document element names, XML namespaces or public IDs, but I used the document element name for simplicity. You can specify other locations for schema locator rules, but putting the file in the same directory is the easiest option when multiple developers work on the documents.

Once that is done, you get background revalidation of the document, and highlighting of invalid portions of the document (something that psgml doesn’t seem to be able to do). It also says whether the document is valid or not in the modeline, which is helpful when editing documents.

Now all we need is for libxml2 to be able to parse RELAX-NG compact syntax schemas …

Have been playing round with Atom, which looks like a nicer form of RSS. Assuming your content is already in XHTML, it looks a lot easier to generate an Atom file compared to an RSS file, because the content can be embedded directly, rather than needing to be escaped as character data. Similarly, an Atom file is easier to process using standard XML tools compared to RSS because the document only needs to be parsed once to get at the content (which is probably what you were after anyway).

I decided to take a look at what would be necessary to get Advogato to produce nice Atom feeds. One of the difficulties is that all the content is stored as plain HTML that is not XML compatible. After a little bit of head scratching I realised that libxml can already do this kind of normalisation without much trouble, as it already has an HTML parser that produces a DOM tree compatible with its XML parser/dumper APIs.

I wrote some simple test programs in Python and C. I wonder whether code like this could be used directly in the diary posting code. With some small extensions, it would be pretty easy to implement tag/attribute sanitisation and conversion of double newlines to new paragraphs (the current implementation of this is quite annoying: it still adds extra <p> tags for new lines that are clearly outside of a paragraph).
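The test programs themselves are not reproduced here. As a rough stand-in, the sketch below shows the same parse-as-HTML, dump-as-XML idea using only the Python standard library (html.parser and xml.etree.ElementTree) instead of libxml2. The class and function names, the wrapping div element and the list of void elements are all assumptions made for the example. Unlike a real HTML parser such as libxml2's, it does not apply HTML's implicit closing rules (for instance, a second <p> does not close the first), so treat it as an illustration only.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML (partial list, assumed).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Normaliser(HTMLParser):
    """Build an XML tree from sloppy HTML, closing whatever is left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        self.builder.start("div", {})  # wrapper so there is a single root

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") need a value in XML.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        self.builder.start(tag, attrib)
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def result(self):
        self.close()
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("div")
        return self.builder.close()


def html_to_xml(fragment):
    """Return a well-formed XML serialisation of an HTML fragment."""
    parser = Normaliser()
    parser.feed(fragment)
    return ET.tostring(parser.result(), encoding="unicode")


print(html_to_xml("<p>Hello <b>world<br> &amp; friends"))
```

Because the output is well-formed, it can be parsed again with any XML tool, which is what embedding the content directly in an Atom feed needs. Tag/attribute sanitisation could be added in handle_starttag by checking against a whitelist before calling the builder.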