Summary
Wouldn't it be nice if there existed a standard for XML when you don't need the whole thing? I propose a strict subset of XML called XML minus minus (XML--).


Often in my work I find that XML would be very handy (e.g. for configuration files or data serialization), but that I only need a fraction of the specification. In these cases it doesn't make sense to embed a gargantuan, industrial-strength XML parser in my code, so I usually use a homegrown partial XML parser.

Inventing a brand new markup language is one possibility I have explored (such as Labelled S-Expressions), but to be honest I don't think the idea will take off. Many markups exist, and people prefer things that are already well-known (and well marketed) such as XML.

I want to propose a specification for a strict subset of XML called XML--. The specification is the same as that for XML 1.0 (third edition) but with the following restrictions:

no attributes

no CDATA sections

no processing instructions

no document type definitions

the standalone document declaration MUST have the value "yes"

the encoding must be UTF-8

support only for the entities: &amp; &gt; &lt; &quot; and &apos;
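To make the restrictions above concrete, here is a minimal sketch of what a homegrown XML-- reader might look like in Python. It is not a conforming implementation. The names `parse` and `decode`, the ASCII-only tag name rule, the skipping of comments, and the dropping of whitespace-only text are all assumptions made for illustration. The XML declaration is skipped without being checked.

```python
import re

ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# A token is a comment, any other "<...>" construct, or a run of character data.
TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|[^<]+", re.DOTALL)
NAME = r"[A-Za-z_][A-Za-z0-9_.-]*"  # simplified ASCII-only name rule (assumption)
START_TAG = re.compile(rf"<({NAME})\s*(/?)>")
END_TAG = re.compile(rf"</({NAME})\s*>")


def decode(text):
    """Expand the five permitted entities and reject any other reference."""
    def expand(match):
        name = match.group(1)
        if name not in ENTITIES:
            raise ValueError(f"reference not allowed in XML--: &{name};")
        return ENTITIES[name]
    return re.sub(r"&([^;]*);", expand, text)


def parse(source):
    """Parse XML-- text into nested (name, children) tuples."""
    # Skip an optional XML declaration; this sketch does not check its contents.
    source = re.sub(r"^\s*<\?xml\s[^>]*\?>", "", source, count=1)
    document = ("#document", [])
    stack = [document]
    for token in TOKEN.findall(source):
        if token.startswith("<!--"):
            continue  # comments are not on the restriction list, so skip them
        if not token.startswith("<"):
            if token.strip():  # whitespace-only text is dropped for simplicity
                stack[-1][1].append(decode(token))
            continue
        if token.startswith("<?"):
            raise ValueError("processing instructions are not allowed")
        if token.startswith("<![CDATA["):
            raise ValueError("CDATA sections are not allowed")
        if token.startswith("<!"):
            raise ValueError("document type definitions are not allowed")
        end = END_TAG.fullmatch(token)
        if end:
            if stack[-1][0] != end.group(1):
                raise ValueError(f"unexpected end tag: {token}")
            stack.pop()
            continue
        start = START_TAG.fullmatch(token)
        if not start:
            raise ValueError(f"attributes or malformed tag: {token}")
        node = (start.group(1), [])
        stack[-1][1].append(node)
        if not start.group(2):  # not self-closing, so it becomes the context
            stack.append(node)
    if len(stack) != 1 or len(document[1]) != 1 or isinstance(document[1][0], str):
        raise ValueError("expected exactly one balanced root element")
    return document[1][0]
```

With attributes, CDATA, processing instructions and DTDs gone, the whole reader fits in a few dozen lines, which is the point of the proposal.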

This work is inspired by TinyXML. The main difference, though, is that XML-- does not support attributes. I wonder what more could be done to make this idea into a viable specification with actual users?

Postscript: Justification for Dropping Attributes

I should offer a quick justification. Attributes are dropped for several reasons:

What about namespaces? They are responsible for much of XML's bloat*. Plus, TinyXML does not support them.

The problem with eliminating attributes is that the result tends to be very big. You're likely to have no empty elements, so every tag will be repeated (<tag>...</tag> instead of <tag/>). So the file can be roughly twice as large as the equivalent XML with many attributes.

I still like this idea, though, but only if the specification includes a compression recommendation (XML-- will ZIP pretty well) and also eliminates namespaces.
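The size and compression claims are easy to check for a given data set. The sketch below compares an attribute-heavy document with its element-only equivalent, raw and after `zlib` compression. The record layout (`user`, `login`, `host`) and the synthetic values are hypothetical, so the numbers it prints say nothing about real files.

```python
import zlib


def attribute_form(count):
    """Build a synthetic document that stores each field as an attribute."""
    rows = "".join(f'<user login="u{i}" host="h{i}"/>' for i in range(count))
    return f"<users>{rows}</users>".encode("utf-8")


def element_form(count):
    """Build the same synthetic data with every field as a child element."""
    rows = "".join(
        f"<user><login>u{i}</login><host>h{i}</host></user>" for i in range(count)
    )
    return f"<users>{rows}</users>".encode("utf-8")


def sizes(data):
    """Return (raw size, zlib-compressed size) in bytes."""
    return len(data), len(zlib.compress(data, 9))


if __name__ == "__main__":
    for label, data in (
        ("attributes", attribute_form(1000)),
        ("elements", element_form(1000)),
    ):
        raw, packed = sizes(data)
        print(f"{label}: {raw} bytes raw, {packed} bytes compressed")
```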

Noam.

* I realise that namespaces are essential for large applications and vendor interoperability. They are still bloated, however.

/* The problem with eliminating attributes is that the result tends to be very big. You're likely to have no empty elements, so every tag will be repeated ( <tag>...</tag>, instead of <tag/> ). So file size can be ~twice as large as XML with many attributes.*/

I think part of the bloat CDiggins was referring to is in the parser (needing to parse two separate syntaxes).

However, you are very right that start and end tags do lead to unnecessarily bloated files that need parsing.

I love the idea of removing features. Many of these are things many people don't know about, or use. Removing them makes things simpler, more predictable, and generally better.

Attributes are a convenient syntax but a useless model because, as pointed out, you can trivially rewrite them as elements.

So how complicated is the rewrite?

We already agree that <p></p> is the same as <p/>, why not agree that:

<a href="blah">...</a>

Is the same as:

<a><href>blah</href>...</a>
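As a rough illustration of how mechanical that rewrite is, here is a sketch using Python's standard `xml.etree.ElementTree`. The function name `attributes_to_elements` is made up for this example, and placing the new children ahead of the existing content is an assumption taken from the example above.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Rewrite every attribute (recursively) as a leading child element."""
    for child in list(elem):
        attributes_to_elements(child)
    if not elem.attrib:
        return
    new_children = []
    for name, value in elem.attrib.items():
        child = ET.Element(name)
        child.text = value
        new_children.append(child)
    # The element's leading text must now come after the new children,
    # so it moves to the tail of the last one.
    new_children[-1].tail = elem.text
    elem.text = None
    for index, child in enumerate(new_children):
        elem.insert(index, child)
    elem.attrib.clear()


root = ET.fromstring('<a href="blah">...</a>')
attributes_to_elements(root)
print(ET.tostring(root, encoding="unicode"))
```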

Don't attributes parse as something like:

(\s+\w+=("|')[^"']*\2)*>
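For what it's worth, a pattern along those lines does work in practice. The sketch below uses Python's `re` module. The names `ATTRIBUTE` and `parse_attributes` are illustrative only, and the name rule is looser than the one in the XML specification.

```python
import re

# One attribute: whitespace, a name, "=", then a value in matching quotes.
# Group 2 captures the opening quote and \2 requires the same closing quote.
ATTRIBUTE = re.compile(r"""\s+([\w:.-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)


def parse_attributes(tag_rest):
    """Return a dict of attributes from the text that follows the tag name."""
    return {m.group(1): m.group(3) for m in ATTRIBUTE.finditer(tag_rest)}


print(parse_attributes(' href="blah" title=\'a "quoted" word\''))
```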

Removing the distinction between attributes and elements removes a decision people would otherwise have to make designing schemas. Keeping the syntax maintains the benefits others have described (size and convenience).

When using XML for config, I've been doing something close to what you propose. In languages with reflection (C#, Java) this works quite well. When you encounter something like:

<something>somevalue</something>

this means that in the current context we look for a member named "something". Once found, we see whether it's a property, field or function. We then either call the function with 'somevalue' as its parameter or set the field or property to 'somevalue'. How to pass somevalue (as a string or as something else) is decided by requesting the member's type information and asking it for a converter from string to the appropriate type (e.g. Color, Integer, ...). This severely limits the allowed XML, approximately to what you propose. When needed, you can always extend the parser with more complex syntax (for example, I did something for accessing arrays and other indexed properties).

By nesting, you can "change context", e.g.:

<myobject><something>somevalue</something><someother>12</someother></myobject>

would set two properties in the object myobject. This behaviour is triggered by the content being more XML.

This worked very well in C#, but when I tried something like that in Python, I bumped into the fact that Python is equipped with excellent reflection, but objects don't have static types so you cannot decide whether you should set the someother property to a string or an integer.
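For illustration, the reflection scheme described above can be approximated in Python if one assumes the target classes carry type annotations, which then stand in for the static types that C# and Java provide. This is only a sketch under that assumption, not what was actually built. The classes `Inner` and `Config` and the function `apply` are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import get_type_hints


@dataclass
class Inner:
    something: str = ""
    someother: int = 0


@dataclass
class Config:
    myobject: Inner = field(default_factory=Inner)


def apply(elem, target):
    """Set members of target from the child elements of elem."""
    hints = get_type_hints(type(target))
    for child in elem:
        if len(child):
            # The content is more XML, so the named member becomes the context.
            apply(child, getattr(target, child.tag))
        else:
            # The annotation doubles as the string-to-type converter.
            convert = hints[child.tag]
            setattr(target, child.tag, convert(child.text or ""))


config = Config()
apply(
    ET.fromstring(
        "<config><myobject><something>somevalue</something>"
        "<someother>12</someother></myobject></config>"
    ),
    config,
)
print(config)
```

Using the annotation as the converter only works for types whose constructor accepts a string, such as `str`, `int` and `float`. Anything richer would need an explicit converter table.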

In Python I took the Pythonic solution for configuration: write config files in Python. Anyone will understand and be able to modify a config file that says:

color = "red"
users = {"pete": "Peter Cambell", "claus": "Santa Claus"}

In fact, I find this syntax much more readable (and writable) than XML. XML may be ASCII printable, but it certainly is not human readable.
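One way such a file might be loaded is sketched below. The helper name `load_config` is an assumption, and the config text is inlined here rather than read from disk. Note that this executes the file as code, so it is only suitable for config files you trust.

```python
CONFIG_SOURCE = '''
color = "red"
users = {"pete": "Peter Cambell", "claus": "Santa Claus"}
'''


def load_config(source):
    """Execute Python config text and return its top-level names as a dict."""
    namespace = {}
    exec(compile(source, "<config>", "exec"), namespace)
    # Drop names such as __builtins__ that exec adds to the namespace.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


settings = load_config(CONFIG_SOURCE)
print(settings["color"], settings["users"]["claus"])
```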

> Attributes are in fact very useful in XML, when using it
> to merge data with existing data.
>
> Suppose I send this XML to some business logic component
> that manages my data. I never explained it anything about
> my table structure, but it can still decide what to do:
>
> <user login="mike" host="mymachine">
> <fullname>Mike Looijmans</fullname>
> <occupation>Developer</occupation>
> </user>
>
> This instructs my data storage to:
> - Lookup a user "mike" on host "mymachine"
> - If it does not exist, create a new entry for it
> - Update the fullname and occupation for that entry.

I need to point out that the XML record only instructs the data storage because the data storage has already decided what the login and host attributes mean. It could just as easily have read that data as XML elements. It all depends on how you want to interpret the data.
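To illustrate, here is a sketch of the same lookup-or-create-then-update step driven by an element-only record, where the receiving side decides which fields form the key. The `KEY_FIELDS` tuple, the `upsert` function and the in-memory dict used as storage are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The receiver, not the markup, decides which fields identify an entry.
KEY_FIELDS = ("login", "host")

RECORD = (
    "<user><login>mike</login><host>mymachine</host>"
    "<fullname>Mike Looijmans</fullname>"
    "<occupation>Developer</occupation></user>"
)


def upsert(store, record_xml, key_fields=KEY_FIELDS):
    """Look up the entry named by the key fields, create it if missing, update it."""
    elem = ET.fromstring(record_xml)
    fields = {child.tag: (child.text or "") for child in elem}
    key = tuple(fields.pop(name) for name in key_fields)
    store.setdefault(key, {}).update(fields)
    return key


store = {}
upsert(store, RECORD)
print(store)
```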

> Without the attributes, i would have a hard time
> explaining which of the fields should be considered as
> "primary key", and which only supply additional data.