From: Erik Naggum
Subject: Re: Lisp XML parser ?
Date: 2000/06/23
Message-ID: <3170708641147673@naggum.no>#1/1
X-Deja-AN: 637854920
References: <39523341.CED20EE@bbn.com> <3170682110777797@naggum.no>
mail-copies-to: never
Content-Type: text/plain; charset=us-ascii
X-Complaints-To: newsmaster@eunet.no
X-Trace: oslo-nntp.eunet.no 961721199 29880 195.0.192.66 (23 Jun 2000 00:46:39 GMT)
Organization: Naggum Software; vox: +47 8800 8879; fax: +47 8800 8601; http://www.naggum.no
User-Agent: Gnus/5.0803 (Gnus v5.8.3) Emacs/20.6
Mime-Version: 1.0
NNTP-Posting-Date: 23 Jun 2000 00:46:39 GMT
Newsgroups: comp.lang.lisp

* Simon Brooke
| H'mmmm.... I've always considered that XML syntax was just a prolix
| way of writing sexprs.

The element structure has inherent similarities to trees made up of
lists and the significant differences are non-obvious.

| The only problem in the representation is that XML has two distinct
| types of attribute-value pairs, one of which can only take simple
| data types as values and the other of which can take structures.
| You need some way of indicating the difference but the above scheme
| (I would have thought) would make an adequate first cut.

I tend to represent *ML elements as if destructured with

    ((&rest attlist &key gi &allow-other-keys) &rest contents)

where attlist is a keyword-value plist that always contains the key
gi, the generic identifier, a.k.a. the element type name. (There is an
important distinction between attributes and contents as far as
abstraction goes, but I won't go into that.) Attribute values have
a restricted set of types, but I consider this an artificial, not a
significant difference.
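
A minimal sketch of that destructuring, for illustration only. The
function name SPLIT-ELEMENT and the sample element are hypothetical;
only the lambda list comes from the text above.

```lisp
;; Sketch only: SPLIT-ELEMENT and *SAMPLE-ELEMENT* are made-up names.
(defun split-element (element)
  "Split ELEMENT into its generic identifier, attribute list, and contents."
  (destructuring-bind ((&rest attlist &key gi &allow-other-keys) &rest contents)
      element
    ;; ATTLIST is the whole plist, including the :GI entry itself.
    (list :gi gi :attlist attlist :contents contents)))

;; A paragraph element with one extra attribute and a nested child element.
(defparameter *sample-element*
  '((:gi "p" :class "note")
    "Some text with "
    ((:gi "em") "emphasis")
    "."))

(split-element *sample-element*)
```

Because of &allow-other-keys, any attribute besides :gi is accepted
and stays available through attlist; child elements are simply nested
lists of the same shape inside contents.
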
One significant difference is the entity structure, which is mostly
used for special characters, but is really an amazingly powerful and
under-understood mechanism for organizing the input sources. Lisp's
syntax has nothing like it at all, and neither do other languages
that could naturally represent tree structures. It is non-trivial
to represent the entity structure and the element structure side by
side, unless you only refer to entities in attribute values.

Another significant difference is the way identifiers are used to
change the meaning of both the gi and the other attributes. We are
not used to the operator changing meaning if we change an argument,
but this is quite common in *ML contexts, to the point where the
generic identifier may not even name the element type as far as
processing is concerned. This means that the "processing key" is
computed from the entire attribute list. Various other mechanisms
with similar confusability exist, and they are bad enough that you
cannot just gloss over them.
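
To make the "processing key" point concrete, here is a hedged sketch.
The :form attribute and the function name PROCESSING-KEY are
hypothetical, invented for this illustration; real *ML applications
use their own conventions for this kind of override.

```lisp
;; Sketch only: the :FORM attribute is a made-up stand-in for any
;; attribute that overrides the gi as far as processing is concerned.
(defun processing-key (attlist)
  "Return the name to dispatch on for an element with attribute list ATTLIST."
  (destructuring-bind (&key gi form &allow-other-keys) attlist
    ;; When :FORM is present it decides how the element is processed,
    ;; so the gi alone does not tell you what the element means.
    (or form gi)))

(processing-key '(:gi "para" :class "note"))   ; dispatches on "para"
(processing-key '(:gi "para" :form "title"))   ; dispatches on "title"
```

Both elements have the same gi, yet they would be handled
differently, which is why a representation keyed on the gi alone is
not enough.
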
The result is that you cannot really represent an *ML structure
without knowing how it is supposed to be processed, as if you would
have to tell the Lisp reader whether you were reading for code or
reading for data, rejecting perhaps the biggest advantage of Lisp's
syntax. In short: They got it all wrong.

If they had had a less involved syntax, they wouldn't have needed
all the arcane details and would have had fewer chances to go off
the deep end. Given that you can stuff a lot of junk into that
attribute list, it just had to happen that they would do something
harmful to themselves. Both Perl and C++ evolved the way they did
because of syntactic mistakes like that.

#:Erik
--
If this is not what you expected, please alter your expectations.