Is XML a Language? (was RE: [xsl] XSLT Architecture: Next Step)

Didier says:
[Didier replies:
I am sorry Bill, but XML is not a language but a meta language. More
specifically it is a set of syntax rules but it lacks any semantics or
element structure. As you know, a fully qualified language does include
syntax, semantics and structure. The semantics and the structure are not
provided by XML but by the designer of the XML based language, using
languages like DTD, schema, relax, etc. to define the keywords and the
structural rules. XML is an empty shell; it doesn't provide you the keywords
(i.e. the elements and attributes) you get in any well formed language.
To transform an XML set of syntax rules into a language you need to add the
keywords (i.e. the elements and the attributes) and the structural rules
(i.e. one or several occurrences, what kind of element is allowed under that
one, what are the attributes attached to a particular element, what kind of
data content is allowed for a particular element, etc...).
Thus, XHTML is a language, its structural rules are defined and elements and
attributes specified. XML, per se, is the meta language used to create
XHTML.
Cheers
Didier PH Martin]
[Bill replies:
Didier-
What does the "L" stand for in "XML"?
I'm not sure what you mean by "language", but clearly it's not what is
generally accepted in computer science as a formal language. See for
instance _Compilers: Principles, Techniques, and Tools_ by Aho, Sethi, &
Ullman (pp. 27-28) or most any introductory text on formal languages,
compilers, or parsers. You will find, among other things, that "semantics"
is not required of a formal language. (Otherwise it'd be a LOT harder to
build parsers!) Further the concept of "structure" is embedded in the
syntax; i.e., there is no separate designation of structure. In particular
(paraphrasing from the above reference):
A context-free grammar is a 4-tuple:
1.) A set of tokens, known as terminal symbols.
2.) A set of nonterminals.
3.) A set of productions (syntax rules) ...
4.) A designated nonterminal called the "start" symbol.
A "language" (CF Language) is the set of strings that can be derived from
the designated start symbol via the productions. If you go to
http://www.w3.org/TR/REC-xml you'll find that this is pretty much how XML is
defined, the start symbol being "document".
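
To make the 4-tuple concrete, here is a minimal Python sketch. It is only an illustration: the toy grammar, its single tag name "a", and the names TERMINALS, NONTERMINALS, PRODUCTIONS, START and derive are all assumptions made up for this example and are far simpler than the productions in the XML Recommendation. It enumerates the strings that can be derived from the start symbol, up to a token limit.

```python
from collections import deque

# A toy context-free grammar written as a 4-tuple.
# Hypothetical example only; it is NOT the grammar from the XML Recommendation.
TERMINALS = {"<a>", "</a>", "x"}                    # 1. tokens
NONTERMINALS = {"document", "element", "content"}   # 2. nonterminals
PRODUCTIONS = {                                     # 3. productions
    "document": [["element"]],
    "element": [["<a>", "content", "</a>"]],
    "content": [[], ["x"], ["element"]],
}
START = "document"                                  # 4. start symbol


def derive(max_tokens):
    """Return every terminal string of at most max_tokens tokens
    that can be derived from START via PRODUCTIONS."""
    results = set()
    queue = deque([(START,)])
    seen = {(START,)}
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next((i for i, s in enumerate(form) if s in NONTERMINALS), None)
        if idx is None:
            # Only terminals remain, so this string is in the language.
            results.add("".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            token_count = sum(1 for s in new_form if s in TERMINALS)
            # Prune forms that already exceed the token limit.
            if token_count <= max_tokens and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


if __name__ == "__main__":
    for s in sorted(derive(4)):
        print(s)
```

Note that nothing in the sketch says what "a" or "x" means. The language is just the set of strings the productions generate, which is the point about semantics not being required.
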
Now, if you want to extend the concept of a formal language to that of a
"programming language" then, yes, you need to add semantics (otherwise it'd
be a LOT harder to build code generators!) -- but I don't think anyone ever
claimed that XML was a programming language. So, perhaps what you mean by a
"fully qualified language" is a "programming language". BTW, I agree that
XML, like BNF, is a Meta Language, but that does not make it less a
language.
Finally, since (I'm guessing) XSD is more expressive than the productions of
a CFG, perhaps what you and I are arguing about is simply the difference
between a "validated" XML document and a "well-formed" XML document, the
former involving the "semantics" and "structure" implied by an XML Schema,
the latter involving simply the CFG for XML.
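
A rough Python sketch of that distinction, under stated assumptions: the standard library's xml.etree.ElementTree only checks well-formedness, so the "schema" here is a hand-written rule (a list root that may contain only item children) standing in for a real DTD or XML Schema. The function names and the list/item vocabulary are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML at all (syntax only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_list(text):
    """True if the text is well-formed AND obeys a made-up vocabulary rule:
    the root must be <list> and every child must be <item>.
    This is a stand-in for schema validation, not a real validator."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == "list" and all(child.tag == "item" for child in root)


if __name__ == "__main__":
    samples = [
        "<list><item/></list>",   # well-formed, follows the rule
        "<list><para/></list>",   # well-formed, breaks the rule
        "<list><item></list>",    # not well-formed
    ]
    for doc in samples:
        print(doc, is_well_formed(doc), is_valid_list(doc))
```

The second sample is the interesting one: it is a perfectly good XML document, yet it is not a sentence of the particular list/item language.
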
Regards,
Bill
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list