Re: [Yaml-core] xml2yaml.xsl

On Fri, Oct 24, 2003 at 10:20:51AM -0700, Jason Diamond wrote:
| Are there any plans to develop a vocabulary that can describe the YAML
| data model in XML so that it can be validated with a DTD/XML Schema/RELAX
| NG Schema? Much like the Docutils Generic DTD
| <http://docutils.sourceforge.net/spec/docutils.dtd>. Having a canonical
| and validatable target vocabulary might help ease the development of tools
| that convert between XML and YAML and also ease the transition from XML to
| YAML for developers like me who can read DTDs with ease but am still murky
| about YAML's data model.
This is a good idea, and the xml2yaml.xsl should convert any XML objects
using this schema to YAML as well as trying to "guess" about what the
user meant (which is what it currently does). A YAML schema for XML
would have three elements:
  for the implicit root:
    <yaml:_root>
  for mappings:
    <yaml:_key>
    <yaml:_value>
  for sequences:
    <yaml:_>
Thus, you know you are in a sequence if you encounter a sequence
item (_), and you know you are in a mapping if you encounter a
sequence of key/value. Another way to express things would be to
have explicit "mapping", "sequence" and "scalar" elements, but I
think this would be more verbose and less clear, and not able to
be "mixable" with other non-yaml prefixed data.
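To make the three-element idea concrete, here is a rough Python sketch (not part of xml2yaml.xsl) that reads a strict document into plain dicts, lists and strings. The namespace URI and the names load_strict and strict_to_python are placeholders made up for illustration, since this thread does not give the real namespace URI. It also assumes mapping keys are scalars.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI; the real one is not stated in this thread.
YAML_NS = "http://yaml.org/xml"
ROOT = f"{{{YAML_NS}}}_root"
KEY = f"{{{YAML_NS}}}_key"
VALUE = f"{{{YAML_NS}}}_value"
ITEM = f"{{{YAML_NS}}}_"


def strict_to_python(node):
    """Convert one strict-binding element into a dict, list, or str."""
    children = list(node)
    if not children:
        # No child elements: the node is a scalar holding its text.
        return node.text or ""
    tags = [child.tag for child in children]
    if all(tag == ITEM for tag in tags):
        # Only sequence items (_): the node is a sequence.
        return [strict_to_python(child) for child in children]
    if len(tags) % 2 == 0 and all(
        tag == (KEY if i % 2 == 0 else VALUE) for i, tag in enumerate(tags)
    ):
        # Alternating _key/_value pairs: the node is a mapping.
        # Keys are assumed to be scalars so they can be used as dict keys.
        pairs = zip(children[0::2], children[1::2])
        return {strict_to_python(k): strict_to_python(v) for k, v in pairs}
    raise ValueError(f"not a strict YAML binding: {tags}")


def load_strict(xml_text):
    """Parse a strict document, insisting on yaml:_root as the top element."""
    root = ET.fromstring(xml_text)
    if root.tag != ROOT:
        raise ValueError("expected yaml:_root as the document element")
    return strict_to_python(root)


doc = (
    f'<y:_root xmlns:y="{YAML_NS}">'
    "<y:_key>name</y:_key><y:_value>Clark</y:_value>"
    "<y:_key>langs</y:_key><y:_value><y:_>YAML</y:_><y:_>XML</y:_></y:_value>"
    "</y:_root>"
)
print(load_strict(doc))  # {'name': 'Clark', 'langs': ['YAML', 'XML']}
```

The node kind is never named; it is inferred from which child elements appear, which is the point made above about not needing explicit "mapping" or "sequence" elements.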
There would also be several attributes:
  yaml:anchor="id"     # marks a node with an anchor
  yaml:alias="id"      # the _, key or value is an alias node
  yaml:tag="seq"       # this is the type tag for a given node,
                       # less the 1st !, ie "!private"
  yaml:style="double"  # specifies the style to use
Then, a "strict" XML binding of YAML would only use the
above elements and attributes. A "loose" binding would
follow the rules outlined at http://yaml.org/xml.html ...
1. If an element contains a sequence of elements with the
   same name, then the name is discarded and they are
   considered to be a sequence. Further, a new attribute
   on the YAML root node, yaml:seq="items|people" could be used
   to indicate elements which are sequences; in this case,
   all element names of children are ignored.
   NOTE: the implementation doesn't do this... yet. The current
   approach it takes is similar, but brain dead (I was keying
   off elements which are the sequence-items... dumb move)
2. If an element contains different elements, then it is treated
   as a mapping, with the names of the children treated as keys,
   and each child element's value treated as the key's value. This
   requires the element names to be unique, of course.
3. If an element contains attributes, then it is treated as a
   mapping. And there are two sub-cases:
   a) If the element is a sequence (see #1), then the name of
      the sequence items is used as a key, i.e.,
        <a k="v"><b>val</b><b>two</b></a>
      becomes
        { k: v, b: [val, two] }
   b) If the element is a mapping (see #2), then the attributes
      and elements are merged.
        <a k="v"><b>val</b><c>two</c></a>
      becomes
        { k: v, b: val, c: two }
4. If an element contains a text node, but no elements, then
   it is a scalar value (either a key, value, or sequence item).
Note: If there is only one child element, then it will be treated
      as a mapping (use yaml:seq to specify otherwise).
      If there are zero children (not even a text node),
      then the element will also be treated as a mapping.
      If an element contains only a single text node of just
      whitespace, then it will be converted as a scalar.
Clearly "implicit" conversion like this is dangerous;
however, it is the most useful. ;)
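The loose rules above can be sketched in a few lines of Python. This is only an illustration and is not how xml2yaml.xsl is written. The function name loose_to_python is made up, attribute names are assumed to become mapping keys, the proposed yaml:seq attribute is not handled, and text that sits next to attributes or child elements is simply ignored.

```python
import xml.etree.ElementTree as ET


def loose_to_python(node):
    """Guess a dict, list, or str for an element, following rules 1-4."""
    children = list(node)
    names = [child.tag for child in children]
    if not children and not node.attrib:
        if node.text is not None:
            # Rule 4: text only (even pure whitespace) is a scalar.
            return node.text
        # Zero children, not even a text node: treated as a mapping.
        return {}
    # Rule 3: attributes become mapping entries (assumed keyed by name).
    result = dict(node.attrib)
    if len(children) > 1 and len(set(names)) == 1:
        items = [loose_to_python(child) for child in children]
        if not node.attrib:
            # Rule 1: repeated element name, so the name is discarded.
            return items
        # Rule 3a: the item name becomes the key for the sequence.
        result[names[0]] = items
        return result
    # Rules 2 and 3b: child names are keys, merged with any attributes.
    # A single child lands here too, so it is treated as a mapping.
    for child in children:
        if child.tag in result:
            raise ValueError(f"duplicate mapping key: {child.tag}")
        result[child.tag] = loose_to_python(child)
    return result


print(loose_to_python(ET.fromstring('<a k="v"><b>val</b><b>two</b></a>')))
print(loose_to_python(ET.fromstring('<a k="v"><b>val</b><c>two</c></a>')))
```

The duplicate-key check is where the "elements must be unique" requirement of rule 2 shows up; a stricter converter might also reject mixed content instead of ignoring it.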
There are lots of issues in this process, but if properly documented,
this can be useful for XML people. I'm not sure if this schema makes
sense if you are trying to "understand" YAML, however. To do this,
it is probably best to scan the spec and/or tutorial. I think that
trying to learn via XML could be dangerous.
Best,
Clark
P.S. In the new information model section, I'm using lisp style
SExpr to annotate the examples.

Thread view

version: .02
download: http://yaml.org/xml/yaml2xml.xsl
warning: >
  This is experimental stuff! And subject to change!
subject: >
  xml2yaml.xsl is a very simple script for converting brain-dead
  XML (specifically XML made for use as YAML) to YAML. It does so
  by imposing many constraints, such as no mixed-content, and
  forcing unique mapping keys, etc.
examples:
  - http://yaml.org/xml/invoice.xml
  - http://yaml.org/xml/filter.xml
new features:
  - it now does aliases/anchors, see http://yaml.org/xml/invoice.xml
  - it now allows for duplicate element names to be treated as
    a sequence
  - if the root node is not in the yaml namespace, it uses
    the node name as a !!privatetype as suggested by Oren
  - it now reports more conversion problems better
Best,
Clark

On Fri, Oct 24, 2003 at 09:36:53AM -0700, Jason Diamond wrote:
| That should be http://yaml.org/xml/xml2yaml.xsl
Right on.
| > - http://yaml.org/xml/invoice.xml
|
| This is what I get in IE:
|
| A reference to variable or parameter 'seqname' cannot be resolved. The
| variable or parameter may not be defined, or it may ...
Hmm. Well, I didn't catch this error because it happens
when making an error message. This error message shouldn't
be called with the invoice.xml -- so it still probably won't
work, but I'd be curious what element it chokes on.
Best,
Clark

Hi.
> | > - http://yaml.org/xml/invoice.xml
> |
> | This is what I get in IE:
> |
> | A reference to variable or parameter 'seqname' cannot be resolved. The
> | variable or parameter may not be defined, or it may ...
>
> Hmm. Well, I didn't catch this error because it happens
> when making an error message. This error message shouldn't
> be called with the invoice.xml -- so it still probably won't
> work, but I'd be curious what element it chokes on.
Oops. I have IE 5.5 installed here at work. I don't think that uses, by
default, a version of MSXML that supports XSLT 1.0. It works fine in
Firebird 0.7.
Are there any plans to develop a vocabulary that can describe the YAML
data model in XML so that it can be validated with a DTD/XML Schema/RELAX
NG Schema? Much like the Docutils Generic DTD
<http://docutils.sourceforge.net/spec/docutils.dtd>. Having a canonical
and validatable target vocabulary might help ease the development of tools
that convert between XML and YAML and also ease the transition from XML to
YAML for developers like me who can read DTDs with ease but am still murky
about YAML's data model.
-Jason

Hello.
> Thus, you know you are in a sequence if you encounter a sequence
> item (_), and you know you are in a mapping if you encounter a
> sequence of key/value. Another way to express things would be to
> have explicit "mapping", "sequence" and "scalar" elements, but I
> think this would be more verbose and less clear; and not able to
> be "mixable" with other non-yaml prefixed data.
Actually, that's exactly what I was asking for: a vocabulary that did
include element types like mapping and sequence and scalar. I would never
want to author a document like this, but as an intermediate vocabulary in
some processing pipeline, I think this makes a lot of sense. And with a
fixed set of element types, it can be validated.
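As a rough illustration of such an intermediate vocabulary, the Python sketch below writes plain data out using only mapping, sequence and scalar elements and then checks the structure. It is hypothetical: the function names are made up, the choice to store mapping entries as alternating key and value children is an assumption, and the hand-written check only stands in for a real DTD or RELAX NG schema.

```python
import xml.etree.ElementTree as ET


def to_explicit_xml(value):
    """Build a mapping, sequence, or scalar element for a Python value."""
    if isinstance(value, dict):
        node = ET.Element("mapping")
        for key, item in value.items():
            # Assumption: entries are stored as alternating key, value nodes.
            node.append(to_explicit_xml(key))
            node.append(to_explicit_xml(item))
        return node
    if isinstance(value, (list, tuple)):
        node = ET.Element("sequence")
        for item in value:
            node.append(to_explicit_xml(item))
        return node
    node = ET.Element("scalar")
    node.text = str(value)
    return node


def is_valid(node):
    """Structural check standing in for schema validation."""
    children = list(node)
    if node.tag == "scalar":
        return not children
    if node.tag == "sequence":
        return all(is_valid(child) for child in children)
    if node.tag == "mapping":
        # A mapping needs complete key/value pairs.
        return len(children) % 2 == 0 and all(is_valid(c) for c in children)
    return False


tree = to_explicit_xml({"a": ["x", "y"]})
print(ET.tostring(tree, encoding="unicode"))
print(is_valid(tree))
```

Because the set of element types is closed, anything outside it fails the check, which is what makes this form easy to validate in a pipeline.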
> There are lots of issues in this process, but if properly documented,
> this can be useful for XML people. I'm not sure if this schema makes
> sense if you are trying to "understand" YAML, however. To do this,
> it is probably best to scan the spec and/or tutorial. I think that
> trying to learn via XML could be dangerous.
It certainly helps me. Mostly because it's a machine-readable description
of the data model. No, I'm not a machine--but it's easier for me,
personally, to read schemas (XML or otherwise) than it is to read
instances or even prose when learning about a new model. Tim Bray
mentioned this today when talking about XML data models:
http://lists.xml.org/archives/xml-dev/200310/msg00705.html
How about a YAML "vocabulary" that describes YAML documents (or streams,
rather)? Much like the XML Infoset serializations that are out there.
-Jason

Clark C. Evans wrote:
> A YAML schema for XML would have three elements: ...
Nice work, Clark, on the XSLT and this. How's the spec coming along? :-)
> if there are zero children (not even a text node),
> then the element will also be treated as a mapping.
With the element name as the key and a null value?
---
xml: |
  <a/>
yaml: |
  a:
...
> There are lots of issues in this process, but if properly
> documented, this can be useful for XML people. I'm not sure
> if this schema makes
> sense if you are trying to "understand" YAML, however. To do this,
> it is probably best to scan the spec and/or tutorial. I
> think that trying to learn via XML could be dangerous.
+10. YAML as alternative syntax for XML == bad. YAML as alternative way
to serialize data == good.
Have fun,
Oren Ben-Kiki

On Friday, October 24, 2003, at 04:11 PM, Oren Ben-Kiki wrote:
>> There are lots of issues in this process, but if properly
>> documented, this can be useful for XML people. I'm not sure
>> if this schema makes
>> sense if you are trying to "understand" YAML, however. To do this,
>> it is probably best to scan the spec and/or tutorial. I
>> think that trying to learn via XML could be dangerous.
>
> +10. YAML as alternative syntax for XML == bad. YAML as alternative way
> to serialize data == good.
Just to be clear, I certainly wasn't suggesting YAML as an alternate
XML infoset syntax. But I do foresee a lot of data-oriented XML being
converted into YAML. Having an intermediate XML vocabulary that
describes the YAML model (not the XML model!) would be very useful as
it gives the developers performing this conversion a concrete
vocabulary that they can target and then verify that they've succeeded
in producing. Helping me "understand" the YAML model by being able to
read the DTD for it was just an added bonus (for me). The prose in the
spec might be enough for you but it's not for me.
-Jason

Hi.
> version: .02
> download: http://yaml.org/xml/yaml2xml.xsl
That should be http://yaml.org/xml/xml2yaml.xsl
> warning: >
>   This is experimental stuff! And subject to change!
> subject: >
>   xml2yaml.xsl is a very simple script for converting brain-dead
>   XML (specifically XML made for use as YAML) to YAML. It does so
>   by imposing many constraints, such as no mixed-content, and
>   forcing unique mapping keys, etc.
> examples:
>   - http://yaml.org/xml/invoice.xml
This is what I get in IE:
A reference to variable or parameter 'seqname' cannot be resolved. The
variable or parameter may not be defined, or it may ...
>   - http://yaml.org/xml/filter.xml
> new features:
>   - it now does aliases/anchors, see http://yaml.org/xml/invoice.xml
>   - it now allows for duplicate element names to be treated as
>     a sequence
>   - if the root node is not in the yaml namespace, it uses
>     the node name as a !!privatetype as suggested by Oren
>   - it now reports more conversion problems better
-Jason