Thanks Tantek,
On 23 Nov 2011, at 15:19, Tantek Çelik wrote:
> Generic consumers can absolutely pickup all necessary information from microformats 2 syntax (again by design), and at least some generic information from microdata syntax as well. E.g. an HTML5 Drag & Drop implementation can do generic parsing of microformats 2 and microdata, convert them to a standard (and interoperable) JSON data model, and incorporate them into the data being dragged/dropped.
OK, let's try to put together some wording for a separate section on generic consumers. Here's a start, but I'd appreciate input about what microformats-2 processors can and can't do, particularly around locating additional machine-readable information about the vocabulary:
Microdata, RDFa and microformats-2 all use a generic syntax, which means
that it's possible to have generic parsers operate over them to extract
data. In the case of microdata and microformats-2, the data has a JSON
structure; data extracted from RDFa has an RDF structure (microdata can
also be converted into RDF).
Generic applications can work in the browser to do things such as
highlighting markup that follows a particular syntax or enabling users
to download the data embedded within a page into a separate file. These
can also use the context in which the HTML data is found to provide
additional features. For example, generic consumers may detect that
each row in a table is associated with a distinct entity, and each cell
with a particular property, and enable users to sort that table based
on property values. In this case, a consumer could ensure that when
values are marked up as dates, times or durations using the <time>
element, the items are sorted by date/time/duration rather than
alphabetically.
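To illustrate the sorting point, here is a minimal sketch in Python. It assumes a generic consumer has already extracted, for each table row, both the visible cell text and the value of the <time> element's datetime attribute; the row structure and function names are hypothetical, not part of any specification.

```python
from datetime import date

# Hypothetical extracted rows: the visible cell text plus the machine-readable
# value taken from the <time> element's datetime attribute.
rows = [
    {"text": "1 March 2011", "datetime": "2011-03-01"},
    {"text": "12 January 2011", "datetime": "2011-01-12"},
    {"text": "5 February 2011", "datetime": "2011-02-05"},
]


def sort_alphabetically(rows):
    """Sort on the visible text only, as a consumer with no type information would."""
    return sorted(rows, key=lambda row: row["text"])


def sort_by_date(rows):
    """Sort on the parsed date, which is possible once the value is known to be a date."""
    return sorted(rows, key=lambda row: date.fromisoformat(row["datetime"]))
```

With these rows, sorting on the text puts "1 March 2011" first, whereas sorting on the parsed date puts "12 January 2011" first.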
Both microformats-2 and RDFa provide additional facilities that enable
publishers to indicate the type of values to support generic consumers.
Microformats-2 properties have a prefix that can indicate when a value
is a URL (u-*), a date/time (dt-*), extended HTML (e-*) or a string
(p-*). RDFa supports a @datatype attribute that publishers can use to
indicate the datatype of a value, usually an XML Schema datatype such
as xsd:integer or xsd:language. Note that once microformats-2 data is
extracted from a page into JSON, these prefixes are no longer available,
so a consumer of the JSON has to know the vocabulary to tell whether a
given value should be interpreted as a string or as HTML markup, for
example. In contrast, the datatypes used to annotate RDFa values are
carried within the RDF data.
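A rough sketch of the microformats-2 side of this, in Python. The prefix table follows the prefixes listed above; the input shape (pairs of class name and value) and the output shape (property name mapped to a list of values) are simplifying assumptions for illustration, not the actual microformats-2 parsing rules.

```python
# Prefixes as described above, mapped to the kind of value they indicate.
PREFIX_KINDS = {"p": "string", "u": "url", "dt": "datetime", "e": "html"}


def split_property(class_name):
    """Return (property name, kind) for a prefixed class name, else None."""
    prefix, sep, name = class_name.partition("-")
    if sep and name and prefix in PREFIX_KINDS:
        return name, PREFIX_KINDS[prefix]
    return None


def to_json_properties(marked_up):
    """Build a JSON-like dict from (class name, value) pairs.

    The kind implied by the prefix is known here but is not stored,
    which is how the type information gets lost on extraction.
    """
    props = {}
    for class_name, value in marked_up:
        parsed = split_property(class_name)
        if parsed is None:
            continue  # not a recognised property class
        name, _kind = parsed
        props.setdefault(name, []).append(value)
    return props
```

In the resulting structure, a value that came from a p-* property and one that came from an e-* property are both plain strings, so only knowledge of the vocabulary can tell them apart.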
RDFa also adheres to a follow-your-nose principle, whereby vocabulary
authors are encouraged to provide a machine-readable description of
classes and properties at the URL used for the class or property. This
can enable generic processors to automatically pick up additional
information about the class or property such as labels, help text,
superclasses, property cardinality and ranges and so on. While microdata
also uses URLs for types and properties, microdata consumers are not
permitted to dereference URLs that they do not already recognise.
>> And we can bring out the guidance on the vocabulary side about not making vocabularies where the datatype of a value can't be determined from the property and its syntax.
>
> The double negative in that statement is confusing.
>
> I'm not sure how this is necessary. I'd need specific examples of how this helps to understand what you're saying.
Well, as an example, there's a particular RDF vocabulary, SKOS, which states that the skos:notation property can be used to give a code for a skos:Concept. Some concepts might have codes from different coding schemes. So that vocabulary says that the RDF datatype of the skos:notation value should be used to indicate the type of the coding scheme. So you're actually encouraged in this vocabulary to end up with something like:
<dog> a skos:Concept ;
  skos:notation "3-12"^^eg:CodingScheme1 ;
  skos:notation "7-53"^^eg:CodingScheme2 ;
  .
There's some more about this at
http://patterns.dataincubator.org/book/custom-datatype.html
What I was trying to say is that this pattern is bad in HTML data vocabularies, because it limits what syntaxes can be used with the vocabulary (you have to use RDFa) and because it places a burden on publishers and leads to unreliable data. It should always be possible for a vocabulary-aware application to tell the type of a value based on (a) what property it's given as a value for and (b) what the syntax of the value is.
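As a sketch of what I mean by (a) and (b), here is how a vocabulary-aware application might work, in Python. The vocabulary table, its property names and the patterns are all made up for illustration; a real vocabulary would define its own.

```python
import re

# Hypothetical vocabulary: for each property, the kinds of value it allows
# and the syntax that identifies each kind. More specific patterns come first.
VOCAB = {
    "homepage": [("url", re.compile(r"https?://\S+"))],
    "published": [
        ("datetime", re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2})?")),
        ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ],
    "name": [("string", re.compile(r".*", re.DOTALL))],
}


def value_type(prop, value):
    """Work out a value's type from (a) its property and (b) its syntax."""
    for kind, pattern in VOCAB.get(prop, []):
        if pattern.fullmatch(value):
            return kind
    return None  # unknown property, or value does not fit the vocabulary
```

Nothing here depends on a per-value datatype annotation, so the same lookup works whichever syntax the data was published in.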
Cheers,
Jeni
--
Jeni Tennison
http://www.jenitennison.com