Hi Ben,
I think the fundamental reason for our differences comes down to your
view (and probably the view of many on this list) that RDFa is
*natural* to HTML, and that "nearly all HTML documents contain RDFa
anyway." (http://osdir.com/ml/org.w3c.html.rdf/2006-12/msg00022.html)
My view, by contrast, is that, as an author of HTML content, I want to be able to say (to any user-agent that cares)
whether my HTML contains RDFa or not. This is because I don't view RDFa as a natural extension to HTML,
but as an arbitrary syntax for expressing triples within it.
Sure, a custom doctype that signifies that I'm using it goes part of the way,
but it's a different mechanism from other (GRDDLable) syntaxes, and I suspect it's not as robust (or as simple).
And as such, it demands special treatment over other syntaxes, which seems unnecessary.
> If you build software that assumes some RDFa header flag is always there
> when RDFa is present in the document, then you're going to lose big time.
>
As I said previously, it depends on your priorities. If the goal of the
software is simply to find as many RDFa triples as possible, then
obviously it is better not to get hung up on whether a profile is used
or not.
However, if the /quality/ of the data (and/or the performance of the
software) is important, then assuming that anything that *looks* like
RDFa *is* RDFa could be a very bad strategy.
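To make the two strategies concrete, here is a rough Python sketch of a consumer that can run in either mode. It is only an illustration: the profile URI is a hypothetical placeholder (no such RDFa profile is defined in this thread), and the names ProfileFinder, declares_rdfa and should_extract are invented for the example.

```python
from html.parser import HTMLParser

# Hypothetical profile URI, used purely for illustration.
RDFA_PROFILE = "http://example.org/profiles/rdfa"


class ProfileFinder(HTMLParser):
    """Collects the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = set()

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # @profile is a whitespace-separated list of URIs.
            self.profiles.update(value.split())


def declares_rdfa(html: str) -> bool:
    """True if the author has flagged the document with the (assumed) profile."""
    finder = ProfileFinder()
    finder.feed(html)
    return RDFA_PROFILE in finder.profiles


def should_extract(html: str, strict: bool) -> bool:
    """Strict consumers honour the author's declaration; greedy ones always parse."""
    return declares_rdfa(html) if strict else True
```

A "greedy" crawler would call should_extract(page, strict=False) and take whatever comes out; a consumer that cares about data quality would pass strict=True and skip pages whose author never said they contain RDFa.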
> The main argument is simple: we now live in a world of mashups and
> widgets. There are now third-party applications that run inside
> Facebook's very own HTML page. Chances are, some widgets will include
> RDFa, even if the containing page does not flag the presence of RDFa. If
> you want to find the structured data in the page, you're going to have
> to try the RDFa parser and see what comes out. I can't imagine that
> you'll get anything useful out of the structured-data web if you don't
> do this.
>
There is more to the web than blogs and social networking sites. The
less trivial the data, the more important authorial intention is.
A key advantage of RDF, after all, is that you can use it to say precisely
what you mean.
> This isn't an RDFa issue. It's just the way the web is: pages aren't
> atomic chunks anymore, they're bags of disparate chunks of HTML, each
> one of which might have been authored by a different party.
>
If the data in those chunks is important, then it argues more for a
mechanism to express the authorial intention per chunk (something like
@profile on any element perhaps, as Jeremy suggested).
Also, if your mashup web page has any RDF-in-HTML smarts to it, it
probably wouldn't be republishing the HTML verbatim anyway - it would
parse out the data and format it how it likes (e.g. see
http://semwebdev.keithalexander.co.uk/snap.html - the page grabs
'chunks' of eRDF from other pages, and republishes them as RDFa).
> good news is that, unlike microformats, there's only one RDFa
> parser, and it's not going to change regularly over time as we use more
> vocabularies. That's a key difference.
>
>
A key advantage of RDF over something like microformats is the precision
available to authorial intention - you can find or create URIs to say
exactly what you mean. But for that to work, you need to use the *right*
parser. (Incidentally, this is even a problem right now for those who
want to use RDFa while the spec is still in a state of flux.)
>> HTML (I'd argue) isn't really suited for being a candidate for treating
>> data as a first class citizen, because its primary use is for presenting
>> documents (not units of data) to humans.
>>
>
> We have a notable disagreement here :) What other format would you use
> for providing units of data to humans? XML+XSLT (ouch)?
My apologies, I phrased that clumsily. My sentiment was not that there
are better formats for presenting machine-readable data to humans, but
that humans often need a different representation of (some types of)
data from machines. Machines, for instance, like timestamps; humans
prefer that information represented a little differently. Humans often
prefer to view floats rounded to a certain number of decimal places;
humans prefer to see the word "English" rather than the equivalent ISO
639 code, and so on.
> When units of
> data are presented to a human, they need to be rendered, yet you also
> need to close the loop so that I can point my mouse to the rendered
> stuff and get back to the structured unit of data.
>
>
Yes, hence the need for workarounds like @content.
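As a rough illustration of what I mean, here is a small Python sketch that puts the machine value in @content and the human rendering in the element text. The property names (dc:date, dc:language, ex:ratio), the helper names and the tiny language table are all assumptions made up for this example, and the fragment presumes the prefixes are declared elsewhere in the page.

```python
from datetime import datetime, timezone
from html import escape

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Tiny illustrative lookup; a real application would use a full ISO 639 table.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German"}


def span(prop: str, machine_value: str, human_text: str) -> str:
    """Machine value goes in @content, human rendering in the element text."""
    return '<span property="{}" content="{}">{}</span>'.format(
        escape(prop, quote=True),
        escape(machine_value, quote=True),
        escape(human_text),
    )


def render_date(prop: str, when: datetime) -> str:
    # Machines get an ISO 8601 timestamp; humans get a plain date.
    human = "{} {} {}".format(when.day, MONTHS[when.month - 1], when.year)
    return span(prop, when.isoformat(), human)


def render_float(prop: str, value: float, places: int = 2) -> str:
    # Machines get full precision; humans get a rounded figure.
    return span(prop, repr(value), "{:.{}f}".format(value, places))


def render_language(prop: str, code: str) -> str:
    # Machines get the ISO 639 code; humans get the language name.
    return span(prop, code, LANGUAGE_NAMES.get(code, code))


print(render_date("dc:date", datetime(2007, 10, 5, 14, 30, tzinfo=timezone.utc)))
print(render_float("ex:ratio", 3.14159265))
print(render_language("dc:language", "en"))
```

The point is that every value is carried twice, once for each audience, which is exactly the kind of compromise I am talking about.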
> That's why, in my mind, HTML is actually a *very good* place to put some
> amount of structured data. Not all structured data, but certainly data
> that's meant to be interpreted by human eyes to some degree.
>
>
>
We don't disagree here (I think). I like embedding data in HTML as much
as anyone. I'm just saying that machine-readable data isn't a first
class citizen in HTML, which is first and foremost for encoding
human-readable documents. I think everyone agrees on that (that HTML
documents should be presentable to human readers), but probably some
disagree with my conclusion that HTML therefore ought not to be too
tightly coupled with any one method of conveying machine-readable data
within it.
Perhaps it will help the debate if I lay out my assumptions:
1. If the function of a document format (HTML) is to convey
information to human readers, it cannot also be *optimal* as a
data-exchange format, even though it is still often desirable to make
that format perform both functions.
2. Therefore compromises have to be made (for example, in the
simplicity, verbosity, and universality of the format's syntax).
3. Therefore the compromises of some syntaxes may be more acceptable
than others in different situations.
4. Therefore it would be disadvantageous to those who use the document
format if any of those syntaxes became an intrinsic part of the format.
>
> this isn't an *attitude* that RDFa should be First
> Class and other methods should be Third. It's a realization that the web
> needs *some* kind of generic syntax that is mashup-compatible, and
> neither microformats nor eRDF (nor any other syntax that we know of)
> fits the bill.
>
>
I recognise that there are advantages to using a standardised syntax
(reusing existing tools, and exploiting the HTML context of the data -
like Ben Nowack's Live Clipboard, or my linked data preview demo), but
there are also valid reasons for using other syntaxes instead.
All I'm arguing for, really, is that RDFa remain a *choice* and that
some care is taken not to get in the way of other options that authors
have to express RDF in HTML.
If RDF-in-HTML is going to be at all significant, then we are still at
an early stage in the game. Almost nobody is doing it yet, and the
depths of possibilities are still pretty uncharted. RDFa doesn't need to
make further experimentation and innovation in the wild harder; it can
be both a standard, and an option. All you need to do is to provide a
GRDDL profile and encourage authors to use it where possible.
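For what it's worth, the consumer side of this is small. Below is a minimal Python sketch of the simplest GRDDL discovery case for HTML: the head carries the GRDDL data-view profile and the page links to its transformation with rel="transformation". The class and function names and the example stylesheet URL are made up for illustration, and the sketch does not cover the case where the transformation is named by the profile document itself, which would need an extra fetch.

```python
from html.parser import HTMLParser

# Profile URI that signals GRDDL transformation links in an HTML document.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


class GrddlFinder(HTMLParser):
    """Records whether the GRDDL profile is declared and which
    transformation links the document advertises."""

    def __init__(self):
        super().__init__()
        self.has_grddl_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            profiles = (attrs.get("profile") or "").split()
            self.has_grddl_profile = GRDDL_PROFILE in profiles
        elif tag in ("link", "a"):
            rels = (attrs.get("rel") or "").lower().split()
            if "transformation" in rels and attrs.get("href"):
                self.transformations.append(attrs["href"])


def find_transformations(html: str) -> list:
    """Return advertised transformations, or [] if the author did not opt in."""
    finder = GrddlFinder()
    finder.feed(html)
    return finder.transformations if finder.has_grddl_profile else []


# Hypothetical page; the stylesheet URL is a placeholder.
page = (
    '<html><head profile="http://www.w3.org/2003/g/data-view">'
    '<link rel="transformation" href="http://example.org/rdfa2rdfxml.xsl" />'
    '</head><body></body></html>'
)
print(find_transformations(page))
```

A page without the profile yields an empty list, so the author's declaration, not the mere presence of RDFa-looking attributes, decides whether any extraction happens.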
If the non-atomic nature of 'mashed-up' web pages is a problem for RDFa
using GRDDL, perhaps this is a wider problem for GRDDL to look at?
Cheers,
Keith