(bcc:'ing the XG's Member list; followups to the public list please
('cos I want to find this in Google searches in the future...))
I wrote this late last year, as part of ERCIM and W3C Europe's
contribution to the Quatro EU project. It wasn't widely
circulated at the time. Posting it here un-edited, as a contribution
to discussion of RDF-CL scope and expressivity analysis. I am
picking up some of these themes again via involvement in the
www.medieq.org EU project. Note that when I wrote the draft below,
the RIF WG hadn't formed (nor had this XG).
cheers,
Dan
>
>Quatro label format - architectural review
>
>This document provides a brief overview of the design issues
>and architectural tradeoffs made in the Quatro content labelling
>data format.
>
>Quatro content labels are positioned on a migration path from
>W3C's original content-labelling system, PICS, through simple
>RDF descriptions, and from there, to the more sophisticated
>capabilities of logical rule languages.
>
>The essence of the Quatro labelling approach is a re-expression
>of the PICS labeling system in RDF. PICS technology is dated and has
>suffered from poor deployment levels.
>
>The motivation for re-expression of PICS-style content labels is,
>broadly, to leverage the ongoing work in the RDF world:
>
> - use of generic RDF-based standards
>    - schema and ontology languages (RDFS/OWL)
>    - query language (SPARQL)
>    - rule languages (RIF - newly under development at W3C)
> - ability to mix labeling data with other RDF data
>    - label instances can also contain Dublin Core, RSS, Creative Commons,
>      FOAF, PRISM, RSS-Media, ...
>    - labels can be merged into RDF databases that contain
>      other relevant info about mentioned pages.
>
>
>Since RDF was itself designed as "PICS next generation", we might ask:
>why is there a need for any particular PICS-like RDF structures such
>as those used in Quatro labels? Shouldn't each PICS scheme (Quatro,
>MedPICS/HIDDEL, etc.) simply be remodelled as an RDF vocabulary (i.e. a
>schema/ontology) using RDFS or OWL?
>
>The answer here is key to situating the current Quatro label format on
>the migration path from original PICS, through basic RDF, into full
>rule-based Semantic Web formalisms (eg. OWL and RIF). The core issue here
>is the ability to express *generalisations* that apply to multiple pages.
>
>The original PICS standard included a capability that, until recently,
>has not been well addressed in the RDF technology stack. In PICS, it was
>possible to express generalisations such as:
>
>"Each page whose URI begins 'http://playboy.com/images/' has an
> sv:rudeness property whose value is sv:VERYRUDE".
>
>The abstract RDF graph model has no such capability.
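
As a rough illustration of the kind of prefix generalisation PICS allowed, here is a minimal Python sketch. The names PrefixLabel and labels_for are hypothetical and are not taken from PICS or Quatro; the point is only that the rule is keyed on the text of the URI rather than on anything stated in a graph of triples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixLabel:
    """Hypothetical rule: every URI starting with uri_prefix gets (prop, value)."""
    uri_prefix: str
    prop: str
    value: str


def labels_for(uri, rules):
    """Return the (uri, prop, value) triples implied by the rules for one URI."""
    return [(uri, r.prop, r.value) for r in rules if uri.startswith(r.uri_prefix)]


# The example generalisation from the text, written as a single rule.
rules = [PrefixLabel("http://playboy.com/images/", "sv:rudeness", "sv:VERYRUDE")]

inside = labels_for("http://playboy.com/images/1.jpg", rules)
outside = labels_for("http://example.org/index.html", rules)
```

Here inside holds one triple for the matching page, while outside is empty because the second URI does not share the prefix.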
>
>The original RDF *syntax* (Feb 1999 Recommendation of W3C) did
>have a syntactic expression of this notion, the rdf:aboutEachPrefix
>construction. Unfortunately, this aspect of the original RDF design
>was widely considered to be flawed and not usefully implementable (since
>it was defined solely over syntactic constructs). The RDF Core
>Working Group removed rdf:aboutEachPrefix from the language; the 2004
>edition of the RDF Recommendations no longer contains this construct.
>
>Meanwhile, W3C did continue to improve RDF's ability to express other
>forms of generalisation. The Web Ontology Language, OWL (again completed
>in 2004) provides a sophisticated framework for expressing generalisations
>about classes of thing: it can for example express:
>
>"All things that are a Person and have a workplaceHomepage of
>http://www.w3.org/ are a W3CStaffPerson. All W3CStaffPersons are
>beautiful."
>
>It cannot, however, express complex rules that involve "variables"
>or intermediate entities. So it can't say "All things that have a
>URI which begins with the string 'http://www.playboy.com/images/' are
>RudeDocuments". So, unfortunately, even the expressive power of
>OWL (in its various dialects) does not quite capture the capabilities
>of PICS.
>
>The Quatro labelling approach was designed just after OWL was finished,
>and as work was beginning on SPARQL, the new W3C language for querying
>RDF. It was designed in anticipation of future work on an RDF-friendly
>Rules language, work which (at time of writing, late 2005) is just
>beginning.
>
>Quatro labels encode a data structure which carries the basic information
>that could be modelled in old-style PICS: simple categories, attached
>either to a document URI, or to a representation of a regular expression
>against such URIs. This allows such a label to be used to exchange
>information broadly equivalent to original PICS labels.
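
The data structure described above might be pictured roughly as follows. This is a sketch under assumptions, not the actual Quatro vocabulary: the Label class, its field names, the ex: categories and the example.org URIs are all made up for illustration, and Python's re module stands in for whatever regular expression dialect a real label would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical label: a category/value attached to a URI or a URI regex."""
    category: str
    value: str
    uri: Optional[str] = None          # exact document URI, or
    uri_pattern: Optional[str] = None  # regular expression over document URIs

    def applies_to(self, uri: str) -> bool:
        if self.uri is not None:
            return uri == self.uri
        if self.uri_pattern is not None:
            return re.match(self.uri_pattern, uri) is not None
        return False


def categories_for(uri, labels):
    """Collect the (category, value) pairs of every label that applies to uri."""
    return [(l.category, l.value) for l in labels if l.applies_to(uri)]


labels = [
    Label("ex:rudeness", "ex:VERYRUDE", uri_pattern=r"^http://example\.org/images/"),
    Label("ex:rudeness", "ex:MILD", uri="http://example.org/about"),
]

by_pattern = categories_for("http://example.org/images/a.png", labels)
by_uri = categories_for("http://example.org/about", labels)
unlabelled = categories_for("http://example.org/other", labels)
```

The first label is attached to a pattern and the second to a single document, which mirrors the two cases in the paragraph above.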
>
>As outlined above, this is a progression from PICS, because such data
>structures can be encoded in RDF/XML (rather than in PICS syntax),
>and mixed closely with other RDF data. The mixing can occur either
>within a document (i.e. a block of Quatro labelling data right alongside
>some Dublin Core) or in a database system, perhaps exposed as a Web
>Service using the SPARQL query language and protocol.
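
To make the "mixing" point concrete, here is a small Python sketch that treats each RDF graph as a set of (subject, predicate, object) tuples. It is only an approximation: the merge and match helpers are hypothetical, the example.org data is invented, and blank nodes (which need extra care when merging real RDF graphs) are left out. The match function is a crude stand-in for a single SPARQL triple pattern.

```python
def merge(*graphs):
    """Union several blank-node-free graphs into one set of triples."""
    merged = set()
    for g in graphs:
        merged |= set(g)
    return merged


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


PAGE = "http://example.org/page"

# Labelling data and Dublin Core data about the same page, from two sources.
label_data = {(PAGE, "ex:rudeness", "ex:MILD")}
dc_data = {
    (PAGE, "dc:title", "An example page"),
    (PAGE, "dc:creator", "http://example.org/who#me"),
}

store = merge(label_data, dc_data)
about_page = match(store, s=PAGE)
rudeness = match(store, p="ex:rudeness")
```

After the merge, one query over store sees the label and the Dublin Core statements together, which is the benefit the paragraph above is describing.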
>
>
>There are some important limitations associated with the current design
>that must be understood, to ensure accurate use of the format.
>
>The URI-based generalisation capabilities of Quatro labels are, strictly
>speaking, usable only with what we might call the PICS-like idiom. We can
>use Quatro to construct pieces of RDF that say (when exchanged between
>Quatro-aware systems) that some category/value applies to a particular
>document or URI-regex-indicated class of documents. The current design
>is less successful when trying to use arbitrary non-Quatro-oriented
>RDF vocabularies with the URI-regex construction. In other words, it
>doesn't quite work for saying things like:
>
> 'all documents that have a URI which begins with the string
> http://danbri.org/ are things that have a dc:creator whose
> foaf:name is "Dan Brickley"'.
>
>If used carefully (eg. for exchange between systems that share
>additional assumptions about the representation used), Quatro labels
>can carry this information, but strictly speaking the approach
>doesn't fit with the formal semantics of RDF.
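
The difficulty with this kind of claim can be sketched in Python. The example below is hypothetical throughout (TEMPLATE, expand and expand_if_matching are illustrative names, not part of Quatro): it treats the generalisation as a template containing variables, and only produces real triples once a concrete document URI is supplied. The template triples themselves are never asserted, and a fresh blank node has to be minted for the intermediate creator each time, which is the part that plain RDF data has no standard way to say.

```python
import itertools

_counter = itertools.count(1)


def fresh_blank_node():
    """Mint a new blank node identifier for an intermediate entity."""
    return f"_:b{next(_counter)}"


# Template for: '?doc has a dc:creator whose foaf:name is "Dan Brickley"'.
TEMPLATE = [
    ("?doc", "dc:creator", "?who"),
    ("?who", "foaf:name", '"Dan Brickley"'),
]


def expand(template, doc_uri):
    """Instantiate the template for one document, binding variables as needed."""
    bindings = {"?doc": doc_uri}

    def resolve(term):
        if term.startswith("?"):
            if term not in bindings:
                bindings[term] = fresh_blank_node()
            return bindings[term]
        return term

    return [(resolve(s), p, resolve(o)) for (s, p, o) in template]


def expand_if_matching(template, prefix, doc_uri):
    """Apply the template only to documents whose URI begins with prefix."""
    return expand(template, doc_uri) if doc_uri.startswith(prefix) else []


triples = expand_if_matching(TEMPLATE, "http://danbri.org/", "http://danbri.org/words/")
skipped = expand_if_matching(TEMPLATE, "http://danbri.org/", "http://example.org/")
```

A consumer that shares these assumptions can recover the intended statements, but a generic RDF tool that simply read the template as data would have no way to know that the "?doc" and "?who" terms are placeholders.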
>
>The design discussions which led to the current Quatro approach
>[ref Vodafone meeting with phil, kal, daniel, danbri et al] considered
>several possible alternative designs. Unfortunately each such design
>(namely: use of RDF reification vocabulary; use of an alternate RDF
>reification vocabulary; quoting of RDF/XML using XML escaping; use of
>multiple files) carried a major syntactic overhead for content creators
>and users. The decision was therefore made to limit expressivity in
>version 1 of the format, to hasten adoption.
>
>For expressivity in this area to be improved, without major impact
>on deployability of the syntax, a radically new approach will be needed.
>The problem is that we are trying to express complex "templated" claims
>using RDF vocabulary, and to embed those hypothetical
>templates within "top level" RDF graphs. Fortunately W3C has now
>initiated new work in this area (Rule Interchange Format - RIF). It is
>anticipated that any elaboration of Quatro labelling to improve its
>expressivity in this area will likely be conducted in the context
>of RIF. It is also likely that the RIF WG's deliverables will provide a
>format capable of expressing (to general purpose Semantic Web tools) the
>semantics currently encoded in plain RDF/XML within Quatro labels. This,
>if verified, will prove a useful evaluation mechanism and deployment
>environment for the Quatro work.
>
>
>
>Todo:
> - section headings, examples, diagram
> - test cases
> - refs to specs
>
>