Summary: it seems that tidy/untidy is an implementation detail...
Frank Manola wrote:
> Sergey--
>
> I'd like to see some further discussion of points (a) and (3) you're
> making here, since I think that, while they are key points, I don't feel
> that they are entirely "substantiated" (at least not yet to my
> satisfaction), and I'd like some more details. So adding this stuff to
> the document is great. I don't feel the same about point (b) because I
> agree with it, but I don't think it matters that much. I don't think
> anyone has claimed
> that, via specifying a datatype like integer for a value, you are going
> to capture all the application semantics that are associated with the
> use of that value in a property, and hence automatically forbid things
> like comparing ages and shoe sizes.
My impression was that key arguments for untidiness built on the
assumption that using strings as ranges of properties such as dc:Creator
or :age was unacceptable, and had to be effectively forbidden by
treating untyped literals as a kind of labeled existential variable.
All I wanted to clarify is that doing so simply elevates the problem of
heterogeneity one level higher, and does not help applications to
interoperate.
> If you want to go to additional
> lengths to further specify the types (like defining types for age and
> shoe size, as some people would do), you can further constrain the
> interpretations, but clearly most people draw the line somewhere. Not
> to mention the fact that you might not want to preclude yourself from
> doing some data mining type of operation that you hadn't thought of when
> you designed the type system that involves comparing people's ages and
> shoe sizes [this gets into my point about wanting different comparison
> operators, which I'll not get into here]. It seems to me the point
> we're trying to address here is somewhat simpler: we've now introduced
> a datatype facility into RDF, where literals can be typed in several
> ways. The question is (unless I'm mistaken), how does *RDF* interpret
> those literals that haven't been explicitly assigned a datatype by one
> of these mechanisms? Do we say they have an implicit datatype of some
> sort (or have a fixed interpretation in some other way), or do we say
> they are the lexical things we talk about in the datatype facility, but
> we don't know what type they are? Either way, applications are going to
> associate additional semantics with the values they get from RDF, and
> RDF won't know anything about those semantics.
I absolutely agree with your conclusion. I think part of the problem is
that "RDF" does not interpret anything ;) Now, seriously, imagine that
there is an application layer that is common to every RDF application
(this is where "RDF" interpretation kicks in). This layer is capable of
parsing RDF/XML documents into graphs, and provides a set of routines
for traversing and updating the graphs. (This is, I guess, a rough
characterization of what "RDF APIs" currently do). This "API" layer has
no schema support, knows nothing about rules, and has no built-in
semantics of any RDF properties.
As you formulated the question above, we are talking about two ways of
implementing this API layer. In one case, all occurrences of an untyped
literal having the same string content map to one graph node, in the
other case, each occurrence results in a separate node. These separate
nodes have internal structure: they contain a single string label.
Notice that even if they contain, say, some system IDs in a concrete
implementation, these IDs are supposed to be transparent to applications
and the layer itself: each such ID can be replaced by another unique ID
without change in semantics.
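
To make the two implementations concrete, here is a minimal sketch in
Python. It is not taken from any existing RDF API: the names
UntidyLiteral, load_tidy and load_untidy are made up for illustration,
and the sketch assumes that every object in the input is an untyped
literal given as a plain string.

```python
import itertools
from typing import NamedTuple


class UntidyLiteral(NamedTuple):
    """One occurrence of an untyped literal (hypothetical type)."""
    node_id: int  # opaque system ID; any other unique ID would do
    label: str    # the single string label the node carries


def load_tidy(statements):
    """Tidy: all occurrences of the same string share one node (the string)."""
    return {(s, p, text) for s, p, text in statements}


def load_untidy(statements):
    """Untidy: each occurrence becomes a separate node labeled with the string."""
    ids = itertools.count()
    return {(s, p, UntidyLiteral(next(ids), text)) for s, p, text in statements}


statements = [("ex:alice", "ex:age", "10"), ("ex:bob", "ex:shoeSize", "10")]
tidy = load_tidy(statements)
untidy = load_untidy(statements)

print(len({o for _, _, o in tidy}))    # 1 literal node
print(len({o for _, _, o in untidy}))  # 2 literal nodes
```

The node_id values only keep the nodes apart. Renumbering them
consistently leaves the untidy graph meaning the same thing.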
The funny thing is that both ways of dealing with the untyped literals
sketched above are isomorphic. In more formal terms, the information
capacity of each of the two data models is equivalent. That is, there is
a bijective function between the set of "tidy" graphs and the set of
"untidy" graphs. In fact, each edge of an untidy graph (s, p, o), where
o is an untidy literal, can be mapped to an edge (s, p,
stringValueOf(o)) of a tidy graph. The reverse mapping takes an edge
(s, p, o) of a tidy graph and creates (s, p, uniqueUntidy(o)) for each
untyped literal o.
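
The two mappings can be sketched as follows. This is a rough
illustration only: string_value_of and unique_untidy mirror the
stringValueOf and uniqueUntidy functions named above, while
UntidyLiteral, to_tidy and to_untidy are hypothetical helpers. The
sketch assumes that every object is an untyped literal, and that no
statement repeats the same subject, predicate and literal string.

```python
import itertools
from typing import NamedTuple


class UntidyLiteral(NamedTuple):
    """An untidy literal node: opaque ID plus string label (hypothetical)."""
    node_id: int
    label: str


def string_value_of(o):
    """Return the string label carried by an untidy literal node."""
    return o.label


def to_tidy(untidy_graph):
    """Map each edge (s, p, o) to (s, p, stringValueOf(o))."""
    return {(s, p, string_value_of(o)) for s, p, o in untidy_graph}


def to_untidy(tidy_graph):
    """Map each edge (s, p, o) to (s, p, uniqueUntidy(o))."""
    ids = itertools.count()

    def unique_untidy(text):
        # A fresh node per occurrence; the ID itself carries no meaning.
        return UntidyLiteral(next(ids), text)

    # sorted() only makes the ID assignment reproducible between runs.
    return {(s, p, unique_untidy(o)) for s, p, o in sorted(tidy_graph)}


tidy = {("ex:alice", "ex:age", "10"), ("ex:bob", "ex:shoeSize", "10")}
untidy = to_untidy(tidy)
assert to_tidy(untidy) == tidy  # the round trip gives back the tidy graph
```

Going the other way, to_untidy(to_tidy(g)) gives back g only up to a
renaming of node IDs, which is the sense in which the IDs are
transparent.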
The above effectively proves that each conceivable application that
assumes untidy (or tidy) semantics behaves equivalently if we change the
graph semantics to tidy (or untidy) and plug in an intermediate
"conversion" layer between the application and the original untidy (or
tidy) API layer. That is, "RDF" does not care about (un)tidiness.
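
As an illustration of such a conversion layer, here is a small sketch
of an adapter that offers tidy semantics on top of an untidy store.
UntidyStore and TidyAdapter are hypothetical classes invented for this
example, not part of any real RDF API, and all objects are assumed to
be untyped literals.

```python
import itertools


class UntidyStore:
    """Hypothetical untidy API layer: each added literal gets a fresh node."""

    def __init__(self):
        self._ids = itertools.count()
        self._edges = []

    def add(self, s, p, text):
        # The object is stored as a (node_id, label) pair.
        self._edges.append((s, p, (next(self._ids), text)))

    def edges(self):
        return list(self._edges)


class TidyAdapter:
    """Conversion layer: presents the untidy store with tidy semantics."""

    def __init__(self, store):
        self._store = store

    def add(self, s, p, text):
        # A tidy graph is a set of edges, so do not store a repeat.
        if (s, p, text) not in self.edges():
            self._store.add(s, p, text)

    def edges(self):
        # Drop the node IDs and keep only the string labels.
        return {(s, p, label) for s, p, (_id, label) in self._store.edges()}


store = UntidyStore()
view = TidyAdapter(store)
view.add("ex:alice", "ex:age", "10")
view.add("ex:bob", "ex:shoeSize", "10")

print(len({o for _, _, o in view.edges()}))   # 1: the application sees one node
print(len({o for _, _, o in store.edges()}))  # 2: the store keeps two nodes
```

An application written against TidyAdapter never sees the node IDs, so
it cannot tell which kind of store sits underneath.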
Consider the following "Melnik" test (modestly named after the Turing test):
Given: an application X that communicates with the external world using
RDF/XML documents.
Goal: find out whether X assumes tidy or untidy semantics for untyped
literals.
My conjecture is that there is no way to distinguish whether an
application deploys tidy or untidy semantics. Therefore, it's an
implementation detail, which matters only for defining a standard,
W3C-blessed RDF API, and is irrelevant for the spec we are working on.
Sergey
> --Frank
>
> Sergey Melnik wrote:
>
>>
>> Brian McBride wrote:
>>
>>>
>>> At 22:21 26/09/2002 +0300, Patrick Stickler wrote:
>>>
>>>
>>>> I ask that the proponents of string-based (tidy) semantics
>>>> present their arguments to the WG in the same manner
>>>> as the proponents of value-based (untidy) semantics were
>>>> asked to do prior to last Friday's vote.
>>>
>>>
>>>
>>>
>>> That seems sensible. I suggest we collect all the reasons for and
>>> against each proposal into the rationale document we started this week.
>>
>>
>>
>>
>> Brian,
>>
>> how can "tidy" folks contribute to that document? I'd like the
>> reasoning of [1,2] to be included. The points substantiated in [1,2]
>> are these:
>>
>> a) Untidiness is not required for correct modeling, or
>> forward/backward compatibility.
>>
>> b) Untidiness does not solve a general issue of using substitute
>> artifacts in property ranges (claimed by untidy folks). Examples are
>> using strings instead of names, names instead of persons, strings
>> instead of integers, integers instead of kilograms, kilograms instead
>> of masses, integers instead of masses, strings instead of masses. This
>> is common modeling practice and cannot possibly be forbidden, let
>> alone by using untidy literals.
>>
>> 3) Untidiness requires changes in existing apps and APIs, whereas tidy
>> interpretation does not.
>>
>>
>> Sergey
>>
>> [1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0283.html
>> [2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0297.html
>>
>
>