Ian Horrocks writes:
------
>Two comments/questions:
>
>Firstly, I strongly support the suggestion to define query answers in
>terms of entailment rather than subgraph. Using an entailment based
>definition has numerous advantages, and no disadvantages that I can
>see.
>
>Entailment based definition:
>
>- is widely used and very well understood
>- is concise, clear and unambiguous
>- builds on existing semantic definitions
>- can deal with a wide variety of languages, including RDF, RDFS, OWL,
>SWRL and FOL, simply by referring to the existing entailment semantics
>for the relevant language
>
>Subgraph based definition:
>
>- is not widely used, not well understood, and not *known* to work at
>all (at least not for more expressive languages)
>- is verbose, obfuscated (it seems capable of confusing even expert
>logicians), and may even be ambiguous
>- ignores existing semantic definitions
>- cannot easily deal with more expressive languages, and may even be
>incapable of doing so
>
>Can someone please explain to me why the subgraph based definition has
>been preferred?
-------
As one who argued for the subgraph-based definition, allow me to chime in.
As I'm sure you know, Ian, since we were co-authors of the DAML query
proposal, I too tend to think naturally of a query as a goal to be
proved, or as a sentence whose entailment by the knowledge base is
being questioned, so that querying is simply a synonym for
posing-as-a-goal-to-be-proved, and the entire story is given in terms
of logical ideas like entailment. As you say, this has many
advantages both for rigor and elegance, and it allows one to
parameterize the query with respect to the semantics. It is really
very nice. Unfortunately, it simply isn't adequate for the intended
uses. This isn't how RDF knowledge bases are being used: or at any
rate, it's not the only way.
Moreover, contrary to the implied claims in your note above, this
'logical' point of view about querying is not by any means universal,
and has many disadvantages. (The most glaring disadvantage - alone
enough to render it unusable - is that it requires any conforming
SPARQL engine to also be a theorem-prover.) There is a different
tradition and view of querying, incorporated into SQL, which arises
from database technology, in which querying is seen as a process of
applying filters to a large set of what are essentially vectors of
items; the meaning of the query language, if formalized, would be
specified rather like that of an algebraic programming language. (I
think of it as the baleen-whale model of querying.) This perspective
hardly mentions (what we call) entailment at all, and instead focuses
on query operations, seen as information-extraction operations on a
databases. There are probably more people who think of querying and
query languages in this way than there are who think of queries in a
logical-entailment way, by at least one and probably several orders
of magnitude, and a huge and mature technology (and body of theory)
which backs it up. It became clear quite early in the SPARQL WG
discussions that it was neither practical not desirable to ignore
this other tradition, so we were trying to have the best of both
worlds for a while. But is simply isn't possible to both kinds of
task at the same time. There are many natural uses of a query
language, giving rise to design constraints for the language, which
not only do not fit naturally into what might be called the 'logical'
picture of querying, but are sharply incompatible with it, in that
the purpose of the query is to extract information that cannot by any
stretch of the imagination be viewed as an entailment by the
knowledge base. (These include such things as the number of matching
triples in the graph, the fact that there is no subgraph matching a
query, and the size of the graph.) Other applications require the
ability to specify constraints on answers which, while they could in
principle be viewed in semantic terms as embodying conditions
involving entailments, would - if one were to attempt to do that -
break any existing semantic framework for RDF, RDFS or OWL, since
they involve for example complex arithmetical constraints on datatype
values which cannot be axiomatized in OWL.
<autobio>
I began as a member of the DAWG, viewing myself as the defender of
the logical-entailment view of what querying was all about. After a
brief attempt to either suppress these clearly not-entailment-like
constructions, or to modify them to make them more entailment-like, I
realized that this was neither useful nor productive. I howled with
protest when it was proposed to allow queries which could detect the
non-presence of a triple in the graph: this would be non-monotonic
inference sneaking in by the back door, aaargh!! But when reasonable
people, with real applications, asked me what possible harm could
there be in users being allowed to discover such things about the
triples in an RDF graph, I had no reasonable reply to make to them,
other than a kind of generalized growl about the Semantic Perils of
NonMon. One can only get so far with growling; and I quickly realized
that the mistake was mine: only only gets the Perils if one thinks of
the query answer as being entailed by the graph. But in this case it
was perfectly clear to everyone that it wasn't entailed; and also
perfectly clear that its was nevertheless useful information, and
that people expected to be able to extract it using a query protocol;
and that I had no reasonable argument why they should not be allowed
to do so.
</autobio>
RDF querying in SPARQL really is not about checking entailment. A
SPARQL engine is not required to perform any inferences, to check
semantic redundancy, etc. (Though it may, it is not required to.)
Querying is about trolling through RDF graphs looking for syntactic
patterns. In one very simple case these coincide: simple entailment
is having an instance which is a subgraph, the basic case for the
SPARQL match, if you think of query variables as a species of bnode.
But only in the very simple basic case do you find this coincidence:
as soon as things get at all complicated, querying and
entailment-checking diverge. SPARL queries allow the expression of a
large number of conditions on matches which do not correspond
directly to any kind of RDF/RDFS/OWL entailment; and anything beyond
simple entailment is not automatically handled by SPAQRL.
So, do not think of querying as entailment checking: if you do, it
will seem to be broken, and it will have all kinds of apparently
irrelevant barnacles attached to it, and it will seem ugly and
ad-hoc. Think of it instead as a kind of process of fishing for
patterns in RDF graphs. Entailment is managed by inference engines.
Querying is a process of trolling a pattern through an RDF graph to
see what is in it. Querying is fundamentally not a semantic business;
it is essentially syntactic, having to do with constraints on
bindings which match patterns.
<autobio>
After learning to live with the chagrin of finding that my cherished
view of querying was fundamentally inadequate for even quite simple
RDF applications in the real world, I have become quite reconciled to
this separation between inference/entailment and querying. I think in
fact that it is potentially beneficial to all.
</autobio>
What it does mean, however, is that one has to learn to mentally
separate querying from inference. Which, to return to the topic, is
why I suggested that the SPARQL documents refer explicitly to
matching subgraphs, rather than to simple entailment: precisely
because the use of the 'entailment' word is potentially misleading.
And I have to say, your message bears out my worries here: for even
when we did not use the e-word, you are already misled, as it were
pre-emptively. I was afraid this might happen... :-)
Hope this helps. I realize this will need more discussion and
justification, and does not yet really answer your last question
(why?) but at least you might now see that SPARQL is not simply a
botched attempt at defining entailment. It has different aims in view.
Pat
PS. One quick response: you say the subgraph-based definition is
"obfuscated (it seems capable of confusing even expert logicians)".
What is confusing the expert logicians in this case, it seems to me,
is that they are not actually reading the definition as given, but
are instead hallucinating their own favorite view of what the
definition ought to have been onto the words as actually written: and
when they do not fit, they are then (characteristically for expert
logicians, in my experience) concluding that the words as written
have a silly mistake in them.
--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 cell
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes