it seems to me that the biggest problem of RESTful RDF is the lack of a well-defined notion of a document. one of the strengths of RDF is that you harvest triples from wherever you find them, take the graph, compute the closure of this graph under whatever schemas or ontologies you might have, and then process the resulting graph. on the semantic level this means you take whatever you can get as data, whatever you can get as reasoning mechanisms, and extend the data using inference. it is this general model of raw data and derived information that makes RDF so powerful.
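to make this model of raw data plus derived information concrete, here is a minimal sketch in plain Python (triples as tuples, all names invented for illustration) of harvesting triples from two sources, merging them, and computing the closure under a simple RDFS-style subClassOf rule:

```python
# triples harvested from two different sources, represented as
# (subject, predicate, object) tuples; all URIs are made up
source_a = {("ex:alice", "rdf:type", "ex:Employee")}
source_b = {("ex:Employee", "rdfs:subClassOf", "ex:Person")}

# merging is simply a union: the triples become one graph
graph = source_a | source_b

def rdfs_closure(triples):
    """extend a graph with triples inferred via rdfs:subClassOf
    (type propagation and transitivity), until a fixpoint."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != "rdfs:subClassOf" or s2 != o:
                    continue
                if p == "rdf:type":
                    # an instance of a class is an instance of
                    # all its superclasses
                    new.add((s, "rdf:type", o2))
                elif p == "rdfs:subClassOf":
                    # subClassOf is transitive
                    new.add((s, "rdfs:subClassOf", o2))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = rdfs_closure(graph)
# the derived triple ("ex:alice", "rdf:type", "ex:Person") now sits
# in the same set as the harvested ones, indistinguishable from them
```

the last comment is the point: after closure, nothing in the resulting graph tells you which triples were retrieved and which were inferred.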

the price you pay, however, is the loss of provenance and document boundaries. of course this can be overlaid using various mechanisms, but it is not a fundamental part of the RDF model, and it's easy to forget it or get it wrong.
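one common overlay is to keep quads instead of triples, tagging each triple with the document it was harvested from (the idea behind named graphs). a minimal sketch, with all names invented for illustration:

```python
# quads: (subject, predicate, object, source) -- the fourth element
# records the document a triple was harvested from
quads = {
    ("ex:alice", "ex:worksFor", "ex:acme", "http://a.example/doc"),
    ("ex:alice", "ex:age", "42", "http://b.example/doc"),
}

# the plain triple view, as used for merging and inference,
# simply drops the provenance column -- which is exactly where
# document boundaries get lost
triples = {(s, p, o) for s, p, o, src in quads}

def provenance(triple, quads):
    """return the set of documents asserting a given triple."""
    s, p, o = triple
    return {src for s2, p2, o2, src in quads if (s2, p2, o2) == (s, p, o)}
```

the catch is visible in the middle: as soon as an application works on the merged triple view (and inference produces new triples with no source at all), provenance has to be maintained as a separate, optional bookkeeping layer rather than being part of the model.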

this leads to problems when it comes to the rather document-centric model of REST, which talks about representations that are exchanged to enable interactions. how do you use RDF when the data you're working with for your application is a mix of retrieved RDF and inferred information?

XML, on the other hand, has a rather static idea of a document. well, that's actually not entirely true, because XSD introduced the concept of the Post-Schema Validation Infoset (PSVI), which in a way is the same mix of source data (the original XML document) and inferred data (validation information, type annotations, and default values). type annotations can be regarded as a bonus that can make data processing much more robust and reliable. ironically, default values are widely frowned upon in XSD best practices, because they tightly bind document processing to document validation, and that is largely perceived to be a Bad Idea.
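why defaults couple processing to validation can be shown in a few lines. a sketch (plain Python dictionaries standing in for an infoset; the schema fragment is hypothetical):

```python
# a hypothetical schema fragment: attribute names mapped to their
# XSD-style default values
schema_defaults = {"currency": "USD"}

# the raw document as parsed, with no schema in play
raw = {"amount": "100"}

def apply_defaults(element, defaults):
    """simulate PSVI construction: validation fills in defaults
    for attributes the document did not supply."""
    return {**defaults, **element}

psvi = apply_defaults(raw, schema_defaults)
# an application processing `raw` sees no currency at all, while one
# processing `psvi` sees "USD": identical input, different data,
# depending entirely on whether validation happened
```

this is the "Bad Idea": two applications reading the same well-formed document disagree about its content unless both run the same validation step.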

and this is where it gets interesting: XML's very idea is that of enabling applications to GET XML and blindly process it by simply processing well-formed markup. RDF, on the other hand, is more rooted in the idea that the RDF you find blends into an existing set of other triples and schemas or ontologies. this latter model makes working with RDF more seamless and more convenient in an RDF-only world, because essentially, applications can ignore the fact that data had different origins, and they can work on what sometimes has been referred to as the giant global graph. it seems to me that this replicates, in a rather interesting way, the pattern of trying to hide distribution that underlies many of the middleware-inspired approaches to web services.

XML, on the other hand, makes it almost impossible to get this seamless view of the world, and requires applications to handle one XML document at a time, often steered by overlaid mechanisms such as XInclude at a very low level, or by following hyperlinks when it comes to RESTful data formats. what this means is that XML and this RESTful picture force you to work on the document level, and this also forces developers to explicitly deal with distribution, which means dealing with failures at the network or representation level.

so it seems as if RDF's virtues, the ability to transcend document boundaries and to seamlessly include data from all kinds of sources, become problematic when moving towards RESTful services, where coarse granularity and well-defined document concepts are an essential part of the architectural style.

this does not imply that RDF cannot be used for RESTful services and document-oriented scenarios; it simply means that XML makes it more natural (or you might say: less avoidable) for developers to deal with strict document boundaries, whereas in RDF, this requires special attention. i am really curious to find out whether somebody has already come up with a set of best practices and design guidelines for building RESTful RDF applications, and this becomes particularly interesting when looking at issues such as SPARQL (and its basic model of accessing an RDF triple store through just one URI) and updates of RDF data.

it seems to me that the road towards a RESTful semantic web (if we take the RDF-centric view of that concept) is still not fully clear, and i am really curious to explore in more depth the fundamental tension between REST's "things are diverse and distributed and you have to deal with it" and RDF's "in the end it's all just triples" world views.

Comments


Erik-

Great stuff. I'm particularly intrigued by the analogy to the "distribution myth" that was (first?) called into question by the Jim Waldo, et al. "A Note on Distributed Computing" (http://research.sun.com/techrep/1994/abstract-29.html). Suggesting, I suppose, that RDF is something of a remote interface pretending to be a local interface. While Waldo pointed out the dangers of ignoring issues of latency, concurrency, and partial failure, the RDF world runs the risk of underestimating the "difficult questions around trust, confidence, accuracy and time-sensitivity of semantic information" (Jeni Tennison in her recent blog entry http://www.jenitennison.com/blog/node/104).

As a programmer/designer one constantly has to balance the static and the dynamic pieces. It happens at both the level of language (compile time vs. runtime) and architecture (caching, etc.), and it is always a balancing act. It's true that so much of the power of REST comes from the idea of *document*, which resolves (at the moment of state transfer) all of those issues. RDF as an open graph does not -- there is no agreed-upon notion of state (at a readily identifiable & useful level of granularity) to be transferred.

I guess with Google on board, the SemWeb will take on new significance, and I could see definite opportunities in the area of SEO. Whether some of the other goals are achieved seems to me more of a long shot. I do think RDF has a sweet spot, and as a way to share, transfer, mix, etc. well-established ontologies it should be a great boon. There is always the trust issue, though, which will need to be addressed. I've often thought that we might look to the model offered by DVCSs like git and Mercurial for managing and sharing ontologies. If an ontology is to be changed, merged, branched, etc., it could do so under the same highly controlled, secure, fast and reliable system that manages the Linux kernel (and countless other code bases).

While i too think this will be an interesting development to watch, judging from the things i have to deal with it's also mostly a non-issue.

But one reason for this is that my websites basically publish read-only data, so the whole batch of problems related to the granularity of updates does not come into play. I also use the document boundaries dictated by the human-readable representation of my sites to conveniently batch triples, so as to make the link rel="alternate" really meaningful.

More interesting for me is the provenance aspect you mention, which is mostly lost in RDF. My guess is that this will have to be dealt with the Google way, i.e. by ranking the sites from which to harvest triples.

We need to differentiate two types of Web documents: those that describe a bunch of resources (mash-ups) and those that describe a single resource. XML has deceived us into believing that a single resource can be described with structured data, but in fact structured data is ultimately a mash-up of a bunch of resources. This is why domain modeling is so useful. Information is either an attribute of a resource or a relationship to a different resource. The mapping to REST is trivial and easily generalized if we accept this reality. This implies that Web documents are much more concise and high-resolution than their mash-up cousins, but this is exactly what we need to maximize the "unexpected reuse" of resources.

@jeff: i have to admit that i am still struggling with what exactly you refer to as "domain modeling". what exactly do you mean by that? to me, that's a very generic term, and what i am talking about here is more the question of the metamodel and how well that works for being used in RESTful scenarios.

domain models are always essential, but the question is what kind of metamodel you're using. if you use XML, your domain model will be rather document- and tree-oriented; if you use RDF, your domain model will be rather triple- and graph-oriented. both choices of metamodels have their implications in terms of granularity, built-in semantics, and document-orientation. my goal is to better figure out some of the tensions between RDF's view of a universe of interconnected triples, and REST's view of loosely coupled representations of abstract resources.

In general, I am referring to domain modeling (DM) in the sense of http://en.wikipedia.org/wiki/Domain_model. UML is a more specific example with Enterprise Architect being a decent implementation. In MVC terms, this tool can be used to define and even pre-populate the Model using the class diagram feature. You could think of AtomPub as a basic approximation of an HTTP MVC Controller, with LDA starting to fill in the blanks: http://q6.oclc.org/2009/04/the_union_of_do.html.

XML Schema is careless about some important aspects of modeling, so let's assume an RDF perspective. (My resident RDF expert is out for a few days, so bear with me as I try to channel him.)

Roughly speaking, an RDF "subject" would map to a DM "instance", an RDF "object" would map to a DM "attribute value" or "instance in an associated class", and an RDF "predicate" would map to a DM "attribute type" or "association name". In the context of Linked Data (i.e. the Web), a DM "instance" maps to a Real World Object and the DM attributes and associations can be used to algorithmically produce a variety of negotiable Web document representations (in say RDF/XML, N3, JSON, HTML, etc.) describing the Real World Object in a model-neutral way.
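the mapping sketched above can be written down in a few lines. a sketch (plain Python, with a hypothetical `Person` class; the attribute names and URIs are invented for illustration):

```python
from dataclasses import dataclass, fields

@dataclass
class Person:       # a DM class; instances map to RDF subjects
    uri: str        # the Real World Object's identifier
    name: str       # a DM attribute -> predicate with a literal object
    employer: str   # a DM association -> predicate with a resource object

def to_triples(instance):
    """map a DM instance to triples: one triple per directly
    attached attribute or association."""
    return {
        (instance.uri, f.name, getattr(instance, f.name))
        for f in fields(instance) if f.name != "uri"
    }

alice = Person("ex:alice", "Alice", "ex:acme")
# to_triples(alice) yields only the triples with ex:alice as subject,
# i.e. the directly attached attributes and associations
```

this also illustrates why such a mapping is algorithmic and model-neutral: the same generic function works for any class, which is what makes producing negotiable representations from the model mechanical.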

I know that the RDF world would love to suck up all the triples they need from a minimum number of Web Documents, but note that I'm suggesting otherwise so that a generic View mechanism can be implemented. For example, the default RDF for an instance of a class in an arbitrary model would only contain the attributes and associations that are directly attached (i.e. have that RWO URI as the "subject" or "object" in the triple). This doesn't mean that an RDF representation that includes more of the graph *can't* be produced, just that this option would require a class-specific MVC View implementation and that making this available is ultimately just a network efficiency maneuver.

Hmm. This is starting to look more like a blog entry than a comment. Still, though, you asked the right questions which is half the battle.

@jeff: thanks for the explanation. you take a very straight path from domain model to UML to RDF, which is something one can do. but at each of these steps, there are many options available with various features, constraints, and side-effects, and many of those have large user communities. i would be interested why you think that it is this path of models that is the way to go. it seems to involve a variety of non-trivial metamodels, and each of these has its own idiosyncrasies and side-effects.

UML is very popular as a domain model for software engineering, but it is only one way of doing domain modeling. most RDF users would see RDF (RDFS or OWL) as the domain model they're building, there would be no intermediate UML model. and i tend to think that XML makes sense as a "model" as well, even though i am aware of the fact that XSD is a far cry from being a modeling language, and that there actually currently is no modeling language for XML (http://dret.net/netdret/publications#wil06d is old but nothing has changed since then).

if you do build a domain model in UML and then map it to RDF (and i have to admit that i still don't quite understand what your rationale is for this kind of model language layering), then i assume you have a mapping defined and there is an RDFS that governs all of your derived RDF models? that would be your RDF model of the UML metamodel, and i am quite sure there are many different ways in which this can be done, depending on which parts of the UML metamodel and of the UML models one wants to highlight and make easily accessible.

i am afraid i also don't quite follow the references to MVC and implementations. what i was talking about was simply the question of what your modeling language provides as a dominant granularity, and RDF and XML have significant differences here. and like i said, i am sure you can use RDF in RESTful ways, but then you have to be careful how you handle provenance and document boundaries; that is something RDF doesn't represent well out of the box.

@michael: thanks for the pointer, i hadn't seen it previously. this looks not all that reassuring to me, HTTP seems to be intended for GET only, and all write seems to be tunneled through SPARQL. and i have to admit that the "web of data" term is new and kind of confusing to me. everything on the web is data, isn't it? at least if we look at HTTP for resource access and at the representations and not the potentially physical resources behind them. what is the relationship between "the web", "the semantic Web", "the web of data", and "linked data on the web"? it's getting confusing.

E: this looks not all that reassuring to me,
E: HTTP seems to be intended for GET only,
E: and all write seems to be tunneled through SPARQL.

That's why I'm asking for your advice ;)

GET is only used for read operations, that's what linked data is about. A read-only thing. Regarding SPARQL Update I must say I'm no expert (I'm just using it, dunno about its internals). Looking at [1] (which will be more or less the basis of the upcoming standard; the respective W3C WG has just been launched) doesn't really tell me what they plan to use. I'll ask my colleague, who is active in this WG ...

E: what is the relationship between "the web", "the semantic Web",
E: "the web of data", and "linked data on the web"? it's getting confusing.

Data in the sense of finer-grained pieces intended to be consumed by machines, in contrast to documents, which are intended to be used by humans. Our attempt at an explanation at [2], page 12 might help - or not ;)

Or you invest the 10min for watching Tim's TED talk [3] who motivates that difference very nicely, IMHO.

@michael: i am really looking forward to meeting you again and pushing forward our idea of a "RESTful semantic web" that stays within the limits of the web's design principles. to me, many of the current activities in this area have the same underlying assumptions (and thus directions) as the "big web services" approaches: to build an abstraction layer on top of the web, using the web as a transport infrastructure, that hides distribution and gives developers the appearance of one integrated system they can work with. peter keane pointed to the 1994 paper that first pointed out the fact that this approach towards model-centric homogenization happens in cycles, and i don't believe this time it's something different. but i am really looking forward to discussing this and much more in person. i think this discussion is going to be very important for how much the semantic web will be able to really be a "web" with web-inherent properties such as distribution, failure, inconsistency, and non-matching models and metamodels.

wrt the "web of data": i did watch timbl's TED talk when it was published and was pleased to see that he did not mention any specific technology choices, so that at least judging from that talk, "linked open data" might as well be implemented using plain web standards. but i don't quite get your argument why "documents" are inherently human-oriented, whereas machine-oriented data has to be fine-grained. that's quite an assumption to start from. i do see the advantages of a super-fine-granular model such as RDF that allows you to have a very fluid view of data and merge graphs and derive new data from inference. but that's just the advantage of having such a fine-granular model as the metamodel world you're living and programming in, and there's little evidence that having a very fine-granular metamodel for implementing loosely coupled systems ever worked very well.

furthermore, when you look at information processing in real-world scenarios (i.e., not at people cooperating towards joint work around some datastore they all agree to use, but just business partners who need to cooperate in a loosely coupled way based on very specific goals and constraints), then documents in fact are a very useful abstraction for data exchange [1], and i would argue that the granularity provided by documents is one of the main reasons why they work so well in loosely coupled B2B scenarios. if you like, you can of course have RDF-based documents in such a scenario, but then you still need to make sure that you have a well-defined notion of a document in your scenario. and that's what i am after.

Interesting post Erik. Have you ever looked at Guha's "Adding Object Identity and Explicit Relations to XML" (http://web.archive.org/web/20060619175636/http://tap.stanford.edu/xemantics.html)? I recently ran across it in some discussion of Google's crawl/index of RDFa. I like the idea of XIOR because it seems to capture what is missing from the XML document-centric view, while not layering on an entirely new paradigm. More like linked data than Semantic Web I guess. It's interesting and reassuring that Guha is still plugging away at the same ideas, now at Google.

I've recently been feeling some confusion/dissonance between the Semantic Web and REST, so it was nice to read your post. My confusion sort of hinges on the semweb notion of a 'real world resource' which needs to have a fancy httpRange-14 identifier, vs. the resource in REST, which can only ever have representations sent down the wire. So the problem of what a URI for say Erik Wilde returns when resolved seems to kind of go away in the REST view. All you ever get are representations of resources.

I imagine I'm just re-igniting some old debate in a half-baked way. But keep up your thinking on this topic, I think it's important.

@ed: thanks for your comment and the pointer to XIOR; i haven't seen it before and i will definitely check it out. i think whatever google is doing, and even though they have only just started carefully checking out microformats, in the end they will (have to) be more pragmatic than many of the idealized assumptions underlying the semantic web.

i also ran into the same issue with "identity". i proposed URI schemes for locations a while ago [1], because it seemed to me that location would be an excellent concept to frame as a URI. some locations might have resolvers, others not, or only for certain people (if you want to resolve my home address to GPS coordinates, you can do so, if you know where i am living). the thing i ran into was the W3C TAG being very critical about inventing *any* new URI schemes; it seemed they wanted to have HTTP as the only scheme from now on, and a "location scheme" would somehow have to be built as an overlay on top of that, so that it can be resolved via HTTP and httpRange-14.

my thinking was and still is that this is against the principles of the web. HTTP URIs should be opaque, and if there is a new concept you want to make known on the web with new interactions around it (and i think it's about time that the web becomes a location-aware information system), then there should be URIs for instances of that concept, so that you can start working with them [2]. the semantic web and its approach to build an abstraction layer that hides distribution and heterogeneity, however, has led to the view that for the semantic web to work seamlessly, you need to get your RDF triples, and for that to work, you must be able to retrieve them via HTTP. so, in the end, even though there was not a single deployed scenario where this approach worked, the conclusion from that discussion seemed to be that from now on, all we will ever want or GET is HTTP and RDF delivered via httpRange-14.

Ok, so document vs. bunch of triples view. I think I get the main issue. How about a practical example where at least the provenance should be easier to understand. Take Google's recent support for RDFa [1] (or microformats, FWIW, doesn't matter too much here as one may apply a GRDDL transformation to keep the RDF point of view).

Here, the metadata (the RDF triples) is embedded in an HTML document, that is, the document and the triples are clearly connected to each other. One may take a 'local' approach then. Basically it boils down to having a bunch of key-value pairs (if, for the sake of simplicity, we leave out the IMHO interesting part of using resources rather than literal values for the objects). These KV-pairs are expressed in the context of the HTML+RDFa document.
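The key-value view can be sketched in a few lines of plain Python, using the stdlib HTML parser and RDFa-style `property`/`content` attributes (a toy extractor under simplifying assumptions: real RDFa also resolves prefixes, nesting, and element text content; the vocabulary prefix `v:` is made up):

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """collect RDFa-style (property, content) pairs from an HTML
    page, ignoring everything else -- the 'local' key-value view."""
    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "property" in a and "content" in a:
            self.pairs.append((a["property"], a["content"]))

html = '''<div>
  <span property="v:name" content="Alice"></span>
  <span property="v:affiliation" content="ACME"></span>
</div>'''

extractor = PropertyExtractor()
extractor.feed(html)
# extractor.pairs -> [("v:name", "Alice"), ("v:affiliation", "ACME")]
```

Since the pairs are extracted from one page, the page itself supplies the document boundary and the provenance; the question raised in the post only reappears once such pairs from many pages are merged into one graph.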

Erik, what do you think? Is this a border case, or could it be something that might form a basis for further exploration? As an aside, I guess we should try to focus the discussion on real-world data and examples as much as possible. I know that you're deep into Atom and stuff, hence it would be cool to learn concrete examples from this domain to contrast them with the RDFa example I gave above (or maybe find common elements? ;)

@michael: i think that a microformat can and often should be more than a bunch of key-value pairs. having it like this makes it easy to map it to RDF, but then it's more RDF embedded in HTML than a microformat that's supposed to semantically augment an HTML page. as an example, sequence information (consider various lines for an address field in an address microformat) or grouping (address info grouped as home address, and address info grouped as office address) can also easily be derived from the structural richness of HTML.

in terms of provenance, a microformat is embedded in a web page and thus part of the page. looking at GRDDL already blurs that line, because nothing in GRDDL states (i think) that there is only one GRDDL transform, or that a GRDDL transform has to be provided by the page author/owner. so i think it would be easy to imagine a scenario where a page has microformat info, and then there are various GRDDL transforms that are used for it, simply because they have been produced for a different set of test cases, or because they produce RDF based on different schemas.

i think the whole idea that the microformat *is* RDF is not such a great idea as a general definition of a microformat. some microformats may be built this way (because they have been defined by users with an interest of getting RDF), but it should be up to the microformat to decide its metamodel, and for more document-oriented microformats (sequences, mixed content, grouping), it may be more convenient to use a more document-centric metamodel.

i like GRDDL's general approach in that it defines an extraction mechanism, but i think it's very unfortunate that it hasn't left open the question of the target metamodel. GRDDL always produces RDF, and i think there is no requirement in this general scenario of "Gleaning Resource Descriptions from Dialects of Languages" that makes it necessary to hardcode RDF into the mechanism.

absolutely fascinating post and set of comments... I did note that despite tbl's non-mention of RDF in his TED talk, he did mention HTTP quite often. So, perhaps not as agnostic or neutral as it might seem?

@joebeone: i guess the general consensus is that data should be available via HTTP; the big question is: what data? web data or semantic web data? and if it is semantic web data, does the semantic web even have appropriate models for handling all use cases, such as access beyond read access? the semantic web is much more semantic than it is web, and it is often surprising to see how small a fraction of the semantic web community even cares about the web.

to non-experts, the "web" parts of the semantic web may look ok (URI for identification, HTTP for access), but a closer look reveals that a lot of things are half-baked, at best. URIs are used for identifying the things that semantic web data is about, but how do you identify semantic web data itself? very often, not at all, because it lives in some silo that can only be pierced through SPARQL requests, and SPARQL is not RESTful at all. HTTP can be used to access SPARQL silos, but SPARQL's use of HTTP is basically the same as SOAP: use HTTP as a transport protocol.

i think it may be possible to actually put the "web" into the "semantic web", but it would take some effort, and also some general interest in web architecture, decentralized systems, and how to handle data in an inconsistent, conflicting, and possibly even non-cooperative environment.