Atom on the Web of Data: The Activity Streams Example

After long years in the wilderness, the vision of a "Web of Data" - where users can access, visualize, and manipulate raw data in a browser with the same ease with which they currently view web pages - is finally becoming more commonly accepted as the next step in the evolution of the Web. However, there are still looming architectural questions about how best to accomplish this task. Unlike XML, RDF has been around for nearly a decade without significant uptake, until recently, from either developers or vendors. For example, while many claim that the largest deployed RDF vocabulary is Friend-Of-a-Friend (FOAF), the usage of FOAF is currently rather small outside the Semantic Web community itself, and developer momentum in the wider social Web is instead moving towards PortableContacts and a new version of vCard.

This is odd, as RDF's data model of subject-predicate-object triples is nothing if not simple. There are a number of fairly simple reasons why this model may have caused confusion:

The relative complexity of the W3C Recommendations.

Arbitrary restrictions on the data model (such as no literals as subjects and no blank nodes in predicate position).

The inability to label collections of triples.

The difficult embedding of RDF into XML.

In summary, RDF was viewed as too complicated for a number of fairly small reasons, despite the relative simplicity of its data model, which is really just connecting two URIs with a link. This model is re-emerging, albeit without the use of RDF, in other parts of the Web. In order for semantic data to be used in a wider context, it must become less complicated while retaining its power - in other words, elegant. Discovering the right combination of features and even serialization syntax is harder than it looks, and given that XML was then in its early days, it is no surprise that RDF was serialized using the RDF/XML syntax. However, the Web has had ten years to mature, and any new steps towards serializing Semantic Web data should pay attention to developments that have already proven successful rather than re-inventing the wheel, perhaps badly.

One of the most important kinds of data in the Web is social data, data that is explicitly about people and their relationships. In a parallel universe to the W3C effort to create the Semantic Web, grass-roots groups of coders like those involved in the DiSO project have been working on building an open and distributed social networking platform. This work has not built upon RDF, but has instead built upon the open Atom standard.

Why Atom?

It is a fact of life that the Web changes. In this regard, one of the unspoken assumptions of the Semantic Web, that URIs "don't change", is unrealistic. Even if some URIs don't change, there need to be URIs for changing data-sets, and this is precisely the kind of data that many applications and users need. Formats like Atom have been created precisely for the purpose of syndicating feeds of changing data. Assuming that semantic data in triple form can be transported in an Atom-based format, one of the major advantages is the ability of data consumers to subscribe to particular feeds of data. As Atom is already widely used and understood by developers, it does not suffer from the complexity and the lack of a mature tool-set that hold RDF back.
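To make the idea of subscribing to a feed of changing data concrete, here is a minimal sketch of the consumer side: given the XML of an Atom feed and the time of the last visit, it returns only the entries updated since then. The function names and the sample feed are invented for illustration, and fetching the feed over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_time(text):
    # Atom timestamps follow RFC 3339; rewrite a trailing "Z" so that
    # datetime.fromisoformat accepts it on older Python versions too.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def new_entries(feed_xml, last_seen):
    """Return (id, updated) pairs for entries updated after last_seen, oldest first."""
    feed = ET.fromstring(feed_xml)
    fresh = []
    for entry in feed.findall(ATOM + "entry"):
        updated = parse_time(entry.findtext(ATOM + "updated"))
        if updated > last_seen:
            fresh.append((entry.findtext(ATOM + "id"), updated))
    return sorted(fresh, key=lambda pair: pair[1])


# Hypothetical feed with two entries; only the second is newer than the last visit.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/feed</id>
  <entry><id>http://example.org/e1</id><updated>2009-11-01T10:00:00Z</updated></entry>
  <entry><id>http://example.org/e2</id><updated>2009-11-03T10:00:00Z</updated></entry>
</feed>"""

if __name__ == "__main__":
    last_visit = datetime(2009, 11, 2, tzinfo=timezone.utc)
    for entry_id, updated in new_entries(SAMPLE_FEED, last_visit):
        print(entry_id, updated.isoformat())
```

A consumer would store the newest timestamp it has seen and pass it back in on the next poll, which is all that "subscription" requires of it.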

There are a number of different ways one could relate Atom and RDF, and the majority of them make things more complicated. One could attempt to model an entire Atom feed in RDF itself with perfect fidelity, as is done in Atom/RDF. One could even use RSS 1.0, which is itself RDF, and so embed still more RDF directly in the feed. Yet the problem is that these approaches assume that both the sender and the user of the data are already capable of publishing and consuming RDF. One could also use Atom to carry RDF as the payload of an Atom entry, using either RDF/XML or another format like Turtle. However, this requires that users first consume Atom and then consume RDF. Since the problem RDF suffers from is lack of adoption, these approaches are non-starters. The final nail in their coffin is that they all share the common mistake of adding yet another level of complexity.

Extending Activity Streams

What is needed is a way to incrementally add semantics to existing data streams, building directly on top of Atom. This is precisely the approach taken by the social web format Activity Streams, an extension to Atom that describes activities on social objects. Each activity is a single Atom entry, so an Atom feed describes a stream of activities over time. Each activity is given a title, a summary, and a time. Each activity consists of an activity actor, an activity verb, and an activity object. Here is a fictional example activity stream taken from the draft Activity Stream format specification:

Example: This example introduces a "changeset" object
type and a "commit" verb, both fictitious, invented for the
fictitious example service "VersionCentral". Since the "commit" verb
is a derived type of "post", VersionCentral's activity entries
include both the "post" verb and the "commit" verb so that consumers
which do not support their proprietary extension may fall back on
the "post" verb.
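As a rough illustration of the shape such an entry takes, the following sketch builds an Atom entry carrying two verbs and one object using Python's standard library. It is not the example from the draft specification. The activity namespace URI, the "post" verb URI, and all VersionCentral URIs are assumptions made for this sketch and should be checked against the draft.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumed namespace for the Activity Streams extension elements.
ACTIVITY_NS = "http://activitystrea.ms/spec/1.0/"

# Prefixes used only when serializing, to keep the output readable.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("activity", ACTIVITY_NS)


def q(ns, name):
    """Build an ElementTree qualified name such as {namespace}name."""
    return "{%s}%s" % (ns, name)


def build_activity_entry(entry_id, title, published, author_uri,
                         verbs, object_id, object_type):
    """Build one activity as a single Atom entry (a partial entry, for illustration)."""
    entry = ET.Element(q(ATOM_NS, "entry"))
    ET.SubElement(entry, q(ATOM_NS, "id")).text = entry_id
    ET.SubElement(entry, q(ATOM_NS, "title")).text = title
    ET.SubElement(entry, q(ATOM_NS, "published")).text = published
    # The actor is carried by the ordinary Atom author element.
    author = ET.SubElement(entry, q(ATOM_NS, "author"))
    ET.SubElement(author, q(ATOM_NS, "uri")).text = author_uri
    # Several verbs may be listed so that consumers can fall back on a generic one.
    for verb in verbs:
        ET.SubElement(entry, q(ACTIVITY_NS, "verb")).text = verb
    obj = ET.SubElement(entry, q(ACTIVITY_NS, "object"))
    ET.SubElement(obj, q(ATOM_NS, "id")).text = object_id
    ET.SubElement(obj, q(ACTIVITY_NS, "object-type")).text = object_type
    return entry


if __name__ == "__main__":
    entry = build_activity_entry(
        entry_id="http://versioncentral.example.org/activity/1",
        title="Dave committed a changeset",
        published="2009-11-02T15:29:00Z",
        author_uri="http://versioncentral.example.org/users/dave",
        verbs=[
            "http://activitystrea.ms/schema/1.0/post",  # assumed generic verb URI
            "http://versioncentral.example.org/activity/commit",
        ],
        object_id="http://versioncentral.example.org/changesets/2387",
        object_type="http://versioncentral.example.org/activity/changeset",
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Listing the generic "post" verb next to the service-specific "commit" verb is what lets a consumer that knows nothing about VersionCentral still make sense of the entry.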

The resemblance to the RDF 'triple' model is hard to miss, as the object and verb in the activity are both explicitly given URIs. The actor (the subject in RDF) seems to be missing, but it is given implicitly by the Atom author element of either the activity or the containing feed, or, if both of those are missing, by the URI of the feed itself. However, the Activity Streams model differs in a number of ways. First, it allows activity targets, which are roughly analogous to indirect objects in human languages.
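The mapping from an activity to a triple, including the fallback for the implicit actor, can be sketched as follows. This is one possible reading, not a normative algorithm: the activity namespace URI is assumed, the feed's atom:id is taken to stand in for "the URI of the feed itself", and the function names are invented for this sketch.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ACTIVITY = "{http://activitystrea.ms/spec/1.0/}"  # assumed namespace URI


def author_uri(element):
    """Return the atom:uri of the element's own atom:author, or None if absent."""
    return element.findtext(ATOM + "author/" + ATOM + "uri")


def activity_triples(feed_xml):
    """Read each activity entry as (actor, verb, object) triples of URIs."""
    feed = ET.fromstring(feed_xml)
    # Feed-level fallback: the feed author, otherwise the feed's own identifier.
    feed_actor = author_uri(feed) or feed.findtext(ATOM + "id")
    triples = []
    for entry in feed.findall(ATOM + "entry"):
        actor = author_uri(entry) or feed_actor
        obj = entry.findtext(ACTIVITY + "object/" + ATOM + "id")
        # An entry may list several verbs; each one yields its own triple.
        for verb in entry.findall(ACTIVITY + "verb"):
            triples.append((actor, (verb.text or "").strip(), obj))
    return triples
```

An entry that names its own author yields that author as the subject; an entry without one inherits the feed author, or failing that, the feed identifier.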

If one wanted to serialize RDF as Atom in a similar manner, only a small number of generalizations would be necessary. First, one would allow an explicit subject element, which could default to the author element when absent. Second, one would allow arbitrary text literals and even data-typed literals inside object elements. Lastly, the id elements could be considered to be the URIs of named graphs, or the source of the Atom feed could be the URI of the named graph. See the previous example, just generalized for more explicit triples:
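One way these generalizations might look in practice is sketched below. Everything about the vocabulary is hypothetical: the namespace "http://example.org/ns/triples#", the subject, predicate, and object element names, and the "href" and "datatype" attributes are invented for illustration and belong to no existing specification. The entries are also partial, omitting elements such as title and updated that a valid Atom entry requires.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

ATOM = "{http://www.w3.org/2005/Atom}"
T = "{http://example.org/ns/triples#}"  # hypothetical extension namespace


class Literal(NamedTuple):
    """A text literal with an optional datatype URI."""
    value: str
    datatype: Optional[str] = None


def triple_entry(graph_uri, subject, predicate, obj):
    """Serialize one triple as an entry whose id names the graph it belongs to."""
    entry = ET.Element(ATOM + "entry")
    ET.SubElement(entry, ATOM + "id").text = graph_uri
    ET.SubElement(entry, T + "subject").text = subject
    ET.SubElement(entry, T + "predicate").text = predicate
    o = ET.SubElement(entry, T + "object")
    if isinstance(obj, Literal):
        # Literals are carried as element text, with the datatype as an attribute.
        o.text = obj.value
        if obj.datatype:
            o.set("datatype", obj.datatype)
    else:
        # URI objects are carried in an attribute to tell them apart from literals.
        o.set("href", obj)
    return entry


def read_triple(entry, default_subject=None):
    """Return (graph, subject, predicate, object), defaulting a missing subject."""
    subject = entry.findtext(T + "subject") or default_subject
    predicate = entry.findtext(T + "predicate")
    o = entry.find(T + "object")
    obj = o.get("href") or Literal(o.text or "", o.get("datatype"))
    return (entry.findtext(ATOM + "id"), subject, predicate, obj)
```

The default_subject argument is where a consumer would pass in the author URI when an entry carries no explicit subject element.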

There are a number of design decisions that need to be carefully evaluated. There is no reason that every triple sharing the same subject could not be part of the feed, as long as each entry is restricted to a single subject. The parsing rules that determine how to associate predicates with objects also need to be decided. Sensible choices can be made: for example, if more than one predicate is allowed in an entry, then each object should be matched to the predicate that precedes it. These sorts of intuitive and simple guidelines should be built into the format.
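One candidate parsing rule can be sketched as follows, on the assumption that an entry has a single subject and lists its predicate and object elements in document order: each object is attached to the most recent predicate before it. The function names and the (kind, value) input shape are invented for this sketch, and other rules are certainly possible.

```python
def pair_predicates(subject, elements):
    """Turn an ordered list of ("predicate" | "object", value) pairs into triples.

    Each object is matched to the nearest preceding predicate, so a predicate
    followed by several objects yields several triples.
    """
    triples = []
    current_predicate = None
    for kind, value in elements:
        if kind == "predicate":
            current_predicate = value
        elif kind == "object":
            if current_predicate is None:
                raise ValueError("object appears before any predicate")
            triples.append((subject, current_predicate, value))
        else:
            raise ValueError("unknown element kind: %r" % (kind,))
    return triples


def group_by_subject(triples):
    """Group triples into one bucket per subject, i.e. one bucket per entry."""
    entries = {}
    for subject, predicate, obj in triples:
        entries.setdefault(subject, []).append((predicate, obj))
    return entries
```

Grouping by subject before serializing keeps each entry restricted to a single subject, while the pairing rule lets a consumer recover the triples without any lookahead.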

Conclusion

The over-arching point is that the Web of Data needs to be based on streams. If some new version of RDF is going to be the standard for the Social Web, it should also be based on streams.

Another reason Activity Streams was successful - as shown by its deployment already by Facebook, Google, and Microsoft - is that the work was scoped to try to model what people are already expressing on the web, but are doing so with a conventional format. Focusing on needs that we can actually witness, rather than trying to solve problems that we think people have but may not, should make adoption and uptake much easier. For the general problem of data on the Web, the problems people are currently expressing with RDF are obviously the first place to begin.