Posted by timothy on Tuesday September 23, 2003 @11:00AM
from the tools-applied dept.

briandonovan writes "World Wide Web Consortium (W3C) Director Tim Berners-Lee and his compatriots would like to transform the current Web into a 'Semantic Web' where 'software agents roaming from page to page can readily carry out sophisticated tasks for users' using 'structured collections of information and sets of inference rules.' The Resource Description Framework (RDF), designed as a language for expressing information about resources on the Web, and allied technologies are the result to date of ongoing efforts at the W3C to furnish Semantic Web proponents with the requisite tools. While it's far too early to predict whether TimBL's grand vision will be realized, RDF/XML (the XML serialization of RDF) is already in widespread use, having been incorporated into a surprising array of applications." Read on below for briandonovan's link-stuffed review of O'Reilly's Practical RDF.

Great introduction to RDF, an assortment of tools and utilities for working with RDF, and some real-world applications.

RDF first hit my radar screen a couple of years ago while I was working on a barebones tool to manage my personal website. I was writing the code to generate RSS feeds ("What is RSS?") for my site and had to choose whether to support RSS 0.9x (non-RDF) or RSS 1.0 (RDF-based) or both. Long story short: I went with RSS 1.0 and was able to implement the feeds, but never got any further into RDF afterwards. I couldn't make headway through the RDF-related working drafts rapidly enough to justify the time that I was spending, there weren't any worthwhile-looking books available at the time, and the few online tutorials that I found were sorely lacking -- possibly because the specs themselves were still evolving as the RDF Core Working Group hashed out some remaining issues.

Fast forward a few years: the dust in RDF-land seems to be settling a bit (although new working drafts of all of the current RDF specs were released on September 5th, most of the changes from previous versions appear to be relatively minor) and, with the publication of Shelley Powers' Practical RDF: Solving Problems with the Resource Description Framework, there's finally a good book available on the subject.

Chapter 7, the first in the tools-focused portion of Practical RDF, is dedicated to (mostly Java-based) editors, parsers, validators, browsers, etc. for desktop use. Next, she dives into Jena, the Java RDF toolkit that began life as the labor of love of HP Labs researcher Brian McBride before being elevated to the status of a formal HP Labs project under their Semantic Web Research umbrella. Another HP Labs Semantic Web project, Damian Steer's BrownSauce, a slick little Java-based RDF browser, was introduced back in Chapter 7. Means for manipulating RDF/XML in Perl (RDF::Core, part of Ginger Alliance's PerlRDF project), PHP (RAP, the RDF API for PHP), and Python (RDFLib) are addressed in Chapter 9. RDF query engines/languages are taken up next: rdfDB QL, the query language of R.V. Guha's rdfDB (written in C); SquishQL, implemented in the Java-based Inkling query engine (built atop PostgreSQL); RDQL, used within Jena; and Sesame, a JSP/Servlet querying engine that supports both RDQL and its own query language, RQL, and can be deployed atop MySQL or PostgreSQL. Powers rounds out this part of her book with a chapter that deals with the leftovers. Drive, an RDF API for C#, is briefly discussed, along with RDF APIs for less fashionable programming languages: Nokia's Wilbur for CLOS, XOTcl for Tcl, and RubyRDF for Ruby. Redland, an RDF toolkit written in C with Java, Perl, PHP, Python, Ruby, and Tcl wrappers, is covered at some length (about half a dozen pages), and a couple more pages are given over to Redfoot, a Python RDF framework consisting of RDFLib (mentioned earlier in the Perl/PHP/Python chapter), a small-footprint HTTP server (according to the changelog at redfoot.net, they're using Medusa), and a native scripting language called Hypercode that lives within CDATA blocks in RDF/XML.
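For readers who haven't met an RDF query language, the core idea that rdfDB QL, SquishQL, RDQL and RQL share is matching triple patterns containing variables against a set of triples. The sketch below is not the syntax or API of any of those engines; it is a minimal pure-Python illustration, assuming terms beginning with `?` are variables and treating URIs and literals alike as plain strings (a simplification real engines don't make). The book URI under example.org is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms beginning with '?' are treated as variables."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding.

    Returns the extended binding, or None if the triple does not fit.
    """
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to something else
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every variable binding that satisfies all patterns.

    Each pattern filters and extends the bindings produced by the previous
    ones, which amounts to a naive nested-loop join over the triples.
    """
    triples = list(graph)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical data using Dublin Core property URIs.
DC = "http://purl.org/dc/elements/1.1/"
BOOK = "http://example.org/books/practical-rdf"
graph = {
    (BOOK, DC + "title", "Practical RDF"),
    (BOOK, DC + "creator", "Shelley Powers"),
}

# Roughly: "find ?who where ?book has title 'Practical RDF' and creator ?who".
results = query(graph, [
    ("?book", DC + "title", "Practical RDF"),
    ("?book", DC + "creator", "?who"),
])
print(results)
# → [{'?book': 'http://example.org/books/practical-rdf', '?who': 'Shelley Powers'}]
```

Real engines add indexes, typed literals, optional patterns and storage backends (PostgreSQL, MySQL and so on), but a shared variable joining two patterns, as `?book` does here, is the essence of what those query languages express.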

The last third of Practical RDF is devoted to uses of RDF and begins with a chapter on the OWL Web Ontology Language, an extension to RDF that's designed to supply more constraints for RDF vocabularies than can be provided by RDF Schema alone. This chapter would have been better situated after Chapter 5, which addresses RDF Schema, and feels a bit out of place here. RSS 1.0, the RDF-based syndication format, gets a chapter all of its own, beginning with a short synopsis of the evolution of RSS and the rift between the RSS 0.9x/2.0 and RSS 1.0 camps, progressing through descriptions of the RSS elements, some discussion of the use of modules, RSS autodiscovery, and aggregators (Amphetadesk, Meerkat, and NetNewsWire are mentioned), and finishing with an example RSS file (a syndicated list of book recommendations), producing RSS 1.0 using the Informa RSS Library (a set of Java classes), and merging two RSS 1.0 files using the XML::RSS Perl module. Two "Applications Based on RDF" chapters (commercial and noncommercial) top off the book. Noncommercial applications of RDF are visited first: Mozilla, where history and bookmarks, among other classes of information, are stored in RDF; the Creative Commons licensing scheme, whose proponents encourage content creators to embed RDF snippets into their documents and applications to describe the work itself and the restrictions placed on its reuse under the particular CC license they've chosen; a Java- and PostgreSQL-based digital library system jointly developed by MIT and HP that uses RDF; and FOAF (Friend-of-a-Friend), an RDF vocabulary designed to express personal information and interpersonal relationships. Among the commercial applications of RDF listed in the book's final chapter is Chandler, the as-yet very-alpha personal information manager that has managed to garner multiple mentions on this site.
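To make the RSS 1.0 discussion concrete, here is the rough shape of an RSS 1.0 feed. The book does this in Java with the Informa RSS Library; the sketch below swaps that for Python's standard `xml.etree.ElementTree` and is not a port of Informa or XML::RSS. It assumes the usual RSS 1.0 layout: an `rdf:RDF` root, a `channel` that lists its items by reference inside an `rdf:Seq`, and each `item` as a sibling of the channel identified by `rdf:about`. The function name `build_rss10` and all example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"

# Serialize RDF terms with the "rdf:" prefix and RSS 1.0 terms as the default namespace.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("", RSS_NS)


def rdf_tag(name):
    return f"{{{RDF_NS}}}{name}"


def rss_tag(name):
    return f"{{{RSS_NS}}}{name}"


def build_rss10(feed_url, site_url, title, description, items):
    """Return a minimal RSS 1.0 document; items is a list of (url, title) pairs."""
    root = ET.Element(rdf_tag("RDF"))
    channel = ET.SubElement(root, rss_tag("channel"), {rdf_tag("about"): feed_url})
    ET.SubElement(channel, rss_tag("title")).text = title
    ET.SubElement(channel, rss_tag("link")).text = site_url
    ET.SubElement(channel, rss_tag("description")).text = description
    # The channel points at its items by URI, in order, via an rdf:Seq.
    seq = ET.SubElement(ET.SubElement(channel, rss_tag("items")), rdf_tag("Seq"))
    for url, _ in items:
        ET.SubElement(seq, rdf_tag("li"), {rdf_tag("resource"): url})
    # Each item is a top-level resource whose rdf:about matches an rdf:li above.
    for url, item_title in items:
        item = ET.SubElement(root, rss_tag("item"), {rdf_tag("about"): url})
        ET.SubElement(item, rss_tag("title")).text = item_title
        ET.SubElement(item, rss_tag("link")).text = url
    return ET.tostring(root, encoding="unicode")


xml_text = build_rss10(
    "http://example.org/recommendations.rdf",
    "http://example.org/",
    "Book recommendations",
    "A hypothetical feed of book recommendations",
    [
        ("http://example.org/books/1", "Practical RDF"),
        ("http://example.org/books/2", "Another recommended book"),
    ],
)
print(xml_text)
```

Because every item is identified by a URI, merging two such feeds (as the book does with XML::RSS) can lean on RDF identity rather than on document position.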

The Verdict

The real meat of Practical RDF, for me, was in Chapters 1 through 6 (plus the OWL chapter, Chapter 12). This is not to say that the material in the last 2/3 of the book isn't useful or interesting. The section on RDF software tools is a great annotated survey of what's out there right now ... and I would imagine that installing and test-driving each of the software applications featured in those chapters must have been an extremely time-consuming process. The chapters describing real-world applications of RDF could be useful to someone trying to convince a manager that RDF is a viable, widely used technology. Given a choice, though, I would rather have seen those pages spent on additional coverage of RDF, RDFS, and OWL with more example RDF vocabularies developed (like PostCon, which the author formulated, then refined through RDFS and OWL). The displaced material could have been made available online at the author's site for the book. A lot of that information will become less accurate over time as the software evolves and people come up with more applications for RDF anyway.

All nitpicking aside, though, if you're looking for a book on RDF, then you can't go wrong with Shelley Powers' Practical RDF.

1. Java runs perfectly adequately for me on my 400 MHz machine. Typical application startup times are ~1 second, which is generally acceptable, and once the application is running there's not normally a noticeable difference between it and a 'native' application (whatever that might mean for you...). (Note the distinction between noticeable and measurable. Also, please bear in mind that I'm not talking about AWT/Swing apps here; those really are slow, but that's the library, not the language, that's responsible, IMHO.)

2. XML might be a little slower to process than other similarly expressive data formats (e.g. s-expressions, ASN.1 and similar). Maybe by a factor of 10, even. However, the data formats I am comparing it to were considered acceptable for use on 4 MHz processors, and even then the I/O time was a lot more significant than processor time for such operations. Processor speed growth has substantially outpaced I/O speed growth over that period.

AFAICT the only people "demanding 4 GHz CPUs" are the "I've got a better PC than you" crowd, serious gamers, and people who are doing really demanding applications, like video editing or scientific applications (or who want to do a lot of work on ).

I'm sure RDF has plenty to offer to the world of online porn. Porn aficionados will more efficiently scour their favorite sites to find the material pertaining to their specific fetishes. Porn merchants will more easily attract the customers who seek them by exactly specifying what they have to offer instead of spamming the search engines with likely keywords.

I've been working in this area. First off, the reviewer is wrong. There are very few production systems using RDF; in fact, most of it right now is pure academic research. The commercial implementations of RDF graft on a whole bunch of things to make it useful. One critical flaw of the current thinking is the assumption that a URI is authoritative and persistent; in other words, that a URI uniquely identifies a domain and does not change. That is a fallacy on commercial sites, where URIs/URLs are rarely persistent or authoritative. RuleML, in my opinion, is a much better approach to building a semantic web. As far as OWL goes, it is horribly broken, and the commercial industry is moving towards other models of ontology. Most are actually going with a web services model rather than a strict ontology. There are numerous issues and problems which the current semantic web doesn't address. For example, the whole concept of binding is poorly addressed and is not flexible. Many of the researchers believe RDF should be the object model, but companies are using XML Schema, RELAX NG and XMI. The semantic web holds a lot of promise, if only they work out these critical issues.

I have to admit that I haven't been following RDF closely for a year or so, but I did spend a lot of time investigating the standardization effort from its inception (in like 1996... no joke). At the time I was struck by the appallingly obfuscated specification and syntax.

It seems like a lot of progress has been made since then, but personally I still don't see the point. If you buy into XML as the "lingua franca" of semantic data interchange, then great. I do too. But what exactly is RDF useful for? If we can agree on an XML schema for our data, we can exchange it directly without the need for yet another layer of abstraction on top of it.

The really hard part is agreeing on the schemas, and this has nothing to do with RDF. Having worked in one XML vocabulary standardization effort (Universal Business Language), I can only stress that the technical and political challenges of getting any group of individuals and companies to agree on any common data format are enormous. For example, it would be great if Amazon and B&N used the same schema for their book descriptions, but imagine trying to make this happen (particularly as they are likely to feel that the specificities of their formats represent some kind of competitive advantage).

So until proven wrong I continue to believe that RDF is nothing but smoke and mirrors. The easy stuff is done by XML right out of the box, and the hard stuff has nothing whatsoever to do with data structures and wire serialization formats.

The RDF design addresses the concerns you raise, by virtue of RDF's focus on data merging. You can't take two arbitrary XML documents and (without domain knowledge) reliably merge the information they encode. You can with RDF; just merge the sets of triples that constitute the two RDF graphs. This has knock-on effects in the real world: the granularity of "mixing and matching" between independent vocabularies is much finer. Instead of picking whole document formats, you can use just some parts of another's RDF vocabulary. This gets us away from a situation where you have to decide to use, or not use, an entire XML vocabulary.
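The "just merge the sets of triples" point can be shown in a few lines. This is a minimal sketch, not RDFLib's or Jena's API: each graph is a plain Python set of (subject, predicate, object) tuples, so merging is set union. It assumes no blank nodes (a real RDF merge must rename blank nodes apart first) and treats URIs and literals alike as strings. The FOAF and Dublin Core namespace URIs are the real ones; the example.org resources and the helper names `subjects`/`objects` are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical resources; all URIs under example.org are made up.
ALICE = "http://example.org/people/alice"
REPORT = "http://example.org/docs/report"

# Source A: a personal profile written with FOAF terms.
graph_a = {
    (ALICE, FOAF + "name", "Alice"),
    (ALICE, FOAF + "mbox", "mailto:alice@example.org"),
}

# Source B: document metadata in Dublin Core, borrowing one FOAF term.
graph_b = {
    (REPORT, DC + "title", "Quarterly report"),
    (REPORT, DC + "creator", ALICE),
    (ALICE, FOAF + "mbox", "mailto:alice@example.org"),  # also stated in graph_a
}

# Merging is set union; the statement both sources make collapses to one triple.
merged = graph_a | graph_b


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Titles of documents created by someone named "Alice". The name comes from
# graph_a and the document facts from graph_b, joined through the shared URI.
titles = sorted(
    title
    for person in subjects(merged, FOAF + "name", "Alice")
    for doc in subjects(merged, DC + "creator", person)
    for title in objects(merged, doc, DC + "title")
)
print(len(merged), titles)  # → 4 ['Quarterly report']
```

Nothing here needed to know what FOAF or Dublin Core "mean"; the join works because both sources use the same URI for Alice, and graph_b could reuse a single FOAF property without adopting the whole FOAF vocabulary.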

RDF makes it cheaper to put together this sort of composite information, since the groups (formal and informal) who came up with these vocabularies didn't need to sit around a table together and agree a single common DTD or XML schema. They each did what they do best, and RDF glues it all together.

Perhaps I am playing devil's advocate here, but not intentionally. I really don't get it. Let's say I design a set of XML schemas using XSD, along the lines that you mention (i.e. places, documents, syndication, etc.). Each one has its own namespace.

Why couldn't I just make an FOAF schema that pulls in the element types from the appropriate "component" schemas, qualifying the types with the correct namespaces?

It still strikes me that RDF is simply an alternative to XSD, and it's not clear to me why it is a better one.