Using RDFa with DITA and DocBook : Page 2

Learn how to add RDFa metadata to DITA and DocBook documents, how to keep those documents valid, and what advantages this technique can bring to a DITA- or DocBook-based publishing system.

by Bob DuCharme

Aug 20, 2009


RDF + Attributes = RDFa

Various syntaxes such as RDF/XML, N3, Turtle, and RDFa let you represent RDF triples so that programs can read the data, store it in databases, query it, and do inferencing with it. The "a" in RDFa refers to attributes, because RDFa lets you embed triples into non-RDF XML by simply adding a few attributes.

When adding these attributes, RDFa's design lets you minimize redundancy with two nice tricks:

You can treat data that's already part of the XML file as a triple's object by adding the rest of the triple as attributes in an element that wraps that data.

If you specify only the predicate and object of a triple, a program that extracts those triples from the document assumes that the document itself is the subject. This is quite handy, because RDFa is often used to add metadata about the containing document, such as workflow, provenance, and rights re-use information.

Combining these two tricks, if the document http://www.snee.com/bob/index.html starts off with the following:

<h1>My Home Page</h1>

You can add the triple {http://www.snee.com/bob/index.html, http://purl.org/dc/elements/1.1/title, "My Home Page"} to the page by simply adding this single attribute:

<h1 property="dc:title">My Home Page</h1>

A program looking for RDFa triples will treat "My Home Page" as the triple's object and the containing document as the subject. (It will also look for that "dc:" prefix to be properly associated with a namespace; this is the prefix traditionally used with the Dublin Core http://purl.org/dc/elements/1.1/ URL.)
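The extraction just described can be sketched in a few lines of Python. This is only an illustration of the idea, not a conforming RDFa processor: the names PREFIXES, expand_curie, and property_triples are made up for this example, and the prefix mapping is hard-coded on the assumption that the page declares "dc" in the usual way.

```python
import xml.etree.ElementTree as ET

# Assumption: the page binds the "dc" prefix to the Dublin Core namespace.
# A real RDFa processor would read this binding from the document itself.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


def property_triples(markup, document_url):
    """Return (subject, predicate, object) tuples for property attributes.

    Covers only the simple case from the text: no subject is named, so the
    containing document is the subject, and the element's text is the object.
    """
    triples = []
    for element in ET.fromstring(markup).iter():
        curie = element.get("property")
        if curie is not None:
            text = "".join(element.itertext())
            triples.append((document_url, expand_curie(curie), text))
    return triples


triples = property_triples(
    '<h1 property="dc:title">My Home Page</h1>',
    "http://www.snee.com/bob/index.html",
)
for triple in triples:
    print(triple)
```

The single tuple it prints has the page URL as subject, the full Dublin Core title URI as predicate, and "My Home Page" as object.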

Table 1 lists some of the most popular attributes that RDFa offers to identify subjects, predicates, and objects in an XML document.

Table 1. Popular RDFa Attributes

Attribute    Used to Identify
about        subject
property     predicate, when object is a string of text
rel          predicate, when object is a URI
content      object, when it is a string of text
href         object, when it is a URI
typeof       class of subject

While some of these attributes are new, you'll recognize href as an existing HTML attribute. The content and rel attributes, although never popular, have also been part of HTML for years.
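To see how the attributes in Table 1 fit together, here is a simplified Python sketch that maps each one to its role in a triple. It ignores most of the real RDFa processing rules, so treat it as a reading aid rather than a parser. The sample markup, the rights URL, the date value, and the hard-coded PREFIXES table are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: these prefixes are declared somewhere in the real document.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical markup that uses every attribute from Table 1.
SAMPLE = """
<div about="http://www.snee.com/bob/index.html" typeof="foaf:Document">
  <span property="dc:creator">Bob DuCharme</span>
  <span property="dc:date" content="2009-08-20">August 20, 2009</span>
  <a rel="dc:rights" href="http://example.org/rights">Terms of use</a>
</div>
""".strip()


def expand(curie):
    """Turn a prefixed name such as 'dc:date' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def extract(element, subject, triples):
    """Walk the tree, collecting triples; 'about' resets the subject."""
    subject = element.get("about", subject)
    if element.get("typeof"):
        # typeof: the class of the subject
        triples.append((subject, RDF_TYPE, expand(element.get("typeof"))))
    if element.get("property"):
        # property: predicate with a text object; content overrides the
        # element's own text when it is present
        obj = element.get("content")
        if obj is None:
            obj = "".join(element.itertext())
        triples.append((subject, expand(element.get("property")), obj))
    if element.get("rel") and element.get("href"):
        # rel + href: predicate with a URI object
        triples.append((subject, expand(element.get("rel")), element.get("href")))
    for child in element:
        extract(child, subject, triples)
    return triples


# The second argument stands in for the URL of the containing document.
triples = extract(ET.fromstring(SAMPLE), "http://example.org/doc", [])
for triple in triples:
    print(triple)
```

Because the outer div carries an about attribute, all four triples share that subject. The dc:date triple takes its object from content ("2009-08-20") rather than from the human-readable text.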

RDFa in DocBook and DITA: Why Bother?

One reason the DocBook and DITA standards have been popular for storing XML documents is their adaptability. If you want to add new information that the original DTDs don't provide for, the DocBook and DITA architectures let you define additional elements or attributes in an orderly, structured way that will survive upgrades to the standards with minimal fuss. Both DTDs offer slots for arbitrary metadata but nothing with the flexibility and structure of RDFa, because adding specialized elements or attributes to the DocBook and DITA DTDs requires you to write specialized code to extract their values.

Once you've added a brief module to either standard's DTD to allow RDFa attributes, the RDF data model's flexibility means that storing new kinds of information in the future may not require additional DTD modifications. For example, if you add an RDFa module to DocBook or DITA today to let you store the employee ID of the editor who reviewed a document, and next year you want to add a workFlowStage value so that a staff member can quickly identify which work has been done on the document, you won't need to modify the DTD at all. The same set of RDFa attributes lets you accommodate any RDF triples.
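The following Python sketch illustrates that point under some stated assumptions: the DTD already permits a property attribute on the element used here, the "ex" prefix and its namespace URI are hypothetical, and the DocBook-style para and phrase markup and the editorID value are invented for the example. The same extraction function handles this year's document and next year's, even though the second carries a predicate that did not exist when the function was written.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace bound to the "ex" prefix.
EX = "http://example.org/ns/publishing#"

# This year's document: only the reviewing editor's ID is recorded.
THIS_YEAR = (
    '<para>Reviewed by <phrase property="ex:editorID">E1042</phrase>.</para>'
)

# Next year's document: a workFlowStage value appears, using the same attribute.
NEXT_YEAR = (
    '<para>Reviewed by <phrase property="ex:editorID">E1042</phrase>; '
    'stage: <phrase property="ex:workFlowStage">copyedit</phrase>.</para>'
)


def document_metadata(markup):
    """Return {predicate URI: value} for every ex: property in the markup."""
    metadata = {}
    for element in ET.fromstring(markup).iter():
        name = element.get("property")
        if name and name.startswith("ex:"):
            # Prefer a content attribute if present, else the element text.
            value = element.get("content") or "".join(element.itertext())
            metadata[EX + name[len("ex:"):]] = value
    return metadata


before = document_metadata(THIS_YEAR)
after = document_metadata(NEXT_YEAR)
print(before)
print(after)
```

Neither the function nor the set of permitted attributes changes between the two calls; only the attribute values in the document do.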

Software that can extract RDFa triples from a document (see the rdfa.info site's Implementations and Tools pages for some good lists) lets you do all the things you want to do with document metadata, including loading this data into a database, querying for documents that meet certain metadata conditions, and creating reports on an aggregated set of such data.
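As a rough sketch of that workflow, the Python below loads a handful of triples into an in-memory SQLite table, queries for documents at a given workflow stage, and builds a small report. The triples, document URLs, and "ex" namespace are invented for illustration; in practice they would come from an RDFa extraction tool, and a dedicated RDF store queried with SPARQL would be a more natural fit than a relational table.

```python
import sqlite3

# Hypothetical namespace and triples, standing in for extractor output.
EX = "http://example.org/ns/publishing#"
TRIPLES = [
    ("http://example.org/docs/install.xml", EX + "workFlowStage", "copyedit"),
    ("http://example.org/docs/install.xml", EX + "editorID", "E1042"),
    ("http://example.org/docs/upgrade.xml", EX + "workFlowStage", "draft"),
    ("http://example.org/docs/upgrade.xml", EX + "editorID", "E1042"),
    ("http://example.org/docs/faq.xml", EX + "workFlowStage", "copyedit"),
    ("http://example.org/docs/faq.xml", EX + "editorID", "E2210"),
]

# Load: one row per triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)

# Query: which documents are currently in the copyedit stage?
in_copyedit = [
    row[0]
    for row in conn.execute(
        "SELECT subject FROM triples "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (EX + "workFlowStage", "copyedit"),
    )
]

# Report: how many documents sit at each workflow stage?
stage_report = conn.execute(
    "SELECT object, COUNT(*) FROM triples "
    "WHERE predicate = ? GROUP BY object ORDER BY object",
    (EX + "workFlowStage",),
).fetchall()

print(in_copyedit)
print(stage_report)
conn.close()
```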