Parsing and Serializing RDF Data with Ruby

In this tutorial we'll learn how to parse and serialize RDF data using the RDF.rb library for Ruby. There exist a number of Linked Data serialization formats based on RDF, and you can use most of them with RDF.rb.

To follow along and try out the code examples in this tutorial, you need only a computer with Ruby and RubyGems installed. Any recent Ruby 1.8.x or 1.9.x version will do fine, as will JRuby 1.4.0 or newer.

Supported RDF formats

These are the RDF serialization formats that you can parse and serialize with RDF.rb at present:

RDF.rb in and of itself is a relatively lightweight gem that includes built-in support only for the N-Triples format. Support for the other listed formats is available through add-on plugins such as RDF::Raptor, RDF::JSON and RDF::TriX, each one packaged as a separate gem. This approach keeps the core library fleet on its metaphorical feet and avoids introducing any XML or JSON parser dependencies for RDF.rb itself.

Note that the RDF::Raptor gem requires that the Raptor RDF Parser library and command-line tools be available on the system where it is used. Here follow quick and easy Raptor installation instructions for the Mac and the most common Linux and BSD distributions:

In this example, we first load up RDF.rb as well as support for the N-Triples format. After that, we use a convenience method on the RDF::Graph class to fetch and parse RDF data directly from a web URL in one go. (The load method can take either a file name or a URL.)

All RDF.rb parser plugins declare which MIME content types and file extensions they are capable of handling, which is why in the above example RDF.rb knows how to instantiate an N-Triples parser to read the foaf.nt file at the given URL.
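To make the auto-detection idea concrete, here is a stdlib-only sketch of a registry keyed by file extension and MIME content type. It is not RDF.rb's implementation: `FormatRegistry`, `register`, `detect` and `detect_content_type` are hypothetical names chosen for illustration, and the two registered formats are just sample entries.

```ruby
require 'uri'

# Hypothetical stand-in for a plugin registry; not part of RDF.rb.
class FormatRegistry
  Entry = Struct.new(:name, :content_types, :extensions)

  def initialize
    @entries = []
  end

  # Each "plugin" declares the content types and file extensions it handles.
  def register(name, content_types:, extensions:)
    @entries << Entry.new(name, content_types, extensions)
    self
  end

  # Look up a format by file name or URL, using only the file extension.
  def detect(file_or_url)
    path  = URI.parse(file_or_url).path.to_s
    ext   = File.extname(path).delete_prefix('.').downcase
    entry = @entries.find { |e| e.extensions.include?(ext) }
    entry && entry.name
  end

  # Look up a format by MIME content type, ignoring parameters such as charset.
  def detect_content_type(content_type)
    type  = content_type.split(';').first.strip.downcase
    entry = @entries.find { |e| e.content_types.include?(type) }
    entry && entry.name
  end
end

registry = FormatRegistry.new
registry.register(:ntriples, content_types: ['application/n-triples'], extensions: ['nt'])
registry.register(:rdfxml,   content_types: ['application/rdf+xml'],   extensions: ['rdf'])

puts registry.detect('http://datagraph.org/jhacker/foaf.nt').inspect
puts registry.detect('foaf.rdf').inspect
```

The point of the design is that the caller never names a parser: loading one more plugin simply adds one more entry, and the same lookup keeps working for file names and URLs alike.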

In the same way, RDF.rb will auto-detect any other RDF file formats as long as you've loaded up support for them using one or more of the following:

Note that if you need to read RDF files containing multiple named graphs (in a serialization format that supports named graphs, such as TriX), you probably want to be using RDF::Repository instead of RDF::Graph:

The difference between the two is that RDF statements in RDF::Repository instances can contain an optional context (i.e. they can be quads), whereas statements in an RDF::Graph instance always have the same context (i.e. they are triples). In other words, repositories contain one or more graphs, which you can access as follows:

repository.each_graph do |graph|
  puts graph.inspect
end
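The distinction between triples and quads can be pictured without RDF.rb at all. The sketch below uses a plain Struct as a hypothetical stand-in for a statement with an optional context and groups statements by that context, which is conceptually what iterating over a repository's graphs amounts to. `Quad`, `graphs_by_context` and the example URIs are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for an RDF statement; the context is optional.
Quad = Struct.new(:subject, :predicate, :object, :context)

# Group statements by context: each distinct context identifies one graph.
def graphs_by_context(statements)
  statements.group_by(&:context)
end

name = 'http://xmlns.com/foaf/0.1/name'
statements = [
  Quad.new('http://example.org/alice', name, 'Alice', 'http://example.org/graphs/people'),
  Quad.new('http://example.org/bob',   name, 'Bob',   'http://example.org/graphs/people'),
  Quad.new('http://example.org/acme',  name, 'Acme',  'http://example.org/graphs/orgs'),
  Quad.new('http://example.org/misc',  name, 'Misc',  nil) # no context given
]

graphs_by_context(statements).each do |context, members|
  label = context || '(no context)'
  puts "#{label}: #{members.size} statement(s)"
end
```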

Introspecting RDF formats

RDF.rb's parsing and serialization APIs are based on the following three base classes:

The above is what RDF.rb relies on internally to obtain the correct parser implementation when you pass in a URL or file name to RDF::Graph.load -- or indeed to any other method that needs to auto-detect a serialization format and to delegate responsibility for parsing/serialization to the appropriate implementation class.

Parsing RDF data

If you need to be more explicit about parsing RDF data, for instance because the dataset won't fit into memory and you wish to process it statement by statement, you'll need to use RDF::Reader directly.

Parsing RDF statements from a file

RDF parser implementations generally support a streaming-compatible subset of the RDF::Enumerable interface, all of which is based on the #each_statement method. Here's how to read an RDF file statement by statement:

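require 'rdf/raptor'  # parser plugin used here for the .rdf file

# Stream the file one statement at a time rather than loading it whole.
RDF::Reader.open("foaf.rdf") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end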

Using RDF::Reader.open with a Ruby block ensures that the input file is automatically closed after you're done with it.
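Why the block form matters can be shown with a small stdlib-only sketch. `LineReader` below is a hypothetical stand-in, not RDF.rb's reader: it treats each non-blank, non-comment line of an N-Triples-style file as one statement, yields lines one at a time instead of loading the whole file, and closes the file when the block returns, even if the block raises.

```ruby
require 'tempfile'

# Hypothetical stand-in for a streaming reader; not RDF.rb's implementation.
class LineReader
  # Block form: yield a reader, then close the underlying file no matter what.
  def self.open(path)
    io = File.open(path, 'r')
    begin
      yield new(io)
    ensure
      io.close
    end
  end

  def initialize(io)
    @io = io
  end

  # Yield one statement line at a time; only the current line is held in memory.
  def each_statement
    @io.each_line do |line|
      line = line.strip
      next if line.empty? || line.start_with?('#')
      yield line
    end
  end
end

# Demo: write a tiny N-Triples-style file, then stream it back.
Tempfile.create(['foaf', '.nt']) do |file|
  file.write(<<~NT)
    # A comment line
    <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .

    <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
  NT
  file.flush

  LineReader.open(file.path) do |reader|
    reader.each_statement { |statement| puts statement }
  end
end
```

A real parser would of course tokenize each line into subject, predicate and object terms; the sketch only demonstrates the streaming and cleanup behaviour.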

Parsing RDF statements from a URL

As before, you can generally use an http:// or https:// URL anywhere that you could use a file name:

require 'rdf/json'  # add-on gem for the RDF/JSON format

# A URL works here exactly as a file name would.
RDF::Reader.open("http://datagraph.org/jhacker/foaf.json") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Parsing RDF statements from a string

Sometimes you already have the serialized RDF contents in a memory buffer somewhere, for example as retrieved from a database. In such a case, you'll want to obtain the parser implementation class as shown before, and then use RDF::Reader.new directly:

The RDF::Reader constructor uses duck typing and accepts any input (for example, IO or StringIO objects) that responds to the #readline method. If no input argument is given, input data will by default be read from the standard input.
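The duck-typing contract described above can be sketched as follows. This is a stdlib-only illustration rather than RDF.rb's constructor: `each_input_line` is a hypothetical helper that accepts anything responding to `#readline`, defaults to standard input, and stops at `EOFError`, which is how `IO#readline` and `StringIO#readline` signal the end of input.

```ruby
require 'stringio'

# Hypothetical helper: reads lines from any object that responds to #readline.
def each_input_line(input = $stdin)
  unless input.respond_to?(:readline)
    raise ArgumentError, 'input must respond to readline'
  end

  loop do
    begin
      line = input.readline
    rescue EOFError
      break # end of input reached
    end
    yield line.chomp
  end
end

# A string already in memory can be wrapped in StringIO to satisfy the contract.
data = %(<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .\n) +
       %(<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .\n)

each_input_line(StringIO.new(data)) { |line| puts line }
```

Because only `#readline` is required, the same code path serves files, sockets, in-memory buffers and test doubles without any type checks beyond `respond_to?`.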

Serializing RDF data

Serializing RDF data works much the same way as parsing: when serializing to a named output file, the correct serializer implementation is auto-detected based on the given file extension.

Serializing RDF statements into an output file

RDF serializer implementations generally support an append-only subset of the RDF::Mutable interface, primarily the #insert method and its alias #<<. Here's how to write out an RDF file statement by statement:

Once again, using RDF::Writer.open with a Ruby block ensures that the output file is automatically flushed and closed after you're done writing to it.

Serializing RDF statements into a string result

A common use case is serializing an RDF graph into a string buffer, for example when serving RDF data from a Rails application. RDF::Writer has a convenience buffer class method that builds up output in a StringIO under the covers and then returns a string when all is said and done:
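The buffering pattern described above is easy to picture with a stdlib-only sketch. `TinyWriter` is a hypothetical stand-in rather than RDF.rb's writer: its `buffer` class method creates a `StringIO`, yields a writer bound to it, and returns the accumulated string. Statements are assumed here to be three-element arrays of already-serialized terms.

```ruby
require 'stringio'

# Hypothetical stand-in for a serializer; not RDF.rb's implementation.
class TinyWriter
  # Build output in an in-memory StringIO and return it as a String.
  def self.buffer
    io = StringIO.new
    yield new(io)
    io.string
  end

  def initialize(output = $stdout)
    @output = output
  end

  # Append-only interface: write one N-Triples-style line per statement.
  def insert(statement)
    subject, predicate, object = statement
    @output.puts "#{subject} #{predicate} #{object} ."
    self # returning self allows chaining: writer << a << b
  end
  alias_method :<<, :insert
end

output = TinyWriter.buffer do |writer|
  writer << ['<http://example.org/alice>', '<http://xmlns.com/foaf/0.1/name>', '"Alice"']
  writer << ['<http://example.org/bob>', '<http://xmlns.com/foaf/0.1/name>', '"Bob"']
end

puts output
```

Since the writer only ever sees an object it can `puts` to, the same class can target a file, standard output or a string without any change to the serialization code.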

Customizing the serializer output

If a particular serializer implementation supports options such as namespace prefix declarations or a base URI, you can pass in those options to RDF::Writer.open or RDF::Writer.new as keyword arguments:
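As a rough picture of what such options do, the stdlib-only sketch below emits Turtle-style `@base` and `@prefix` declarations and abbreviates URIs that fall under a declared namespace. `PrefixWriter` and its `base_uri:` and `prefixes:` keyword arguments are hypothetical and only mirror the kinds of options mentioned above; which options a real serializer accepts depends on the plugin in question.

```ruby
require 'stringio'

# Hypothetical stand-in showing how writer options might shape the output.
class PrefixWriter
  def initialize(output = $stdout, base_uri: nil, prefixes: {})
    @output   = output
    @prefixes = prefixes
    # Declarations go first, before any statements are written.
    @output.puts "@base <#{base_uri}> ." if base_uri
    prefixes.each { |name, uri| @output.puts "@prefix #{name}: <#{uri}> ." }
  end

  def <<(statement)
    line = statement.map { |term| abbreviate(term) }.join(' ')
    @output.puts "#{line} ."
    self
  end

  private

  # Replace a known namespace with its prefix; otherwise keep the full URI.
  # Terms that are not URIs (such as pre-quoted literals) pass through as-is.
  def abbreviate(term)
    return term unless term.start_with?('http://', 'https://')
    name, uri = @prefixes.find { |_, namespace| term.start_with?(namespace) }
    name ? "#{name}:#{term.delete_prefix(uri)}" : "<#{term}>"
  end
end

io = StringIO.new
writer = PrefixWriter.new(io,
                          base_uri: 'http://example.org/',
                          prefixes: { foaf: 'http://xmlns.com/foaf/0.1/' })
writer << ['http://example.org/alice', 'http://xmlns.com/foaf/0.1/name', '"Alice"']

puts io.string
```

Omitting both options leaves every URI written out in full, so the options change only the shape of the output, not the statements themselves.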

Support channels

That's all for now, folks. For more information on the APIs touched upon in this tutorial, please refer to the RDF.rb API documentation. If you have any questions, don't hesitate to ask for help on #swig or the public-rdf-ruby@w3.org mailing list.