When is Linked Data not Linked Data? – A summary of the debate

One of the activities identified during last December’s Semantic Technology Working Group meeting to be taken forward by CETIS was the production of a briefing paper that disambiguated some of the terminology for those that are less familiar with this domain. The following terms in particular were highlighted:

Semantic Web

semantic technologies

Linked Data

linked data

linkable data

Open Data

I’ve finally started drafting this briefing paper and, unsurprisingly, defining the above terms is proving to be a non-trivial task! Pinning down agreed definitions for Linked Data, linked data and linkable data is particularly problematic. And I’m not the only one having trouble. If you look up Semantic Web and Linked Data / linked data on Wikipedia you will find entries flagged as having multiple issues. It does rather feel like we’re edging close to holy war territory here. But having said that, I do enjoy a good holy war as long as I’m watching safely from the sidelines.

So what’s it all about? As far as I can make out, much of the debate boils down to whether Linked Data must adhere to the four principles outlined in Tim Berners-Lee’s Linked Data Design Issues, and in particular whether use of RDF and SPARQL is mandatory. Some argue that RDF is integral to Linked Data; others suggest that, while it may be desirable, use of RDF is optional rather than mandatory. Some reserve the capitalized term Linked Data for data that is based on RDF and SPARQL, preferring lower case “linked data”, or “linkable data”, for data that uses other technologies.

The fact that the Linked Data Design Issues paper is a personal note by Tim Berners-Lee, and is not formally endorsed by W3C, also contributes to the ambiguity. The note states:

Use URIs as names for things

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

Include links to other URIs. so that they can discover more things.

I’ll refer to the steps above as rules, but they are expectations of behaviour. Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the web. (Berners Lee, http://www.w3.org/DesignIssues/LinkedData.html)
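To make the second and third rules a little more concrete, here is a minimal sketch of what "looking up" a name might involve: preparing an HTTP GET that asks for RDF through content negotiation. The URI and the exact Accept value are assumptions for illustration only, and the request is built but never sent.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Assumption: the server can return RDF (Turtle or RDF/XML) via content negotiation.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) a GET request asking for RDF about `uri`."""
    scheme = urlparse(uri).scheme
    if scheme not in ("http", "https"):
        # Rule 2: names should be HTTP URIs so that people can look them up.
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


# Hypothetical identifier, used only for illustration.
request = build_lookup_request("http://example.org/id/project/42")
```

Whether a server actually answers such a request with RDF is exactly the point the debate below turns on.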

In the course of trying to untangle some of the arguments both for and against the necessity of using RDF and SPARQL, I’ve read a lot of very thoughtful blog posts which it may be useful to link to here for future reference. Clearly these are not the only, or indeed the most recent, posts that discuss this most topical of topics; they happen to be the ones I have read and which I believe present a balanced overview of the debate in such a way as to be of relevance to the JISC CETIS community.

The first useful post I read on this particular aspect of the debate is Andy Powell’s from July 2009. This post resulted from the following question Andy raised on Twitter:

is there an agreed name for an approach that adopts the 4 principles of #linkeddata minus the phrase, “using the standards (RDF, SPARQL)” ??

Andy was of the opinion that Linked Data “implies use of the RDF model – full stop” adding:

“it’s too late to re-appropriate the “Linked Data” label to mean anything other than “use http URIs and the RDF model”.”

However, he is unable to provide a satisfactory answer to his own question, i.e. what do you call linked data that does not use the RDF model? Despite exploring alternative models, he concludes by professing himself to be worried about this.

Andy returned to this theme in a more recent post in January 2010, Readability and linkability, which ponders the relative emphasis given to readability and linkability by initiatives such as the JISC Information Environment. Andy’s general principles have not changed but he presents the term machine readable data (MRD) as a potential answer to the question he originally asked in his earlier post.

Paul Miller is uneasy about conflating RDF with Linked Data and with assertions that

“‘Linked Data’ can only be Linked Data if expressed in RDF.”

Paul discusses the wording and status of Tim Berners-Lee’s Linked Data Design Issues and suggests that it can be read either way. He then goes on to argue that by elevating RDF from the best mechanism for achieving Linked Data to the only permissible approach we risk barring a large group

“with data to share, a willingness to learn, and an enthusiasm to engage.”

Paul concludes by asking the question:

“What are we after? More Linked Data, or more RDF? I sincerely hope it’s the former.”

Paul Walk has published two useful posts on this topic: the first summarising and commenting on the debate sparked by the two posts above, and the second following the Giant Global Graph session at the CETIS 2009 Conference. This latter post presents a very useful attempt at disambiguating the terms Open Data, Linked Data and Semantic Web. Paul also tries to untangle the relationship between these three memes and helpfully notes:

data can be open, while not being linked

data can be linked, while not being open

data which is both open and linked is increasingly viable

the Semantic Web can only function with data which is both open and linked

Much more recently, Tony Hirst published this post which begins with a version of the four Linked Data principles cut from Wikipedia. This particular version makes no mention of either RDF or SPARQL. Tony goes on to present a very neat example of data linked using HTTP URIs and Yahoo Pipes and asks:

“So, the starter for ten: do we have an example of Linked Data™ here?”

Tony broadly believes the answer is yes and is of a similar opinion to Paul Miller that too rigid adherence to RDF and SPARQL

“will put a lot of folk who are really excited about the idea of trying to build services across distributed (linkable) datasets off…”

Back here at CETIS Wilbert Kraan has been experimenting with linked data meshups of JISC project data held in our PROD system. In contrast to the approach taken by Tony, Wilbert goes down the RDF and SPARQL route. Wilbert confesses that he originally believed that:

“SPARQL endpoints were these magic oracles that we could ask anything about anything.”

However his attempts to mesh up real data sets on the web highlighted the fact that SPARQL has no federated search facility.

“And that the most obvious way of querying across more than one dataset – pulling in datasets from outside via SPARQL’s FROM – is not allowed by many SPARQL endpoints. And that if they do allow FROM, they frequently cr*p out.”
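For readers who haven’t met SPARQL’s FROM, the following is a rough sketch of the kind of query being described: one SELECT that names two datasets as its source, wrapped up as a SPARQL protocol GET URL. The endpoint address and graph URIs are hypothetical, and nothing is actually sent; as the quote above notes, whether a real endpoint honours FROM is up to the endpoint.

```python
from urllib.parse import urlencode


def build_from_query(graph_uris):
    """Build a SELECT query that names each dataset with a FROM clause."""
    from_clauses = "\n".join(f"FROM <{uri}>" for uri in graph_uris)
    return (
        "SELECT ?s ?p ?o\n"
        f"{from_clauses}\n"
        "WHERE { ?s ?p ?o }\n"
        "LIMIT 10"
    )


def build_endpoint_url(endpoint, query):
    """Encode the query as the `query` parameter of a SPARQL protocol GET URL."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical dataset locations and endpoint, for illustration only.
graphs = ["http://example.org/data/projects.rdf", "http://example.com/data/funders.rdf"]
query = build_from_query(graphs)
url = build_endpoint_url("http://example.org/sparql", query)
```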

Wilbert concludes that:

“The consequence is that exposing a data set as Linked Data is not so much a matter of installing a SPARQL endpoint, but of serving sensibly factored datasets in RDF with cool URLs, as outlined in Designing URI Sets for the UK Public Sector (pdf).”

And in response to a direct query regarding the necessity of RDF and SPARQL to Linked Data, Wilbert answered:

“SPARQL and RDF are a sine qua non of Linked Data, IMHO. You can keep the label, widen the definition out, and include other things, but then I’d have to find another label for what I’m interested in here.”

Which kind of brings us right back to the question that Andy Powell asked in July 2009!

So there you have it. A fascinating but currently inconclusive debate I believe. Apologies for the length of this post. Hopefully one day this will go on to accompany our “Semantic Web and Linked Data” briefing paper.

19 thoughts on “When is Linked Data not Linked Data? – A summary of the debate”

I think Paul Walk’s definitions are exactly right, and reflect current usage. They also address Paul Miller’s and Tony Hirst’s concerns: everyone wants more open data, and open data (in whatever form it comes) is already useful in and of itself. Also, if you want to, it is increasingly easy to convert open data that isn’t RDF based into Linked Data on the fly.

That possibility, however, doesn’t make non-RDF open data Linked Data. The point of Linked Data is that you can assume that things will be identified with URIs, that data will be expressed in triples, and that you can SPARQL it. It’s an infrastructure.

If you’re doing something else, that may be just as worthy, it just isn’t Linked Data (unless you want to play silly buggers with labels).

Having read through the blogs I’m inclined to agree with you. If you’re going to call it Linked Data then it has to be RDF based, otherwise it’s something else. I can understand how the confusion has arisen though. The Linked Data Design Issues paper is not hugely clear.

And regarding playing silly buggers with labels… surely no one would ever do that? Would they?

“There are other cases where the easiest thing for somebody to do is to just put data up in whatever form it’s available. Comma separated values (CSV) files are remarkably popular. They’re exported sometimes from spreadsheets. It’s remarkable how much information is in spreadsheets. Or sometimes pulled out of a database and then put up on the web. It’s not as good, not as useful to the community, as if Linked Data had been put up there and linked. But the first step of actually putting the data out there is the one that nobody else can do.”

I think this is really important – Tim Berners-Lee is clear here that getting data ‘open’ (and machine readable) is more fundamental than getting it linked.

I think this point about getting data out there, and other people will link it, is key. There are some questions (in my mind at least) that haven’t been answered about bridging gaps between Linked Data and other machine-readable data (as both are bound to exist) – at the moment it seems to me that this is slightly one-way (machine-readable data can exploit Linked Data, but not the other way round?).

Finally I was really (really) impressed with the ‘Gridworks’ demo on the Freebase blog http://blog.freebase.com/2010/03/26/preview-freebase-gridworks/ – it seems to me that this is a clever way of getting csv/spreadsheet data into the Linked Data world with benefits on both sides – I can’t wait to have a play with this when it’s released…
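This is not how Gridworks works, but as a rough sketch of the general idea of moving CSV data towards RDF, the snippet below turns each spreadsheet row into N-Triples lines. The `http://example.org/` namespace, the column names and the sample row are all made up for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column):
    """Emit one triple per non-empty cell, using the id column to mint the subject URI."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}id/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id itself, empty cells and any surplus unnamed cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}def/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


sample = "id,title\n42,Linked Data briefing\n"
triples = csv_to_ntriples(sample, "id")
```

Every object here is still a plain literal, so replacing values with links to other people’s URIs would be a further step.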

@Kingsley thanks, I rather like the distinction you’ve drawn here. @Nick interesting contribution to an ongoing debate.

FWIW personally I agree that RDF is not a prerequisite for Linked Data and that it is important to retain the distinction between Linked Data and Open Data, because as many other commentators have already pointed out linked data need not necessarily be open and open data need not necessarily be linked.

1. linked data does not imply open data
2. open data does not imply linked data
3. linked data does not imply RDF data
4. RDF data does not imply linked data

(4) may seem a bit unintuitive, but it is possible to have a set of RDF triples where none of the objects are URIs, ie they are all literals. This would NOT be linked data in my book. Or one could have a set of RDF triples using a set of proprietary URNs instead of HTTP URLs, and again this would not be linked data in my book.
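Point (4) can be sketched as a small check. This is only one possible reading of the argument above, not an agreed test for Linked Data: a set of triples "looks linked" if every subject is an HTTP URI and at least one object is an HTTP URI rather than a literal. The `Triple` type and the `example.org` and `urn:example` identifiers are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool  # True when the object is a plain value, not a URI


def is_http_uri(term: str) -> bool:
    return urlparse(term).scheme in ("http", "https")


def looks_linked(triples) -> bool:
    """Heuristic: all subjects are HTTP URIs and at least one object links to an HTTP URI."""
    triples = list(triples)
    if not triples:
        return False
    subjects_ok = all(is_http_uri(t.subject) for t in triples)
    has_link = any(not t.obj_is_literal and is_http_uri(t.obj) for t in triples)
    return subjects_ok and has_link


# RDF whose objects are all literals.
literals_only = [Triple("http://example.org/id/project/42",
                        "http://purl.org/dc/terms/title", "A project", True)]
# RDF built on non-HTTP identifiers.
urn_based = [Triple("urn:example:project:42",
                    "urn:example:fundedBy", "urn:example:funder:7", False)]
# RDF with HTTP URIs that link to another resource.
linked = [Triple("http://example.org/id/project/42",
                 "http://purl.org/dc/terms/creator",
                 "http://example.org/id/person/1", False)]
```

Under this reading, `looks_linked(literals_only)` and `looks_linked(urn_based)` are both false, while `looks_linked(linked)` is true.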

There are actually multiple standards that came about when the semantic web was coming into being. RDF has just been used more and has been taught more. However, it is simply not flexible enough for some types of data sets. Firmly linking RDF as intrinsically required for Linked Data, therefore, seems like a dangerous game that furthers nothing and may very well leave out the linking of some open data that would be better served by other semantic technologies.