Thursday, 21 January 2010

You may remember that I am interested in the extent to which we should use Semantic Web (or Linked Data) on the DCC web site. After some discussions, I reached the conclusion that we should do so, but the tools were not ready yet (this isn’t quite an Augustinian “Oh Lord, make me good but not yet”; specifically, we are moving our web site to Drupal 6, the Linked Data stuff will not be native until Drupal 7, and our consultants are not yet up to speed with Linked Data). I have to say that not all our staff are convinced of the benefits of using RDF etc on the web site, and I have had a mental note to write more about this, real soon now.

I was reminded of this recently. I wanted to phone a colleague who worked at UKOLN, one of our partners, and I didn’t have his details in my address book. So I looked on their web site and navigated to his contacts page. Once there I copied his details into the address book, before lifting the phone to give him a ring. After the call (he wasn’t there; the snow had closed the office), I thought about that process. I had to copy all those details! Wouldn’t it be great if I could just import them somehow? How could that be? UKOLN have expertise in such matters, so I tweeted Paul Walk (now Deputy Director, previously technical manager) asking whether they had considered making the details accessible as Linked Data using something like FOAF. You can guess I’m not fully up to speed with this stuff, but I’m certainly trying to learn!

Paul replied that they had considered putting microformats into the page (I guess this is the hCard microformat), and then asked me whether my address book understood RDF, or if I was going to script something? I was pretty sure the answer to the second part was “no” as I suspect such scripting currently is beyond me, and told Paul that I was using MacOSX 10.6 Address Book; it says nothing about RDF, but will import a vcard. I was thinking that if there was appropriate stuff (either hCard microformat or RDFa with FOAF) on the page, I might find an app somewhere that would scrape it off and make a vcard I could import.
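To make the "scrape it off the page" idea a little more concrete, here is a minimal sketch in Python of pulling hCard fields out of a page. It assumes the page marks up the contact with the hCard class names `fn`, `org`, `tel` and `email`; the sample HTML, the names `HCardParser` and `extract_hcard`, and all the contact values are made up for illustration, and a real page would need rather more careful handling.

```python
from html.parser import HTMLParser

# hCard class names this sketch looks for (a small subset of the microformat).
HCARD_FIELDS = {"fn", "org", "tel", "email"}
# Tags that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collects the text found inside elements carrying an hCard class."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open tag: a field name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in HCARD_FIELDS), None)
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing hCard field, if any.
        field = next((f for f in reversed(self._stack) if f), None)
        text = data.strip()
        if field and text:
            joined = (self.fields.get(field, "") + " " + text).strip()
            self.fields[field] = joined


def extract_hcard(html):
    """Return a dict of hCard field name -> text for the given HTML."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical contact page fragment; none of these details are real.
SAMPLE = """
<div class="vcard">
  <span class="fn">Example Person</span>,
  <span class="org">Example Unit</span>
  <span class="tel">+44 0000 000000</span>
  <a class="email" href="mailto:person@example.org">person@example.org</a>
</div>
"""

contact = extract_hcard(SAMPLE)
```

This only copes with well-formed markup and a single contact per page; it is meant to show that the scraping step is small, not to be a finished tool.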

Paul’s final tweet was: “@cardcc see the use-case, not sure it's a 'linked data' problem though. What are the links that matter if you're scraping a single contact?”

Well, I couldn’t think of a 140-character answer to that question, which seemed to raise issues I had not thought about properly. What are the links that matter? Was it linked data, or just coded data that I wanted? Is this really a semantic web question rather than linked data? Or is it an RDF question? Or a vocabulary question? Gulp!

After some thought, I suspect Paul was as constrained by his 140 characters as I was. Surely a contacts page contains both facts and links within itself. See the Wikipedia page on FOAF for an example of a FOAF file in Turtle for Jimmy Wales; the coverage is pretty much like a contacts page.

So Paul’s contact page says he works for UKOLN at the University of Bath, and gives the latter’s address (I guess formally speaking he works in UKOLN, an administrative unit, and is employed by the University); that his position in UKOLN is Deputy Director, that his phone, fax and email addresses are x, y and z. All of these are relationships between facts, expressible in the FOAF vocabulary. With RDFa, that information could be explicitly encoded in the HTML of the page and understood by machines, rather than inferred from the co-location of some characters on the page (the human eye is much better at such inferences). So there’s RDF, right there. Is that Linked Data? Is it Semantic Web? I’m not really sure.
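As a rough illustration of what "relationships between facts" means here, the contact details can be written down as subject, predicate, object triples. The sketch below uses plain Python tuples rather than an RDF library. The FOAF property names (`name`, `mbox`, `phone`, `workplaceHomepage`) are real terms from the FOAF vocabulary, but the person URI, the values and the helper `objects` are all invented for the example.

```python
# Namespace URIs for the vocabularies used below.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URI identifying the person (not a real staff page).
person = "http://example.org/staff/example-person#me"

# Each tuple is one assertion: (subject, predicate, object).
triples = [
    (person, RDF_TYPE, FOAF + "Person"),
    (person, FOAF + "name", "Example Person"),
    (person, FOAF + "mbox", "mailto:person@example.org"),
    (person, FOAF + "phone", "tel:+44-0000-000000"),
    (person, FOAF + "workplaceHomepage", "http://example.org/"),
]


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


emails = objects(triples, person, FOAF + "mbox")
```

The point is that a machine no longer has to guess which string on the page is the phone number: the predicate says so, and the person URI gives other pages something to link to.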

More to the point, would it have been any greater use to me if it had been so encoded? A FOAF-hunting spider could traverse the web and build up a network of people, and I might be able to query that network, and even get the results downloaded in the form of a vcard that I could import into my Mac Address Book. That sounds quite possible, and the tools may already exist. Or, there may exist an app (what we used to call a Small Matter Of Programming, or a SMOP) that I could point at a web page with FOAF RDFa on it. Perhaps that’s what Paul was after in relation to scripting. Maybe the upcoming Dev8D might find this an interesting task to look at?
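The last step of such an app, turning a handful of FOAF-style properties into a vcard that an address book can import, might look something like the sketch below. It assumes the properties have already been extracted into a dictionary; the function name `foaf_to_vcard`, the dictionary keys and the sample values are all hypothetical, and the name splitting is deliberately naive.

```python
def _escape(value):
    """Escape the characters that are special inside vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,"))


def _strip_scheme(value, scheme):
    """Drop a leading URI scheme such as 'mailto:' or 'tel:' if present."""
    return value[len(scheme):] if value.startswith(scheme) else value


def foaf_to_vcard(props):
    """Build a vCard 3.0 string from a dict of FOAF-like properties.

    Expected keys (all assumptions of this sketch): 'name' is required;
    'org', 'mbox' and 'phone' are optional.
    """
    name = props["name"]
    # Naive split: the last word is treated as the family name.
    given, _, family = name.rpartition(" ")
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{_escape(family)};{_escape(given)};;;",
        f"FN:{_escape(name)}",
    ]
    if "org" in props:
        lines.append(f"ORG:{_escape(props['org'])}")
    if "mbox" in props:
        email = _strip_scheme(props["mbox"], "mailto:")
        lines.append(f"EMAIL;TYPE=INTERNET:{email}")
    if "phone" in props:
        phone = _strip_scheme(props["phone"], "tel:")
        lines.append(f"TEL;TYPE=WORK,VOICE:{phone}")
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up contact; none of these details are real.
card = foaf_to_vcard({
    "name": "Example Person",
    "org": "Example Unit",
    "mbox": "mailto:person@example.org",
    "phone": "tel:+44-0000-000000",
})
```

Saving `card` to a file with a `.vcf` extension should give something an address book can import, though I have not tested this against every client.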

What other things could be done with such a page? Well, Paul or others might use it to disambiguate the many Paul Walk alter egos out there. You’ll see I have a simple link to Paul’s contact page above, but if this blog were RDF-enabled, perhaps we could have a more formal link to the assertions on the page, eg to that Paul Walk’s phone number, that Paul Walk’s email address, etc.

Well I’m not sure if this makes sense, and it does feel like one of those “first fax machine” situations. However FOAF has been around for a long while now. Does that mean that folk don’t perceive an advantage in such formal encodings to balance their costs, or is this an absence of value because of a lack of exploitable tools? If so, anyone going to Dev8D want to make an app for me?

(It’s also possible of course that Paul doesn’t want his details to be spidered up in this way, but I guess none of us should put contact details on the web if that’s our position.)

By the way, I found a web page called FOAF-a-matic that will create FOAF RDF for you. Here's an extract from what it created for me, in RDF:

3 comments:

Hi chris. You might take a look at what Google have been up to, see the Social Graph API at http://code.google.com/apis/socialgraph/ (with intro video); this is used behind the scenes in some other apps, eg. http://googleblog.blogspot.com/2009/10/introducing-google-social-search-i.html

(that app consumes only the rdf/xml variant)

For an RDFa-based app, see Yahoo SearchMonkey, http://developer.yahoo.com/searchmonkey/smguide/profile_vocab.html

I realised there was another implication. Thinking back to my Glasgow days, they put a lot of effort into creating an authoritative staff list, which linked to the HR system. If your data on that staff list were wrong, then your HR data were also wrong; something that it was probably important to fix. The problem is that many departments ran their own staff lists, which rapidly became un-synchronised. If the departmental staff lists used Linked Data from the University staff list, they would always be synchronised. And if the only way to fix data that looked wrong was to fix the HR data, then that too would be more accurate.

In this case, I note that Paul's University staff list entry differs from his UKOLN entry (relatively trivial; the University not yet reflecting his recent new appointment). But it would be a case where the Linked Data would just work. Similarly the DCC contacts page would use Linked Data from the UKOLN contacts page, and also be up to date...

1/. The issue of a standard format *for the Web* for publishing an individual contact 'record' that can be ingested into a client-side 'phone book' application. Is RDF a good choice, if the commonly used clients do not support it? But then again, perhaps this is just the usual chicken and egg problem for standards adoption.

2/. The issue of an organisation not being very well joined up in the sense of establishing an 'authoritative' contact record and reusing throughout the organisation. This kind of join-up is some sort of a goal for HEIs and has been for years. There are other, arguably simpler technical approaches to this, but the problem would seem to not be technical so much as simply one of organisational management.

3/. The question of whether or not it is useful for an organisation like UKOLN to publish its contact details in RDF (specifically FOAF) such that other organisations might make use of this in a semi-automated way. It has been some years since I created and published a FOAF file for myself - I can't even remember where I put it now - it's likely not online any more. It was never actually used for anything useful. However, there is clearly a renewed appetite for RDF currently, so perhaps the tooling and support will finally arrive.

Which brings me to the positive part of my comment (every comment should offer something!):

Chris, if you would like the DCC to be able to expose UKOLN contact details on its Website by directly accessing RDF in the way you describe, then in the interests of experimentation I will commit UKOLN to providing RDFa embedded in its contact pages - most likely using FOAF to describe the information.

Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.
