Disruptive Technologies Director (cool job title!) Anita de Waard from Elsevier was asking what the conclusions of the workshop were. So here is an incomplete summary: roughly speaking, people agreed to disagree (again). In his entertaining but controversial keynote, Barend Mons argued that redundant data should be eliminated through the use of “nano-publications” and micro-attribution. Some people in the audience disagreed with this. Greg Tyrelle thinks that redundancy is a feature, not a bug, in the Web and we have to deal with it. Alan Ruttenberg argued that semantic web reasoners are required to clean up and sanity-check all the messy and noisy biological data, but emphasised the importance of computer scientists learning to speak the biologists’ language.

The good thing about this workshop is its size: small and friendly, but internationally attended. Thanks to M. Scott Marshall, Albert Burger, Adrian Paschke, Paolo Romano and Andrea Splendiani for organising another good workshop. I hope to see you again next year (if not before).

The Extensible Markup Language (XML) has been around for just over ten years, quickly and quietly finding its niche in many different areas of science and technology. It has been used in everything from modelling biochemical networks in systems biology [1], to electronic health records [2], scientific publishing, the provision of the PubMed service (which talks XML) [3] and many other areas. As a crude measure of its importance in biomedical science, PubMed currently has no fewer than 800 peer-reviewed publications on XML. It’s hard to imagine life without it. So whether you’re a complete novice looking to learn more about XML or a seasoned veteran wanting to improve your knowledge, register your place and find out more by visiting xmlsummerschool.com. I hope to see you there…

February 26, 2008

So, no-one told you life was going to be this way
Your job is a joke, you are broke, your love life is DOA.
It is like you are always stuck in second gear
Well, it has not been your day, your week, your month, or even your year…

Tim Berners-Lee delivered his one-hour keynote on the Semantic Web at the AAAI’06 conference yesterday, after an introduction from Yolanda Gil. Tim gave an impassioned speech covering the last 16 years of the web and discussed the future of sharing data on the web using persistent URIs and W3C standards like RDF and OWL. At the end of it all, there were some searching questions from Peter Norvig, Director of Research at Google Labs.

Peter opened his questions by saying:

Many people usually ask me, when I stand up and ask questions after keynote speeches at conferences:

“Peter, what do you have against the Semantic Web?”

Here is roughly what Peter said. The semantic web will never work because:

People are stupid: Google has lots of experience of dealing with people’s stoopidity on the web. Many people don’t write well-formed HTML, they don’t run web servers properly and they keep changing what their URIs identify. It sucks, but this is the world: imperfect and messy, and we just have to deal with it. These same people can’t be expected to use the Resource Description Framework (RDF) and the Web Ontology Language (OWL), which are much more complicated and considerably less fool-proof. (Perhaps you could call this the dumb-antic web?!)

Tim replied that a large part of the semantic web can be populated by taking existing relational databases and mapping them into RDF/OWL. The structured data is already there, it just needs web-izing in a mashup-friendly format. (What I like to call the romantic web: people will publish their data freely on the web this way, especially in e-science for example. This will allow sharing and re-use in unexpected ways.)
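To make the "web-izing" idea a bit more concrete, here is a minimal sketch of what mapping a relational table into RDF might look like. It is not what Tim showed, just an illustration under stated assumptions: the `http://example.org/` namespace, the `book` table and the `rows_to_ntriples` helper are all made up for this example, and every value is emitted as a plain string literal in N-Triples syntax.

```python
import sqlite3

# Hypothetical namespace for the generated URIs (an assumption for this sketch).
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a value for use inside a double-quoted N-Triples literal."""
    text = str(value)
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(conn, table, key_column):
    """Turn each row of `table` into one triple per non-key, non-NULL column.

    The row becomes the subject (a URI built from its key), each column name
    becomes a predicate, and each cell value becomes a plain string literal.
    `table` is trusted input here; it is interpolated into the SQL directly.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = f"<{BASE}{table}#{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def demo():
    """Build a tiny in-memory database and map its single row to triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price TEXT)")
    conn.execute("INSERT INTO book VALUES (1, 'Weaving the Web', '9.99')")
    return rows_to_ntriples(conn, "book", "id")


if __name__ == "__main__":
    for triple in demo():
        print(triple)
```

A real mapping would also need typed literals, foreign keys turned into links between resources, and a properly agreed vocabulary rather than column names, but the basic shape (row to subject, column to predicate, cell to object) is the same.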

People are competitive: People working for commercial companies and market leaders can’t be expected to just put their raw data on the web as RDF/OWL; they have little interest in standards. This is how they make a living: beating their competitors and locking their customers into proprietary data formats, so they can keep selling them software / hardware to use their data. (There are analogous problems in science: scientists can be reluctant to share and publish data if someone else will make new discoveries with it and claim all the glory.)

Tim replied that most bookstores thought putting their stock levels and prices on the web was a bad idea as it would give sensitive information away to their competitors. However, they soon realised that this would allow their customers to search, browse and eventually (kerr-ching) buy their books.

People cheat and lie: People lie about what their content is about, and again Google is on the receiving end of this. Cheats try to fool the PageRank algorithm by saying their web pages are about books or movies, when they are really about Viagra or pornography. The same fate awaits RDF and OWL: cheaters will use ontologies to tell bare-faced lies about their data. (What I like to call the satanic web: people do evil things.)

Tim didn’t have any good answers to this, although later in the day there were some papers touching on the issue of Trust and Policies in the semantic web layer cake.

These lively debates are raging on unabated in the corridors, lecture theatres and bars. AAAI is now in full swing and it’s great to be here!