10 Years of Semantic Web:

does it work in theory?

Keynote at ISWC 2011,

Frank van Harmelen

A video recording of the talk (including synchronised slides) is at
VideoLectures.net.

Duck & Birdie

Apology for wrong title

Original subtitle was: "searching for universal patterns"

Conference organisers always ask you for a title when you only have a
rough idea of what you want to talk about.
Then you get esprit d'escalier and realise what you should have
said.
Better title would have been "The Semantic Web: does it work in theory?"

When looking back at 10 years of Semantic Web, there's no question
about the engineering feats we have achieved, and I'll have a bit more
to say about that later.

But: besides the engineering, did we learn any permanent,
generic, scientific knowledge? Can we discover any laws that
arise from our decade of work?

Jeff Naughton slide

talking about science is always a pretentious thing to do
(and I mean talking about science instead of just "talking science").

Jeff Naughton (leader in DB field) recently gave a talk at ICDE, which
was all about how we have organised the scientific process, and he
used the following health warning at the start of his talk. I've
freely borrowed his slide to give you the same health warning:

But notice the flawed piece of logic in the final bullet:
if you don't give the keynote, you might well still be a washed-up
has-been. In which case I thought I might as well give the keynote
anyway.

talking about generic laws is certainly a
pretentious thing to do (and certainly in computer science), so you have
been warned.

Philosophical confession

and even worse: before I can speak about our community discovering any
scientific laws, I must first explain how I think about scientific
laws, and how there could ever be any scientific laws in computer
science.

task of science is to find the laws that govern that independently
existing world.

so, I'm not a constructivist.

Constructivists maintain that scientific
knowledge is constructed by scientists and not discovered from the
world. Constructivists claim that the concepts of science are mental
constructs.

I do not believe that scientific
knowledge is just a mental or social construction, and that our
scientific laws have only relative and subjective value.

Clearly, such a realist view, with laws describing an
independently existing world, would apply to physics, but what does it
have to do with computer science?

Laws about the Information Universe

Well, I believe that data, information, and knowledge have inherent
structure & properties, and

that there are laws that govern these structures & properties.

I believe we can discover these laws (just like we can discover
physics laws).

thus: just like the physical universe "exists out there" (and is not
just a mental or social or cultural construction), so is the information
universe "out there" (and is not just a mental or social or cultural
construction).

Of course, many of the actual objects in the physical universe are our
own construction (billiard balls, space ships, nuclear power
stations), but the laws that govern these objects are not just
mental/social constructs, these laws are "objective", "real", they are
"out there to be discovered".

In the same way, the actual objects in the informational universe are
our own constructs (programs, databases, languages, URI's), but the laws
that govern these objects are not just mental/social constructs,
these laws are "objective", "real", they are "out there to be
discovered".

Distorted Mirror slide

of course it is the case that our perception of these laws at any
particular time during our scientific progress will be somehow
coloured by our perceptions and social and mental constructs.

what we perceive to be the universe may well be coloured

by the limitations of our cognitive machinery,

by culturally shaped expectations and desires

by the limitations and distortions of our experimental apparatus

and in general it is hard to distinguish the "real" laws about the
external universe from cognitive artifacts and observational bias.

but that doesn't imply that all laws are only fictions of our
culturally biased imaginations.

and it is the role of science to continuously chip away at these
cognitive, cultural and historical biases to find out what the "real"
laws are like.

Now, the parallel with physics is of course a bit pretentious.

Physics is a very mature science, with a high degree of
mathematisation.

and it will be a long time before Computer Science will
reach the same degree of maturity,

Physics slide

and before we can write the beautiful sets of concise equations about
the information universe.

we cannot yet hope for such beautifully mathematised laws,
in such a concise language that fits on a very compact space

in fact, Computer Science is a very young field, and I think that
instead of comparing ourselves with physics, maybe we are more
comparable to something like alchemy,

Alchemy slide

historians of science describe alchemy as a "protoscience"

it was not just a failure to turn lead into gold,

it was a "protoscience",

searching for proper goals,

proper conceptual framework

developing their experimental apparatus

and this is now recognised as having led to the more mature sciences
of chemistry and physics that we now know.

and in fact, one of the originators of modern science, Isaac Newton,
was an active alchemist.

So there's really no negative connotation to the description of
computer science as alchemy, it just describes the fact that our
science is very young, and that perhaps we have not discovered many of
the laws about the information universe yet.

So, the central question that I will boldly (and perhaps rather
foolishly) tackle in the rest of this talk is this one:

Question slide

Did a decade of Semantic Web work help to discover any Computing Science
laws?

What have we built over the past 10 years

So let's first take a look at what we actually built in the past
decade.

We can characterise what we have built over the past 10 years in 3
parts:

Babel Towers slide

We built a whole lot of vocabularies (including the languages to
represent them, the tools to construct and deploy them, etc)

Naming slide

We built a whole lot of URI's to name lots of things in the world, in
fact, many billions of URI's

Neural Network slide

We connected all of these in a very large network

Engineer slide

But all of these have been mostly treated as one very large engineering
exercise.

Did we learn any science, ideally science that is valid beyond the
particular artifacts that we have so successfully built over the past
10 years?

10 years experiment

So what I'm going to do now, is to treat the past 10 years of SemWeb
engineering as one giant experiment:

designing languages for representing information and knowledge on the web

building very many ontologies in all kinds of domains

building many ontologies in a single domain (eg medicine)

building DBPedia,

building, populating and linking the Linked Data cloud

the widespread use of RDF, RDFS and OWL across very many domains
(these are now the most widely used knowledge representation languages
ever, by a very large margin).

So take that as a giant experiment and ask the question:

If we were to build the Semantic Web again, surely some things would end
up looking different, but are there things that would end up looking the
same, simply because they have to be that way?

for example

languages full of angle brackets. If you reran the experiment, surely
it would be different, because it's just an accidental choice. That
feature isn't governed by any "law in the Information Universe"
(or at least not one that I can imagine).

but other features of what we've built would turn out in essentially
the same way,

you would find the same pattern over and over again, every time we ran
the experiment.

And that is because they are governed by fundamental laws that rule
the structure and behaviour of information and knowledge.

So, let's see if we can discover any of such laws, such stable patterns
that we would rediscover by necessity every time we ran the experiment.

Now, fortunately, we don't have to start from scratch. Some well known
laws of Computer Science already can be seen to apply to our 10 year
experiment as well. I'll give you two examples:

Zipf's law

Zipf's law says that many datasets have long-tail distributions

Roughly this means that the vast majority of some phenomenon of
interest is caused by a small minority of items,
and that the vast majority of items (the long tail) each barely contribute
to the phenomenon

We know from our 10 year long experiment that our datasets also obey
Zipf's law, and this has been well documented in a number of empirical
studies.
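To make the "small minority, long tail" shape concrete, here is a minimal sketch. It assumes an idealised Zipf distribution with exponent 1 over 10,000 items; the function names and numbers are illustrative and are not taken from the empirical studies mentioned above.

```python
# Share of the total "mass" carried by the head of an idealised Zipf distribution.
# Assumption: the item of rank r has weight 1 / r**exponent (exponent 1 by default).

def zipf_weights(n_items, exponent=1.0):
    """Unnormalised Zipf weights for ranks 1..n_items."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]

def head_share(weights, head_fraction):
    """Fraction of the total weight carried by the top `head_fraction` of items."""
    head_size = max(1, int(len(weights) * head_fraction))
    ranked = sorted(weights, reverse=True)
    return sum(ranked[:head_size]) / sum(ranked)

weights = zipf_weights(10_000)
top_1_percent = head_share(weights, 0.01)
print(f"top 1% of items carry {top_1_percent:.0%} of the total weight")
```

Under these assumptions the top 1% of items carry roughly half of the total, while each of the remaining 99% contributes very little.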

this phenomenon is sometimes a blessing, sometimes a curse

nice for compression

awful for load balancing

It's important to realise that knowing Zipf's law helps us deal with the
phenomenon, both in the cases where it's a blessing (so we can exploit
it) and in the cases where it's a curse (so that we can try to avoid
it).

that's why it is worth trying to discover these laws.

Here's a second well known law from Computer Science:

Use vs Re-use

Another known law also applies:

Use vs reuse: use = 1 - re-use

(of course don't take the linear form literally)

lesson from ontologies

Law of conservation of misery, you can't have it both ways

OK, so now I'll start proposing some "laws" that originate from our own
field, and from our own 10 year experiment:

Factual knowledge is a graph

the dominant life-form in our information space is the graph.

The vast majority of our factual knowledge consists of simple
relationships between things,

represented as a ground instance of a binary predicate.

And lots of these relations between things together form a giant
graph.

Now this may sound obvious to us in this community, but stating that
factual knowledge is a graph is not obvious at all.

For example, if you asked a DB person this question, they'd say:
factual knowledge is a table.
And a logician would say: knowledge is a set of sentences.

I know that you can convert one form into the other

every table is a (simple) graph, and every graph can be hacked into
table format (but not so nicely)

every graph is a (simple) set of sentences, but not always the other
way round,

but that's a bit beside the point: just because all our programming
languages are Turing complete doesn't mean that there aren't very real
and important differences between them.

So in the same way, graphs, tables and sets of sentences are all really
different representations, even with the theoretical transformations.

And the law that I propose says that factual knowledge is a graph

and the DB people may think it's a table, but actually, many of their
tables with lots of foreign keys are really encoding graphs.

and the logicians may think it's a set of sentences, but that
representation is wildly overshooting the mark (and typically not even
aimed at or used for representing factual knowledge)
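As an illustration of the claim that tables with foreign keys encode graphs, here is a small sketch. The two tables, their column names and the identifiers are invented for the example; the point is only that each cell becomes one edge, and that the reverse conversion yields a generic three-column table.

```python
# Reading tables with a foreign key as a graph of (subject, predicate, object) edges.
# All table contents and column names below are illustrative assumptions.

def rows_to_triples(rows, key_column):
    """Turn each table cell into one edge: (row key, column name, cell value)."""
    triples = set()
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column and value is not None:
                triples.add((subject, column, value))
    return triples

def triples_to_rows(triples):
    """The 'hack' in the other direction: one generic three-column table."""
    return [
        {"subject": s, "predicate": p, "object": o}
        for s, p, o in sorted(triples)
    ]

people = [
    {"id": "person:1", "name": "Alice", "works_for": "org:1"},
    {"id": "person:2", "name": "Bob", "works_for": "org:1"},
]
organisations = [{"id": "org:1", "name": "Example University"}]

# The foreign key column "works_for" becomes an edge between two nodes.
graph = rows_to_triples(people, "id") | rows_to_triples(organisations, "id")
generic_table = triples_to_rows(graph)
```

The generic table loses the per-entity column layout, which is one way to read "but not so nicely" above.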

So let's switch to a less controversial law;

Terminological knowledge is a hierarchy

this law has been rediscovered in knowledge representation and
information modelling many times over.

the details may differ, but the notion of simple hierarchies with
property inheritance is widely recognised as the right way to
represent terminological knowledge.
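A minimal sketch of what "a simple hierarchy with property inheritance" can look like. It assumes a single-parent tree and invented class and property names; real terminologies differ in the details, as noted above.

```python
# A toy class hierarchy with property inheritance.
# Assumptions: each class has at most one parent; all names are illustrative.

parent_of = {"Dog": "Mammal", "Mammal": "Animal", "Animal": None}
local_properties = {
    "Animal": {"alive": True},
    "Mammal": {"warm_blooded": True},
    "Dog": {"barks": True},
}

def ancestors(cls):
    """Return cls followed by its superclasses, most specific first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = parent_of.get(cls)
    return chain

def inherited_properties(cls):
    """Merge properties from the root down, so specific classes override general ones."""
    merged = {}
    for ancestor in reversed(ancestors(cls)):
        merged.update(local_properties.get(ancestor, {}))
    return merged

dog_properties = inherited_properties("Dog")
```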

And this observed repeated invention makes this a much stronger law.

So to speak: this experiment has already been rerun many times in the
history of computer science, and this has proven to be a stable
finding.

So now I've talked about both factual and hierarchical knowledge. But
how do these two types of knowledge compare?

Terminological knowledge is much smaller than the factual knowledge

or alternatively, in a picture:

Small hierarchy, big graph

And again, this may sound obvious to all of us in this audience, but
really it wasn't all that obvious before we started the 10 year
experiment. And in fact, it sharply contrasts with a long history of
knowledge representation

traditionally, KR has focussed on small and very intricate sets of
axioms: a bunch of universally quantified complex sentences

but now it turns out that much of our knowledge comes in
the form of very large but shallow sets of axioms.

lots of the knowledge is in the ground facts (not in the quantified
formulas)

And with this law, we can even venture to go beyond just a qualitative
law, and put some quantitative numbers on it.

Jacopo numbers

Here are some numbers obtained by Jacopo Urbani, a PhD student in our
lab (and some of you will have seen these figures in his presentation
yesterday, in the session on reasoners):

three of the largest datasets around (two real, one artificial)

compute full deductive closure of schema hierarchy only

runtime counted in seconds or small number of minutes

then compute full deductive closure of schema + instances

then runtime counted in hours

notice that this is now using an interesting measure of "size" here:
we're not just counting triples, but we're measuring somehow the
complexity of these triples by seeing how expensive it is to do
deduction over them.

And we observe that the graph is 1-2 orders of magnitude "larger" than the
schema.
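The two kinds of closure can be sketched on a toy example. This is not the reasoner used for the figures above: it is a naive fixpoint over invented data, in the spirit of the RDFS subclass rules, and it only shows why the derived instance facts dwarf the derived schema facts.

```python
# Toy deductive closure: first over the schema only, then over schema + instances.
# Assumptions: only subclass transitivity and type propagation; data is invented.

def schema_closure(subclass_edges):
    """Transitive closure of the subclass relation, by naive fixpoint iteration."""
    closure = set(subclass_edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def instance_closure(type_facts, closed_schema):
    """Propagate each instance's type up the already-closed hierarchy."""
    derived = set(type_facts)
    for (instance, cls) in type_facts:
        for (sub, sup) in closed_schema:
            if sub == cls:
                derived.add((instance, sup))
    return derived

schema = {("Dog", "Mammal"), ("Mammal", "Animal")}
type_facts = {(f"dog{i}", "Dog") for i in range(1000)}

closed_schema = schema_closure(schema)
closed_instances = instance_closure(type_facts, closed_schema)
print(len(closed_schema), "schema facts;", len(closed_instances), "instance facts")
```

Because the schema is closed first, a single pass over the instance facts suffices in this sketch.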

So, if we revisit the diagram I sketched before:

Small hierarchy, big graph

then the size of the hierarchy (although already small) is actually
still vastly overstated. If we are to believe the numbers on the
previous slide, the real size of the terminological knowledge with respect to
the size of the factual knowledge is like this

Now the black dot representing terminological knowledge is 2 orders of
magnitude smaller than the size of the factual graph.

To put this in a slogan:

"It's the A-box, stupid"

knowledge is much more dominated by specific instances than by general rules

Apparently, the power of represented knowledge comes
from representing a very small set of general rules that are
true about the world in general,

together with a huge body of rather trivial assertions that
describe things as they happen to be in the current world (even though
they could easily have been different).

And again, understanding this law helps us to design our distributed
reasoners. It is the justification that when building parallel
reasoners, many of us just take the small schema and simply replicate it
across all the machines: it's small enough that we can afford to do
this.
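The replication strategy can be sketched as follows. The "workers" here run one after another in a single process to keep the example self-contained; the partitioning scheme, the function names and the data are assumptions for illustration, and the sketch covers type propagation only.

```python
# Replicating a small closed schema to every worker, partitioning only the instances.
# Assumptions: workers are simulated sequentially; only type propagation is performed.

def partition(items, n_workers):
    """Split the instance facts into n_workers roughly equal chunks."""
    ordered = sorted(items)
    return [ordered[i::n_workers] for i in range(n_workers)]

def worker(type_facts_chunk, closed_schema):
    """Each worker holds a full copy of the schema and only a slice of the instances."""
    superclasses = {}
    for sub, sup in closed_schema:
        superclasses.setdefault(sub, set()).add(sup)
    derived = set(type_facts_chunk)
    for instance, cls in type_facts_chunk:
        for sup in superclasses.get(cls, ()):
            derived.add((instance, sup))
    return derived

closed_schema = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Dog", "Animal")}
type_facts = {(f"dog{i}", "Dog") for i in range(100)} | {
    (f"cow{i}", "Mammal") for i in range(100)
}

chunks = partition(type_facts, 4)
distributed = set().union(*(worker(chunk, closed_schema) for chunk in chunks))
centralised = worker(type_facts, closed_schema)
```

For this particular rule each derivation joins one instance fact with one schema fact, so the workers never need to exchange instance data, and the union of their results matches the centralised result.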

We've already seen that the factual knowledge is very large but very
simple. We can ask ourselves how simple or complex the terminological knowledge
is.

Terminological knowledge is of low complexity

When we go around with our data telescope, and we try to observe what
real ontologies look like when they are out there in the world, what do
we see?

Telescope with OWL

We see a very wide spread of expressivity in ontologies, all the way from
undecidable OWL Full to very simple RDF hierarchies. But this spread
is very uneven: there are very many lightweight ontologies, and very few
heavyweight ones.

This is of course well captured by Jim Hendler's timeless phrase:

A little semantics goes a long way (JH)

And combining both this law and the previous law, we can now see that
his "little semantics" means both: low expressivity and low volume

We could also phrase this as "the unreasonable effectiveness of
low-expressive KR"

And there is another way in which this law is true:

Of course it is nice that we can express also the highly expressive
ontologies in our languages (like OWL2).

And some of these languages have very scary worst-case complexity
bounds.

But when writing ontologies in these expressive languages, we often
find that the reasoners for these expressive
languages perform quite well.

In other words: the information universe is apparently structured
in such a way that the double exponential worst-case complexity bounds
don't hit us in practice.

If the world of information were worst case, we wouldn't have been
able to deal with it, but apparently the laws of information make the
world such that we can deal with the practical cases.

So: for highly expressive KR we could say that it works better in practice than in theory

The next law has of course been staring us in the face ever since we
started this work on the semantic web (and it has been staring database
people in the face for quite a bit longer):

Heterogeneity is unavoidable

It's for a good reason of course that I choose a Tower of Babel to
symbolise our vocabularies:

Tower of Babel slide

A crucial insight that perhaps distinguishes the work in this community
from many earlier pieces of work is that instead of fighting
heterogeneity, we have seen that it's inevitable anyway, and that we
might as well live with it.

And actually, I would claim that the fact that we have embraced this
law (instead of fighting it) has enabled the enormous growth of the Web
of Data.

Compared to many previous attempts, which tried to impose a single
ontology, the approach of "let a 1000 ontologies blossom" has been a key
factor for the growth of our datasets.

But of course, embracing heterogeneity is nice when you are publishing
data, but it's not so nice when you are consuming data. So heterogeneity
is not only an opportunity, it's also a problem. And the question is:
can we solve that problem?

Heterogeneity is solvable

I'll argue that yes, heterogeneity is solvable (but maybe not in the way
that our community likes to hear).

We can see what's going on by looking at the Linked Data cloud.

LOD cloud

This is the picture we all know so well,

it's carefully hand crafted, and
kudos to the hard work that went into it,

but actually the picture is also somewhat misleading.

It (no doubt unintentionally) suggests an evenly spread out cloud of
lots of colourful datasets.

The true image of "let a 1000 ontologies blossom".

It suggests lots of connections between lots of datasets

But that's not actually the structure of the Linked Data cloud.

Instead, the Linked Data cloud looks like this:

circular cluster map

This is a picture generated on the LOD cloud as it was last week,

it shows a heavily clustered structure.

And here's the same picture,

but now with some more emphasis on displaying the clusters;

linear cluster map

so, LOD cloud is not evenly connected

(unlike traditional LOD cloud diagram),

but highly clustered

with strong links inside the clusters

and few links between the clusters

And how did these clusters come about?

not by ontology mapping,

but mostly by a combination of social, economic and cultural processes:

Why is SNOMED so important in the medical domain?
Partly because it was the first to be around

Why will schema.org be so important:
Because it carries the economic weight of 90% of the web-search market

etc.

Does that mean that ontology mapping should be abandoned?

No, it doesn't.

Many of the links inside these clusters are created by
algorithmic ontology mapping.

But I would claim that this is only possible inside such a
cluster, ie the fine-grained structure of the graph,

whereas the coarse-grained structure of the graph is
determined through social, economic and cultural processes.

For the next law, we must remember that we are not only a "semantic"
web community, but also a semantic "web" community. So let's look at distribution:

speed decreases with distribution, centralisation is necessary

The original dream of this community has sometimes been formulated as
turning the Web into a database.

earth globe slide

But unfortunately, observations from our 10 year experiment tell us rather
the opposite:

the Web is a good platform for data publication,

but it's a pretty bad platform for data consumption.

Indeed, the distributed model for data-publishing is a key
factor that has enabled the growth of the Web and indeed of the Web of
Data, but for data-consumption, physical centralisation works
surprisingly well.

And this is not just us finding this out.

Google is combining our distributed publishing with their centralised
processing,

Facebook is combining our distributed publishing with their centralised
processing,

Wikipedia, etc.

So, you might think that centralisation would become a bottleneck.
Wrong: distribution is the bottleneck.

The Web is not a database, and I don't think it ever will be.

So if all this massive data has to be in one central place to process
it, how are we going to cope? Well, the good news from the Information
Universe is that

speed increases with parallelisation

at least for our types of data. I'll show you how well this works.

Jacopo graph 1

This was the performance of triple stores on forward inferencing,
somewhere in 2009.

Jacopo graph 2

and this is how much parallelisation improved the performance. So
apparently, the types of knowledge and data that we deal with are very
suitable for parallelisation.

And it's interesting to see that the previous laws actually help us to
make this possible: the combination of

knowledge is layered

Contrary to the other laws, this law does not come so much yet from our
own observations in this field. But other fields tell us that knowledge
is like a set of Russian dolls:

Russian dolls

with one doll nested inside the other.

From fields like

Cognitive Science,

Logic,

Linguistics,

Knowledge Representation

we know that
statements of knowledge need not only refer to the world, but that they
may refer to other bits of knowledge, creating a multi-layered
structure.

The examples are plenty: we may say that a fact in the world is true,
and then we can say

what the certainty of that statement is,

or what the provenance of that statement is,

or what our trust in that statement is

or at what date that statement was made,
etc.

Now curiously enough, there is lots and lots of demand in our
community for this kind of layered representation, but our
representation languages serve this need very poorly.
Re-ification can be seen as a failed experiment to obtain such
layering, and now people are abusing named graphs because there is
nothing better.
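To show what such layering amounts to, here is a small sketch in which a statement can itself be the subject of another statement. This is a plain Python data structure, not RDF reification and not named graphs; the class, the predicates and the values are all invented for illustration.

```python
# Statements about statements: a fact, then provenance and certainty about that fact,
# then trust in the provenance. All names and values are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    subject: object   # an entity name, or another Statement
    predicate: str
    obj: object       # a value, an entity name, or another Statement

fact = Statement("Amsterdam", "capital_of", "Netherlands")
provenance = Statement(fact, "stated_by", "example-source")
certainty = Statement(fact, "has_certainty", 0.9)
trust = Statement(provenance, "trusted_to_degree", "high")

def depth(statement):
    """Number of layers: 1 for a plain fact, plus 1 per level of nesting below it."""
    inner = [
        part
        for part in (statement.subject, statement.obj)
        if isinstance(part, Statement)
    ]
    return 1 + max((depth(part) for part in inner), default=0)

layers = [depth(s) for s in (fact, provenance, certainty, trust)]
```

Each outer statement wraps an inner one like the nested dolls above, and the nesting depth is what a flat set of triples cannot express directly.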

So, being more aware of this law would have helped us to create better
representation languages sooner.

So, we're reaching the end of the talk, final slide in sight:

Final slide

and I'll end with the same slide that I started with:

does it work in theory?

well, what theory?

My hope for this talk is that
- many of you might disagree with some of my proposed "laws"
- and some of you may even disagree with all of them

but regardless of that,

I hope that I will have prompted you to start thinking about the
notion of laws in the Information Universe:

that such laws may exist

and it's our task to discover them

And this has very concrete impact on how we organise our community:

it's an invitation to journal editors and conference chairs to also
consider papers that have the ridiculously ambitious aim to discuss
one of these laws

and it's also a challenge to you:

Of course we won't really redo the last 10 years of our experiment, but
when you do your research and write your papers, try to think about what
are the repeatable patterns, these laws, and try to separate the
incidental choices you make from the fundamental patterns you are
uncovering.