schema.org, Wikidata, Knowledge Graph: strands of the modern semantic web

For some balance, I worked for eight years at IBM on DB2 Universal Database.

So I'm someone who is passionate about collecting and organizing the world's knowledge.

Knowledge graph IDs in the middle

Wikidata IDs and relations at the edges

schema.org data as bugs (raw material for the semantic web)

A deliberately crappy diagram illustrating an idea
of how these three pieces (amongst others) can fit
together to make the modern semantic web.
Of course each of those nodes can in turn relate to other nodes,
and have their own statements, and the identifiers help us link
up that data.
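One concrete way the identifiers link things up is through the sameAs property. A schema.org description can use it to point at the matching Wikidata entity. The sketch below uses only the Python standard library and pulls Wikidata QIDs out of such a description. The Ohio example and its QID (Q1397) are assumptions for illustration, and wikidata_ids is a hypothetical helper.

```python
import re

# Wikidata entity URIs end in a Q-number, whether given as the concept URI
# (http://www.wikidata.org/entity/Q...) or the page URL (https://www.wikidata.org/wiki/Q...).
WIKIDATA_URI = re.compile(r"^https?://www\.wikidata\.org/(?:entity|wiki)/(Q\d+)$")


def wikidata_ids(thing: dict) -> list:
    """Return the Wikidata QIDs a schema.org description links to via sameAs."""
    same_as = thing.get("sameAs", [])
    if isinstance(same_as, str):  # sameAs may be a single URL or a list of URLs
        same_as = [same_as]
    return [m.group(1) for url in same_as if (m := WIKIDATA_URI.match(url))]


# Illustrative description; Q1397 is assumed to be Wikidata's item for Ohio.
ohio = {
    "@context": "https://schema.org",
    "@type": "State",
    "name": "Ohio",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q1397",
        "https://en.wikipedia.org/wiki/Ohio",
    ],
}

print(wikidata_ids(ohio))  # ['Q1397']
```

Any consumer that already knows the Wikidata item can then join this page's statements with everything else attached to that ID.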

Laziness

The virtue is about the avoidance of future work.

Virtues of the Perl Programmer, Larry Wall

In 2012 Ade Oshineye was introducing Google+ Pages for
businesses. I asked if Google would offer a
write API that would allow updates to be automated
and avoid having yet another place to update
business hours; he tipped me off to schema.org, and
so began five years of hard work in pursuit of
laziness.

Linked open data aspires to be lazy

Publish and let the data be consumed

Avoid the hard work of natural language parsing, entity recognition, and disambiguation

Pull the data you need from various aggregators

Something like coral dispersing clouds of
reproductive cells in the hopes that some might
find a cell-mate (hah), while serving the broader
ecosystem as nourishment...
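On the publishing side, being lazy can be as small as this: embed a schema.org description in the page once, then let every consumer crawl it. Business hours are the example from the Google+ story. The sketch below serializes a LocalBusiness with openingHours as a JSON-LD script element. It assumes a hypothetical business, so the name, URL, and hours are placeholders, and jsonld_script is an illustrative helper rather than part of any library.

```python
import json


def jsonld_script(data: dict) -> str:
    """Serialize a schema.org description as an embeddable JSON-LD <script> element."""
    # "<\/" is a legal JSON escape for "</", so a value containing "</script>"
    # cannot close the element early; ensure_ascii=False keeps names readable.
    payload = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical business: name, URL, and hours are placeholders.
business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Bookshop",
    "url": "https://example.com/",
    "openingHours": ["Mo-Fr 09:00-17:00", "Sa 10:00-14:00"],
}

print(jsonld_script(business))
```

Updating the hours then means editing one dictionary on your own site. Aggregators pick up the change on their next crawl.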

Who is the governor of Ohio?

It's always good to get the lay of the land when travelling to a new place.

You can do some fun things with Google and Bing these days...

When was the governor of Ohio born?

Who is the governor of Ohio? When was he born?

You're undoubtedly familiar with these fact cards from the past few years.

In Google land, this is the domain of the Knowledge Graph.

Microsoft calls their version Bing Knowledge.

Knowledge Graph sources

Freebase

Wikipedia / Wikidata

Crawling the web

Licensed data

Web pages often include tabular data, as well as human-annotated
structured data (OpenGraph, schema.org, and the like). (Google Knowledge Vault)
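The consuming side of that bargain can be sketched with Python's standard html.parser. The code collects and parses every JSON-LD block on a page. It is an illustration only: a real crawler would also handle microdata, RDFa, and tabular data, whereas this version simply skips malformed JSON. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.items = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks; buffer them all.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is common in the wild; skip it


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```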

Text, images, and data extracted from Wikipedia made up much of
the Knowledge Graph, now found in Wikidata.

Knowledge Graph IDs

So the Knowledge Graph gives you IDs for entities that you can associate and augment with your own data.

Those are the same IDs you get back from the Discovery Widget.
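The next sketch shows how you might fetch those IDs. It assumes the public Knowledge Graph Search API endpoint (https://kgsearch.googleapis.com/v1/entities:search) with its query, key, limit, and types parameters. It only builds the request URL and reads (ID, name) pairs from a parsed response shaped like the API's JSON-LD ItemList. API_KEY is a placeholder, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Knowledge Graph Search API endpoint; an API key is required.
KG_SEARCH_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def kg_search_url(query: str, api_key: str, limit: int = 10, types: tuple = ()) -> str:
    """Build a search request URL; 'types' may repeat, one schema.org type per entry."""
    params = [("query", query), ("key", api_key), ("limit", str(limit))]
    params += [("types", t) for t in types]
    return f"{KG_SEARCH_ENDPOINT}?{urlencode(params)}"


def kg_ids(response: dict) -> list:
    """Return (KG ID, name) pairs from a parsed JSON-LD ItemList response."""
    return [
        (element["result"]["@id"], element["result"].get("name", ""))
        for element in response.get("itemListElement", [])
    ]


print(kg_search_url("John Kasich", "API_KEY", limit=3, types=("Person",)))
```

Each result's @id carries a "kg:" prefix. That value is what you would store alongside your own data.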

Knowledge Graph API problems

Limited constraints = poor precision

Can't limit to name of the entity

Type mapping to schema.org loses more precision

A fictional character is a Thing, not a Person

Simplistic entity results: no facts, no relationships

When was John Kasich born? - no birthdate returned

Search cannot be constrained to just the name of the entity, so "Buffy" returns matches found anywhere in the entity's description.

This is compounded by having to use https://schema.org/Thing for many old Freebase types like "Fictional Universe" or "Fictional Character": filtering on "Person" excludes fictional characters, so books, TV shows, and the actors who played the role show up instead.

The results contain only disconnected entities; none of the relationships to other entities in the graph are reflected in the results. Many of the properties shown in Google's own search results are missing (e.g. birth date).
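The API cannot restrict matching to the entity name, so one partial workaround is to filter on the client side. The sketch below assumes a parsed response in the same ItemList shape, with results under itemListElement[*].result, and filter_by_name is a hypothetical helper. Filtering cannot restore entities the API never returned, nor missing facts such as birth dates.

```python
def filter_by_name(response: dict, name: str, required_type: str = "") -> list:
    """Keep results whose name equals `name` (case-insensitive) and, if given,
    whose @type list includes `required_type`."""
    wanted = name.casefold()
    matches = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        if result.get("name", "").casefold() != wanted:
            continue  # matched only somewhere in the description, not the name
        types = result.get("@type", [])
        if isinstance(types, str):  # tolerate a single type given as a string
            types = [types]
        if required_type and required_type not in types:
            continue
        matches.append(result)
    return matches
```

The type-mapping trade-off from the slide still applies. Passing required_type="Person" would drop a fictional character that the API types only as Thing.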

The Graph is a lie!

One could download the 250GB Freebase data dump and reconstruct the relationships based on the IDs (most of the KG IDs were inherited from Freebase), but that data is now rapidly aging. Who's the President-Elect now?
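Because most KG IDs were inherited from Freebase, they also offer a bridge into Wikidata, which records Freebase IDs under property P646. The sketch below turns a KG ID into a Wikidata Query Service SPARQL request and reads the matching QIDs back. It assumes two things: KG IDs take the form "kg:" plus a Freebase MID, and MIDs consist of lowercase letters, digits, and underscores. The helper names are hypothetical, and the code does not send the request. The Query Service asks clients to identify themselves with a descriptive User-Agent.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Assumed shape of a Freebase MID, e.g. /m/0abc12 or /g/11abc (hypothetical examples).
MID_PATTERN = re.compile(r"/[mg]/[0-9a-z_]+")


def freebase_mid(kg_id: str) -> str:
    """Strip the 'kg:' prefix from a Knowledge Graph ID and validate the remainder."""
    mid = kg_id[len("kg:"):] if kg_id.startswith("kg:") else kg_id
    if not MID_PATTERN.fullmatch(mid):
        raise ValueError(f"not a Freebase-style MID: {kg_id!r}")
    return mid


def wikidata_lookup_url(kg_id: str) -> str:
    """WDQS request for Wikidata items whose Freebase ID (P646) equals the KG ID's MID."""
    query = f'SELECT ?item WHERE {{ ?item wdt:P646 "{freebase_mid(kg_id)}" . }}'
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


def qids(sparql_json: dict) -> list:
    """Extract QIDs from SPARQL 1.1 JSON results that bind ?item to entity URIs."""
    return [
        binding["item"]["value"].rsplit("/", 1)[-1]
        for binding in sparql_json["results"]["bindings"]
    ]
```

The validation step matters because the MID is interpolated into the query text. Anything that does not look like a MID is rejected rather than quoted.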

Knowledge Graph: entity relations

Note: The Knowledge Graph Search API
returns only individual matching entities, rather
than graphs of interconnected entities. If you need
the latter, we recommend using data dumps from
Wikidata instead.
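Following that advice, a single Wikidata query can answer both questions from the earlier slides, "Who is the governor of Ohio? When was he born?". It does so by walking a relationship that the Search API does not expose. The sketch assumes Q1397 = Ohio, P6 = head of government, and P569 = date of birth; check these on wikidata.org before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers (verify on wikidata.org):
#   wd:Q1397 = Ohio, wdt:P6 = head of government, wdt:P569 = date of birth.
GOVERNOR_QUERY = """
SELECT ?governor ?governorLabel ?birthDate WHERE {
  wd:Q1397 wdt:P6 ?governor .      # Ohio -> its head of government
  ?governor wdt:P569 ?birthDate .  # that person -> date of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def wdqs_url(query: str) -> str:
    """Encode a SPARQL query as a Wikidata Query Service GET request asking for JSON."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


def rows(sparql_json: dict) -> list:
    """Flatten SPARQL 1.1 JSON results into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in sparql_json["results"]["bindings"]
    ]


print(wdqs_url(GOVERNOR_QUERY))
```

The wdt: prefix returns only best-ranked statements. Whether former governors also appear therefore depends on how the P6 statements are ranked in Wikidata.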

This is a cautionary tale: if you don't supply
structured data, machines have to make a best guess at
interpreting your page content. In this case, the modal
UI is inserted dynamically as the first node of the
DOM, and now that search engines can render dynamic
HTML, we get this sort of modal mess.

<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# medium-com: http://ogp.me/ns/fb/medium-com#">
<title>LIVE REVIEW: Midpoint Music Festival — Cincinnati, OH – The Owl Mag – Medium</title>
<meta name="title" content="LIVE REVIEW: Midpoint Music Festival — Cincinnati, OH">
<meta property="og:title" content="LIVE REVIEW: Midpoint Music Festival — Cincinnati, OH">
<meta property="og:url" content="https://medium.com/the-owl-mag/live-review-midpoint-music-festival-cincinnati-oh-a922af156600">
<meta property="og:image" content="https://cdn-images-1.medium.com/proxy/1*MXL-j6S8fTEd8UFP_foEEw.png">
<meta name="description" content="Cincinnati is not a music city by any means. Numerous bands skip the city in lieu of Columbus, OH on national tours and the city’s hottest music venues reside over the river in Kentucky. Midpoint…">
<meta property="og:description" content="Cincinnati is not a music city by any means. Numerous bands skip the city in lieu of Columbus, OH on national tours and the city’s hottest music venues reside over the river in Kentucky. Midpoint…">
<meta property="og:site_name" content="Medium">
<meta property="og:type" content="article">
</head>

<head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# medium-com: http://ogp.me/ns/fb/medium-com#">
<title>LIVE REVIEW: Midpoint Music Festival — Cincinnati, OH – The Owl Mag – Medium</title>
<meta name="title" content="LIVE REVIEW: Midpoint Music Festival — Cincinnati, OH">
<meta name="description" content="Cincinnati is not a music city by any means. Numerous bands skip the city in lieu of Columbus, OH on national tours and the city’s hottest music venues reside over the river in Kentucky. Midpoint…">
<meta name="twitter:description" content="Cincinnati is not a music city by any means. Numerous bands skip the city in lieu of Columbus, OH on national tours and the city’s hottest music venues reside over the river in Kentucky. Midpoint…">
<meta name="twitter:image:src" content="https://cdn-images-1.medium.com/proxy/1*MXL-j6S8fTEd8UFP_foEEw.png">
<meta name="twitter:site" content="@Medium">
</head>
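The two head excerpts carry much of the same metadata under different vocabularies. OpenGraph tags use property="og:*", while Twitter Card tags use name="twitter:*". A consumer that reads only one vocabulary sees only part of the picture. The sketch below uses Python's html.parser to collect both into one dictionary, keeping the first occurrence of each key. The names are hypothetical, and it ignores plain name="title" and name="description" tags.

```python
from html.parser import HTMLParser


class SocialMetaExtractor(HTMLParser):
    """Collect OpenGraph (og:*) and Twitter Card (twitter:*) <meta> values from a page."""

    PREFIXES = ("og:", "twitter:")

    def __init__(self) -> None:
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # OpenGraph uses property="og:...", Twitter Cards use name="twitter:..."
        key = a.get("property") or a.get("name") or ""
        if key.startswith(self.PREFIXES) and a.get("content") is not None:
            self.values.setdefault(key, a["content"])  # first occurrence wins


def social_meta(html: str) -> dict:
    parser = SocialMetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.values
```

Fed both excerpts above, it would return the six og: keys from the first (title, url, image, description, site_name, type) plus twitter:description, twitter:image:src, and twitter:site from the second.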