[A] Podcast: The Semantic Web and Linked Data

Interview With Teodora Petkova

Teodora Petkova discusses the importance and evolution of the Semantic Web and Linked Data.

Bio

Teodora is a philologist fascinated by the metamorphoses of text on the Web. Curious about our networked lives, she explores how the Semantic Web vision unfolds, transforming the possibilities of the written word. A proud (and happily sleep-deprived) mom of 2-year-old Alexander.

Resources

Teodora is currently working on a book of essays called The Brave New Text. She has also developed a course called Content Writing in the Semantic Web, which looks at web writing in an increasingly interconnected cyberspace. In between all this jazz, she also helps Ontotext with their content strategy and writes a series of blog posts called Semantic Technologies in Plain English.

Transcript

Cruce Saunders
Hello, and welcome to "Towards A Smarter World". This is Cruce Saunders, and today I'm here with Teodora Petkova. Teodora is a philologist fascinated by the metamorphoses of text on the web. She's an amazing writer and somebody I've been following for some time, as she explores the Semantic Web both in her Twitter stream, and in her really well-written blog, which you can find at teodorapetkova.com. Teodora, thanks for joining us today.

Teodora Petkova
Thanks for inviting me. I'm really happy to be with you.

Cruce Saunders
So, in your writing you focus very heavily on the Semantic Web. Many of us who have been around for a while remember the early heyday of the Semantic Web back in 2002, 2004, when there were conferences everywhere that included breathless predictions for the Semantic Web. I remember walking around SXSW back in that time and every other session had Semantic Web content in it, and progress since then has just been slow. Many people were almost let down by this idea that there was all the promise in the Semantic Web, but it's been a slow slog.

What's happened since then? Why is the Semantic Web important, and what's new about what's happening today?

Teodora Petkova
We can very easily see that with a simple Google search: if we look for an answer to something, we will not get a list of keywords, we'll get a direct answer. We can also ask Siri to answer something, so in very broad strokes, this is the Semantic Web in action. This is personal assistants (in this case, I mentioned Siri) and search engines understanding our intent through highly connected data.

This is what the Semantic Web is: a web built of links between data pieces. And this is why it is so important: because it will enable a hyper-connected cyberspace where we'll be navigating things, and thoughts, and texts, seamlessly, and we will be exchanging a lot more easily.

Just like what the web did, it revolutionized everything we do, in terms of collaboration, and in terms of business, and in terms of individual relationships. The Semantic Web is poised to do the same, it's just not so visible, but it is happening. More and more companies are realizing the need for semantics in their data.

Cruce Saunders
That makes sense. What is your definition of semantic search itself in very broad strokes?

Teodora Petkova
Well, it's a search where you search for relationships, like for the meaning between the words, and the things that you are searching for. Not for the keywords themselves that represent the things, and it's again, bits of data, bits of information related to other bits in a meaningful way.
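
To make the contrast between keywords and relationships concrete, here is a minimal sketch in Python. It is only an illustration, not how any search engine is actually built: the (subject, relationship, object) layout, the relationship names, and the helper functions are all assumptions chosen for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# A tiny hand-written set of facts, each stored as
# (subject, relationship, object). The relationship names are
# made up for this sketch.
TRIPLES: List[Triple] = [
    ("Miles Davis", "genre", "jazz"),
    ("Miles Davis", "instrument", "trumpet"),
    ("John Coltrane", "genre", "jazz"),
    ("John Coltrane", "instrument", "saxophone"),
    ("John Coltrane", "performed_with", "Miles Davis"),
]


def keyword_search(texts: List[str], keyword: str) -> List[str]:
    """Plain string matching: return the texts that contain the keyword."""
    return [t for t in texts if keyword.lower() in t.lower()]


def match(subject: Optional[str] = None,
          relation: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None means 'anything'."""
    return [
        (s, r, o) for (s, r, o) in TRIPLES
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    ]


def jazz_musicians_playing(instrument: str) -> List[str]:
    """Answer a question by following relationships, not by matching words."""
    jazz = {s for (s, _, _) in match(relation="genre", obj="jazz")}
    players = {s for (s, _, _) in match(relation="instrument", obj=instrument)}
    return sorted(jazz & players)


# The sentence never contains the word "jazz", so keyword search misses it.
print(keyword_search(["Miles Davis recorded Kind of Blue."], "jazz"))
# The relationship query still finds the musician through the linked facts.
print(jazz_musicians_playing("trumpet"))
```

The keyword lookup only sees strings, so it returns nothing for a sentence that never spells out "jazz", while the relationship query reaches the same musician through the facts connected to him.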

Cruce Saunders
Can you define Linked Data? Help to differentiate that from semantic search? What are some use cases for Linked Data?

Teodora Petkova
Linked Data is, again, this highly connected data where, based upon certain standards and universally agreed-upon formats, data are made as a lingua franca. I'm not sure how this is pronounced. How do you pronounce this?

Cruce Saunders
Yeah, lingua franca.

Teodora Petkova
Lingua franca, yeah. This is where data is made to connect to any other data piece in any combination you might want it to, and maybe we should not try to define this in some terms. Maybe I should just tell you about the Linked Data jazz project. It's called Linked Jazz Project, and there you can see Linked Data in action.

If you go to their website you'll see a beautiful visualization of jazz musicians, of their work, of their connections, all in all you'll see a hyperconnected, again, cyberspace where you can click anywhere and understand anything through the relationships it has or will be entering with other things before your eyes.

And that is, you don't have a fixed model or a fixed thing to look at, you have a living thing before your eyes thanks to Linked Open Data – to Linked Data, sorry.

Cruce Saunders
Can you help us, help the audience understand how that's different from, for example, an author just creating a hyperlink to another HTML document, you know, from HTML to HTML. There's hyperlinks between documents, but what makes that different from Linked Data?

Teodora Petkova
Well, the links between data allow for a lot more granularity, and if you link to a document, you're linking to one entire thing and somehow closing the doors to all the tiny things that this document consists of. However, if you make links between these documents through data, you will be making links between the parts and the things these documents are mentioning and talking about.

Cruce Saunders
All those entities, yeah. So we're connecting entities with Linked Data to other entities, but what's the source? What are we linking to?

Teodora Petkova
I'm not sure I understand the question.

Cruce Saunders
If I've got some sort of object or entity inside of my article, like I'm talking about a form of jazz, what am I linking to? In order to create a Linked Data connection, how am I forming that link, that connection between my article about jazz and the rest of this hyperspace you're talking about?

Teodora Petkova
Yeah. That's a nice way to look at this. Well, you're not gonna be making the link from your document to a data piece, you will probably be linking something that your document mentions to a data piece mentioned somewhere else. For example, [if] you are talking about jazz, you can link the word "jazz" to its URI defined by, say, Wikidata, which is the Wikipedia of data for anyone who doesn't know what that is.

If you will be linking this, your document, and the words, and the entities you're mentioning to other data sets, which contain the described thing, you will be creating this big branched thing that talks, not only to people, but also to machines.

So if a machine comes to your article and it sees the data version, the data description of jazz, the machine will know what you're talking about.
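
One way this kind of link can be written down is as JSON-LD using the schema.org vocabulary. The Python sketch below is a hedged illustration only: the article headline and the helper function are hypothetical, and the Wikidata identifier Q8341 is assumed here to be the item for jazz and should be checked on Wikidata before it is reused.

```python
import json
from typing import List

# Assumption: Q8341 is taken to be the Wikidata item for jazz.
WIKIDATA_JAZZ = "http://www.wikidata.org/entity/Q8341"
# DBpedia resource URI for the same concept.
DBPEDIA_JAZZ = "http://dbpedia.org/resource/Jazz"


def annotate_article(headline: str, topic_name: str,
                     topic_uri: str, same_as: List[str]) -> dict:
    """Hypothetical helper: describe an article and the thing it is about."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # "about" points at the entity itself, identified by a URI,
        # rather than at the string "jazz".
        "about": {
            "@type": "Thing",
            "@id": topic_uri,
            "name": topic_name,
            "sameAs": same_as,
        },
    }


# The headline is a made-up example.
article = annotate_article(
    "A Short History of Jazz", "Jazz", WIKIDATA_JAZZ, [DBPEDIA_JAZZ]
)
print(json.dumps(article, indent=2))
```

A machine reading this description does not have to guess what the word means, because the `@id` value gives it the same reference point that other data sets use for the same thing.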

Cruce Saunders
Because it's got a common reference in DBpedia. There's some sort of- okay.

Teodora Petkova
Yeah.

Cruce Saunders
Okay. Yeah, it's interesting. Between Wikidata and DBpedia there's a bunch of entities defined that are in this kind of canonical source that machines can understand, and by connecting that to our entities that we're writing about, we're allowing the things we're talking about to be connected in a known way, or a highly validated way, because we're directly drawing lines between-

Teodora Petkova
We're making what we're talking about unambiguous.

Cruce Saunders
Unambiguous, cool. Yeah, the old phrase from the Semantic Web was "things, not strings." Can you talk a little bit about the meaning of that?

Teodora Petkova
Well, I'm not sure if this is an old Semantic Web phrase. Isn't that a Google phrase?

Cruce Saunders
"Things, not strings." You know, I've heard that for so many years. "We're creating things not strings."

Teodora Petkova
To elaborate on this, it is about computer representation that acknowledges an ecosystem view of the different content we have. So again, if you're talking about jazz, it's not just the string or the word ‘jazz’, it's the entity and the thing jazz that is related to so much more than the mere representation in words. Does that answer your question?

Cruce Saunders
Yeah, absolutely. Absolutely. I think that it might help to connect some ideas for listeners that have heard that before and trying to understand kind of where it fits in this overall set of concepts between the Semantic Web and Linked Data, and DBpedia, and Wikidata, and what Google's doing with Knowledge Graphs. All of these are ideas that for a lot of folks have been in the ether, but aren't anchored to activities in their publishing processes.

So I wonder, can you speak to how Linked Data contributes to the Semantic Web, how much content is actually currently enriched by Linked Data, and then we'll maybe talk about how authors can start adapting their own process for Linked Data?

Teodora Petkova
Yeah. Well, the Semantic Web is a web of Linked Data, and it is ... Linked Data is the backbone that would enable this hyperconnected cyberspace that the Semantic Web will be. It's for the enrichment by enterprises, and by individuals. I know it has been growing slowly but steadily, which is, again, an optimistic thing to know.

There is also the Linked Open Data cloud website (http://lod-cloud.net/), which is a visualization of the Linked Data sets that are openly available on the web. From 2007 to 2017 they grew from only 12 to 1,163.

Cruce Saunders
Wow.

Teodora Petkova
Yeah. To add a little bit of flesh to this explanation. Again, these data sets and the universal format they're available in, would allow any word or any concept to be enriched with as many connections as it can possibly contain.

Cruce Saunders
Got it. Interesting. Yeah, as somebody that works with content management systems, implementations, and publishing systems of various kinds, I'm very intrigued by Linked Data publishing cycles, because we have to find a way to allow for semantic enrichment in the easiest possible way for authors, and this is why I'm so intrigued by all of the various kinds of tools out there that are helping to engender "what you see is semantically marked up" content authoring.

Teodora Petkova
Yeah.

Cruce Saunders
As opposed to, ‘what you see is what you get’ chunks of presentation-oriented text, so authoring for the Semantic Web has been really hard just because our tools haven't caught up. Most CMSs – you've got structured elements within a content type, and eventually get to one big chunky thing called "insert text here," you know, the body text. And the body text is filled with semantic concepts that aren't really linked anywhere.

Anything that's a structured element, we can automatically append semantic markup to in the render of the content, but we can't do that if we've got that big chunky "insert content here" body that's all semantically marked up, or should be semantically marked up, but is just presentation-oriented.

And so bridging that gap I think is one of the big challenges in publishing, especially in large enterprises, but really, truly, for anybody that's publishing content to the web.
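
The gap between structured fields and the body can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any real CMS's API: the `ArticleEntry` content type, its field names, and the sample values are all hypothetical, and the output uses schema.org property names.

```python
import json
from dataclasses import dataclass


@dataclass
class ArticleEntry:
    """Hypothetical CMS content type: three structured fields, one free-text body."""
    title: str
    author: str
    date_published: str  # ISO 8601 date, e.g. "2018-01-01"
    body: str


def render_json_ld(entry: ArticleEntry) -> dict:
    """Map each structured field to a schema.org property at render time."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": entry.title,
        "author": {"@type": "Person", "name": entry.author},
        "datePublished": entry.date_published,
        # The body can only be passed through as one opaque string.
        # Nothing here knows which entities it mentions, so no "about"
        # or "mentions" links can be generated automatically.
        "articleBody": entry.body,
    }


# Sample values are invented for the example.
entry = ArticleEntry(
    title="Notes on Jazz",
    author="Jane Example",
    date_published="2018-01-01",
    body="Jazz grew out of New Orleans and spread around the world.",
)
markup = render_json_ld(entry)
print(json.dumps(markup, indent=2))
```

The title, author, and date each land in a known property, while every concept inside the body stays an unlinked string, which is the part that authoring tools would need to help with.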

Teodora Petkova
This is what WordLift are doing. The people from WordLift. It's a WordPress plugin, which uses Linked Data to enrich your content.

Cruce Saunders
Yes, yeah. Andrea Volpini is a really good thinker. Looking forward to him extending that beyond the WordPress environment, and I'm sure that they will. I also see products like Acrolinx out there on the market that have an opportunity to impact that same authoring life cycle, but there's others. Fonto XML, for example, is an authoring environment that will allow you to plug into something like PoolParty or other kinds of semantic suites, and pull in, essentially, a whole Linked Data ontology that can inform the authoring process, and so it's really great to see various innovators working towards this semantic relationship with the authoring environment.

I think it's still very early, in terms of where the technology will need to end up before we have massive adoption, because I think we do need mass adoption for the Linked Data hyperspace to really, truly grow.

Teodora Petkova
I agree, and maybe we should talk about the other side of the coin, which is: Google and other search engines are very much interested in understanding your text, even if you haven't marked it up properly. Again, this is going back to your question about "things and not strings," and Google's Knowledge Graph, where they categorize and link things, and are trying to create that universe of connected "things, not strings."

Cruce Saunders
Yeah. Google is trying to understand our content from the outside.

Teodora Petkova
Yeah, I agree. We need to be able to create from within. I agree with you.

Cruce Saunders
Yeah, the Knowledge Graph that Google is creating is a natural language processing-oriented map of our knowledge, but as publishers I think we really need to be able to create our own knowledge representation and expose that so that the robots have a clean understanding of it, but also humans do, and also our publishing systems.

A big part of the business justification for these technologies is being prompted by new uses of that same semantically marked up content. We have all kinds of new ways of working with rich semantic snippets, or pieces of content that are separated from their original source and presented independently, and they might appear on a watch, or in an augmented reality environment, or in a chatbot dialogue, and all of those kinds of publication scenarios require a different kind of semantic.

I really think we're gonna get to this critical mass that we're talking about that's necessary to create a rich hyperspace.

Teodora Petkova
Yeah.

Cruce Saunders
I wonder for an author, because you've worked with authors in your writing a lot, and relate to authors. For an author that's never worked with Linked Data, what's your advice to begin to understand how to work with the Linked Data, and how they might begin to leverage that in a useful way in their writing?

Teodora Petkova
That's a nice question. It's again going back to basics, and going back to what they already know about text; and text is never a thing that stands alone. It is a living thing that is always connected to other things, and this is a great starting point.

When you create content with Linked Data in mind, you need to see what you're writing about in an ecosystem, and not to be afraid to link to all kinds of things, and to create your text as a tapestry woven out of many threads.

So maybe, again, I should say here that hypertext was conceived by Ted Nelson as a literary device that is a thing that would enable the nonlinearity of content. We don't think in a linear way.

You talked about CMSs and the way that they somehow constrain us, it's true. Maybe it's not about learning new things when writing with Linked Data in mind. Maybe it's about unlearning what our technology has constrained us to think about text.

Cruce Saunders
Interesting. Interesting. Can you speak a little more about that?

Teodora Petkova
In terms of interconnectedness?

Cruce Saunders
Yeah, and how that author's unlearning process helps to unlock the future.

Teodora Petkova
Wow, that's a million dollar question. Where does this learning process start? Well, it starts with understanding that the Semantic Web, as conceived by Sir Tim Berners-Lee, was half a web of machines and half a web of people, so at the end when writing and when authoring content for the web, we are to be aware that we are directly connecting to another human being on the other side of the screen, and that we are to aim for clarity, for inspiration, for whatever we want to communicate through our content.

I think it sounds a bit extra, but it's a nice basis to begin with. You need to think about the other person on the other side of the device, whatever that device would be, and, again, you should also start thinking about being a really useful node in the information plumbing that is happening on the web.

Just like data will be flowing freely and seamlessly, we need to open up and be ready to connect content bravely, and to perceive, and to imagine content in new ways, and to understand, to see the future. How will informational, educational, inspirational content, how would that content change given it is presented by hologram?

Cruce Saunders
Love it. I love it. It's like we're creating the human brain, and we are – but, not just one human's brain, we're creating those connections with the human brain- like the global brain.

Teodora Petkova
It's a swarm. It's the intelligence of the many. Writing and creating content we are to tap into this understanding, this core of publishing on the web. The idea that the web is a giant library where we share knowledge and experience, and make the world a better place in text.

Cruce Saunders
I think of it as kind of a form of fire. That we're creating this light, this interconnected sense of knowledge between all of these nodes or these nexuses of information and awareness, and when we write about something we're adding our fire to a global bonfire around that topic that makes it brighter.

When we link it up we actually can make that fire brighter, or we can just have a candle sitting out there in isolated space, which is valuable, but when it's connected to the larger fire it becomes something bigger and transformative.

Teodora Petkova
Yeah, it's a beautiful way to see that. I like your fire idea, although I prefer to look at it like synapses, like the neurons connected, but not in our brain, but to the other person's brain. Like the informational space around the earth McLuhan was imagining. It's happening now.

Cruce Saunders
It always felt to me like a bit of a process of discovery that the information space is almost something that exists, and we sort of bring it out.

Teodora Petkova
Yeah. Mm-hmm.

Cruce Saunders
But, there's this underlying information space to everything and that's a bit more of an abstract idea, but I think it's something that makes an intuitive sense to me, that the process of learning is this process of discovery, and the more we can connect up our views, our individual perspectives as humans on the big information graph out of which we emerge, the more we can understand the whole.

Teodora Petkova
Yeah, absolutely, and machine intelligence is just this container, which will be transferring and carrying this fire.

Cruce Saunders
Love it. Let's leave it there. There's many more questions I have for you about writers, and the mindset around this, but I really think this is a beautiful note to end on.

Let's wrap up our interview here. Teodora, I cannot tell you how much I appreciate you taking the time in the afternoon on the other side of the world to meet with us and talk about the Semantic Web, Linked Data, knowledge, and the human experience, and thanks for all of the wonderful writing you contribute to the wider world. I know you're inspiring others out there to pick up their torches, and their candles, and connect their neurons to the larger graph, so thanks a lot for everything you do.

Teodora Petkova
Thanks for inviting me, and let me just end with an Aaron Bradley quote: "Entities loom larger with Linked Data."

Cruce Saunders
Ah, yes! I love that. The big shadow of big data, yeah. Loom much larger with a linked shadow. All right. Well, thanks, Teodora. Have a good one, and we look forward to following your work in the months and years ahead.

[A] is the Content Intelligence Service. In partnership with leading global enterprises, [A] orchestrates content intelligence systems that unify the people, processes, and technology for omnichannel publishing and real-time personalized customer experiences at scale.