3 And if you hold out until the end, you will receive a prize: a BabelNet t-shirt! [model not included]

4 Part 1: Identifying multilingual concepts and entities in text

5 The driving force
- Web content is available in many languages
- Information should be extracted and processed independently of the source/target language
- This could be done automatically by means of high-performance multilingual text understanding

6 Word Sense Disambiguation and Entity Linking
«Thomas and Mario are strikers playing in Munich»
- Entity Linking (EL): the task of discovering mentions of entities within a text and linking them to entries in a knowledge base.
- Word Sense Disambiguation (WSD): the task of assigning meanings to word occurrences within a text.

7 The general problem: POLYSEMY
- Natural language is ambiguous
- The most frequent words have several meanings!
- Our job: model meaning from a computational perspective

8 Monosemous vs. Polysemous words
Monosemous words: only one meaning
- Examples: plant life, internet
Polysemous words: more than one meaning
- Example: bar
  - a room or establishment where alcoholic drinks are served
  - a counter where you can obtain food or drink
  - a rigid piece of metal or wood
  - musical notation for a repeating pattern of musical beats
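The distinction above can be sketched as a toy sense inventory: a mapping from each word to its list of meanings, where a word is polysemous when the list has more than one entry. This is only an illustration; the names `toy_inventory` and `is_polysemous` are hypothetical, and the gloss for "internet" is an assumption (the slide gives no definition for it).

```python
# Toy sense inventory: each word maps to the list of its meanings.
toy_inventory = {
    # Assumption: this gloss is illustrative, it does not come from the slides.
    "internet": ["a worldwide network of computer networks"],
    # The four meanings of "bar" listed on the slide.
    "bar": [
        "a room or establishment where alcoholic drinks are served",
        "a counter where you can obtain food or drink",
        "a rigid piece of metal or wood",
        "musical notation for a repeating pattern of musical beats",
    ],
}


def is_polysemous(word, inventory):
    """Return True when the inventory lists more than one meaning for `word`."""
    return len(inventory.get(word, [])) > 1


for word in toy_inventory:
    label = "polysemous" if is_polysemous(word, toy_inventory) else "monosemous"
    print(f"{word}: {label} ({len(toy_inventory[word])} sense(s))")
```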

9 But how do we represent and encode semantics?
Thesauri
- Group words according to similar meaning
- Relations between groups (e.g., narrower meanings)
- Roget's Thesaurus (1911)
Machine Readable Dictionaries
- Enumerate all meanings of a word
- Include definitions, morphology, example usages, etc.
- Oxford Dictionary of English, LDOCE, Collins, etc.
Computational Lexicons
- Repositories of structured knowledge about a word's semantics and syntax
- Include relations like hypernymy, meronymy, or entailment
- WordNet

21 BabelNet
1. From six to 50 languages;
2. From two resources to six;
3. From 5 million to 9.3 million synsets;
4. From 50 million to 68 million word senses;
5. From 140 million to 262 million semantic relations

33 Exercise 1: Retrieve the senses of a given lemma
Given a word, e.g. home, retrieve all its senses and corresponding synsets in all supported languages:

    SELECT DISTINCT ?sense ?synset WHERE {
      ?entries a lemon:LexicalEntry .
      ?entries lemon:sense ?sense .
      ?sense lemon:reference ?synset .
      ?entries rdfs:label ?term .
      FILTER (str(?term)="home")
    } LIMIT 10

34 Exercise 2: Retrieve the senses of a lemma for a certain language
We can restrict the query to a given language, e.g. English:

    SELECT DISTINCT ?sense ?synset WHERE {
      ?entries a lemon:LexicalEntry .
      ?entries lemon:language "EN" .
      ?entries lemon:sense ?sense .
      ?sense lemon:reference ?synset .
      ?entries rdfs:label ?term .
      FILTER (str(?term)="home")
    } LIMIT 10
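The two exercises differ only in the optional language restriction, so the query text can be generated from a lemma and a language code. The following is a minimal sketch under stated assumptions: `build_sense_query` is a hypothetical helper (not part of any BabelNet tooling), it only builds the query string and does not contact an endpoint, and it assumes the endpoint already declares the `lemon:` and `rdfs:` prefixes.

```python
def build_sense_query(lemma, language=None, limit=10):
    """Build the sense-retrieval SPARQL query for `lemma` as a string.

    Hypothetical helper: assumes the target endpoint predeclares the
    `lemon:` and `rdfs:` prefixes used in the exercises.
    """
    # Escape backslashes and double quotes so the lemma stays inside the literal.
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    language_pattern = ""
    if language is not None:
        if not language.isalpha():
            raise ValueError("language code must be alphabetic, e.g. 'EN'")
        language_pattern = f'  ?entries lemon:language "{language}" .\n'
    return (
        "SELECT DISTINCT ?sense ?synset WHERE {\n"
        "  ?entries a lemon:LexicalEntry .\n"
        f"{language_pattern}"
        "  ?entries lemon:sense ?sense .\n"
        "  ?sense lemon:reference ?synset .\n"
        "  ?entries rdfs:label ?term .\n"
        f'  FILTER (str(?term) = "{escaped}")\n'
        f"}} LIMIT {int(limit)}"
    )


print(build_sense_query("home"))        # all languages, as in Exercise 1
print(build_sense_query("home", "EN"))  # English only, as in Exercise 2
```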

35 Exercise 3: Retrieve the translations of a given sense
For instance, given a sense URI:

    SELECT ?translation WHERE {
      ?entry a lemon:LexicalSense .
      ?entry lexinfo:translation ?translation .
      FILTER (str(?entry)="http://babelnet.org/2.0/home_en/s n")
    }

42 Entity Linking in a Nutshell
[Diagram: the target mention "Thomas" and its context "Thomas and Mario are strikers playing in Munich" are given, together with knowledge, to an EL system, which outputs a Named Entity.]

43 Entity Linking
EL encompasses a set of similar tasks:
- Named Entity Disambiguation: the task of linking entity mentions in a text to a knowledge base.
- Wikification: the automatic annotation of text by linking its relevant fragments to the appropriate Wikipedia articles.

44 The multilingual aspect of disambiguation
- In both tasks, WSD and EL, knowledge-based approaches have been shown to perform well.
- What about multilinguality? Which kinds of resources are available out there?
- Open Multilingual WordNet

45 A Joint approach to WSD and EL
The main difference between WSD and EL is the kind of inventory used

46 But BabelNet can be used as a multilingual inventory for both:
1. Concepts: "calcio" in Italian can denote different concepts.
2. Named Entities: the text "Mario" can refer to different things, such as the video game character, a soccer player (Gomez), or even a music album.

47 Calcio/Kick in BabelNet

48 Calcio/Calcium in BabelNet

49 Calcio/Soccer in BabelNet

50 Disambiguation and Entity Linking together!
BabelNet is a huge multilingual inventory for both word senses and named entities!

51 So what?

52 Babelfy: A Joint approach to WSD and EL [Moro et al., TACL 2014]
- Based on Personalized PageRank, the state-of-the-art method for graph-based WSD.
- However, Personalized PageRank cannot be run for each new input on huge graphs.
- Idea: precompute semantic signatures for the nodes!
- The semantic signature of a node is the set of nodes most relevant to it in the graph, computed using random walk with restart.
Andrea Moro, Alessandro Raganato and Roberto Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2.
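To make the idea of a precomputed semantic signature concrete, here is a minimal sketch of random walk with restart on a toy graph. It is not Babelfy's implementation: it approximates the walk by deterministic power iteration rather than by sampling, the graph is unweighted, and the graph contents, the restart probability 0.15, and the names `rwr_scores` and `semantic_signature` are all assumptions made for illustration.

```python
def rwr_scores(graph, start, restart=0.15, iterations=100):
    """Approximate random walk with restart from `start` by power iteration.

    `graph` maps every node to the list of its out-neighbours; every
    neighbour must itself be a key of `graph`.
    """
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            neighbours = graph[node]
            if not neighbours:
                # Dangling node: send its walking mass back to the start node.
                updated[start] += (1.0 - restart) * mass
                continue
            share = (1.0 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        # Total mass is 1, so the restart step returns `restart` to the start.
        updated[start] += restart
        scores = updated
    return scores


def semantic_signature(graph, start, top_k=2, **kwargs):
    """Return the `top_k` highest-scoring nodes other than `start`."""
    scores = rwr_scores(graph, start, **kwargs)
    ranked = sorted((n for n in graph if n != start), key=lambda n: (-scores[n], n))
    return ranked[:top_k]


# Toy graph (an assumption): two unrelated readings of "Mario".
toy_graph = {
    "Mario (soccer player)": ["striker", "soccer", "Munich"],
    "striker": ["soccer", "Mario (soccer player)"],
    "soccer": ["striker", "Mario (soccer player)"],
    "Munich": ["Mario (soccer player)"],
    "Mario (video game character)": ["video game"],
    "video game": ["Mario (video game character)"],
}

print(semantic_signature(toy_graph, "Mario (soccer player)"))
```

Because the signatures depend only on the graph, they can be computed once offline and looked up at disambiguation time, which is the point of the precomputation idea above.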

63 Open Problems: grammar-agnostic
All current approaches exploit:
- POS tagging
- Lemmatization
Both are noisy (>90% accuracy for English, but much less on morphologically rich languages).
How to improve?
- Waiting for better POS taggers
- Character-based analysis of text

64 Open Problems: language-agnostic
All current approaches exploit:
- Knowledge of the input language
- Automatic language recognition
Both are noisy (>90% accuracy for English, but much less on resource-poor languages). Moreover, text that mixes multiple languages is bound to be analyzed wrongly!
How to improve?
- Waiting for better language recognition systems
- Unify the lexicalizations of different languages

65 Open Problems: fragment recognition
Most of the current approaches exploit:
- Named Entity Recognition
- The assumption that text fragments do not overlap
Both are noisy (>80% accuracy for English, but much less on resource-poor languages). Moreover, when assuming that entities and word senses should not overlap, you lose information!
How to improve?
- Waiting for better NER systems
- Overlap and match everything
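The "overlap and match everything" option can be sketched as follows: instead of committing to one non-overlapping segmentation, enumerate every token span whose text appears in the inventory. This is a simplified illustration under assumptions: exact string matching against a hypothetical `toy_lexicon` stands in for a real lookup in a resource such as BabelNet, and `candidate_fragments` is a made-up name.

```python
def candidate_fragments(tokens, lexicon, max_len=5):
    """Return every (start, end, text) token span found in `lexicon`.

    Overlapping and nested spans are all kept; `end` is exclusive.
    """
    found = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            text = " ".join(tokens[start:end])
            if text in lexicon:
                found.append((start, end, text))
    return found


# Hypothetical inventory entries, chosen only for this example.
toy_lexicon = {
    "standards body",
    "world",
    "web",
    "world wide web",
    "world wide web consortium",
    "consortium",
}

tokens = "the international standards body world wide web consortium".split()
for start, end, text in candidate_fragments(tokens, toy_lexicon):
    print(start, end, text)
```

Here "world wide web" and "world wide web consortium" are both kept even though they overlap, so a later disambiguation step can choose among them rather than losing one up front.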

66 Hands-on Session: Babelfy

67 Exercise 1
1. Go to babelfy.org
2. Type in or copy/paste your favourite text in your favourite language in the text area
3. Select the text language
4. Click on «Babelfy!»
5. Understand the difference between green and yellow balloons

72 NIF
- Reuse of existing standards (such as RDF, OWL 2, the PROV Ontology, etc.)
- NIF identifiers are used in the Internationalization Tag Set (ITS) Version 2.0
- Royalty-free and published under an open license
- Driven by its open community project NLP2RDF
- Good uptake by industry, open-source projects and developers


74 Annotate text with a few lines of code!
1. Set the language
2. Set the text
3. Obtain the annotations
4. Convert into RDF

75 Babelfy2Nif: an example
Toy sentence: the semantic web is a collaborative movement led by the international standards body world wide web consortium

76 Babelfy2Nif: an example
Toy sentence: the semantic web is a collaborative movement led by the international standards body world wide web consortium
- nif:Context defines the overall text
- Text is modelled by fragments
- Fragments are identified by left and right indices
- Each fragment is annotated with its BabelNet synset
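The fragment model above can be sketched by computing the character offsets of one fragment in the toy sentence and printing NIF-style N-Triples for it. This is a hand-rolled illustration, not the Babelfy2Nif output: the base URI `http://example.org/doc`, the synset URI `http://example.org/synset/semantic-web`, and the helper name `fragment_triples` are placeholders, while the `#char=begin,end` identifiers and the `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf`, `nif:referenceContext` and `itsrdf:taIdentRef` properties follow NIF 2.0 and ITS 2.0.

```python
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def fragment_triples(base_uri, text, fragment, reference):
    """Describe the first occurrence of `fragment` in `text` as N-Triples lines."""
    begin = text.index(fragment)   # left index (inclusive)
    end = begin + len(fragment)    # right index (exclusive)
    context = f"<{base_uri}#char=0,{len(text)}>"
    subject = f"<{base_uri}#char={begin},{end}>"
    return [
        f"{subject} <{NIF}referenceContext> {context} .",
        f'{subject} <{NIF}beginIndex> "{begin}"^^<{XSD_INT}> .',
        f'{subject} <{NIF}endIndex> "{end}"^^<{XSD_INT}> .',
        f'{subject} <{NIF}anchorOf> "{fragment}" .',
        f"{subject} <{ITSRDF}taIdentRef> <{reference}> .",
    ]


text = (
    "the semantic web is a collaborative movement led by the "
    "international standards body world wide web consortium"
)

# Placeholder URIs: a real run would use the document URI and a BabelNet synset URI.
for line in fragment_triples(
    "http://example.org/doc",
    text,
    "semantic web",
    "http://example.org/synset/semantic-web",
):
    print(line)
```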

79 Exercise 1
Take the first paragraph of the English ISWC Wikipedia page (Web_Conference), feed it into the property file, and produce 2 outputs:
1. With the LONGEST_ANNOTATION_GREEDY_ALGORITHM algorithm and the NTRIPLE RDF format
2. With the FIRST_COME_FIRST_SERVED_ALGORITHM algorithm and the TURTLE RDF format

80 To summarize

81 To summarize
We have taken you through a tour of:
- A very large multilingual semantic network: BabelNet
- A state-of-the-art WSD and EL system: Babelfy

82 Acknowledgements
- European Research Council and the EU Commission for funding our research
- Maud Ehrmann and Andrea Moro for their help with slides
