Valentin Goranko, Department of Philosophy, Stockholm University

This lecture will consist of two parts. The first part will be devoted to the 30th edition of ESSLLI, where I will look back in time and share some historical notes and recollections about ESSLLI over these 30 years. In the second part I will give a brief overview of some logical frameworks for multi-agent reasoning. In particular, I will introduce and discuss some of the most popular multi-modal logical systems for modeling and analyzing strategic reasoning in games and multi-agent systems. Besides the purely technical and intrinsically logical problems that naturally arise in that area, a multitude of conceptually new questions have emerged. These refer to the fundamental notions of strategies and strategic abilities of agents/players to achieve objectives, particularly in the context of incomplete or imperfect information. I will also briefly discuss some recent problems and ideas in the area, related to strategic reasoning in a social context and to the interaction between information and strategic abilities of individual agents and coalitions.

Valentin Goranko is a professor of logic at Stockholm University. He obtained his PhD in mathematical logic from Sofia University in 1988. Since then he has taught mathematics and computer science at several universities in Bulgaria, South Africa and Denmark, before joining the Philosophy Department at Stockholm University in 2014. His main area of research expertise is the theory and applications of logic in computer science, AI, multiagent systems, and philosophy. He has published, alone or in collaboration, 3 books and over 100 papers and chapters in international journals, conference proceedings, research books and handbooks. He has taught courses at 12 ESSLLI schools and is the current president of the FoLLI Management Board.

Judith Tonhauser, Department of Linguistics, The Ohio State University

Oehrle (1988, 1994) advocated for multi-dimensional analyses of complex linguistic expressions according to which the global properties of an expression depend on properties of its phonological, syntactic and semantic components and their respective modes of composition. One goal of his work on proof-theoretic grammars was to show that the tripartite division of signs into phonology, syntax and semantics could shed light on complex empirical patterns of natural languages. In the spirit of Oehrle’s multi-dimensional linguistic analysis, this talk explores how multiple dimensions of linguistic analysis play a role in an aspect of meaning not considered in Oehrle’s work, namely projective content, which is utterance content that the listener may take the speaker to be committed to even though the expression that contributes the content is in the scope of an entailment-canceling operator (Simons et al 2010; Tonhauser et al 2013, in print). Some well-known classes of projective content are presuppositions (e.g., Heim 1983, van der Sandt 1992), conventional implicatures (e.g., Potts 2005) and expressive content (e.g., Potts 2007, Gutzmann 2015). In the talk, I show that the extent to which a listener takes a speaker to be committed to a projective content depends on multiple dimensions, including the lexical meaning of the uttered expressions, the syntactic structure and the information structure of the uttered sentence, as well as the structure of the discourse context in which the utterance is made. I also show that different dimensions may play more or less important roles for different classes of projective content. While this research has not yet reached the level of formalization achieved in Oehrle’s work, I conclude by identifying what the differences between projective contents already reveal about empirically-adequate formal analyses of projective content.

Judith Tonhauser is an Associate Professor in the Department of Linguistics at The Ohio State University. She holds a Diploma in Linguistics from the University of Stuttgart and a Ph.D. in Linguistics from Stanford University. Judith conducts research on a variety of topics in formal semantics and pragmatics, including, most recently, nominal reference, focus and presuppositions. She collects primary data from English and Paraguayan Guaraní (Tupí-Guaraní) through one-on-one elicitation and experiments, as well as from corpora.

Jan Odijk, UIL-OTS, University of Utrecht

Boosting Linguistic Research with CLARIN

Tuesday 14.08.2018

In this lecture I will use a specific linguistic problem to illustrate how the CLARIN infrastructure (1) enables acceleration of linguistic research, (2) enables linguistic research that is based on orders of magnitude larger data than ever before, and (3) at the same time does not require any technical or programming knowledge.

In 2011, when the activities in the Netherlands to contribute to the CLARIN infrastructure had just started, I described a linguistic problem and I described which elements the CLARIN infrastructure should offer to address this problem. In the meantime, most of these elements have been created and integrated into the CLARIN infrastructure.

The linguistic problem is the following. The Dutch words heel, erg and zeer are (near-)synonyms (meaning ‘very’), but, based on an initial limited set of data, we conclude that they differ in their combinatorial potential: heel can modify only adjectival phrases, while erg and zeer can modify adjectival (1a), adpositional (1b) and verbal (1c) phrases:

(1) (a) Hij was         heel / erg / zeer     blij
        He was          very / very / very    glad
        ‘He was very glad’

    (b) Hij was         *heel / erg / zeer    in zijn sas
        He was          very / very / very    in his lock
        ‘He was very glad’

    (c) Dat heeft mij   *heel / erg / zeer    verbaasd
        That has me     very / very / very    surprised
        ‘That surprised me very much’

We want to investigate (1) whether these preliminary findings can be maintained when we investigate a much larger set of data; (2) how children can acquire the differences in combinatorial potential in first language acquisition.

In order to investigate this, we use (inter alia) corpus and treebank search applications for Dutch that have been created in CLARIN such as OpenSoNaR, PaQu and GrETEL, and we search in large corpora for written Dutch (SoNaR, LASSY), for spoken Dutch (Spoken Dutch Corpus), in language acquisition corpora (Dutch CHILDES corpora) and in a corpus with materials intended for primary school pupils (BASILEX).

We show that we must adapt our initial findings based on these empirical data, and though we have no account for the acquisition problem yet, we show that our investigation leads to boundary conditions that any account of this problem must meet, and speculate on some possible approaches.

Though the example is specific to Dutch, the problem is not. Differences in combinatory potential between words that are (near-)synonyms or belong to the same semantic class occur in every language, and pose an important problem that theories of first language acquisition must account for. CLARIN contains data, applications and services for multiple languages, so it enables this kind of research for a wide range of languages.

Prof. Dr. Jan Odijk has been full professor of language and speech technology at UIL-OTS (University of Utrecht) since 2001. He has a background in theoretical and computational linguistics and obtained his PhD at the University of Tilburg in 1993. He worked in the past at Utrecht University and in a variety of companies (Philips, Lernout & Hauspie, Nuance) on theoretical syntax, machine translation, language and speech generation, and infrastructure for language resources. He has been involved in a wide range of national and European projects related to research infrastructures. He was a member of the Spoken Dutch Corpus (CGN) and IMIX steering committees. He was a board member of ELRA from 2000 to 2006, chaired the programme committee of the Dutch/Flemish language and speech technology programme STEVIN, participated in the EU project META-NET, is the Netherlands National Anchor Point for the META network, was programme director of CLARIN-NL (2009-2015) and CLARIAH-SEED (2013-2014), is national coordinator for the Netherlands in CLARIN, has been CLARIAH-CORE director since 2015, and is the intended director of CLARIAH-PLUS (2019-2023).

A relevant similarity measure between two sequences of symbols is the Levenshtein distance, also called edit distance, defined as the minimal number of symbol insertions, deletions and substitutions required to transform the first sequence into the second one. Similarity search is defined as the task of searching for all sequences in a large database which are within a certain edit distance of a given sequence. This lecture is dedicated to two topics: (i) efficient techniques for similarity search in large databases, and (ii) NLP applications of similarity search.
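As a rough illustration of these two definitions, the following Python sketch computes the edit distance with the textbook dynamic-programming recurrence and then performs similarity search by scanning every entry. It is only a baseline for orientation, not one of the efficient techniques covered in the lecture; the function names and the small word list are assumptions made up for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity_search(query: str, database: list[str], max_distance: int) -> list[str]:
    """Naive similarity search: compare the query against every database entry."""
    return [entry for entry in database if levenshtein(query, entry) <= max_distance]


# Hypothetical toy database, for illustration only.
words = ["heel", "hel", "veel", "zeer"]
print(similarity_search("heel", words, 1))
```

The linear scan costs one full distance computation per entry, which is what makes more refined traversal and filtering strategies attractive for large databases.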

One common approach to similarity search can be summarized as the traversal of a relevant part of the database and filtering the matching entries from it. We start the first topic by introducing the Levenshtein automata approaches for efficient edit distance filtering. Afterwards we show efficient strategies for database traversal for selecting the similar entries. Those strategies include forward-backward traversal and hierarchical traversal controlled by a search tree. For the realization of the search strategies we describe the use of directed acyclic word graphs and bi-directional structures for representing the database and its infixes. Finally, we discuss methods for compressing the database structures in order to cope with very large databases.
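The traverse-and-filter idea can be sketched in Python under simplifying assumptions. The sketch stores the database in a plain trie instead of a directed acyclic word graph, and it filters with a dynamic-programming row instead of a Levenshtein automaton, so it is a stand-in for the general principle and not the method presented in the lecture. All names (`build_trie`, `trie_search`, `END`) are hypothetical.

```python
END = ""  # key marking the end of a stored word; no single character equals ""


def build_trie(words):
    """Store the database as a trie of nested dictionaries."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def trie_search(root, query, max_distance):
    """Return (entry, distance) pairs for entries within max_distance of query."""
    results = []

    def visit(node, row):
        # row[j] is the edit distance between the current trie path and query[:j].
        if END in node and row[-1] <= max_distance:
            results.append((node[END], row[-1]))
        for ch, child in node.items():
            if ch == END:
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(row[j] + 1,          # path symbol left unmatched
                                   new_row[j - 1] + 1,  # query symbol left unmatched
                                   row[j - 1] + cost))  # substitution or match
            # Filtering step: the row minimum is a lower bound for every
            # extension of this path, so the subtree can be skipped if it is too large.
            if min(new_row) <= max_distance:
                visit(child, new_row)

    visit(root, list(range(len(query) + 1)))
    return sorted(results)


# Hypothetical toy database, for illustration only.
trie = build_trie(["heel", "hel", "veel", "zeer"])
print(trie_search(trie, "heel", 1))
```

Shared prefixes are processed once, and whole subtrees are discarded as soon as no extension can stay within the distance bound, which is the saving that the traversal strategies aim for at much larger scale.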

On the second topic we present the use of similarity search in NLP applications. We start with the most commonly used applications: spelling correction and error-tolerant information retrieval. Then we proceed with OCR text correction and historical text normalization. At the end we present empirical evaluation results for the presented similarity search approaches and a performance evaluation of their applications achieved in the OCoRrect, IMPACT and CULTURA research projects.

Stoyan Mihov has been an associate professor at the Institute of Information and Communication Technologies of the Bulgarian Academy of Sciences since 2006. Stoyan is responsible for the speech laboratory of the institute, where he is leading the development of speech technology for Bulgarian. He is the scientific advisor of the master's program in Computational Linguistics at the Faculty of Mathematics and Informatics of Sofia University St. Kliment Ohridski.

Stoyan completed his Ph.D. at the Bulgarian Academy of Sciences and his master's studies in mathematical logic at Sofia University St. Kliment Ohridski. His research interests lie in the area of Natural Language Processing techniques, more specifically finite-state automata and approximate search methods. He led the development of WallBreaker, the system with the best performance in Track 1 (approximate string search) of the International Competition on Scalable String Similarity Search and Join, held as a workshop in conjunction with EDBT/ICDT 2013.

Stoyan has served on several conference and workshop program committees in the area of finite-state automata and natural language processing and has participated in multiple joint European research projects.