Seminars 2017

10.10.2017
Speaker: Assoc. Prof. Dr. Svetla Boytcheva, Linguistic Modeling and Knowledge Processing Department, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Place: Moscow, 3 Kochnovsky Proezd, room 509, 18:10
Abstract: Today more than 80% of patient-related clinical information is stored as free text in Electronic Health Record systems. During the last decade, several information extraction systems for the analysis of clinical narratives have been developed – for diagnosis extraction, drug and dosage identification, recognition of complaints and related events, risk factors, etc. Despite the achievements in this area, these systems are difficult to (re)use because most of them, including the associated linguistic resources, are language-specific (mainly for English) and cannot be easily adapted to other languages. Moreover, they are developed either as academic research projects or as commercial software. Usually their results are evaluated on annotated corpora manually tuned to specific tasks, so performance assessment is difficult as well. The presentation discussed the automatic generation of a Diabetic Register from a very large repository of free-text clinical documents (currently 262 million pseudonymised outpatient records submitted to the Bulgarian National Health Insurance Fund in 2010-2016 for more than 5 million citizens yearly). The construction relies on advanced automatic analysis of free-text information as well as on business analytics technologies for storing, maintaining, searching, querying and analyzing big data. Original frequent pattern mining algorithms enable the discovery of complex relations between disorders (comorbidities), taking context information into account. The experiments confirm some known comorbidities; in addition, novel hypotheses about stable comorbidities were generated. Effective explication of comorbidities can fill knowledge gaps and assist informed clinical decision making. The claim is that the synergy of modern analytics tools transforms a static archive of clinical patient records into a sophisticated software environment for knowledge discovery and prediction.
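The comorbidity mining described above can be illustrated with a toy sketch (not the speaker's original algorithms): count how often pairs of diagnosis codes co-occur in patient records and keep the pairs that meet a minimum support threshold. The ICD-10 codes and records below are hypothetical.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(records, min_support):
    """Count co-occurring pairs of diagnosis codes across patient
    records; keep pairs meeting the support threshold."""
    counts = Counter()
    for diagnoses in records:
        for pair in combinations(sorted(set(diagnoses)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical ICD-10 codes: E11 (type 2 diabetes), I10 (hypertension),
# E78 (lipid disorders), J45 (asthma)
records = [
    {"E11", "I10", "E78"},
    {"E11", "I10"},
    {"E11", "E78"},
    {"I10", "J45"},
]
print(frequent_pairs(records, min_support=2))
# → {('E11', 'E78'): 2, ('E11', 'I10'): 2}
```

Real comorbidity mining additionally accounts for context (temporal order, patient demographics), which this sketch omits.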

05.09.2017
Title: Bounded Skeptical Reasoning
Speaker: Prof. Dr. Steffen Hölldobler, TU Dresden (Germany)
Place: Moscow, 3 Kochnovsky Proezd, room 205
Time: 13:40-17:40
Abstract: The weak completion semantics is a new cognitive theory which has been applied to model — among others — the suppression task, the selection task, and syllogistic reasoning. In each of these applications it was necessary to apply skeptical abduction. The application of credulous abduction leads either to wrong conclusions in the suppression and selection tasks or to an overall weaker performance in syllogistic reasoning. On the other hand, from a complexity point of view, computing skeptical conclusions is quite expensive. If reasoning tasks and, in particular, the sets of abducibles considered in abductive reasoning tasks become larger, then skeptical reasoning appears to be infeasible. Hence, I will argue for bounded skeptical reasoning.
Presentation

25.05.2017
Title: Adjusting sense representations for knowledge-based word sense disambiguation and automatic pun interpretation
Speaker: Tristan Miller, Technische Universität Darmstadt (Germany)
Place: Moscow, 3 Kochnovsky Proezd, room 317
Time: 16:40-18:40
Abstract: Word sense disambiguation (WSD) – the task of determining which meaning a word carries in a particular context – is a core research problem in computational linguistics. Though it has long been recognized that supervised (i.e., machine learning–based) approaches to WSD can yield impressive results, they require an amount of manually annotated training data that is often too expensive or impractical to obtain. This is a particular problem for under-resourced languages and text domains, and is also a hurdle in well-resourced languages when processing the sort of lexical-semantic anomalies employed for deliberate effect in humour and wordplay. In contrast to supervised systems are knowledge-based techniques, which rely only on pre-existing lexical-semantic resources (LSRs) such as dictionaries and thesauri. These techniques are of more general applicability but tend to suffer from lower performance due to the informational gap between the target word's context and the sense descriptions provided by the LSR. In this seminar, we treat the task of extending the efficacy and applicability of knowledge-based WSD, both generally and for the particular case of English puns. In the first part of the talk, we present two approaches for bridging the information gap and thereby improving WSD coverage and accuracy. In the first approach, we supplement the word's context and the LSR's sense descriptions with entries from a distributional thesaurus. The second approach enriches an LSR's sense information by aligning it to other, complementary LSRs. In the second part of the talk, we describe how these techniques, along with evaluation methodologies from traditional WSD, can be adapted for the "disambiguation" of puns, or rather for the automatic identification of their double meanings.
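As a rough illustration of the knowledge-based setting the talk addresses, the classic Lesk baseline picks the sense whose dictionary gloss shares the most words with the target word's context; the informational gap the abstract mentions arises exactly when this overlap is small. A minimal sketch with invented glosses (not drawn from a real LSR):

```python
def lesk(context, senses):
    """Return the sense key whose gloss has the largest word overlap
    with the context (simplified Lesk, a knowledge-based WSD baseline)."""
    ctx = set(context.lower().split())
    def overlap(gloss):
        return len(ctx & set(gloss.lower().split()))
    return max(senses, key=lambda s: overlap(senses[s]))

# Toy glosses for "bank" (illustrative only)
senses = {
    "bank.n.1": "a financial institution that accepts deposits and lends money",
    "bank.n.2": "sloping land beside a body of water such as a river",
}
print(lesk("she deposited money at the bank", senses))  # → bank.n.1
```

The approaches in the talk can be seen as enlarging both sets being intersected here, e.g. by adding distributional-thesaurus entries to the context and aligned-LSR information to the glosses.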

25.05.2017
Title: Introduction to CodaLab Competitions / LaTeX for NLP researchers
Speaker: Tristan Miller, Technische Universität Darmstadt (Germany)
Place: Moscow, 3 Kochnovsky Proezd, room 317
Time: 16:40-18:40
Abstract: This workshop will focus on tools that researchers and teachers in computer science and computational linguistics can use to evaluate and disseminate results. The first half will introduce CodaLab Competitions, a platform for running comparative evaluations of data analytics software. CodaLab Competitions can be used in the classroom to automate the evaluation of AI programming projects. It can also be used by researchers to run collaborative or competitive tasks on shared data sets. The second half of the workshop will cover LaTeX, the popular document preparation and typesetting system. Topics covered will be of greatest interest to those conducting teaching and research in natural language processing, and will include overviews of packages for linguistic and multilingual typesetting, and for the preparation of slides, homework exercises, and exams.

22.05.2017
Title: Flow-networks: a graph-theoretical approach to studying flow systems
Speaker: Liubov Tupikina, Ecole Polytechnique (Paris, France)
Place: Moscow, 3 Kochnovsky Proezd, room 317
Time: 16:40-18:10
Abstract: Complex network theory provides an elegant and powerful framework to statistically investigate different types of systems such as society, the brain, or the structure of local and long-range dynamical interrelationships in the climate system. Links in correlation networks, so-called climate networks, typically imply information, mass or energy exchange. However, the specific connection between oceanic or atmospheric flows and the climate network's structure is still unclear. We propose a theoretical approach of flow-networks for verifying relations between the correlation matrix and the flow structure, generalizing previous studies and overcoming the restriction to stationary flows [1]. We studied the complex interrelation between the velocity field and correlation network measures. Our methods are developed for correlations of a scalar quantity (temperature, for example) which satisfies advection-diffusion dynamics in the presence of forcing and dissipation. Our approach reveals the insensitivity of correlation networks to steady sources and sinks and the profound impact of the signal decay rate on the network topology. We illustrate our results with calculations of degree and clustering for a meandering flow resembling a geophysical ocean jet. Moreover, we discuss follow-up approaches and applications of the flow-networks method [2].
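A minimal sketch of the correlation-network construction the abstract refers to (with hypothetical data; the actual study derives correlations from advection-diffusion dynamics of a scalar field): link two grid points whenever the Pearson correlation of their time series exceeds a threshold, then read off network measures such as degree.

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_network(series, threshold):
    """Link two grid points when |r| exceeds the threshold; return the
    adjacency sets and the degree of each node."""
    nodes = list(series)
    edges = {v: set() for v in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if abs(corr(series[u], series[v])) > threshold:
                edges[u].add(v)
                edges[v].add(u)
    return edges, {v: len(e) for v, e in edges.items()}

# Hypothetical temperature series at three grid points
series = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.2],   # tracks A closely
    "C": [4.0, 1.0, 3.0, 2.0],   # unrelated
}
edges, degree = correlation_network(series, threshold=0.9)
```

In the climate setting the nodes would be grid points of an observed or simulated field, and the talk's question is how the underlying flow shapes the resulting topology.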

22.05.2017
Title: Natural language processing with UIMA and DKPro
Speaker: Tristan Miller, Technische Universität Darmstadt (Germany)
Place: Moscow, 3 Kochnovsky Proezd, room 317
Time: 18:10-19:40
Abstract: This talk introduces UIMA (Unstructured Information Management Architecture), an industry-standard software architecture for content analytics. UIMA provides extensible data, component, and process models for annotating, exchanging, and analyzing unstructured data such as natural-language text. We also introduce DKPro, a family of ready-to-use natural language processing (NLP) components built on UIMA. Using UIMA and DKPro, students and researchers can rapidly develop and deploy experimental text processing pipelines. In a classroom setting, these tools are valuable because they significantly reduce the barriers to entry for learning and applying advanced NLP techniques. Using DKPro, students can start projects in text classification, discourse analysis, etc., without needing to spend time implementing lower-level NLP tasks such as morphological analysis, word sense disambiguation, or text similarity. In a graduate-level research setting, UIMA and DKPro facilitate conducting experiments in a fully reproducible manner. The talk will provide a tutorial-style overview of both frameworks, including code snippets and sample applications.

Tristan's bio:
Tristan Miller holds a doctorate in computer science from Technische Universität Darmstadt (Germany), where he is engaged as a Research Scientist in the Ubiquitous Knowledge Processing Lab. He has previously held research and teaching appointments at the German Research Center for Artificial Intelligence (Germany), Griffith University (Australia), and the University of Toronto (Canada). From 2008 to 2011 he worked as a language engineer and business analyst at InQuira, an enterprise knowledge management company subsequently acquired by Oracle. Dr. Miller's research interests lie mainly in natural language processing, and more specifically in computational lexical semantics. He has published on topics such as argumentation mining, word sense disambiguation, lexical substitution, and computational detection and interpretation of humour. He is also an ardent science popularizer, serving as an advisory panel member or contributor to non-specialist linguistics publications such as Babel: The Language Magazine and Word Ways: The Journal of Recreational Linguistics.

16.03.2017
International workshop: Formal Concept Analysis for Knowledge Discovery
Official site
Abstract: The International Workshop "Formal Concept Analysis for Knowledge Discovery" was held at the Faculty of Computer Science. The event brought together scientists and specialists in data analysis from St. Catharines (Canada), St. Petersburg, Novosibirsk, Tula, Kazan, Perm and other cities. Prof. Ivo Duentsch from Brock University (St. Catharines) gave the keynote talk "Knowledge structures and skill assignments: Structural tools for diagnostic assessment".

06.03.2017
Title: Reactive Systems: A Powerful Paradigm for Modeling and Analysis from Engineering to Biology
Speaker: Thomas A. Henzinger (IST Austria)
Place: Moscow, 3 Kochnovsky Proezd, room 317, 15:10
Abstract: A reactive system is a dynamic system that evolves in time by reacting to external events. Hardware components and software processes are reactive systems that interact with each other and with their physical environment. Computer science has developed powerful models, theories, algorithms, and tools for analyzing and predicting the behavior of reactive systems. These techniques are based on mathematical logic, theory of computation, programming languages, and game theory. They were originally developed to let us build a more dependable computer infrastructure, but their utility transcends computer science. For example, both an aircraft and a living organism are complex reactive systems. Our understanding and the design of such systems can benefit greatly from reactive modeling and analysis techniques such as execution, composition, and abstraction.

In this talk, I will describe the Vellvm project, which seeks to provide a formal framework for developing machine-checkable proofs about LLVM IR programs and translation passes. I'll discuss some of the subtleties of modeling the LLVM IR semantics. I'll also describe some of the proof techniques that we have used for reasoning about LLVM IR transformations and sketch some example applications, including verified memory-safety instrumentation and program optimizations. Vellvm is implemented in the Coq theorem prover and provides facilities for extracting LLVM IR transformation passes and plugging them into the LLVM compiler, thus enabling us to create verified optimization passes for LLVM and evaluate them against their unverified counterparts. This is joint work with many collaborators at Penn, and Vellvm is part of the NSF Expeditions project The Science of Deep Specifications.

21.02.2017
Title: Probably Approximately Correct Computation of the Canonical Basis
Speaker: Daniel Borchmann, Postdoctoral Research Associate, Technische Universität Dresden
Place: Moscow, 3 Kochnovsky Proezd, room 205
Abstract: To learn knowledge from relational data, extracting functional dependencies is a common approach. One way to achieve this extraction is to convert the given data into so-called formal contexts and afterwards compute exact implicational bases of them. A particularly interesting such basis is the so-called canonical basis, which is not only a basis of minimal cardinality, but also one for which algorithms are known that can perform well in practice. However, all these algorithms are of high runtime complexity, i.e., they are not output-polynomial, and are thus likely to fail in certain situations. On the other hand, most data sets stemming from real-world applications are faulty to a certain degree, and an exact representation of their implicational knowledge – as provided by the canonical basis – may not be helpful anyway. The usual approach of considering association rules instead of implications does not solve this problem satisfactorily, as it still requires computing exact implication bases.

This talk investigates an alternative approach: learning approximations of implicational knowledge from data. For this, we revisit the notion of probably approximately correct implication bases (PAC bases), survey known approaches and results on the feasibility of computing such bases, and discuss first experimental results showing their usefulness. In particular, we show how methods from query learning can be leveraged to obtain an algorithm that computes PAC bases in output-polynomial time. Finally, we give an outlook on how attribute exploration, an interactive learning approach based on querying domain experts, can be combined with PAC bases to obtain a probably approximately correct attribute exploration algorithm.
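The basic operation underlying canonical-basis and attribute-exploration algorithms is closing an attribute set under a set of implications. A minimal sketch with invented attributes (naive fixpoint iteration; practical implementations use faster closure algorithms such as LinClosure):

```python
def closure(attrs, implications):
    """Close an attribute set under implications given as (premise,
    conclusion) pairs of sets: apply every applicable implication
    repeatedly until a fixpoint is reached."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed

# Hypothetical implications over attributes a, b, c, d
implications = [
    ({"a"}, {"b"}),
    ({"b", "c"}, {"d"}),
]
print(closure({"a", "c"}, implications))  # → {'a', 'b', 'c', 'd'} (set order may vary)
```

An exact basis must reproduce this closure operator on every attribute set; a PAC basis, by contrast, is only required to agree with it on all but an ε-fraction of sets, with high probability.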

15.02.2017 - 22.02.2017
Title: From digital pixels to life
Speaker: Prof. Peter Horvath, Institute for Molecular Medicine Finland (FIMM)
Place: Moscow, 3 Kochnovsky Proezd
Abstract: In his course, Prof. Peter Horvath focused on high-content screening (HCS), which combines cell biology, automated high-resolution microscopy, informatics and robotics. High-content screening aims to discover small and large molecules (such as drugs and siRNAs) that change the phenotypes of cells in a desired manner. High-content analysis (HCA) refers to the analysis and evaluation of the large amounts of data produced in an HCS scenario. Despite recent advances in informatics, HCA suffers from a lack of solutions to the computational problems that arise and from limited computational capacity. To overcome these limitations, numerous image analysis and machine learning approaches have recently been proposed. The course gave an insight into the most popular methods, including automated microscopy, image processing, and multiparametric analysis of the data. During the course, 10,000-100,000 images were (virtually) created, and methods to analyze them using image segmentation and supervised machine learning were developed.
