Parallel Corpora: Creation and Applications

International Symposium PaCor 2018

Madrid, November 5-7, 2018

The Instituto Universitario de Lenguas Modernas y Traductores (IULMYT) and the Department of English Studies: Linguistics and Literature of the Universidad Complutense de Madrid (UCM) are pleased to announce the 2nd edition of the International Symposium on Parallel Corpora, PaCor 2018, to be held at the Facultad de Filología, UCM, on November 5-7, 2018.

Research on parallel corpora spans a number of fields, from contrastive linguistics, translation studies, lexicography, and language teaching and learning to computer-assisted and machine translation. In the Natural Language Processing (NLP) community, parallel corpora are a key resource: they serve as training data for statistical machine translation and as a basis for building or extending bilingual lexicons and terminologies. However, high-quality parallel corpora large enough to produce statistically reliable results remain scarce, despite the development of automated methods to mine them from the Web. In recent years, this scarcity has motivated research on comparable corpora (pairs of monolingual corpora in different languages compiled according to the same set of criteria) as a source of information about possible translations.

After the successful 1st PaCor Symposium, held at the University of Santiago de Compostela in 2016, we decided to launch a biennial series of events, with the aim of encouraging dialogue and contact between researchers and practitioners who build parallel corpora and those who exploit such resources for various purposes. With this 2nd Symposium we would like to continue exploring these issues, and also to extend the scope of the event to research on comparable corpora for a wide range of applications.

CONFERENCE THEMES

We welcome submissions on the following research strands, among others:

1. Design, analysis, annotation and visualization of parallel and comparable corpora for research and applications in the areas of contrastive linguistics, human translation and translation learning, computer-assisted and machine translation, language teaching and learning, lexicography and terminology.

2. Tools and methods for the creation, annotation and exploitation of parallel/comparable corpora, such as:

Automatic and semi-automatic methods for building parallel/comparable corpora

Methods to mine parallel and comparable corpora from the Web

Tools and criteria to evaluate the comparability of corpora

Parallel vs non-parallel corpora, monolingual corpora

Multimedia/multi-modal parallel and comparable corpora

3. Presentation of existing bilingual parallel and comparable corpora that include Spanish, or of multilingual corpora in which Spanish is the pivot language.

PUBLICATION

We have an agreement with the journal Languages to publish a Special Issue based on a selection of the presentations at PaCor 2018.

Languages (http://www.mdpi.com/journal/languages) is an international, peer-reviewed, open access journal on interdisciplinary studies of languages and linguistics, indexed in ERIH Plus. We welcome contributions from any theoretical, experimental or applied approach.

International Conference on Evidentiality and Modality 2018 (ICEM'18)

The Department of English Studies: Linguistics and Literature of the Complutense University of Madrid is pleased to announce the International Conference on Evidentiality and Modality 2018 (ICEM’18), which will take place at the Facultad de Filología, Universidad Complutense de Madrid, 19-22 September 2018.

ICEM’18 invites you to submit abstracts for the general session (papers or posters) and also welcomes proposals for theme sessions. The conference aims to cover a wide range of research on evidentiality and modality, with special interest in empirical work, discourse-pragmatic perspectives and cross-linguistic studies.

Madrid, 12-14 June 2017

The overarching aim of the Textlink Action is to unify scattered linguistic resources on discourse structure and to build systems, searchable by form and meaning, that allow cross-linguistic investigations. As discussed at the “Portal Use Case Focus Meeting” in Edinburgh in February 2017, a group of researchers within Textlink has taken the initiative to develop a multilingual, cross-linguistic corpus of TED talks (TED-MDB), in which TED talk transcripts are annotated in the style of the Penn Discourse Treebank (PDTB). The resource currently includes annotations for six languages (English, European Portuguese, Polish, German, Russian and Turkish) and is intended to be extended to new languages, with richer annotations covering features of spoken language, an inherent component of TED talks.

To discuss issues related to cross-lingual discourse-level annotation, we will hold a meeting, “Annotation of Discourse Relational Devices (DRDs): Multilingual and Multimodal Challenges”, in Madrid (Spain) on 12-14 June 2017.

The meeting has three main aims: a) to discuss, plan and facilitate the extension of TED-MDB to additional languages; b) to consider complementary aspects of the annotation of DRDs in spoken and written language, in both multilingual and multigenre contexts; c) to explore complementarities between multilingual and multimodal annotation using the TED talks as our testbed.

The meeting has a very practical focus: we will provide extensive hands-on experience in the multilingual and multimodal annotation of DRDs in the TED talks, reviewing the methodologies used to annotate the different languages and the proposals for annotating spoken DRDs in different genres.

If you would like to participate in the meeting, please send the organisers a message by April 30 stating, in one page at most:

what language(s) you propose to extend the TED-MDB corpus to cover;

your background (if any) in resource annotation;

what skills you can contribute to the annotation of multilingual and/or multimodal DRDs.