Call For Papers

Researchers and industry professionals are invited to participate in the first shared task on diachronic lexical semantics in Italian (DIACR-Ita), organised within EVALITA 2020, the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian. EVALITA 2020 will be held in Bologna (Italy) and will be co-located with CLiC-it 2020 (November 30th – December 3rd 2020).

DIACR-Ita is the first task on lexical semantic change for Italian, combining computational and historical linguistics. The task challenges participants to develop systems that can automatically detect whether a given word has changed its meaning over time, given contextual information from corpora.

DIACR-Ita is a twin task of SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection (https://competitions.codalab.org/competitions/20948), the first SemEval task on unsupervised lexical semantic change detection.

---- Task Description ----

The goal of the task is to establish whether each word in a set of target words changes its meaning across two periods, t1 and t2, where t1 precedes t2. Following the SemEval 2020 Task 1 setting, we rely on the comparison of two time periods. In this way we tackle two issues: 1) we reduce the number of time periods for which data has to be annotated; 2) we reduce the task complexity, so that different model architectures can be applied to it, which widens the range of possible participants.

Participants will be provided with two corpora, C1 and C2 (for time periods t1 and t2, respectively), and a set of target words. For each target word, systems have to decide whether or not its meaning changed between t1 and t2, based on its occurrences in the sentences of C1 and C2. For instance, the meaning of the word “imbarcata” is known to have expanded (i.e., it has acquired a new sense) from t1 to t2: originally it referred to an acrobatic manoeuvre of aeroplanes, but nowadays it is also used to refer to the state of being deeply in love with someone. This expansion will be reflected in the different ways the word is used in the sentences of C1 and C2.

The task is formulated as a closed task (i.e., participants must train their models on the data that are provided).
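
As an illustration only, the binary decision described above could be sketched in Python as follows. This is not an official baseline or the organisers' method: it is a minimal count-based sketch that compares the context words of a target in C1 and C2 and flags a change when their cosine distance exceeds a threshold. The function names, the window size, the threshold of 0.5 and the two toy sentences are all assumptions made for the example, and the code uses only the provided corpora, in line with the closed-task setting.

```python
import math
from collections import Counter


def context_counts(corpus, target, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()  # naive whitespace tokenisation
        for i, token in enumerate(tokens):
            if token == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def cosine_distance(a, b):
    """Cosine distance between two count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no evidence in one corpus: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def has_changed(c1, c2, target, threshold=0.5, window=2):
    """Binary decision: True if the contexts of `target` differ enough."""
    v1 = context_counts(c1, target, window)
    v2 = context_counts(c2, target, window)
    return cosine_distance(v1, v2) > threshold


# Invented toy sentences standing in for C1 and C2 (not task data).
toy_c1 = ["il pilota esegue una imbarcata con l'aereo"]
toy_c2 = ["ha preso una imbarcata per quella ragazza"]

print(has_changed(toy_c1, toy_c2, "imbarcata"))
```

In practice the threshold would have to be chosen on development data, and participating systems would likely replace the raw context counts with richer representations such as word embeddings trained separately on C1 and C2.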

---- Important Dates ----

29th May 2020: Distribution of data sets for training and development
4th September 2020: Distribution of data sets for testing
4th – 24th September 2020: Evaluation window and collection of participants' results
2nd October 2020: Assessment returned to participants
TBD: Report due from participants
30th November – 3rd December 2020: EVALITA 2020 (co-located with CLiC-it 2020)