Producing Language Resources

When: Monday 26 March 12.00h.

Where: 52.735+52.737

This talk introduces the main research line of the Technologies of Language Resources group: the production of language resources for applications that use Natural Language Processing components. Language resources are language data that can take different forms: texts and lexicons, raw or enriched, domain-specific or general, monolingual or multilingual. Applications need these datasets for every language they are to process and, in many cases, for every domain in which they are to be used.

For many languages, the scarcity of available resources makes it difficult to build local versions of such applications, and in some cases moving to a new domain also requires a very expensive investment in developing new resources. Our work focuses on producing language resources automatically and on testing their performance in different types of applications. In my talk, I will present some examples of resource production and, in particular, how we have used word embeddings to generate bilingual dictionaries from monolingual corpora, to support a short-text classification system that identifies opinion aspects, and, finally, to map classifier features to a new domain so that the classifier does not need to be retrained.