ArMExLeR: Arabic Meaning Extraction through Lexical Resources. A general-purpose data mining model for Arabic texts is proposed. It employs a chained pipeline of existing public domain and published lexical resources (Stanford Parser, WordNet, Arabic WordNet, SUMO, AraMorph, A Frequency Dictionary of Arabic) to extract a weakly hierarchised, single-predicate-level representation of meaning. Such a model would have a high impact on the computational analysis of Arabic, because no comparable tool exists for this language; it is also a challenge because of the language's specific features. In particular, one has to cope with a writing system that is mostly consonant-based and does not always mark vowels explicitly. This is crucial when analyzing an Arabic corpus, because the same consonantal ductus may be read in several ways.
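The ambiguity of the consonantal ductus can be illustrated with a minimal Python sketch. It is not part of ArMExLeR or AraMorph; the function name strip_diacritics and the three example words are assumptions chosen for illustration. The sketch removes the optional vowel marks from three vocalized forms and shows that they collapse to a single unvocalized spelling.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Drop Unicode combining marks (Arabic short vowels, sukun, shadda, ...)."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


# Illustrative vocalized forms, written with escapes so the code points are explicit.
vocalized = [
    "\u0643\u064e\u062a\u064e\u0628\u064e",  # kataba, "he wrote"
    "\u0643\u064f\u062a\u0650\u0628\u064e",  # kutiba, "it was written"
    "\u0643\u064f\u062a\u064f\u0628",        # kutub, "books"
]

# All three forms reduce to the same consonantal skeleton (k-t-b).
skeletons = {strip_diacritics(w) for w in vocalized}
print(len(vocalized), "vocalized forms ->", len(skeletons), "unvocalized spelling")
```

An analyzer working on unvocalized text has to recover the candidate readings in the opposite direction, which is why a morphological lexicon is needed in the pipeline.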

The SALAH Project: Segmentation and Linguistic Analysis of ḥadīṯ Arabic Texts. SALAH is a proposed model for the unsupervised segmentation and linguistic analysis of Arabic texts of Prophetic tradition (ḥadīṯs). The model automatically segments each text unit into a transmitter chain (isnād) and a text content (matn), and then analyses each segment through a distinct pipeline: a set of regular expressions chunks the transmitter chain into a graph labeled with the relations between transmitters, while a tailored, augmented version of the AraMorph morphological analyzer (RAM) analyzes the text content and annotates it lexically and morphologically. The final output of the system is a graph of the relations among transmitters and a lemmatized text corpus, both in XML format; this output can in turn feed the automatic generation of concordances of the texts with variable-sized windows. The results can serve a variety of purposes, including retrieving information from ḥadīṯ texts, verifying the relations between transmitters, finding variant readings, and supplying lexical information to specialized dictionaries. More info and contacts: see the pages of Giuliano Lancioni and Marco Boella.
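The idea of a concordance with a variable-sized window can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the SALAH implementation: the function name concordance, the whitespace-style token list and the placeholder tokens are all hypothetical, and a real system would match on lemmas taken from the XML corpus rather than on raw tokens.

```python
from typing import List, Tuple


def concordance(tokens: List[str], keyword: str,
                window: int = 3) -> List[Tuple[List[str], str, List[str]]]:
    """Return (left context, keyword, right context) for every occurrence.

    `window` is the maximum number of tokens kept on each side.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]    # clipped at the text start
            right = tokens[i + 1:i + 1 + window]   # clipped at the text end
            hits.append((left, tok, right))
    return hits


# Placeholder tokens standing in for a lemmatized matn.
tokens = ["w1", "w2", "kw", "w3", "w4", "w5", "kw"]
for left, kw, right in concordance(tokens, "kw", window=2):
    print(" ".join(left), "[" + kw + "]", " ".join(right))
```

Changing the window argument widens or narrows the context shown around each occurrence without touching the corpus itself.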

Categorial grammar for information retrieval in Arabic: this project explores how well some grammar models, such as Combinatory Categorial Grammar, lend themselves to computational use, with the aim of designing information retrieval tools for Arabic texts. More info and contacts: see the pages of Giuliano Lancioni and Marco Boella.

Computational analysis of alchemical corpora from the work of Jabir Ibn Hayyan. More info and contacts: see the page of Ilaria Cicola

Linguistics, Sociolinguistics and History of language

Ongoing works:

Rhetorical functions and loci of diglossic code-switching in Arabic: the project deals with diglossic code-switching in spoken Arabic, especially in Christian religious discourse. Its main aim is to describe the inherent rhetorical value, the rhetorical functions and the loci of diglossic code-switching in the spoken language. More info and contacts: see the page of Marco Hamam

Word use in Arabic comics, with a focus on semantic fields and lexical structures. The project examines the main features of Arabic comics as a basis for tagging. More info and contacts: see the page of Milena Di Canio

Lexicography and Philology

Ongoing works:

Tagging models of Classical Arabic medical texts: the project aims to define a reasonably complete tagset, compliant with the Text Encoding Initiative standards, to tag all relevant information in Classical Arabic medical texts. More info and contacts: see the page of Francesca Romana Romani

Defining a wordlist of Arabic medical terms: the project aims to compile a wordlist of currently used medical technical terms, together with definitions and English (and Italian) translations. More info and contacts: see the page of Francesca Romana Romani

Ghafiqi Project (launched by McGill’s Institute of Islamic Studies and The Osler Library): the aim of the project is to produce a critical edition, with translation and commentary, of the Arabic text of Kitāb 'l-'adwiya 'l-mufrada by al-Ġāfiqī. More info and contacts: see the page of Eleonora di Vincenzo

Language teaching

Ongoing works:

Teaching ESA? This project aims to develop a model for learning spoken varieties that could be useful for learners of Arabic who already have some knowledge of Modern Standard Arabic (MSA). More info and contacts: see the page of Anjela Al-Raies