I developed AraComLex (Arabic Computer Lexicon), an open-source, large-scale finite-state morphological transducer for processing Arabic texts that contains more than 30,000 lemmas. Its advantage over Buckwalter's morphology is that it aims to cover Modern Standard Arabic (MSA) exclusively. It avoids both the noise introduced by Classical Arabic and the incorrect word-clitic combinations that are rampant in Buckwalter's morphology. AraComLex is compatible with Foma, the open-source finite-state compiler. To use it, download Foma, download AraComLex from Sourceforge.net, and follow the compilation instructions in the README file. The transducer can be compiled under Windows, Linux or Mac OS X.
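
Once compiled, a Foma transducer can be queried with Foma's `flookup` utility, which reads one word per line and prints tab-separated "input, output" lines. The sketch below wraps that call from Python. It assumes that `flookup` marks unanalysable words with `+?` and separates the results for different inputs with blank lines. The binary name `aracomlex.bin` and the helper names `analyse` and `parse_flookup_output` are hypothetical and not part of the AraComLex distribution. Check the README for the actual file names.

```python
import shutil
import subprocess


def parse_flookup_output(text):
    """Group flookup's 'input<TAB>output' lines by input word.

    Assumes '+?' marks a word the transducer could not analyse and that
    blank lines only separate the results for consecutive input words.
    """
    analyses = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # separator between results for different input words
        word, _, analysis = line.partition("\t")
        entry = analyses.setdefault(word, [])
        if analysis != "+?":
            entry.append(analysis)  # unknown words keep an empty list
    return analyses


def analyse(words, fst_path="aracomlex.bin"):
    """Pipe words through flookup; 'aracomlex.bin' is a hypothetical file name."""
    if shutil.which("flookup") is None:
        raise RuntimeError("flookup (part of Foma) was not found on PATH")
    result = subprocess.run(
        ["flookup", fst_path],
        input="\n".join(words) + "\n",
        capture_output=True,
        encoding="utf-8",  # Arabic input and output
        check=True,
    )
    return parse_flookup_output(result.stdout)
```

Calling `analyse(["كتب"])` would return a dictionary that maps each input word to its list of analyses, with an empty list for words the transducer does not recognise.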

I developed a database of 490 templatic patterns for Arabic (الأوزان الصرفية في اللغة العربية) that has been used successfully for detecting unknown words in a statistical parser and for lexical profiling tasks. [Download from Sourceforge.net]
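
Arabic patterns are conventionally written with the root-slot letters ف, ع and ل. One way such a pattern list can help with unknown words is to check which templates an out-of-vocabulary word fits and recover a candidate root from it. The sketch below assumes unvocalised patterns, triliteral slots and a few illustrative patterns. It does not reproduce the format or contents of the actual database, and the function names are hypothetical.

```python
import re

# Traditional root-slot letters of the Arabic morphological scale (fa', 'ayn, lam).
ROOT_SLOTS = "فعل"
# One Arabic letter, from hamza (U+0621) to yaa (U+064A).
ARABIC_LETTER = "([\u0621-\u064A])"


def pattern_to_regex(pattern):
    """Compile an unvocalised pattern such as 'فاعل' into an anchored regex:
    each root slot captures one letter, every other letter must match literally."""
    parts = [ARABIC_LETTER if ch in ROOT_SLOTS else re.escape(ch) for ch in pattern]
    return re.compile("^" + "".join(parts) + "$")


def match_patterns(word, patterns):
    """Return (pattern, root) pairs for every pattern the word fits."""
    matches = []
    for pattern in patterns:
        m = pattern_to_regex(pattern).match(word)
        if m:
            matches.append((pattern, "".join(m.groups())))
    return matches


# A few illustrative patterns (not taken from the actual database).
SAMPLE_PATTERNS = ["فاعل", "مفعول", "فعال", "مفعلة"]

for w in ["كاتب", "مكتوب", "مدرسة"]:
    print(w, match_patterns(w, SAMPLE_PATTERNS))
```

With these sample patterns, كاتب fits فاعل and مكتوب fits مفعول, both giving the root كتب, while مدرسة fits مفعلة with the root درس. A word can fit several patterns, so a real system would still need to rank or filter the candidates.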

Mohammed Attia. (2008) 'Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation'. PhD Thesis. School of Languages, Linguistics and Cultures, the University of Manchester. [pdf version]

I developed an Arabic word list for spell checking that contains 9 million Arabic words. The words were generated automatically from the AraComLex open-source finite-state transducer and from a one-billion-word corpus. The entire list was validated against the Microsoft Word spell checker. [Download from Sourceforge.net]
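
Because the list presumably enumerates full surface forms, a basic checker needs only set membership rather than morphological analysis at run time. The sketch below assumes the downloaded file holds one UTF-8 word per line. The function names are hypothetical, and the sketch only flags words without suggesting corrections.

```python
def load_word_list(path):
    """Read a word list assumed to hold one UTF-8 word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def flag_misspellings(tokens, lexicon):
    """Return the tokens that do not appear in the word list, in input order."""
    return [token for token in tokens if token not in lexicon]
```

For example, `flag_misspellings(text.split(), load_word_list(path))` returns the tokens absent from the list. A Python set of 9 million strings needs a substantial amount of memory, so a production checker might use a more compact structure such as a finite-state automaton instead.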

A list of Arabic broken plurals automatically extracted from a large contemporary corpus, with morphological patterns given for both the singular and the plural forms. It contains 2,562 broken plural forms. [Download from Sourceforge.net]
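
A singular pattern paired with a plural pattern lets both forms be generated from the same root. The sketch below fills the slot letters ف, ع and ل of an unvocalised pattern with the letters of a triliteral root. The function name and the example pairs (مفعلة/مفاعل for مدرسة/مدارس, فعل/أفعال for قلم/أقلام) are illustrative and are not entries copied from the list.

```python
ROOT_SLOTS = "فعل"


def apply_pattern(root, pattern):
    """Instantiate an unvocalised pattern with a triliteral root by replacing
    the slot letters ف, ع, ل with the first, second and third root letters."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    slots = dict(zip(ROOT_SLOTS, root))
    # Non-slot letters of the pattern are copied unchanged.
    return "".join(slots.get(ch, ch) for ch in pattern)


# An illustrative singular/plural pattern pair applied to the root درس.
singular = apply_pattern("درس", "مفعلة")
plural = apply_pattern("درس", "مفاعل")
print(singular, plural)  # مدرسة مدارس
```

Unvocalised patterns cannot distinguish forms that differ only in short vowels, so a full treatment would need vocalised patterns.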

A list of unknown words, that is, words not included in version 2.0 of the Buckwalter Morphological Analyser. It includes about 18,000 new lemmatized words, weighted and ordered so that the most lexicographically relevant words are likely to surface at the top of the list and the least relevant words are pushed to the bottom. [Download from Sourceforge.net]

A list of obsolete words, that is, words in the Buckwalter Morphological Analyser database that are outdated or no longer in contemporary use. The list was compiled using a frequency threshold on the web and in the Arabic Gigaword corpus. It contains about 8,400 words that have fallen out of current use, with a margin of error of 1%. [Download from Sourceforge.net]
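
The frequency-threshold idea can be sketched as a simple filter. The sketch assumes that a word is flagged only when it falls below the threshold in both the web counts and the corpus counts. How the two sources were actually combined, and the threshold value used, are not stated here, so both are placeholders. The function name is hypothetical.

```python
def obsolete_candidates(lexicon, web_counts, corpus_counts, threshold):
    """Return lexicon entries whose frequency is below `threshold` in BOTH
    frequency sources (an assumption about how the sources are combined).
    Words missing from a source are treated as having a count of zero."""
    return sorted(
        word
        for word in lexicon
        if web_counts.get(word, 0) < threshold
        and corpus_counts.get(word, 0) < threshold
    )
```

A stated margin of error such as the 1% above would come from manually checking a sample of the flagged words, not from the filter itself.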

I developed a web application (a dictionary writing system) for curating a large-scale, corpus-driven lexical database of 30,000 lemmas for Modern Standard Arabic, following modern lexicographic practices. [View here]

I developed the first rule-based parser for Modern Standard Arabic to be freely available on the internet, using XLE. The parser outputs a phrase structure tree (c-structure) and a dependency structure (f-structure). It is hosted by the University of Bergen in Norway, alongside grammars for English, German, Malagasy, Norwegian and Welsh. Test the parser here (note: the Arabic grammar is currently not working).

Ph.D. thesis
Title: Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation.
Description:
This research investigates different methodologies for managing morphological and syntactic ambiguity in Arabic. I built an Arabic parser using XLE (Xerox Linguistic Environment), which allows grammar rules and notations to be written in the LFG formalism. I also formulated a description of the main syntactic structures of Arabic within the LFG framework.

Mohammed Attia. 2008. Handling Arabic morphological and syntactic ambiguity within the LFG framework with a view to machine translation. Ph.D. Thesis. School of Languages, Linguistics and Cultures, the University of Manchester, UK. [pdf version]

Mohammed Attia. 2004. Report on the Introduction of Arabic to ParGram. The ParGram Fall Meeting 2004, The National Centre for Language Technology, School of Computing, Dublin City University, Ireland. [pdf version]

Presentations:

Mohammed Attia. 2012. 'Arabic Language: Nature and Challenges'. A presentation at the British University in Dubai, UAE, May 29, 2012. [Slides available]