To access the code and for more details on how to download and run the tool, please visit our OSMAN GitHub repository by clicking the View on GitHub button at the top of the page or via the link below:
OSMAN Readability GitHub Repository: https://github.com/drelhaj/OsmanReadability.

OSMAN Arabic Readability Formula

The formula calculates readability for Arabic text with and without diacritics (Tashkeel). The tool presents a novel approach to counting syllables in Arabic, which has been a difficult task for many years. The tool provides accurate results for text with diacritics. Because the majority of Arabic text available online today is written without diacritics, we give the user the option to use Mishkal (sourceforge.net/mishkal/), a free online tool that adds diacritics back into Arabic text and reaches an accuracy of over 85%.
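
As a rough illustration of the choice between diacritized and undiacritized input, the Python sketch below estimates how much of a text carries Tashkeel, so a caller could decide whether to run a diacritizer such as Mishkal first. It is not code from the OSMAN repository: the function names `tashkeel_ratio` and `needs_diacritization` and the 0.3 threshold are assumptions made for this example only.

```python
# Tashkeel marks: tanween (U+064B-U+064D), short vowels (U+064E-U+0650),
# shadda (U+0651) and sukun (U+0652).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}
TATWEEL = "\u0640"  # elongation character, not a letter


def tashkeel_ratio(text: str) -> float:
    """Return the number of Tashkeel marks per Arabic letter in `text`."""
    letters = sum(
        1 for ch in text if "\u0621" <= ch <= "\u064A" and ch != TATWEEL
    )
    marks = sum(1 for ch in text if ch in TASHKEEL)
    return marks / letters if letters else 0.0


def needs_diacritization(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule: treat text below `threshold` as undiacritized."""
    return tashkeel_ratio(text) < threshold


with_marks = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, fully diacritized
without_marks = "\u0643\u062A\u0628"                 # same word, no diacritics

print(tashkeel_ratio(with_marks))           # 1.0
print(needs_diacritization(without_marks))  # True
```

The threshold is arbitrary; in practice it would need tuning on real text, since partially diacritized Arabic is common.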

Arabic Syllables!

In our tool we count the two main types of Arabic syllables, short and long, in addition to stressed syllables.
Short syllables are simply a single consonant followed by a single short vowel (e.g. “كَتَبَ” [ka-ta-ba], “he wrote”). A long syllable is usually a consonant plus a long vowel (e.g. “كِتَاب” [ki-taab], “book”); this example shows a short syllable followed by a long one. Stressed syllables are those written as double letters, indicating a doubled consonant with no vowel in between (e.g. “شَدَّدَ” [shaDDaDa], “he stressed”).
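
The three syllable types described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the counting code used in OSMAN: it assumes fully diacritized input, treats a short vowel followed by its bare lengthening letter (alef, waw or yaa) as a long syllable, and tallies a shadda as a stressed syllable in addition to the vowel on the same letter. The names `clusters` and `count_syllables` are hypothetical.

```python
import unicodedata

FATHA, DAMMA, KASRA, SHADDA = "\u064E", "\u064F", "\u0650", "\u0651"
# Each short vowel and the letter that lengthens it (alef, waw, yaa).
LENGTHENER = {FATHA: "\u0627", DAMMA: "\u0648", KASRA: "\u064A"}


def clusters(word):
    """Split a word into (base letter, set of attached diacritics) pairs."""
    out = []
    for ch in word:
        if unicodedata.combining(ch) and out:
            out[-1][1].add(ch)   # diacritic: attach to the previous letter
        else:
            out.append((ch, set()))
    return out


def count_syllables(word):
    """Count short, long and stressed syllables in a diacritized word."""
    counts = {"short": 0, "long": 0, "stressed": 0}
    units = clusters(word)
    for i, (base, marks) in enumerate(units):
        if SHADDA in marks:
            counts["stressed"] += 1
        vowel = next((m for m in marks if m in LENGTHENER), None)
        if vowel is None:
            continue             # no short vowel on this letter
        nxt = units[i + 1] if i + 1 < len(units) else None
        # Long syllable: the next letter is the matching, unmarked lengthener.
        if nxt is not None and nxt[0] == LENGTHENER[vowel] and not nxt[1]:
            counts["long"] += 1
        else:
            counts["short"] += 1
    return counts


kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"          # ka-ta-ba
kitaab = "\u0643\u0650\u062A\u064E\u0627\u0628"          # ki-taab
shaddada = "\u0634\u064E\u062F\u0651\u064E\u062F\u064E"  # shaDDaDa

print(count_syllables(kataba))    # {'short': 3, 'long': 0, 'stressed': 0}
print(count_syllables(kitaab))    # {'short': 1, 'long': 1, 'stressed': 0}
print(count_syllables(shaddada))  # {'short': 3, 'long': 0, 'stressed': 1}
```

The example words are written with Unicode escapes so the order of letters and diacritics is unambiguous. Grouping diacritics by base letter means the result does not depend on whether the shadda is typed before or after its vowel.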

Dataset

We used 73,000 parallel English and Arabic paragraphs from the United Nations (UN) corpus (uncorpora.org/), a collection of resolutions of the General Assembly from Volume I of GA regular sessions 55-62 (Rafalovitch and Dale, 2009). The UN Arabic text is written without diacritics, so we used Mishkal to add them. Each language has around 3 million words from more than 2,000 documents, with each document containing 36 paragraphs on average.