Hi @Apollo14, you're welcome. Yes, I have tried that too with other fonts, but some letters weren't attached properly. I believe there is some code left in the repo that shows that. Plus I added vowels! I will work more on this when I come back, then I'll post the whole thing insha'Allah.

I am using the following piece of code to preprocess Arabic before sending it to TextField. It seems to be OK according to the few Arabic-speaking customers I have, but I don't read Arabic myself so I can't tell for sure. I'd like to integrate it into Gideros, so I'd be glad if some Arabic speaker/writer could check it!

Let's begin: in your code you have 0x... values, for the Arabic letters I suppose. In what format are they? HTML?

I am working with decimal Unicode code points for Arabic, e.g. 1574 is ئ

This allows us to deal with vowels as well.

Some quotes from an excel forum:

    Function atu(rng As Range) As String ' Arabic word to Unicode
        Dim i As Integer
        Dim CellValue As String
        Dim arabunicode As Integer
        Dim NewValue As String

        'Get the string from the cell. Although it may not look like it
        'this is in fact unicode. It's kinda hidden from you.
        CellValue = rng.Value

        'go through the string character by character (note that
        'each character is 2 bytes - you just don't see it)
        For i = 1 To Len(CellValue)
            'get the unicode value for this character
            arabunicode = AscW(Mid$(CellValue, i, 1))
            'append this to our string - as unicode
            NewValue = NewValue & arabunicode
        Next i

        'return the accumulated string (missing from the quoted fragment)
        atu = NewValue
    End Function

I don't think my solution is the answer to dealing with Arabic in Gideros, but it works for my projects.

The only problem is that it handles a limited number of characters (15 in my Gideros function), but it can handle vowels, spaces, commas, colons, ...

The function is not complete yet because I wanted to give it a try before leaving.

When I am back I should complete it and share the whole class again with a use case. But the basics are here in the GitHub repo (some classes are missing, I believe, because they were linked classes, but those are just basic classes for buttons, easing, ...)

So the way I do it is via a texture pack of all the letters in their isolated, beginning, middle and end-of-word forms. Then I add some more graphics for the vowels and voilà.
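To make the isolated/beginning/middle/end idea concrete, here is a rough Python sketch of how the form of each letter could be picked before looking up the matching image in the texture pack. It is not the code from the repo: the function names, the form labels and the list of letters that never connect to the next letter are my own assumptions, and the list only covers basic Arabic letters.

```python
# Letters that connect only to the previous letter, never to the next one.
# Assumption: partial list, enough for plain Arabic (alef, teh marbuta, dal,
# thal, reh, zain, waw, alef maksura).
RIGHT_JOINING_ONLY = {0x0627, 0x0629, 0x062F, 0x0630, 0x0631, 0x0632, 0x0648, 0x0649}


def is_letter(cp):
    # Basic letters in the range 0x627..0x64A, excluding tatweel (0x640).
    return 0x0627 <= cp <= 0x064A and cp != 0x0640


def is_mark(cp):
    # Short vowels and related marks; they do not interrupt joining.
    return 0x064B <= cp <= 0x0652


def neighbor(cps, i, step):
    # Nearest code point in the given direction, skipping vowel marks.
    j = i + step
    while 0 <= j < len(cps) and is_mark(cps[j]):
        j += step
    return cps[j] if 0 <= j < len(cps) else None


def contextual_forms(text):
    """Return one label per character: isolated/initial/medial/final/mark/other."""
    cps = [ord(c) for c in text]
    forms = []
    for i, cp in enumerate(cps):
        if is_mark(cp):
            forms.append("mark")
            continue
        if not is_letter(cp):
            forms.append("other")
            continue
        prev_cp = neighbor(cps, i, -1)
        next_cp = neighbor(cps, i, +1)
        joins_prev = (prev_cp is not None and is_letter(prev_cp)
                      and prev_cp not in RIGHT_JOINING_ONLY)
        joins_next = (cp not in RIGHT_JOINING_ONLY
                      and next_cp is not None and is_letter(next_cp))
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# kaf, fatha (vowel), teh, alef, beh
word = "".join(chr(cp) for cp in (0x0643, 0x064E, 0x062A, 0x0627, 0x0628))
print(contextual_forms(word))
```

Each label could then be combined with the letter to build a texture region name, with the vowel graphics drawn on top of the letter they follow.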

Hope that will help you guys. You give us so much you need some payback.

In the meantime I am going to have a look at your code again to see what I understand from it (but it's hard because you put too few comments).

@MoKaLux 0x is used to denote hexadecimal notation. My code handles ligatures for chars in the range 0x627 to 0x64a (Unicode 1575 to 1610 in decimal): it replaces independent characters with their connected versions (Unicode 65165 and the following ones). I will look at your example to see if I can understand what goes wrong in my code.
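For readers who want to see the two notations side by side, here is a small Python sketch (not the Gideros code discussed here) that lists the connected versions of a letter. It reads them from Python's standard unicodedata module, where each presentation form records its base letter; the function name and the shape of the table are assumptions made for illustration.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")


def build_presentation_forms():
    """Map base letter code point -> {form name: presentation form code point}."""
    table = {}
    # Arabic Presentation Forms-B block: 0xFE70..0xFEFF.
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single letters look like "<isolated> 0627"; entries with more than
        # one base code point (ligatures, spacing marks) are skipped.
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = int(parts[1], 16)
            table.setdefault(base, {})[parts[0].strip("<>")] = cp
    return table


forms = build_presentation_forms()

# 0x627 and 1575 are the same number written in hexadecimal and decimal.
print(0x627, forms[0x627])
print(0x628, forms[0x628])
```

Note that the connected versions are not laid out at a fixed offset from the base letters: a letter has either two or four forms, so a lookup table is needed rather than simple arithmetic.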

Digging and reading further, I understood a few things: Arabic vowel glyphs are classified as 'combining diacritics' that should just be drawn above the preceding character. This should already work in theory, provided that Unicode chars are reversed before sending them to the text field and assuming that those 'combining' characters don't take space by themselves. However, it turns out that a lot of fonts are buggy: they report a non-zero advance distance for those characters. I was using arial.ttf, and it IS buggy. I switched to NotoSansArabic and it already looks better, although the diacritic mark doesn't seem to be correctly centered...
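As an illustration of the 'combining' classification, here is a minimal Python sketch using the standard unicodedata module. It reverses a string while keeping each combining mark right after its base letter, on the assumption that the renderer draws a mark over the glyph that precedes it in the string. The function name is hypothetical, and this is not what Gideros does internally.

```python
import unicodedata


def reverse_keeping_marks(text):
    """Reverse text, keeping combining marks attached after their base char."""
    clusters = []
    for ch in text:
        # unicodedata.combining() is non-zero for combining marks.
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


noon, fatha, beh = chr(1606), chr(1614), chr(1576)
logical = noon + fatha + beh          # noon carrying a fatha, then beh
visual = reverse_keeping_marks(logical)
print([ord(c) for c in visual])
```

A plain character-by-character reversal would instead place the fatha before the noon, which is one more way for the mark to end up over the wrong glyph.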

Hours later: diacritics are misplaced because FreeType doesn't handle OpenType GPOS tables, which indicate how to lay out composed glyphs. The FreeType guys suggest using the HarfBuzz library instead for complex script rendering. Benefit: it would handle Vietnamese, Thai, Khmer, etc. too, not just Arabic. Drawback: I am afraid it would enlarge the Gideros codebase significantly. I am looking at the extent to which HarfBuzz could be made a plugin, hooking into Gideros text rendering. The good thing is that it is MIT licensed.

@MoKaLux, you are mixing things up. In order to display a text, we have to go through several steps. Roughly:

1. Get a string with the text we want to display. Lua strings are only 8-bit, so in Gideros texts are expected to be UTF-8 encoded. This is what the editor does automatically, but it causes trouble with RTL texts such as Arabic, so I used utf8.char() to encode the two code points 1606 and 1614 into UTF-8 in code for better readability.
2. The text is separated into RTL and LTR chunks. This process is known as the BiDi algorithm. As part of it, RTL texts are reversed so that they can be displayed by an LTR drawing engine. Gideros doesn't do that step at all and just assumes texts are always LTR.
3. The text is converted to font glyphs.
4. Glyphs are 'shaped': they are laid out relative to each other depending on language rules. Gideros doesn't do that either; again, it assumes Latin rules. This is what HarfBuzz does.
5. Glyphs are rendered to screen.
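For step 1, this is roughly what encoding the two code points 1606 and 1614 into UTF-8 produces, sketched in Python rather than Lua so that the resulting bytes are easy to inspect. The variable names are only illustrative.

```python
# Code points from the example above: 1606 (noon) and 1614 (fatha).
code_points = [1606, 1614]

# Build the string from the code points, then encode it as UTF-8 bytes,
# which is the form an 8-bit string API expects.
text = "".join(chr(cp) for cp in code_points)
encoded = text.encode("utf-8")

print(encoded.hex(" "))   # each code point takes two bytes
print(len(text), len(encoded))
```

Each of these Arabic code points takes two bytes in UTF-8, so byte-based string lengths and indices do not match character counts.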

My code above was doing step 2 for Arabic, and probably not accurately, but even with that, step 4 was missing too. ICU used to do the same as HarfBuzz plus BiDi. They have dropped their shaping routines and integrated HarfBuzz instead, but ICU seems to be much more than just a text layout engine.

The HarfBuzz .dll is around 1MB in size, which is quite a lot compared to other plugins. I can't tell for other platforms yet. It actually handles most languages, not just Arabic.

I am not sure how I will implement BiDi yet, but I'll make it optional. I am looking for a lightweight implementation. So far I found ICU and FriBiDi, but I could make my own for trivial cases too. Using HarfBuzz and BiDi will make arabicProcessing obsolete, but since it will be a plugin there shouldn't be much concern about backward compatibility.
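To give an idea of what a 'trivial cases' implementation could look like, here is a hedged Python sketch that only splits a string into LTR and RTL runs using the bidirectional class of each character. It is not the Unicode Bidirectional Algorithm (no embedding levels, no special handling of numbers or mirrored brackets), and the function name and run format are assumptions.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # Hebrew-type and Arabic-type letters


def split_runs(text, default="ltr"):
    """Split text into (direction, substring) runs; a deliberately naive sketch."""
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            direction = "rtl"
        elif bidi == "L":
            direction = "ltr"
        else:
            # Spaces, digits, punctuation and combining marks simply inherit
            # the direction of the current run (a simplification).
            direction = runs[-1][0] if runs else default
        if runs and runs[-1][0] == direction:
            runs[-1][1] += ch
        else:
            runs.append([direction, ch])
    return [(direction, chunk) for direction, chunk in runs]


arabic = chr(1606) + chr(1614) + chr(1576)   # noon, fatha, beh
print(split_runs("abc " + arabic))
```

Each RTL run would then be reversed (keeping marks with their base letters) before being handed to an LTR drawing engine; a full implementation such as FriBiDi or ICU handles the many cases this sketch ignores.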

@MoKaLux 0x is used to denote hexadecimal notation. My code handles ligatures for chars in the range 0x627 to 0x64a (Unicode 1575 to 1610 in decimal): it replaces independent characters with their connected versions (Unicode 65165 and the following ones)

I tried getting those letters (Unicode 65165 and the following ones) but I could not find them. In which ttf did you find those code points?

@MoKaLux, Unicode 65165 and above are in NotoSansArabic and Arial at least, the two fonts I tried. And yes, letters are misplaced in Gideros; that's why I worked on integrating HarfBuzz, whose purpose is exactly to place them correctly.

See the result in the attached image for two texts. Each is rendered three times:
- with standard Gideros functions
- with arabicProcessing being applied first
- with the HarfBuzz plugin (without arabicProcessing, which is no longer necessary)