Saturday, June 10, 2006

Punjabi, and what IS that script?

On the WiktionaryZ main page we have a list of languages that point to "portal" pages for those languages. A project like WiktionaryZ clearly has to take into account the different scripts in which a language may be written. After a lot of head scratching, I created links to "cmn-Hans" and "cmn-Hant" to indicate that Mandarin is written in both a simplified and a traditional script. Another reason: it looks more organised this way.

I then tried my hand at the Punjabi language. Punjabi is written in two scripts, and I guessed wrong when trying to identify them. One was indeed an Indic script: ਪੰਜਾਬੀ is written in the Gurmukhi script, while پنجابی is written in the Shahmukhi script. Shahmukhi is indeed an Arabic-based script, but it is not the Arabic script. To identify these words properly, I looked them up at Unicode, where there is a nice list of the ISO 15924 script codes. Gurmukhi has Guru as its code and Shahmukhi... is absent.

It is probably fairly safe to tag it as Arab, but when my information says that Shahmukhi is not simply the Arabic script, that is a problem. I could also tag it as an uncoded script. The problem with these standards is that they work only up to a point. The real question is what they are there to do.
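To make the lookup concrete, here is a minimal Python sketch, assuming a hand-picked subset of ISO 15924 codes. The `ISO_15924` table and the `guess_script` helper are hypothetical, not part of any library. Python's standard library does not expose the Unicode Script property, so the sketch uses the first word of each character's Unicode name as a rough stand-in, and falls back to `Zzzz`, the ISO 15924 code for an uncoded script. It shows why Shahmukhi ends up as Arab: its letters are encoded with Arabic code points.

```python
import unicodedata

# Hand-picked subset of ISO 15924 codes (illustrative, not complete).
ISO_15924 = {
    "GURMUKHI": "Guru",
    "ARABIC": "Arab",
    "LATIN": "Latn",
}
UNCODED = "Zzzz"  # ISO 15924 code for an uncoded script


def guess_script(word: str) -> str:
    """Guess an ISO 15924 code for a word from its Unicode character names.

    Heuristic: uses the first word of each character name
    (e.g. "GURMUKHI LETTER PA" -> "GURMUKHI") as a stand-in for the
    Unicode Script property, which the standard library does not expose.
    """
    prefixes = {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word}
    if len(prefixes) == 1:
        return ISO_15924.get(prefixes.pop(), UNCODED)
    # Mixed or empty input: this sketch simply treats it as uncoded
    return UNCODED


gurmukhi = "\u0a2a\u0a70\u0a1c\u0a3e\u0a2c\u0a40"   # ਪੰਜਾਬੀ
shahmukhi = "\u067e\u0646\u062c\u0627\u0628\u06cc"  # پنجابی
print(guess_script(gurmukhi))   # Guru
print(guess_script(shahmukhi))  # Arab: no separate code for Shahmukhi
```

The character data cannot tell Shahmukhi apart from the Arabic script, so any tag finer than Arab would have to come from outside knowledge, such as a private-use code in the Qaaa to Qabx range.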

When you write Dutch, the standard Latin script is used; however, it leaves out one character, and consequently all word processors capitalise the ij wrongly: it should be IJ, not Ij. I think something similar is happening with the Shahmukhi script. It is assumed to be Arabic, but the style of its glyphs is different. I think it is just one of those things that may change in the future.
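The Dutch capitalisation problem is easy to reproduce. The sketch below assumes a simple rule (a word-initial "ij" is capitalised as a whole) and uses a hypothetical helper, `capitalise_dutch`, contrasted with Python's built-in `str.capitalize`, which treats i and j as two separate letters.

```python
def capitalise_dutch(word: str) -> str:
    """Capitalise a Dutch word, treating a leading "ij" as a single letter.

    Hypothetical helper: a simplified rule, not a full Dutch orthography.
    """
    if word[:2].lower() == "ij":
        # The digraph is capitalised as a unit: "ijs" -> "IJs"
        return "IJ" + word[2:]
    # Otherwise capitalise only the first character
    return word[:1].upper() + word[1:]


for w in ["ijsselmeer", "ijs", "amsterdam"]:
    # str.capitalize() treats "i" and "j" separately and gives "Ij..."
    print(w.capitalize(), capitalise_dutch(w))
```

This prints "Ijsselmeer IJsselmeer", "Ijs IJs" and "Amsterdam Amsterdam": a generic rule gets the first two wrong because it does not know that ij behaves as one letter.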