Sixteen symbols have been encoded in the Arabic Presentation
Forms-A block for use in pedagogical materials and documents
discussing the features of the Arabic script.

Please note that these are not combining characters but
stand-alone
symbols. These should only be used to display the dots and
diacritics
in isolation, and not for making new letters. For example,
one can
*not* use a Seen and add U+FBB6 Arabic Symbol Three dots
Above to get
a Sheen. If you type that, you will get a Seen followed by
three dots.
According to the standard, "These are spacing symbols
representing
Arabic letter diacritics considered in isolation, as for
example as in
discussions about the Arabic script."
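The difference between a spacing symbol and a combining mark can be checked from the character properties. The following is a minimal Python sketch, assuming the interpreter's `unicodedata` module is built against Unicode 6.0 or later; the helper name `describe` is only illustrative.

```python
import unicodedata

SEEN = "\u0633"              # ARABIC LETTER SEEN
SHEEN = "\u0634"             # ARABIC LETTER SHEEN
THREE_DOTS_ABOVE = "\uFBB6"  # ARABIC SYMBOL THREE DOTS ABOVE


def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# What you get if you type a Seen followed by the new symbol.
typed = SEEN + THREE_DOTS_ABOVE

# Normalization does not merge the pair into a Sheen.
composed = unicodedata.normalize("NFC", typed)

print(describe(THREE_DOTS_ABOVE))
print(composed == SHEEN)
print(len(composed))
```

A combining class of 0 and a non-mark category are what one would expect from a stand-alone symbol, and the normalized string still contains two separate characters.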

The Qur'anic character U+06DE ARABIC START OF RUB EL
HIZB has had
its glyph and properties changed.

For some unknown historical reason, the character was mistakenly
classified as a combining character instead of just a
symbol, which
made it unusable. The character is now a normal spacing
symbol and is
usable as originally intended.

Two characters have been encoded in the Arabic script
block for use
in Kashmiri, one of the official languages of Jammu and
Kashmir, the
Indian-administered part of Kashmir. The language is written
in both
Arabic and Devanagari, along religious lines of Muslims and
Hindus.

The two new characters are U+0620 Arabic Letter Kashmiri Yeh and
U+065F Arabic Wavy Hamza Below. Also, U+0673 Arabic Letter
Alef With
Wavy Hamza Below has been deprecated (the first Arabic script
character to ever get deprecated in Unicode), and the character
sequence <U+0627, U+065F> should be used instead of it.
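For existing text, the replacement can be done with a plain string substitution. This is only a sketch: the function name `replace_deprecated_alef` is made up for illustration, and it assumes the substitution has to be done explicitly because, as far as I know, Unicode normalization does not map U+0673 to the sequence.

```python
DEPRECATED_ALEF = "\u0673"          # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
RECOMMENDED_SEQUENCE = "\u0627\u065F"  # ARABIC LETTER ALEF + ARABIC WAVY HAMZA BELOW


def replace_deprecated_alef(text: str) -> str:
    """Replace every U+0673 with the recommended <U+0627, U+065F> sequence."""
    return text.replace(DEPRECATED_ALEF, RECOMMENDED_SEQUENCE)


sample = "\u0673\u0628"  # deprecated letter followed by ARABIC LETTER BEH
fixed = replace_deprecated_alef(sample)
print([f"U+{ord(ch):04X}" for ch in fixed])
```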

Mandaic has been encoded. Mandaic is the script used by the
Mandaeans (mostly living in southern Iraq and southwestern Iran,
especially Khouzestan) for liturgical purposes. This is the
community
that some people believe the Qur'an refers to as Sabians,
the third
member group of the People of the Book (next to Jews and
Christians).

Unicode Standard Annex #9, The Unicode Bidirectional
Algorithm, has
been updated to include more information and some
clarifications. Note
that the algorithm has not changed. The update just explains the
original intentions in more detail. For the list of
informational
changes to the text, see the following link (Behdad Esfahbod
and I
have contributed to this and previous versions of the
standard annex): http://www.unicode.org/reports/tr9/tr9-23.html#Modifications

A new data file has been added to the Unicode character
database,
listing some characters that are used with several scripts
(and which
scripts those are). For example, from the data file one can
learn that
the Arabic Tatweel and some of the Arabic harakat are also
used with
the Syriac script, the Arabic-Indic digits are also used
with Thaana,
and the Arabic comma, semicolon, and question mark are also
used with
both Syriac and Thaana: http://www.unicode.org/Public/UNIDATA/ScriptExtensions.txt
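Reading such a file takes only a few lines. The sketch below parses lines of the general shape "code point or range ; script codes # comment". The `SAMPLE` text is an illustrative stand-in based on the examples above, not a verbatim excerpt of the data file, and the function name is hypothetical; the real file may list more scripts for each character.

```python
SAMPLE = """\
# Illustrative lines in the shape of ScriptExtensions.txt (not a verbatim excerpt)
0640          ; Arab Syrc # ARABIC TATWEEL
0660..0669    ; Arab Thaa # ARABIC-INDIC DIGIT ZERO..NINE
060C          ; Arab Syrc Thaa # ARABIC COMMA
"""


def parse_script_extensions(text):
    """Map each listed code point to the set of script codes given for it."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cp_field, scripts_field = (part.strip() for part in line.split(";", 1))
        first, _, last = cp_field.partition("..")  # single code point or range
        start = int(first, 16)
        end = int(last, 16) if last else start
        scripts = frozenset(scripts_field.split())
        for cp in range(start, end + 1):
            table[cp] = scripts
    return table


table = parse_script_extensions(SAMPLE)
print(sorted(table[0x060C]))
```

With the real file downloaded from the URL above, the same function could be given the file's contents instead of `SAMPLE`.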

Please note that Unicode encodes beverage containers, but not
alcoholic beverages (I personally made sure of that, to reduce
possible objections). For example, there is no BEER encoded,
but only
BEER MUG (which is also used for non-alcoholic beer, among other
uses).

Religiously devout people that may object to some game
characters or
musical instruments getting encoded should note that Unicode
implementations are not required to support any specific
character,
and are allowed to choose their own set of characters to
support. The
game symbols are encoded only for the sake of Unicode
implementations
(especially those in East Asia) that need them to support
their users.

Ahmadinejad: I will be in New York next week,
with thousands of other Iranians and non-Iranians, to show
my opposition to Ahmadinejad’s being internationally
recognized as Iran’s president. He stole the election,
and he had a hand in several of my people being killed, raped,
and tortured. He is not Iran’s president; he is just
another liar, thief, and murderer.

Calendrical calculations: For whoever
may be computing Singapore holidays any time in the future:
Singapore’s Vesak Day (Buddha’s birthday)
holiday does
not follow the Buddhist
calendar or the recommendation by the first Conference
of the World
Fellowship of Buddhists held in Sri Lanka in 1950 (that
recommended the first full moon in May). It is
calculated using the Chinese
calendar, though not on the 8th day of the 4th
moon, as the Chinese and the Koreans celebrate it, but
seven days later, on the 15th day (calendrical full moon) of
the 4th moon.

I lost at least three hours today finding out about this, and I
found out about it by accident, because I had Calendrical
Tabulations at hand and happened to look at the Chinese
calendar column. There are several conflicting pieces of
information on the internet here and there, which really
confused me to the point that I thought the actual algorithm
was not publicly available.

Just wanted to share a bit of my own experience with being
overweight, losing a lot of it, and then gaining some of it
back:

One may have misconceptions about how weight is lost
and gained. Specifically, one may think that “by
eating only what my body needs and some exercise, I can lose
weight”. That’s rarely true.

You need to understand how diets work. Generally, one
doesn’t really need nutritionists. But it’s
important to understand the simple science behind dieting,
in order to make the whole thing effective and avoid just
putting the weight back on.

The personal psychology of dieting is important. You
need to know why you are doing it, and care about it.

You don’t need to spend time thinking about the
diet, following it, or even exercising. There are good ways
to lose weight without the usual obsessions associated with
diets, like that of the Atkins diet.

The very short book helped me lose about 15 kilos easily
(and with no exercising) a few years ago. I have started to
diet again these days, with a goal of losing about 30 pounds
(almost the same amount, but I now live in the US).

Even if you hate diets and diet books, still read it. I
would recommend reading it even if you are not overweight!

Footnote: The author of the book has made all the code he
used in the book (with several updates) available as public
domain code online. He also runs a server with the tools
installed for public use, if you are the lazy type, like me.
It's
all here.

Unicode: I am thinking again about the
brilliant Joe
Becker. I
met the gentleman last October in San Jose, when everyone
was celebrating twenty years of Unicode. His
short 1988 article, titled Unicode 88, is
amazing. It is interesting that a lot of Unicode principles
remain the same, after twenty years.

Fonts and Languages: I was repackaging my
fonts for Fedora 11, when something caught my attention. The font
packaging policy involved the list of languages my font
package supported. But it was a font with a wide range of
Latin and Cyrillic glyphs, and it probably supported dozens
of languages. At the same time, I found that Fedora
11 is considering supporting automatic
font installation. Among various things, this means that
we need to know which fonts support which languages.

Font files don’t have that information directly. How
would a font designer know that his font
supports Aruban Papiamento just
fine, which uses a different orthography than Papiamento as
written in the Netherlands
Antilles, for example? What about African or native
American languages? Or Mongolian? Or Kurdish? He just
designs and tests
glyphs for characters and languages he is interested in. If
the resulting font happens to support Filipino too, good for
him and his users; if it doesn’t, he may not care. At best,
a list of the languages the font
designer believes the font is supporting may be found
somewhere in the documentation.

In the present freedesktop stack, the language support
detection task is done by
fontconfig. When an application, like Firefox, wants to
display text in some language, a text layout engine, like
Pango, will ask fontconfig for a font that supports
displaying text in the language (possibly with some other
properties, like the font being bold and sans serif).
fontconfig then uses its various font
suggestion rules and orthography files to give the best font
it can find back to the engine. If fontconfig doesn't know
anything about the language, or has wrong information, it
may give you something totally off, like a Latin or
Devanagari font for a language written in the Arabic script.

What font designers may not know (or care about), fontconfig
needs to know. The usual way of knowing, especially for
not-very-famous fonts or languages, is through orthography
files. These files
contain a list of Unicode characters that play a letter-like
role in the language. For example, for French, it is a list
of basic Latin letters plus all the ligatures (like
œ) and accented
letters (like ï). fontconfig runs the list
through each font
installed on your machine and sees if it has glyphs for all
the characters listed. If it does, the font is assumed to
support the language.
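The coverage test described above amounts to a subset check. Here is a minimal Python sketch of the idea. It is not fontconfig's actual code or file format: the toy orthography lists and the font's character set are made-up assumptions, and the function names are hypothetical.

```python
def supports_language(font_chars, orthography):
    """A font is assumed to support a language if it covers every listed character."""
    return orthography <= font_chars


def supported_languages(font_chars, orthographies):
    """Return the sorted language codes whose orthography the font fully covers."""
    return sorted(
        lang
        for lang, chars in orthographies.items()
        if supports_language(font_chars, chars)
    )


BASIC_LATIN = set("abcdefghijklmnopqrstuvwxyz")

# Toy orthography lists, for illustration only (not real orthography file contents).
ORTHOGRAPHIES = {
    "en": BASIC_LATIN,
    "fr": BASIC_LATIN | set("àâçéèêëîïôùûüÿœæ"),
}

# A hypothetical font that has some accented letters but lacks œ and ï.
font_chars = BASIC_LATIN | set("éèàç")

print(supported_languages(font_chars, ORTHOGRAPHIES))
```

A single missing letter is enough to drop a language from the list, which is why a bug in an orthography file can make a perfectly usable font look unsuitable, or the other way around.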

Getting back to my own story, I thought of checking
orthography files to see which languages my packaged fonts
support. But when I looked into a few, I found several bugs
and unsupported languages. Behdad encouraged me to
fix them early, so that the fixes would have a chance of
getting into fontconfig 2.7.

During the past few weeks, I’ve been trying to hunt things
down and fix them during my free time. I achieved my first
target of matching glibc
locales (those without ‘@’). I’m now on my second
target
of matching languages
with two-letter codes; remaining are: Akan, Avestan, Cree,
Ewe, Herero, Sichuan Yi, Javanese, Kanuri, Kongo, Kuanyama,
Luba-Katanga, Nauru, Navajo, North Ndebele, Ndonga, Ojibwa,
Pali, Quechua, Rundi, Sango, Shona, Sundanese, Tahitian, and
Zhuang. After that, there are thousands of languages with
three-letter codes, which would need an army the size of SIL
International.

I was just reading an
article (in Persian) about the registration of the
100,000th domain in “.ir”.
There’s been an event, with a long list of speakers that
includes quite a few Iranian politicians involved in
linguistic or Information Technology issues.

The best quote ever is from the highest ranking government
official in charge of IT issues: “Engineer Rezaee, the
Secretary of the Supreme Council of Information Technology,
[...] expressed his gratitude toward the people responsible
in the institute [in charge of .ir] for their vigilance in
selecting the domain name .ir for Iran, and added that if
the choice had not happened in time, other countries like
Ireland or Iraq may have chosen it for themselves”.
That’s
all that is quoted from him, which suggests the rest of
his speech was probably worse...

The poor guy probably doesn’t know about standards,
and I’m
quite sure no one corrected him, pointing to ISO
3166, first published in 1974, years before the founding
of the
institute in 1989.
Even those codes were based on the
codes introduced in the 1949 Geneva
Convention
on Road Traffic. When “IR” was first
internationally
introduced for Iran, Siavash
Shahshahani, the gentleman in charge of .ir’s growth,
was seven years old!

Update: According to this
Wikipedia page, “IR” has been in use for
Iranian cars since 1936 (interesting date, since until early
1935, Iran was internationally called “Persia”).
But the article does not cite its sources,
so I can’t really confirm it. Still, even if it came
into use
in 1936, it
was definitely not standardized internationally until 1949.

Arabic in movies: I’ve been watching some
24,
which is so full of stereotypical “terrorists”. Most of them
are Middle Eastern of course. To try to get “balanced”, in a
few episodes they go and add a few “good” Muslims or Middle
Easterners, probably to protect themselves. Sometimes it
gets pretty funny too. To prove the innocence of some Muslim
US government agent, someone says “But she’s even a
registered Republican!” I really don’t know if they knew
it’s funny... Anyway, that’s not what I want to talk about.

What’s really annoying is that to someone who knows a bit about
Middle Eastern culture and language, a lot of things are
very phony. These are some random things from 24 that I
found. (Note: I am not a native speaker of Arabic. I just
learned some in school.)

There is a hostage execution scene, with the captors
talking in front of a black background with Arabic text on
it. Guess what the text says: “الموت لأمريكيين”, which means
“Death to Americans”! I’m quite sure no “terrorist” would
want to say that. “Death to America”, they may say.

The names of some Middle Easterners are pretty made up.
There is this family, named “Araz”. Now that’s an
Azerbaijani name, and no one would really be named Araz if
he’s not an ethnic Azerbaijani or from the Caucasus. But
guess what? Their first names are very Arab first names (not
even names common in the non-Arab Muslim world), and their son
has a very Persian first name (Behrooz)! A totally
impossible combination.

The writers seem to have taken “terrorist” names from
whatever was at hand. Two minor terrorists, Arabs in
appearance, have their names mentioned almost next to each
other in the same episodes. Guess what their last names are?
The first is named “Khatami”, the second “Ardakani”. Where
are these names coming from? They come from the full name of
the very popular reformist former President of Iran, Seyyed
Mohammad Khatami Ardakani. Interestingly, that full name
is rarely mentioned, except in one place, an old version of
the CIA’s World Factbook. The writers simply got their hands on
whatever they could find about “terrorist” regimes, and took
the smiling president’s name. They didn’t know that Ardakan
is the name of a small city in central Iran, and Arabs would
probably not name themselves after that city.

Arabic text is not what it looks like in the real world
at all. The letters are usually disjoint, each letter on its
own, instead of contextual shaping. In some cases, it’s even
both left-aligned and left-to-right.

Of course, 24 is famous for showing torture as sometimes working,
depicting huge conspiracies, showing government
officials on very foolish errands and breaking laws left and
right, and very interestingly, a Democratic Chief
of Staff becoming a Republican Chief of Staff in the
next administration. (All in all, I really think the world
of 24 is a parallel
universe. Fun to watch, but not much connection to the real world.)

The disjoint Arabic phenomenon is not unique to 24, of
course. Even better-produced shows like Lost do
it. In Season
4, Episode 9, a TV news program is shown, supposedly
in Tunisia broadcasting something happening in Iraq. The
Arabic text is totally disjoint, and unacceptable to anybody
who knows anything about the language or script.

I suppose the producers pay people to translate the text
into Arabic. Can’t they also make sure the software they use
to render the text also displays it fine? If it doesn’t, why
bother? Just show some squiggles!

New world: It’s still a couple of
months until the beginning of spring, the time we Persians
celebrate as our New Year, Nowrooz, the
time the world renews itself.

But I think the world renewed itself earlier this year.

But today, I witnessed a new US president, clearly wise,
clearly intelligent, and clearly a thinker. I was longing
for the day to hear such a thing as “we reject as
false the choice between our safety and our ideals”
from a US president. Or pearls of wisdom like “know
that your people will judge you on what you can build, not
what you destroy” or “we can no longer afford
indifference to the suffering outside our borders, nor can
we consume the world's resources without regard to
effect”.

I am so happy to be in this country at such a time as this.
And I am
surprised at myself for considering him my ideal US
candidate for president since I found out about him back in
2004. I didn’t think he would run, I didn’t
think he would win, but I followed all his moves. All this
time, I cried, laughed, drank, read, informed, and debated.
Back home in Iran, in transit, and here in California. I
could not vote for him, and would not be able to vote for him in
2012 either, but as a fellow citizen of the world, he has my
support.

Fedora: The other weekend, I flew to Boston
for FUDCon
F11. I mostly did it to reboot myself back into free
software contribution, something I hadn't done a lot last
year (because of settling
in California and various other stressful and depressing
situations).

I saw interesting stuff and boring stuff, but the best thing
that happened was meeting "spot". He spent a
couple of hours with me over drinks, providing free wisdom
(and selling me ideas?). He’s so amazing!