August 2010 Archives

As mentioned in this blog's previous entry, the government of India has designated a new rupee symbol. One of the more interesting questions is how quickly it could be integrated into Unicode and other standards.

According to Live Mint.com, the symbol has been voted into Unicode at code point U+20B9. That would be in the currency block right after the Tenge currency sign. I do not see any official confirmation from Unicode, but it was on the agenda for that meeting.
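For anyone who wants to check the reported code point on their own machine, here is a minimal sketch using Python's standard unicodedata module. It assumes the Python build ships Unicode data new enough to include U+20B9; on older data the lookup simply falls back to the placeholder string. The variable names are mine, not anything official.

```python
import unicodedata

# The code point reported for the new rupee symbol.
new_rupee = "\u20b9"
# The code point immediately before it in the currency block.
previous = chr(ord(new_rupee) - 1)

# Look up the official character names; fall back if the data is too old.
new_name = unicodedata.name(new_rupee, "<unassigned>")
previous_name = unicodedata.name(previous, "<unassigned>")

print(f"U+{ord(previous):04X} {previous_name}")
print(f"U+{ord(new_rupee):04X} {new_name}")
```

If the two lines list the Tenge sign followed by the rupee sign, the placement described above holds for that version of the Unicode data.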

You can read some of the discussion on the issue from late July which includes information about multiple proposals, the source of the new rupee design and comparisons to the design of the euro (€) symbol. Fascinating if somewhat heated.


In case you've ever wondered whether the Unicode standard will ever be "complete", the answer is probably not. This was highlighted by the fact that India adopted a new rupee currency symbol just last month (July 2010).

Design History

Actually, there was already a rupee symbol (₨) in Unicode at code point U+20A8, but if you look at the character, you'll see that it's a rather boring ligature of a Western capital R plus s. The new symbol melds the Western R and Devanagari र ("Ra") and adds a currency bar to boot - very clever.
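The "R plus s" nature of the older sign is visible in the Unicode data itself. As a small sketch, compatibility normalization (NFKC) in Python's standard unicodedata module should break U+20A8 back into its two letters; the variable names are just for illustration.

```python
import unicodedata

# The older, generic rupee sign.
old_rupee = "\u20a8"

# NFKC applies compatibility decompositions, which map ligature-like
# characters back to the plain letters they are built from.
decomposed = unicodedata.normalize("NFKC", old_rupee)

print(decomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

One practical consequence is that a search or comparison done on normalized text may treat ₨ the same as the two letters "Rs".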

Actually the rupee story gets more complex because there are "rupee" signs for different scripts/countries.

The "Bengali" rupee is actually the "Taka" sign of Bangladesh, but I am perplexed by the Tamil and Gujarati versions since they would be regions of India and/or Sri Lanka. I am guessing that they are regional "informal" characters, but enough in use to be included in Unicode.
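To see which script-specific rupee characters Unicode actually defines, the names can be looked up directly. The sketch below is my own and the list of code points is an assumption on my part (they are not given in this post): it covers the Bengali, Gujarati and Tamil rupee characters plus the generic sign at U+20A8.

```python
import unicodedata

# Code points assumed here for the script-specific rupee characters,
# plus the generic rupee sign for comparison.
candidates = [0x09F2, 0x09F3, 0x0AF1, 0x0BF9, 0x20A8]

# Map each code point to its official Unicode character name.
names = {cp: unicodedata.name(chr(cp), "<unassigned>") for cp in candidates}

for cp, name in names.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```

Whether the characters display properly depends on the fonts installed, but the names will print either way.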

What Now

Even though the Government of India has signed off on the symbol, there's a long road ahead. There are fonts to be retooled, but the rupee sign won't be in its Unicode code point...because one hasn't been assigned to it (although they're working on it...). That means that even though this sign was born in the era of Unicode, a "legacy" pre-Unicode system will be in place which will have to be corrected later. Ah well.

Other systems that will have to be retrofitted include currency databases, Excel formatting options, and probably cash registers (at least what prints out on the receipt). And that's no doubt the tip of the iceberg. Interestingly, there are no plans to put the symbol on bills and coins, but as this Times of India article notes, most bills/coins don't have a currency symbol. Americans can pull out a dollar bill to check - no $ in sight.

A final comment is how speakers in non-Devanagari areas will react. The crossed bar shape actually works for many Northern Indian scripts such as Devanagari (र) and Gujarati, but R looks very different in a lot of scripts including Tamil (ர) and others. I occasionally run into comments from Tamil writers about not assuming that Devanagari is a universal script in India. I wonder what the impact here will be.

Pictures instead of Text?

Some of you may be interested to note that the Tamil/Gujarati/Bengali text is actually images. For some reason the MT CSS is insisting on font selections and I haven't been able to override it yet, not even with !important. Not sure how to troubleshoot, but this does not happen to me in Web 1.0...


If you want to develop a language code headache, head straight to the former Yugoslavia, once a country whose national language was "Serbo-Croatian" (ISO-639 language code: sh, now deprecated).

In the 1990s of course, Yugoslavia violently broke up into its constituent ethnic groups, all of whom agreed that Serbo-Croatian had been an artificially imposed literary language.

Today, most agree that "Serbo-Croatian" is really a macrolanguage of national forms for Croatian (ISO-639 language code hr), Serbian (ISO-639 language code sr), Bosnian (ISO-639 language code bs) and Montenegrin (too new to have a code).

Indeed, if you look at the pages for Croatian Wikipedia (http://hr.wikipedia.org), Serbian Wikipedia (http://sr.wikipedia.org) and Bosnian Wikipedia (http://bs.wikipedia.org), you will see that although words are similar, there are distinct vocabulary differences. You will also see that Serbian Wikipedia is in Cyrillic, unlike Croatian and Bosnian.
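Since each Wikipedia address is just the language code in front of wikipedia.org, the relationship can be sketched in a few lines. The codes and scripts below come from the paragraphs above; the dictionary layout and the function name are hypothetical, made up for illustration.

```python
# Language code -> (language name, script used on that Wikipedia),
# as described in this post. The structure itself is only illustrative.
EDITIONS = {
    "hr": ("Croatian", "Latin"),
    "sr": ("Serbian", "Cyrillic"),
    "bs": ("Bosnian", "Latin"),
}


def wikipedia_url(code: str) -> str:
    """Build the Wikipedia address for one of the language codes above."""
    if code not in EDITIONS:
        raise KeyError(f"no edition listed for language code {code!r}")
    return f"http://{code}.wikipedia.org"


for code, (language, script) in EDITIONS.items():
    print(f"{language} ({script} script): {wikipedia_url(code)}")
```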

Another Wikipedia

Yet there's another Wikipedia - the Serbo-Croatian Wikipedia (http://sh.wikipedia.org/), dually scripted in both Cyrillic and the Latin alphabet. This Wikipedia is yet another related form, similar to Croatian, Serbian and Bosnian yet eerily different. A ghost of a language that has been officially declared dead, but still breathes through living speakers.

All linguists know the standard line "A language is a dialect with an army", but the creation of the Serbo-Croatian Wikipedia shows that governmental language planning is not so easy. Several generations of speakers grew up in "Yugoslavia" being educated in "Serbo-Croatian". I even learned about Serbo-Croatian syntax from a linguistics professor from Yugoslavia...who said absolutely nothing about regional differences or Serbo-Croatian being an artificial language. As far as I knew in 1990, she was a native speaker of Serbo-Croatian.

I don't know what the fate of these linguistic forms will be. I would doubt that Yugoslavia would reunite anytime soon, and the longer the countries remain separate, the more that speakers will feel they are speaking separate languages. Yet, somewhere out there is a community who still speaks Serbo-Croatian and probably mourns the passing of a nation and its language. How long it can last is an interesting question.