Monday, December 24, 2007

An ingenious programmer has used Apple's Dictionary Development Kit to produce a Chinese module for Dictionary.app based on CEDICT. You can download it here. The unzipped file goes in Library/Dictionaries.

Monday, December 3, 2007

In the Apple Forums a user pointed out that Leopard's version of the Anjal Tamil Input Method has at least two bugs: the output of the combination "tr" is wrong and the em-dash is not available. Luckily Leopard's Anjal system contains two text files which users can modify to fix such things or make other customizations -- Anjal.keylayout and AnjalTransliterator.txt. You will find them in System/Library/Input Methods/TamilIM/Contents/Resources (you need to do Control + click > show package contents on the TamilIM item to get to Contents).

On my iDisk you can find a folder called "New Anjal" which has modified copies of these files to fix the two bugs mentioned. Or if I haven't got it quite right, it should not be hard to do so yourself.

Monday, November 26, 2007

Somehow Apple forgot to include the character ё (U+0451) in the Russian PC keyboard layout, at least as far as the ANSI physical keyboards are concerned (whether it shows up on the extra key on an ISO keyboard I don't know). If you need a replacement, you can download the RussianPC or RussianPC2 layouts from my iDisk.

Friday, November 23, 2007

Some users of Leopard have reported not having any Keyboard Viewer or Character Palette entry in System Preferences/International/Input Menu, so they cannot activate and use these functions. A possible fix has been posted in the Apple Forums here.

Thursday, November 22, 2007

In Leopard Apple decided to switch the default Thai font from Lucida Grande to Thonburi, making the Thai characters in the former inaccessible. This causes problems because a) the Thonburi Bold font is non-functioning for some reason and b) Thonburi's spacing is botched for mixed Thai/Latin text.

Pending Apple's fixing these issues, one idea is to replace Thonburi in System/Library/Fonts with a different Thai font. I downloaded the Garuda set described at this site and used FontForge to rename them Thonburi. After making backup copies of Apple's Thonburi, I replaced it with the renamed Garuda, and this seems to work. A copy of the renamed set can be gotten from my iDisk (the folder Garudathonburi). Feedback on whether this solution is helpful would be welcome.

Another fix would be to replace the Lucida Grande in Leopard with the same font from Tiger. I have seen reports that this does not seem to cause any problems.

Tuesday, November 20, 2007

I've always kept Firefox on my Mac as an alternative to Safari, but its value has been limited because of Firefox's inability to recognize certain Apple fonts that are required for the correct display of Devanagari and similar scripts. This has now been fixed in Firefox 3, a beta version of which has just become available. Using this test page, you should see that Hindi, Tamil, and Tibetan all look right (Tibetan requires 10.5). Bugs in some earlier versions related to Thai and Cyrillic also seem fixed.

Wednesday, November 14, 2007

For some reason, Leopard no longer has the setting previously found in OS X in System Preferences/International/Input Menu for "Allow a different input source for each document." This is badly missed by many who need to have different keyboards active in different apps and don't want to constantly have to switch the layout. A possible fix is the app InputSwitcher. Instructions are in the ReadMe contained in the download.

Friday, November 9, 2007

Leopard adds a new set of Vietnamese keyboards which implement the most common ASCII systems for inputting this language -- Telex, VNI, and VIQR.

More interesting is that when one of these is selected you get a new item in the "flag" menu, "Convert to Hán-Nôm," which lets you convert selected modern Vietnamese Latin script into the Nom or Chinese characters used in ancient Vietnamese.

Firmware update 1.1.2 of Nov. 9, 2007 reportedly adds French, German, and Italian user interfaces plus UK, French, German, and Italian keyboards/predictive typing. There is an "Asian Fonts" setting for Chinese or Japanese, the purpose of which is unclear, since it apparently does not enable input for those languages. The latest iPhone User Guide on Apple's site (p. 19) indicates this device has the same language capabilities as the iPod Touch, but that is not yet accurate as far as I can tell.

While previous versions of OS X included a way to make custom input methods, it was only good for Chinese and had other limitations. Leopard comes with a generic system for creating input methods that should work with any Unicode font, and thus open new options for typing scripts that don't lend themselves to the usual alphabetic keyboard layout. Essentially all you have to do is produce a tabbed file equating ascii strings to the output you wish to appear, put it into the proper format, and install it in Home/Library/Input Methods. Details can be found here.

Note that the Apple article example is only for the .inputplugin format. For the .cin format, see this page.

Wednesday, October 31, 2007

One of Leopard's new language features is the addition of a cool Japanese dictionary alongside the English one in Dictionary.app. But I see that in the Xcode stuff, in Developer/Extras, there is also now a Dictionary Development Kit. Assuming it is not too hard to use, this could open the door to the creation by ordinary users of all kinds of additional dictionaries in other languages.

Monday, October 29, 2007

Playing around with TextEdit, I've discovered that the new Leopard can apparently display Windows Arabic fonts correctly. Previously trying to select such fonts would result in disconnected and wrongly-shaped glyphs, so that only Apple's Geeza Pro or other AAT fonts could be used. This was especially a problem in Safari when the user had installed MS Office, which included Arial and Times New Roman fonts with Arabic in them that often got selected by web pages. Presumably this new feature represents expanded OS X support for OpenType font features. My tests indicate that Windows Devanagari and other Indic script fonts are still not supported, however. Also, Windows Arabic fonts do not work in Pages.

Sunday, October 28, 2007

The main Multilingual Mac FAQ Page has now been redone for OS X 10.5 Leopard. As I am still learning all the features of the new system, some parts may not be totally accurate, and they will be updated as soon as possible.

The Tiger page is still available here. It will not be updated further, but may be useful for people running that OS and for OS 9. Because Leopard no longer supports OS 9, the main FAQ page will no longer cover it.

The presence and order of the names in OS X's System Preferences > International > Languages pane determines things like system and application localization, font priority, list sorting, and the Mail encoding menu. Normally by default it contains the available system localizations (18 in Leopard), but you can easily add to this by using the Edit button.

The total number of languages you can select depends to some extent on whether you have installed any extra fonts, but even with those provided by Apple it is remarkable, well over 100. I had 110 on Tiger and in Leopard this has increased to 138. A few of the more unusual entries are Klingon, Latin, Navajo, and Sanskrit. If you have a look for yourself, you will see that the languages appear in their own script, but mousing over a name gives you the English translation. You can see a full list here.

Saturday, October 27, 2007

One very nice new feature of Leopard is that the Help for the Japanese, Korean, and Traditional and Simplified Chinese Input Methods is finally available in English. Previously the only English resources available from Apple regarding these complex systems were old manuals for OS 9. I don't think there is any way to print out all the Help at once, but you can find the files at System > Library > Input Methods (do Control + Click > Show Package Contents on the IM you want) > Contents > Resources > English.lproj.

Unfortunately there is no Help of any sort provided by Apple for the Tamil and Tibetan keyboards, which would also be useful for those not already familiar with the standard ways to input these scripts.

Friday, October 26, 2007

Tibetan: This is the primary new script included with Leopard, covered by the fonts KaiLasa and Kokonor, with 3 input methods provided.

Georgian: Included in the newly-added (but actually old) Windows Arial Unicode font. (This also contains Bengali, Oriya, Telugu, Kannada, Malayalam, and Lao, but Leopard has been programmed to ignore them, perhaps because they would not display correctly.)

Shavian: Apple Symbols now has this in addition to Deseret.

No keyboard layout has been provided for Georgian, Shavian, or Deseret.

Reflecting the new Russian and Polish OS localizations, the number of fonts which include Cyrillic and the Latin characters needed for Polish has been significantly increased, and includes Marker Felt and Hoefler Text among others.

In the Apple forums some users have reported that Leopard's Latvian keyboard layout has a non-functioning ' deadkey for making the special characters which that language requires. It was OK in Tiger. For a replacement, try downloading and installing LatvianT.keylayout from my iDisk.

After installing Leopard, I found the following totally new keyboards: Tibetan (3 options), Jawi, Kazakh, and Uighur. There is no Kurdish layout, although this language is mentioned as being supported in Apple's Leopard "new features" page.

Other new keyboards which add options to what was already available in Tiger are: Traditional Chinese Zhuyin, Vietnamese Unikey (4 options), Arabic PC, Russian PC, Persian QWERTY, Sami PC, Norwegian Sami PC, and Swedish Sami PC. Traditional Chinese Pinyin is reportedly a totally new version of what went by the same name in 10.4.

Tuesday, October 16, 2007

Linear B was used to write an early form of Greek called Mycenaean. Although there is a keyboard for this script on my iDisk, the layout is arbitrary and not of much practical use. Thanks to Alessandro Vatri there is now a much more professional version, based on phonetic input, available from this page:

Saturday, September 29, 2007

Someone in the Apple forums reported that when using Safari to view certain web sites, all accented Latin characters were being replaced by Hebrew. Normally the culprit for such behavior should be a non-Unicode Hebrew font with the same name as that required by the web page code, which was Arial in this case. But it seemed that no such font could be found anywhere on the machine.

It turns out that the old Windows font Web Hebrew AD, with the filename wehad.ttf, is in fact just such an animal, replacing accented Latin by Hebrew according to the Win-1255 encoding. When examined with a font editor, you can see that its internal name is actually Arial, so that Safari can't tell the difference. Mystery solved.
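The substitution is easy to reproduce outside the browser. Below is a minimal Python sketch, assuming the page's accented letters are stored as single Windows-1252 bytes; the function name is made up for illustration. It shows which Hebrew letter lands on each accented Latin position when the same bytes are read as Win-1255.

```python
def misread_as_hebrew(text: str) -> str:
    """Encode Latin text as Windows-1252, then read the same bytes as Windows-1255."""
    raw = text.encode("cp1252")
    # Bytes 0xE0-0xFA are mostly accented Latin letters in cp1252,
    # but the Hebrew letters alef through tav in cp1255.
    return raw.decode("cp1255", errors="replace")


if __name__ == "__main__":
    # The final letter of "café" comes out as the Hebrew letter yod (U+05D9).
    print(misread_as_hebrew("café"))
```

Plain ASCII letters are identical in both encodings, which is why only the accented characters on those pages turned into Hebrew.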

Friday, September 28, 2007

iPhone firmware update 1.1.1 of Sept. 27, 2007 adds the capability to make accented Latin characters to the iPhone keyboard. If you press and hold a letter, a pop-up menu will appear where you can choose an accented or other variation.

Oddly this update fails to duplicate the capabilities provided recently in the iPod Touch for localizing the user interface, predictive typing other than English, and for switching among various layouts or input in Japanese.

Sunday, August 19, 2007

An earlier article mentioned a virtual keyboard that can be used for inputting some European languages plus Greek and Russian on the iPhone, whose built-in keyboard is English only. Now there is one for Japanese. To use it, point your iPhone or other browser to this page:

Friday, August 17, 2007

I already have two Windows Arabic keyboard layouts on my iDisk, but it seems they may not be exactly right, so I have now put up another one which should match that provided with Windows Vista, seen here (use Opera if it doesn't display right in your current browser):

The new layout is named WinVArabic101.keylayout and can be downloaded directly here:

Thursday, August 9, 2007

The updated version of iLife, issued August 7, includes new Russian and Polish localizations for all its components -- iPhoto, iWeb, iDVD, iMovie, and GarageBand. These are in addition to the usual 15 that have been customary for OS X and most of its apps.

Strangely, the updated iWork (Pages, Numbers, Keynote), issued the same date, continues to have only a truncated set of 8 localizations -- Dutch, English, French, German, Italian, Japanese, Spanish, and Simplified Chinese.

Wednesday, July 11, 2007

Georgian was made part of Unicode over 10 years ago. Though OS X has never included fonts or keyboards for this script, some have been available via downloading from the internet to enable a basic level of support. Thanks to Reno Siradze there is now a much more advanced and refined set of tools for Georgian, which you can get here:

Note that it is customary in Georgian headings, ads, captions, and titles to use a special capitalized version of the normal unicameral alphabet (called the Didi style of Mkhedruli or the Mtavruli style of Mkhedruli). Inputting this style requires switching to a special font, although some fonts like those from BPG put this type style in the 10A0 range in place of what is supposed to be there.

Saturday, July 7, 2007

No doubt Apple will eventually update the iPhone OS to include keyboards that enable input beyond the current ASCII English. But pending that it's possible, with a skillful combination of JavaScript virtual keyboards and the HTML "mailto" function, to create online keyboards that let you send email from the iPhone in a variety of languages. For an example, see this app.

Friday, June 29, 2007

I don't have an iPhone, but reports from users confirm that the OS in the device released June 29 has language capabilities that fall short of those in OS X Tiger. The user interface and keyboard input are English only. Text display in the browser (presumably the same as in email and other apps) includes W. European, E. European, Greek, Russian, Chinese, Japanese, and Korean. But no font is provided for Vietnamese, Arabic, Hebrew, Hindi, Thai, or Tamil, all of which are part of the full Tiger OS, or Tibetan, rumored to be added in Leopard. This particular iPhone is, of course, intended only for the US market.

Monday, June 25, 2007

If you would like to have the months and days of the week in another language in the calendars of iCal, go to System Preferences/International/Formats and change the Region to the language of your choice. Check the Show All Regions box to add more languages if you want. When iCal restarts the names should be changed. What might surprise you is that this can be done even in languages for which OS X has no localization, e.g. Arabic. Display will not be correct, however, for languages where you do not have an appropriate font installed.

Wednesday, June 20, 2007

Since the new Windows Safari beta has a number of bugs related to non-ascii text, and there also remain many questions about what the Safari used in the iPhone will be able to display, I have posted a simple web page that contains short samples of the scripts which often cause problems here.

http://homepage.mac.com/thgewecke/ltp.html

At the bottom of this page there is a graphic of what should be seen at the top, so users can easily compare the two and see what may not be displaying right.

Tiger should display everything correctly except Tibetan (unless you have installed the special font from XenotypeTech). Leopard comes with a Tibetan font and should also display that sample in proper fashion.

Wednesday, June 13, 2007

Russian is not listed in the Tiger tech specs as an available localization, and neither my retail PPC install nor updates to 10.4.9 have it. But I recently got an Intel iMac with 10.4.7, and this does in fact include Russian as an option for menus and dialogues. I would assume all versions of Leopard should have this.

Friday, May 18, 2007

Of the 13 most widely spoken languages of India, OS X comes with built-in support for 6 of them via its Devanagari, Gujarati, Gurmukhi, and Tamil fonts and keyboards. You can find info on support for another three (Telugu, Kannada, and Malayalam) in my earlier note here. Below is info on using the remaining 4 -- Bengali, Urdu, Oriya, and Assamese.

For Bangla/Bengali, you can download OS X fonts and keyboard layout from this site. The Bengali script is also used for Assamese.

For Urdu, OS X comes with the required Arabic script font, but you need to install a keyboard layout. One can be obtained here or from my iDisk.

For Oriya, you will need to download a font like Utkal listed at this site. Correct display is only possible in OpenOffice/X11. A (very) experimental keyboard layout is on my iDisk.

Wednesday, May 16, 2007

I've recently updated the trouble-shooting section of my main web site, where you can find brief answers to several dozen of the most common questions Mac users have when working in languages other than English.

Saturday, May 12, 2007

The Apple TV, a device which lets you put digital content from Macs and PCs running iTunes onto your TV screen, uses a customized version of OS X which apparently has somewhat different language capabilities than the full one. Although not mentioned in Apple's specs for the product, you can choose the language for menus and dialogues in Settings > Language, where the choices are English, Danish, Spanish, Korean, Japanese, Traditional and Simplified Chinese, Finnish, French, Dutch, Russian, German, Italian, Norwegian, and Swedish. This adds Russian and subtracts Portuguese from the OS X 10.4 list.

Regarding display, this note gives a somewhat odd list of languages *not* supported, presumably because the fonts are not provided, but surely there are more, including Vietnamese, Tamil, and Devanagari and other Indic scripts. Hopefully this list will be reduced via software updates in the future.

Reports indicate that input, for example in search fields, probably does not support languages other than English.

Monday, May 7, 2007

Lao is closely related to Thai, and these two scripts are the only ones encoded by Unicode in visual rather than logical order -- i.e. some vowels are typed before consonants, even though they actually follow them when pronounced.

OS X does not come with either a font or keyboard for Lao. The best font to use is Saysettha Unicode, and you can download a keyboard layout (LaoSTEA) from my iDisk.

Some diacritics may not display in totally correct locations on a Mac with the available fonts. Also Lao does not separate words by spaces, and OS X has no Lao dictionary, so linebreaking will not work automatically. When inputting you can use Shift + Space to put a zero-width space (U+200B) after words to help lines break correctly.
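If the text has already been split into words by some other means, the zero-width spaces can also be added in bulk. The following Python sketch assumes the word boundaries are known in advance; the helper name and the two sample words are only illustrative.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break


def join_with_zwsp(words):
    """Join already-segmented words with invisible break opportunities."""
    return ZWSP.join(words)


if __name__ == "__main__":
    text = join_with_zwsp(["ສະບາຍດີ", "ຂອບໃຈ"])
    # The string looks unchanged on screen but is one code point longer.
    print(len(text), text)
```

This does not do any segmentation itself; it only mirrors what typing Shift + Space after each word does with the LaoSTEA layout.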

Saturday, April 28, 2007

I often field questions on how to get Hebrew to input and display correctly, but rarely does anyone ask me how to get rid of it. The primary example of the latter is a strange problem that occurs with iTunes, where Hebrew replaces normal English in some circumstances. You can see an example here.

I've never been able to figure out the details of why this occurs, but the cure, which someone else found by chance, is normally to remove any copies of the font Lucida Grande.ttf (NOT Lucida Grande.dfont) found on your machine.

Recently someone in the Apple forums had a problem with text he knew was in Russian but which would only display in Latin. It was just one word, "Dybvfybt". Since no encoding currently used for Cyrillic maps this script to ASCII, the text must have been made with some kind of custom font that did just that. But normally anyone making such a font would map the Cyrillic characters to their Latin equivalent, so the text should be recognizable as transliterated Russian, which it wasn't.

This stumped me, until I wondered whether someone had made a font designed especially to allow typing according to the standard Russian keyboard layout but on a Western keyboard. Instead of the QWERTY used in the U.S., the Russian layout goes YTsUKEN. Sure enough, typing Dybvfybt according to my U.S. keyboard, but with the layout set to Russian, produced a recognizable word, Внимание = Vnimanie = Attention.

It's the first time I've seen Cyrillic text encoded that way, and I haven't yet found an available font that supports it.
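The remapping can be checked without switching keyboards. Here is a short Python sketch that pairs each key of the U.S. layout with the letter in the same position of the standard Russian layout. It covers only the three unshifted letter rows (ё and the shifted punctuation keys are left out), and the names are my own.

```python
# Keys of the U.S. layout and the Russian letters that sit on the same keys.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"

LAYOUT = dict(zip(QWERTY, JCUKEN))
# Add capitals only for real letter keys, so "[" and ";" keep their lowercase mapping.
LAYOUT.update({q.upper(): c.upper() for q, c in zip(QWERTY, JCUKEN) if q.isalpha()})


def retype_as_russian(text: str) -> str:
    """Return what the same keystrokes produce with the layout set to Russian."""
    return "".join(LAYOUT.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(retype_as_russian("Dybvfybt"))  # Внимание
```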

Tuesday, April 24, 2007

Kyrgyz is a Turkic language spoken in Kyrgyzstan and a few other areas. While both the Arabic and Latin scripts have been used to write it in the past, Cyrillic is now the standard. The alphabet is essentially the same as Russian but with 3 additional characters: Ң, Ү, Ө.

OS X comes with fonts that cover Kyrgyz, but with no keyboard layout. On my Dropbox you can find two versions: KyrgyzCYR, which is the same as used on Windows machines, and KyrgyzPH, which is modeled on the Apple Russian-Phonetic layout and may be easier for people used to QWERTY. The extra three characters are found on the Option and Option + Shift levels for н, у, and о.

https://dl.dropboxusercontent.com/u/46870715/k/KyrgyzCYR.keylayout

https://dl.dropboxusercontent.com/u/46870715/k/KyrgyzPH.keylayout

Switching from Cyrillic back to Latin (which was used for a number of years before 1940) is reportedly being considered.

Tuesday, April 17, 2007

It's not unusual to have to do publishing or design work in languages and scripts you do not understand. In some scripts, if you use the wrong font or app, you can easily generate nonsense without realizing it. So having a native speaker (or at least someone who knows the script) check things can be important. Below are some examples of common pitfalls you may run into. In particular, be aware that MS Word for Mac, even the 2008 version, does not yet support correct display of these scripts.

Arabic script, used for many other languages than just Arabic, can wind up disconnected or backwards.

Indic scripts, used for example in Hindi/Sanskrit, Gurmukhi, Gujarati, Tamil, and Tibetan can easily wind up with letters in the wrong order, overlapping, or uncombined (when combination is mandatory).

Thai and other S.E. Asian languages don't use spaces to separate words, so line breaking can occur in totally wrong places. Apple Cocoa apps can access a dictionary built into OS X that enables them to do Thai line breaking fairly well, but MS, Adobe, and other apps cannot.

Monday, April 16, 2007

Catalan is a co-official language along with Spanish in the region on the Eastern edge of Spain. It uses the same alphabet as Spanish except it has no ñ and adds the digraph l·l (ela geminada).

OS X 10.3 had a Catalan keyboard layout which was identical to Spanish-ISO except for the flag icon. This seems to be no longer present in my 10.4. From my iDisk you can download the Catalan folder, which contains the .keylayout file and the .icns file with the right flag. The layout is almost the same as Spanish-ISO. The middle dot · (U+00B7) which is the standard for producing l·l is at shift + 3. But also included are ŀ (U+0140) and Ŀ (U+013F) at option + o and option + shift + o, in case you want to use these versions for local printing or similar purposes.
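If text typed with the single-character versions later needs to be brought back to the standard spelling, a simple replacement is enough. This Python sketch is an illustration only; the function name is hypothetical.

```python
MIDDLE_DOT = "\u00b7"        # · the standard character for l·l
L_WITH_DOT = "\u0140"        # ŀ
CAPITAL_L_WITH_DOT = "\u013f"  # Ŀ


def to_standard_ela_geminada(text: str) -> str:
    """Replace ŀ and Ŀ with a plain l or L followed by the middle dot."""
    return (text.replace(L_WITH_DOT, "l" + MIDDLE_DOT)
                .replace(CAPITAL_L_WITH_DOT, "L" + MIDDLE_DOT))


if __name__ == "__main__":
    print(to_standard_ela_geminada("co\u0140legi"))  # col·legi
```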

Thursday, April 12, 2007

Kazakh is a Turkic language spoken in Kazakhstan and a few other areas. While both the Arabic and Latin scripts have been used to write it in the past, Cyrillic is now the standard. The alphabet is essentially the same as Russian but with 9 additional characters: Ә, Ғ, Қ, Ң, Ө, Ұ, Ү, Һ, І.

OS X comes with fonts that cover Kazakh, but with no keyboard layout. On my iDisk you can find two versions: KazakhCYR, which is the same as used on Windows machines, and KazakhPH, which is modeled on the Apple Russian-Phonetic layout and may be easier for people used to QWERTY.

Switching from Cyrillic back to Latin (which was used for a number of years before 1940) is reportedly under consideration by Kazakhstan.

Sunday, April 1, 2007

Those wishing to do serious desktop publishing, including books, in scripts like Arabic, Devanagari, or Tamil in OS X are handicapped by the fact that standard applications like Word, InDesign, and Quark cannot yet render OS X Unicode fonts for these languages correctly. Alternatives are the limited TextEdit or Pages (which is buggy for Arabic), or using Windows fonts with OpenOffice. There does exist a special ME version of InDesign for Arabic/Hebrew.

Recently I came across a DTP program called iCalamus that appears to combine a wide range of DTP capabilities with proper rendering of complex scripts. Users with this kind of requirement may want to download the free trial here and give it a try. Unfortunately iCalamus does not yet support direct input and editing of Indic -- you have to copy/paste your text from TextEdit or another app where this can be done. A somewhat similar app which does support direct input is Create.

Wednesday, March 28, 2007

OS X includes a special keyboard layout for the Hawaiian Language (ʻŌlelo Hawaiʻi), which puts the macron vowels which it uses (ā ē ī ō ū) on the Option key of the normal letter. This layout also replaces the normal apostrophe (') with the Unicode "modifier letter turned comma" ʻ (U+02BB) used by Hawaiian to represent the glottal stop (ʻokina). (For the normal apostrophe you can type Option + ', and for the left single quote you type Option + ].)

If you put ʻŌlelo Hawaiʻi at the top of the list in System Preferences/International/Languages you may be surprised at how your files sort by name, since they will follow the order of the Hawaiian alphabet, with the English letters Hawaiian doesnʻt use tacked on at the end: a e i o u h k l m n p w ʻ b c d f g j q r s t v x y z.
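To get a feel for that ordering, here is a rough Python sketch of a sort key built from the alphabet listed above. It is not how OS X does it internally; treating macron vowels like their base vowels and ignoring case are my own assumptions, and the sample names are arbitrary.

```python
import unicodedata

# Hawaiian letters first, then the ʻokina (U+02BB), then the remaining English letters.
HAWAIIAN_ORDER = "aeiouhklmnpw\u02bbbcdfgjqrstvxyz"
RANK = {ch: i for i, ch in enumerate(HAWAIIAN_ORDER)}


def hawaiian_key(name: str):
    """Sort key following the Hawaiian alphabet order."""
    # Decompose so that ā becomes a + combining macron, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name.lower())
    letters = [ch for ch in decomposed if not unicodedata.combining(ch)]
    # Anything outside the alphabet (digits, punctuation) sorts last.
    return [RANK.get(ch, len(RANK)) for ch in letters]


if __name__ == "__main__":
    names = ["Zebra", "Kona", "Hilo", "Apple", "\u02bbOhana", "Wailuku"]
    print(sorted(names, key=hawaiian_key))
```

With these sample names, ʻOhana sorts after Wailuku but before Zebra, matching the position of ʻ in the list above.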

Tuesday, March 20, 2007

Because of the different technologies used by Macs and PCs for complex scripts (AAT and OpenType respectively), it has long been necessary to use different fonts on the two platforms for languages written in Arabic script. Also AAT fonts are uncommon, so Arabic script on a Mac (unless you used particular apps like Mellel or OpenOffice) was essentially limited to the Geeza Pro font supplied with OS X and two fonts (Scheherazade and Lateef) provided by SIL.

Thanks to the Iranian Mac User Group (IRMUG), there is now available a new set of Arabic script fonts, XBZar, which contain both technologies and should thus display the same on both platforms and in all applications that support Unicode. XBZar, which has regular, bold, italic, and bold-italic typefaces, can be downloaded here.

Tiger (but not Panther) has some bugs that cause problems with this font. If you have an Intel Mac, you must be running 10.4.9 for it to work right. On a PPC Mac, 10.4.2 is sufficient.

Friday, March 9, 2007

Etruscan is an ancient extinct language that preceded Latin in Italy and whose writing system is one of the earliest forerunners of the Latin alphabet. This script is in Unicode under the name "Old Italic." MPH 2B Damase and Code2001 are free fonts that cover it, and a keyboard layout can be downloaded from my iDisk or here.

Etruscan was often written right-to-left, and the fonts mentioned above reflect this direction in the way the letters are oriented. To input this correctly you should use the application Mellel, set for RTL. An alternative is to use another Unicode-savvy app like TextEdit, but precede your text with the Unicode character 202E to force the correct direction. The keyboard on my iDisk has 202E on the ` key and 202C (used to end an RTL segment) on the = key. These should also be used in web pages, but not all browsers may display the correct direction.
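For text that is generated rather than typed, the same two control characters can be added programmatically. A minimal Python sketch, with a made-up helper name and three Old Italic letters as sample text:

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE: forces the following text to run RTL
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: ends the overridden segment


def force_rtl(text: str) -> str:
    """Wrap text so that Unicode-aware apps lay it out right-to-left."""
    return f"{RLO}{text}{PDF}"


if __name__ == "__main__":
    # Three letters from the Old Italic block (U+10300 to U+10302).
    sample = "\U00010300\U00010301\U00010302"
    wrapped = force_rtl(sample)
    print([f"U+{ord(ch):04X}" for ch in wrapped])
```

Whether the result actually displays right-to-left still depends on the app or browser, as noted above.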

Monday, March 5, 2007

Cuneiform is a script used to write a number of languages of the ancient Middle East, such as Sumerian, Akkadian, and Hittite. Since version 5.0 Sumero-Akkadian Cuneiform is part of Unicode, but until recently there have not been any fonts publicly available. Now you can download one for the Hittite version of the script here.

Cuneiform was used for such a variety of languages over such a long period that the form of its glyphs showed considerable evolution over time and place. Thus different fonts are required to correctly display the forms used in Neo Sumerian (which appear in the Unicode charts), and those used in Old, Middle, and New Babylonian and Assyrian, Hittite or Elamite.

The only way to input Cuneiform at present is via the Character Palette or copy/paste.

Tuesday, February 27, 2007

Recently in the Apple forums a user complained that he couldn't do Romanian in MS Word 2004 using the standard fonts Arial and Times New Roman. At first I was puzzled, since MS specifically provides special enlarged versions of these fonts with Word that contain Arabic, Hebrew, Greek, Russian, and Vietnamese in addition to more common Latin languages. But sure enough, when I looked more closely, I found that Version 3.05 of Arial and Times New Roman are missing the letters s-comma and t-comma required by Romanian.

It turns out that updated versions of these fonts with the Romanian characters added are available to WinXP users via download from MS since Dec. 2006, and that the new versions are also automatically included in WinVista. Unless you can get copies of these, you'll have to use other fonts for Romanian in OS X -- Courier, Geneva, Helvetica, Hoefler, Lucida Grande, Monaco, Palatino, and Times should all work, along with various downloadable fonts that cover the whole Latin range, like Doulos, Gentium, Charis, and Everson Mono.

Monday, February 26, 2007

If you want to type Spanish on your Mac and are used to Windows, you may run into problems figuring out which keys to use for accented characters and the inverted punctuation marks. That's because, out of the 6 commonly-used Spanish keyboard layouts provided on the two platforms, only two (Mac Spanish ISO and Windows Spanish) are the same. How this situation came about, I have no idea. You can see graphics comparing the six here.

My understanding is that many Spanish users prefer the Windows US International or Latin American layouts. For the US International layout, a version can be downloaded from this page, and for Latin American (winlatinamerican.keylayout) you can go to my iDisk.

Of course you can also type Spanish on the normal US layout: The deadkey for the acute accent is Option + e and for the tilde it is Option + n, with ¿ at Option + Shift + / and ¡ at Option + 1.

Sunday, February 25, 2007

If you try to type Brazilian Portuguese, you might be puzzled by the OS X keyboard layout that comes with the Brazilian flag attached to it. It's neither Portuguese nor Brazilian, but simply the normal US layout, where you type accented characters using Option deadkeys.

As I understand it, Brazilians are used to typing with a layout called ABNT or with the layout called US International PC. You can get an ABNT layout (winbrazabnt.keylayout) here.

Saturday, February 24, 2007

If you are having trouble browsing Unicode Vietnamese web pages, like the BBC Vietnamese site, seeing boxes or question marks instead of characters with two diacritics, it probably means you need an extra font. The problem arises because certain Vietnamese sites stipulate the use of the Arial font in their html code, but the Arial (2.60) that comes with OS X 10.4 does not contain the precomposed Vietnamese characters in the Latin Extended Additional Unicode block. Most people do not notice a problem, because at some point they installed a trial or other version of MS Office2004, which puts a more complete version of Arial (3.05) in your Home/Library/Fonts folder. If you *do* have missing characters, try to find a copy of this font for your machine.
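To see which characters in a piece of Vietnamese text depend on that block, you can test their code points directly. The Python sketch below uses a function name of my own choosing; the block range U+1E00 to U+1EFF is the Latin Extended Additional block.

```python
import unicodedata


def latin_extended_additional_chars(text: str):
    """Return the distinct characters of text that fall in U+1E00..U+1EFF."""
    # Normalize to NFC first so base letter + combining marks become precomposed forms.
    composed = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in composed if 0x1E00 <= ord(ch) <= 0x1EFF})


if __name__ == "__main__":
    for ch in latin_extended_additional_chars("Ti\u1ebfng Vi\u1ec7t"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Characters reported here are the ones a font lacking this block, such as the older Arial, would show as boxes or question marks.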

Wednesday, February 21, 2007

Yoruba is one of the major languages of Nigeria, and many speakers have also emigrated to the UK, Brazil, and the US. It is written with the normal English alphabet plus a few extra "dotted" letters: Ẹ/ẹ, Ọ/ọ, Ṣ/ṣ. Tone marks are also required: grave ` for low and acute ´ for high. Lucida Grande is probably the best font to use.

The dotted letters can be made with the US Extended keyboard layout, using Option + x, followed by the base letter. Tones can then be added using Option + Shift + ` and Option + Shift + e. You can also download a custom Yoruba keylayout from my iDisk. This puts the dotted letters on the 1, 2, and 3 keys and all the vowels are deadkeys. You type the vowel and then choose a) 4 for grave accent/low tone, b) 5 for acute accent/high tone, and c) any other key for no accent/mid tone. The Naira symbol is on the = key. This layout also has the special characters needed for Hausa and Igbo (Ɓ ɓ Ɗ ɗ Ƙ ƙ Ƴ ƴ Ṅ ṅ Ị ị Ụ ụ).
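For anyone generating Yoruba text in a program rather than typing it, the same letters can be built from a base letter plus Unicode combining marks. The following Python sketch is only an illustration: the function name and its parameters are my own invention, not part of any keyboard layout.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
GRAVE = "\u0300"      # COMBINING GRAVE ACCENT (low tone)
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT (high tone)

def yoruba_letter(base, dotted=False, tone=None):
    """Build a Yoruba letter from a base letter (hypothetical helper).

    tone may be "low", "high", or None for mid tone (no mark).
    """
    marks = ""
    if dotted:
        marks += DOT_BELOW
    if tone == "low":
        marks += GRAVE
    elif tone == "high":
        marks += ACUTE
    # NFC composes the sequence into precomposed characters where they exist.
    return unicodedata.normalize("NFC", base + marks)

if __name__ == "__main__":
    for args in [("e", True, None), ("o", True, "high"), ("a", False, "low")]:
        letter = yoruba_letter(*args)
        print(letter, [f"U+{ord(c):04X}" for c in letter])
```

Note that a dotted vowel with a tone mark has no single precomposed code point, so the result stays as the dotted letter followed by a combining accent. That is one reason font choice matters for Yoruba.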

Sunday, February 18, 2007

Hausa is an official language in northern Nigeria and widely used in other Muslim areas of West Africa as well. The standard Latin orthography uses the normal English alphabet, plus the hooked letters Ɓ ɓ Ɗ ɗ Ƙ ƙ Ƴ ƴ. Of the fonts included with OS X, only Lucida Grande has these, but others can be downloaded. In addition, some publications have used a dot underneath instead of the hooked letters, and some dictionaries and teaching aids may also mark vowel length using a macron ¯ and vowel tones using grave ` (low), acute ´ (high), and circumflex ˆ (falling).

All of these can be made using the US Extended keyboard layout, a chart for which is here. The hooked letters are made using Option + Shift + . (period), followed by the base letter. Another solution is to use the custom Hausa keyboard layout from my iDisk. The hooked letters are located on the Option version of the base letter.

Hausa has also been written in Arabic script, and examples of that can be found on Nigerian banknotes.

Friday, February 16, 2007

Mac OS has long provided a language translation facility, first via Sherlock and, starting with OS X 10.4, the Translation Widget in Dashboard. In both cases there is no translation application on your machine: Sherlock and the Widget are simply interfaces to an online translation service, in this case Systran, which can also be accessed via any web browser.

Anyone who uses this to translate foreign text into his native language will quickly see the limitations. Relying on it to translate your native language into one you don't yourself know, in the expectation of making yourself correctly understood, is not realistic. The reasons for that lie in the inherent shortcomings of Machine Translation. Only humans can really do it right.

Monday, February 12, 2007

There are various languages, scripts, and characters which have not yet made it into the Unicode standard. If you are interested in where something stands in that regard, the best resource is the Proposed New Scripts page at Unicode.org. There you will currently find about 4 dozen in 4 categories -- Exploratory Stage, Committee Review, ISO Balloting, and Pre-Publication. Another useful page is the Roadmaps of proposed allocations of Unicode codepoints. If you want to see what characters might be added to existing scripts, the right place to look is the Character Pipeline.

Saturday, February 3, 2007

The Dravidian languages, which form a group apparently unrelated to any other, are spoken in Southern India and in Sri Lanka. OS X 10.4 comes with support for Tamil, and you can find info on how to use it here. For typing in the other main family members -- Kannada, Malayalam, and Telugu -- you need to download and install fonts and keyboards, using either the commercial kits available from XenoTypeTech or the free stuff listed below:

Web sites in these languages may still be using custom encodings instead of Unicode. In that case you will need to download whatever font they require (e.g. manorama for manoramaonline.com) and you may need to try different browsers as well.

Tuesday, January 30, 2007

Kurdish (كوردی) is the official language of the Kurdistan Region of Iraq, and ever-increasing amounts of Kurdish text are being produced in Arabic script. While OS X 10.4 does not include a Kurdish keyboard layout, there is one you can download on my iDisk. A large number of free Kurdish fonts have recently been made available from KurdITGroup. Unfortunately these are for Windows, but you should be able to use them correctly on a Mac with the applications Mellel or OpenOffice. For other apps, the normal OS X default font, Geeza Pro, probably does not do totally correct Kurdish orthography, and it is better to use one of the SIL fonts, Lateef AAT or Scheherazade AAT.

Friday, January 26, 2007

For most people, the web browser is the most frequently used piece of software, and having it in one's native language would be nice. Different browsers offer different possibilities in that regard. Safari comes with the 15 localizations standard in OS X, and you may be able to find other non-official versions by searching MacUpdate. Opera offers the same 15. Firefox has the best selection, with over 3 dozen localizations available, including Arabic, Hebrew, Basque, Kurdish, Mongolian, Greek, Russian, Georgian, Lithuanian, Punjabi, and Turkish. OmniWeb has only Danish, Dutch, English, French, German, Japanese, and Swedish. Camino has Danish, Dutch, French, German, Italian, Japanese, Korean, Lithuanian, Polish, Portuguese, Russian, Slovak, Spanish, and Swedish. Mozilla Suite has Traditional Chinese, French, German, Polish, Swedish, and Turkish. iCab has 8 of the OS X languages plus Russian. Netscape and AOL seem to be only in English.

Thursday, January 25, 2007

If you are a professional translator using a Mac, you may be interested in tools for CAT (Computer Assisted Translation). One of these is a specialized word processor that makes provision for keeping source and target text carefully organized and has a "translation memory" to automatically help the translator use earlier work to process new text. Two such applications for OS X users are AppleTrans and OmegaT. An excellent survey of the field can be found at the Wikipedia CAT page.

One important area for translation work is the localization of OS X applications, i.e. providing the means for the menus and dialogues of a program to be displayed in the user's native language. For this, special tools are required to extract and replace the parts of the application containing the texts. Apple provides the programs AppleGlot and ADViewer for this purpose. There are also 3rd-party alternatives including LocFactoryEditor, iLocalize, and Localization Suite.

Monday, January 22, 2007

A long time ago in a galaxy far, far away, elementary education drilled correct spelling into you so thoroughly it was hard to forget it. These days computer spell-checking allows those brain cells to be used for other tasks. OS X comes with spell-checking for Australian, British, and Canadian English, German, Spanish, French, Italian, Dutch, Portuguese, and Swedish. (Note: Leopard adds Danish and Russian.) But what if you need a different language?

CocoAspell is one answer. First you install it, then download the dictionary you want from the list of several dozen. Decompress the file with StuffIt Expander, then just put the resulting folder in /Library/Application Support/cocoAspell/. Enable your dictionary by going to System Preferences/Spelling (a new item created by CocoAspell).

Friday, January 19, 2007

Japanese is no doubt one of the most complex of all languages to write, since it can use no fewer than four different scripts: Kanji (Chinese characters), Hiragana, Katakana, and Latin.

The Hiragana and Katakana syllabaries play an especially important role, because their Latin equivalents are used for Japanese computer input (with subsequent conversion to Kanji as appropriate) via a Latin keyboard, and they are also used to represent the pronunciation of non-Japanese words or possibly unfamiliar Kanji.

There are well over 150 kana syllables which can be created by the Mac Kotoeri Japanese Input Method, some fairly rare, and info on how to make all of them is buried in the Japanese-only Kotoeri Help. Anyone who needs this in more usable form can find a copy of the list here, which you can enlarge in your browser or drag onto your desktop to print for reference.

Sunday, January 14, 2007

Navajo uses the Latin letters of English plus a number of extras that make it not that easy to render in ordinary print: Łł Ńń Áá Éé Íí Óó Ąą Ęę Įį Ǫǫ Ą́ą́ Ę́ę́ Į́į́ Ǫ́ǫ́. The last four do not have any precomposed version in Unicode, which means some apps will probably not place the two required diacritics quite correctly. In addition, Navajo makes heavy use of the glottal stop character, ʼ , which is probably best represented by the modifier letter apostrophe (U+02BC). Some people have used the ASCII apostrophe (U+0027) or the right single quote (U+2019) instead, but these are punctuation marks rather than real letters and it is better to avoid them.
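If you have existing Navajo text that uses the punctuation look-alikes, a small script can tidy it up. The Python sketch below is only a rough illustration with a hypothetical function name. It assumes every U+0027 and U+2019 in the text is meant as a glottal stop, which will be wrong for text that also contains real quotation marks or English contractions.

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE

# Map the two punctuation look-alikes onto the modifier letter.
# Assumption: all of them stand for the glottal stop in this text.
LOOKALIKES = str.maketrans({
    "\u0027": MODIFIER_APOSTROPHE,  # ASCII apostrophe
    "\u2019": MODIFIER_APOSTROPHE,  # right single quotation mark
})

def normalize_navajo(text):
    """Replace apostrophe look-alikes and compose accents (hypothetical helper)."""
    replaced = text.translate(LOOKALIKES)
    # NFC composes base + marks where a precomposed character exists.
    # Letters with ogonek plus acute stay as ogonek letter + combining acute.
    return unicodedata.normalize("NFC", replaced)

if __name__ == "__main__":
    word = normalize_navajo("a\u0301\u0328")  # a + acute + ogonek, typed in either order
    print([f"U+{ord(c):04X}" for c in word])
```

The demo shows why the last four letters are awkward: even after composition they remain two code points, so correct display depends on how the font and app position the second accent.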

Typing Navajo can be done using the US Extended keyboard layout in OS X. The glottal stop ʼ is made via Option + i, then space.

To type Navajo on an iOS device, download the Keyman app from the App Store.

Navajo characters (with the addition of ṉ) can also be used for the closely related language Western Apache.

Thursday, January 11, 2007

Wednesday, January 10, 2007

Watching the very impressive demo of the new Apple iPhone, I was wondering whether the version of OS X (as well as the email client and the browser) incorporated in it will support the same multilingual features that 10.4 does. I haven't seen any mention of this in reports from MacWorld or elsewhere so far. It would be cool to point the browser at a test site like this one and see what all can at least be displayed.

Monday, January 8, 2007

Tagbanwa is a language spoken by a few thousand people in the Philippines. Although its script became part of Unicode with version 3.2, only recently do we have a font, thanks to Samuel Thibault. You can also find an OS X keyboard for Tagbanwa on my iDisk.

Saturday, January 6, 2007

While the iPod still lacks the capability to display Arabic, Hebrew, Hindi, Thai, Vietnamese, and various other complex scripts, it is fully Unicode-savvy for the 28 languages that are listed in its tech specs. You can display all of them on a single page of plain text as long as it is encoded in UTF-16. If you want to be able to demonstrate this for yourself, download the file ipodlangs16.txt from my iDisk, and put it in the Notes folder of your iPod. This is a selection from the UTF-8 Sampler Page, which provides the phrase "I can eat glass and it doesn't hurt me" in many languages.

The iPod can also display multilingual text in UTF-8 encoding as long as the text starts with a BOM (Byte Order Mark). TextEdit does not create UTF-8 with a BOM, so you need to save the text with a program like TextWrangler which has that option. If you are curious about what a BOM does, see here.
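If you would rather script this than use a text editor, Python's standard "utf-8-sig" codec writes the BOM for you. This is just a sketch; the helper names are hypothetical, and I have not tested the resulting files on every iPod model.

```python
import codecs

def with_utf8_bom(text):
    """Encode text as UTF-8 with a leading byte order mark."""
    # The "utf-8-sig" codec prepends the three BOM bytes EF BB BF when encoding.
    return text.encode("utf-8-sig")

def save_note(path, text):
    """Write text to path as UTF-8 with a BOM (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(with_utf8_bom(text))

if __name__ == "__main__":
    data = with_utf8_bom("\u042f \u043c\u043e\u0433\u0443 \u0435\u0441\u0442\u044c \u0441\u0442\u0435\u043a\u043b\u043e")
    # Confirm the file content would start with the UTF-8 BOM.
    print(data[:3] == codecs.BOM_UTF8)
```

A file written with save_note could then be copied into the iPod's Notes folder in the usual way.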

iPod notes can only be 4K in length, but you can break longer texts into pieces of the right size with links from page to page using a program like Book2Pod.
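The splitting itself is simple enough to sketch in Python. What follows is not how Book2Pod works (I don't know its internals) but a minimal illustration under two assumptions of mine: that "4K" means 4096 bytes of UTF-8, and that you would subtract room for the BOM and any link text yourself. It also cuts between characters rather than at word boundaries.

```python
def split_into_notes(text, limit=4096):
    """Split text into pieces of at most `limit` UTF-8 bytes each.

    Hypothetical helper; never cuts in the middle of a multi-byte character.
    """
    pieces, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        # Start a new piece if this character would push us over the limit.
        if current and size + n > limit:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        pieces.append("".join(current))
    return pieces

if __name__ == "__main__":
    parts = split_into_notes("a" * 5000)
    print([len(p.encode("utf-8")) for p in parts])
```

Counting bytes rather than characters matters here, since non-Latin text usually takes two or three bytes per character in UTF-8.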

Monday, January 1, 2007

Info on the current relatively good state of OS X localization can be found here. MS Win XP has, by contrast, only been available to consumers in one language at a time. But Win Vista, which should be out at the end of January, 2007, promises a new Multilingual User Interface (MUI) which is likely to equal OS X capabilities and also go further by adding Arabic, Hebrew, Russian, Czech, Hungarian, Polish, Turkish, Greek, and possibly more. On the other hand, the new MUI will apparently only be provided with Vista Ultimate (retail price $400) and not with the cheaper Home Basic, Home Premium, and Business editions.

A list of various types of Vista language packs and input keyboards can be found here.

As for the Zune, I understand that its current software only does Latin script, which makes the device very limited indeed compared to the iPod.

I personally don't have much need to read/write Unicode in OS X's Unix command-line environment, but some people do find this useful or essential. Whether you can get it to work satisfactorily depends on the languages and apps you are using and probably also on the fonts you have. Getting started involves setting Terminal's preferences correctly and creating .inputrc and .profile files in your Home directory to change the bash shell default behavior. Details on how to do this can be found here. That may be enough for some purposes. If not, some suggestions for further refinements are here.
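One way to see whether those settings have taken effect is to ask a scripting language what encodings it thinks the session is using. The Python sketch below is only a diagnostic aid with a hypothetical function name; it does not change any settings, and what counts as a "good" result (typically UTF-8 throughout) is my assumption.

```python
import locale
import sys

def terminal_encoding_report():
    """Collect the encodings the current session reports (hypothetical helper)."""
    return {
        # Encoding Python uses when printing to the terminal (may be None if redirected).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the locale settings, e.g. from .profile.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }

if __name__ == "__main__":
    for name, value in terminal_encoding_report().items():
        print(f"{name}: {value}")
```

If any entry shows something other than a UTF-8 encoding, that is a hint the shell or Terminal settings still need adjusting.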