Summary

This project aims to improve CJK (Chinese, Japanese, Korean) support in Ubuntu.

Rationale

As of Breezy, Ubuntu ships no default input method, so ordinary CJK users cannot write in their native language in the Ubuntu desktop environment. Additionally, the default configuration of various applications, and of the desktop as a whole, is poorly suited to Asian users and to users from certain other countries. For example, the default desktop font size is simply too small for CJK users, especially Chinese users. To improve the experience for these users, some packages need to be patched, while others may need additional configuration.

Use cases

Chulsu installed Ubuntu on his laptop and opened Firefox to visit his favorite Korean web forum. He immediately had questions: first, "Why does this page look different from Firefox on Windows on my desktop?"; second, "How do I input Korean to write my reply to the forum?"; and so on. He started searching the Ubuntu Korean wiki and KLDP, and asked his questions there. After several days, he had finally learned how to install font packages, how to configure .fonts.conf in his home directory, how to install and use a Korean input method, and so on. Now he is thinking, "Why is Linux so much more difficult than Windows? If all of this were installed and configured when I installed Ubuntu, that would be the way to go."

Yeonhee loves to listen to her favorite CDs while she works on her writing in OpenOffice. For her one-month trip to Jeju island, she wanted to convert them to MP3, but couldn't find a conversion tool in the Ubuntu installed on her new laptop. Instead, she converted her favorite songs, with MP3 music tags, on her Windows machine, then opened Rhythmbox on the Ubuntu laptop to test them. Now she is looking at song names that are not displayed correctly in Korean and wondering, "How am I going to go on my trip?"

Miyoung wanted to try Linux for her class, but she had never used Linux before. Her classmate gave her an Ubuntu CD, so she was happy. But on the way home, she began to worry about installing from the CD onto her desktop, which already had Windows installed, and decided: "OK, I am going to search the Ubuntu site for installation help. It would be great if I could find Korean guides there. I should learn English... Does this CD support Korean?..."

Scope

Suggestion: split this spec into several parts, so that we can concentrate on them and complete them one by one, starting with the most important.

For SCIM as the default input method for CJK users, we suggest using InputMethods/SCIM to gather development ideas about improvements and specifications.

We need to work around the SCIM ABI bug before we go ahead with this choice.

In my opinion, the topics we should focus on first are the default input method for CJK users, FontConfig, and enabling embolden with patches.

Keep this spec as the main one for tracking overall scope, progress and implementation, and add sub-specifications here.

Use ttf-arphic-uming/ukai by default, since these are the only packages that contain Hong Kong characters at all sizes.

Install xfonts-wqy for Simplified Chinese installations; ttf-newsung is not needed since it has already been included in uming/ukai.

Regarding fontconfig, Korean Linux users are discussing which font should replace ttf-baekmuk as the distribution default; currently the favorite is ttf-unfonts, followed by ttf-alee. KoreanTeam will provide an up-to-date BeautifyKoreanFonts once a font package has been decided on.

In the Japanese case, there is no completely free, high-quality Japanese font. Ubuntu uses the Kochi font, which is DFSG-free, but its quality is inferior to that of commercial Japanese fonts. This font issue is a barrier to wider use of completely free Linux distributions in Japan.

Not exactly true. For Kochi Gothic & Mincho, you can change the fontconfig setting to display the embedded bitmaps at font sizes 12-17 and 20-21 (you can install the Fontforge package to verify this). Once the setting is adjusted, they will look much better at those sizes. Also, OpenOffice.org 2 recognizes embedded bitmaps internally, so these two font sets will look nice in it.

I think the majority of Japanese users prefer outlines, not bitmaps. In fact, Fedora and Mandriva don't use embedded bitmaps for Japanese; their developers chose outlines for Japanese. And Kochi's outlines are of worse quality than those of IPA and IPA Mona.

Are you sure the decision not to use embedded bitmaps was an informed one among all three distro developers you mentioned? In general, very few people know that embedded bitmap fonts exist at all (even on Windows, because Windows uses embedded bitmaps by default). If, as you said, many Japanese users prefer outline fonts, sure, go ahead. CJK users can always switch embedded bitmaps on again in /etc/fonts/fonts.conf. I do know the situation is quite the opposite for Chinese users.

In Japan, unlike China and Korea, there is no clear tradition of choosing either outlines or bitmaps.

I have distributed a Japanese-customized Ubuntu CD image for 9 months. I have had some requests to use outlines in legacy (GTK1 etc.) applications and OpenOffice.org, but no requests to use bitmaps in GTK2 applications. So I think many Japanese users prefer beautiful outlines in the tradition of Mac OS X. BTW, I was mistaken about the default setting on SuSE. The default setting for CJK on SuSE is bitmaps now. I'm removing that from above.

In OpenOffice.org, Kochi Gothic & Mincho look fine with bitmaps, as noted above. But if you print with these fonts, it turns into a nightmare: you'll see poor Japanese characters on the printed page.

There are also "IPA Font" and "IPA Mona Font", which are high-quality, free-for-use Japanese fonts. Unfortunately, these fonts are not completely free: the license requires that they be redistributed together with one of the software packages specified by the supplier. At present, the Japanese Team supplies IPA Font and IPA Mona Font Debian packages for Breezy. Check out this blog for more detail. Note: the article on this blog is no more than its author's guess. The IPA Font and IPA Mona Font packages distributed by the Japanese Team include one of the software packages specified by the supplier, as required by the IPA Font license. Additionally, the packages listed in this blog are installer-helper packages, or are installed from the backport repository by a GUI setting helper (like Easy Ubuntu).

Interesting find: opening the TTF files from the IPA and IPA Mona font collections in Fontforge shows that these font sets carry embedded bitmaps at various sizes. For the "IPA Font" collection: all the fonts carry embedded bitmaps at sizes 12, 14 and 16. For the "IPA Mona Font" collection:

The rest of the IPAMona collection all carry embedded bitmaps at sizes 10-16.

As a result of these findings, it is strongly recommended to turn on embedded bitmaps in the fontconfig setting by default. It is now known that Uming, Kochi, IPA and IPAMona will all benefit from this, and possibly some other free CJK fonts as well. If in doubt, inspection with Fontforge is encouraged. (The ftdump program from the freetype2-demos package can also be used to check at which sizes bitmaps are available.)
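To make the proposed setting concrete, here is a minimal sketch that generates a fontconfig rule switching the `embeddedbitmap` property on or off. The helper name `embedded_bitmap_rule` is hypothetical, and the rule is applied to all fonts rather than to specific CJK families, which is an assumption for illustration. A real configuration file would also need the usual XML declaration and DOCTYPE line.

```python
import xml.etree.ElementTree as ET


def embedded_bitmap_rule(enabled=True):
    """Return a fontconfig document with one rule setting embeddedbitmap.

    Hypothetical helper: the rule matches every font, not only CJK ones.
    """
    fontconfig = ET.Element("fontconfig")
    # target="font" applies the edit to the font selected for rendering.
    match = ET.SubElement(fontconfig, "match", target="font")
    edit = ET.SubElement(match, "edit", name="embeddedbitmap", mode="assign")
    ET.SubElement(edit, "bool").text = "true" if enabled else "false"
    return ET.tostring(fontconfig, encoding="unicode")


if __name__ == "__main__":
    # Print the rule so it can be reviewed before being merged by hand
    # into ~/.fonts.conf or /etc/fonts/fonts.conf.
    print(embedded_bitmap_rule(True))
```

Passing `False` produces the opposite rule, which is what users who prefer outlines would want.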

I'm the author of IPA Mona Fonts. IPA Mona Fonts include bitmap fonts, as noted above. But I think it's good to use outlines for Japanese by default, because ordinary users prefer outlines. A small number of Japanese users love bitmaps, so I added bitmaps at some sizes for them. This is not for all Japanese users.

Enable embolden font by default for CJK users

Build xft2, fontconfig, pango and cairo2 with embolden enabled

Good news: with Dapper's libxft2 2.1.8, embolden is enabled without rebuilding them.

CJK users should be able to display the ID3 tags of their MP3 files correctly. Historically, tagging has been a mess: everybody uses their own legacy encoding for MP3 tags, because there was no support for non-Western languages until very recent versions of the ID3 tag specification.

For applications which make use of GStreamer, setting GST_ID3_TAG_ENCODING can be an interim solution. There are more discussions on UTFEightCurrentProblems.
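The idea behind this workaround can be sketched as follows: try UTF-8 first, then fall back to the legacy encodings named by the user. This is an illustrative model and not GStreamer's actual code. The function names are hypothetical, and the sketch assumes the variable holds a colon-separated list of encoding names.

```python
import os


def encodings_from_env(env=os.environ):
    """Read the fallback encodings, assuming a colon-separated list."""
    value = env.get("GST_ID3_TAG_ENCODING", "")
    return [name for name in value.split(":") if name]


def decode_tag(raw, fallback_encodings):
    """Decode raw tag bytes: UTF-8 first, then each legacy encoding in order."""
    for encoding in ["utf-8"] + list(fallback_encodings):
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown encoding name: try the next candidate.
            continue
    # Last resort: keep the text readable by replacing undecodable bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    legacy_bytes = "제주도".encode("euc-kr")  # a tag written with a legacy encoding
    print(decode_tag(legacy_bytes, ["EUC-KR"]))
```

The first encoding that decodes without error wins, so the order of the list matters: a wrong but permissive encoding placed first can still produce garbled text.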

(?) Allow users to read and write CJK on the console.

Or, when this is impossible, change $LANGUAGE to C automatically so that users won't see lots of junk on the console.

Status: The bug is officially fixed and verified, with the CWS module fakebold merged into Sun's SRC680_m146 internal development build, to be released in OpenOffice.org 2.0.2 as a maintenance update. Ubuntu Dapper has fixed the issue by packaging version 2.0.2, as mentioned in Malone #23342.

Note that Bug 18285 does not require freetype 2.1.10; that was a mistake on my part. You can install the OpenOffice.org 2.0.2 Linux binaries from the official page (by converting RPM to DEB using alien) and experience the effect of virtual style right away, even on Ubuntu Hoary/Breezy.

Configure Firefox to print CJK correctly

This can probably be done on a per-language basis, in the Firefox language packs.

Implementation

Code

Data preservation and migration

Packages affected

input methods:

As of 2005-12-29, nabi (0.15-2) supports im-switch, so no extra setup is required for Korean input if you install nabi and im-switch. If you install nabi under the en_US locale, you can select it with "im-switch -s nabi", which creates ~/.xinput.d/en_US. For other locales, it should work the same way.

font packages:

As of 2005-12-08, the ttf-arphic-uming/ukai packages have been moved to main, and the original Arphic fonts are obsolete.

Improve automatic configuration; currently ONLY those who know this package exists and ONLY those who understand the ins and outs can configure input method settings.

scim and skim support im-switch as of Dapper.

language-selector:

It should set appropriate environment variables like $LANGUAGE and $LANG according to real-life usage, not just dummy settings. For example, people in Hong Kong mostly use the Taiwan translation, but they may have their own; thus the correct setting is LANGUAGE=zh_HK:zh_TW.
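A minimal sketch of how such a setting could be derived and how the fallback order plays out. The table and function names are hypothetical, and only the zh_HK entry comes from the example above. Real gettext lookup does more, such as also trying the bare language code, so this is a simplified model.

```python
# Hypothetical per-locale fallback chains; only zh_HK is taken from the text.
LANGUAGE_FALLBACKS = {
    "zh_HK": ["zh_HK", "zh_TW"],
}


def language_value(locale_name):
    """Build a colon-separated $LANGUAGE value for a locale like 'zh_HK.UTF-8'."""
    # Strip the codeset (".UTF-8") and modifier ("@euro") parts, if present.
    base = locale_name.split(".")[0].split("@")[0]
    return ":".join(LANGUAGE_FALLBACKS.get(base, [base]))


def pick_translation(language, available):
    """Return the first entry of $LANGUAGE that has a translation, else None."""
    for candidate in language.split(":"):
        if candidate in available:
            return candidate
    return None


if __name__ == "__main__":
    value = language_value("zh_HK.UTF-8")
    print("LANGUAGE=" + value)
    # An application that ships only a Taiwan translation falls back to zh_TW.
    print(pick_translation(value, {"zh_TW", "ja"}))
```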

Add a variable, say $CONSOLE_NOT_LOCALIZED, and define it for each language. In particular, set it to "yes" for all CJK languages, so that bash can redefine $LANGUAGE to C during startup when running on the console (and on the console ONLY!).
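The decision logic could look roughly like the sketch below. It is written in Python only to model the rule; the real change would live in the shell startup files. It assumes that TERM=linux identifies the text console, which is an assumption and not something this spec defines, and the function name is hypothetical.

```python
def console_language(env):
    """Return the $LANGUAGE value a login shell should end up with.

    env is a mapping of environment variables, e.g. os.environ.
    """
    # Assumption: the Linux text console reports TERM=linux.
    on_console = env.get("TERM") == "linux"
    if on_console and env.get("CONSOLE_NOT_LOCALIZED") == "yes":
        # Fall back to untranslated messages instead of unreadable glyphs.
        return "C"
    # Everywhere else (X terminals, ssh sessions) keep the user's setting.
    return env.get("LANGUAGE", "")


if __name__ == "__main__":
    console_env = {"TERM": "linux", "CONSOLE_NOT_LOCALIZED": "yes",
                   "LANGUAGE": "zh_HK:zh_TW"}
    print(console_language(console_env))
```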