This is a static version of Enabling Japanese at the Gentoo Wiki. A previous version is archived here. The contents of this page were updated on 3 February 2006.

Enabling Japanese

Support is available in the Desktop Environments forum. When asking for help, include the versions of all relevant packages, for example kde-3.3.4.

Of all languages to learn, Japanese is known as one of the most challenging, not because of the spoken language but because of the written one. The objective of this HOWTO is to make your Gentoo box work with that written language. It has two sections: Japanese Fonts and Japanese Input. Those setting up input should, of course, set up their fonts first. New installations will want to make sure they have the proper USE flags set, as outlined below.

---

Japanese Fonts

If you simply want to read Japanese text, say in Mozilla Firefox, you only need to install fonts. A good sign that the proper fonts are missing is that the following characters appear as boxes with numbers inside: 日本語フォント

Code:

emerge media-fonts/kochi-substitute    # For Japanese
emerge media-fonts/arphicfonts         # For Chinese
emerge media-fonts/baekmuk-fonts       # For Korean

It never hurts to get them all.
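To check whether the fonts above were picked up, fontconfig can be asked directly: fc-list :lang=ja family lists the font families that cover Japanese. The sketch below wraps that in a POSIX shell function. The function name has_japanese_font and the FC_LIST override variable are hypothetical, added only so the check can be tested without a real font setup.

```sh
#!/bin/sh
# has_japanese_font: hypothetical helper, not part of any Gentoo package.
# Exit 0 if fontconfig reports at least one font covering Japanese,
# exit 1 if none is found, exit 2 if fc-list itself is missing.
# FC_LIST may name another command (for testing); default is fc-list.
has_japanese_font() {
  fc_list=${FC_LIST:-fc-list}
  if ! command -v "$fc_list" >/dev/null 2>&1; then
    echo "fc-list not found; is fontconfig installed?" >&2
    return 2
  fi
  # ":lang=ja" matches fonts whose coverage includes Japanese;
  # "family" prints only the family names.
  [ -n "$("$fc_list" :lang=ja family)" ]
}

# Usage:
#   has_japanese_font && echo "Japanese font found"
```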
There are other CJK and Unicode fonts in the Portage tree, which you can find with emerge search fonts, with some notable exceptions: Bitstream Cyberbit is available only in an ebuild outside of Portage, because of licensing questions. Arial Unicode MS is another great font, which you may or may not have access to. There have been reports of errors in emulators while using this font, but the same procedure can be followed for any Microsoft-provided TrueType fonts you may find:

Code:

emerge cabextract

Find a copy of aruniupd.exe; where it can be found online changes over time.

Code:

cabextract aruniupd.exe

For a system-wide installation (as root), use:

Code:

cp *.ttf /usr/share/fonts/

For a local installation (no root access needed), use:

Code:

mkdir -p ~/.fonts && cp *.ttf ~/.fonts/

Then rebuild the font cache:

Code:

fc-cache -fv
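The copy-and-refresh steps above can be combined into one function. This is a sketch, not part of any package: install_fonts is a hypothetical name, it assumes the extracted files end in .ttf (it also accepts .TTF in case your extraction produced upper-case names), and it only runs fc-cache if that command is installed.

```sh
#!/bin/sh
# install_fonts: hypothetical helper wrapping the cp + fc-cache steps.
# Copies every .ttf/.TTF file from SRC into DEST (default: ~/.fonts),
# then refreshes the fontconfig cache for DEST if fc-cache exists.
install_fonts() {
  src=$1
  dest=${2:-"$HOME/.fonts"}
  mkdir -p "$dest" || return 1
  copied=0
  for f in "$src"/*.ttf "$src"/*.TTF; do
    [ -e "$f" ] || continue          # this pattern matched nothing
    cp "$f" "$dest"/ || return 1
    copied=$((copied + 1))
  done
  if [ "$copied" -eq 0 ]; then
    echo "no .ttf files found in $src" >&2
    return 1
  fi
  if command -v fc-cache >/dev/null 2>&1; then
    # A failed cache refresh is reported but not treated as fatal.
    fc-cache -f "$dest" || echo "warning: fc-cache failed" >&2
  fi
  echo "copied $copied font file(s) to $dest"
}
```

Run it in the directory where you extracted the fonts, either as a normal user (install_fonts .) or as root with /usr/share/fonts/ as the second argument.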

Running programs will probably have to be restarted before they can use the new fonts.
Arial Unicode MS is now available to your system. You will probably want to select it in your web browser's settings; in Mozilla Firefox, see Preferences >> General >> Fonts & Colors >> Fonts for: Japanese.

According to some docs I've read, Java 1.5 is supposed to support 'fallback fonts' without having to add them explicitly to fonts.properties. All you have to do is create a .../jre/lib/fonts/fallback/ directory and put at least one Unicode font with Japanese support in it (or, since these fonts tend to be very big, just a symlink to an existing font in your /usr/share/fonts/ directory).
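A sketch of the symlink approach, assuming you know where your Java 1.5 jre directory lives (it differs between JDK packages, so that path is left to you). link_fallback_font is a hypothetical helper name. It insists on an absolute font path, because a relative symlink target would be resolved from inside the fallback directory and point nowhere.

```sh
#!/bin/sh
# link_fallback_font: hypothetical helper.
#   $1 = the jre directory of your Java 1.5 installation
#   $2 = absolute path of an existing font file with Japanese glyphs
# Creates <jre>/lib/fonts/fallback/ and symlinks the font into it.
link_fallback_font() {
  jre=$1
  font=$2
  case "$font" in
    /*) ;;
    *) echo "use an absolute font path: $font" >&2; return 1 ;;
  esac
  if [ ! -f "$font" ]; then
    echo "font not found: $font" >&2
    return 1
  fi
  dir="$jre/lib/fonts/fallback"
  mkdir -p "$dir" || return 1
  # A symlink avoids copying a large font file into the JRE.
  ln -sf "$font" "$dir/$(basename "$font")"
}

# Usage (placeholder paths, adjust to your system):
#   link_fallback_font /path/to/jdk/jre /usr/share/fonts/some-japanese-font.ttf
```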

Japanese Input
Fonts are not enough for you? Good. Let's prep your system for input support. It should be noted that this process is quite similar for Chinese, Korean, and a host of other languages.

Setting Locale
Japanese characters lie outside the normal POSIX character range; they are Unicode characters. To input them, you need to allow their use on your system by setting a Unicode (UTF-8) locale.

First check your current settings with locale. All of the entries should be either blank or say "POSIX", unless your locale has been set previously. If so, you need to figure out where. ;)

Code:

locale -a

de_DE.utf8
en_GB.utf8
en_US.utf8
fr_FR.utf8
ja_JP.utf8

The locale -a command lists the locales available on your system. If an entry you need is missing, the list can be expanded (or trimmed) by editing your list of needed locales. Note that you are choosing the language you want your menus to be in, NOT the one you are setting up input for. For example, a Frenchman wanting to write Japanese would choose fr_FR.utf8 from this list.
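Before going further, you can check that the locale you chose is actually installed. locale -a prints names such as fr_FR.utf8, while settings are usually written fr_FR.UTF-8, so the hypothetical helper below compares names case-insensitively and treats UTF-8 and utf8 as equal. It is a POSIX shell sketch, not a Gentoo tool.

```sh
#!/bin/sh
# normalize_locale: lower-case a locale name and spell "utf-8" as "utf8",
# so that fr_FR.UTF-8 and fr_FR.utf8 compare equal.
normalize_locale() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/'
}

# locale_available: hypothetical helper. Reads locale names (for example
# the output of `locale -a`) on stdin; succeeds if $1 is among them.
locale_available() {
  want=$(normalize_locale "$1")
  while IFS= read -r name; do
    [ "$(normalize_locale "$name")" = "$want" ] && return 0
  done
  return 1
}

# Usage (the Frenchman example):
#   locale -a | locale_available fr_FR.UTF-8 && echo "fr_FR.UTF-8 is installed"
```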
Now, continuing with the Frenchman example:
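The example file itself is not preserved on this page. As a sketch, assuming the Frenchman wants French menus, /etc/env.d/02locale could hold a single LANG line (env-update reads the VAR="value" lines in /etc/env.d). Run env-update && source /etc/profile afterwards.

```sh
# /etc/env.d/02locale (sketch for the Frenchman example; adjust to your locale)
# LANG selects the language of menus and messages, NOT the language you want
# to type; Japanese input is handled by the input method set up later.
LANG="fr_FR.UTF-8"
```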

Why /etc/env.d/02locale? It is used by precedent, and is described as such in Using UTF-8 in Gentoo, a good thing to read if you have issues at this point or later.

Setting USE flags

Next, add the following USE flags to your /etc/make.conf, if they are not already there:

cjk - standing for 'Chinese Japanese Korean' - gives support for Han-derived characters (two-byte characters such as kanji, which are the reason you see all those accented 'a's when support is missing).
nls - 'native language support' - is meant for enabling other languages in your interface, but some ebuilds may use it as general 'other language support'. Enable it as one of several safeguards to ensure that Japanese support is compiled in.
immqt-bc - lets Qt handle different input methods.
-immqt - This is explicitly disabled because it conflicts with immqt-bc. Setting this flag would require recompiling all programs that depend on Qt3, and it has broken in the past. This recommendation will change with Qt4.
unicode - Unicode is the pot every character is thrown into (except cursive Hebrew, apparently ^.^;)
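To see which of these flags are still missing, the hypothetical helper below sources make.conf in a subshell (make.conf is normally written as plain shell variable assignments) and prints every flag from the list above that its USE variable lacks. It is a sketch: it checks only the USE variable set in that one file, not profile defaults or per-package settings.

```sh
#!/bin/sh
# missing_use_flags: hypothetical helper. Prints each recommended flag
# that the USE variable in the given make.conf does not contain.
missing_use_flags() {
  conf=$1
  case "$conf" in
    /*) ;;
    *) conf="./$conf" ;;   # "." needs a slash in the path to avoid a $PATH search
  esac
  # Read USE in a subshell so the file cannot change this shell.
  use=$(unset USE; . "$conf" >/dev/null 2>&1; printf '%s' "$USE")
  # Unquoted, with globbing off: collapses spaces/newlines, keeps "-*" literal.
  padded=" $(set -f; printf '%s ' $use)"
  for flag in cjk nls unicode immqt-bc -immqt; do
    case "$padded" in
      *" $flag "*) ;;
      *) echo "$flag" ;;
    esac
  done
}

# Usage:
#   missing_use_flags /etc/make.conf
```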

With these flags set in your /etc/make.conf, you should make sure all your currently installed packages have the correct support built in. New systems should do this early (if they are not recompiling all packages anyway), to avoid rebuilding more software than necessary.

Code:

emerge world --newuse

Input Methods

Japanese uses both kana and kanji, so you need a dictionary that offers the possible kanji for what you type. Anthy differs from other available systems in that it does not require any services to be started.

Code:

emerge anthy

Now that the dictionary is installed, an additional input method for it will be built when you install UIM.
UIM, the Universal Input Manager, is what routes keyboard input.

Code:

emerge uim

On its own, UIM is enough (under GTK+) to handle Japanese input. You can check this from the text entry context menu of most GTK+ programs (excluding Firefox), where UIM-anthy will be one of the new choices. UIM in fact becomes the default GTK+ input method once installed, and it has a GNOME control panel available if you are satisfied with switching methods via the keyboard. Qt applications additionally require an export QT_IM_MODULE=uim statement.
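For a UIM-only setup, a sketch of the environment lines, assuming you put them in ~/.xprofile or /etc/xprofile like the SCIM settings further down. GTK_IM_MODULE is included only to make the choice explicit, since UIM already becomes the GTK+ default; QT_IM_MODULE=uim is the statement Qt needs.

```sh
# ~/.xprofile (or /etc/xprofile): sketch for using UIM without SCIM.
# Select UIM's GTK+ input module explicitly (optional; it is the default).
export GTK_IM_MODULE=uim
# Qt applications need this to use UIM.
export QT_IM_MODULE=uim
```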

Graphical Input Method selection
SCIM, the Smart Common Input Method, provides a taskbar icon and menu for switching between input methods. It is especially good for computers with more than two methods available - or for people that prefer mouse access.

Code:

emerge scim-uim

Qt needs an additional step to use SCIM: emerge scim-qtimm. GTK+-only users do not need to do this, though.
Now that everything is installed, we just need to tell every program to use SCIM. The following can go in /etc/xprofile for all users, or in your own ~/.xprofile.
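The snippet that belongs here is not preserved on this page. A common set of lines for SCIM, offered as a sketch that assumes your display manager reads xprofile: XMODIFIERS points plain X (XIM) programs at SCIM, the two *_IM_MODULE variables select SCIM's GTK+ and Qt modules (the Qt one needs scim-qtimm), and scim -d starts SCIM as a daemon.

```sh
# ~/.xprofile (or /etc/xprofile for all users): sketch, assuming SCIM.
export XMODIFIERS=@im=SCIM   # plain X (XIM) applications
export GTK_IM_MODULE=scim    # GTK+ applications
export QT_IM_MODULE=scim     # Qt applications (needs scim-qtimm)
# Start the SCIM daemon, but only if it is installed.
if command -v scim >/dev/null 2>&1; then
  scim -d
fi
```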

Wrapping up
To actually use your input method, you will have to run env-update && source /etc/profile and restart X11; you may even have to reboot.
Once you have done so, start up a text editing program like kwrite or gedit. A keyboard icon will appear in the system tray, which lets you select from your different input methods.
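If the keyboard icon does not show up, it helps to confirm that the variables actually reached your session. The hypothetical helper below just prints the locale and input-method variables mentioned on this page, showing (unset) for any that are missing; run it in a terminal opened after restarting X.

```sh
#!/bin/sh
# show_im_env: hypothetical helper. Prints each locale / input-method
# variable as NAME=value, or NAME=(unset) when it is not set.
show_im_env() {
  for var in LANG LC_ALL LC_CTYPE XMODIFIERS GTK_IM_MODULE QT_IM_MODULE; do
    eval "val=\${$var-}"     # indirect lookup of the variable named in $var
    printf '%s=%s\n' "$var" "${val:-(unset)}"
  done
}

# Usage:
#   show_im_env
```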

Once you are using an input method like uim-anthy, there are several modes to choose from: raw input, hiragana, katakana, half-width katakana, and a typewriter-like variant of the Latin alphabet. Start typing in hiragana mode, and your text will be converted as the appropriate kana are found. The spacebar brings up a list of possible kanji and cycles through it, and hitting Enter accepts the selected replacement. More key combinations are listed at uim-anthy.

"To enable UTF-8 on the console, you should edit /etc/rc.conf and set UNICODE="yes", and also read the comments in that file"
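The corresponding line in /etc/rc.conf, as a fragment (the variable name and value are the ones quoted above; the rest of the file stays as it is):

```sh
# /etc/rc.conf (excerpt): enable UTF-8 on the console
UNICODE="yes"
```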
"Alternate WMs" Reference
GMplayer just doesn't, okay?

If you get letters that are inconsistent with the font you expected, you are not using raw input mode. Try some other modes.
The SCIM button can seem to flash or temporarily disappear. This is because SCIM keeps settings per program: Firefox input could be in Japanese while gedit is in another language.
Gjiten and Kiten (part of kdeedu) are Japanese dictionary programs using EDICT. Gjiten is more comprehensive, but requires you to install dictionaries manually. Nihongo Benkyo is another possibility; see Bug 112894 for ebuilds.

Actually, this should work under Gnome, and any other setup with a standard system tray. I had it working under my (custom) Xfce4 setup for a while, although uim-anthy is all I really need. But many people like buttons.

I've been using SCIM for a long time now (for inputting Japanese, Chinese and Korean), but there is just one problem with it, and as I read your howto I think you might have that problem too.

Setting LC_CTYPE=ja_JP.UTF-8 causes my Java apps to run in Japanese, too!! I have no explanation why LC_CTYPE changes the interface language, but my Java apps are now in Japanese, at least those that have a Japanese translation available! Only when I set LC_CTYPE back to de_DE do I get the usual German/English interface. But this again makes it impossible to use SCIM in the particular application I start with de_DE as LC_CTYPE.
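One way to limit the side effect described above is to set LC_CTYPE only for the programs that need it instead of for the whole session. The wrapper below is a hypothetical sketch built on the shell's per-command variable assignment; whether SCIM then works in those programs depends on your setup, as the following posts show.

```sh
#!/bin/sh
# with_ja_ctype: hypothetical wrapper. Runs one command with
# LC_CTYPE=ja_JP.UTF-8 in its environment; the calling shell's
# own LC_CTYPE is left unchanged.
with_ja_ctype() {
  LC_CTYPE=ja_JP.UTF-8 "$@"
}

# Usage:
#   with_ja_ctype xterm &
```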

One question, though: where might the text config be, so I can change the character map for this sucker? The key input options (even when I can change them) sort of suck in the SCIM config. I can't get any of the good function keys to work under XFCE, and my options are sort of limited to ALT, SHIFT, CTRL, SPACE, and RELEASE.

hiroki: As far as I know, setting LC_CTYPE differently is only needed for OpenOffice-1.x, to get around a known issue. You should not need to set it in any other case (which would include openoffice-ximian, which I would recommend if at all possible). Your input methods tend to ignore these values, since they are for other languages already.

For future reference, what input methods do you prefer?

yaneurabeya: Um... What? My Japanese isn't what it could be. But for key mapping, you need to understand this: SCIM is only the means to choose your input method. Yes, it does have some methods of its own, but if you are using the uim-anthy or uim-canna of this tutorial, you will have to look up UIM configuration.


Well, I use OpenOffice (not the Ximian version), and I additionally have an OpenOffice-2 beta installed. So I'm going to try whether OpenOffice 2 still needs this LC_CTYPE setting or not. If so, I'll have to keep it and watch all my Java apps running in Japanese [urg]; if not, I'll kick it out. A Japanese interface is not that bad; it's just that some apps use fonts that cannot display Japanese characters and then show lots of ugly boxes.
Otherwise it would be OK.

Sudrien wrote:

For future reference, what input methods do you prefer?

I use SCIM in order to access the following input methods:
Japanese -> SKK
Chinese -> SmartPinyin, WuBi (when I only know the Japanese reading of a character and therefore need to type it just by knowing its shape/components)
Korean -> Romaja

PS:
harharrrrr, OpenOffice2 doesn't need LC_CTYPE to be set in order to allow SCIM to work.. yeeeehaaaaa

PS2:
Sorry, it was all wrong... I cannot use SCIM without LC_CTYPE set anymore, neither in OOo nor in, e.g., xterm...

PS3:OK, now it's enough!
I had simply unset LC_CTYPE, and I guess that was wrong. Setting it to de_DE.UTF-8 [and LC_ALL and LANG, too] helped. Works fine, for all apps! Yippie!!!

PS4:
OMG! Won't this end! I discovered that with LC_CTYPE=de_DE.UTF-8 I cannot type Japanese [or generally speaking: use SCIM] when launching an xterm from another xterm. So typing "xterm" [enter] in an already running xterm will end in a new xterm that cannot use SCIM ><

OK
I don't know why, but over here it does not work without LC_CTYPE=ja_JP.utf8
I have nooo idea why. If LC_CTYPE is not modified from the default (POSIX?) or set to de_DE.utf8 I cannot type Japanese [use SCIM], I can only use it in GTK-apps. (but not in Xterm, or qt-apps, etc.)

Well, I think the LC_CTYPE thing is not specific to OpenOffice but to any non-Gtk+2 non-Qt application.

However, I have a problem with these X applications, for example Java applications. When I hit Ctrl+Space the SCIM bar actually appears, but I can only select English/European, not Japanese. It is working fine in both GTK+2 and Qt applications, and I have LC_CTYPE set to ja_JP.UTF-8.

T-5h: I needed to type five sentences in japanese (I'm using UIM)
T-4h: Aha, so gcc-3.4.3 is the source of anthy's dementedness! WTF? The butterfly effect totally sucks. Like, "I fart here" vs. "dark force conquers hundred planets" is more connected than this crap. Oh well. I'd love to use canna, if I may, oh great UIM.
T-3h: UIM does not support canna (my .uim file:

uim-im-whatever offers only anthy (plus skk plus other things obviously from second reality)
T-2.5h: A little lunch would be wise.
T-2h: SCIM is nice but I don't need to click through 5 menus to change the keyboard, thank you. Ctrl+Shift+Space is enough for me.
T-1h: SCIM supports anthy and UIM, which supports only anthy.
NOW: UIM still does not support canna. AAAAAAAAAAAAAAAAAAAAAARRRRRRRRRRRRRRGHHHHHHHH!!!!!!!!!!!!!!!