
Episode 478: Invoices, etc. Need to Display Mr. 森鷗外 (Mori Ōgai)

Monday, April 11, 2016

I wonder how many characters a Japanese person can read and write.

We have 48 pairs of "hiragana" and "katakana" (the two syllabaries), but "kanji" (logographic characters) number approximately 10 thousand. The count is said to reach 30 thousand when characters not used in day-to-day life are included. Japanese people can write the name of any species of fish in a single character. So our feeling for Twitter's "140-character limit" is greatly different from that of Western people. (Chinese is said to have 80 thousand characters.)

And in Japan, a lot of time is devoted to "learning kanji" in school education.

Almost all Japanese kids become able to read and write approx. 1 thousand characters through 6 years of elementary school. Then they learn another 1 thousand in 3 years of junior high school, and in high school and college or university they encounter more complicated kanji. Eventually, an ordinary member of society is said to be able to read about 3000 kanji effortlessly. (Honestly, even I can probably WRITE barely 1 thousand of them.)

People who were born and raised in Japan would not give it a second thought, but this kind of story goes down well with speakers of European languages.

Well, these same Japanese people have been struggling with "computer processing of the Japanese language" in various ways.

The disadvantage of "too many characters" is not the only one. We have run into trouble from various causes, such as "no word separator (space)", "several readings for the same character", "many characters with different meanings for the same reading", and even "the shape of the same character varies between vertical and horizontal writing".

The following Workflow is for testing output to PDF.

Even though Japanese Fonts are designed with more than 10 thousand characters, trouble of the kind "a particular character does not exist in the Font" can still occur when business data is automatically embedded into a [PDF Template] at a Step in the middle of a Workflow. This is a testing flow to detect such trouble in advance. The most important point is whether "Font data" has been embedded in the [PDF Template] file or not.
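
The check for "a character that does not exist in the Font" can be sketched in a few lines of Python. This is only an illustration, not the Workflow's actual implementation: the function name `find_missing_characters` and the tiny `sample_font` set are hypothetical. In practice the set of supported code points would have to be read from the font itself, for instance from the keys of the mapping that the fontTools library returns from `TTFont.getBestCmap()`.

```python
def find_missing_characters(text, supported_codepoints):
    """Return the characters of `text` that the font has no glyph for.

    `supported_codepoints` is any container of Unicode code points (ints).
    The result keeps first-occurrence order and contains no duplicates.
    """
    missing = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace needs no visible glyph in this sketch
        if ord(ch) not in supported_codepoints and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical coverage of a small font: it has 鴎 (U+9D0E) but not 鷗 (U+9DD7).
sample_font = {ord(c) for c in "森鴎外様御請求書"}

print(find_missing_characters("森鷗外 様", sample_font))  # ['鷗']
```

Running such a check on the business data before the PDF step would flag the record in advance, instead of letting a blank or substituted character reach the customer.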

[PDF generation test]

For Japanese people, "typographical errors" are a daily occurrence, so we do not mind a little mistake. Even subtitles in TV programs are wrong sometimes.
However, it is still worrisome when trouble occurs on a computer screen, such as "character omission (skipping)" or "replacement with a Chinese character". All the more so if it is a PDF file to be submitted to a customer. Even though such trouble has been reduced considerably by the standardization in 1998 and 2004, troubles concerning "localization for the Japanese language" seem likely to continue for a while.

By the way, a person can get by with the kanji of so-called "Level 1" and "Level 2" (6355 characters). It can be said that the kanji of "Level 3" and "Level 4" are required because they are used in the names of people and places. Mori Ōgai, the novelist, is "森鷗外", definitely not "森鴎外".
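
As a rough illustration of the difference between the two spellings, the sketch below uses Python's standard `shift_jis` codec as a stand-in for "Level 1 and Level 2". That codec covers JIS X 0208 plus ASCII and half-width katakana, so it is only an approximation of the kanji levels, and the function name `outside_jis_x_0208` is made up for this example.

```python
def outside_jis_x_0208(text):
    """Return the characters of `text` that the `shift_jis` codec cannot encode.

    Assumption: "cannot be encoded as shift_jis" is treated here as
    "outside Level 1 and Level 2", which is an approximation.
    """
    outside = []
    for ch in text:
        try:
            ch.encode("shift_jis")
        except UnicodeEncodeError:
            outside.append(ch)
    return outside


for name in ("森鷗外", "森鴎外"):
    print(name, outside_jis_x_0208(name))
# 森鷗外 ['鷗']
# 森鴎外 []
```

The correct "鷗" falls outside the range, while the substitute "鴎" fits inside it, which is why the wrong spelling appears so often in computer output.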