It seems that these two statements on page 399 of Chapter 2 are contradictory:

First

"Simplified and Traditional Chinese. There are currently two main varieties of written Chinese: “simplified Chinese” (jiǎntǐzì), used in most parts of the People’s Republic of China (PRC) and Singapore, and “traditional Chinese” (fántǐzì), used predominantly in the Hong Kong and Macao SARs, Taiwan, and overseas Chinese communities. The process of interconverting between the two is a complex one. This complexity arises largely because a single simplified form may correspond to multiple traditional forms, such as U+53F0 台, which is a traditional character in its own right and the simplified form for U+6AAF 檯, U+81FA 臺, and U+98B1 颱."

Second

"There are two PRC national standards, GB 2312-80 and GB 12345-90, which are intended to represent simplified and traditional Chinese, respectively. The character repertoires of the two are the same, but the simplified forms occur in GB 2312-80 and the traditional ones in GB 12345-90."

GB/T 12345-90 is best described as the traditional analog of GB 2312-80. In other words, for any hanzi in GB 2312-80 that is considered to be simplified, GB/T 12345-90 has, at the same code point, the traditional form. There are approximately 2,000 such cases, which is about one-third of GB 2312-80. Put another way, approximately two-thirds of the hanzi in GB 2312-80 and GB/T 12345-90 are the same.

Let us use Row-Cell 16-10 (0x302A) from both legacy character sets as an example:

The G0 source is GB 2312-80, and the G1 source is GB/T 12345-90. Note that both share the same character code (hexadecimal 302A), but their sources are different. They have different Unicode code points.
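For anyone following along, the relationship between a Row-Cell value such as 16-10 and its hexadecimal form 0x302A is a fixed offset of 0x20 on each byte. The following is only a minimal sketch of that conversion; the function names `rowcell_to_hex` and `hex_to_rowcell` are made up for illustration and are not part of the Unihan Database or any standard tool.

```python
def rowcell_to_hex(row, cell):
    """Convert a Row-Cell (kuten) pair to its two-byte code as a hex string.

    Each of row and cell runs from 1 to 94; adding 0x20 to each gives
    the corresponding byte of the two-byte code.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    return f"{row + 0x20:02X}{cell + 0x20:02X}"


def hex_to_rowcell(code):
    """Convert a four-digit hex code such as '302A' back to (row, cell)."""
    value = int(code, 16)
    return (value >> 8) - 0x20, (value & 0xFF) - 0x20


print(rowcell_to_hex(16, 10))   # 302A
print(hex_to_rowcell("302A"))   # (16, 10)
```

The same arithmetic applies to four-digit Row-Cell values written without a hyphen: the first two digits are the row and the last two are the cell.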

The Unihan Database further reflects their status as simplified/traditional pairs:

"Simplified and Traditional Chinese. There are currently two main varieties of written Chinese: “simplified Chinese” (jiǎntǐzì), used in most parts of the People’s Republic of China (PRC) and Singapore, and “traditional Chinese” (fántǐzì), used predominantly in the Hong Kong and Macao SARs, Taiwan, and overseas Chinese communities. The process of interconverting between the two is a complex one. This complexity arises largely because a single simplified form may correspond to multiple traditional forms, such as U+53F0 台, which is a traditional character in its own right and the simplified form for U+6AAF 檯, U+81FA 臺, and U+98B1 颱."

As far as I can understand, this paragraph is saying, for instance, that the character U+53F0 台 is a simplified form of the character U+6AAF 檯.

Let's then obtain the legacy characters for these two Unicode characters, U+53F0 and U+6AAF, the same way you did for the characters U+853C and U+85F9 in your example:

U+53F0 kIRG_GSource G0-4C28

U+6AAF kIRG_GSource G1-786D

which tells me that U+53F0 is not a simplified form of U+6AAF, which, in a different way, confirms the inconsistency that I pointed out in my first post.

Part of the confusion is that U+53F0 itself is considered a traditional form. In other words, all four traditional characters—U+53F0, U+6AAF, U+81FA, and U+98B1—folded into a single simplified form, U+53F0, which happens to be the same as one of the four traditional forms. This is precisely why U+53F0 is listed among the kTraditionalVariant values for itself:

U+53F0 kTraditionalVariant U+53F0 U+6AAF U+81FA U+98B1
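To make the self-reference concrete, here is a minimal sketch that reads the record quoted above and checks whether the code point appears among its own kTraditionalVariant values. The helper name `parse_unihan_line` is hypothetical, and the sketch assumes a whitespace-delimited record of the form "code point, field name, values", as quoted in this thread.

```python
def parse_unihan_line(line):
    """Split a record into (code point, field name, list of values).

    Assumes the record is whitespace-delimited: the first two tokens are
    the code point and the field name, and the rest are the values.
    """
    code_point, field, values = line.split(None, 2)
    return code_point, field, values.split()


record = "U+53F0 kTraditionalVariant U+53F0 U+6AAF U+81FA U+98B1"
code_point, field, variants = parse_unihan_line(record)

# True when the simplified form is also one of its own traditional forms.
maps_to_itself = code_point in variants

# The remaining traditional forms that fold into this simplified form.
others = [v for v in variants if v != code_point]

print(field, maps_to_itself, others)
for v in variants:
    # Strip the "U+" prefix and show the character itself.
    print(v, chr(int(v[2:], 16)))
```

A record whose value list has more than one entry, or contains the code point itself, signals one of the cases that is not a simple one-to-one pair.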

When the simplified/traditional relationship is purely one-to-one, which is true in the vast majority of cases, it is as I explained in my first reply to your thread.

Sorry for taking some time to answer your last two posts. I spent that time getting acquainted with the Unihan Database, which was completely new to me until now.

I can understand your last two responses. They make sense, and I thank you for your input. But I'm still struggling with some points that I would very much appreciate if you could clarify:

1. Take, for example, the character U+53F0. As can be seen here, the properties kGB0 and kGB1 for this character are, respectively, 4408 and 8875, which correspond to the hexadecimal values 0x4C28 and 0x786B. Now, kIRG_GSource for the character U+53F0 shows, as expected, the value G0-4C28. Why doesn't it show G1-786B as well?

2. I tried for a long time to find an official site on the internet for the PRC standards GB 2312-80 and GB 12345-90. I did find these two unofficial sites for GB 2312-80, http://www.chinese-tools.com/resources/gb2312-80-table.html and ftp://ftp.oreilly.com/examples/cjkvinfo/AppE/gb2312.pdf, in English, but I couldn't find anything for the standard GB 12345-90. I'd like to know, for example, the glyph for the character code 8875 (kuten form) in this standard, which, by the way, is not listed in GB 2312-80. How should I proceed?

3. I repeat below the second statement from my first question in this thread:

"There are two PRC national standards, GB 2312-80 and GB 12345-90, which are intended to represent simplified and traditional Chinese, respectively. The character repertoires of the two are the same, but the simplified forms occur in GB 2312-80 and the traditional ones in GB 12345-90."

Given that the characters U+6AAF, U+81FA, and U+98B1 are in GB 12345-90 but not in GB 2312-80, is it correct to say that the character repertoires are the same for these two character sets?

Because some simplified forms are also considered traditional forms, and because multiple traditional forms were folded into the same simplified form, the relationship is not one-to-one.

Thus, the repertoire of GB/T 12345-90 is not the same as that of GB 2312-80. Instead, it is more accurate to state that the repertoires are largely the same and parallel, differing only in that GB/T 12345-90 provides the traditional form at the same relative GB 2312-80 code point.

GB/T 12345-90 includes 103 additional hanzi in rows 88 and 89 (GB 2312-80 includes hanzi in rows 16 through 87). These 103 additional hanzi are somewhat difficult to explain. Some of them are cases in which the simplified form is itself also a traditional form, meaning that two traditional forms folded into one form, and the form that was at the GB 2312-80 code point was moved to row 88 or 89, and replaced by the other traditional form. I believe that these account for 41 of these 103 hanzi.
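As a rough illustration of the row ranges just described, the sketch below takes a kIRG_GSource-style value such as G1-786D and reports whether it falls in the hanzi rows shared with GB 2312-80 (16 through 87) or in the additional GB/T 12345-90 rows (88 and 89). The function name `classify_g_source` and the label strings are hypothetical, and the sketch assumes the G0/G1 prefixes mean GB 2312-80 and GB/T 12345-90, as described earlier in this thread.

```python
SHARED_ROWS = range(16, 88)   # hanzi rows 16 through 87
EXTRA_ROWS = range(88, 90)    # rows 88 and 89, GB/T 12345-90 only

SHARED = "shared hanzi rows (16-87)"
EXTRA = "additional GB/T 12345-90 hanzi (rows 88-89)"
OTHER = "outside the hanzi rows discussed here"


def classify_g_source(value):
    """Return (row, cell, label) for a value such as 'G0-4C28' or 'G1-786D'."""
    prefix, code = value.split("-")
    number = int(code, 16)
    # Subtract 0x20 from each byte to recover the Row-Cell pair.
    row = (number >> 8) - 0x20
    cell = (number & 0xFF) - 0x20
    if prefix == "G1" and row in EXTRA_ROWS:
        label = EXTRA
    elif row in SHARED_ROWS:
        label = SHARED
    else:
        label = OTHER
    return row, cell, label


for value in ("G0-4C28", "G1-786D"):
    print(value, classify_g_source(value))
```

With the two values quoted in this thread, G0-4C28 lands in row 44 and G1-786D lands in row 88, that is, in the additional block.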

The answer to Question #1 above, I suspect, is that only a single value is allowed for the kIRG_GSource field. The kIRG_GSource field is not unique in this regard; the kIRG_JSource field has similar occurrences, because there is a significant overlap between the JIS X 0212 and JIS X 0213 standards.
