Are there any browsers or Unicode display implementations that support Unicode Ideographic Description Characters in text form? I know of one tool that creates new Han characters saved as PNG images, at http://kenliu.name/cgi/hanziimage.cgi

Also, if I come across scans of old classical Chinese books that use Han characters not found in the Unihan repository, how do we go about proposing them for addition to Unicode so that such books can be digitized? Or how do we use Unicode Ideographic Description Characters to digitize them?

Ideographic description characters are not for creating novel characters on the fly; they are for describing novel characters. They are graphic characters, like a Latin letter, and cannot be interpreted as control characters by a Unicode-compliant application.
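The "graphic, not control" point can be checked against the Unicode Character Database. Here is a minimal sketch using Python's standard `unicodedata` module; the range U+2FF0..U+2FFB is the set of IDCs as of Unicode 6.2, and the helper name `idc_categories` is just for illustration.

```python
import unicodedata

# Ideographic Description Characters as of Unicode 6.2: U+2FF0..U+2FFB.
IDC_RANGE = range(0x2FF0, 0x2FFC)

def idc_categories():
    """Map each IDC code point label to its Unicode general category."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp)) for cp in IDC_RANGE}

categories = idc_categories()

# Control characters have general category "Cc" and format controls "Cf".
# The IDCs are symbols, the same broad class as other visible characters.
is_any_control = any(cat in ("Cc", "Cf") for cat in categories.values())

for label, cat in categories.items():
    print(label, cat)
```

If the interpreter's Unicode data matches the standard, every IDC should report a symbol category and `is_any_control` should be false.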

How do we submit a proposal to expand the scope of Ideographic Description Characters so that they are control characters as well? Are there any Unicode characters that already set a precedent by being both graphic and control characters? Is there any design limitation for such a proposal, or would it make more sense for different code points to represent control characters for ideographic description? I think this would provide a good framework for web browsers and other software to implement dynamic display of rare or private characters that are not already in Unihan, and it may help reduce the need for an ever-expanding Unicode character set.

The ideographic description characters will never be interpreted as control characters. Changing them to control characters would do the following: it would contradict the semantics under which the characters were encoded; it would contradict the semantics with which the characters have been used, and would corrupt any files already using them; and interpreting any characters as control characters that build CJK ideographs on the fly would introduce multiple encodings for those characters (for some, hundreds of representations), leading to unprecedented security problems (spoofing) that could not be mitigated and a full-on meltdown of the ideograph encoding model.

How do your semantic and security concerns apply to the way Ideographic Description Characters are being used to control the graphic rendering of CJK characters into new characters at http://kenliu.name/cgi/hanziimage.cgi ? Is this script violating the intended use of Ideographic Description Characters?

Also, has anyone attempted to take the Unihan database and provide all possible Ideographic Description Sequences for each existing precomposed Unihan character? This could be useful for software that wants to offer the end user a choice list of existing precomposed Unihan characters when they enter an Ideographic Description Sequence as an alternate input method.
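To make the input-method idea concrete, here is a rough sketch of the reverse lookup it would need. The table `SAMPLE_IDS` is a tiny hand-written sample, not data taken from Unihan or any real IDS file, and the function names are hypothetical.

```python
from collections import defaultdict

# Hand-written sample: each encoded character with one or more IDS spellings.
# A real table would be far larger and would come from an external data file.
SAMPLE_IDS = {
    "林": ["⿰木木"],
    "明": ["⿰日月"],
    "好": ["⿰女子"],
    "森": ["⿱木林", "⿱木⿰木木"],
}

def build_reverse_index(ids_table):
    """Invert the table so each IDS maps to the characters it describes."""
    index = defaultdict(list)
    for char, sequences in ids_table.items():
        for seq in sequences:
            index[seq].append(char)
    return dict(index)

def candidates(ids, index):
    """Return the encoded characters matching an IDS typed by the user."""
    return index.get(ids, [])

index = build_reverse_index(SAMPLE_IDS)
print(candidates("⿰木木", index))
print(candidates("⿱木⿰木木", index))
```

Note that one character can have several equivalent sequences (as with the two spellings listed for 森 above), which is why the table stores a list per character.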

Thanks for referencing Unicode 6.2 section 12.2. I found this paragraph in there:

Quote:

Rendering. Ideographic Description characters are visible characters and are not to be treated as control characters. Thus the sequence U+2FF1 U+4E95 U+86D9 must have a distinct appearance from U+4E95 U+86D9.

An implementation may render a valid Ideographic Description Sequence either by rendering the individual characters separately or by parsing the Ideographic Description Sequence and drawing the ideograph so described. In the latter case, the Ideographic Description Sequence should be treated as a ligature of the individual characters for purposes of hit testing, cursor movement, and other user interface operations. (See Section 5.11, Editing and Selection.)

Based on the above, it looks like software can choose to implement the display of an Ideographic Description Sequence. But because Ideographic Description characters are visible characters, does that mean that in the above example U+2FF1 must be displayed visibly in addition to drawing the ideograph parsed from U+4E95 U+86D9?

For example, if a browser wants to draw the ideograph from an Ideographic Description Sequence, can the U+2FF1 character be hidden from view when rendering, but only be visible when viewing the web page source, similar to how HTML code is treated?

The intent of the language is that the IDCs must have some impact on the visual representation of the text. The easiest way to do this is to display them as individual characters, which is what pretty much everybody does. However, if a rendering engine uses them to actually build the represented character, that's OK, too. In that case, you don't have to explicitly display them; they're being displayed implicitly.
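As a rough illustration of the "parse the sequence and treat it as one unit" option, here is a small sketch. It assumes the Unicode 6.2 set of IDCs (U+2FF0..U+2FFB, with U+2FF2 and U+2FF3 taking three operands and the rest two), and as a simplification it accepts any non-IDC character as a component, which is looser than the grammar in the standard. The names `parse_ids` and `split_units` are made up for this example.

```python
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}
BINARY_IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)} - TERNARY_IDCS

def parse_ids(text, pos=0):
    """Parse one IDS starting at pos; return (tree, next_pos).

    A tree is either a single component character or a tuple
    (idc, operand, operand[, operand]).
    """
    if pos >= len(text):
        raise ValueError("unexpected end of sequence")
    ch = text[pos]
    if ch in BINARY_IDCS:
        arity = 2
    elif ch in TERNARY_IDCS:
        arity = 3
    else:
        return ch, pos + 1  # simplification: any other character is a component
    pos += 1
    operands = []
    for _ in range(arity):
        operand, pos = parse_ids(text, pos)
        operands.append(operand)
    return (ch, *operands), pos

def split_units(text):
    """Split text into editing units: a complete IDS counts as one unit."""
    units, pos = [], 0
    while pos < len(text):
        try:
            _, end = parse_ids(text, pos)
        except ValueError:
            end = pos + 1  # incomplete IDS: fall back to showing the IDC itself
        units.append(text[pos:end])
        pos = end
    return units

tree, end = parse_ids("\u2FF1\u4E95\u86D9")
units = split_units("a\u2FF1\u4E95\u86D9b")
```

A renderer that draws the composed glyph could use units like these for cursor movement and hit testing, while an incomplete sequence falls back to showing the IDC as an ordinary visible character.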

If you have characters from old documents which are still unencoded, the best thing to do is to use the contact form (http://www.unicode.org/reporting.html). You can also fill in a formal proposal, but since CJKV ideographs are handled differently from other characters, some of the information there isn't relevant. In any event, you should be prepared to provide scans of the characters in use together with complete bibliographic information. You may be contacted via email to work out the details. If the characters you want are variants of encoded characters, it would be better (and faster) to represent them via IVSs. See http://www.unicode.org/reports/tr37/ for details.
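For reference, an IVS is a base ideograph followed by a variation selector from the range U+E0100..U+E01EF (VS17..VS256). The sketch below only shows how such a sequence is put together in text; it does not check whether the sequence is registered in the Ideographic Variation Database, and the base character and selector number are picked purely for illustration.

```python
VS17 = 0xE0100   # first variation selector used in IVSs
VS256 = 0xE01EF  # last variation selector used in IVSs

def make_ivs(base, vs_number):
    """Build base + variation selector, where vs_number is 17..256."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= vs_number <= 256:
        raise ValueError("IVSs use VS17..VS256")
    return base + chr(VS17 + vs_number - 17)

def strip_variation_selectors(text):
    """Drop VS17..VS256, leaving only the base characters."""
    return "".join(ch for ch in text if not VS17 <= ord(ch) <= VS256)

ivs = make_ivs("葛", 17)  # illustrative choice of base and selector
plain = strip_variation_selectors(ivs)
```

Software that does not support the selector can ignore it and still show the base character, which is why this route is less disruptive than waiting for a new code point.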

Ultimately, characters meeting with UTC approval will be added to UAX #45 (http://www.unicode.org/reports/tr45/) and submitted to the IRG for processing. The window has closed on Extension F proposals, so they would have to wait for work to begin on Extension G.

When will work on Extension G begin? Is it possible to get provisional code points for the characters if we draft a formal proposal with all supporting details? Or would we have to put them in the PUA for now?
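If the PUA does end up being the interim answer, the bookkeeping might look something like the sketch below: each unencoded character is keyed by an IDS describing it and given a code point in the BMP Private Use Area (U+E000..U+F8FF). The function name and the sample descriptions are hypothetical, and any such mapping is a private agreement that other software will not understand.

```python
import unicodedata

BMP_PUA = range(0xE000, 0xF900)  # U+E000..U+F8FF

def assign_pua(descriptions, start=0xE000):
    """Give each distinct IDS description its own BMP PUA code point."""
    mapping = {}
    next_cp = start
    for ids in descriptions:
        if ids in mapping:
            continue  # already assigned
        if next_cp not in BMP_PUA:
            raise ValueError("outside the BMP Private Use Area")
        mapping[ids] = chr(next_cp)
        next_cp += 1
    return mapping

# Hypothetical descriptions of unencoded characters found in a scan.
mapping = assign_pua(["⿱井蛙", "⿰木蛙", "⿱井蛙"])
for ids, pua_char in mapping.items():
    print(ids, f"U+{ord(pua_char):04X}", unicodedata.category(pua_char))
```

Keeping the IDS alongside each private code point means the text can be converted later if the characters are eventually encoded.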

This (parseIDS.html) is a small JavaScript tool for offline environments. If you have sufficient network bandwidth to Japan and don't mind web pages without English instructions, http://www.chise.org/ids-find would be better (parseIDS.html searches only for the component ideographs and ignores the structure described by the IDCs). However, I'm not sure whether CHISE uses the ids.txt that the IRG receives from the maintainer (Kawabata).
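To show what "searches only the component ideographs and ignores the structure" means in practice, here is a rough Python sketch of that kind of search. It is not the code of parseIDS.html or of CHISE; the table is a small hand-written sample and the function names are made up.

```python
# IDCs as of Unicode 6.2; these are skipped when collecting components.
IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Hand-written sample table: encoded character -> one IDS for it.
SAMPLE_TABLE = {
    "林": "⿰木木",
    "杏": "⿱木口",
    "呆": "⿱口木",
    "明": "⿰日月",
}

def components(ids):
    """Return the set of component characters, discarding the IDCs."""
    return frozenset(ch for ch in ids if ch not in IDCS)

def find_by_components(query, table):
    """Find characters whose IDS contains every component in the query."""
    wanted = set(query)
    return sorted(ch for ch, ids in table.items() if wanted <= components(ids))

matches = find_by_components("木口", SAMPLE_TABLE)
print(matches)
```

Because the layout operator is thrown away, 杏 and 呆 both match the same query even though their components are stacked in opposite order; a structure-aware search would be needed to tell them apart.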
