At Mon, 22 Oct 2001 00:11:19 -0700,
Yves Arrouye <yves@realnames.com> wrote:
> Isn't ISO-8859-1 actually the one that has "holes" in C0/C1 that exhibit
> this very behavior?
There is no hole in the ISO-8859-1 <-> Unicode mapping table provided
by unicode.org (see
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT). The C0/C1
byte values are mapped to the corresponding C0/C1 control characters,
and the table contains no undefined entries. I believe that Java (at
least Sun's implementation) uses the same table.
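
As a rough check of the ISO-8859-1 claim, a sketch along these lines
decodes all 256 byte values and reports the first one that does not
map to the code point with the same number. The class and method names
are made up for illustration, and it assumes a Java 7 or later runtime
(for StandardCharsets).

```java
import java.nio.charset.StandardCharsets;

public class Latin1IdentityCheck
{
    // Returns the first byte value whose decoded character differs from
    // the byte value itself, or -1 if every byte maps to the same number.
    static int firstMismatch()
    {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++)
        {
            bytes[i] = (byte) i;
        }
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        if (decoded.length() != bytes.length)
        {
            return 0; // unexpected length: treat as a mismatch
        }
        for (int i = 0; i < decoded.length(); i++)
        {
            if (decoded.charAt(i) != i)
            {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args)
    {
        int mismatch = firstMismatch();
        System.out.println(mismatch < 0
            ? "no holes: every byte maps to the same code point"
            : "first mismatch at 0x" + Integer.toHexString(mismatch));
    }
}
```

If the runtime follows the unicode.org table, firstMismatch() should
return -1, including for the C0 and C1 ranges.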
> I thought that was the case, and windows-1252 was the
> one that used C1 for platform-specific character (see
> http://www-124.ibm.com/cvs/icu/charset/data/xml/windows-1252-2000.xml?rev=1.
> 1&content-type=text/x-cvsweb-markup where apparently U+0081 is mapped to
> 0x81 in windows-1252).
Is that the data for ICU4C? It is interesting that it doesn't agree
with the table from unicode.org (see
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT).
Again, Sun's Java seems to use the unicode.org table. You can see
this by running the program below.
public class CharConversionTest
{
    public static void main(String[] args)
        throws Exception
    {
        // Build a buffer holding every byte value 0x00 through 0xFF.
        byte[] str = new byte[256];
        for (int i = 0; i < str.length; i++)
        {
            str[i] = (byte) i;
        }

        // Decode with Cp1252 and print the character each byte maps to.
        String converted = new String(str, "Cp1252");
        for (int i = 0; i < converted.length(); i++)
        {
            System.out.println("0x" + Integer.toHexString(i) + " -> U+"
                + Integer.toHexString(converted.charAt(i)));
        }
    }
}
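
To avoid scanning 256 lines of output by eye, a variant like the
following sketch lists only the bytes that the decoder cannot map. It
assumes that the runtime substitutes U+FFFD for undefined bytes and
that the charset is available under the name "windows-1252"; the class
and method names are invented for illustration.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class Cp1252UndefinedBytes
{
    // Collects the byte values that decode to U+FFFD, the replacement
    // character String uses for input the charset cannot map.
    static List<Integer> undefinedBytes(String charsetName)
    {
        Charset charset = Charset.forName(charsetName);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < 256; i++)
        {
            String decoded = new String(new byte[] { (byte) i }, charset);
            if (decoded.equals("\uFFFD"))
            {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (int b : undefinedBytes("windows-1252"))
        {
            System.out.println("0x" + Integer.toHexString(b) + " is undefined");
        }
    }
}
```

The unicode.org CP1252.TXT table marks 0x81, 0x8D, 0x8F, 0x90 and 0x9D
as undefined, so a runtime that follows that table should report
exactly those five bytes, and in particular should not map 0x81 to
U+0081.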
-------------------
Shigemichi Yazawa
yazawa@globalsight.com