Philip Jenvey wrote:
> On Feb 23, 2010, at 3:49 PM, Chris Clark wrote:
>
>
>> ....
>> C:\jython2.5.1\Lib\encodings>C:\jython2.5.1\jython
>> Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
>> [Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
>> Type "help", "copyright", "credits" or "license" for more information.
>>
>>>>> import shift_jis
>>>>>
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> File "shift_jis.py", line 7, in <module>
>> import _codecs_jp, codecs
>> ImportError: No module named _codecs_jp
>>
>>
> #1066 is the main bug for this issue -- we just currently lack support for the Asian codecs like shift_jis. The ImportError in sample #2 is a symptom of that. The same ImportError happens when you attempt to use the codec, but there it's masked as a LookupError.
>
> Supporting these via the JVM's nio codecs is definitely doable but nobody's gotten around to it yet.
>
Thanks for the heads up.
Is http://java.sun.com/j2se/1.4.2/docs/guide/nio/ the package you are
referring to? I'm not a big Java guy, but I may start hacking on a Python
layer on top of this as an experiment/proof-of-concept. Presumably
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/package-summary.html
is what needs wrapping?
Chris

Chris Clark wrote:
> Philip Jenvey wrote:
>
>> #1066 is the main bug for this issue -- we just currently lack support for the Asian codecs like shift_jis. The ImportError in sample #2 is a symptom of that. The same ImportError happens when you attempt to use the codec, but there it's masked as a LookupError.
>>
>> Supporting these via the JVM's nio codecs is definitely doable but nobody's gotten around to it yet.
>>
>>
>
> Is http://java.sun.com/j2se/1.4.2/docs/guide/nio/ the package you are
> referring to? I'm not a big Java guy, but I may start hacking on a Python
> layer on top of this as an experiment/proof-of-concept. Presumably
> http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/package-summary.html
> is what needs wrapping?
>
I had some time this afternoon whilst waiting for some builds to
complete, so I started experimenting with using nio from Python, along
with a quick attempt at a shift_jis codec.
I'm seeking feedback on the attached, very INCOMPLETE demo. Sample
session:
C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> x=''
>>> x.decode('shift_jis') # at this point there is a shift_jis.py in curdir
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown encoding 'shift_jis'
>>> import shift_jis # register the local module/encoding
>>> x.decode('shift_jis')
u''
>>>
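The session works because importing the module registers the codec with
Python's codec registry. Below is a minimal sketch of that registration
layer, written for CPython 3 so it runs as-is. The names nio_shift_jis,
nio_euc_jp and _BACKENDS are hypothetical, and the "backend" is
CPython's own codec standing in for the spot where a Jython version
would call into java.nio.charset (not shown). Like the demo, it only
handles errors='strict'. The _BACKENDS table is one possible shape for
the "template" that would cover several encodings.

```python
import codecs

# Hypothetical mapping from the name we register to the charset doing the
# real work. Here the backend is CPython's built-in codec of that name; a
# Jython port would hand the text to java.nio.charset instead (not shown).
_BACKENDS = {
    "nio_shift_jis": "shift_jis",
    "nio_euc_jp": "euc_jp",
}


def _make_codec_info(name, backend):
    """Build a CodecInfo for one encoding from a shared template."""

    def encode(text, errors="strict"):
        # Like the demo, only strict error handling is supported.
        if errors != "strict":
            raise NotImplementedError("only errors='strict' is supported")
        return text.encode(backend), len(text)

    def decode(data, errors="strict"):
        if errors != "strict":
            raise NotImplementedError("only errors='strict' is supported")
        raw = bytes(data)  # data may arrive as bytes or as a memoryview
        return raw.decode(backend), len(raw)

    return codecs.CodecInfo(encode=encode, decode=decode, name=name)


def _search(name):
    # The registry calls this with a normalized (lower-case) encoding name
    # and expects a CodecInfo, or None if the name is not one of ours.
    backend = _BACKENDS.get(name)
    if backend is None:
        return None
    return _make_codec_info(name, backend)


# Registration is a side effect of importing the module, mirroring the
# "import shift_jis" step in the session above.
codecs.register(_search)
```

With this loaded, "\u3042".encode("nio_shift_jis") gives b'\x82\xa0', and
any errors value other than 'strict' raises NotImplementedError.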
There is no support for the errors argument (i.e. anything less strict
than 'strict'), there are imports in the middle of the script, and you
have to import each encoding you need explicitly (right now there is
only one, but it is easy to do multiple with a template). I'm beginning
to wonder if it would simply be cleaner to use the CPython gencodec.py
script and generate its input from the CPython encodings. I've done this
for some Windows (single-byte) encodings that Python does not support,
by auto-generating tables from Windows code pages like cp708. The tables
would be pretty big, though :-)
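For comparison, the gencodec.py route produces table-driven "charmap"
codecs. This sketch shows the shape of such a module with a tiny,
made-up table (ASCII plus one extra byte, 0x80 -> U+20AC). A real
generated module would carry the full 256-entry table for a code page
like cp708. It uses codecs.charmap_build, charmap_encode and
charmap_decode, which are what CPython's generated encodings modules
call. Charmap decoding maps one byte at a time, so this shape fits
single-byte code pages; a double-byte encoding such as Shift_JIS needs
more than a flat 256-entry table. A side benefit is that the standard
errors= handlers work without extra code.

```python
import codecs

# Hypothetical toy table in the layout gencodec.py emits: index = byte
# value, entry = the Unicode character it decodes to, '\ufffe' = undefined.
decoding_table = (
    "".join(chr(i) for i in range(0x80))  # 0x00-0x7F: plain ASCII
    + "\u20ac"                            # 0x80: EURO SIGN (made-up entry)
    + "\ufffe" * 0x7F                     # 0x81-0xFF: undefined
)
assert len(decoding_table) == 256  # charmap_build needs exactly 256 entries

# Reverse mapping used for encoding, built once from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)


def encode(text, errors="strict"):
    # Returns (bytes, number of characters consumed).
    return codecs.charmap_encode(text, errors, encoding_table)


def decode(data, errors="strict"):
    # Returns (str, number of bytes consumed).
    return codecs.charmap_decode(data, errors, decoding_table)
```

Encoding looks up each character in encoding_table, so anything outside
the table (for example an accented letter here) raises
UnicodeEncodeError under errors='strict'.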
I'm really looking for "yes, the nio-from-Python approach is worth
pursuing" or "this is stupid, you should stop now" comments. I'm pretty
sure that, performance-wise, this approach is not a good idea, but it is
infinitely faster than "doesn't work at all" :-)
Here is a slightly more realistic example:
C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
>>> import shift_jis # register the local module/encoding
>>> x = u"\u3042" # '3042 HIRAGANA LETTER A'
>>> x.encode('shift_jis')
'\x82\xa0'
>>> # hey! Looks like it matches http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=82&s=ALL#layout
Finally, does anyone know how IronPython handles CJK (or do they simply
make use of .NET strings)?
Chris