Re: [Jython-users] Converting between a byte string and a text string?

2011/4/17 Alex Grönholm <alex.gronholm@...>:
> 18.04.2011 03:22, Dan Stromberg wrote:
>> How does one, in Jython 2.5.2, convert from a byte string to a text
>> string, and vice versa?
> The same way as you do in all other Pythons: 'blah'.decode(encoding).
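
For reference, a minimal sketch of the round trip in both directions. The sample
text and the choice of UTF-8 are assumptions made for illustration; the u''
prefix works on 2.x (including Jython 2.5) and on 3.3+, but not on 3.0-3.2.

```python
# Text string -> byte string -> text string, using an explicit encoding.
text = u'Gr\xf6nholm'            # sample text; u'' is unicode on 2.x, str on 3.3+

data = text.encode('utf-8')      # text -> bytes (str on 2.x, bytes on 3.x)
roundtrip = data.decode('utf-8') # bytes -> text

assert roundtrip == text
assert len(text) == 8            # eight characters
assert len(data) == 9            # the o-umlaut takes two bytes in UTF-8
```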
I'm writing a deduplicating backup program that works well so far on CPython
2.x, CPython 3.x, PyPy 1.4.1 and recent PyPy trunk builds, but it fails with a
traceback on Jython 2.5.2 in a way that at first felt related to string
semantics. However, the real issue appears to be the type returned by
open(fn, 'r').read(length) and os.read(os.open(fn, os.O_RDONLY), length).
I made some progress by adding 'b' to the mode in my open() calls, but when
reading with os.read(), how does one convince Jython to return a str instead
of a unicode object? It mostly seems to return a unicode object, but sometimes
it returns a str object, even from the same os.open(). Jython on Linux doesn't
appear to have an os.O_BINARY.
I've been using os.open+os.read, because it appears to return bytes on both
Python 2.x (CPython and PyPy) and CPython 3.x, but that doesn't appear to be
the case in Jython 2.5.2.
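
One possible workaround, as a rough sketch only: normalize whatever os.read()
returns into a byte string. The helper names to_byte_string and read_bytes are
made up for this example, and the latin-1 step rests on the assumption that any
text object handed back holds exactly one byte per character (code points
0-255). I have not confirmed that this holds on Jython 2.5.2.

```python
import os

try:
    text_type = unicode   # Python 2.x, including Jython 2.5
except NameError:
    text_type = str       # Python 3.x

def to_byte_string(data):
    """Return data as a byte string (str on 2.x, bytes on 3.x)."""
    if isinstance(data, text_type):
        # Assumption: each character stands for one byte (0-255), so
        # latin-1 maps it back without loss. Raises UnicodeEncodeError
        # if that assumption is wrong.
        return data.encode('latin-1')
    return data

def read_bytes(filename, length):
    """Read up to length bytes from filename via os.open/os.read."""
    # os.O_BINARY only exists on some platforms, so fall back to 0.
    flags = os.O_RDONLY | getattr(os, 'O_BINARY', 0)
    fd = os.open(filename, flags)
    try:
        return to_byte_string(os.read(fd, length))
    finally:
        os.close(fd)
```

Failing loudly on characters above 255 seems preferable to silently corrupting
backup data, which is why the sketch does not pass an errors= argument.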
>> I'd like to support Jython in my opensource python2x3 module, but
>> Jython's string handling seems different enough from that of other
>> Pythons that I'm not clear on how to do so. I found an article saying
>> that if you do a binary read in Jython, you'll get a binary str that
>> just keeps the high bytes zeroed
> Link? Sounds a little odd.
Finding the original link I read is proving somewhat time consuming, but
here's something a bit similar that sounds more promising than what I read
before. Apparently str behavior changed in Jython 2.5, so perhaps the
original link I read was out of date:
http://jythonpodcast.hostjava.net/jythonbook/chapter2.html
Prior to the 2.5.0 release of Jython, there was only one string type. The
string type in Jython supported full two-byte Unicode characters and all
functions contained in the string module are Unicode-aware. If the u''
string modifier was specified, it was ignored by Jython. Since the release
of 2.5.0, strings in Jython are treated just like those in CPython, so the
same rules will apply to both implementations. It is also worth noting that
Jython uses character properties from the Java platform. Therefore
properties such as isupper and islower, which we will discuss later in the
section, are based upon the Java properties.
>> , but I didn't notice anything about
>> converting from one (always-zero high bytes to nonzero high bytes, for
>> example) to the other.
>>
>> Python2x3's at http://stromberg.dnsalias.org/svn/python2x3/trunk - and
>> I'm including a copy at the bottom of this message.
> The worst problem in writing cross-version code is entering unicode/byte
> literals.
> Does Python2x3 solve this somehow?
python2x3.string_to_binary() addresses this to some extent. You give it a
str literal (or other str), and it converts it to bytes on 3.x (assuming
latin-1), and leaves it as str on 2.x. It's more typing than adding a b
prefix, but it seems to work fine.
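
The behavior described above can be sketched roughly as follows. This is not
the actual python2x3 source; string_to_binary_sketch is a hypothetical stand-in
written only from the description in this message.

```python
import sys

def string_to_binary_sketch(string):
    """Hypothetical stand-in for python2x3.string_to_binary()."""
    if sys.version_info[0] >= 3:
        # On 3.x a str literal is text, so encode it; latin-1 maps
        # code points 0-255 one-to-one onto byte values.
        return string.encode('latin-1')
    # On 2.x a plain str literal is already a byte string.
    return string

magic = string_to_binary_sketch('\x1f\x8b')  # e.g. a two-byte file signature
```

The same call then yields a two-byte byte string on either major version,
without needing a b prefix in the source.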