Python Programming Language

non standard path characters

A kind user reports having problems running the reportlab tests because his path has non-ascii characters in it eg

.....\Mes documents\Mes Téléchargements\Firefox\...

somewhere in the tests we look at the path and then try and convert to utf8 for display in pdf.

Is there a standard way to do these path string conversions?

Paths appear to come from all sorts of places and given the increasing use of zip file packaging it doesn't seem appropriate to rely on the current platform as a single choice for the default encoding.

--

Robin Becker

I think you should change the code page before running the test, with something like:

c:\> chcp 850
c:\> ....\python.exe ......\test.py

Look for the code page that suits your system; 850, 437, 1250 or 1252 may work.

Robin Becker wrote:
> A kind user reports having problems running the reportlab tests because his path has non-ascii characters in it eg
>
> .....\Mes documents\Mes Téléchargements\Firefox\...
>
> somewhere in the tests we look at the path and then try and convert to utf8 for display in pdf.
>
> Is there a standard way to do these path string conversions?
>
> Paths appear to come from all sorts of places and given the increasing use of zip file packaging it doesn't seem appropriate to rely on the current platform as a single choice for the default encoding.

Zip files contain a bit flag for the character encoding of file names (cp437 or utf-8); see the ZipInfo object in module zipfile and the link (on that page) to the file format description. But I think some zip programs just put the path in the zip file encoded in the local code page, in which case you have no way of knowing.
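As a rough sketch of the flag mentioned above: the ZIP format description reserves bit 11 (0x800) of the general purpose bit flag to mark a UTF-8 file name, and zipfile.ZipInfo exposes the flags as flag_bits. The helper name name_encoding and the sample archive are made up for illustration, and the code assumes Python 3, where zipfile already applies this rule itself when it decodes member names.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general purpose bit flag


def name_encoding(info):
    """Return the encoding the archive claims for this member's name."""
    return "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"


# Build a small in-memory archive with one ASCII and one non-ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"a")
    zf.writestr("T\u00e9l\u00e9chargements/note.txt", b"b")

# Read it back and record what each member's flag says.
with zipfile.ZipFile(buf) as zf:
    encodings = {info.filename: name_encoding(info) for info in zf.infolist()}
```

The flag only records what the writing program claimed. An archive whose names were stored in a local code page without the flag will still be reported as cp437 here.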

--

Regards, Tijs

Tijs wrote:
> Robin Becker wrote:
> .......
> Zip files contain a bit flag for the character encoding (cp437 or utf-8), see the ZipInfo object in module zipfile and the link (on that page) to the file format description.
> But I think some zip programs just put the path in the zipfile, encoded in the local code page, in which case you have no way of knowing.

thanks for that. I guess the problem is that when a path is obtained from such an object the code that gets the path usually has no way of knowing what the intended use is. That makes storage as simple bytes hard. I guess the correct way is to always convert to a standard (say utf8) and then always know the required encoding when the thing is to be used.

--

Robin Becker

> thanks for that. I guess the problem is that when a path is obtained from such an object the code that gets the path usually has no way of knowing what the intended use is. That makes storage as simple bytes hard. I guess the correct way is to always convert to a standard (say utf8) and then always know the required encoding when the thing is to be used.

Inside the program itself, the best thing is to represent path names as Unicode strings as early as possible; later, information about the original encoding may be lost.

If you obtain path names from the os module, pass Unicode strings to listdir in order to get back Unicode strings. If they come from environment variables or command line arguments, use locale.getpreferredencoding() to find out what the encoding should be.
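A minimal sketch of this advice, assuming Python 3, where os.listdir returns text names for a text argument and byte strings for a bytes argument. The helper decode_path is hypothetical; it is only meant for byte-string paths that arrive from outside the os module.

```python
import locale
import os
import tempfile


def decode_path(raw, encoding=None):
    """Decode a byte-string path, defaulting to the locale's encoding.

    Text input is returned unchanged; a UnicodeDecodeError propagates
    so the caller can decide what to do with it.
    """
    if isinstance(raw, str):
        return raw
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


# A text argument to listdir yields text names; a bytes argument yields bytes.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "report.pdf"), "w").close()
    text_names = os.listdir(tmp)
    byte_names = os.listdir(os.fsencode(tmp))
```

Decoding once, at the point where the path enters the program, means the rest of the code never has to guess which encoding a byte string was in.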

If they come from a zip file, Tijs already explained what the encoding is.

Always expect encoding errors; if they occur, choose to either skip the file name or report an error to the user. Notice that listdir may return a byte string if decoding fails (this may only happen on Unix).
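The "skip or report" policy could look roughly like this. The function name split_decodable and the sample names are illustrative only, not taken from reportlab, and the sketch assumes Python 3.

```python
def split_decodable(raw_names, encoding):
    """Split byte-string names into decoded names and undecodable ones."""
    decoded, failed = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the caller can skip or report them.
            failed.append(raw)
    return decoded, failed


# The second name is Latin-1 encoded, so it is not valid UTF-8.
names = [b"plain.txt", "T\u00e9l\u00e9chargements".encode("latin-1")]
good, bad = split_decodable(names, "utf-8")
for raw in bad:
    print("cannot decode file name: %r" % raw)
```

Returning the failures instead of raising leaves the choice between skipping and reporting to the caller.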