> - PKG-INFO (METADATA in distutils2), that already uses a trick to support
> Unicode, but your change would replace it in a better way;
Which "trick"?
> - MANIFEST, which with your fix would gain the ability to handle non-ASCII
> paths, which is a feature or a bugfix depending on your point of view;
Wait. Non-encodable bytes are a separate issue. I would like to work on the
first problem: distutils in Python 3 calls open() without an encoding argument,
so the encoding depends on the user's locale. Put differently: if you produce
a file with distutils on one computer, you cannot be sure that the file can be
read with the same version of Python on another computer (if the locale
encoding is different). E.g. Windows uses the mbcs encoding, whereas UTF-8 is
the preferred encoding on Linux.
What is the encoding of the MANIFEST file?
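To illustrate the locale problem described above, here is a minimal sketch (not distutils code). The sample text, the cp1252 code page and the temporary PKG-INFO path are assumptions chosen for the demonstration.

```python
import locale
import os
import tempfile

# Encoding that open() normally falls back to when no encoding argument is
# given; it depends on the machine's locale settings.
default_encoding = locale.getpreferredencoding(False)

text = "Summary: caf\u00e9\n"

# The same text gives different bytes under two common locale encodings,
# so a file written on one machine may be misread on another.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")
differs = utf8_bytes != cp1252_bytes

# Passing the encoding explicitly makes the round trip locale-independent.
path = os.path.join(tempfile.mkdtemp(), "PKG-INFO")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

With an explicit encoding on both the writing and the reading side, the result no longer depends on default_encoding.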
> - .def files, used by the compilers for the C linking step; I don’t know if
> it’s appropriate to allow UTF-8 there.
I don't know these files.
> - RPM spec files, which use ASCII or UTF-8 according to
> http://en.opensuse.org/openSUSE:Specfile_guidelines#Specfile_Encoding but
> it’s not confirmed in
> http://www.rpm.org/max-rpm/s1-rpm-build-creating-spec-file.html (linked
> from the LSB site), so there’s no guarantee this works for all RPM
> platforms. This sort of platform-specific thing is the reason why RPM
> support has been removed in distutils2.
UTF-8 is a superset of ASCII. If you use UTF-8 but only write ASCII
characters, your output file is valid UTF-8... but it is also valid ASCII,
because the bytes are identical. It's magical :-)
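A quick way to check the superset claim is sketched below; the sample strings are made up for the illustration.

```python
# ASCII-only text produces exactly the same bytes in both encodings.
ascii_text = "Name: example\nVersion: 1.0\n"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# A reader that expects ASCII can therefore read such a UTF-8 file.
decoded_as_ascii = ascii_text.encode("utf-8").decode("ascii")

# The reverse does not hold once a non-ASCII character is written.
non_ascii_bytes = "Author: Andr\u00e9\n".encode("utf-8")
try:
    non_ascii_bytes.decode("ascii")
    ascii_reader_ok = True
except UnicodeDecodeError:
    ascii_reader_ok = False
```

So a tool that requires ASCII keeps working as long as the content itself stays ASCII; only real non-ASCII content breaks an ASCII reader.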
> - record and .pth files created by the install command.
.pth files contain directory names, which can be non-ASCII.
> I agree that there is something to be fixed, but I don’t know if they can
> be fixed in distutils. Unicode in PKG-INFO is unrelated to files, whereas
> there are files or directories in MANIFEST, spec, record and .pth.
You can use non-ASCII characters for things other than filenames, e.g. in the
description of a package :-)
> If this is going to be fixed, write_file should not use UTF-8 unconditionally
> but grow a keyword argument IMO, so that use cases requiring ASCII
> continue to work.
As written before, UTF-8 is a superset of ASCII. If you read a file using the
UTF-8 encoding, you will be able to read ASCII files. But if you use UTF-8 and
write non-ASCII characters, older versions of distutils using ASCII or another
encoding will not be able to read these files.
Anyway, I think that in most cases, all files only contain ASCII text. So it
doesn't really matter.
About the keyword solution: yes, it would be a smooth way to fix this issue.
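The keyword idea could look roughly like the sketch below. This is a hypothetical variant, not the actual distutils write_file: the encoding parameter, its "utf-8" default and the file names are assumptions for illustration only.

```python
import os
import tempfile


def write_file(filename, contents, encoding="utf-8"):
    """Hypothetical variant of write_file with an encoding keyword.

    contents is a sequence of strings without line terminators.
    """
    with open(filename, "w", encoding=encoding) as f:
        for line in contents:
            f.write(line + "\n")


tmpdir = tempfile.mkdtemp()

# Default: UTF-8, so non-ASCII paths can be written on any locale.
manifest = os.path.join(tmpdir, "MANIFEST")
write_file(manifest, ["setup.py", "docs/r\u00e9sum\u00e9.txt"])

# A caller that needs ASCII asks for it and gets an error on non-ASCII text
# instead of silently producing a file the consumer cannot read.
spec = os.path.join(tmpdir, "example.spec")
try:
    write_file(spec, ["Summary: caf\u00e9"], encoding="ascii")
    ascii_rejected = False
except UnicodeEncodeError:
    ascii_rejected = True
```

Which default is right is a separate question; the point is only that callers requiring ASCII can keep asking for it explicitly.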
> When you say “patch *all* functions reading files”, I guess you mean all
> functions that read distutils files, i.e. MANIFEST and PKG-INFO.
I don't know distutils well enough to answer my own question.