> http://msdn.microsoft.com/en-us/library/system.text.encoding(VS.80).aspx
>
> Looks like we could add a few more aliases for other encodings as well.
I wouldn't trust this table. Microsoft is on record as implementing the
code pages with slight variations compared to other references for some
encodings (in particular the Asian ones). So unless there is an actual
documented need for a certain alias (and preferably a demonstration that
Microsoft's interpretation of the code page is the same as Python's),
I would advise against adding such aliases.

Martin v. Löwis wrote:
> [...] I would advise against adding such aliases.
Fair enough.
Could someone with some IronPython/.NET fu check whether the
code pages are the same as the Python codecs?
The above page has some sample code to get started and IronPython
provides easy access to both the .NET codecs and the Python ones.
Thanks,
--
Marc-Andre Lemburg
eGenix.com
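One way to approach the comparison requested above is a small harness that runs two encoders over a range of code points and reports where they disagree. This is a sketch only: `find_mismatches` is a hypothetical helper, and under IronPython the .NET side (for example an encoder built on `System.Text.Encoding.GetEncoding(65001)`) would need an adapter that returns Python bytes, which is not shown. The usage below compares Python's own `utf-8` codec in strict and `surrogatepass` modes, the kind of difference that matters for cp65001.

```python
def find_mismatches(encode_a, encode_b, code_points):
    """Return the code points on which two encoders disagree.

    encode_a and encode_b take a one-character str and return bytes, or
    raise UnicodeError/ValueError if they cannot encode it. A failure to
    encode is recorded as None and compared like any other result.
    """
    mismatches = []
    for cp in code_points:
        ch = chr(cp)
        results = []
        for encode in (encode_a, encode_b):
            try:
                results.append(encode(ch))
            except (UnicodeError, ValueError):
                results.append(None)
        if results[0] != results[1]:
            mismatches.append(cp)
    return mismatches


def strict(s):
    # Plain UTF-8: lone surrogates raise UnicodeEncodeError.
    return s.encode("utf-8")


def passing(s):
    # UTF-8 with surrogatepass: lone surrogates are encoded as 3 bytes.
    return s.encode("utf-8", "surrogatepass")


diff = find_mismatches(strict, passing, range(0xD7F0, 0xE010))
print(hex(diff[0]), hex(diff[-1]), len(diff))  # 0xd800 0xdfff 2048
```

Treating "cannot encode" as a result in its own right means a codec that rejects a character and one that silently maps it are reported as different, which is what a code page comparison needs to catch.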

This report is really about the issues reported in #1602 and #7441, i.e.
console output failing when the terminal encoding is 65001. Rather
than adding the alias, I would prefer to find out why terminal output
fails in that code page.

Re Martin's question, I can offer the indirect wisdom of Michael Kaplan
in this blog post:
http://blogs.msdn.com/michkap/archive/2008/03/18/8306597.aspx
where he mentions that the easiest way to output Unicode text in the
Windows console is:
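    #include <fcntl.h>
    #include <io.h>
    #include <stdio.h>
    #include <wchar.h>
    int main(void) {
        _setmode(_fileno(stdout), _O_U16TEXT); /* stdout now takes UTF-16 wide text */
        wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
        return 0;
    }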
The _setmode call is the special step needed.
I haven't tested with _O_U8TEXT (if such a thing exists); I don't do
Windows anymore, so I can't provide a patch.
It also seems that Python, when stdin/stdout/stderr are attached to a
Windows console, doesn't use the plain *printf functions. The example code
I offered in one of the other issues (a dumb stdout that does a plain .write
of UTF-8 bytes) runs and displays fine.
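The "dumb stdout" example referred to above is not reproduced in this thread. The following is a hypothetical reconstruction for illustration only, written for Python 3, where the binary stream behind `sys.stdout` is `sys.stdout.buffer`. It shows the idea: encode every string to UTF-8 yourself and hand the bytes straight to the underlying binary stream, so the text layer's own encoding is never consulted.

```python
import io


class DumbUTF8Writer:
    """Hypothetical file-like object that encodes every write as UTF-8.

    `raw` is any binary stream with write() and flush(), e.g.
    sys.stdout.buffer on a real console or io.BytesIO for testing.
    """

    def __init__(self, raw):
        self.raw = raw

    def write(self, text):
        # No code page is consulted: the bytes written are always UTF-8.
        self.raw.write(text.encode("utf-8"))
        return len(text)

    def flush(self):
        self.raw.flush()


# Usage with an in-memory buffer; on a console one would pass
# sys.stdout.buffer instead of buf.
buf = io.BytesIO()
out = DumbUTF8Writer(buf)
out.write("\u043a\u043e\u0448\u043a\u0430 \u65e5\u672c\u56fd\n")
out.flush()
```

Whether the console then renders those bytes correctly depends on its own code page and font, which is exactly the open question in this thread.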

What we could do is add new codecs based on the .NET tables for cp65000 et al.
However, before doing this, I'd like to know where these code page settings can occur and what exact names are used for them. If they only appear in .NET and IronPython, I don't think it's worth adding extra codecs for the MS UTF variants.

Oops, false alarm. python -c "import os; print repr(os.getcwdu())" works as expected, so the exception is part of issue 1602.
(My comment about there being no need to distinguish this codepage from UTF-8 stands.)

Different tests proved that cp65001 can *not* be set as an alias to utf-8, and that's why I'm closing this issue.
Anyway, I don't think that cp65001 is configured by default on any Windows setup. It is only set by the user, with the chcp command, in an attempt to display Unicode characters in the Windows console; but it is not possible to display arbitrary Unicode characters in this console (see issue #1602). And the chcp command should not be used in the Windows console, because it does not only change the ANSI code page: it also changes the console code page, which is wrong (the console still expects text encoded in the previous code page).
It is possible to implement a codec for cp65001 using the existing utf-8 codec in surrogatepass mode, or by calling MultiByteToWideChar() / WideCharToMultiByte() with codepage=CP_UTF8. But I don't think that we need cp65001 at all.
If you need cp65001 for a good reason and you would like to implement a cp65001 Python codec, open a new issue.
If you consider that we should use _O_U8TEXT or _O_U16TEXT, open another new issue.
_O_U8TEXT or _O_U16TEXT might improve Unicode support when Python output is redirected to a pipe, but I don't think it would help to display Unicode characters in the Windows console. I also fear that it would break existing code and any function not aware of this special mode.
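For reference, the first approach mentioned above (reusing the existing utf-8 codec in surrogatepass mode) could be sketched as a codec registered through `codecs.register`. Assumptions: the name `x_cp65001` is hypothetical and deliberately avoids `cp65001` so it cannot collide with a codec Python already ships; mapping the default `strict` mode to `surrogatepass` is one interpretation of the suggestion, not a verified model of Windows' CP_UTF8 behaviour; and only stateless encode/decode are provided, with no incremental or stream classes.

```python
import codecs

_NAME = "x_cp65001"  # hypothetical name, chosen to avoid clashing with real codecs


def _errors(errors):
    # Assumption: treat the default "strict" mode as "surrogatepass" so that
    # lone surrogates round-trip instead of raising.
    return "surrogatepass" if errors == "strict" else errors


def _encode(text, errors="strict"):
    # Returns (bytes, number of characters consumed), as codecs expects.
    return codecs.utf_8_encode(text, _errors(errors))


def _decode(data, errors="strict"):
    # final=True: the input is complete, so a truncated sequence is an error.
    return codecs.utf_8_decode(data, _errors(errors), True)


def _search(name):
    # codecs passes the normalized (lower-cased) encoding name.
    if name == _NAME:
        return codecs.CodecInfo(name=_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)

# Usage:
# "\ud800".encode("x_cp65001")        -> b'\xed\xa0\x80'
# b"\xed\xa0\x80".decode("x_cp65001") -> '\ud800'
```

For ordinary text the result is byte-for-byte identical to utf-8; the two only diverge on lone surrogates, which is why a plain alias was not enough.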