On 5/25/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> a default character set that is allowed. The only character set that
> makes sense as a default, ignoring previously-existing environment
> variables (which don't necessarily help us), is ascii.
This ignores the shift that has happened over the last 5-10 years in
operating systems, filesystems and even programming languages.
The "standard" allowed charset in all of those environments is now
Unicode.
> Why? Primarily because ascii identifiers are what are allowed today,
> and have been allowed for 15 years. But there is this secondary data
And guess what, they will still be allowed tomorrow... (tongue-in-cheek)
If you look at the typical use cases for programs written in Python
(usually also in rough order of experience):
A) directly in the interpreter (I love that)
B) small-ish one-off scripts
C) medium-sized scripts
D) multi-module programs made by a single person
E) large-ish programs made by a group of people
Out of these, really only people belonging to category E) are
expressing the opinion that identifiers should stay ASCII forever.
Those should be the same people who have a strong source code
compliance policy, unit tests, linting, etc...
Unicode support out of the box, without constraints, strongly benefits
categories A-D. (A funny story: this morning I asked a colleague, who
is a beginner in Visual Basic.NET, for his opinion about Japanese
identifiers, and he was shocked to hear that Python does not accept
them today out of the box... VB.NET apparently does, and entry-level
programmers here DO (ab?)use this.) Unicode is an accepted norm, isn't
it? (Some extremists in Japan long argued for the superiority of the
local encodings over Unicode, but apart from on 2ch this is an old
story now.)
I think Martin's and my point is that, to get people to level E),
there is no reason to put any charset restriction on levels A-D. And
once you are at level E), it is difficult to argue that running a
one-time test at source code check-in time is bad practice.
Regards,
Guillaume