This is a patch to update the Unicode database. It's mostly the imported
data, but there were two code changes:
- 5.1 changes the "mirrored" property for a character (U+0F3A), and the
delta-to-3.2 code did not support that. I added a field to
change_record to support that kind of change.
- 5.1 also added a character (U+1D79) whose upper-case version is far
off (U+A77D), triggering a complaint that the delta can't be represented
in 16 bits. I fixed that by adding a flag to the ctype record indicating
that deltas aren't used for that record.
Fredrik, can you please review these changes?
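
For reference, a minimal sketch of the arithmetic behind the second change. The two code points are the ones named above; the helper fits_int16 is hypothetical, and the assumption that the delta field is a signed 16-bit value is mine, not something taken from the generator source.

```python
# Code points taken from the description above.
LOWER = 0x1D79  # the character added in 5.1
UPPER = 0xA77D  # its upper-case mapping

def fits_int16(value):
    """Return True if value fits in a signed 16-bit field (assumed width)."""
    return -0x8000 <= value <= 0x7FFF

# The case mapping is stored as an offset from the character itself.
delta = UPPER - LOWER
print(hex(delta), fits_int16(delta))
```

Under that assumption the offset is too large to store, which is the case the new flag covers: a record carrying the flag would hold the target code point itself rather than an offset.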

The patch looks fine to me (assuming that I didn't miss something
critical hidden among the large table diffs).
(I'd probably have named the "NODELTA" flag after what it is rather than what
it isn't, but I cannot think of a short replacement right now, so let's
leave it as it is.)

I have now committed the change as r66362 (including the missing
documentation updates), and ported it to 3.0 as r66363 (where I had to
change the flag value and regenerate the data, as the flag 0x100 was
already taken).

2008/9/10 Martin v. Löwis <report@bugs.python.org>:
> I have now committed the change as r66362 (including the missing
> documentation updates), and ported it to 3.0 as r66363 (where I had to
> change the flag value and regenerate the data, as the flag 0x100 was
> already taken).
That's unfortunate -- perhaps the 2.6 flag and data can be brought in line,
to make future merges easier?

> That's unfortunate -- perhaps the 2.6 flag and data can be brought in
> line, to make future merges easier?
I thought of that; however, merging the databases themselves would still
not be possible: the 3.0 database has the flags set in many records,
which causes merge conflicts (as the 2.x database has different flag
values). So regenerating the database is necessary, anyway.
In future changes, it might be useful to have new flags with the same
values, so that such patches can be merged without conflicts in the
generator.

Code point 0x0370 is now a printable character.
r66381 corrected the failures by simply changing it to 0x0378, until the
next unicodedata upgrade...
I wonder if there is a value that is guaranteed to stay non-printable.

2008/9/10 Amaury Forgeot d'Arc <report@bugs.python.org>:
> Code point 0x0370 is now a printable character.
> r66381 corrected the failures by simply changing it to 0x0378, until the
> next unicodedata upgrade...
> I wonder if there is a value that is guaranteed to stay non-printable.
The control characters?

> The control characters?
Indeed, also the private-use characters. test_unicode explicitly
comments that the test is about unassigned characters, although
I don't understand the purpose of that test (it then also tests
a surrogate character, which is also guaranteed to remain
unprintable).
One of the characters that is guaranteed to remain unassigned is
U+FFFE (and its mirrors in other planes, e.g. U+1FFFE, ...).
This guarantee is made to support the BOM. Along with U+FFFF,
these are non-characters. #765036 once suggested that Python should
refuse to represent them at all, but that proposal was rejected.
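
A rough way to check these candidates from Python 3, as a sketch only. The helper is_plane_end_noncharacter is hypothetical and covers just the U+xxFFFE / U+xxFFFF pairs mentioned above (not the U+FDD0..U+FDEF block); the sample code points are my own picks for each class.

```python
import unicodedata

def is_plane_end_noncharacter(cp):
    """True for the last two code points of any plane (U+FFFE, U+FFFF, U+1FFFE, ...)."""
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE

# One sample each: control, surrogate, private use, then noncharacters.
samples = (0x0000, 0xD800, 0xE000, 0xFFFE, 0xFFFF, 0x1FFFE)

for cp in samples:
    ch = chr(cp)
    print(
        "U+%04X" % cp,
        unicodedata.category(ch),       # general category from the database
        ch.isprintable(),               # what repr() relies on in 3.x
        is_plane_end_noncharacter(cp),
    )
```

Any of these classes should be a more stable choice for the test than an arbitrary unassigned code point such as 0x0378, which a later database upgrade may assign.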