L2/10-311
From: Mark Davis
Date: Tue, Jul 27, 2010 at 21:27
Subject: Extension of UTS#46 data
Please add the following to the agenda and document registry.
=====
The tables for UTS #46 are generated using UseSTD3ASCIIRules=ON. Among other things, that means that full-width variants of ASCII characters other than letters, digits, and hyphen (for example, full-width punctuation) are disallowed. Examples include most of the "disallowed" values in:
http://unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
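A rough sketch of why such characters end up disallowed: under STD3 rules a label may contain only ASCII letters, digits, and hyphen, so a full-width character whose compatibility mapping lands on some other ASCII character is rejected. This is an approximation, assuming NFKC plus lowercasing as a stand-in for the real table-driven UTS #46 mapping; the helper name `std3_label_status` is hypothetical, and label separators are deliberately not modeled.

```python
import unicodedata

# Characters permitted inside a label by the STD3 (LDH) rules:
# ASCII letters, digits, and hyphen-minus.
LDH = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")


def std3_label_status(ch: str) -> str:
    """Rough status of a single in-label character with UseSTD3ASCIIRules=ON.

    Assumption: NFKC plus lowercasing stands in for the real UTS #46 mapping,
    which is table-driven and has exceptions. Label separators such as
    U+002E FULL STOP are not modeled here.
    """
    mapped = unicodedata.normalize("NFKC", ch).lower()
    if mapped and all(c in LDH for c in mapped):
        # Ends up as letters/digits/hyphen: valid as-is, or mapped if it changed.
        return "valid" if mapped == ch else "mapped"
    if all(ord(c) < 0x80 for c in mapped):
        # Maps to ASCII outside letters/digits/hyphen: rejected under STD3.
        return "disallowed"
    return "outside this sketch"
```

For instance, `std3_label_status("\uFF01")` (FULLWIDTH EXCLAMATION MARK) gives `"disallowed"`, while `std3_label_status("\uFF41")` (FULLWIDTH LATIN SMALL LETTER A) gives `"mapped"`.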
As it turns out, quite a number of applications need to support IDNA with UseSTD3ASCIIRules=OFF (that is, behaving like IDNA2003 with that option OFF). This requires an enhancement of the UTS #46 tables to allow for that behavior.
This document proposes to make the change below in the 6.0 data tables (with some corresponding changes in the 6.0 text), and also to make a data file for 5.2 available.
========
Proposal
In the data file for UTS #46, replace the "disallowed" values that result only from the STD3 rules with two new values: valid-NSTD3 and mapped-NSTD3. This would affect only characters currently marked as disallowed. Any application that wants the current behavior would simply treat (or map) valid-NSTD3 and mapped-NSTD3 as disallowed.
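The resolution step described above could look like the following minimal sketch. The function name `resolve_status` and its boolean parameter are hypothetical, chosen for illustration; the only assumption about the data is the two proposed values, valid-NSTD3 and mapped-NSTD3.

```python
def resolve_status(status: str, use_std3_ascii_rules: bool) -> str:
    """Collapse the proposed NSTD3 values to a base status (hypothetical helper).

    With STD3 rules ON, both new values behave exactly like today's
    'disallowed'. With STD3 rules OFF, they behave like 'valid' / 'mapped'.
    """
    if status == "valid-NSTD3":
        return "disallowed" if use_std3_ascii_rules else "valid"
    if status == "mapped-NSTD3":
        return "disallowed" if use_std3_ascii_rules else "mapped"
    # Every other value ('valid', 'mapped', 'disallowed', ...) is unaffected.
    return status
```

With `use_std3_ascii_rules=True` the table behaves exactly as it does today, so existing implementations need only this one extra step.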
The first value would indicate that the character is disallowed under STD3 rules but otherwise valid:
002C ; disallowed # COMMA
=>
002C ; valid-NSTD3 # COMMA
The second would indicate that the character is disallowed under STD3 rules but otherwise mapped:
00A0 ; disallowed # NO-BREAK SPACE
=>
00A0 ; mapped-NSTD3 ; 0020 # NO-BREAK SPACE
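Lines in the proposed form can be read with a small parser. This is a sketch that assumes the semicolon-separated layout shown in the examples (code point or `XXXX..YYYY` range, status, optional space-separated hex mapping, then a `#` comment); the names `Entry` and `parse_line` are hypothetical, and any further fields are ignored.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    first: int               # first code point of the range
    last: int                # last code point (equal to first for a single code point)
    status: str              # e.g. "valid", "mapped", "valid-NSTD3", "mapped-NSTD3"
    mapping: Optional[str]   # replacement string, if the line has a mapping field


def parse_line(line: str) -> Optional[Entry]:
    """Parse one line such as '00A0 ; mapped-NSTD3 ; 0020 # NO-BREAK SPACE'."""
    data = line.split("#", 1)[0].strip()    # drop the trailing comment
    if not data:
        return None                          # blank or comment-only line
    fields = [f.strip() for f in data.split(";")]
    if ".." in fields[0]:
        start, end = fields[0].split("..")
    else:
        start = end = fields[0]
    mapping = None
    if len(fields) > 2 and fields[2]:
        # The mapping is a space-separated sequence of hex code points.
        mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
    return Entry(int(start, 16), int(end, 16), fields[1], mapping)
```

Parsing the NO-BREAK SPACE line yields `Entry(first=0xA0, last=0xA0, status="mapped-NSTD3", mapping=" ")`; the COMMA line yields status `valid-NSTD3` with no mapping.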
Of course, some characters would stay disallowed.
0378 ; disallowed #
In particular, characters whose decompositions include a period but that are not label separators in IDNA2003 would remain disallowed:
FE52 ; disallowed # SMALL FULL STOP
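The exclusion rule above can be checked mechanically. This sketch assumes "decomposition" means the compatibility decomposition (NFKD) and uses the four characters that RFC 3490, section 3.1, requires to be recognized as dots; the helper name `stays_disallowed` is hypothetical.

```python
import unicodedata

# The characters IDNA2003 (RFC 3490, section 3.1) recognizes as label separators.
IDNA2003_SEPARATORS = {"\u002E", "\u3002", "\uFF0E", "\uFF61"}


def stays_disallowed(ch: str) -> bool:
    """True if ch's compatibility decomposition contains U+002E FULL STOP
    but ch is not itself an IDNA2003 label separator (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "." in decomposed and ch not in IDNA2003_SEPARATORS
```

Under these assumptions `stays_disallowed("\uFE52")` is true, as is U+2488 DIGIT ONE FULL STOP (which decomposes to "1."), while U+FF0E FULLWIDTH FULL STOP is excluded because IDNA2003 already treats it as a separator.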