Network Working Group Y. YONEYA
Internet-Draft JPRS
Intended status: Informational T. Nemoto
Expires: February 08, 2014 Keio University
August 07, 2013
Mapping characters for PRECIS classes
draft-ietf-precis-mappings-03
Abstract
The framework for preparation and comparison of internationalized
strings ("PRECIS") defines several classes of strings for preparation
and comparison. In the framework, case mapping is defined because
many protocols handle case-sensitive or case-insensitive string
comparison and therefore preparation of the string is mandatory. As
described in the mapping for Internationalized Domain Names in
Applications (IDNA) and the PRECIS problem statement, mappings for
internationalized strings are not limited to case: width mapping and
mapping of delimiters and other special characters can also be taken
into consideration. This document provides guidelines for authors of
protocol profiles of the PRECIS framework and describes several
mappings that can be applied between receiving user input and passing
permitted code points to internationalized protocols. The mappings
described here are expected to be applied as Additional mapping in the
PRECIS framework.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 08, 2014.
Copyright Notice
YONEYA & Nemoto Expires February 08, 2014 [Page 1]

Internet-Draft precis mapping August 2013
other than case and width is also important to increase the chance
that strings match as users expect. This document provides guidelines for
authors of protocol profiles of the PRECIS framework and describes
mappings that can be applied between receiving user input and passing
permitted code points to internationalized protocols. The mappings
described in this document are expected to be applied as Additional
mapping in the PRECIS framework.
2. Protocol dependent mappings
The PRECIS framework defines several protocol-independent mappings.
The additional mappings defined in this document are protocol-
dependent, i.e., they depend on the rules for a particular
application protocol.
2.1. Delimiter mapping
Some application protocols define their own delimiters, and these
delimiters differ from protocol to protocol. Therefore, the delimiter
mapping table should be based on a well-defined mapping table for
each protocol.
Delimiter mapping maps characters that are compatible with a
delimiter to the canonical delimiter character. For example, '@' in
mail addresses and ':' and '/' in URIs have width-compatible
characters. '+', '-', '<' and '>' may also have such characters.
Another example is the FULL STOP character (U+002E), which is a
delimiter in the visual presentation of domain names. Some IMEs
generate a semantically compatible or width-compatible character,
such as IDEOGRAPHIC FULL STOP (U+3002), when a user types FULL STOP on
the keyboard. Such FULL STOP compatible characters need to be mapped
to FULL STOP before the string is passed to the protocol.
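The following is a minimal Python sketch of delimiter mapping for the FULL STOP example above. The table contents and the names DELIMITER_MAP and map_delimiters are assumptions made for illustration only; an actual profile would take its table from the protocol's own definition.

```python
# Hypothetical delimiter table for a domain-name-like profile.
# Keys are code points; values are the canonical delimiter.
DELIMITER_MAP = {
    0x3002: ".",  # IDEOGRAPHIC FULL STOP
    0xFF0E: ".",  # FULLWIDTH FULL STOP
    0xFF61: ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}


def map_delimiters(s, table=DELIMITER_MAP):
    """Replace each delimiter-compatible character with its canonical form."""
    return s.translate(table)
```

For instance, map_delimiters("example\u3002jp") yields "example.jp", while characters that are not in the table pass through unchanged.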
2.2. Special mapping
Aside from delimiter characters, certain protocols have characters
which need to be mapped in ways that are different from the rules
specified in the PRECIS framework (e.g., mapping non-ASCII space
characters to ASCII space). In this document, these mappings are
called "special mappings". They are different for each protocol.
Therefore, the special mapping table should be based on a well-
defined mapping table for each protocol. Examples of special mappings
are the following:
o White spaces are mapped to SPACE (U+0020)
o Some characters, such as control characters, are mapped to nothing
(deletion)
For example, EAP [RFC3748], SASLprep [RFC4013], IMAP4 ACL [RFC4314]
and LDAPprep [RFC4518] define rules that map some non-ASCII space
codepoints to SPACE (U+0020).
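The two example special mappings above can be sketched in Python as follows. This is only an illustration: selecting characters by the Unicode general categories Zs (space separator) and Cc (control) is an assumption of this sketch, and the function name map_special is hypothetical. Each protocol profile would define its own table.

```python
import unicodedata


def map_special(s):
    """Map space separators to SPACE and delete control characters."""
    out = []
    for ch in s:
        category = unicodedata.category(ch)
        if category == "Zs":
            # Assumption: every space separator maps to U+0020.
            out.append(" ")
        elif category == "Cc":
            # Assumption: control characters are mapped to nothing.
            continue
        else:
            out.append(ch)
    return "".join(out)
```

Note that under this assumption TAB and LINE FEED, which are control characters, are deleted as well; a profile that needs them would have to treat them separately.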
2.3. Local case mapping
Local case mapping is case folding that depends on language and
context. For example, the mapping of LATIN CAPITAL LETTER I (U+0049)
depends on the language context of the user: if the language is
Turkish (or one of several other languages), the character should be
mapped to LATIN SMALL LETTER DOTLESS I (U+0131), which is this
character's lowercase equivalent in those languages.
To address this for the PRECIS framework, this document defines the
characters that need local case mapping, based on the
SpecialCasing.txt [Specialcasing] file in section 3.13 of The Unicode
Standard [Unicode]. Because the PRECIS framework already applies
case folding, local case mapping targets only those characters for
which two procedures give different results: (1) applying only the
case folding defined in CaseFolding.txt [Casefolding], and (2)
applying the special casing defined in SpecialCasing.txt and then
that case folding.
The SpecialCasing.txt file defines two types of mappings:
Unconditional Mappings and Conditional Mappings. Conditional
Mappings are further divided into Language-Insensitive Mappings,
which target characters whose full case mappings do not depend on
language but do depend on context, and Language-Sensitive Mappings,
which target characters whose full case mappings depend on language
and perhaps also context.
Characters with Unconditional Mappings or with Language-Insensitive
Mappings are mapped to the same codepoint(s) whether only case
folding is applied or special casing is applied followed by case
folding. Characters with Language-Sensitive Mappings, however, are
mapped to different codepoints. Therefore, this document defines as
targets for local case mapping a subset of the Lithuanian (lt),
Turkish (tr) and Azerbaijani (az) characters that the
Language-Sensitive Mappings cover.
The following method determines the codepoints that local case
mapping targets. Here Casefolding() means the case folding described
in the CaseFolding.txt file [Casefolding] and Specialcasing() means
the special casing described in the SpecialCasing.txt file
[Specialcasing].
If Casefolding(Specialcasing(cp)) != Casefolding(cp)
Then cp is a target
Else cp is not a target;
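The test above can be sketched in Python as follows. Several points are assumptions of this sketch rather than part of the method: Specialcasing() is approximated by a small hand-copied excerpt of the Turkish and Azerbaijani lowercase entries with their context conditions ignored, Casefolding() is approximated by Python's str.casefold(), and the names LANGUAGE_SENSITIVE_LOWER, specialcasing, casefolding and is_target are hypothetical. A real implementation would parse the latest data files instead.

```python
# Hypothetical excerpt of language-sensitive lowercase mappings
# (tr/az entries only; context conditions are ignored here).
LANGUAGE_SENSITIVE_LOWER = {
    "tr": {0x0049: "\u0131", 0x0130: "\u0069"},
    "az": {0x0049: "\u0131", 0x0130: "\u0069"},
}


def specialcasing(cp, lang):
    """Return the language-sensitive lowercase mapping of cp, if any."""
    return LANGUAGE_SENSITIVE_LOWER.get(lang, {}).get(cp, chr(cp))


def casefolding(s):
    """Approximate Casefolding() with Python's full case folding."""
    return s.casefold()


def is_target(cp, lang):
    """True if local case mapping changes the case-folded result."""
    return casefolding(specialcasing(cp, lang)) != casefolding(chr(cp))
```

With this excerpt, is_target(0x0049, "tr") is true because plain case folding gives U+0069 while the Turkish mapping gives U+0131. LATIN SMALL LETTER SHARP S (U+00DF) is not a target, because both paths fold it to "ss".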
Application developers should calculate the codepoints that local
case mapping targets by using the latest CaseFolding.txt and
SpecialCasing.txt. Appendix B "Code points list for local case
mapping" lists the codepoints in Unicode 6.2 calculated by this method.
3. Order of operations
The mappings described in this document are expected to be applied as
Additional mapping in the PRECIS framework. In principle, the mappings
described in this document could be applied in any order.
However, this section specifies a particular order to minimize the
effect of codepoint changes introduced by the mappings. This mapping
order is very general and was designed to be acceptable to the widest
user community.
1. Delimiter mapping
2. Special mapping
3. Local case mapping
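As an illustration of this order, the following Python sketch chains three simplified mapping steps. The tables and the names DELIMITERS, LOCAL_LOWER and additional_mapping are hypothetical and cover only a few characters; they are assumptions of the sketch, not tables defined by this document.

```python
import unicodedata

# Hypothetical, deliberately tiny tables for illustration only.
DELIMITERS = {0x3002: "."}
LOCAL_LOWER = {"tr": {0x0049: "\u0131", 0x0130: "i"}}


def delimiter_mapping(s, lang):
    return s.translate(DELIMITERS)


def special_mapping(s, lang):
    # Assumption: space separators (category Zs) map to U+0020.
    return "".join(" " if unicodedata.category(c) == "Zs" else c for c in s)


def local_case_mapping(s, lang):
    return s.translate(LOCAL_LOWER.get(lang, {}))


# The order specified in this section.
STEPS = (delimiter_mapping, special_mapping, local_case_mapping)


def additional_mapping(s, lang):
    """Apply the three mappings in the specified order."""
    for step in STEPS:
        s = step(s, lang)
    return s
```

The output of additional_mapping() would then be handed to the remaining steps of the PRECIS framework, including its own case mapping.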
4. Security Considerations
Like Mapping Characters for IDNA2008 [RFC5895], this document
suggests creating mappings that might cause confusion for some users
while alleviating confusion for other users. Such confusion is not
covered in any depth in this document.
5. IANA Considerations
This document has no actions for the IANA.
6. Acknowledgment
Martin Duerst suggested the need for case folding in the mappings
(map final sigma to sigma, German sz to ss, etc.).
Joe Hildebrand, John Klensin, Marc Blanchet, Pete Resnick, Peter
Saint-Andre, and others gave important suggestions for this document
at the WG meeting.
7. References
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.