Update The Reference To The Unicode Standard

Changelog

r1 - 2018-06-07

Do not remove the reference to ISO/IEC 10646-1:1993,
as it must remain for D.18 to make sense. Remove the Fallback Reference section,
as it no longer applies. Add the outcome of the discussion with Core: do not
reference the Unicode Standard until a paper proposes an algorithm defined
outside ISO/IEC 10646.

Abstract

The reference to ISO/IEC 10646 in the C++ Standard should be
updated to the stable base standard or any successor standard.

References

Preferred New Reference

The Unicode Consortium, the entity responsible for the Unicode Standard, documents the preferred citations for the Unicode Standard. The current version is 11.0. We believe the existing reference should be changed to:

The Unicode Standard, Version 11.0 or later

For its current purposes, the C++ Standard is only concerned with character
codes and encoding forms. To standardise any Unicode text
processing, the algorithms and character data will need to be
referenced. We initially believed that we might as well add such a
reference now. However, we have decided to focus only on updating
the ISO/IEC 10646 reference.

Immediate Effects

The version of ISO/IEC 10646 that the C++ Standard refers to predates UTF-16 and
UTF-32, instead defining UCS2 and UCS4. Moving to a newer standard
would make the terms UTF-16 and UTF-32 well defined in the C++ Standard. It has
been argued that the ECMAScript standard referred to uses a newer
Unicode standard, in which those terms are defined, so those terms are
defined for the C++ Standard by transitive reference. If that argument
is accepted, then moving to the newer version makes the intent explicit.

In addition, in 1996, as part of amendments 5, 6 and 7, the original set
of Hangul characters was removed and re-added at a new location, and
Tibetan characters were added again. This places the C++ Standard's
current citation, "ISO/IEC 10646-1:1993", in conflict with the version
imported by way of the ECMAScript standard. In practice, all
implementors adopt the later version for conversion operations.

In keeping with the discussion with Core, an undated Unicode reference will
only be introduced when a paper that actually introduces Unicode
algorithms is proposed. This paper focuses on fixing the
ISO/IEC 10646 reference.

UCS2 and UCS4 in codecvt facets

The last proposal to update the Unicode Standard reference, P0417R1, was
entangled with deprecation of UCS2 and UCS4. The remaining references
are in the now deprecated codecvt facets [depr.locale.stdcvt.req]. There
is resistance to changing those to UTF-16 and UTF-32, since,
particularly for UCS2, there are real changes in behavior. UTF-32 can be
viewed as UCS4. UTF-16 cannot be similarly viewed as UCS2. Since there
may be users of the facility depending on the behavior as it was when
standardized, this paper does not propose changing them, but instead
leaves a normative reference to the old ISO/IEC 10646-1:1993 standard
that is only used for those facilities.

Following discussion with Core, we keep a normative, dated reference
to ISO/IEC 10646-1:1993 and add an undated reference to
ISO/IEC 10646 in general to denote the latest edition. ISO/IEC 10646 is a
well-behaved standard whose updates will not break the C++ Standard. It
is also impossible to observe the difference between UCS4 and UTF-32
in any C++ implementation; therefore the references to UCS4 have
been updated to UTF-32, while UCS2 has been left in place because it is
semantically and observably different from UTF-16.
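
The observable difference between UCS2 and UTF-16 can be illustrated with a small sketch. It is not taken from any implementation of the codecvt facets; the helper names encode_utf16, representable_in_ucs2 and check_examples are hypothetical and exist only for this illustration. The sketch assumes its input is a valid Unicode scalar value. A code point inside the Basic Multilingual Plane yields the same single 16-bit unit under both schemes, whereas a code point above U+FFFF becomes a surrogate pair in UTF-16 and has no UCS2 representation at all.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: encode one Unicode scalar value as UTF-16 code units.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
std::vector<char16_t> encode_utf16(char32_t cp) {
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit, identical to UCS2.
        return {static_cast<char16_t>(cp)};
    }
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    const char32_t v = cp - 0x10000;
    return {static_cast<char16_t>(0xD800 + (v >> 10)),
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}

// Hypothetical helper: UCS2 stores each character in exactly one 16-bit
// unit, so nothing above U+FFFF can be represented.
bool representable_in_ucs2(char32_t cp) {
    return cp <= 0xFFFF;
}

// Small usage check for the two helpers above.
void check_examples() {
    const char32_t euro = 0x20AC;    // U+20AC, inside the BMP
    const char32_t emoji = 0x1F600;  // U+1F600, outside the BMP

    assert(representable_in_ucs2(euro));
    assert(encode_utf16(euro).size() == 1);

    assert(!representable_in_ucs2(emoji));
    assert(encode_utf16(emoji).size() == 2);
}
```

No analogous sketch is needed for UCS4 and UTF-32, since both store a scalar value directly in one 32-bit unit.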

__STDC_ISO_10646__ macro

The macro __STDC_ISO_10646__ in [cpp.predefined] can be
left unchanged. The ISO/IEC 10646 version will be the latest version.
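
As a minimal sketch of how a program might inspect this macro: it is conditionally defined, and when defined it expands to an integer literal of the form yyyymmL. The function name iso_10646_version is hypothetical and used only for illustration.

```cpp
#include <string>

// Hypothetical helper: report the ISO/IEC 10646 date the implementation
// claims to support, or "undefined" when the macro is not provided.
std::string iso_10646_version() {
#ifdef __STDC_ISO_10646__
    // The macro expands to an integer literal of the form yyyymmL.
    return std::to_string(static_cast<long>(__STDC_ISO_10646__));
#else
    return "undefined";
#endif
}
```

Whether the macro is defined, and with which value, depends on the implementation, so the result should be treated as informational only.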

Proposed Changes

Add the wording highlighted in green. Remove the wording highlighted in red.