Language Tags and Locale Identifiers for the World Wide Web

This version: http://www.w3.org/TR/2006/WD-ltli-20060612/

Latest version: http://www.w3.org/TR/ltli/

Previous version: http://www.w3.org/TR/2006/WD-ltli-20060419/

Editor: Felix Sasaki, W3C

Based on [draft-ietf-ltru-registry] and [draft-ietf-ltru-matching], this document
describes mechanisms for identifying or selecting the language of
content or locale preferences used to process information using Web
technologies. It
describes how document formats, specifications, and implementations
should handle language tags, as well as data
structures that extend these tags to describe international preferences.

This section describes the status of this document at the time
of its publication. Other documents may supersede this document.
A list of current W3C publications and the latest revision of
this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is an updated Public Working Draft of "Language Tags and Locale Identifiers for the World Wide Web (LTLI)".

This document was developed by the
Internationalization Core Working Group, part of the W3C Internationalization Activity. The Working Group expects to advance this Working Draft to Recommendation Status. A complete list of changes to this document is available.

Send your comments to www-i18n-comments@w3.org. Use "[Comments on ltli WD]" in the subject line of your email, followed by a brief subject. The archives for this list are publicly available.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Introduction

This section is informative.

Scope of this Specification

This document
describes mechanisms for identifying or selecting the language of
content or locale preferences used to process information using Web
technologies. It
describes how document formats, specifications, and implementations
should handle the language tags described by [draft-ietf-ltru-registry], as well as data
structures that extend these tags to describe international preferences
(see
sec. 3.1 in ).

Identification of language and locale has a broad range of applications within the World Wide Web. Existing standards that make use of language identification include the xml:lang attribute in [XML 1.0], the lang and hreflang attributes in [HTML 4.01], and the language property in [XSL 1.0]. Locale identification is used, for example, within the CLDR project, cf. [CLDR].

The current best practice when developing specifications for language
identification is to refer to [RFC 3066], using a formulation like "RFC 3066 or its successor". Recently a successor to [RFC 3066] has been developed,
called [draft-ietf-ltru-registry]. This specification takes [draft-ietf-ltru-registry] as the basis
for language identification, and [draft-ietf-ltru-matching] as the basis for
matching of language identifiers ("tags").

[draft-ietf-ltru-registry] refers to language identification only. Locales can be identified in several ways. One method is by inference from
language tags. For example, an implementation could map a language tag
from an existing protocol, such as HTTP's Accept-Language header, to its
locale model. Locales may also be identified directly by using the language
tag syntax in data items (elements, attributes, headers, etc.) that
explicitly serve the purpose of locale identification.
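The inference described above can be illustrated with a short, non-normative sketch. It assumes the usual comma-separated Accept-Language syntax with optional q weights. The function names parse_accept_language and infer_locale and the dictionary used as a locale model are hypothetical and not defined by this specification; a real implementation would map the tag to its own locale model.

```python
def parse_accept_language(header):
    """Return the language ranges of an Accept-Language value, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        language_range, _, params = item.partition(";")
        quality = 1.0  # HTTP default when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # assumption: an unparsable weight means "not acceptable"
        if quality > 0:
            weighted.append((-quality, position, language_range.strip()))
    # Highest quality first; equal weights keep their order in the header.
    return [language_range for _, _, language_range in sorted(weighted)]


def infer_locale(language_tag):
    """Map a language tag to a hypothetical, minimal locale model."""
    subtags = language_tag.split("-")
    locale = {"language": subtags[0].lower(), "region": None}
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            break  # an extension or private-use sequence starts here
        is_region = (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        )
        if is_region:
            locale["region"] = subtag.upper()
            break
    return locale
```

For example, a header value of "da, en-gb;q=0.8, en;q=0.7" yields the ranges da, en-gb and en in that order, and the tag de-DE maps to the language "de" with the region "DE".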

Currently, this specification refers to [draft-ietf-ltru-registry] and [draft-ietf-ltru-matching] directly. Since [draft-ietf-ltru-registry] and [draft-ietf-ltru-matching] are expected to become the new BCP 47 before this working draft becomes a recommendation, a later draft of this specification will refer to BCP 47 directly.

Out of Scope

This specification does not deal with formats for locale data or with actual locale data. One possible source of locale data and data formats is [CLDR].

Application Scenario: Web Services Internationalization

In order to enable multi-locale operation of Web services and to create the ability for locale negotiation, this specification describes a standardized method for identifying locales and locale and/or language tags on the Web, including non-normative guidelines for implementation. This is called out in Requirement R005 of [WS-I18N-REQ]. The mechanism for language and locale identification defined in this specification will be used in a future version of the description of Web Services Internationalization in [WS-I18N].

Further application scenarios of this specification include, for example, the standards mentioned in the section "Scope of this Specification". The scenarios can be divided into four areas:

Definition of values for language tags

Definition of values for locale identifiers

Definition of matching schemes for language tags

Definition of matching schemes for locale identifiers

As for matching of language tags, many specifications already define operations that use matching. An example is the language pseudo-class :lang defined in sec. 5.11.4 of [CSS 2.1]. It matches elements based on their language. This specification formulates requirements on such operations, based on [draft-ietf-ltru-matching].

Locale versus Natural Language

This document defines locale identifiers for use in Web technologies. Historically, natural language identifiers have been used as locale identifiers by some programming languages or operating environments, which is natural since locale identifiers usually share certain core features related to
natural language and country/region. This specification defines locale
identifiers that specific locale implementations can map to their
proprietary features in order to create functional, interoperable
applications.

The minimal requirement is the ability to specify the natural language; thus there is industry convergence on the use of [draft-ietf-ltru-registry] as the core of a locale identifier. For example, [LDML] uses [draft-ietf-ltru-registry] as the core of a locale identifier, and provides syntax for extensions for non-linguistic information, such as preferred currency or timezone.

A major difference between language tags and locale identifiers is the meaning of the region code. In both language tags and locales, the region code indicates variation in language (as with regional dialects) or presentation and format (such as number or date formats). In a locale, the region code is also sometimes used to indicate the physical location, market, legal, or other governing policies for the user.

The language tag may be available in several places. In HTTP, there is an Accept-Language header field which can be used. MIME has a Content-Language header which contains a language tag. In XML, there is an attribute which can be defined for elements called xml:lang. xml:lang marks all the contents and attribute values of the corresponding element as belonging to the language identified. What that means for processing those contents varies from application to application.
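The scoping behavior of xml:lang can be illustrated with a short, non-normative sketch. It uses Python's standard xml.etree.ElementTree module; the function name effective_languages and the sample document are hypothetical. The sketch assumes that an element without its own xml:lang attribute takes the value of its nearest ancestor that has one.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix under its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None, path=""):
    """Yield (path, language) pairs for an element and all of its descendants."""
    # Use the element's own xml:lang if present, else the inherited value.
    language = element.get(XML_LANG, inherited)
    here = f"{path}/{element.tag}"
    yield here, language
    for child in element:
        yield from effective_languages(child, language, here)


document = ET.fromstring(
    '<doc xml:lang="en"><p>Hello</p><p xml:lang="de">Hallo <em>Welt</em></p></doc>'
)
result = list(effective_languages(document))
```

Here the first p element inherits "en" from doc, while the em element inherits "de" from the second p element. What an application does with the resulting language values is outside the scope of the sketch.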

For more detailed information on the behavior of xml:lang, see [XML 1.0].

Notation and Terminology

This section is normative.

Language Tags and Matching of Language Tags

This document uses the terms language tag and subtag, which are defined in [draft-ietf-ltru-registry].

In addition, this document uses the following terms, which are defined in [draft-ietf-ltru-matching]:

language range

basic language range (see sec. 2.1 of [draft-ietf-ltru-matching])

extended language range (see sec. 2.2 of [draft-ietf-ltru-matching])

language priority list (see sec. 2.3 of [draft-ietf-ltru-matching])

Basic versus extended language range and language priority list

de-de is a basic language range. It matches e.g. the language tag de-DE-1996, but not the language tag de-Deva.
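The comparison behind this example can be sketched as follows. This is a non-normative illustration. It assumes that a basic language range matches a tag when the two are equal, or when the range is a prefix of the tag that is immediately followed by a hyphen, compared without regard to case. The function name basic_range_matches is hypothetical.

```python
def basic_range_matches(language_range, language_tag):
    """Return True if the basic language range matches the language tag."""
    language_range = language_range.lower()
    language_tag = language_tag.lower()
    if language_range == "*":
        return True  # the wildcard range matches every tag
    # Equal, or a prefix that ends exactly at a subtag boundary.
    return language_tag == language_range or language_tag.startswith(
        language_range + "-"
    )
```

With this function, the range de-de matches de-DE-1996 but not de-Deva, because "de-deva" neither equals "de-de" nor starts with "de-de-".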

de-*-DE is an extended language range. It matches all of the
following tags:

de-DE

de-DE-x-goethe

de-Latn-DE-1996
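The matches listed above can be reproduced with the following non-normative sketch. It assumes a subtag-by-subtag comparison in which a wildcard in the range may stand for any number of subtags in the tag, and in which a single-character subtag in the tag (such as x) may not be skipped. The function name extended_range_matches is hypothetical, and the matching draft remains the authoritative description of the steps.

```python
def extended_range_matches(language_range, language_tag):
    """Return True if the extended language range matches the language tag."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = language_tag.lower().split("-")
    # The first subtags must agree unless the range starts with a wildcard.
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False
    tag_index = 1
    for range_subtag in range_subtags[1:]:
        if range_subtag == "*":
            continue  # a later wildcard places no constraint of its own
        while True:
            if tag_index >= len(tag_subtags):
                return False  # the range still has subtags, the tag does not
            tag_subtag = tag_subtags[tag_index]
            tag_index += 1
            if tag_subtag == range_subtag:
                break  # matched; continue with the next range subtag
            if len(tag_subtag) == 1:
                return False  # never skip past a single-character subtag
            # Otherwise skip this tag subtag and try the next one.
    return True
```

Under these assumptions the range de-*-DE matches de-DE, de-DE-x-goethe and de-Latn-DE-1996, but not de or de-x-DE.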

"en; fr; zh-Hant" is a language priority list. It would be read as English before French before Chinese as
written in the Traditional script. Note that the syntax shown is only an example, since it depends on the protocol, application, or
implementation that uses the list.

Conformance

This section is normative.

This section explains the conditions that specifications have to fulfill to be able to claim conformance to this specification.

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119].

Language Tags and Locale Values

This section is normative.

The following requirements are formulated for specifications that deal with language tags and locale values or matching schemes.

Specifications that make use of language tags or locale values MUST meet the conformance criteria defined for "well-formed" processors, as defined in sec. 2.2.9 of [draft-ietf-ltru-registry].

Specifications that make use of language tags or locale values MAY validate these values. If they do so, they MUST meet the conformance criteria defined for "validating" processors, as defined in sec. 2.2.9 of [draft-ietf-ltru-registry].
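As a non-normative illustration of what a syntactic check looks like, the following sketch tests only the coarse shape that language tags already had under RFC 3066: an alphabetic primary subtag followed by alphanumeric subtags of one to eight characters, separated by single hyphens. This is a necessary but not a sufficient condition for being well-formed under the newer grammar, and it does not validate subtags against a registry. The name has_tag_shape is hypothetical.

```python
import re

# One alphabetic primary subtag, then any number of alphanumeric subtags,
# each 1 to 8 characters long and separated by single hyphens.
COARSE_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def has_tag_shape(value):
    """Return True if value has the coarse hyphen-separated subtag shape."""
    return COARSE_TAG_SHAPE.fullmatch(value) is not None
```

A value such as de-Latn-DE-1996 passes this check, while de--DE, de_DE and the empty string do not. A processor claiming to be well-formed or validating would need the full grammar and, for validation, the subtag registry.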

Specifications that define operations on language tags or locale values using matching MUST use either a basic language range or an extended language range.

Specifications that define operations on language tags or locale values using matching MUST specify whether matching a language priority list yields a single result (lookup as defined in [draft-ietf-ltru-matching]) or a possibly empty set of results (filtering as defined in [draft-ietf-ltru-matching]).
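The difference between the two kinds of result can be illustrated with the following non-normative sketch. It assumes basic language ranges, prefix comparison for filtering, and, for lookup, progressive shortening of each range from the right until a tag fits exactly. The function names filter_tags and lookup_tag and the default parameter are hypothetical.

```python
def filter_tags(priority_list, available_tags):
    """Filtering: return every available tag matched by some range (may be empty)."""
    matches = []
    for language_range in priority_list:
        prefix = language_range.lower()
        for tag in available_tags:
            lowered = tag.lower()
            matched = (
                prefix == "*"
                or lowered == prefix
                or lowered.startswith(prefix + "-")
            )
            if matched and tag not in matches:
                matches.append(tag)
    return matches


def lookup_tag(priority_list, available_tags, default=None):
    """Lookup: return the single best tag, or the default if nothing fits."""
    by_lowercase = {tag.lower(): tag for tag in available_tags}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags and subtags != ["*"]:
            candidate = "-".join(subtags)
            if candidate in by_lowercase:
                return by_lowercase[candidate]
            subtags.pop()  # shorten the range by one subtag
            # Assumption: a range may not end in a single-character subtag.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default
```

Given the available tags en, de-DE, de-DE-1996 and fr-CA and the priority list "de-DE; fr", filtering returns de-DE, de-DE-1996 and fr-CA, whereas lookup returns only de-DE. For the priority list "ja", filtering returns an empty list and lookup returns the default.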

Many specifications that were created before [draft-ietf-ltru-registry] and [draft-ietf-ltru-matching] already conform to these criteria. The purpose of the criteria is to provide a stable source of requirements for language and locale identification.

Guidelines for the Interoperable Implementation of this Specification

This section is informative.

This section will be written in a subsequent working draft.

Normative References

[BCP 47] Tags for the Identification of Languages. IETF Best Common Practice. BCP 47 is currently represented by [RFC 3066].

[RFC 2119] S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt.

[draft-ietf-ltru-registry] Addison Phillips, Mark Davis. Tags for the Identification of Languages. IETF Internet-Draft, 14 October 2005. See http://www.ietf.org/internet-drafts/draft-ietf-ltru-registry-14.txt.

[draft-ietf-ltru-matching] Addison Phillips, Mark Davis. Matching of Language Tags. IETF Internet-Draft, June 2006. See http://www.ietf.org/internet-drafts/draft-ietf-ltru-matching-14.txt.

[RFC 3987] Martin Dürst, Michel Suignard. Internationalized Resource Identifiers (IRIs). IETF, January 2005. Available at http://www.ietf.org/rfc/rfc3987.txt.

References

[CLDR] Common Locale Data Repository (CLDR). Available at http://unicode.org/cldr/.

[CSS 2.1] Bert Bos, Tantek Çelik, Ian Hickson, Håkon Wium Lie. Cascading Style Sheets, level 2 revision 1. W3C Working Draft 13 June 2005. Available at http://www.w3.org/TR/2005/WD-CSS21-20050613/. The latest version of CSS 2.1 is available at http://www.w3.org/TR/CSS21/.

[HTML 4.01] Dave Raggett, Arnaud Le Hors, Ian Jacobs, eds. HTML 4.01 Specification. W3C Recommendation 24 December 1999. Available at http://www.w3.org/TR/1999/REC-html401-19991224/. The latest version of HTML 4.01 is available at http://www.w3.org/TR/html401/.

[LDML] Mark Davis. Locale Data Markup Language (LDML), Unicode Technical Standard #35. Available at http://unicode.org/reports/tr35/tr35-5.html. The latest version of LDML is available at http://unicode.org/reports/tr35/.

[RFC 3066] H. Alvestrand, editor. Tags for the Identification of Languages. IETF, January 2001. Available at http://www.ietf.org/rfc/rfc3066.txt.

[WS-I18N] Addison Phillips, Mary Trumble. Web Services Internationalization (WS-I18N). W3C Working Draft 14 September 2005. Available at http://www.w3.org/TR/2005/WD-ws-i18n-20050914/. The latest version of WS i18n is available at http://www.w3.org/TR/ws-i18n/.

[WS-I18N-REQ] Addison Phillips. Requirements for the Internationalization of Web Services. W3C Working Group Note 16 November 2004. Available at http://www.w3.org/TR/2004/NOTE-ws-i18n-req-20041116/. The latest version of WS i18n Req is available at http://www.w3.org/TR/ws-i18n-req/.

[WS-I18N-SCEN] Debasish Banerjee, Martin Dürst, Mike McKenna, Addison Phillips, Takao Suzuki, Tex Texin, Mary Trumble, Andrea Vine, Kentaro Noji. Web Services Internationalization Usage Scenarios. W3C Working Group Note 30 July 2004. Available at http://www.w3.org/TR/2004/NOTE-ws-i18n-scenarios-20040730/. The latest version of WS i18n Scenarios is available at http://www.w3.org/TR/ws-i18n-scenarios/.

[XML 1.0] Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, et al., eds. Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation 04 February 2004. Available at http://www.w3.org/TR/2004/REC-xml-20040204/. The latest version of XML 1.0 is available at http://www.w3.org/TR/REC-xml/.

[XSL 1.0] Sharon Adler et al., eds. Extensible Stylesheet Language (XSL) Version 1.0. W3C Recommendation 15 October 2001. Available at http://www.w3.org/TR/2001/REC-xsl-20011015/. The latest version of XSL 1.0 is available at http://www.w3.org/TR/xsl/.

Revision Log

The following log records changes that have been made to this document since the publication in April 2006.

The informative introductory section has been thoroughly rewritten, including the description of the scope of the document, of the application scenarios, and of the distinction between locale and natural language.

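Terms that rely on [draft-ietf-ltru-registry] and [draft-ietf-ltru-matching] are no longer defined in this document, which now only references those documents; see the section "Notation and Terminology". In addition, examples for these terms have been created.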

The requirements for language and locale values have been taken out of the conformance section and are now placed in the separate section "Language Tags and Locale Values".