ISO 639-3:2007, Codes for the representation of names of
languages – Part 3: Alpha-3 code for comprehensive coverage of
languages, is an international standard for language codes in the ISO
639 series. It defines three-letter codes for identifying languages.
The standard was published by ISO on 1 February 2007.[1]
ISO 639-3 extends the
ISO 639-2 alpha-3 codes with an aim to cover all
known natural languages. The extended language coverage was based
primarily on the language codes used in the
Ethnologue (volumes 10-14)
published by SIL International, which is now the registration
authority for ISO 639-3.[2] It provides an enumeration of languages as
complete as possible, including living and extinct, ancient and
constructed, major and minor, written and unwritten.[1] However, it
does not include reconstructed languages such as
Proto-Indo-European.[3]
ISO 639-3 is intended for use as metadata codes in a wide range of
applications. It is widely used in computer and information systems,
such as the Internet, in which many languages need to be supported. In
archives and other information storage, the codes are used in cataloging
systems to indicate what language a resource is in or about. The codes
are also frequently used in the linguistic literature and elsewhere to
compensate for the fact that language names may be obscure or
ambiguous.
Because it provides comprehensive language coverage, giving equal
opportunity for all languages, and because of its wide adoption in
information technologies,
ISO 639-3 provides an important technology
component addressing the digital divide problem.

Language codes
Main article: List of
ISO 639-3 codes
ISO 639-3 includes all languages in
ISO 639-1 and all individual
languages in ISO 639-2.
ISO 639-1 and
ISO 639-2 focused on major
languages, most frequently represented in the total body of the
world's literature. Since
ISO 639-2 also includes language collections
and Part 3 does not,
ISO 639-3 is not a superset of ISO 639-2. Where B
and T codes exist in ISO 639-2,
ISO 639-3 uses the T-codes.
Examples:

Language    639-1   639-2 (B/T)     639-3 type   639-3 code
English     en      eng             individual   eng
German      de      ger/deu         individual   deu
Arabic      ar      ara             macro        ara
                                    individual   arb + others
Chinese     zh      chi/zho[4][5]   macro        zho
Mandarin                            individual   cmn
Cantonese                           individual   yue
Minnan                              individual   nan

As of April 2012, the standard contains 7776 entries.[6]
The inventory of languages is based on a number of sources including:
the individual languages contained in 639-2, modern languages from the
Ethnologue, historic varieties, ancient languages and artificial
languages from the Linguist List,[7] as well as languages recommended
within the annual public commenting period.
Machine-readable data files are provided by the registration
authority.[6] Mappings from
ISO 639-1 or
ISO 639-2 to
ISO 639-3 can be
done using these data files.
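A minimal sketch of such a mapping follows. It assumes a tab-delimited table with the column names `Id`, `Part2B`, `Part2T`, `Part1` and `Scope`; these names and the sample rows are illustrative assumptions (the rows reuse the examples from the table above), and the registration authority's actual file layout should be checked before relying on it.

```python
import csv
import io

# Sample rows only. Column names and the Scope values (I = individual,
# M = macrolanguage) are assumptions made for illustration.
SAMPLE = (
    "Id\tPart2B\tPart2T\tPart1\tScope\n"
    "eng\teng\teng\ten\tI\n"
    "deu\tger\tdeu\tde\tI\n"
    "ara\tara\tara\tar\tM\n"
    "zho\tchi\tzho\tzh\tM\n"
    "cmn\t\t\t\tI\n"
)


def build_lookup(tsv_text):
    """Map every ISO 639-1 and ISO 639-2 (B or T) code to its ISO 639-3 code."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column in ("Part1", "Part2B", "Part2T"):
            if row[column]:  # skip empty cells (no equivalent in that part)
                lookup[row[column]] = row["Id"]
    return lookup


lookup = build_lookup(SAMPLE)
print(lookup["de"], lookup["ger"], lookup["chi"])  # deu deu zho
```

Both the B code ('ger') and the T code ('deu') resolve to the same ISO 639-3 identifier, which reflects the rule that ISO 639-3 uses the T-codes.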
ISO 639-3 is intended to assume distinctions based on criteria that
are not entirely subjective.[8] It is not intended to document or
provide identifiers for dialects or other sub-language variations.[9]
Nevertheless, judgments regarding distinctions between languages may
be subjective, particularly in the case of oral language varieties
without established literary traditions, usage in education or media,
or other factors that contribute to language conventionalization.
Code space
Since the code is three-letter alphabetic, one upper bound for the
number of languages that can be represented is 26 × 26 × 26 = 17576.
Since
ISO 639-2 defines special codes (4), a reserved range (520) and
B-only codes (23), 547 codes cannot be used in part 3. Therefore, a
stricter upper bound is 17576 − 547 = 17029.
The upper bound gets even stricter if one subtracts the language
collections defined in 639-2 and the ones yet to be defined in ISO
639-5.
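The arithmetic behind these two bounds can be checked with a short sketch; the variable names are illustrative, and the counts are the ones quoted in this section.

```python
import string

LETTERS = len(string.ascii_lowercase)  # 26 letters
total = LETTERS ** 3                   # every possible three-letter string

special = 4     # special codes defined in ISO 639-2
reserved = 520  # reserved range
b_only = 23     # B-only codes

unavailable = special + reserved + b_only
upper_bound = total - unavailable

print(total, unavailable, upper_bound)  # 17576 547 17029
```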
Macrolanguages
Main article:
ISO 639 macrolanguage
There are 56 languages in
ISO 639-2 which are considered, for the
purposes of the standard, to be "macrolanguages" in ISO 639-3.[10]
Some of these macrolanguages had no individual language as defined by
ISO 639-3 in the code set of ISO 639-2, e.g. 'ara' (Generic Arabic).
Others like 'nor' (Norwegian) had their two individual parts ('nno'
(Nynorsk), 'nob' (Bokmål)) already in ISO 639-2.
That means some languages (e.g. 'arb', Standard Arabic) that ISO 639-2
treated as dialects of one language ('ara') are, in certain contexts,
considered individual languages in their own right in ISO 639-3.
This is an attempt to deal with varieties that may be linguistically
distinct from each other, but are treated by their speakers as two
forms of the same language, e.g. in cases of diglossia.
For example:

See [11] for the complete list.
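As a rough illustration of the relationship, the sketch below stores a macrolanguage-to-member mapping and inverts it. The mapping is deliberately partial: it contains only the codes named in this article, and the names `MACROLANGUAGE_MEMBERS` and `macrolanguage_of` are hypothetical.

```python
# Partial mapping built only from codes mentioned in this article.
MACROLANGUAGE_MEMBERS = {
    "ara": ["arb"],                # Arabic: Standard Arabic (other members omitted)
    "zho": ["cmn", "yue", "nan"],  # Chinese: Mandarin, Cantonese, Minnan
    "nor": ["nno", "nob"],         # Norwegian: Nynorsk, Bokmål
}

# Inverted index: individual language code -> macrolanguage code.
MACROLANGUAGE_OF = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def macrolanguage_of(code):
    """Return the macrolanguage a code belongs to, or None if it has none here."""
    return MACROLANGUAGE_OF.get(code)


print(macrolanguage_of("yue"))  # zho
print(macrolanguage_of("eng"))  # None
```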
Collective languages
See also:
ISO 639-2 § Collective language codes, and ISO 639-5
"A collective language code element is an identifier that represents a
group of individual languages that are not deemed to be one language
in any usage context."[12] These codes do not precisely represent a
particular language or macrolanguage.
While
ISO 639-2 includes three-letter identifiers for collective
languages, these codes are excluded from ISO 639-3. Hence
ISO 639-3 is
not a superset of ISO 639-2.
ISO 639-5 defines 3-letter collective codes for language families and
groups, including the collective language codes from ISO 639-2.
Special codes
Four codes are set aside in
ISO 639-2 and
ISO 639-3 for cases where
none of the specific codes are appropriate. These are intended
primarily for applications like databases where an ISO code is
required regardless of whether one exists.

mis   uncoded languages
mul   multiple languages
und   undetermined languages
zxx   no linguistic content / not applicable

mis (originally an abbreviation for 'miscellaneous') is intended for
languages which have not (yet) been included in the ISO standard.
mul is intended for cases where the data includes more than one
language, and (for example) the database requires a single ISO code.
und is intended for cases where the language in the data has not been
identified, such as when it is mislabeled or has never been labeled.
It is not intended for cases such as Trojan where an unattested
language has been given a name.
zxx is intended for data which is not a language at all, such as
animal calls.[13]

In addition, 520 codes in the range qaa–qtz are 'reserved for local
use'. For example, the
Linguist List uses them for extinct languages.
Linguist List has assigned one of them a generic value:

This is used for proposed intermediate nodes in a family tree that
have no name.
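The special codes and the local-use range lend themselves to a simple check. The following is a sketch only; the function names `is_local_use` and `classify` are hypothetical and not part of any standard library or of the standard itself.

```python
import itertools
import string

SPECIAL_CODES = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined languages",
    "zxx": "no linguistic content / not applicable",
}


def is_local_use(code):
    """True if code is a lowercase three-letter code in the range qaa-qtz."""
    return (
        len(code) == 3
        and code.isascii()
        and code.isalpha()
        and code.islower()
        and "qaa" <= code <= "qtz"
    )


def classify(code):
    """Label a code as 'special', 'local use', or 'other'."""
    if code in SPECIAL_CODES:
        return "special"
    if is_local_use(code):
        return "local use"
    return "other"


# Count the reserved range by brute force over all three-letter strings.
reserved_count = sum(
    is_local_use("".join(letters))
    for letters in itertools.product(string.ascii_lowercase, repeat=3)
)
print(reserved_count)                                      # 520
print(classify("und"), classify("qab"), classify("eng"))  # special local use other
```

The count of 520 follows from 20 possible second letters (a through t) times 26 possible third letters.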
Maintenance processes
The code table for
ISO 639-3 is open to changes. In order to protect
stability of existing usage, the changes permitted are limited to:[14]

modifications to the reference information for an entry (including
names or categorizations for type and scope),
addition of new entries,
deprecation of entries that are duplicates or spurious,
merging one or more entries into another entry, and
splitting an existing language entry into multiple new language
entries.

The code assigned to a language is not changed unless there is also a
change in denotation.[15]
Changes are made on an annual cycle. Every request is given a minimum
period of three months for public review.
The
ISO 639-3 Web site has pages that describe "scopes of
denotation"[16] (languoid types) and types of languages,[17] which
explain what concepts are in scope for encoding and certain criteria
that need to be met. For example, constructed languages can be
encoded, but only if they are designed for human communication and
have a body of literature, preventing requests for idiosyncratic
inventions.
The registration authority documents on its Web site instructions made
in the text of the
ISO 639-3 standard regarding how the code tables
are to be maintained.[18] It also documents the processes used for
receiving and processing change requests.[19]
A change request form is provided, and there is a second form for
collecting information about proposed additions. Any party can submit
change requests. When submitted, requests are initially reviewed by
the registration authority for completeness.
When a fully documented request is received, it is added to a
published Change Request Index. Also, announcements are sent to the
general LINGUIST discussion list at
Linguist List and other lists the
registration authority may consider relevant, inviting public review
and input on the requested change. Any list owner or individual is
able to request notifications of change requests for particular
regions or language families. Comments that are received are published
for other parties to review. Based on consensus in comments received,
a change request may be withdrawn or promoted to "candidate status".
Three months prior to the end of an annual review cycle (typically in
September), an announcement is sent to the LINGUIST discussion list and
other lists regarding Candidate Status Change Requests. All requests
remain open for review and comment through the end of the annual
review cycle.
Decisions are announced at the end of the annual review cycle
(typically in January). At that time, requests may be adopted in whole
or in part, amended and carried forward into the next review cycle, or
rejected. Rejections often include suggestions on how to modify
proposals for resubmission. A public archive of every change request
is maintained along with the decisions taken and the rationale for the
decisions.[20]
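The review cycle described above can be read as a small state machine. The sketch below is one possible interpretation, not the registration authority's own model: the status names are hypothetical, "adopted" covers adoption in whole or in part, and it assumes that any request still open at the end of the cycle can receive a decision.

```python
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"              # received, awaiting completeness review
    INDEXED = "indexed"                  # published in the Change Request Index
    CANDIDATE = "candidate"              # promoted to candidate status
    WITHDRAWN = "withdrawn"
    ADOPTED = "adopted"                  # in whole or in part
    CARRIED_FORWARD = "carried forward"  # amended, moved to the next cycle
    REJECTED = "rejected"


# Outcomes announced at the end of the annual review cycle.
DECISIONS = {Status.ADOPTED, Status.CARRIED_FORWARD, Status.REJECTED}

# Assumed transitions; the source text does not spell all of these out.
ALLOWED = {
    Status.SUBMITTED: {Status.INDEXED},
    Status.INDEXED: {Status.WITHDRAWN, Status.CANDIDATE} | DECISIONS,
    Status.CANDIDATE: DECISIONS,
    Status.CARRIED_FORWARD: {Status.INDEXED},
}


def advance(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


status = Status.SUBMITTED
for step in (Status.INDEXED, Status.CANDIDATE, Status.ADOPTED):
    status = advance(status, step)
print(status.value)  # adopted
```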
Criticism
Linguists Morey, Post and Friedman raise various criticisms of ISO
639, and in particular ISO 639-3:[15]

The three-letter codes themselves are problematic, because while
officially arbitrary technical labels, they are often derived from
mnemonic abbreviations for language names, some of which are
pejorative. For example, Yemsa was assigned the code [jnj], from
pejorative "Janejero". These codes may thus be considered offensive by
native speakers, but codes in the standard, once assigned, cannot be
changed.
The administration of the standard is problematic because SIL is a
missionary organization with inadequate transparency and
accountability. Decisions as to what deserves to be encoded as a
language are made internally. While outside input may or may not be
welcomed, the decisions themselves are opaque, and many linguists have
given up trying to improve the standard.
Permanent identification of a language is incompatible with language
change.
Languages and dialects often cannot be rigorously distinguished, and
dialect continua may be subdivided in many ways, whereas the standard
privileges one choice. Such distinctions are often based instead on
social and political factors.
ISO 639-3 may be misunderstood and misused by authorities that make
decisions about people's identity and language, abrogating the right
of speakers to identify or identify with their speech variety. Though
SIL is sensitive to such issues, this problem is inherent in the
nature of an established standard, which may be used (or mis-used) in
ways that ISO and SIL do not intend.

Martin Haspelmath agrees with four of these points, but not the point
about language change.[21] He disagrees because any account of a
language requires identifying it, and we can easily identify different
stages of a language. He suggests that linguists may prefer to use a
codification that is made at the languoid level since "it rarely
matters to linguists whether what they are talking about is a
language, a dialect or a close-knit family of languages." He also
questions whether an ISO standard for language identification is
appropriate since ISO is an industrial organization, while he views
language documentation and nomenclature as a scientific endeavor. He
cites the original need for standardized language identifiers as
having been "the economic significance of translation and software
localization," for which purposes the
ISO 639-1 and 639-2 standards
were established. But he raises doubts about industry need for the
comprehensive coverage provided by ISO 639-3, including as it does
"little-known languages of small communities that are never or hardly
used in writing and that are often in danger of extinction".
Usage

Ethnologue
Linguist List
OLAC: the Open Languages Archive Community[22]
Microsoft Windows 8:[23] Supports all codes in
ISO 639-3 at the time
of release.
Wikimedia Foundation: New language-based projects (e.g. Wikipedias in
new languages) must have an identifier from ISO 639-1, -2, or -3.[24]
Other standards that rely on ISO 639-3:

BCP 47: Best Current Practice 47,[25] which includes RFC 5646
RFC 5646, which superseded RFC 4646, which superseded RFC 3066.
(Therefore, all standards which depend on any of these 3 IETF
standards now use ISO 639-3.)