Abstract

XML Schema: Datatypes is part 2 of the specification of
the XML Schema language. It defines facilities for defining datatypes
to be used in XML Schemas as well as other XML specifications. The
datatype language, which is itself represented in XML, provides a superset of the
capabilities found in XML
document type definitions (DTDs) for specifying datatypes on elements
and attributes.

Status of this Document

This section describes the status of this document at the
time of its publication. Other documents may supersede this document.
A list of current W3C publications and the latest revision of this
technical report can be found in the
W3C technical reports index at
http://www.w3.org/TR/.

This ↓is a Public Working Draft of↓ ↑W3C Candidate Recommendation specifies↑
W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes.
It is here made available for
review by W3C members and the
public. This version of this document was created on
30 April 2009.

Readers primarily interested in the changes since version 1.0
should start with the Changes since version 1.0 (§I) appendix, which
summarizes both changes already made and those in prospect, with links
to the relevant sections of this draft.
An
accompanying version of this document displays in color all changes to
normative text since version 1.0; another shows changes since the
previous Working Draft.

The major changes since version 1.0 include:

Support for XML 1.1 has been added. It is now implementation
defined whether datatypes dependent on definitions in
[XML] and [Namespaces in XML] use the definitions
as found in version 1.1 or version 1.0 of those specifications.

A new primitive decimal type has been defined, which retains
information about the precision of the value. This type is aligned
with the floating-point decimal types which
↓will be part of the next edition of IEEE 754↓ ↑are included in [IEEE 754-2008]↑.

In order to align this specification with those being prepared by
the XSL and XML Query Working Groups, a new datatype named
anyAtomicType which
serves as the base type definition for all primitive atomic
datatypes has been introduced.

The conceptual model of the date- and time-related types has been
defined more formally.

A more formal treatment of the fundamental facets of the primitive
datatypes has been adopted.

More formal definitions of the lexical space of most types have
been provided, with detailed descriptions of the mappings from lexical
representation to value and from value to ·canonical representation·.

The validation rule
Datatype Valid (§4.1.4) has been recast in more declarative form.
A paraphrase of the constraint in procedural terms, which corrects
some errors in the previous versions of this document, has been added
as a note.

The rules governing partial
implementations of infinite datatypes have been clarified.

Various changes have been made in order to align the relevant
parts of this specification more closely with other relevant
specifications, including especially the corresponding
sections of [XSD 1.1 Part 1: Structures].

Changes since the previous public Working Draft include the following:

↑

To reduce confusion and avert a widespread misunderstanding,
the normative references to various W3C specifications now state
explicitly that while the reference describes the particular edition
of a specification current at the time this specification is
published, conforming implementations of this specification
are not required to ignore later editions of the other
specification but instead may support later editions, thus
allowing users of this specification to benefit from corrections to other
specifications on which this one depends.

The discussion of whitespace handling in
whiteSpace (§4.3.6) makes clearer that when
the value is collapse, ·literals· consisting
solely of whitespace characters are reduced to the
empty string; the earlier formulation has been misunderstood
by some implementors.

The historical list of leap seconds given
in earlier versions of this document has been removed (see issue
6554).

↑

The publication form of this document now includes a
detailed prose description of the type hierarchy diagram
in section Built-in Datatypes and Their Definitions (§3). We thank the
W3C Web Accessibility Initiative's
Protocols and Formats Working Group
for their comments and assistance in this connection.

Several other editorial corrections and improvements have been made.

Comments on this document should be made in
W3C's public installation of Bugzilla, specifying "XML Schema" as the
product. Instructions can be found at
http://www.w3.org/XML/2006/01/public-bugzilla. If access to
Bugzilla is not feasible, please send your comments to the W3C XML
Schema comments mailing list,
www-xml-schema-comments@w3.org
(archive)
and note explicitly that
you have not made a Bugzilla entry for the comment.
Each Bugzilla entry and email message should contain only one
comment.

The ↓end of the Last Call review period is 3 August 2009↓ ↑Candidate Recommendation
review period for this document extends until 3 August 2009↑; comments
received after that date will be
considered if time allows, but no guarantees can be offered.

Although feedback based on any
aspect of this specification is welcome, there are certain aspects of
the design presented herein for which the Working Group is
particularly interested in feedback. These are designated
'priority feedback' aspects of the design, and
identified as such in editorial notes at appropriate points in this
draft.
↑Any feature mentioned in a
priority feedback note is a "feature
at risk": the feature may be retained as is or
dropped, depending on the feedback received from readers,
schema authors, schema users, and implementors.↑

Publication as a Candidate Recommendation does not imply
endorsement by the W3C Membership. This is a draft document and may be
updated, replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.

↑

The W3C XML Schema Working Group intends to
request advancement of this specification and publication as a
Proposed Recommendation
(possibly with editorial
changes, and possibly removing features identified as being
at risk) as soon after 3 August 2009 as the following
conditions are met.

A test suite is available which tests each required and optional
feature of XSD 1.1.

Each feature of the specification has been implemented successfully
by at least two independent implementations.

The Working Group has responded formally to all issues raised
against this document during the Candidate Recommendation period.

At the time this Candidate Recommendation was published, no interoperability
or implementation report had yet been prepared.

The presentation of this document has been augmented to
identify changes from a previous version, controlled by dg-statusquo-color-200901.xml, which shows the status-quo text without adornment. Three kinds of changes are highlighted:
↑new, added text↑,
↑changed text↓, and
↓deleted text↓.

Do not allow any changes to XML transfer syntax except those
required by version control hooks and bug fixes

The overall aim as regards compatibility is that

All schema documents conformant to version 1.0 of this
specification should also conform to version 1.1, and should have the
same validation behavior across 1.0 and 1.1 implementations (except
possibly in edge cases and in the details of the resulting
PSVI);

The vast majority of schema documents conformant to version
1.1 of this specification should also conform to version 1.0, leaving
aside any incompatibilities arising from support for versioning, and
when they are conformant to version 1.0 (or are made conformant by the
removal of versioning information), should have the same validation
behavior across 1.0 and 1.1 implementations (again except possibly in
edge cases and in the details of the resulting PSVI).

1.2 Purpose

The [XML] specification defines limited
facilities for applying datatypes to document content in that documents
may contain or refer to DTDs that assign types to elements and attributes.
However, document authors, including authors of traditional
documents and those transporting data in XML,
often require a higher degree of type checking to ensure robustness in
document understanding and data interchange.

The table below offers two typical examples of XML instances
in which datatypes are implicit: the instance on the left
represents a billing invoice, the instance on the
right a memo or perhaps an email message in XML.

The invoice contains several dates and telephone numbers, the postal
abbreviation for a state (which comes from an enumerated list of
sanctioned values), and a ZIP code (which takes a definable regular
form). The memo contains many of the same types of information:
a date, telephone number, email address and an "importance" value
(from an enumerated list, such as "low", "medium" or "high").
Applications which process invoices and memos need to raise exceptions
if something that was supposed to be a date or telephone number does
not conform to the rules for valid dates or telephone numbers.

In both cases, validity constraints exist on the content of the
instances that are not expressible in XML DTDs. The limited
datatyping facilities in XML have prevented validating XML processors
from supplying the rigorous type checking required in these
situations. The result has been that individual applications
writers have had to implement type checking in an ad hoc manner.
This specification addresses the need of both document authors and
applications writers for a robust, extensible datatype system for XML
which could be incorporated into XML processors. As discussed
below, these datatypes could be used in other XML-related standards as
well.

1.3 Dependencies on Other Specifications

Other specifications on which this one depends
are listed in References (§K).

Conforming implementations of this specification
may provide either
the 1.1-based datatypes or the 1.0-based datatypes, or both. If both
are supported, the choice of which datatypes to use in a particular
assessment episode should be under user control.

Note:
When this specification is used to check the datatype validity of XML
input, implementations may provide the heuristic of using the 1.1
datatypes if the input is labeled as XML 1.1, and using the 1.0 datatypes if
the input is labeled 1.0, but this heuristic should be subject to
override by users, to support cases where users wish to accept XML 1.1
input but validate it using the 1.0 datatypes, or accept XML 1.0 input
and validate it using the 1.1 datatypes.
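The heuristic described in the note above can be sketched as follows. This is a minimal illustration only: the function name `select_datatype_version`, its parameters, and the string values "1.0" and "1.1" are hypothetical and are not defined by this specification.

```python
from typing import Optional


def select_datatype_version(xml_version_label: str,
                            user_override: Optional[str] = None) -> str:
    """Pick the datatype family ("1.0" or "1.1") for one assessment episode.

    Hypothetical helper: names and values are illustrative assumptions.
    """
    if user_override is not None:
        # An explicit user choice always wins over the heuristic.
        if user_override not in ("1.0", "1.1"):
            raise ValueError("override must be '1.0' or '1.1'")
        return user_override
    # Heuristic: follow the version label carried by the XML input.
    return "1.1" if xml_version_label == "1.1" else "1.0"
```

A call such as `select_datatype_version("1.1", user_override="1.0")` models a user who accepts XML 1.1 input but validates it using the 1.0 datatypes.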

This specification
makes use of the EBNF notation used in the
[XML] specification. Note
that some constructs of the EBNF notation used here
resemble the regular-expression syntax defined in this specification
(Regular Expressions (§G)), but the two notations are not
identical.
For a fuller description of the EBNF notation, see
Section
6. Notation of the [XML] specification.

1.4 Requirements

The [XML Schema Requirements] document spells out
concrete requirements to be fulfilled by this specification,
which state that the XML Schema Language must:

allow creation of user-defined datatypes, such as
datatypes that are derived from existing datatypes and which
may constrain certain of their properties (e.g., range,
precision, length, format).

1.5 Scope

This specification
defines datatypes that can be used in an XML Schema.
These datatypes can be specified for element content that would be
specified as #PCDATA
and attribute values of various types in a
DTD. It is the intention of this specification that it be usable
outside of the context of XML Schemas for a wide range of other
XML-related activities such as [XSL] and
[RDF Schema].

1.6 Terminology

The terminology used to describe XML Schema Datatypes is defined in
the body of this specification. The terms defined in the following
list are used in building those definitions and in describing the
actions of a datatype processor:

match
(Of strings or names:)
Two strings or names being compared must be
identical. Characters with multiple possible representations in
ISO/IEC 10646 (e.g. characters with both precomposed and
base+diacritic forms) match only if they have the same representation
in both strings. No case folding is performed.

(Of strings and rules
in the grammar:)
A string matches a grammatical production
if and only if it belongs to the language
generated by that production.

should
It is recommended that schemas, schema documents, and
processors behave as described, but there
can be valid reasons for them not to; it is important that the
full implications be understood and carefully weighed before
adopting behavior at variance with the recommendation.

error
A failure of a
schema or schema
document
to conform to the rules of this specification.

Except as otherwise specified,
processors must distinguish
error-free (conforming) schemas and schema documents
from those with errors;
if a schema
used in type-validation or a schema document
used in constructing a schema
is in error, processors must
report the fact;
if more than one is in error, it is ·implementation-dependent·
whether more than one is reported as being in error.
If more than one of the constraints given in
this specification is violated, it
is ·implementation-dependent· how many of the violations, and which, are
reported.

Note: Failure of an XML element or attribute to be
datatype-valid against a particular
datatype in a particular schema is not in itself a failure
to conform to this specification and thus,
for purposes of this specification, not an error.

1.7 Constraints and Contributions

This specification provides three different kinds of normative
statements about schema components, their representations in XML and
their contribution to the schema-validation of information items:

Constraints expressed by schema components which information items
·must· satisfy to be schema-valid. Largely to
be found in Datatype components (§4).

2 Datatype System

This section describes the conceptual framework behind the datatype system defined in this
specification. The framework has been influenced by the
[ISO 11404] standard on language-independent datatypes as
well as the datatypes for [SQL] and for programming
languages such as Java.

The datatypes discussed in this specification are for the most part well known abstract
concepts such as integer and date. It is not
the place of this specification to thoroughly define these abstract concepts; many
other publications provide excellent definitions. However, this specification will attempt to describe the
abstract concepts well enough that they can be readily recognized and
distinguished from other abstractions with which they may be
confused.

Note: Only those operations and relations needed for schema processing
are defined in this specification. Applications using these datatypes
are generally expected to implement appropriate additional functions
and/or relations to make the datatype generally useful. For
example, the description herein of the float datatype
does not define addition or multiplication, much less all of the
operations defined for that datatype in ↓[IEEE 754-1985]↓↑[IEEE 754-2008]↑ on
which it is based.
For some datatypes (e.g.
language or anyURI) defined in part by
reference to other specifications which impose constraints not part of
the datatypes as defined here, applications may also wish to check
that values conform to the requirements given in the current version
of the relevant external specification.

Note: This specification only defines the operations and relations needed
for schema processing. The choice of terminology for
describing/naming the datatypes is selected to guide users and
implementers in how to expand the datatype to be generally
useful—i.e., how to recognize the "real world"
datatypes and their variants for which the datatypes defined herein
are meant to be used for data interchange.

Along with the ·lexical mapping· it is
often useful to have an inverse which provides a standard
·lexical representation· for each value. Such
a ·canonical mapping· is not required for
schema processing, but is described herein for the benefit of users of
this specification, and other specifications which might find it
useful to reference these descriptions normatively.
For some datatypes, notably
QName and NOTATION, the mapping from
lexical representations to values is context-dependent; for these
types, no ·canonical mapping· is defined.

Note: This specification sometimes uses the shorter form "type"
where one might strictly speaking expect the longer form
"datatype" (e.g. in the phrases
"union type", "list type",
"base type", "item type", etc.).
No systematic distinction is intended between
the forms of these phrases with "type" and
those with "datatype";
the two forms are used interchangeably.

The distinction between "datatype"
and "simple type definition", by contrast,
carries more information: the datatype is characterized by its
·value space·, ·lexical space·, ·lexical mapping·, etc., as
just described, independently of the specific facets or
other definitional mechanisms used in the simple type
definition to describe that particular ·value space·
or ·lexical space·. Different simple type definitions
with different selections of facets can describe the
same datatype.

2.2 Value space

[Definition:] The value space of a
datatype is the set of values for that
datatype. Associated with each value space are
selected operations and relations necessary to permit proper schema
processing. Each value in the value space of a
·primitive· or ·ordinary·
datatype is
denoted by one or more character strings in its ·lexical space·,
according to ·the lexical
mapping·; ·special·
datatypes, by contrast, may include "ineffable"
values not mapped to by any lexical representation.
(If the mapping is restricted during a
derivation in such a way that a value has no denotation, that value is
dropped from the value space.)

The value spaces of datatypes are abstractions,
and are defined in Built-in Datatypes and Their Definitions (§3)
to the extent needed to clarify them for readers. For example,
in defining the numerical datatypes, we assume some general numerical
concepts such as number and integer are known. In many cases we
provide references to other documents providing more complete
definitions.

Note: The value spaces and the values therein are
abstractions. This specification does not prescribe any
particular internal representations that must be used when
implementing these datatypes. In some cases, there are
references to other specifications which do prescribe specific
internal representations; these specific internal representations must
be used to comply with those other specifications, but need not be
used to comply with this specification.

In addition, other applications are expected to define additional
appropriate operations and/or relations on these value spaces (e.g.,
addition and multiplication on the various numerical datatypes'
value spaces), and are permitted where appropriate to even redefine
the operations and relations defined within this specification,
provided that for schema processing the relations and operations
used are those defined herein.

The ·value space· of a datatype can
be defined in one of the following ways:

defined by restricting the ·value space· of an already
defined datatype to a particular subset with a given set of properties
[see derived]

defined as a combination of values from one or more already
defined ·value space·(s) by a specific construction procedure [see
·list· and ·union·]

The relations of identity
and
equality
are required for each
value space. An
order relation is specified for some value spaces, but not
all.
A very few datatypes have other relations or
operations prescribed for the purposes of this specification.

2.2.1 Identity

The identity relation is always defined. Every value space
inherently has an identity relation. Two things are
identical if and only
if they are actually the same thing: i.e., if there is no way
whatever to tell them apart.

Note: This does not preclude implementing datatypes by using more than
one internal representation for a given value, provided
no mechanism inherent in the datatype implementation (i.e., other than
bit-string-preserving "casting" of the datum to a different
datatype) will distinguish between the two representations.

In the identity relation defined herein, values from different
·primitive· datatypes' ·value spaces· are made artificially
distinct if they might otherwise be considered identical. For
example, there is a number two in the decimal datatype and a number two in the float datatype. In the identity relation defined herein,
these two values are considered distinct. Other applications
making use of these datatypes may choose to consider values such as
these identical, but for the view of ·primitive· datatypes'
·value spaces· used herein, they are distinct.

WARNING: Care must be taken when identifying
values across distinct primitive datatypes. The
·literals· '0.1' and '0.10000000009' map
to the same value in float (neither 0.1 nor 0.10000000009 is in the value space, and
each literal is mapped to the
nearest value, namely 0.100000001490116119384765625), but map to
distinct values in decimal.
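The warning above can be checked with a short sketch. It assumes that IEEE 754 single precision (reached here through Python's `struct` module) stands in for the float value space and that Python's `Decimal` stands in for decimal; the helper name `to_float32` is hypothetical.

```python
import struct
from decimal import Decimal


def to_float32(literal: str) -> float:
    """Round a decimal literal to the nearest single-precision value."""
    # The literal is first read as a double, which is adequate for these
    # inputs; packing as a 4-byte float then rounds to single precision.
    return struct.unpack("<f", struct.pack("<f", float(literal)))[0]


a = to_float32("0.1")
b = to_float32("0.10000000009")

same_in_float = (a == b)            # both literals land on one float value
exact_float_value = Decimal(a)      # exact decimal expansion of that value
same_in_decimal = Decimal("0.1") == Decimal("0.10000000009")
```

Here `same_in_float` is true while `same_in_decimal` is false, and `exact_float_value` is the nearest value quoted in the warning.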

Given a list A and a list B, A and B
are the same list if they are the same sequence of atomic values.
The necessary and sufficient conditions for this identity are
that A and B have the same length and that the items of A
are pairwise identical to the items of B.

Note: It is a consequence of the rule just given for list identity
that there is only one empty list. An empty list declared as
having ·item type· decimal and an empty
list declared as having ·item type· string
are not only equal but identical.

2.2.2 Equality

Each ·primitive· datatype has an equality relation prescribed for
its value space. The equality relation for most datatypes is the
identity relation. In the few cases where it is not,
equality
has been carefully defined so that for
most operations of
interest to the datatype, if
two values are equal and one is substituted for the other as an
argument to any of the operations, the results will always also be
equal.

On the other hand, equality need not cover the entire value space
of the datatype (though it usually does). In
particular, NaN
is not equal to itself in the precisionDecimal,
float, and double datatypes.

Note: In the prior version of this specification (1.0), equality was
always identity. This has been changed to permit the datatypes
defined herein to more closely match the "real world"
datatypes for which they are intended to be used as transmission
formats.

For example, the float datatype has an equality
which is not the identity ( −0 = +0 , but
they are not identical—although they were identical
in the 1.0 version of this specification), and whose domain excludes
one value, NaN, so that NaN ≠ NaN .
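The difference between equality and identity for float can be illustrated with a minimal sketch. It assumes that Python floats (IEEE 754 doubles) stand in for the float value space; the function names `equal` and `identical` are hypothetical.

```python
import math


def equal(x: float, y: float) -> bool:
    """Equality: NaN lies outside its domain, and -0 equals +0."""
    return x == y   # IEEE comparison already behaves this way


def identical(x: float, y: float) -> bool:
    """Identity: the very same value, so -0 and +0 are distinct."""
    if math.isnan(x) and math.isnan(y):
        return True     # NaN is identical to itself, though not equal
    # Same magnitude and same sign bit.
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)


nan = float("nan")
```

With these definitions, `equal(-0.0, 0.0)` holds but `identical(-0.0, 0.0)` does not, and `identical(nan, nan)` holds but `equal(nan, nan)` does not.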

For another example, the dateTime datatype
previously lost any time-zone offset information in the
·lexical representation· as the value was converted to
·UTC·;
now the time zone offset
is retained and two values representing the same "moment in
time" but with different remembered time zone offsets are now
equal but not identical.
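A minimal sketch of this distinction follows, assuming that Python's time-zone-aware `datetime` objects stand in for dateTime values that carry a time zone offset. The function names are hypothetical, and values without an offset are not modeled.

```python
from datetime import datetime, timedelta, timezone


def dt_equal(a: datetime, b: datetime) -> bool:
    """Equal: both values denote the same moment in time."""
    return a == b   # aware datetimes are compared by their UTC instant


def dt_identical(a: datetime, b: datetime) -> bool:
    """Identical: same moment and the same retained time zone offset."""
    return a == b and a.utcoffset() == b.utcoffset()


noon_minus5 = datetime(2002, 10, 10, 12, 0,
                       tzinfo=timezone(timedelta(hours=-5)))
five_pm_utc = datetime(2002, 10, 10, 17, 0, tzinfo=timezone.utc)
```

The two sample values name the same instant with different offsets, so `dt_equal` returns true for them and `dt_identical` returns false.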

In the equality relation defined herein, values from different
primitive data spaces are made artificially unequal even if they might
otherwise be considered equal. For example, there is a number
two in the decimal datatype and a number
two in the float datatype. In the
equality relation defined herein, these two values are considered
unequal. Other applications making use of these datatypes may
choose to consider values such as these equal;
nonetheless, in the equality relation defined herein, they are unequal.

Two lists A and B are equal if and
only if they have the same length and their items are pairwise equal.
A list of length one containing a value V1 and an atomic value
V2 are equal if and only if V1 is equal to V2.
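The rules above for values from different primitive datatypes and for lists can be sketched as follows. The representation is an assumption made for illustration: each atomic value is tagged with the name of its primitive datatype, and a list is a Python list of such values. The names `Atomic`, `atomic_equal` and `xsd_equal` are hypothetical.

```python
from decimal import Decimal
from typing import List, NamedTuple, Union


class Atomic(NamedTuple):
    primitive: str   # name of the primitive datatype the value belongs to
    value: object    # stand-in Python value


Value = Union[Atomic, List[Atomic]]


def atomic_equal(a: Atomic, b: Atomic) -> bool:
    # Compare the fields explicitly: tuple equality short-circuits on
    # object identity and would wrongly report NaN as equal to itself.
    return a.primitive == b.primitive and a.value == b.value


def xsd_equal(a: Value, b: Value) -> bool:
    if isinstance(a, list) and isinstance(b, list):
        # Lists: same length and pairwise equal items.
        return len(a) == len(b) and all(
            atomic_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, list):
        # A list of length one compared with an atomic value.
        return len(a) == 1 and atomic_equal(a[0], b)
    if isinstance(b, list):
        return len(b) == 1 and atomic_equal(a, b[0])
    return atomic_equal(a, b)
```

Under this sketch the decimal two and the float two are unequal because their primitive tags differ, even though Python itself would consider `Decimal(2) == 2.0` true.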

For the purposes of this specification, there is one equality
relation for all values of all datatypes (the union of the various
datatypes' individual equalities, if one considers relations to be
sets of ordered pairs). The equality relation is
denoted by '=' and its negation by
'≠', each used as
a binary infix predicate:
x = y and
x ≠ y . On the other
hand, identity relationships are always described in
words.

2.2.3 Order

For some
datatypes, an order relation is prescribed
for use in checking
upper and lower bounds of the ·value space·. This order may be
a partial order, which means that there may be values in
the ·value space· which are neither equal, less-than, nor
greater-than. Such value pairs are
incomparable. In many cases,
no order
is prescribed; each pair of values is either
equal or ·incomparable·.
[Definition:] Two
values that are neither equal, less-than, nor greater-than are
incomparable. Two values
that are not ·incomparable· are
comparable.

In this specification, this less-than order relation is denoted by
'<' (and its inverse by '>'),
the weak order by '≤' (and its inverse by
'≥'), and the resulting ·incomparable· relation by
'<>', each used as a binary infix predicate:
x < y ,
x ≤ y ,
x > y ,
x ≥ y , and
x <> y .

Note: The weak order "less-than-or-equal" means
"less-than" or "equal" and one
can tell which. For example, the duration P1M (one month) is not
less-than-or-equal to P31D (thirty-one days) because P1M is not less than
P31D, nor is P1M equal to P31D. Instead, P1M is ·incomparable· with P31D. The formal
definition of order for duration (duration (§3.3.7))
ensures
that this is true.
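The partial order in this example can be sketched as below. The sketch assumes that a duration is modeled as a pair of months and seconds, and that one duration is less than another when adding each to every one of four reference dateTimes always gives an earlier result; the reference values are those understood to be used in the duration section (§3.3.7). The names `Duration`, `add` and `compare` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Duration(NamedTuple):
    months: int
    seconds: int


# Reference dateTimes, assumed to match those in the duration section.
# All fall on the first of a month, so adding whole months needs no
# day-of-month clamping.
REFERENCES = [datetime(1696, 9, 1), datetime(1697, 2, 1),
              datetime(1903, 3, 1), datetime(1903, 7, 1)]


def add(start: datetime, d: Duration) -> datetime:
    total = start.year * 12 + (start.month - 1) + d.months
    shifted = start.replace(year=total // 12, month=total % 12 + 1)
    return shifted + timedelta(seconds=d.seconds)


def compare(x: Duration, y: Duration) -> Optional[str]:
    """Return '<', '=', '>' or None when x and y are incomparable."""
    if x == y:
        return "="
    if all(add(s, x) < add(s, y) for s in REFERENCES):
        return "<"
    if all(add(s, x) > add(s, y) for s in REFERENCES):
        return ">"
    return None
```

For P1M and P31D the results coincide when starting from the March and July reference dates but differ for the other two, so neither strict relation holds for all four and `compare` returns `None`.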

For
purposes of this specification, the value spaces of primitive datatypes are
disjoint, even in cases where the
abstractions they represent might be thought of as having
values in common. In the order
relations defined in this specification, values from
different value spaces are
·incomparable·. For example, the numbers two
and three are values in both the
decimal
datatype and the float datatype. In the order relation defined
here,
the two in the decimal datatype
is
not less than the three in the float datatype;
the two values are
incomparable. Other
applications making use of these
datatypes may choose to consider values such as these comparable.

Note: Comparison of values from different ·primitive· datatypes
can sometimes be an error and sometimes not, depending on context.

When made for purposes of checking an enumeration constraint,
such a comparison is not in itself an error, but since
no
two values from different ·primitive· ·value spaces· are
equal, any
comparison of ·incomparable· values will invariably be false.

Specifying an upper or lower bound which is of the wrong primitive
datatype (and therefore ·incomparable· with the values of the datatype
it is supposed to restrict) is, by contrast, always an error.
It is a consequence of the rules for
·facet-based restriction· that in conforming simple type definitions, the
values of upper and lower bounds, and enumerated values, must be
drawn from the value space of the ·base type·, which necessarily means
from the same ·primitive· datatype.

[Definition:] A sequence of zero or more
characters in the Universal Character Set (UCS) which may or may not
prove upon inspection to be a member of the ·lexical space· of a given
datatype and thus a ·lexical representation· of a given value in that datatype's
·value space·, is referred to as a literal. The
term is used indifferently both for character sequences which are
members of a particular ·lexical space· and for those which are
not.

Note: One should be aware that in the context of XML
schema-validity
assessment,
there are ·pre-lexical· transformations of the
input character string
(controlled by the
whiteSpace facet and any implementation-defined
·pre-lexical·
facets)
which result in the intended ·literal·.
Other systems utilizing this specification may or may not implement
these transformations. If they do not, then input character
strings that would have been transformed into correct
lexical representations, when taken "raw", may not be
correct ·lexical
representations·.

Note: There are currently no facets with such an impact. There may
be in the future.

For example, '100' and '1.0E2' are two
different ·lexical representations· from the float
datatype which
both denote the same value. The datatype system defined in this
specification provides mechanisms for schema designers to control the
·value space· and the corresponding set of acceptable
·lexical representations· of those values for a datatype.
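The relation between literals, the lexical space and values can be sketched for float as follows. The regular expression is a simplified stand-in for the float lexical space and is an assumption of this sketch, as is the use of Python floats (doubles) for the values; the name `float_lexical_map` is hypothetical.

```python
import re
from typing import Optional

# Simplified stand-in for the float lexical space (an assumption; the
# normative grammar is given with the float datatype itself).
FLOAT_LITERAL = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"
    r"|[+-]?INF|NaN"
)


def float_lexical_map(literal: str) -> Optional[float]:
    """Map a literal to a value, or None if it is outside the lexical space."""
    if FLOAT_LITERAL.fullmatch(literal) is None:
        return None
    if literal.endswith("INF"):
        return float("-inf") if literal.startswith("-") else float("inf")
    return float(literal)   # also handles 'NaN'
```

Both `float_lexical_map("100")` and `float_lexical_map("1.0E2")` yield the same value, while a literal such as "one hundred" is not in the lexical space and maps to nothing.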

Note: ·Canonical
representations· are provided where feasible for the use of
other applications; they are not required for schema processing
itself. A conforming schema processor implementation is
not required to implement ·canonical
mappings·.

It is useful to categorize the datatypes defined in this
specification along various dimensions, defining terms which
can be used to characterize datatypes and the Simple Type Definitions
which define them.

2.4.1.1 Atomic Datatypes

Note: Atomic values are sometimes regarded, and described, as "not
decomposable", but in fact the values in several datatypes
defined here are described with internal structure, which is appealed
to in checking whether particular values satisfy various constraints
(e.g. upper and lower bounds on a datatype). Other specifications
which use the datatypes defined here may define operations which
attribute internal structure to values and expose or act upon that
structure.

For ·list· datatypes the ·lexical space·
is composed of space-separated
·literals·
of the
·item type·.
Any
·pattern· specified when a new datatype is
derived from a ·list· datatype
applies
to the members of the ·list· datatype's
·lexical space·, not to the members of the ·lexical space·
of the ·item type·. Similarly,
enumerated
values are compared to the entire ·list·, not to
individual list items,
and assertions apply to the entire ·list· too.
Lists are identical if and only if they have the
same length and their items are pairwise identical; they are
equal if and only if they have the same length and their items
are pairwise equal. And
a list of length one whose item is an atomic value V1 is
equal
to an atomic value V2
if and only if V1 is equal to V2.
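The treatment of list literals described above can be sketched as follows. The sketch assumes whitespace is collapsed before the pattern is checked, and it uses Python regular expressions as a stand-in for the regular-expression language of this specification (the two are not identical). The name `parse_list` and its parameters are hypothetical.

```python
import re
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def parse_list(literal: str,
               item_map: Callable[[str], T],
               pattern: Optional[str] = None) -> Optional[List[T]]:
    """Map a list literal to its item values, or None if it is rejected."""
    # Collapse whitespace: runs of space, tab, LF and CR become one space,
    # and leading and trailing spaces are dropped.
    collapsed = re.sub(r"[ \t\n\r]+", " ", literal).strip(" ")
    # The pattern constrains the whole list literal, not each item.
    if pattern is not None and re.fullmatch(pattern, collapsed) is None:
        return None
    try:
        return [item_map(token) for token in collapsed.split(" ") if token]
    except ValueError:
        return None   # some item is not in the item type's lexical space
```

For instance, the pattern `[0-9]` rejects the literal "1 2 3" even though every item matches it on its own, because the pattern is applied to the entire list literal.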

It will be observed that the ·lexical mapping· of a union, so
defined, is not necessarily a function: a given ·literal· may map to
one value or to several values of different ·primitive· datatypes, and
it may be indeterminate which value is to be preferred in a particular
context. When the datatypes defined here are used in the context of
[XSD 1.1 Part 1: Structures], the xsi:type attribute defined by that
specification in section xsi:type can be used to indicate
which value a ·literal· which is the content of an element should map
to. In other contexts, other rules (such as type coercion rules) may
be employed to determine which value is to be used.

A prototypical example of a ·union· type is the
maxOccurs attribute on the
element element
in XML Schema itself: it is a union of nonNegativeInteger
and an enumeration with the single member, the string "unbounded", as shown below.

[Definition:] If a datatype M is in the
·transitive membership· of a ·union·
datatype U, but not one of U's ·member types·,
then a sequence of one or more ·union· datatypes necessarily exists,
such that the first is one of the ·member types· of U, each
is one of the ·member types· of its predecessor in the sequence, and
M is one of the ·member types· of the last in the sequence.
The ·union· datatypes in this sequence are said to
intervene between M and U. When
U and M are given by the context, the datatypes
in the sequence are referred to as the intervening unions.
When M is one of the ·member types· of U,
the set of intervening unions is the empty set.

The order in which the ·member types· are specified in the
definition (that is, in the case of
datatypes defined in a schema document, the order of the
<simpleType> children of the <union> element, or the order
of the QNames in the memberTypes attribute) is
significant. During validation, an element or attribute's value is
validated against the ·member types· in the order in which they appear
in the definition until a match is found. As noted above,
the evaluation order can be overridden with the use of
xsi:type.

Example

For example, given the definition below, the first instance of the <size> element
validates correctly as an integer (§3.4.13), the second and third as
string (§3.3.1).
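
The ordered matching described above can be sketched in Python. This is only an illustration of the "first matching member type wins" rule, not the normative validation procedure. The two member validators and the sample literals are hypothetical stand-ins, since the schema definition the example refers to is not reproduced here.

```python
import re
from typing import Callable, List, Optional, Tuple

Member = Tuple[str, Callable[[str], bool]]

# Hypothetical member types, listed in definition order: (name, validity test).
INTEGER: Member = ("integer", lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None)
STRING: Member = ("string", lambda s: True)  # every literal is accepted as a string

def validate_union(literal: str, members: List[Member]) -> Optional[str]:
    """Return the name of the first member type that accepts the literal."""
    for name, accepts in members:
        if accepts(literal):
            return name
    return None  # no member type matched

print(validate_union("1", [INTEGER, STRING]))      # integer
print(validate_union("large", [INTEGER, STRING]))  # string
print(validate_union("1", [STRING, INTEGER]))      # string (order matters)
```

Reversing the member order changes the outcome for '1', which is why the order of the member types is significant.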

Note: As normatively specified elsewhere,
conforming processors must support all the
primitive datatypes defined in this specification; it is
·implementation-defined· whether other primitive datatypes are
supported.

For example, in this specification, float is a
·primitive· datatype based on
a well-defined mathematical concept
and
not
defined in terms of other datatypes, while
integer is ·constructed·
from the more general datatype decimal.

The properties of the ·special· and the
standard
·primitive· datatypes are defined by this
specification. A Simple Type Definition is present for each of these
datatypes in every valid schema; it serves as a representation of the
datatype, but by itself it does not capture all the relevant
information and does not suffice (without knowledge
of this specification) to define the datatype.

Since each datatype has exactly one ·base type·,
and every datatype other
than anySimpleType
is derived directly or
indirectly from anySimpleType, it follows that
the ·base type· relation arranges all
simple types into a tree structure, which is conventionally
referred to as the derivation hierarchy.

2.4.4 Built-in vs. User-Defined Datatypes

[Definition:] User-defined datatypes are those
datatypes that are defined by individual schema designers.

The ·built-in· datatypes are intended to be
available automatically whenever this specification is implemented or
used, whether by itself or embedded in a host language. In the
language defined by [XSD 1.1 Part 1: Structures],
the ·built-in· datatypes are automatically
included in every valid schema. Other host languages should specify
that all of the datatypes described here as built-ins are automatically
available; they may specify that additional datatypes are also made
available automatically.

Note: ·Implementation-defined· datatypes, whether ·primitive· or ·ordinary·,
may sometimes
be included automatically in any schemas processed
by that implementation; nevertheless, they are not built in
to every schema, and are thus not included
in the term 'built-in', as that term is
used in this specification.

The mechanism for making ·user-defined·
datatypes available for use is not defined in this specification; if
·user-defined· datatypes are to be available, some such mechanism
must be specified by the host language.

[Definition:] A
datatype which is not available for use is said to be
unknown.

Note: From the schema author's perspective, a reference to
a datatype which proves to be ·unknown· might reflect
any of the following causes, or others:

1. An error has been made in giving the name of the datatype.

2. The datatype is a ·user-defined· datatype which has not been made
available using the means defined by the host language (e.g.
because the appropriate schema document has not been
consulted).

Conceptually there is no difference between the ·ordinary· ·built-in·
datatypes included in this specification and the ·user-defined·
datatypes which will be created by individual schema designers.
The ·built-in· ·constructed· datatypes
are those which are believed to be so common that if they were not
defined in this specification many schema designers would end up
reinventing them. Furthermore, including these
·constructed· datatypes in this specification serves to
demonstrate the mechanics and utility of the datatype generation
facilities of this specification.

Additionally, each facet definition element can be uniquely
addressed via a URI constructed as follows:

the base URI is the URI of the XML Schema namespace

the fragment identifier is the name of the facet

For example, to address the maxInclusive facet, the URI is:

http://www.w3.org/2001/XMLSchema#maxInclusive

Additionally, each facet usage in a built-in
Simple Type Definition
can be uniquely addressed via a URI constructed as follows:

the base URI is the URI of the XML Schema namespace

the fragment identifier is the name of the
Simple Type Definition, followed
by a period ('.') followed by the name of the facet

For example, to address the usage of the maxInclusive facet in
the definition of int, the URI is:

http://www.w3.org/2001/XMLSchema#int.maxInclusive
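
The two URI constructions above can be sketched as follows. The function names are hypothetical and chosen only for this illustration.

```python
XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"

def facet_uri(facet: str) -> str:
    """URI of a facet definition: namespace URI plus the facet name as fragment."""
    return f"{XSD_NAMESPACE}#{facet}"

def facet_usage_uri(type_name: str, facet: str) -> str:
    """URI of a facet usage: fragment is the type name, a period, and the facet name."""
    return f"{XSD_NAMESPACE}#{type_name}.{facet}"

print(facet_uri("maxInclusive"))
print(facet_usage_uri("int", "maxInclusive"))
```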

3.1 Namespace considerations

The ·built-in· datatypes defined by this specification
are designed to be used with the XML Schema definition language as well as other
XML specifications.
To facilitate usage within the XML Schema definition language, the ·built-in·
datatypes in this specification have the namespace name:

http://www.w3.org/2001/XMLSchema

To facilitate usage in specifications other than the XML Schema definition language,
such as those that do not want to know anything about aspects of the
XML Schema definition language other than the datatypes, each
·built-in·
datatype is also defined in the namespace whose URI is:

http://www.w3.org/2001/XMLSchema-datatypes

Note: The use of the XMLSchema-datatypes namespace and the
definitions therein are deprecated as of
XML Schema 1.1.

3.2.1.1 Value space

Note: It is a consequence of this definition, together with the
definition of the ·lexical mapping· in the next section, that some
values of this datatype have no ·lexical representation· using the
·lexical mappings· defined by this specification. That is, the
"potential" ·value space· and the "effable"
or "nameable" ·value space· diverge for this datatype.
As far as this specification is concerned, there is no operational
difference between the potential and effable ·value spaces· and the
distinction is of mostly formal interest. Since some host languages
for the type system defined here may allow means of construction
values other than mapping from a ·lexical representation·, the
difference may have practical importance in some contexts. In those
contexts, the term ·value space· should unless otherwise qualified be
taken to mean the potential ·value space·.

The ·lexical mapping· of anySimpleType is the union
of the ·lexical mappings· of
all ·primitive· datatypes and all list datatypes.
It will be noted that this mapping is not a function: a given
·literal· may map to one value or to several values of different
·primitive· datatypes, and it may be indeterminate which value is to
be preferred in a particular context. When the datatypes defined here
are used in the context of [XSD 1.1 Part 1: Structures], the
xsi:type attribute defined by that specification in section
xsi:type can be used
to indicate which value a ·literal· which is the content of an element
should map to. In other contexts, other rules (such as type coercion
rules) may be employed to determine which value is to be used.

The ·lexical mapping· of anyAtomicType is the union
of the ·lexical mappings· of
all ·primitive· datatypes.
It will be noted that this mapping is not a function: a given
·literal· may map to one value or to several values of different
·primitive· datatypes, and it may be indeterminate which value is to
be preferred in a particular context. When the datatypes defined here
are used in the context of [XSD 1.1 Part 1: Structures], the
xsi:type attribute defined by that specification in section
xsi:type can be used
to indicate which value a ·literal· which is the content of an element
should map to. In other contexts, other rules (such as type coercion
rules) may be employed to determine which value is to be used.

3.3.1.1 Value Space

The ·value space·
of string is the set of finite-length sequences of
zero or more
characters (as defined in
[XML]) that ·match· the
Char production from [XML].
A character is an atomic unit of
communication; it is not further specified except to note that every
character has a corresponding
Universal Character Set (UCS) code point, which is an integer.

3.3.1.3
Facets

string has the following
·constraining facets· with the values shown; these
facets may be specified
in the derivation of new types, if the
value given is at least as restrictive as the one shown:

3.3.3 decimal

[Definition:] decimal
represents
a
subset of the real numbers, which
can be represented by
decimal numerals.
The ·value space· of decimal
is the set of numbers that can be obtained by
dividing
an integer by a non-negative
power of ten, i.e., expressible as
i / 10^n
where i and n are integers
and
n ≥ 0.
Precision is not reflected in this value space;
the number 2.0 is not distinct from the number 2.00.
(The datatype precisionDecimal may be used
for values in which precision is significant.)
The order relation on decimal
is the order relation on real numbers, restricted
to this subset.

3.3.3.1 Lexical
Mapping

decimal
has
a lexical representation
consisting of a
non-empty finite-length
sequence of
decimal
digits (#x30–#x39) separated
by a period as a decimal indicator.
An optional leading sign is allowed.
If the sign is omitted,
"+"
is assumed. Leading and trailing zeroes are optional.
If the fractional part is zero, the period and following zero(es) can
be omitted.
For example:
'-1.23',
'12678967.543233', '+100000.00',
'210'.

The lexical space of decimal is the set of
lexical representations which match the grammar given above, or
(equivalently) the regular expression

(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)

The mapping from lexical representations to values is the usual
one for decimal numerals; it is given formally in
·decimalLexicalMap·.
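
As a rough check of the lexical space, the regular expression above can be applied directly in Python. This sketch uses Python's `decimal.Decimal` as a stand-in for the value space; the normative mapping is ·decimalLexicalMap·, and the helper names here are hypothetical.

```python
import re
from decimal import Decimal

# The regular expression quoted in the text, applied to the whole literal.
DECIMAL_RE = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_decimal_literal(literal: str) -> bool:
    return DECIMAL_RE.fullmatch(literal) is not None

def decimal_value(literal: str) -> Decimal:
    """Map a valid literal to a value; Decimal is only an approximation of the model."""
    if not is_decimal_literal(literal):
        raise ValueError(f"not in the lexical space of decimal: {literal!r}")
    return Decimal(literal)

for text in ["-1.23", "12678967.543233", "+100000.00", "210", "1e5", "."]:
    print(text, is_decimal_literal(text))

# Precision is not part of the value: 2.0 and 2.00 map to the same number.
print(decimal_value("2.0") == decimal_value("2.00"))
```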

The definition
of the ·canonical representation· has the
effect of prohibiting certain options from the
Lexical
Mapping (§3.3.3.1).
Specifically,
for integers, the decimal point and fractional part are prohibited.
For other values,
the preceding optional
"+"
sign is prohibited. The decimal point is required.
In
all cases, leading and
trailing zeroes are prohibited subject to the following: there
must be at least one digit to the right and to the left of the decimal
point which may be a
zero.
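
A minimal sketch of these canonicalization rules, assuming Python's `decimal.Decimal` as the value model and a hypothetical function name; it is an illustration of the rules as stated here, not the normative canonical mapping.

```python
import re
from decimal import Decimal

DECIMAL_RE = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def canonical_decimal(literal: str) -> str:
    """Return a canonical form of a decimal literal following the rules above."""
    if DECIMAL_RE.fullmatch(literal) is None:
        raise ValueError(f"not a decimal literal: {literal!r}")
    value = Decimal(literal)
    if value == int(value):
        # Integers: no decimal point, no fractional part, no '+' sign.
        return str(int(value))
    # Other values: fixed notation, trailing zeroes removed. Decimal already
    # drops the '+' sign and extra leading zeroes and supplies a leading '0'.
    return format(value, "f").rstrip("0")

for text in ["+100000.00", "-1.230", ".5", "0012.", "-0.0"]:
    print(text, "->", canonical_decimal(text))
```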

3.3.3.3 Datatypes based on decimal

3.3.4 precisionDecimal

[Definition:] The precisionDecimal
datatype represents the
numeric value and (arithmetic) precision of decimal numbers which retain
precision; it also
includes
values for positive and negative infinity and
for
"not a number", and it differentiates
between "positive zero" and "negative
zero".
This datatype is introduced
to provide a variant of decimal that closely corresponds
to the floating-point decimal datatypes described by
[IEEE 754-2008].
Precision of values is retained
and values
are included
for two zeroes, two infinities, and not-a-number.

Note: Users wishing to implement useful operations for this datatype
(beyond the equality and order specified herein) are urged to
consult [IEEE 754-2008].

Informally, the precision of
a value is denoted by the number of decimal digits after
the decimal point in its
·lexical
representations·.
The numbers 2 and 2.00, although numerically equal, have
different precision (0 and 2 respectively). The
precision of a value is derived from the
·lexical representation· using rules
described in Lexical Mapping (§3.3.4.2).
Values retain their precision, but that precision
plays no part in any operations defined in this
specification other than identity:
specifically, it does not affect equality or ordering
comparisons. Precision may play a role in
arithmetic operations, but that is outside the scope
of this specification.
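
The distinction between equality and identity for values of differing precision can be illustrated with Python's `decimal.Decimal`, which also retains trailing zeroes. This is an informal sketch restricted to plain decimal numerals; the actual rules are those of Lexical Mapping (§3.3.4.2), and the `precision` helper is hypothetical.

```python
from decimal import Decimal

def precision(literal: str) -> int:
    """Number of digits after the decimal point in a plain decimal numeral."""
    exponent = Decimal(literal).as_tuple().exponent
    if not isinstance(exponent, int):
        raise ValueError("INF and NaN are outside the scope of this sketch")
    return -exponent

two, two_hundredths = Decimal("2"), Decimal("2.00")
print(precision("2"), precision("2.00"))                # 0 2
print(two == two_hundredths)                            # True: numerically equal
print(two.as_tuple() == two_hundredths.as_tuple())      # False: not identical
```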

The precisionDecimal datatype is a ‘feature at risk’.
It may be retained or dropped at the end of the Candidate Recommendation
period.
The determination of whether to retain it or remove it will depend
both on the existence of two independent implementations in
the context of this specification and on
the degree of uptake of [IEEE 754-2008] in the industry.
Possible outcomes include
retention of precisionDecimal,
dropping precisionDecimal,
rearrangement of the type hierarchy in this area,
and merger with the existing decimal datatype.

Note: As explained below, 'NaN' is the lexical
representation of the precisionDecimal value whose ·numericalValue·
property
has the ·special value· notANumber. Accordingly, in English text we
use 'NaN' to refer to that value. Similarly we
use 'INF' and '−INF' to refer
to the two values whose
·numericalValue· properties have
the ·special values· positiveInfinity and
negativeInfinity. These three precisionDecimal values are also
informally called "not-a-number", "positive
infinity", and "negative infinity". The latter two
together are called "the infinities".

Two numerical precisionDecimal values
are ordered (or equal) as their
·numericalValue· values are ordered (or equal).
(This means
that
two zeroes with
different ·sign·s
are equal;
negative zeroes are not ordered less than positive zeroes.)

INF is equal only to itself, and is greater than
−INF and all numerical precisionDecimal values.

−INF is equal only to itself, and is less than
INF and all numerical precisionDecimal values.

Note: As specified elsewhere, enumerations test values for equality with one of the
enumerated values. Because NaN ≠ NaN, including NaN in an enumeration
does not have the effect of accepting NaNs as instances of the enumerated
type; a union with a NaN-only datatype (which may be derived using the
pattern "NaN") can be used instead.

3.3.4.2 Lexical Mapping

precisionDecimal's lexical space is the set of all
decimal numerals with or without a decimal
point, numerals in scientific (exponential) notation, and
the character strings 'INF',
'+INF', '-INF',
and 'NaN'.

3.3.5 float

[Definition:] The
float datatype
is
patterned after the IEEE
single-precision 32-bit floating point datatype
[IEEE 754-2008].
Its
value space is a subset of the
rational numbers. Floating point numbers are often used to
approximate arbitrary real numbers.

3.3.5.1 Value Space

The ·value space· of float contains the
non-zero numbers m × 2^e,
where m is an integer whose absolute value is less than 2^24,
and e is an integer between −149 and 104, inclusive. In addition to
these values, the ·value space· of float also contains
the following ·special values·: positiveZero,
negativeZero, positiveInfinity,
negativeInfinity, and notANumber.
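
The bounds on m and e can be checked against the IEEE single-precision bit patterns. The sketch below assumes that Python's `struct` format `">f"` decodes an IEEE binary32 value, and the helper name is hypothetical.

```python
import math
import struct

MAX_M = 2**24 - 1          # largest |m|: absolute value less than 2^24
MIN_E, MAX_E = -149, 104   # range of the exponent e, inclusive

largest = math.ldexp(MAX_M, MAX_E)   # (2^24 - 1) × 2^104
smallest = math.ldexp(1, MIN_E)      # 1 × 2^-149

def from_bits(hex_bits: str) -> float:
    """Decode a big-endian 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hex_bits))[0]

print(largest == from_bits("7f7fffff"))   # largest finite single-precision value
print(smallest == from_bits("00000001"))  # smallest positive (subnormal) value
```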

Note: As explained below, the
·lexical representation·
of the float
value notANumber is 'NaN'. Accordingly, in English
text we generally use 'NaN' to refer to that value. Similarly,
we use 'INF' and '−INF' to refer to the two
values positiveInfinity and negativeInfinity,
and '0' and '−0' to refer to
positiveZero and negativeZero.

Equality is identity, except that 0 = −0 (although
they are not identical) and NaN ≠ NaN
(although NaN is of course identical to itself).

0 and −0 are thus equivalent
for purposes of enumerations and
identity constraints, as well as for minimum and maximum values.

For the basic values, the order relation
on float is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is ·incomparable· with any value
in the ·value space· including itself. 0 and −0
are greater than all the negative numbers and less than all the positive
numbers.
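
Equality and identity as described above can be contrasted in a short sketch. Python floats are double precision, but the treatment of zeroes and NaN is the same; the function names are hypothetical, and a single NaN value is assumed.

```python
import math

def equal(a: float, b: float) -> bool:
    """Equality: 0 = -0, and NaN is not equal to anything, including itself."""
    return a == b

def identical(a: float, b: float) -> bool:
    """Identity: distinguishes 0 from -0 and treats NaN as identical to itself."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)

nan = float("nan")
print(equal(0.0, -0.0), identical(0.0, -0.0))  # True False
print(equal(nan, nan), identical(nan, nan))    # False True
```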

Note: The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror [IEEE 754-2008].

Note: As specified elsewhere, enumerations test values for equality with one of the
enumerated values. Because NaN ≠ NaN, including NaN in an enumeration
does not have the effect of accepting NaNs as instances of the enumerated
type; a union with a NaN-only datatype (which may be derived using the
pattern "NaN") can be used instead.
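
The effect described in this note can be sketched as follows. The enumeration test compares by equality, so NaN never matches; accepting NaN requires a separate member, modelled here by an explicit NaN check. The function names are hypothetical.

```python
import math
from typing import Iterable

def in_enumeration(value: float, enumerated: Iterable[float]) -> bool:
    # Explicit == comparison; Python's `in` operator would check identity first.
    return any(value == item for item in enumerated)

def in_union_with_nan(value: float, enumerated: Iterable[float]) -> bool:
    """Union of a NaN-only type with the enumerated type."""
    return math.isnan(value) or in_enumeration(value, enumerated)

nan = float("nan")
print(in_enumeration(nan, [1.0, nan]))   # False: NaN is not equal to NaN
print(in_union_with_nan(nan, [1.0]))     # True: accepted by the NaN-only member
print(in_enumeration(-0.0, [0.0]))       # True: 0 and -0 are equal
```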

3.3.5.2 Lexical Mapping

The ·lexical space· of float is
the set of all decimal numerals with or without a decimal
point, numerals in scientific (exponential) notation, and
the ·literals·
'INF', '+INF',
'-INF',
and 'NaN'.

Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
·lexical mappings·.

The ·lexical mapping· ·floatLexicalMap· is
provided as an example of a simple algorithm that yields a conformant mapping,
and that provides the most accurate rounding possible, and is thus useful
for ensuring inter-implementation reproducibility and inter-implementation
round-tripping. The simple rounding
algorithm used in ·floatLexicalMap· may be more efficiently
implemented using the algorithms of [Clinger, WD (1990)].

Note: The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from [Clinger, WD (1990)].

3.3.6 double

[Definition:] The double
datatype is
patterned after the
IEEE double-precision 64-bit floating point datatype
[IEEE 754-2008].
Each floating
point datatype has a value space that is a subset of the
rational numbers. Floating point numbers are often used to
approximate arbitrary real numbers.

Note: The only significant differences between float and double are
the three defining constants 53 (vs 24), −1074 (vs −149),
and 971 (vs 104).

3.3.6.1 Value Space

The ·value space· of double contains the
non-zero numbers m × 2^e,
where m is an integer whose absolute value is less than 2^53,
and e is an integer between −1074 and 971, inclusive. In addition to
these values, the ·value space· of double also contains
the following ·special values·: positiveZero,
negativeZero, positiveInfinity,
negativeInfinity, and notANumber.
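
As a quick sanity check of these constants, the extreme values can be compared with the limits Python reports for its own floats. This assumes the common case in which Python's `float` is an IEEE double-precision value.

```python
import math
import sys

largest = math.ldexp(2**53 - 1, 971)   # (2^53 - 1) × 2^971
smallest = math.ldexp(1, -1074)        # 1 × 2^-1074

print(largest == sys.float_info.max)   # largest finite double
print(smallest == 5e-324)              # smallest positive (subnormal) double
```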

Note: As explained below, the
·lexical representation·
of the double
value notANumber is 'NaN'. Accordingly, in English
text we generally use 'NaN' to refer to that value. Similarly,
we use 'INF' and '−INF' to refer to the two
values positiveInfinity and negativeInfinity,
and '0' and '−0' to refer to
positiveZero and negativeZero.

Equality is identity, except that 0 = −0 (although
they are not identical) and NaN ≠ NaN
(although NaN is of course identical to itself).

0 and −0 are thus equivalent for purposes of enumerations,
identity constraints, and minimum and maximum values.

For the basic values, the order relation
on double is the order relation for rational numbers. INF is greater
than all other non-NaN values; −INF is less than all other non-NaN
values. NaN is ·incomparable· with any value
in the ·value space· including itself. 0 and −0
are greater than all the negative numbers and less than all the positive
numbers.

Note: The Schema 1.0 version of this datatype did not differentiate between
0 and −0 and NaN was equal to itself. The changes were
made to make the datatype more closely mirror [IEEE 754-2008].

Note: As specified elsewhere, enumerations test values for equality with one of the
enumerated values. Because NaN ≠ NaN, including NaN in an enumeration
does not have the effect of accepting NaNs as instances of the enumerated
type; a union with a NaN-only datatype (which may be derived using the
pattern "NaN") can be used instead.

3.3.6.2 Lexical Mapping

The ·lexical space· of double is
the set of all decimal numerals with or without a decimal
point, numerals in scientific (exponential) notation, and
the ·literals·
'INF', '+INF',
'-INF', and 'NaN'.

Since IEEE allows some variation in rounding of values, processors
conforming to this specification may exhibit some variation in their
·lexical mappings·.

The ·lexical mapping· ·doubleLexicalMap· is
provided as an example of a simple algorithm that yields a conformant mapping,
and that provides the most accurate rounding possible, and is thus useful
for ensuring inter-implementation reproducibility and inter-implementation
round-tripping. The simple rounding
algorithm used in ·doubleLexicalMap· may be more efficiently
implemented using the algorithms of [Clinger, WD (1990)].

Note: The Schema 1.0 version of this datatype did not permit rounding
algorithms whose results differed from [Clinger, WD (1990)].

3.3.7 duration

[Definition:] duration
is a datatype that represents
durations of time. The concept of duration being captured is
drawn from those of [ISO 8601], specifically
durations without fixed endpoints. For example,
"15 days" (whose most common lexical representation
in duration is 'P15D') is
a duration value; "15 days beginning 12 July
1995" and "15 days ending 12 July 1995" are
not duration
values. duration can provide addition and
subtraction operations between duration values and
between duration/dateTime value pairs,
and can be the result of subtracting dateTime
values. However, only addition to dateTime
is required for XML Schema processing and is
defined in
the function ·dateTimePlusDuration·.

3.3.7.1 Value Space

Duration values can be modelled as
two-property tuples. Each value consists of an integer number of
months and a decimal number of seconds. The
·seconds· value must not be negative if the
·months· value is positive and must not be
positive if the ·months· is negative.

If all four resulting dateTime value pairs are ordered
the same way (less than, equal, or greater than), then the original
pair of duration values is ordered the same way;
otherwise the original pair is ·incomparable·.

Note: These four values are chosen so as to maximize
the possible differences in results that could occur,
such as the difference when adding P1M and P30D:
1697-02-01T00:00:00Z + P1M < 1697-02-01T00:00:00Z + P30D ,
but
1903-03-01T00:00:00Z + P1M > 1903-03-01T00:00:00Z + P30D ,
so that P1M <> P30D .
If two duration values are ordered the same way
when added to each of these four dateTime values,
they will retain the same order when added
to any other dateTime
values. Therefore,
two duration values are incomparable if and only
if they can ever result in different orders when added to any dateTime value.
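
The comparison procedure can be sketched in Python. Several things here are assumptions made for illustration: durations are modelled as (months, seconds) pairs with integral seconds, the four reference dateTime values are taken from the published specification rather than from the text above (which names only two of them), and `add_duration` is a simplified stand-in for ·dateTimePlusDuration·.

```python
import calendar
from datetime import datetime, timedelta
from typing import Optional, Tuple

Duration = Tuple[int, int]  # (months, seconds)

# Assumed reference dateTime values, all in UTC.
REFERENCE_DATETIMES = [
    datetime(1696, 9, 1),
    datetime(1697, 2, 1),
    datetime(1903, 3, 1),
    datetime(1903, 7, 1),
]

def add_duration(start: datetime, duration: Duration) -> datetime:
    """Add the months first (clamping the day of month), then the seconds."""
    months, seconds = duration
    year, month0 = divmod(start.year * 12 + (start.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    shifted = start.replace(year=year, month=month0 + 1,
                            day=min(start.day, last_day))
    return shifted + timedelta(seconds=seconds)

def compare_durations(a: Duration, b: Duration) -> Optional[int]:
    """Return -1, 0, or 1 if all four results agree, else None (incomparable)."""
    outcomes = set()
    for start in REFERENCE_DATETIMES:
        x, y = add_duration(start, a), add_duration(start, b)
        outcomes.add((x > y) - (x < y))
    return outcomes.pop() if len(outcomes) == 1 else None

P1M, P30D, P27D = (1, 0), (0, 30 * 86400), (0, 27 * 86400)
print(compare_durations(P1M, P30D))  # None: the order depends on the start date
print(compare_durations(P1M, P27D))  # 1: a month is always longer than 27 days
```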

Under the definition just given,
two duration values are equal if and only if they are identical.

Note: There are many ways to implement duration,
some of which do not base the implementation on the two-component
model. This specification does not prescribe any particular
implementation, as long as the visible results are isomorphic to those
described herein.

The expression '.*[YMDHS].*' matches only
strings in which at least one field occurs.

The expression '.*[^T]' matches
only strings in which 'T' is not the final character, so that
if 'T' appears, something follows it. The first rule
ensures that what follows 'T' will be an hour,
minute, or second field.

The intersection of these three regular expressions is equivalent to
the following (after removal of the white space inserted here for
legibility):

3.3.8.1 Value Space

Note: In version 1.0 of this specification, the
·year· property was not permitted to have the value
zero. The year before the year 1
in the proleptic Gregorian calendar, traditionally referred to as
1 BC or as
1 BCE, was represented by a
·year· value of −1, 2 BCE by −2, and so
forth. Of course, many, perhaps most,
references to 1 BCE (or 1 BC) actually refer not
to a year in the proleptic Gregorian calendar but to a year in the
Julian or "old style" calendar; the two correspond
approximately but not exactly to each other.

In this version of this specification,
two changes are made in order to agree with existing usage.
First, ·year· is permitted to have the value zero.
Second, the interpretation of
·year· values is changed accordingly: a ·year· value of zero represents 1 BCE, −1
represents 2 BCE, etc. This representation simplifies interval
arithmetic and leap-year calculation for dates before the common
era (which may be why astronomers
and others interested in such calculations with the proleptic
Gregorian calendar have adopted it), and is consistent with the
current edition of [ISO 8601].

Note that 1 BCE, 5 BCE, and so on (years 0000, -0004, etc. in the
lexical representation defined here) are leap years in the proleptic
Gregorian calendar used for the date/time datatypes defined here.
Version 1.0 of this specification was unclear about the treatment of
leap years before the common era.
If existing
schemas or data specify dates of 29 February for any years before the
common era, then some values giving
a date of 29 February which were valid under a plausible
interpretation of XSD 1.0 will be invalid under this specification,
and some which were invalid will be valid. With that possible
exception, schemas and data valid
under the old interpretation remain valid under the new.
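
The revised interpretation of ·year· values can be sketched as follows. The function names are hypothetical; the leap-year rule is the usual proleptic Gregorian one applied unchanged to zero and negative values.

```python
def era_label(year: int) -> str:
    """Traditional label for a ·year· value: 0 is 1 BCE, -1 is 2 BCE, and so on."""
    return f"{year} CE" if year > 0 else f"{1 - year} BCE"

def is_leap_year(year: int) -> bool:
    """Proleptic Gregorian leap-year rule, valid for zero and negative years too."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

for year in (1, 0, -1, -4):
    print(year, era_label(year), is_leap_year(year))
```

With year zero permitted, the leap-year test needs no special case for dates before the common era.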

Constraint: Day-of-month Values

The ·day· value
must be
no more than 30 if ·month·
is one of 4, 6, 9, or 11;
no more than 28
if ·month· is 2 and
·year· is not divisible by 4,
or is divisible by 100 but not by 400;
and no more than 29 if ·month·
is 2 and ·year·
is divisible by 400, or by 4 but not by 100.
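
The constraint can be expressed as a small function. The name `max_day` is hypothetical, and the upper bound of 31 for the remaining months is an assumption based on the general range of ·day· values rather than on the constraint text itself.

```python
def max_day(year: int, month: int) -> int:
    """Largest permitted ·day· value for the given ·year· and ·month·."""
    if month in (4, 6, 9, 11):
        return 30
    if month == 2:
        leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
        return 29 if leap else 28
    return 31  # assumed bound for the months not named in the constraint

print(max_day(2000, 2), max_day(1900, 2), max_day(2001, 4))  # 29 28 30
```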

Note: Since the order of a dateTime
value having a ·timezoneOffset·
relative to another value whose
·timezoneOffset· is absent is determined
by imputing time zone offsets of both +14:00
and −14:00 to the
value with no time zone offset, many such
combinations will be
·incomparable· because the two imputed
time zone offsets yield different orders.
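
A minimal sketch of this ordering rule, using Python's `datetime` with a value's `tzinfo` set to `None` standing in for an absent ·timezoneOffset·. The function name and sample values are hypothetical, and only the imputation step described in the note is modelled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def compare_datetimes(a: datetime, b: datetime) -> Optional[int]:
    """Return -1, 0, or 1, or None when the two values are incomparable."""
    if (a.tzinfo is None) == (b.tzinfo is None):
        return (a > b) - (a < b)  # both with or both without an offset
    outcomes = set()
    for hours in (14, -14):  # impute both extreme offsets to the untimezoned value
        imputed = timezone(timedelta(hours=hours))
        x = a if a.tzinfo else a.replace(tzinfo=imputed)
        y = b if b.tzinfo else b.replace(tzinfo=imputed)
        outcomes.add((x > y) - (x < y))
    return outcomes.pop() if len(outcomes) == 1 else None

noon_utc = datetime(2000, 1, 16, 12, tzinfo=timezone.utc)
print(compare_datetimes(datetime(2000, 1, 15, 12), noon_utc))  # -1
print(compare_datetimes(datetime(2000, 1, 16, 12), noon_utc))  # None
```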

Although dateTime and other
types related to dates and times have only a partial order, it
is possible for datatypes derived from dateTime to have
total orders, if they are restricted (e.g. using the
pattern facet) to the subset of values with, or
the subset of values without, time zone offsets. Similar restrictions
on other date- and time-related types will similarly produce
totally ordered subtypes. Note, however, that
such restrictions do not affect the value shown, for a given
Simple Type Definition, in the ordered facet.

Note: Order and equality are essentially the same for
dateTime in this version of this specification as
they were in version 1.0. However, since values
now distinguish time zone offsets, equal
values with different ·timezoneOffset·s
are not identical, and values with extreme
·timezoneOffset·s may no longer be equal
to any value with a smaller ·timezoneOffset·.

Within a dateTimeLexicalRep, a dayFrag must not
begin with the digit '3' or be '29'
unless the value to
which it would map would satisfy the value constraint on
·day· values
("Constraint: Day-of-month Values") given above.

In such representations:

yearFrag is a numeral consisting
of at least four decimal digits, optionally preceded by a minus sign;
leading '0' digits are prohibited except to bring the
digit count up to four.
It represents the ·year· value.

timezoneFrag, if present, specifies an
offset between UTC and local time.
Time zone offsets are a count of minutes (expressed in
timezoneFrag as a count of hours and minutes) that are added
or subtracted from UTC time to get the "local" time.
'Z' is an alternative representation of the time zone offset
'00:00',
which is, of course, zero minutes from UTC.

For example, 2002-10-10T12:00:00−05:00
(noon on 10 October 2002, Central Daylight
Savings Time as well as Eastern Standard Time
in the U.S.) is equal to 2002-10-10T17:00:00Z,
five hours later than 2002-10-10T12:00:00Z.

Note: For the most part, this specification adopts the distinction between
'timezone' and 'timezone offset' laid
out in [Timezones].
Version 1.0 of this specification did not make this distinction,
but used the term 'timezone' for the time zone
offset information associated with date- and time-related datatypes.
Some traces of the earlier usage remain visible in this and other
specifications. The names
timezoneFrag
and explicitTimezone
are such traces;
others will be found in the names of functions defined in
[XQuery 1.0 and XPath 2.0 Functions and Operators], or in references in this specification to
"timezoned" and "untimezoned" values.

The dateTimeLexicalRep production
is equivalent to this regular expression
once whitespace is removed.

dateTime has the following
·constraining facets· with the values shown; these
facets may be specified
in the derivation of new types, if the
value given is at least as restrictive as the one shown:

A calendar (or
"local time") day with a larger positive
time zone offset begins earlier than the same calendar day with
a smaller (or negative)
time zone offset. Since the time zone offsets allowed spread over 28 hours,
it is
possible for the period denoted by a given calendar day with one
time zone offset to be completely disjoint from the period denoted by
the same calendar day with a different offset
— the earlier day ends before the
later one starts.
The moments in time
represented by a single calendar day are spread over a 52-hour
interval, from the beginning of the day in the +14:00 time zone offset to the
end of that day in the −14:00 time zone offset.

Note: The relative
order of two time values, one of which has a ·timezoneOffset· of absent, is determined by imputing
time zone offsets of both +14:00 and −14:00 to the value without an offset. Many such combinations will be ·incomparable· because the two imputed time zone offsets yield
different orders. However, for a given untimezoned value,
there will always be timezoned values at one or both ends of the
52-hour interval that are ·comparable· (because the interval of
·incomparability· is only 24
hours wide).

Date values with different
time zone offsets that were identical in the 1.0 version
of this specification, such as 2000-12-12+13:00 and 2000-12-11−11:00, are
in this version of this specification equal (because they begin at the
same moment on the time line) but are not identical (because they have
and retain different time zone offsets).
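
The interval arithmetic in the preceding paragraphs can be checked with a short sketch. It models a date value as a half-open interval of 24 hours and, for simplicity, accepts only whole-hour offsets; the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

def day_interval(year: int, month: int, day: int,
                 offset_hours: int) -> Tuple[datetime, datetime]:
    """Start and end (exclusive) of a calendar day at the given offset."""
    start = datetime(year, month, day,
                     tzinfo=timezone(timedelta(hours=offset_hours)))
    return start, start + timedelta(days=1)

# Equal but not identical: same starting moment, different retained offsets.
a_start, _ = day_interval(2000, 12, 12, 13)
b_start, _ = day_interval(2000, 12, 11, -11)
print(a_start == b_start, a_start.utcoffset() == b_start.utcoffset())  # True False

# One calendar day spans 52 hours across the permitted offsets.
early_start, early_end = day_interval(2000, 12, 12, 14)
late_start, late_end = day_interval(2000, 12, 12, -14)
print(late_end - early_start)   # 2 days, 4:00:00
print(early_end < late_start)   # True: the two periods are disjoint
```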

3.3.9.2 Lexical Mappings

The lexical representations for time
are "projections" of
those of dateTime, as follows:

3.3.9.3
Facets

time has the following
·constraining facets· with the values shown; these
facets may be specified
in the derivation of new types, if the
value given is at least as restrictive as the one shown:

3.3.10 date

[Definition:] date
represents top-open intervals of exactly one day in length on the timelines of
dateTime, beginning on the beginning moment of each
day, up to but not including the beginning
moment of the next day. For nontimezoned values, the top-open
intervals disjointly cover the nontimezoned timeline,
one per day. For timezoned
values, the intervals begin at every minute and therefore overlap.

3.3.10.1 Value Space

The ·day· value must be
no more than 30 if ·month·
is one of 4, 6, 9, or 11, no more than 28
if ·month· is 2 and
·year· is not divisible by 4,
or is divisible by 100 but not by 400,
and no more than 29 if ·month·
is 2 and ·year·
is divisible by 400, or by 4 but not by 100.

Note: In version 1.0 of this specification, date values
did not retain a time zone offset explicitly, but for
offsets
not too far from
zero
their time zone offset could be recovered based on
their value's first moment on the timeline. The
date/timeSevenPropertyModel retains all time zone offsets.

A day is a calendar (or "local
time") day offset from ·UTC·
by the appropriate interval;
this is now true for all ·day·
values, including those with time zone offsets outside the range
+12:00 through -11:59 inclusive:

2000-12-12+13:00 < 2000-12-12+11:00
(just as 2000-12-12+12:00 has always been less than
2000-12-12+11:00, but in version 1.0
2000-12-12+13:00 > 2000-12-12+11:00 ,
since 2000-12-12+13:00's "recoverable
time zone offset" was −11:00)
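
The comparison above can be checked by modelling each timezoned date by its first moment on the timeline. The Python sketch below uses the standard datetime module purely as a model; it is not an XSD implementation, and the helper date_start is a hypothetical name introduced for illustration.

```python
from datetime import datetime, timedelta, timezone


def date_start(year: int, month: int, day: int, offset_minutes: int) -> datetime:
    """First moment of a timezoned date, as a timezone-aware datetime."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return datetime(year, month, day, tzinfo=tz)


plus13 = date_start(2000, 12, 12, 13 * 60)    # 2000-12-12+13:00
plus11 = date_start(2000, 12, 12, 11 * 60)    # 2000-12-12+11:00
minus11 = date_start(2000, 12, 11, -11 * 60)  # 2000-12-11-11:00

earlier = plus13 < plus11        # the +13:00 day starts first
same_start = plus13 == minus11   # both begin at the same moment
same_offset = plus13.utcoffset() == minus11.utcoffset()  # offsets are retained and differ
```

The last two comparisons illustrate the distinction between equality and identity: 2000-12-12+13:00 and 2000-12-11−11:00 begin at the same moment, but they retain different time zone offsets.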

Within a
dateLexicalRep,
a dayFrag must not
begin with the digit '3' or be '29'
unless the value to
which it would map would satisfy the value constraint on
·day· values
("Constraint: Day-of-month Values") given above.

The dateLexicalRep production
is equivalent to this
regular expression:

3.3.10.3 Facets

date has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.11 gYearMonth

Note: Because month/year combinations in one calendar only rarely correspond
to month/year combinations in other calendars, values of this type
are not, in general, convertible to simple values corresponding to month/year
combinations in other calendars. This type should therefore be used
with caution in contexts where conversion to other calendars is desired.

gYearMonth has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.12 gYear

gYear
represents Gregorian calendar years.

Note:
Because years in one calendar only rarely correspond to years
in other calendars, values of this type
are not, in general, convertible to simple values corresponding to years
in other calendars. This type should therefore be used with caution
in contexts where conversion to other calendars is desired.

3.3.12.3 Facets

gYear has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.13 gMonthDay

gMonthDay represents whole calendar
days that recur at the same point in each calendar year, or that occur
in some arbitrary calendar year. (Obviously,
days beyond 28 cannot occur in all Februaries; 29 is nonetheless
permitted.)

This datatype can be used, for example, to record
birthdays; an instance of the datatype could be used to say that
someone's birthday occurs on the 14th of September every year.

Note:
Because day/month combinations in one calendar only rarely correspond
to day/month combinations in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.

Note: In version 1.0 of this specification, gMonthDay values
did not retain a time zone offset explicitly, but for time zone offsets not too far from
·UTC· their time zone offset could be recovered based on
their value's first moment on the timeline. The
date/timeSevenPropertyModel retains all time zone offsets.

A day is a calendar (or "local time") day
offset from ·UTC·
by the appropriate interval;
this is now true for all ·day·
values, including those with time zone offsets outside the range
+12:00 through -11:59 inclusive:

--12-12+13:00 < --12-12+11:00
(just as --12-12+12:00 has always been less than
--12-12+11:00, but in version 1.0
--12-12+13:00 > --12-12+11:00 , since
--12-12+13:00's "recoverable
time zone offset" was −11:00)

3.3.13.2 Lexical Mapping

The lexical representations for
gMonthDay are "projections"
of those of dateTime, as follows:

Within a gMonthDayLexicalRep, a dayFrag must not
begin with the digit '3' or be '29'
unless the value to
which it would map would satisfy the value constraint on
·day· values
("Constraint: Day-of-month Values") given above.

gMonthDay has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.14 gDay

[Definition:] gDay
represents
whole days within an arbitrary month—days that recur at the same
point in each (Gregorian) month. This datatype is used to represent a specific day of the month,
for example to indicate that an employee gets a paycheck on the 15th of each month. (Obviously, days
beyond 28 cannot occur in all months; they are nonetheless permitted, up to 31.)

Note: Because days in one calendar only rarely
correspond to days in other calendars,
gDay
values do not, in general, have any straightforward or
intuitive representation in terms of most
non-Gregorian
calendars.
gDay
should therefore be used with caution in contexts where conversion to
other calendars is desired.

Note:
Time zone offsets do not cause wrap-around at the end of the month:
the last day of a
given month with a time zone offset of
−13:00 may start after the first
day of the next month
with offset +13:00, as
measured on the global timeline,
but nonetheless
---01+13:00 < ---31−13:00 .

3.3.14.2 Lexical Mapping

The lexical representations for gDay are
"projections"
of those of dateTime, as follows:

3.3.14.3 Facets

gDay has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.15 gMonth

gMonth
represents whole (Gregorian) months
within an arbitrary year—months that recur at the same point in
each year. It might be used, for example, to say in what
month annual Thanksgiving celebrations fall in different countries
(--11 in the United States, --10 in Canada, and possibly other months in
other countries).

Note:
Because months in one calendar only rarely correspond
to months in other calendars, values of this type do not,
in general, have any straightforward or intuitive representation
in terms of most other calendars. This type should therefore be
used with caution in contexts where conversion to other calendars
is desired.

3.3.15.3 Facets

gMonth has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.3.16 hexBinary

3.3.16.1 Value Space

The ·value space· of
hexBinary
is the set of ↓possibly empty↓
finite-length sequences of ↑zero or more↑
binary octets. The
length of a value is the number of octets.

3.3.16.2 Lexical Mapping

hexBinary's ·lexical space·
consists of strings of hex (hexadecimal) digits, two consecutive digits
representing each octet in the corresponding value (treating the octet
as the binary representation of a number between 0 and 255). For
example, '0FB7' is a ·lexical representation· of the
two-octet value 00001111 10110111.
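
The mapping from a hexBinary ·lexical representation· to its octets can be sketched as follows. This is an illustrative Python fragment rather than a normative definition; the function name hex_binary_to_octets is hypothetical.

```python
import re

# Pairs of hex digits only; the empty string maps to the empty octet sequence.
_HEX_PAIRS = re.compile(r"(?:[0-9a-fA-F]{2})*")


def hex_binary_to_octets(literal: str) -> bytes:
    """Map a hexBinary literal to the octet sequence it denotes."""
    if _HEX_PAIRS.fullmatch(literal) is None:
        raise ValueError("not a sequence of hex digit pairs: %r" % literal)
    return bytes.fromhex(literal)


octets = hex_binary_to_octets("0FB7")
bits = " ".join(format(b, "08b") for b in octets)  # '00001111 10110111'
```

The explicit pattern check is there because Python's bytes.fromhex tolerates embedded whitespace, which the ·lexical space· described above does not.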

3.3.17 base64Binary

[Definition:] base64Binary represents arbitrary
Base64-encoded binary
data.
For base64Binary data the entire binary stream is encoded
using the Base64 Encoding
defined in [RFC 3548], which is derived from the encoding
described in [RFC 2045].

3.3.17.1 Value Space

The ·value space· of
base64Binary is the set of ↓possibly
empty↓ finite-length sequences of
↑zero or more↑
binary octets. The
length of a value is the number of octets.

3.3.17.2 Lexical Mapping

The ·lexical representations· of
base64Binary
values are limited to the 65 characters of the Base64 Alphabet defined in
[RFC 3548],
i.e., a-z, A-Z,
0-9, the plus sign (+), the forward slash (/) and the
equal sign (=), together with
the space character
(#x20). No other characters are allowed.

For compatibility with older mail gateways, [RFC 2045]
suggests that Base64 data should have lines limited to at most 76
characters in length. This line-length limitation is not
required by [RFC 3548]
and is not mandated in the ·lexical representations· of
base64Binary
data. It
must not
be enforced by XML Schema processors.

Note: The above definition of the ·lexical space· is more restrictive than
that given in [RFC 2045] as regards whitespace —
and less restrictive than [RFC 3548].
This is
not an issue in practice. Any string compatible with
either
RFC can occur in an element or attribute
validated by this type, because the ·whiteSpace·
facet of this type is fixed to collapse, which means that all
leading and trailing whitespace will be stripped, and all internal
whitespace collapsed to single space characters, before
the above grammar is enforced. The
possibility of ignoring whitespace in Base64 data is foreseen in
clause 2.3 of [RFC 3548], but for the reasons given there
this specification does not allow implementations to ignore
non-whitespace characters which are not in the Base64
Alphabet.
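
The effect of the collapse whitespace processing described in the note above can be sketched in Python. The sketch is a simplification: it checks only the permitted character set, not the full grammar of the ·lexical space·, and the names collapse_whitespace and decode_base64_binary are hypothetical.

```python
import base64
import re

_XML_WS = re.compile(r"[ \t\n\r]+")         # runs of XML whitespace characters
_ALLOWED = re.compile(r"[A-Za-z0-9+/= ]*")  # Base64 Alphabet plus #x20


def collapse_whitespace(raw: str) -> str:
    """Replace runs of whitespace by one space and trim both ends."""
    return _XML_WS.sub(" ", raw).strip(" ")


def decode_base64_binary(raw: str) -> bytes:
    """Collapse whitespace, reject foreign characters, then decode."""
    collapsed = collapse_whitespace(raw)
    if _ALLOWED.fullmatch(collapsed) is None:
        raise ValueError("character outside the Base64 Alphabet")
    # Remaining single spaces carry no data and are dropped before decoding.
    return base64.b64decode(collapsed.replace(" ", ""), validate=True)
```

A value broken across lines, such as one wrapped at 76 characters for mail transport, decodes to the same octets as its unbroken form, while a non-whitespace character outside the alphabet is rejected rather than ignored.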

The canonical ·lexical representation·
of a
base64Binary
data value is the Base64 encoding of the value which matches the
Canonical-base64Binary production in the following grammar:

Note on encoding: [RFC 2045] and
[RFC 3548] explicitly
reference US-ASCII encoding. However,
decoding of base64Binary data in an XML entity is to be performed on the
Unicode characters obtained after character encoding processing as specified by
[XML].

3.3.17.3 Facets

base64Binary and all datatypes
derived from it by restriction have the
following ·constraining facets· with fixed values; these
facets must not be changed from the values shown:

3.3.18 anyURI

[Definition:] anyURI represents an
Internationalized Resource Identifier Reference
(IRI). An anyURI value can be absolute or relative, and may
have an optional fragment identifier (i.e., it may be
an
IRI Reference). This type should be used
when
the value fulfills the role of
an IRI,
as defined in [RFC 3987] or its successor(s) in the IETF
Standards Track.

Note: IRIs may be used to locate resources
or simply to identify them. In the case where they are used to locate
resources using a URI, applications should use
the mapping from
anyURI
values to URIs given
by the ↓URI↓
reference escaping procedure defined in
↑[LEIRI] and in↑
Section
3.1 Mapping
of IRIs to URIs of [RFC 3987]
or its successor(s) in the IETF Standards Track.
This means that a wide range of internationalized resource identifiers
can be specified when an
anyURI
is called for, and still be understood as URIs per
[RFC 3986]
and its successor(s).

Note: For an anyURI value to be
usable in practice as an IRI, the result of applying to it
the algorithm defined in Section 3.1 of [RFC 3987]
should
be a string which is a legal URI according
to [RFC 3986]. (This is true at the time this document is published;
if in the future
[RFC 3987] and [RFC 3986] are replaced by other specifications
in the IETF Standards Track, the relevant constraints will be those
imposed by those successor specifications.)
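
As a rough illustration of the mapping referred to above, the following Python sketch percent-encodes the UTF-8 octets of every non-ASCII character. It is a simplification of Section 3.1 of [RFC 3987], not a full implementation: it does not check that the input is a legal IRI, does not perform any normalization, and does not apply the alternative host-name conversion that the RFC permits. The function name iri_to_uri_sketch is hypothetical.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters are copied unchanged
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


uri = iri_to_uri_sketch("http://example.org/r\u00e9sum\u00e9")
```

Whether the resulting string is a legal URI according to [RFC 3986] still has to be checked separately, as the note above observes.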

Each URI scheme imposes specialized syntax rules
for URIs in that scheme, including restrictions on the syntax of
allowed fragment identifiers. Because it is impractical for processors
to check that a value is a context-appropriate URI reference,
neither the syntactic constraints defined by the definitions of individual
schemes nor the generic syntactic constraints defined by
[RFC 3987] and [RFC 3986] and their
successors are part of this datatype as defined here.
Applications which depend on anyURI values
being legal according to the rules of
the relevant specifications
should make arrangements to check values against the appropriate
definitions of IRI, URI, and specific schemes.

Note: Spaces are, in principle, allowed in the ·lexical space· of
anyURI;
however, their use is highly discouraged
(unless they are encoded by '%20').

Note: The definitions of URI in the current
IETF specifications define certain URIs as equivalent to each other.
Those equivalences are not part of this datatype as defined here:
if two "equivalent" URIs or IRIs are different character
sequences, they map to different values in this datatype.

3.3.18.3 Facets

anyURI and all datatypes
derived from it by restriction have the
following ·constraining facets· with fixed values; these
facets must not be changed from the values shown:

The mapping from lexical space to value space for a particular
QName ·literal· depends on the namespace bindings in scope where the literal occurs.

When
QNames
appear in an XML context, the bindings to be used in
the ·lexical mapping· are those in the [in-scope namespaces] property of the
relevant element.
When this datatype is used in a non-XML host language,
the host language must specify what namespace bindings
are to be used.

The host language, whether XML-based or otherwise, may specify whether
unqualified names are bound to the default namespace (if any)
or not; the host language may also place this under user control.
If the host language does not specify otherwise,
unqualified names are bound to the default namespace.

Note: The default treatment of
unqualified names parallels that specified in
[Namespaces in XML] for element names (as opposed to that specified
for attribute names).

The use of
·length·, ·minLength· and
·maxLength·
on QName or
datatypes derived from QName is
deprecated. These
facets are meaningless for these datatypes, and all instances are
facet-valid with respect to them.
Future versions of this specification may
remove these facets for these
datatypes.

The exception is that in the derivation of a new type the
·literals· used to enumerate the allowed values may be (and in
the context of [XSD 1.1 Part 1: Structures] will be)
validated directly against NOTATION; this amounts to
verifying that the value is a QName and that the
QName is the
name of a NOTATION declared in the current schema.

↑

For compatibility (see Terminology (§1.6))
NOTATION
should be used only on attributes
and should only be used in schemas with no
target namespace.

3.4.1 normalizedString

[Definition:] normalizedString
represents white space normalized strings.
The ·value space· of normalizedString is the
set of strings that do not
contain the carriage return (#xD), line feed (#xA) nor tab (#x9) characters.
The ·lexical space· of normalizedString is the
set of strings that do not
contain the carriage return (#xD),
line feed (#xA)
nor tab (#x9) characters.
The ·base type· of normalizedString is string.

3.4.1.1 Facets

normalizedString has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.1.2 Derived datatypes

3.4.2 token

[Definition:] token
represents tokenized strings.
The ·value space· of token is the
set of strings that do not
contain the
carriage return (#xD),
line feed (#xA) nor tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The ·lexical space· of token is the
set of strings that do not contain the
carriage return (#xD),
line feed (#xA) nor tab (#x9) characters, that have no
leading or trailing spaces (#x20) and that have no internal sequences
of two or more spaces.
The ·base type· of token is normalizedString.
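
The membership conditions for normalizedString and token stated above translate directly into predicates. The Python sketch below is illustrative only; the function names are hypothetical.

```python
def is_normalized_string(s: str) -> bool:
    """No carriage return (#xD), line feed (#xA) or tab (#x9)."""
    return not any(c in s for c in "\r\n\t")


def is_token(s: str) -> bool:
    """A normalizedString with no leading, trailing or doubled spaces."""
    return (
        is_normalized_string(s)
        and not s.startswith(" ")
        and not s.endswith(" ")
        and "  " not in s
    )
```

Every string accepted by is_token is also accepted by is_normalized_string, which mirrors the derivation of token from normalizedString.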

3.4.2.1 Facets

token has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

This is the set of strings
accepted by the grammar given in
[RFC 3066],
which is now obsolete; the current specification of language
codes is more restrictive. The ·base type· of
language
is token.

Note: The regular expression above provides the only normative
constraint on the lexical and value spaces of this type. The
additional constraints imposed on language identifiers by
[BCP 47]
and its successor(s), and in particular their requirement that language
codes be registered with IANA or ISO if not given in ISO 639, are
not part of this datatype as defined here.

Note: [BCP 47] specifies
that language codes "are to be treated as case insensitive; there
exist conventions for capitalization of some of
the
subtags, but these MUST NOT be taken
to carry meaning."
Since the language datatype is
derived from string, it inherits from
string a one-to-one mapping from lexical
representations to values. The literals 'MN' and
'mn' (for
Mongolian)
therefore correspond to distinct values and
have distinct canonical forms. Users of this specification should be
aware of this fact, the consequence of which is that the
case-insensitive treatment of language values prescribed by
[BCP 47]
does not follow from the definition of
this datatype given here; applications which require
case-insensitivity
should make appropriate adjustments.

Note: The empty string is not a member of the ·value space·
of language. Some constructs which normally
take language codes as their values, however, also allow the
empty string. The attribute xml:lang defined by
[XML] is one example; there, the empty string
overrides a value which would otherwise be inherited, but
without specifying a new value.

One way to define the desired set of possible values is
illustrated by the schema document for the XML namespace
at http://www.w3.org/2001/xml.xsd, which defines the
attribute xml:lang as having a type which is a union
of language and an anonymous type whose
only value is the empty string:

3.4.3.1 Facets

language has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.4.1 Facets

NMTOKEN has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.5.1 Facets

NMTOKENS has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.6.1 Facets

Name has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.7.1 Facets

NCName has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.8.1 Facets

ID has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.9.1 Facets

IDREF has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.10.1 Facets

IDREFS has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.11.1 Facets

ENTITY has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.12.1 Facets

ENTITIES has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.13.1 Lexical representation

integer
has a lexical representation consisting of a finite-length sequence
of one or more
decimal digits (#x30-#x39) with an optional leading sign. If the sign is omitted,
"+" is assumed. For example: -1, 0, 12678967543233, +100000.

integer has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.14.1 Lexical representation

nonPositiveInteger
has a lexical representation consisting of
an optional preceding sign
followed by a non-empty
finite-length sequence of decimal digits (#x30-#x39).
The sign may be "+" or may be omitted only for
lexical forms denoting zero; in all other lexical forms, the negative
sign ('-') must be present.
For example: -1, 0, -12678967543233, -100000.
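
The sign rule just described can be written as a regular expression. The Python sketch below is one possible reading, offered as an illustration rather than as the normative pattern; the names used are hypothetical.

```python
import re

# Zero may carry '+', '-' or no sign; every other value needs '-'.
_NON_POSITIVE = re.compile(r"[+-]?0+|-[0-9]+")


def is_non_positive_integer_literal(literal: str) -> bool:
    """True if literal follows the sign rule for nonPositiveInteger."""
    return _NON_POSITIVE.fullmatch(literal) is not None
```

The check is purely lexical: it relies on the fact that an unsigned or '+'-signed digit sequence can only denote a non-positive value if every digit is zero.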

nonPositiveInteger has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.15.1 Lexical representation

negativeInteger
has a lexical representation consisting
of a negative sign ('-') followed by a non-empty finite-length sequence of
decimal digits (#x30-#x39). For example: -1, -12678967543233,
-100000.

negativeInteger has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.16 long

3.4.16.1 Lexical Representation

long
has a lexical representation consisting
of an optional sign followed by a non-empty finite-length
sequence of decimal digits (#x30-#x39). If
the sign is omitted, "+" is assumed.
For example: -1, 0,
12678967543233, +100000.

long has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.17 int

3.4.17.1 Lexical Representation

int
has a lexical representation consisting
of an optional sign followed by a non-empty finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0, 126789675, +100000.

int has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.18 short

3.4.18.1 Lexical representation

short
has a lexical representation consisting
of an optional sign followed by a non-empty finite-length sequence of decimal
digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0, 12678, +10000.

short has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.19 byte

3.4.19.1 Lexical representation

byte
has a lexical representation consisting
of an optional sign followed by a non-empty finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, "+" is assumed.
For example: -1, 0, 126, +100.

byte has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.20.1 Lexical representation

nonNegativeInteger
has a lexical representation consisting of
an optional sign followed by a non-empty finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted,
the positive sign ('+') is assumed.
If the sign is present, it must be "+" except for lexical forms
denoting zero, which may be preceded by a positive ('+') or a negative ('-') sign.
For example:
1, 0, 12678967543233, +100000.
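
A sketch of the sign rule for nonNegativeInteger, again in Python and again only as an illustration: a negative sign is accepted solely in front of a lexical form denoting zero. The function name parse_non_negative_integer is hypothetical.

```python
import re

# '+' or no sign before any digits; '-' only before zeros.
_NON_NEGATIVE = re.compile(r"\+?[0-9]+|-0+")


def parse_non_negative_integer(literal: str) -> int:
    """Return the value denoted by a nonNegativeInteger literal."""
    if _NON_NEGATIVE.fullmatch(literal) is None:
        raise ValueError("not a nonNegativeInteger literal: %r" % literal)
    return int(literal)
```

Both '-0' and '+0' map to the same value as '0', so several literals can denote one value.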

nonNegativeInteger has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.21 unsignedLong

3.4.21.1 Lexical representation

unsignedLong
has a lexical representation consisting of
an optional sign followed by a
non-empty
finite-length sequence of decimal digits (#x30-#x39).
If the sign is omitted, the positive sign
('+') is assumed. If the sign is present, it must be
'+' except for lexical forms denoting zero, which may
be preceded by a positive ('+') or a negative
('-') sign. For example: 0, 12678967543233,
100000.

unsignedLong has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.22 unsignedInt

3.4.22.1 Lexical representation

unsignedInt
has a lexical representation consisting
of an optional sign followed by a
non-empty
finite-length sequence of decimal digits (#x30-#x39).
If the sign is omitted, the positive sign
('+') is assumed. If the sign is present, it must be
'+' except for lexical forms denoting zero, which may
be preceded by a positive ('+') or a negative
('-') sign. For example: 0,
1267896754, 100000.

unsignedInt has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.23 unsignedShort

3.4.23.1 Lexical representation

unsignedShort
has a lexical representation consisting of
an optional sign followed by a
non-empty finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, the positive sign
('+') is assumed. If the sign is present, it must be
'+' except for lexical forms denoting zero, which may
be preceded by a positive ('+') or a negative
('-') sign. For example: 0, 12678, 10000.

unsignedShort has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.24 unsignedByte

3.4.24.1 Lexical representation

unsignedByte
has a lexical representation consisting of
an optional sign followed by a
non-empty finite-length
sequence of decimal digits (#x30-#x39). If the sign is omitted, the positive sign
('+') is assumed. If the sign is present, it must be
'+' except for lexical forms denoting zero, which may
be preceded by a positive ('+') or a negative
('-') sign. For example: 0, 126, 100.

unsignedByte has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

3.4.25.1 Lexical representation

positiveInteger
has a lexical representation consisting
of an optional positive sign ('+') followed by a
non-empty finite-length
sequence of decimal digits (#x30-#x39).
For example: 1, 12678967543233, +100000.

positiveInteger has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

The lexical
space of yearMonthDuration consists of
strings which match the regular expression
'-?P((([0-9]+Y)([0-9]+M)?)|([0-9]+M))' or the
expression '-?P[0-9]+(Y([0-9]+M)?|M)', but the
formal definition of yearMonthDuration uses a
simpler regular expression in its ·pattern·
facet: '[^DT]*'. This pattern matches only
strings of characters which contain no 'D'
and no 'T', thus restricting the ·lexical space·
of duration to strings with no day, hour,
minute, or seconds fields.
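
The relationship between the two expressions and the simpler ·pattern· facet can be explored with a short Python sketch. It assumes that Python's re syntax agrees with the schema regular expression language for these particular expressions, and it uses fullmatch because schema patterns must match the whole literal. The sample literals are chosen for illustration.

```python
import re

EXPR_A = re.compile(r"-?P((([0-9]+Y)([0-9]+M)?)|([0-9]+M))")
EXPR_B = re.compile(r"-?P[0-9]+(Y([0-9]+M)?|M)")
PATTERN_FACET = re.compile(r"[^DT]*")

samples = ["P1Y", "P1Y2M", "P14M", "-P3Y", "P", "P1D", "PT5M", "P1Y2MT3H"]

# The two longer expressions accept exactly the same sample literals.
agree = all(
    (EXPR_A.fullmatch(s) is None) == (EXPR_B.fullmatch(s) is None)
    for s in samples
)
accepted = [s for s in samples if EXPR_A.fullmatch(s)]

# The facet alone also accepts strings that are not durations at all, so it
# is only meaningful as a restriction of the lexical space of duration.
facet_only = PATTERN_FACET.fullmatch("hello") is not None
```

For the samples shown, the literals accepted by the longer expressions are 'P1Y', 'P1Y2M', 'P14M' and '-P3Y'.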

yearMonthDuration has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

The lexical space of
dayTimeDuration consists of
strings in the ·lexical space· of duration which
match the regular expression '[^YM]*[DT].*';
this pattern eliminates all durations with year or month fields,
leaving only those with day, hour, minutes, and/or seconds
fields.
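
The effect of this pattern can be checked with a few literals. The Python sketch below assumes that the inputs are already in the ·lexical space· of duration and that Python's re syntax agrees with the schema regular expression language for this expression; the function name matches_day_time_pattern is hypothetical.

```python
import re

DAY_TIME = re.compile(r"[^YM]*[DT].*")


def matches_day_time_pattern(literal: str) -> bool:
    """True if the whole literal matches the pattern quoted above."""
    return DAY_TIME.fullmatch(literal) is not None
```

Note that 'PT1M30S' is accepted even though it contains an 'M': the prefix before the first 'T' contains no 'Y' or 'M', and whatever follows is unconstrained by the pattern, so a minutes field is allowed while a months field is not.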

dayTimeDuration has the following
·constraining facets· with the values shown; these
facets may be ↓further restricted↓↑specified↑
in the derivation of new types↑, if the
value given is at least as restrictive as the one shown↑:

Note: The ordered facet has the value
partial even though the datatype is
in fact totally ordered, because (as explained in
ordered (§4.2.1)),
the value of that facet is unchanged by derivation.

3.4.28 dateTimeStamp

[Definition:]
The dateTimeStamp datatype is derived from
dateTime by giving the value required to its
explicitTimezone facet. The result is that all values of
dateTimeStamp are required to have explicit time zone offsets
and the datatype is totally ordered.

This section presents the mechanisms necessary to integrate datatypes into
the context of [XSD 1.1 Part 1: Structures], mostly in terms of
the schema
component
abstraction introduced there. The account of datatypes given in this
specification is also intended to be useful in other contexts.
Any specification or other formal system intending to use datatypes as
defined above, particularly if definition of new datatypes via
facet-based restriction is envisaged, will need to provide analogous
mechanisms for some, but not necessarily all, of what follows below.
For example, the {target namespace} and
{final} properties are required because of
particular aspects of [XSD 1.1 Part 1: Structures] which are not
in principle necessary for the use of datatypes as defined here.

The following sections provide full details on the properties and
significance of each kind of schema component involved in datatype
definitions. For each property, the kinds of values it is allowed to have are
specified. Any property not identified as optional is required to
be present; optional properties which are not present have
absent as their value.
Any property identified as having a set, subset or ·list·
value may have an empty value unless this is explicitly ruled out: this is
not the same as absent.
Any property value identified as a superset or a subset of some set may
be equal to that set, unless a proper superset or subset is explicitly
called for.

The {fundamental facets} property provides some
basic information about the datatype being defined: its cardinality,
whether an ordering is defined for it by this specification,
whether it has upper and lower bounds, and whether it is numeric.

4.1.2 XML Representation of Simple Type Definition Schema Components

The XML representation for a Simple Type Definition schema component
is a <simpleType> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

A subset of
{restriction, extension, list,
union}, determined as follows.
[Definition:] Let
FS be
the actual value of the
final [attribute],
if present, otherwise the actual value of the
finalDefault [attribute] of the ancestor
schema element,
if present, otherwise the empty string. Then the property value is
the appropriate case among the following:

Note: In this case, a <restriction> element will invariably
be present.

↓Editorial Note: Priority Feedback Request↓

↓

Note that the rule just given allows ·unions· to be members of other
·unions·. This is a change from version 1.0 of this specification,
which prohibited ·unions· in {member type definitions} and replaced
any reference to a ·union· M, in the XML declaration of a
second ·union· U, with the members of M. This
had the unintended consequence that if M had facets
they were lost, and U erroneously accepted values not
accepted by M. In order to correct this error, this
version of this specification allows ·unions· in
{member type definitions} and removes the wording which replaced
references to ·unions· with their members.

The XML Schema Working Group solicits input from implementors and
users of this specification as to whether this change is an acceptable
way of repairing the problem in version 1.0 of this specification, or
whether it would be preferable to allow ·unions· as members of other
·unions· only if they have an empty {facets} property. If such a
change would make this specification more (or less) attractive to
users or implementors, please let us know.

↓

Example

As an example, taken from a typical display oriented text markup language,
one might want to express font sizes as an integer between 8 and 72, or with
one of the tokens "small", "medium" or "large". The ·union· Simple Type Definition
below would accomplish that.
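
The value space described in this example can also be illustrated outside of schema syntax. The following Python sketch is not the Simple Type Definition itself; it only mimics the intended behavior of such a ·union·, under the assumption that one member is an integer restricted to the range 8 to 72 and the other is an enumeration of the three tokens. The function name is_valid_font_size is hypothetical, and the integer lexical form is simplified to an optional sign followed by ASCII digits.

```python
import re

# Hypothetical stand-ins for the two member types of the union.
SIZE_TOKENS = {"small", "medium", "large"}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")  # simplified integer lexical form


def is_valid_font_size(literal: str) -> bool:
    """Return True if the literal is accepted by at least one member type."""
    # Stand-in for whitespace collapsing on the member types.
    text = literal.strip(" \t\n\r")
    # Member 1: an integer between 8 and 72 inclusive.
    if INTEGER_LEXICAL.fullmatch(text) and 8 <= int(text) <= 72:
        return True
    # Member 2: one of the enumerated tokens.
    return text in SIZE_TOKENS
```

A literal is accepted as soon as any one member accepts it, which is the defining property of a ·union·.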

Note:
Since every value in the ·value space· is denoted by some
·literal·, and every ·literal· in the ·lexical space· maps to
some value, the requirement that the ·literal· be in the
·lexical space· entails the requirement that the value it
maps to should fulfill all of the constraints imposed by the
{facets} of the datatype. If
the datatype is a ·list·, the Datatype Valid constraint also
entails that each whitespace-delimited token in the list
be datatype-valid against the ·item type· of the list.
If the datatype is a ·union·, the Datatype Valid constraint
entails that the ·literal· be datatype-valid against at
least one of the ·member types·.

2.2 If the {variety} of T is ·list·, then
each space-delimited substring of L is Datatype Valid with
respect to the {item type definition} of T. Let
V be the sequence consisting of the values identified by
Datatype Valid for each of those substrings, in order.
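
As an informal illustration of this clause, the sketch below splits a list literal on single spaces and checks each substring against the item type. It assumes the literal has already been whitespace-collapsed, and it represents the item type by a hypothetical parse_item callable that returns the item's value or raises ValueError for an invalid item; neither name comes from this specification.

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")


def validate_list(literal: str, parse_item: Callable[[str], V]) -> List[V]:
    """Return the sequence of item values, or raise ValueError.

    Assumes `literal` is already whitespace-collapsed, so that items
    are separated by exactly one space and the empty string denotes
    the empty list.
    """
    substrings = literal.split(" ") if literal else []
    # Each substring must be valid against the item type; the values
    # are collected in order, mirroring the sequence V in the clause.
    return [parse_item(s) for s in substrings]
```

For example, with Python's int standing in for an integer item type, validate_list("8 12 72", int) yields the sequence of values 8, 12, 72, while validate_list("8 x", int) raises ValueError because "x" is not valid against the item type.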

Note that whiteSpace facets and
other ·pre-lexical· facets
do not take part in checking Datatype
Valid. In cases where this specification is used in conjunction with
schema-validation of XML documents,
such facets are used to
normalize infoset values before the normalized results
are checked for datatype validity. In the case of unions the
·pre-lexical· facets to use are those
associated with B in
clause 2.3 above.
When more than one ·pre-lexical· facet
applies, the whiteSpace facet is applied first; the order in
which ·implementation-defined· facets
are applied is ·implementation-defined·.

The
definition of anySimpleType
is the root of the Simple Type Definition
hierarchy;
as such it mediates between the other
simple type definitions,
which all eventually trace back to it via their
{base type definition} properties,
and the
definition of anyType,
which is
its {base type definition}.

Note: ·Implementation-defined· datatypes will normally have a value
other than 'http://www.w3.org/2001/XMLSchema' for the
{target namespace}
property. That namespace is controlled by the W3C and
datatypes will be added to it only by W3C or its designees.

[Definition:]
Each fundamental facet is a
schema component that provides a limited piece of information about
some aspect of each datatype. All ·fundamental
facet· components are defined in this section.
For example, cardinality is a
·fundamental facet·.
Most ·fundamental facets·
are given a value
fixed with each primitive datatype's definition, and this value is not changed by
subsequent ·derivations· (even when
it would perhaps be reasonable to expect an application to give a more accurate value based
on the constraining facets used to define the ·derivation·). The
cardinality and bounded facets
are exceptions to this rule; their values may change as a result of certain
·derivations·.

Note: Schema components are identified by kind. "Fundamental"
is not a kind of component. Each kind of ·fundamental facet·
("ordered",
"bounded", etc.) is
a separate kind of schema component.

Note: The value of any ·fundamental facet· component can always
be calculated from other properties of its ·owner·.
Fundamental facets are not required for schema processing,
but some applications use them.

4.2.1 ordered

For some datatypes,
this document specifies an order relation for their value spaces (see
Order (§2.2.3)); the ordered facet reflects
this. It takes the values total, partial,
and false, with the meanings described below.
For the ·primitive· datatypes,
the value of the ordered facet is
specified in Fundamental Facets (§F.1).
For ·ordinary· datatypes, the value is inherited without change
from the ·base type·.
For a ·list·, the value is always false;
for a ·union·, the value is computed as described below.

A false value means no order is prescribed;
a total value
assures that the prescribed order is a total
order; a partial value means
that the prescribed order is a partial
order, but not (for the primitive type in question) a total order.

Note:
The value false in the ordered facet does not
mean no partial or total ordering exists for the value
space, only that none is specified by this document for use in
checking upper and lower bounds. Mathematically, any set of values
possesses at least one trivial partial ordering, in which every value
pair that is not equal is incomparable.

Note: When new datatypes are derived from datatypes with partial orders,
the constraints imposed can sometimes result in a value space
for which the ordering is total, or trivial. The value of the
ordered facet is not, however, changed to reflect this.
The value partial should therefore be interpreted with
appropriate caution.

Note: Some of the "real-world" datatypes which are the basis for those defined herein
are ordered in some applications, even though no order is prescribed for schema-processing
purposes. For example, boolean is sometimes ordered, and string
and ·list· datatypes ·constructed· from
ordered ·atomic· datatypes are sometimes given "lexical"
orderings. They are not ordered for schema-processing purposes.

4.2.2 bounded

Some ordered datatypes have the property that
there is one value greater than or equal to every other value, and
another that is
less than or equal to every other value. (In the case of
·ordinary·
datatypes, these two values
are
not necessarily in the value space of the derived datatype,
but they must be in the value
space of the primitive datatype from which they have been derived.)
The bounded facet value is boolean and is
generally true for such bounded datatypes.
However, it will remain false when the mechanism for imposing
such a bound is difficult to detect, as, for example, when the
boundedness occurs because of derivation using a pattern
component.

4.2.3 cardinality

Every value space has a specific number of members. This number can be characterized as
finite or infinite. (Currently there are no datatypes with infinite
value spaces larger than countable.) The cardinality facet value is
either finite or countably infinite and is generally finite for datatypes with
finite value spaces. However, it will remain countably infinite when the mechanism for
causing finiteness is difficult to detect, as, for example, when finiteness occurs because of a
derivation using a pattern component.

Note: Schema components are identified by kind. "Constraining"
is not a kind of component. Each kind of ·constraining facet·
("whiteSpace",
"length", etc.) is a separate kind of schema component.

This specification distinguishes three kinds of constraining facets:

[Definition:] A constraining facet which
is used to normalize an initial ·literal· before checking
to see whether the resulting character sequence is a member of a datatype's
·lexical space· is a pre-lexical facet.

Note: A reference to an ·unknown· facet might be a reference to
an ·implementation-defined· facet supported by some other processor,
or might be the result of a typographic error, or might
have some other explanation.

The descriptions of individual facets given
below include both constraints on Simple Type Definition components
and rules for checking the datatype validity of a given literal against
a given datatype. The validation rules typically depend upon having
a full knowledge of the datatype; full knowledge of the datatype,
in turn, depends on having a fully instantiated Simple Type Definition.
A full instantiation of the Simple Type Definition, and the checking
of the component constraints, require knowledge of the ·base type·.
It follows that if a datatype's ·base type· is ·unknown·, the
Simple Type Definition defining the datatype will be incompletely
instantiated, and the datatype itself will be ·unknown·.
Similarly, any datatype defined using an ·unknown· ·constraining facet·
will be ·unknown·. It is not possible to perform datatype validation
as defined here using ·unknown· datatypes.

Note: The preceding paragraph does not forbid implementations from attempting
to make use of such partial information as they have about ·unknown·
datatypes. But the exploitation of such partial knowledge is not
datatype validity checking as defined here and is to be distinguished
from it in the implementation's documentation and interface.

Note:
For string and datatypes derived from string,
length will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for length
and in attempting to infer storage requirements from a given value for
length.

The following is the definition of a ·user-defined·
datatype to represent product codes which must be
exactly 8 characters in length. By fixing the value of the
length facet we ensure that types derived from productCode can
change or set the values of other facets, such as pattern, but
cannot change the length.

Note: The {fixed} property is defined for
parallelism with other facets and for compatibility with version 1.0
of this specification. But it is a consequence of
length valid restriction (§4.3.1.4) that the value of
the length facet cannot be changed, regardless of
whether {fixed} is
true or false.

4.3.1.2 XML Representation of length Schema Components

The XML representation for a length schema
component is a <length> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

Note:
For string and datatypes derived from string,
minLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for minLength
and in attempting to infer storage requirements from a given value for
minLength.

4.3.2.2 XML Representation of minLength Schema Component

The XML representation for a minLength schema
component is a <minLength> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

Note:
For string and datatypes derived from string,
maxLength will not always coincide with "string length" as perceived
by some users or with the number of storage units in some digital representation.
Therefore, care should be taken when specifying a value for maxLength
and in attempting to infer storage requirements from a given value for
maxLength.

4.3.3.2 XML Representation of maxLength Schema Components

The XML representation for a maxLength schema
component is a <maxLength> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

The following is the definition of a ·user-defined·
datatype which is a better representation of postal codes in the
United States, by limiting strings to those which are matched by
a specific ·regular expression·.

4.3.4.1 The pattern Schema Component

4.3.4.2 XML Representation of pattern Schema Components

The XML representation for a pattern schema
component is
one or more <pattern>
element information items. The
correspondences between the properties of the information item and
properties of the component are as follows:

Note: The {value} property
will only have more than one member when ·facet-based restriction· involves
a pattern facet at more than one step in a
type derivation. During validation, lexical forms will be
checked against every member of the resulting {value}, effectively
creating a conjunction of patterns.

In summary, ·pattern·
facets specified on the same step in a type
derivation are ORed together, while ·pattern·
facets specified on different steps of a type derivation
are ANDed together.

Thus, to impose two ·pattern· constraints simultaneously,
schema authors may either write a single ·pattern· which
expresses the intersection of the two ·pattern·s they wish to
impose, or define each ·pattern· on a separate type derivation
step.
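
The OR-within-a-step, AND-across-steps behavior can be sketched as follows. This is an illustration only: it uses Python's re module, whose syntax differs in detail from the ·regular expressions· defined by this specification, and the function name matches_patterns and the sample patterns are hypothetical.

```python
import re
from typing import Sequence


def matches_patterns(literal: str, steps: Sequence[Sequence[str]]) -> bool:
    """Check a literal against patterns grouped by derivation step.

    `steps` holds one inner sequence per derivation step. Patterns in
    the same step are alternatives (OR); every step must be satisfied
    (AND). Whole-string matching is used, as for the pattern facet.
    """
    return all(
        any(re.fullmatch(p, literal) is not None for p in step)
        for step in steps
    )


# Step 1 offers two alternatives; step 2 adds a further restriction.
EXAMPLE_STEPS = [
    [r"[A-Z]{2}[0-9]{3}", r"[0-9]{5}"],
    [r".*7.*"],
]
```

With EXAMPLE_STEPS, the literal "12375" is accepted (it matches the second alternative of step 1 and the single pattern of step 2), whereas "AB123" is rejected because it fails step 2.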

Note:
As noted in Datatype (§2.1),
certain uses of the ·pattern· facet may
eliminate from the lexical space the canonical forms of some values
in the value space; this can be inconvenient for applications
which write out the canonical form of a value and rely on
being able to read it in again as a legal lexical form.
This specification provides no recourse in such situations;
applications are free to deal with it as they see fit.
Caution is advised.

Note: For components constructed from XML representations in schema documents,
the satisfaction of this constraint is a consequence of the XML mapping rules:
any pattern imposed by a simple type definition S will always
also be imposed by any type derived from S by ·facet-based restriction·.
This constraint ensures that components constructed by other means
(so-called "born-binary" components) similarly preserve
pattern facets across ·facet-based restriction·.

4.3.5.1 The enumeration Schema Component

4.3.5.2 XML Representation of enumeration Schema Components

The XML representation for an enumeration schema
component is
one or more <enumeration>
element information items. The
correspondences between the properties of the information item and
properties of the component are as follows:

After the processing implied by replace, contiguous
sequences of #x20's are collapsed to a single #x20, and any #x20
at the start or end of the string is then removed.
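
A minimal sketch of the three whiteSpace behaviors follows, assuming that replace maps each tab (#x9), line feed (#xA) and carriage return (#xD) to #x20. The function name normalize_whitespace is hypothetical and is not defined by this specification.

```python
def normalize_whitespace(text: str, mode: str) -> str:
    """Apply whiteSpace normalization for 'preserve', 'replace' or 'collapse'."""
    if mode == "preserve":
        return text
    if mode not in ("replace", "collapse"):
        raise ValueError(f"unknown whiteSpace value: {mode!r}")
    # replace: tab, line feed and carriage return each become #x20.
    replaced = text.replace("\t", " ").replace("\n", " ").replace("\r", " ")
    if mode == "replace":
        return replaced
    # collapse: runs of #x20 become a single #x20, and any #x20 at the
    # start or end of the string is removed.
    return " ".join(part for part in replaced.split(" ") if part)
```

Note that replace preserves the number of characters, while collapse can shorten the string.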

Note:
The notation #xA used here (and elsewhere in this specification)
represents the Universal Character Set (UCS) code point
hexadecimal A (line feed), which is denoted by
U+000A. This notation is to be distinguished from
&#xA;, which is the XML character reference to that same UCS
code point.

whiteSpace is applicable to all ·atomic· and
·list· datatypes. For all ·atomic·
datatypes other than string (and types derived
by ·facet-based restriction· from it) the value of whiteSpace is
collapse and cannot be changed by a schema author; for
string the value of whiteSpace is
preserve; for any type derived by
·facet-based restriction· from
string the value of whiteSpace can
be any of the three legal values. For all datatypes
·constructed· by ·list· the
value of whiteSpace is collapse and cannot
be changed by a schema author. For all datatypes
·constructed· by ·union· whiteSpace does not apply directly; however, the
normalization behavior of ·union· types is controlled by
the value of whiteSpace on that one of the
·basic members·
against which the ·union·
is successfully validated.

Note: The values "replace" and
"collapse" may appear to provide a
convenient way to "unwrap" text (i.e. undo the effects of
pretty-printing and word-wrapping). In some cases, especially
highly constrained data consisting of lists of artificial tokens
such as part numbers or other identifiers, this appearance is
correct. For natural-language data, however, the whitespace
processing prescribed for these values is not only unreliable but
will systematically remove the information needed to perform
unwrapping correctly. For Asian scripts, for example, a correct
unwrapping process will replace line boundaries not with blanks but
with zero-width separators or nothing. In consequence, it is
normally unwise to use these values for natural-language data, or
for any data other than lists of highly constrained tokens.

4.3.6.2 XML Representation of whiteSpace Schema Components

The XML representation for a whiteSpace schema
component is a <whiteSpace> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.7.2 XML Representation of maxInclusive Schema Components

The XML representation for a maxInclusive schema
component is a <maxInclusive> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.8.2 XML Representation of maxExclusive Schema Components

The XML representation for a maxExclusive schema
component is a <maxExclusive> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.9.2 XML Representation of minExclusive Schema Components

The XML representation for a minExclusive schema
component is a <minExclusive> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.10.2 XML Representation of minInclusive Schema Components

The XML representation for a minInclusive schema
component is a <minInclusive> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

For decimal,
if the {value} of totalDigits is
t, the effect is to require that values be equal to
i / 10^n, for some
integers i and n, with
|i| < 10^t
and
0 ≤ n ≤ t.
This has as a consequence that the values are expressible
using at most t digits in decimal notation.

The term 'totalDigits' is chosen to reflect the fact that
it restricts the ·value space· to those values that
can be represented lexically using at most
totalDigits digits in
decimal notation, or at most totalDigits digits
for the coefficient, in scientific notation.
Note that it does not restrict
the ·lexical space· directly; a lexical
representation that adds
non-significant
leading or trailing
zero digits is still permitted.
It also has no effect on the values
NaN, INF, and -INF.
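
The arithmetic condition for totalDigits (a value equal to i / 10^n with |i| < 10^t and 0 ≤ n ≤ t) can be checked mechanically. The sketch below uses Python's decimal.Decimal as a stand-in for the value and reduces the value to its smallest non-negative n before testing; the function name satisfies_total_digits is hypothetical.

```python
from decimal import Decimal


def satisfies_total_digits(value: Decimal, t: int) -> bool:
    """True if value = i / 10^n for integers i, n with |i| < 10^t, 0 <= n <= t."""
    if not value.is_finite():
        return True  # the facet has no effect on NaN, INF and -INF
    _sign, digits, exponent = value.as_tuple()
    i = int("".join(map(str, digits)) or "0")  # absolute value of the coefficient
    n = -exponent
    # Drop non-significant trailing zeros to reach the smallest usable n.
    while n > 0 and i % 10 == 0:
        i //= 10
        n -= 1
    # A positive exponent means an integer value; fold it into i.
    if n < 0:
        i *= 10 ** (-n)
        n = 0
    return i < 10 ** t and n <= t
```

Because trailing zeros are dropped first, a literal such as 0012.300 is judged by its value 12.3 rather than by the number of digits written.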

4.3.11.2 XML Representation of totalDigits Schema Components

The XML representation for a totalDigits schema
component is a <totalDigits> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.12 fractionDigits

[Definition:] fractionDigits
places an upper limit on the
arithmetic precision
of decimal values: if the {value} of
fractionDigits = f, then the value space is
restricted to values equal to
i / 10^n for some integers
i and
n with
0 ≤ n ≤ f.
The value of
fractionDigits ·must· be a nonNegativeInteger.
The term fractionDigits is chosen to reflect the fact that it
restricts the ·value space· to those values that can be
represented lexically
in decimal notation using at most
fractionDigits
to the right of the decimal point. Note that it does not restrict
the ·lexical space· directly; a
lexical representation that adds
non-significant
leading or trailing zero digits is still permitted.
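
The fractionDigits condition (a value equal to i / 10^n with 0 ≤ n ≤ f) can be sketched in the same arithmetic terms. The code below is an illustration using Python's decimal.Decimal as a stand-in for a decimal value; the function name satisfies_fraction_digits is hypothetical.

```python
from decimal import Decimal


def satisfies_fraction_digits(value: Decimal, f: int) -> bool:
    """True if value = i / 10^n for some integers i, n with 0 <= n <= f."""
    if not value.is_finite():
        raise ValueError("expected a finite decimal value")
    _sign, digits, exponent = value.as_tuple()
    i = int("".join(map(str, digits)) or "0")
    n = -exponent
    # Trailing zeros after the decimal point are non-significant, so
    # they do not count against the limit.
    while n > 0 and i % 10 == 0:
        i //= 10
        n -= 1
    # n <= 0 here means the value is an integer, which needs no
    # fraction digits at all.
    return max(n, 0) <= f
```

For instance, 36.60 satisfies a limit of 1 because its value equals 366 / 10^1, while 36.65 does not.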

Example

The following is the definition of a ·user-defined·
datatype which could be used to represent the magnitude
of a person's body temperature on the Celsius scale.
This definition would appear in a schema authored by an "end-user"
and shows how to define a datatype by specifying facet values which
constrain the range of the ·base type·.

4.3.12.2 XML Representation of fractionDigits Schema Components

The XML representation for a fractionDigits schema
component is a <fractionDigits> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

The term 'maxScale' is chosen to reflect the fact that it
restricts the ·value space· to those values that can
be represented lexically in scientific notation using an integer
coefficient and a scale (or negative exponent) no greater than
maxScale. (It has nothing to do with the use of the
term 'scale' to denote the radix or base of a
notation.) Note that maxScale does not restrict the
·lexical space· directly; a lexical representation
that adds non-significant leading or trailing zero digits, or that uses
a lower exponent with a non-integer coefficient is still permitted.

Example

The following is the definition of a user-defined
datatype which could be used to represent a floating-point decimal
datatype which allows seven decimal digits for the coefficient and
exponents between −95 and 96. Note that the scale is −1 times
the exponent.

4.3.13.2 XML Representation of maxScale Schema Components

The XML representation for a maxScale schema
component is a <maxScale> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

The term minScale is chosen to reflect the fact that it
restricts the ·value space· to those values that can
be represented lexically in exponential form using an integer
coefficient and a scale (negative exponent)
at least as large as minScale. Note that
it does not restrict the ·lexical space· directly; a
lexical representation that adds additional leading zero digits,
or that uses a larger exponent (and a correspondingly smaller coefficient)
is still permitted.

Example

The following is the definition of a user-defined
datatype which could be used to represent amounts in a decimal
currency; it corresponds to a SQL column definition of
DECIMAL(8,2). The effect is to allow values
between -999,999.99 and 999,999.99, with a fixed interval
of 0.01 between values.

4.3.14.2 XML Representation of minScale Schema Components

The XML representation for a minScale schema
component is a <minScale> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

4.3.15.1 The assertions Schema Component

4.3.15.2 XML Representation of assertions Schema Components

The XML representation for an assertions schema component is
one or more <assertion> element information items. The
correspondences between the properties of the information item and
properties of the component are as follows:

Note:
Annotations specified within an <assertion> element are captured by
the individual Assertion component to which it maps.

4.3.15.3 Assertions Validation Rules

The following rule refers to
"the nearest built-in" datatype
and to the "XDM representation" of a value
under a datatype.
[Definition:] For
any datatype T, the nearest built-in datatype to
T is the first ·built-in· datatype encountered in following
the chain of links connecting each datatype to its
·base type·. If T is a ·built-in· datatype, then the
nearest built-in datatype of T is T itself; otherwise,
it is the nearest built-in datatype of T's ·base type·.
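
The recursive definition amounts to walking the chain of ·base type· links until a ·built-in· datatype is reached. A minimal sketch follows; the Datatype record and its field names are hypothetical modelling choices, not components defined by this specification, and the sample types percentage and small_percentage are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datatype:
    """Hypothetical minimal model of a datatype for this illustration."""
    name: str
    built_in: bool
    base: Optional["Datatype"] = None


def nearest_built_in(t: Datatype) -> Datatype:
    """Follow base-type links from t until a built-in datatype is found."""
    current = t
    while not current.built_in:
        if current.base is None:
            raise ValueError(f"{current.name} has no base type")
        current = current.base
    return current


# decimal and integer are built-in; the other two are user-defined.
decimal_t = Datatype("decimal", built_in=True)
integer_t = Datatype("integer", built_in=True, base=decimal_t)
percentage = Datatype("percentage", built_in=False, base=integer_t)
small_percentage = Datatype("smallPercentage", built_in=False, base=percentage)
```

Here the nearest built-in datatype of small_percentage is integer, not decimal, because the walk stops at the first ·built-in· datatype encountered.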

[Definition:] For
any value V
and any datatype
T, the XDM representation of V under
T is
defined recursively as follows. Call the XDM representation
X. Then

A value V
is facet-valid with respect to an
assertions facet
belonging to a simple type T
if and only if the {test}
property of each Assertion in its {value} evaluates to true under the
conditions laid out below, without raising any
dynamic error or
type error.

Evaluation of {test} is performed as defined in
[XPath 2.0], with the following conditions:

1.1
The in-scope variables
in the static context
is a set with a single member. The expanded QName
of that member has no namespace
URI and
has
'value' as the local
name. The (static) type of the member is
anyAtomicType*.

Note: The XDM type label anyAtomicType* simply says
that for static typing purposes the variable $value
will have a value consisting of a sequence of zero or more
atomic values.

1.2 There is no context
item for the evaluation of the XPath expression.

Note:
As a consequence the expression '.',
or any implicit or
explicit reference to the context item, will raise a
dynamic error, which will cause the assertion to be treated as false.
If an error is detected statically, then the assertion
violates the schema component constraint
XPath Valid
and causes an error to be flagged in the schema.

The variable "$value" can be
used to refer to the value being checked.

4.3.15.4 Constraints on assertions Schema Components

Note: For components constructed from XML representations in schema documents,
the satisfaction of this constraint is a consequence of the XML mapping rules:
any assertion imposed by a simple type definition S will always
also be imposed by any type derived from S by ·facet-based restriction·.
This constraint ensures that components constructed by other means
(so-called "born-binary" components) similarly preserve
assertions facets across ·facet-based restriction·.

4.3.16 explicitTimezone

[Definition:] explicitTimezone is a
three-valued facet which can be used to
require or prohibit the time zone offset in date/time datatypes.
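
The effect of the three values on individual literals can be sketched as below. This is an informal illustration: the regular expression is an assumed approximation of the time zone portion of a date/time literal ("Z", or a signed hh:mm offset up to 14:00), and the function name timezone_allowed is hypothetical.

```python
import re

# Assumed shape of a trailing time zone offset: 'Z' or (+|-)hh:mm,
# with hh:mm ranging from 00:00 to 14:00.
_TIMEZONE_SUFFIX = re.compile(r"(Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))\Z")


def timezone_allowed(literal: str, explicit_timezone: str) -> bool:
    """Check a date/time literal against an explicitTimezone value."""
    has_timezone = _TIMEZONE_SUFFIX.search(literal) is not None
    if explicit_timezone == "required":
        return has_timezone
    if explicit_timezone == "prohibited":
        return not has_timezone
    if explicit_timezone == "optional":
        return True
    raise ValueError(f"unknown explicitTimezone value: {explicit_timezone!r}")
```

The check only looks at the presence or absence of the offset; it does not validate the rest of the literal.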

The same effect could also be achieved using the
pattern facet, as shown below,
but it is somewhat less clear what is going on in
this derivation, and it is better practice to use
the more straightforward explicitTimezone
for this purpose.

4.3.16.2 XML Representation of explicitTimezone Schema Components

The XML representation for an explicitTimezone schema
component is an <explicitTimezone> element information item. The
correspondences between the properties of the information item and
properties of the component are as follows:

Note: The effect of this rule is to allow datatypes with
an explicitTimezone value of optional to be
restricted by specifying a value of required
or prohibited, and to forbid any other derivations
using this facet.

5 Conformance

XSD 1.1: Datatypes is intended
to be usable in a variety of contexts.

In the usual case, it will be embedded in a
host language such as [XSD 1.1 Part 1: Structures],
which refers to this specification normatively to define some part of
the host language. In some cases, XSD 1.1: Datatypes may
be implemented independently of any host language.

[Definition:] Something
which may vary among conforming implementations, but which must
be specified by the implementor for each particular implementation,
is implementation-defined.

[Definition:] Something
which may vary among conforming implementations, is not specified by
this or any W3C specification, and is not required to be specified
by the implementor for any particular implementation,
is implementation-dependent.

5.1 Host Languages

When XSD 1.1: Datatypes is embedded in a host
language, the definition of conformance is specified by the
host language, not by this specification. That is, when this
specification is implemented in the context of an implementation
of a host language, the question of conformance to this
specification (separate from the host language) does not arise.

This specification imposes certain constraints on the
embedding of XSD 1.1: Datatypes by a host
language; these are indicated in the normative text by
the use of the verbs 'must', etc.,
with the phrase "host language" as the subject
of the verb.

Note: For convenience, the most important of these constraints
are noted here:

Host languages should specify that all of the datatypes described
here as built-ins are automatically available.

Host languages may specify that additional datatypes are also
made available automatically.

If user-defined datatypes are to be supported in the host language,
then the host language must specify how user-defined datatypes are
defined and made available for use.

In addition, host languages must require conforming
implementations of
the host language to obey all of the constraints and rules
specified here.

5.2 Independent implementations

[Definition:] Implementations claiming minimal conformance to this specification
independent of any host language must do
all of the following:

2 Completely and correctly implement all of the
rules governing the XML representation of simple type definitions
specified in Datatype components (§4).

3 Map the XML representations of simple type definitions to
simple type definition components as specified in the mapping
rules given in Datatype components (§4).

Note: The term schema-document aware is used here for
parallelism with the corresponding term in [XSD 1.1 Part 1: Structures].
The reference to schema documents may be taken as referring
to the fact that schema-document-aware implementations accept
the XML representation of simple type definitions found in
XSD schema documents. It does not mean that
the simple type definitions must themselves be free-standing
XML documents, nor that they typically will be.

5.3 Conformance of data

Abstract representations of simple type definitions conform to this
specification if and only if they obey all of the ·constraints on schemas· defined in this
specification.

XML representations of simple type definitions conform to this
specification if they obey all of the applicable rules
defined in this specification.

Note: Because the conformance of the resulting simple type definition
component depends not only on the XML representation of a given
simple type definition, but on the properties of its
·base type·, the conformance of an XML representation of a
simple type definition does not guarantee that, in the
context of other schema components, it will map to
a conforming component.

5.4 Partial Implementation of Infinite Datatypes

Some ·primitive· datatypes defined in this specification have
infinite ·value spaces·; no finite implementation can completely
handle all their possible values. For some such datatypes, minimum
implementation limits are specified below. For other infinite types
such as string,
hexBinary, and
base64Binary, no minimum implementation limits are
specified.

When this specification is used in the context of other languages
(as it is, for example, by [XSD 1.1 Part 1: Structures]), the
host language may specify other minimum implementation limits.

When presented with a literal or value exceeding the capacity of
its partial implementation of a datatype, a minimally conforming
implementation of this specification will sometimes be unable to
determine with certainty whether the value is datatype-valid or
not. Sometimes it will be unable to represent the value correctly
through its interface to any downstream application.

When either of these is so, a conforming processor must indicate
to the user and/or downstream application that it cannot process
the input data with assured correctness (much as it would indicate
if it ran out of memory). When the datatype validity of a value
or literal is uncertain because it exceeds the capacity of a
partial implementation, the literal or value must not be treated
as invalid, and the unsupported value must not be quietly changed
to a supported value.

This specification does not constrain the method used to indicate
that a literal or value in the input data has exceeded the
capacity of the implementation, or the form such indications take.

All ·minimally conforming· processors
must support decimal values whose absolute value can be expressed as
i / 10^k, where
i and k are nonnegative integers such that
i < 10^16 and
k ≤ 16 (i.e., those expressible with sixteen total
digits).
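
As an illustration of this minimum limit, the sketch below tests whether a decimal value falls inside the range every ·minimally conforming· processor has to support. It uses Python's decimal.Decimal as a stand-in for the value; the function name within_minimum_decimal_limits is hypothetical. A value outside the range is not thereby invalid; it is only outside what a minimal implementation is required to handle.

```python
from decimal import Decimal


def within_minimum_decimal_limits(value: Decimal) -> bool:
    """True if |value| = i / 10^k with integers 0 <= i < 10^16 and 0 <= k <= 16."""
    if not value.is_finite():
        raise ValueError("expected a finite decimal value")
    _sign, digits, exponent = value.as_tuple()
    i = int("".join(map(str, digits)) or "0")  # absolute value of the coefficient
    k = -exponent
    # Reduce to the smallest non-negative k that still represents the value.
    while k > 0 and i % 10 == 0:
        i //= 10
        k -= 1
    if k < 0:
        i *= 10 ** (-k)
        k = 0
    return i < 10 ** 16 and k <= 16
```

A processor built along these lines would report a capacity problem, rather than a validity error, for values for which the function returns False and which it cannot otherwise handle.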

Note: The conformance limits given in the text correspond to those
of the decimal64 type defined in
[IEEE 754-2008],
which can be stored in a 64-bit field. The XML Schema Working Group
recommends that implementors support limits corresponding to those of
the decimal128 type. This entails supporting the values in the value
space of the otherwise unconstrained datatype for which
totalDigits is set to 34, maxScale to 6176,
and minScale to −6111.

Editorial Note: Priority Feedback Request

The XML Schema Working Group requests feedback from implementors
and users of XML Schema concerning the minimum and recommended
implementation limits for precisionDecimal. If other
limits, larger or smaller, would make this datatype more attractive to
users or implementors, please let us know.

A Schema for Schema Documents (Datatypes)
(normative)

The XML representation of the datatypes-relevant
part of the schema for schema documents is presented here
as a normative
part of the specification.

Like any other
XML document, schema documents may carry XML and document type declarations. An
XML declaration and a document type declaration are provided here for convenience.
Since
this schema document describes the XML Schema language, the targetNamespace
attribute on the schema element refers to the XML Schema namespace
itself.

The following, although in the form of a
schema document, does not conform to the rules for schema documents
defined in this specification. It contains explicit XML
representations of the primitive datatypes which need not be declared
in a schema document, since they are automatically included in every
schema, and indeed must not be declared in a schema document, since it
is forbidden to try to derive types with anyAtomicType
as the base type definition. It is included here as a form of
documentation.

The following, although in the form of a
schema document, contains XML representations of components already
present in all schemas by definition. It is included here as a form
of documentation.

Note: These datatypes do not need to be declared in a schema document,
since they are automatically included in every schema.

Issue (B-1933):

It is an open question whether this and similar XML documents should
be accepted or rejected by software conforming to this specification.
The XML Schema Working Group expects to resolve this question in connection
with its work on issues relating to schema composition.

In the meantime, some existing schema processors will accept
declarations for them; other existing processors will reject such
declarations as duplicates.