Abstract

The Extensible Messaging and Presence Protocol (XMPP) is an application profile of the Extensible Markup Language (XML) that enables the near-real-time exchange of structured yet extensible data between any two or more network entities. This document defines XMPP's core protocol methods: setup and teardown of XML streams, channel encryption, authentication, error handling, and communication primitives for messaging, network availability ("presence"), and request-response interactions.

Status of this Memo

This Internet-Draft is submitted in full
conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite
them other than as “work in progress.”

This Internet-Draft will expire on April 28, 2011.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Since 2004 the Internet community has gained extensive implementation and deployment experience with XMPP, including formal interoperability testing carried out under the auspices of the XMPP Standards Foundation (XSF). This document incorporates comprehensive feedback from software developers and service providers, including a number of backward-compatible modifications summarized under Appendix D (Differences from RFC 3920). As a result, this document reflects the rough consensus of the Internet community regarding the core features of XMPP 1.0, thus obsoleting RFC 3920.

1.3.
Functional Summary

This non-normative section provides a developer-friendly, functional summary of XMPP; refer to the sections that follow for a normative definition of XMPP.

The purpose of XMPP is to enable the exchange of relatively small pieces of structured data (called "XML stanzas") over a network between any two (or more) entities. XMPP is typically implemented using a distributed client-server architecture, wherein a client needs to connect to a server in order to gain access to the network and thus be allowed to exchange XML stanzas with other entities (which can be associated with other servers). The process whereby a client connects to a server, exchanges XML stanzas, and ends the connection is:

Exchange an unbounded number of XML stanzas with other entities on the network

Close the XML stream

Close the TCP connection

Within XMPP, one server can optionally connect to another server to enable inter-domain or inter-server communication. For this to happen, the two servers need to negotiate a connection between themselves and then exchange XML stanzas; the process for doing so is:

We define the following terms with regard to XML stanzas or parts thereof:

deliver:

for a server, to pass the data to a connected client

ignore:

for a client or server, to discard the data without acting upon it, presenting it to a human user, or returning an error to the sender

route:

for a server, to pass the data to a remote server for subsequent delivery

In examples, lines have been wrapped for improved readability, "[...]" means elision, and the following prepended strings are used (these prepended strings are not to be sent over the wire):

C: = a client

E: = any XMPP entity

I: = an initiating entity

P: = a peer server

R: = a receiving entity

S: = a server

S1: = server1

S2: = server2

Readers need to be aware that the examples are not exhaustive and that, in examples for some protocol flows, the alternate steps shown would not necessarily be triggered by the exact data sent in the previous step; in all cases the protocol definitions specified in this document or in normatively referenced documents rule over any examples provided here.

1.5.
Acknowledgements

This document is an update to, and derived from, RFC 3920. This document would have been impossible without the work of the contributors and commenters acknowledged there.

Hundreds of people have provided implementation feedback, bug reports, requests for clarification, and suggestions for improvement since publication of RFC 3920. Although the document editor has endeavored to address all such feedback, he is solely responsible for any remaining errors and ambiguities.

1.6.
Discussion Venue

The document editor and the broader XMPP developer community welcome discussion and comments related to the topics presented in this document. The primary and preferred venue is the <xmpp@ietf.org> mailing list, for which archives and subscription information are available at https://www.ietf.org/mailman/listinfo/xmpp. Related discussions often occur on the <standards@xmpp.org> mailing list, for which archives and subscription information are available at http://mail.jabber.org/mailman/listinfo/standards.

2.
Architecture

XMPP provides a technology for the asynchronous, end-to-end exchange of structured data by means of direct, persistent XML streams among a distributed network of globally-addressable, presence-aware clients and servers. Because this architectural style involves ubiquitous knowledge of network availability and a conceptually unlimited number of concurrent information transactions in the context of a given client-to-server or server-to-server session, we label it "Availability for Concurrent Transactions" (ACT) to distinguish it from the "Representational State Transfer" [REST] (Fielding, R., “Architectural Styles and the Design of Network-based Software Architectures.”) architectural style familiar from the World Wide Web. Although the architecture of XMPP is similar in important ways to that of email (see [EMAIL‑ARCH] (Crocker, D., “Internet Mail Architecture,” July 2009.)), it introduces several modifications to facilitate communication in close to real time. The salient features of this ACTive architectural style are as follows.

2.1.
Global Addresses

As with email, XMPP uses globally-unique addresses (based on the Domain Name System) in order to route and deliver messages over the network. All XMPP entities are addressable on the network, most particularly clients and servers but also various additional services that can be accessed by clients and servers. In general, server addresses are of the form <domain.tld> (e.g., <im.example.com>), accounts hosted at a server are of the form <localpart@domainpart> (e.g., <juliet@im.example.com>), and a particular connected device or resource that is currently authorized for interaction on behalf of an account is of the form <localpart@domainpart/resourcepart> (e.g., <juliet@im.example.com/balcony>). For historical reasons, XMPP addresses are often called Jabber IDs or JIDs. Because the formal specification of the XMPP address format depends on internationalization technologies that are in flux at the time of writing, the format is defined in [XMPP‑ADDR] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Address Format,” October 2010.) instead of this document.
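The three address forms above can be illustrated with a small, non-normative sketch. It only splits a string into its parts; it assumes that the first "/" introduces the resourcepart and performs none of the internationalization or validation steps that [XMPP‑ADDR] defines. The names `Jid` and `split_jid` are hypothetical and chosen for this illustration only.

```python
from typing import NamedTuple, Optional


class Jid(NamedTuple):
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]


def split_jid(address: str) -> Jid:
    """Split an address of the form [localpart@]domainpart[/resourcepart]."""
    # Assumption: the resourcepart starts at the first "/" and may itself
    # contain further "/" or "@" characters.
    bare, slash, resource = address.partition("/")
    if slash and not resource:
        raise ValueError("empty resourcepart")
    local, at, domain = bare.partition("@")
    if not at:
        # No "@": the whole bare address is the domainpart (a server address).
        local, domain = None, bare
    elif not local:
        raise ValueError("empty localpart")
    if not domain:
        raise ValueError("empty domainpart")
    return Jid(local, domain, resource if slash else None)
```

For example, `split_jid("juliet@im.example.com/balcony")` yields the localpart "juliet", the domainpart "im.example.com", and the resourcepart "balcony", whereas `split_jid("im.example.com")` yields only a domainpart.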

2.2.
Presence

XMPP includes the ability for an entity to advertise its network availability or "presence" to other entities. Such availability for communication is signalled end-to-end via dedicated communication primitives in XMPP (the <presence/> stanza). Although knowledge of network availability is not strictly necessary for the exchange of XMPP messages, it facilitates real-time interaction because the originator of a message can know before initiating communication that the intended recipient is online and available. End-to-end presence is defined in [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.).

2.3.
Persistent Streams

Availability for communication is also built into a point-to-point "hop" through the use of persistent XML streams over long-lived TCP connections. These "always-on" client-to-server or server-to-server streams enable each party to push data to the other party at any time for immediate routing or delivery. XML streams are defined under Section 4 (XML Streams).

2.4.
Structured Data

The basic protocol data unit in XMPP is not an XML stream (which simply provides the transport for point-to-point communication) but an XML "stanza", which is essentially a fragment of XML that is sent over a stream. The root element of a stanza includes routing attributes (such as "from" and "to" addresses) and the child elements of the stanza contain a payload for delivery to the intended recipient. XML stanzas are defined under Section 8 (XML Stanzas).
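As a non-normative illustration of this structure, the following sketch builds a single stanza whose root element carries the routing attributes and whose child element carries the payload. The <body/> child is an assumption borrowed from [XMPP‑IM] rather than something defined in this document, and the function name `build_message` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_message(sender: str, recipient: str, text: str) -> bytes:
    # The root element of the stanza carries the routing attributes.
    stanza = ET.Element("message", {"from": sender, "to": recipient})
    # Child elements carry the payload for the intended recipient.
    body = ET.SubElement(stanza, "body")
    body.text = text
    # Serialize without an XML declaration; a stanza is a fragment sent
    # inside an already-open stream, not a standalone document.
    return ET.tostring(stanza)
```

A call such as `build_message("juliet@im.example.com/balcony", "romeo@example.net", "Art thou not Romeo?")` returns the bytes of one <message/> element suitable for writing to an established stream.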

2.5.
Distributed Network of Clients and Servers

In practice, XMPP consists of a network of clients and servers that inter-communicate (however, communication between any two given deployed servers is strictly OPTIONAL). Thus, for example, the user <juliet@im.example.com> associated with the server <im.example.com> might be able to exchange messages, presence, and other structured data with the user <romeo@example.net> associated with the server <example.net>. This pattern is familiar from messaging protocols that make use of global addresses, such as the email network (see [SMTP] (Klensin, J., “Simple Mail Transfer Protocol,” October 2008.) and [EMAIL‑ARCH] (Crocker, D., “Internet Mail Architecture,” July 2009.)). As a result, end-to-end communication in XMPP is logically peer-to-peer but physically client-to-server-to-server-to-client, as illustrated in the following diagram.

The following paragraphs describe the responsibilities of clients and servers on the network.

A client is an entity that establishes an XML stream with a server by authenticating using the credentials of a local account and that then completes resource binding (Resource Binding) in order to enable delivery of XML stanzas between the server and the client over the negotiated stream. The client then uses XMPP to communicate with its server, other clients, and any other entities on the network, where the server is responsible for delivering stanzas to local entities or routing them to remote entities. Multiple clients can connect simultaneously to a server on behalf of the same local account, where each client is differentiated by the resourcepart of an XMPP address (e.g., <juliet@im.example.com/balcony> vs. <juliet@im.example.com/chamber>), as defined under [XMPP‑ADDR] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Address Format,” October 2010.) and Section 7 (Resource Binding).

A server is an entity whose primary responsibilities are to:

Manage XML streams (XML Streams) with local clients and deliver XML stanzas (XML Stanzas) to those clients over the negotiated streams; this includes responsibility for ensuring that a client authenticates with the server before being granted access to the XMPP network.

3.
TCP Binding

3.1.
Scope

As XMPP is defined in this specification, an initiating entity (client or server) MUST open a Transmission Control Protocol [TCP] (Postel, J., “Transmission Control Protocol,” September 1981.) connection to the receiving entity (server) before it negotiates XML streams with the receiving entity. The parties then maintain that TCP connection for as long as the XML streams are in use. The rules specified in the following sections apply to the TCP binding.

3.2.
Hostname Resolution

Because XML streams are sent over TCP, the initiating entity needs to determine the IPv4 or IPv6 address (and port) of the receiving entity's "origin domain" before it can attempt to connect to the XMPP network.

The initiating entity uses the IP address(es) from the first successfully resolved hostname (with the corresponding port number returned by the SRV lookup) as the connection address for the receiving entity.

If the initiating entity fails to connect using that IP address but the "A" or "AAAA" lookup returned more than one IP address, then the initiating entity uses the next resolved IP address for that hostname as the connection address.

If the initiating entity fails to connect using all resolved IP addresses for a given hostname, then it repeats the process of resolution and connection for the next hostname returned by the SRV lookup.

If the initiating entity fails to connect using any hostname returned by the SRV lookup, then it can either abort the connection attempt or use the fallback process described in the next section.

3.2.2.
Fallback Processes

The fallback process SHOULD be a normal "A" or "AAAA" address record resolution to determine the IPv4 or IPv6 address of the origin domain, where the port used is the "xmpp-client" port of 5222 for client-to-server connections or the "xmpp-server" port 5269 for server-to-server connections.
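The resolution order described in this section and the preceding one can be sketched as follows (non-normative). The sketch assumes that the SRV lookup has already been performed and its targets are supplied in the order in which they are to be tried, and that the caller provides a `resolve_addresses` function that returns an empty list when a hostname cannot be resolved. Both parameters, like the function name `connection_candidates`, are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Default ports used by the fallback process.
DEFAULT_PORTS = {"xmpp-client": 5222, "xmpp-server": 5269}


def connection_candidates(
    origin_domain: str,
    service: str,
    srv_targets: Iterable[Tuple[str, int]],
    resolve_addresses: Callable[[str], List[str]],
    use_fallback: bool = True,
) -> Iterator[Tuple[str, int]]:
    """Yield (ip_address, port) pairs in the order they are to be tried."""
    # Try every resolved address of each SRV target, with the SRV port.
    for hostname, port in srv_targets:
        for ip_address in resolve_addresses(hostname):
            yield ip_address, port
    # Fallback: plain address lookup on the origin domain, default port.
    if use_fallback:
        for ip_address in resolve_addresses(origin_domain):
            yield ip_address, DEFAULT_PORTS[service]
```

Because the function is a generator, the caller simply stops iterating after the first successful connection; the fallback addresses are produced only if every earlier candidate has been consumed.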

3.2.3.
When Not to Use SRV

If the initiating entity has been explicitly configured to associate a particular hostname (and potentially port) with the origin domain of the receiving entity (say, to "hardcode" an association from an origin domain of example.net to a configured hostname of webcm.example.com:80), the initiating entity SHOULD use the configured name instead of performing the preferred SRV resolution process on the origin name.

3.2.4.
Use of SRV Records with Add-On Services

Many XMPP servers are implemented in such a way that they can host add-on services (beyond those defined in this specification and [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.)) at DNS domain names that typically are "subdomains" of the main XMPP service (e.g., conference.example.net for a [XEP‑0045] (Saint-Andre, P., “Multi-User Chat,” July 2007.) service associated with the example.net XMPP service) or "subdomains" of the first-level domain of the underlying service (e.g., muc.example.com for a [XEP‑0045] (Saint-Andre, P., “Multi-User Chat,” July 2007.) service associated with the im.example.com XMPP service). If an entity associated with a remote XMPP server wishes to use such an add-on service, it would generate an appropriate XML stanza and the remote server would attempt to resolve the add-on service's DNS domain name via an SRV lookup on resource records such as "_xmpp-server._tcp.conference.example.net." or "_xmpp-server._tcp.muc.example.com.". Therefore if the administrators of an XMPP service wish to enable entities associated with remote servers to access such add-on services, they need to advertise the appropriate "_xmpp-server" SRV records in addition to the "_xmpp-server" record for their main XMPP service. In case SRV records are not available, the fallback methods described under Section 3.2.2 (Fallback Processes) can be used to resolve the DNS domain names of add-on services.
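The SRV owner names quoted above follow a fixed pattern, which the following non-normative sketch reproduces. The helper name `srv_query_name` is hypothetical.

```python
def srv_query_name(domain: str, service: str = "xmpp-server") -> str:
    """Build the fully qualified SRV owner name for an XMPP domain."""
    # Strip any trailing dot first so the result ends in exactly one dot.
    return f"_{service}._tcp.{domain.rstrip('.')}."
```

For the add-on service in the example, `srv_query_name("conference.example.net")` produces "_xmpp-server._tcp.conference.example.net.".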

3.3.
Reconnection

It can happen that an XMPP server goes offline while servicing TCP connections from local clients and from other servers. Because the number of such connections can be quite large, the reconnection algorithm employed by entities that seek to reconnect can have a significant impact on software and network performance. If an entity chooses to reconnect, the following guidelines are RECOMMENDED:

The number of seconds that expire before an entity first seeks to reconnect SHOULD be an unpredictable number between 0 and 60 (e.g., so that all clients do not attempt to reconnect exactly 30 seconds after being disconnected).
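A minimal, non-normative sketch of this guideline follows. It assumes whole seconds and uses a cryptographically strong random source so that the delay is not predictable across clients; the function name `first_reconnect_delay` is hypothetical.

```python
import secrets


def first_reconnect_delay(max_seconds: int = 60) -> int:
    """Return an unpredictable delay, in whole seconds, from 0 to max_seconds."""
    # randbelow(n) returns an integer in [0, n), so add 1 to make
    # max_seconds itself a possible outcome.
    return secrets.randbelow(max_seconds + 1)
```

Spreading the first attempt over the whole interval avoids the case in which every disconnected client reconnects at the same instant.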

3.4.
Reliability

The use of long-lived TCP connections in XMPP implies that the sending of XML stanzas over XML streams can be unreliable, since the parties to a long-lived TCP connection might not discover a connectivity disruption in a timely manner. At the XMPP application layer, long connectivity disruptions can result in undelivered stanzas. Although the core XMPP technology defined in this specification does not contain features to overcome this lack of reliability, there exist XMPP extensions for doing so (e.g., [XEP‑0198] (Karneges, J., Hildebrand, J., Saint-Andre, P., and F. Forno, “Stream Management,” June 2009.)).

4.
XML Streams

4.1.
Stream Fundamentals

Two fundamental concepts make possible the rapid, asynchronous exchange of relatively small payloads of structured information between XMPP entities: XML streams and XML stanzas. These terms are defined as follows.

Definition of XML Stream:

An XML stream is a container for the exchange of XML elements between any two entities over a network. The start of an XML stream is denoted unambiguously by an opening "stream header" (i.e., an XML <stream> tag with appropriate attributes and namespace declarations), while the end of the XML stream is denoted unambiguously by a closing XML </stream> tag. During the life of the stream, the entity that initiated it can send an unbounded number of XML elements over the stream, either elements used to negotiate the stream (e.g., to complete TLS negotiation (STARTTLS Negotiation) or SASL negotiation (SASL Negotiation)) or XML stanzas. The "initial stream" is negotiated from the initiating entity (typically a client or server) to the receiving entity (typically a server), and can be seen as corresponding to the initiating entity's "connection to" or "session with" the receiving entity. The initial stream enables unidirectional communication from the initiating entity to the receiving entity; in order to enable exchange of stanzas from the receiving entity to the initiating entity, the receiving entity MUST negotiate a stream in the opposite direction (the "response stream").

In essence, then, one XML stream functions as an envelope for the XML stanzas sent during a session and another XML stream functions as an envelope for the XML stanzas received during a session. We can represent this in a simplistic fashion as follows.

4.2.
Stream Negotiation

4.2.1.
Basic Concepts

Because the receiving entity for a stream acts as a gatekeeper to the domains it services, it imposes certain conditions for connecting as a client or as a peer server. At a minimum, the initiating entity needs to authenticate with the receiving entity before it is allowed to send stanzas to the receiving entity, typically using SASL as described under Section 6 (SASL Negotiation). However, the receiving entity can consider conditions other than authentication to be mandatory, such as encryption using TLS as described under Section 5 (STARTTLS Negotiation). The receiving entity informs the initiating entity about such conditions by communicating "stream features": the set of particular protocol interactions that are mandatory for the initiating entity to complete before the receiving entity will accept XML stanzas from the initiating entity (e.g., authentication), as well as any protocol interactions that are voluntary but that might improve the handling of an XML stream (e.g., establishment of application-layer compression as described in [XEP‑0138] (Hildebrand, J. and P. Saint-Andre, “Stream Compression,” May 2009.)).

The existence of conditions for connecting implies that streams need to be negotiated. The order of layers (TCP, then TLS, then SASL, then XMPP; see Section 13.3 (Order of Layers)) implies that stream negotiation is a multi-stage process. Further structure is imposed by two factors: (1) a given stream feature might be offered only to certain entities or only after certain other features have been negotiated (e.g., resource binding is offered only after SASL authentication), and (2) stream features can be either mandatory-to-negotiate or voluntary-to-negotiate. Finally, for security reasons the parties to a stream need to discard knowledge that they gained during the negotiation process after successfully completing the protocol interactions defined for certain features (e.g., TLS in all cases and SASL in the case when a security layer might be established, as defined in the specification for the relevant SASL mechanism); this is done by flushing the old stream context and exchanging new stream headers over the existing TCP connection.

4.2.2.
Stream Features Format

If the initiating entity includes the 'version' attribute set to a value of at least "1.0" in the initial stream header, after sending the response stream header the receiving entity MUST send a <features/> child element (prefixed by the streams namespace prefix) to the initiating entity in order to announce any conditions for continuation of the stream negotiation process. Each condition takes the form of a child element of the <features/> element, qualified by a namespace that is different from the streams namespace and the content namespace. The <features/> element can contain one child, contain multiple children, or be empty.

Implementation Note: The order of child elements contained in any given <features/> element is not significant.

If a particular stream feature is or can be mandatory-to-negotiate, the definition of that feature needs to do one of the following:

Declare that the feature is always mandatory-to-negotiate (e.g., this is true of resource binding for XMPP clients); or

Specify a way for the receiving entity to flag the feature as mandatory-to-negotiate for this interaction (e.g., this is done for TLS by including an empty <required/> element in the advertisement for that stream feature); it is RECOMMENDED that stream feature definitions for mandatory-to-negotiate features do so by including an empty <required/> element as is done for TLS.

Informational Note: Because there is no generic format for indicating that a feature is mandatory-to-negotiate, it is possible that a feature which is not understood by the initiating entity might be considered mandatory-to-negotiate by the receiving entity, resulting in failure of the stream negotiation process. Although such an outcome would be undesirable, the working group deemed it rare enough that a generic format was not needed.

For security reasons, certain stream features require the initiating entity to send a new initial stream header upon successful negotiation of the feature (e.g., TLS in all cases and SASL in the case when a security layer might be established). If this is true of a given stream feature, the definition of that feature needs to declare that a stream restart is expected after negotiation of the feature.

A <features/> element that contains at least one mandatory-to-negotiate feature indicates that the stream negotiation is not complete and that the initiating entity MUST negotiate further features.

A <features/> element MAY contain more than one mandatory feature. This means that the initiating entity can choose among the mandatory features. For example, perhaps a future technology will perform roughly the same function as TLS, so the receiving entity might advertise support for both TLS and the future technology.

A <features/> element that contains both mandatory and voluntary features indicates that the negotiation is not complete but that the initiating entity MAY complete the voluntary feature(s) before it attempts to negotiate the mandatory feature(s).

A <features/> element that contains only voluntary features indicates that the stream negotiation is complete and that the initiating entity is cleared to send XML stanzas, but that the initiating entity MAY negotiate further features if desired.
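The rules above can be summarized in a short, non-normative sketch that inspects a <features/> element from the point of view of a client. It assumes that a feature is mandatory-to-negotiate either when its advertisement contains an empty <required/> child or when it appears in a fixed set of always-mandatory features (here, only resource binding). The namespace names are those conventionally used for streams and resource binding; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"
# Assumption: features that are always mandatory-to-negotiate for a client.
ALWAYS_MANDATORY = {"{urn:ietf:params:xml:ns:xmpp-bind}bind"}


def _local_name(tag: str) -> str:
    # ElementTree writes qualified names as "{namespace}local".
    return tag.rpartition("}")[2]


def mandatory_features(features_xml: str) -> list:
    """Return the tags of all mandatory-to-negotiate features advertised."""
    features = ET.fromstring(features_xml)
    if features.tag != f"{{{STREAMS_NS}}}features":
        raise ValueError("not a stream features element")
    mandatory = []
    for feature in features:
        flagged = any(_local_name(child.tag) == "required" for child in feature)
        if flagged or feature.tag in ALWAYS_MANDATORY:
            mandatory.append(feature.tag)
    return mandatory


def negotiation_complete(features_xml: str) -> bool:
    # An empty element, or one with only voluntary features, means the
    # initiating entity is cleared to send XML stanzas.
    return not mandatory_features(features_xml)
```

An initiating entity that does not recognize a feature cannot tell from this check alone whether the receiving entity regards it as mandatory, which is the limitation described in the Informational Note above.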

4.2.3.
Restarts

On successful negotiation of a feature that necessitates a stream restart, both parties MUST consider the previous stream to be replaced but MUST NOT terminate the underlying TCP connection; instead, the parties MUST reuse the existing connection, which might be in a new state (e.g., encrypted as a result of TLS negotiation). The initiating entity then MUST send a new initial stream header, which SHOULD be preceded by an XML declaration as described under Section 11.5 (Inclusion of XML Declaration). When the receiving entity receives the new initial stream header, it MUST generate a new stream ID (instead of re-using the old stream ID) before sending a new response stream header (which SHOULD be preceded by an XML declaration as described under Section 11.5 (Inclusion of XML Declaration)).
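From the receiving entity's side, a restart can be pictured with the following non-normative sketch: the connection object is kept, while the stream ID is replaced. The class name `ReceivingStream`, the use of a 128-bit random hexadecimal ID, and the opaque `connection` attribute are all assumptions made for illustration.

```python
import secrets
from dataclasses import dataclass, field


def new_stream_id() -> str:
    # Assumption: a random 128-bit value is used as the stream ID.
    return secrets.token_hex(16)


@dataclass
class ReceivingStream:
    # Stands in for the existing TCP connection (possibly now encrypted).
    connection: object
    stream_id: str = field(default_factory=new_stream_id)

    def restart(self) -> None:
        """Handle a new initial stream header on the same connection."""
        # The underlying connection is reused, never terminated.
        # A new stream ID is generated instead of re-using the old one.
        self.stream_id = new_stream_id()
```

After `restart()` the receiving entity would send a new response stream header carrying the new ID over the same connection.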

4.2.4.
Resending Features

The receiving entity MUST send an updated list of stream features to the initiating entity after a stream restart, and MAY do so after completing negotiation of a stream feature that does not require a stream restart. The list of updated features MAY be empty if there are no further features to be advertised or MAY include any combination of features.

4.2.5.
Completion of Stream Negotiation

The receiving entity indicates completion of the stream negotiation process by sending to the initiating entity either an empty <features/> element or a <features/> element that contains only voluntary features. After doing so, the receiving entity MAY send an empty <features/> element (e.g., after negotiation of such voluntary features) but MUST NOT send additional stream features to the initiating entity (if the receiving entity has new features to offer, preferably limited to mandatory-to-negotiate or security-critical features, it can simply close the stream using a <reset/> stream error and then advertise the new features when the initiating entity reconnects, preferably closing existing streams in a staggered way so that not all of the initiating entities reconnect at once). Once stream negotiation is complete, the initiating entity is cleared to send XML stanzas over the stream for as long as the stream is maintained by both parties.

Informational Note: Resource binding as specified under Section 7 (Resource Binding) is an historical exception to the foregoing rule, since it is mandatory-to-negotiate for clients but uses XML stanzas for negotiation purposes.

The initiating entity MUST NOT attempt to send XML stanzas (XML Stanzas) to entities other than itself (i.e., the client's connected resource or any other authenticated resource of the client's account) or the server to which it is connected until stream negotiation has been completed. Even if the initiating entity does attempt to do so, the receiving entity MUST NOT accept such stanzas and MUST return a <not-authorized/> stream error. This rule applies to XML stanzas only (i.e., <message/>, <presence/>, and <iq/> elements qualified by the content namespace) and not to XML elements used for stream negotiation (e.g., elements used to complete TLS negotiation (STARTTLS Negotiation) or SASL negotiation (SASL Negotiation)).
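A receiving server might enforce this rule along the lines of the following non-normative sketch. It assumes that addresses are compared as plain strings with no normalization and that every stanza carries an explicit 'to' address; the function name and parameters are hypothetical.

```python
from typing import Optional


def pre_negotiation_error(
    to: str,
    own_bare_jid: str,
    server_domain: str,
    negotiation_complete: bool,
) -> Optional[str]:
    """Return the stream error condition to send, or None to accept."""
    if negotiation_complete:
        return None
    # Drop any resourcepart so that every resource of the client's own
    # account is treated as "itself".
    bare_to = to.partition("/")[0]
    if bare_to in (own_bare_jid, server_domain):
        return None
    return "not-authorized"
```

The check applies to <message/>, <presence/>, and <iq/> stanzas only; elements used for stream negotiation are not passed through it.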

Security Note: Because it is possible for a third party to tamper with information that is sent over the stream before a security layer such as TLS is successfully negotiated, it is advisable for the receiving server to treat any such unprotected information with caution.

4.3.
Directionality

An XML stream is always unidirectional, by which is meant that XML stanzas can be sent in only one direction over the stream (either from the initiating entity to the receiving entity or from the receiving entity to the initiating entity).

Depending on the type of session that has been negotiated and the nature of the entities involved, the entities might use:

Two streams over a single TCP connection; this is typical for client-to-server sessions, and a server MUST allow a client to use the same TCP connection for both streams.

Two streams over two TCP connections, where one TCP connection is used for the stream in which stanzas are sent from the initiating entity to the receiving entity and the other TCP connection is used for the stream in which stanzas are sent from the receiving entity to the initiating entity; this is typical for server-to-server sessions.

Multiple streams over two or more TCP connections. This approach is sometimes used for server-to-server communication between two large XMPP service providers; however, this can make it difficult to maintain coherence of data received over multiple streams in situations described under Section 10.1 (In-Order Processing), which is why a server MAY return a <conflict/> stream error to a remote server that attempts to negotiate more than one stream (as described under Section 4.8.3.3 (conflict)).

During establishment of a server-to-server session, while completing STARTTLS negotiation (STARTTLS Negotiation) and SASL negotiation (SASL Negotiation) two servers would use one TCP connection, but after the stream negotiation process is done that original TCP connection would be used only for the initiating server to send XML stanzas to the receiving server. In order for the receiving server to send XML stanzas to the initiating server, the receiving server would need to reverse the roles and negotiate an XML stream from the receiving server to the initiating server over a separate TCP connection.

Informational Note: Although XMPP developers sometimes apply the terms "unidirectional" and "bidirectional" to the underlying TCP connection (e.g., calling the TCP connection for a client-to-server session "bidirectional" and the TCP connection for a server-to-server session "unidirectional"), strictly speaking a stream is always unidirectional (because the initiating entity and receiving entity always have a minimum of two streams, one in each direction) and a TCP connection is always bidirectional (because TCP traffic can be sent in both directions). Directionality applies to the application-layer traffic sent over the TCP connection, not to the transport-layer traffic sent over the TCP connection itself.

4.4.
Closing a Stream

An XML stream between two entities can be closed at any time, either because a specific stream error has occurred or in the absence of an error (e.g., when a client simply ends its session).

A stream is closed by sending a closing </stream> tag.

S: </stream:stream>

The entity that sends the closing stream tag SHOULD behave as follows:

Wait for the other party to also close its stream before terminating the underlying TCP connection (this gives the other party an opportunity to finish transmitting any data in the opposite direction before the TCP connection is terminated).

Refrain from initiating the sending of further data over that stream but continue to process data sent by the other entity (and, if necessary, react to such data).

Consider both streams to be void if the other party does not send its closing stream tag within a reasonable amount of time (where the definition of "reasonable" is a matter of implementation or deployment).

After receiving a reciprocal closing stream tag from the other party or waiting a reasonable amount of time with no response, the entity MUST terminate the underlying TCP connection.
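The behavior of the entity that sends the closing stream tag can be sketched as a small state machine (non-normative). The 30-second default for a "reasonable" wait is purely an assumption, as are the class and method names; the caller supplies the current time so that the logic is independent of any particular clock.

```python
import enum
from typing import Callable, Optional


class CloseState(enum.Enum):
    OPEN = "open"
    CLOSING = "closing"  # closing tag sent, waiting for the other party
    CLOSED = "closed"    # the TCP connection is to be terminated


class StreamCloser:
    def __init__(self, send: Callable[[bytes], None], wait_seconds: float = 30.0) -> None:
        self._send = send
        self._wait_seconds = wait_seconds
        self._deadline: Optional[float] = None
        self.state = CloseState.OPEN

    def close(self, now: float) -> None:
        if self.state is CloseState.OPEN:
            self._send(b"</stream:stream>")
            self._deadline = now + self._wait_seconds
            self.state = CloseState.CLOSING

    def can_send(self) -> bool:
        # After closing, no further data is initiated on this stream,
        # although incoming data is still processed by the caller.
        return self.state is CloseState.OPEN

    def on_peer_close(self) -> None:
        # Reciprocal closing tag received from the other party.
        if self.state is CloseState.CLOSING:
            self.state = CloseState.CLOSED

    def tick(self, now: float) -> None:
        # Give up waiting once the deadline passes; both streams are void.
        if self.state is CloseState.CLOSING and now >= self._deadline:
            self.state = CloseState.CLOSED
```

Once the state is `CloseState.CLOSED`, the caller terminates the underlying TCP connection. Handling of a close that the other party initiates first is outside the scope of this sketch.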

4.5.
Handling of Silent Peers

When an entity that is a party to a stream has not received any XMPP traffic from its stream peer for some period of time, the peer might appear to be silent. There are several reasons why this might happen:

The underlying TCP connection is dead.

The XML stream is broken despite the fact that the underlying TCP connection is alive.

The peer is idle and simply has not sent any XMPP traffic over its XML stream to the entity.

These three conditions are best handled separately, as described in the following sections.

Implementation Note: For the purpose of handling silent peers, we treat two unidirectional TCP connections as conceptually equivalent to a single bidirectional TCP connection (see Section 4.3 (Directionality)); however, implementers need to be aware that, in the case of two unidirectional TCP connections, responses to traffic at the XMPP application layer will come back from the peer on the second TCP connection. In addition, the use of multiple streams in each direction (which is a common deployment choice for server-to-server connectivity among large XMPP service providers) further complicates application-level checking of XMPP streams and their underlying TCP connections, because there is no necessary correlation between any given initial stream and any given response stream.

One common method for checking the TCP connection is to send a space character (U+0020) between XML stanzas, which is allowed for XML streams as described under Section 11.7 (Whitespace); the sending of such a space character is properly called a "whitespace keepalive" (the term "whitespace ping" is often used, despite the fact that it is not a ping since no "pong" is possible).
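Implementation Note: A whitespace keepalive of the kind described above might be sent as in the following non-normative Python sketch. The function name send_whitespace_keepalive is hypothetical, and the sketch assumes a connected, blocking socket on which no stanza is partially written.

```python
import socket


def send_whitespace_keepalive(sock):
    """Write a single U+0020 between stanzas.

    Returns False if the local TCP stack reports the write as failed,
    which suggests the connection is dead.
    """
    try:
        sock.sendall(b" ")
        return True
    except OSError:
        return False
```

Because no "pong" is possible, a return value of True means only that the local write succeeded. A dead connection is typically discovered on a later write, once TCP retransmission has given up.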

4.5.3.
Idle Peer

Even if the underlying TCP connection is alive and the stream is not broken, the peer might have sent no stanzas for a certain period of time. In this case, the peer SHOULD close the stream using the handshake described under Section 4.4 (Closing a Stream). If the idle peer does not close the stream, the other party MAY either close the stream using the handshake described under Section 4.4 (Closing a Stream) or return a stream error (e.g., <resource-constraint/> if the entity has reached a limit on the number of open TCP connections or <policy-violation/> if the connection has exceeded a local timeout policy). However, consistent with the order of layers (specified under Section 13.3 (Order of Layers)), the other party is advised to verify that the underlying TCP connection is alive and the stream is unbroken (as described above) before concluding that the peer is idle. Furthermore, it is preferable to be liberal in accepting idle peers, since experience has shown that doing so improves the reliability of communication over XMPP networks and that it is typically more efficient to maintain a stream between two servers than to aggressively time out such a stream.

4.5.4.
Use of Checking Methods

Implementers are advised to support whichever stream-checking and connection-checking methods they deem appropriate, but to carefully weigh the network impact of such methods against the benefits of discovering broken streams and dead TCP connections in a timely manner. The length of time between the use of any particular check is very much a matter of local service policy and depends strongly on the network environment and usage scenarios of a given deployment and connection type; at the time of writing, it is RECOMMENDED that any such check be performed not more than once every 5 minutes and that, ideally, such checks will be initiated by clients rather than servers. Those who implement XMPP software and deploy XMPP services are encouraged to seek additional advice regarding appropriate timing of stream-checking and connection-checking methods, particularly when power-constrained devices are being used (e.g., in mobile environments).

4.6.
Stream Attributes

The attributes of the root <stream/> element are defined in the following sections.

Security Note: Until and unless the confidentiality and integrity of a stream header are ensured via Transport Layer Security as described under Section 5 (STARTTLS Negotiation), the attributes provided in a stream header could be tampered with by an attacker.

4.6.1.
from

The 'from' attribute communicates an XMPP identity of the entity sending the stream element.

For initial stream headers in client-to-server communication, if the client knows the XMPP identity of the principal controlling the client (typically an account name of the form <localpart@domainpart>), then it SHOULD include the 'from' attribute and set its value to that identity once the stream is in a state in which it is willing to perform authentication, e.g., once TLS has been negotiated. However, because the client might not know the XMPP identity of the principal controlling the entity (e.g., because the XMPP identity is assigned at a level other than the XMPP application layer, as in the Generic Security Service Application Program Interface [GSS‑API] (Linn, J., “Generic Security Service Application Program Interface Version 2, Update 1,” January 2000.)), inclusion of the 'from' address is OPTIONAL.

Security Note: Including the XMPP identity before the stream is protected via TLS can expose that identity to eavesdroppers.

For initial stream headers in server-to-server communication, a server MUST include the 'from' attribute and MUST set the value to the domainpart of the 'from' attribute of the stanza that caused the stream to be established (because the initiating entity might have more than one XMPP identity, e.g., in the case of a server that provides virtual hosting, it will need to choose an identity that is associated with this stream).

Whether or not the 'from' attribute is included, each entity MUST verify the identity of the other entity before exchanging XML stanzas with it, as described under Section 13.5 (Peer Entity Authentication).

For response stream headers in client-to-server communication, if the client included a 'from' attribute in the initial stream header then the server MUST include a 'to' attribute in the response stream header and MUST set its value to the bare JID specified in the 'from' attribute of the initial stream header. If the client did not include a 'from' attribute in the initial stream header then the server MUST NOT include a 'to' attribute in the response stream header.

For response stream headers in server-to-server communication, the receiving entity MUST include a 'to' attribute in the response stream header and MUST set its value to the hostname specified in the 'from' attribute of the initial stream header.

4.6.3.
id

The 'id' attribute communicates a unique identifier for the stream, called a "stream ID". The stream ID MUST be generated by the receiving entity when it sends a response stream header and MUST be unique within the receiving application (normally a server).

For response stream headers, the receiving entity MUST include the 'xml:lang' attribute. The following rules apply:

If the initiating entity included an 'xml:lang' attribute in its initial stream header and the receiving entity supports that language in the human-readable XML character data that it generates and sends to the initiating entity (e.g., in the <text/> element for stream and stanza errors), the value of the 'xml:lang' attribute MUST be the identifier for the initiating entity's preferred language (e.g., "de-CH").

If the receiving entity supports a language that closely matches the initiating entity's preferred language (e.g., "de" instead of "de-CH"), then the value of the 'xml:lang' attribute SHOULD be the identifier for the matching language (e.g., "de") but MAY be the identifier for the default language of the receiving entity (e.g., "en").

If the receiving entity does not support the initiating entity's preferred language or a closely matching language (or if the initiating entity did not include the 'xml:lang' attribute in its initial stream header), then the value of the 'xml:lang' attribute MUST be the identifier for the default language of the receiving entity (e.g., "en").
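Implementation Note: The three rules above can be sketched as a single selection function. The following Python is non-normative. The name choose_response_lang is hypothetical, and the sketch assumes that a "closely matching" language is one whose tag equals the primary subtag of the preferred language (e.g., "de" for "de-CH"); an implementation is free to use a richer matching scheme. Where the second rule permits either the matching language or the default, the sketch follows the SHOULD.

```python
def choose_response_lang(preferred, supported, default="en"):
    """Pick the 'xml:lang' value for a response stream header.

    preferred: 'xml:lang' from the initial stream header, or None
    supported: iterable of language tags the receiving entity supports
    default:   the receiving entity's default language
    """
    if preferred:
        # Language tags are compared case-insensitively.
        by_lower = {tag.lower(): tag for tag in supported}
        if preferred.lower() in by_lower:
            return preferred                    # rule 1: exact support
        primary = preferred.split("-", 1)[0].lower()
        if primary in by_lower:
            return by_lower[primary]            # rule 2: close match
    return default                              # rule 3: fall back
```

For example, an entity that supports "de" and "en" would answer a "de-CH" request with "de" and a "fr" request with "en".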

If the initiating entity included the 'xml:lang' attribute in its initial stream header, the receiving entity SHOULD remember that value as the default xml:lang for all stanzas sent by the initiating entity over the current stream. As described under Section 8.1.5 (xml:lang), the initiating entity MAY include the 'xml:lang' attribute in any XML stanzas it sends over the stream. If the initiating entity does not include the 'xml:lang' attribute in any such stanza, the receiving entity SHOULD add the 'xml:lang' attribute to the stanza, where the value of the attribute MUST be the identifier for the language preferred by the initiating entity (even if the receiving entity does not support that language for human-readable XML character data it generates and sends to the initiating entity, such as in stream or stanza errors). If the initiating entity includes the 'xml:lang' attribute in any such stanza, the receiving entity MUST NOT modify or delete it.

The version of XMPP specified in this specification is "1.0"; in particular, XMPP 1.0 encapsulates the stream-related protocols as well as the basic semantics of the three defined XML stanza types (<message/>, <presence/>, and <iq/>).

The numbering scheme for XMPP versions is "<major>.<minor>". The major and minor numbers MUST be treated as separate integers and each number MAY be incremented higher than a single digit. Thus, "XMPP 2.4" would be a lower version than "XMPP 2.13", which in turn would be lower than "XMPP 12.3". Leading zeros (e.g., "XMPP 6.01") MUST be ignored by recipients and MUST NOT be sent.
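Implementation Note: The requirement to treat the major and minor numbers as separate integers can be illustrated with the following non-normative Python sketch, in which the function name parse_version is hypothetical. Converting each part with int() also discards leading zeros on receipt.

```python
def parse_version(value):
    """Split "<major>.<minor>" into a tuple of two integers."""
    major, minor = value.split(".")
    return int(major), int(minor)
```

Tuples compare element by element, so parse_version("2.4") sorts below parse_version("2.13"), which sorts below parse_version("12.3"). A plain string comparison of the same values would give a different, incorrect order.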

The major version number will be incremented only if the stream and stanza formats or obligatory actions have changed so dramatically that an older version entity would not be able to interoperate with a newer version entity if it simply ignored the elements and attributes it did not understand and took the actions defined in the older specification.

The minor version number will be incremented only if significant new capabilities have been added to the core protocol (e.g., a newly defined value of the 'type' attribute for message, presence, or IQ stanzas). The minor version number MUST be ignored by an entity with a smaller minor version number, but MAY be used for informational purposes by the entity with the larger minor version number (e.g., the entity with the larger minor version number would simply note that its correspondent would not be able to understand that value of the 'type' attribute and therefore would not send it).

The following rules apply to the generation and handling of the 'version' attribute within stream headers:

The initiating entity MUST set the value of the 'version' attribute in the initial stream header to the highest version number it supports (e.g., if the highest version number it supports is that defined in this specification, it MUST set the value to "1.0").

The receiving entity MUST set the value of the 'version' attribute in the response stream header to either the value supplied by the initiating entity or the highest version number supported by the receiving entity, whichever is lower. The receiving entity MUST perform a numeric comparison on the major and minor version numbers, not a string match on "<major>.<minor>".

If the version number included in the response stream header is at least one major version lower than the version number included in the initial stream header and newer version entities cannot interoperate with older version entities as described, the initiating entity SHOULD generate an <unsupported-version/> stream error.

If either entity receives a stream header with no 'version' attribute, the entity MUST consider the version supported by the other entity to be "0.9" and SHOULD NOT include a 'version' attribute in the response stream header.
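Implementation Note: The version rules above might be implemented along the following lines. This Python sketch is non-normative; the names response_version and major_version_gap are hypothetical, and the decision about whether newer and older entities can interoperate is left to the caller.

```python
def _parse(version):
    """Return (major, minor); a missing 'version' is treated as "0.9"."""
    if version is None:
        return (0, 9)
    major, minor = version.split(".")
    return int(major), int(minor)


def response_version(initial_version, own_highest="1.0"):
    """Value for 'version' in the response stream header.

    Returns None when the attribute is to be omitted because the
    initial stream header carried no 'version' attribute.
    """
    if initial_version is None:
        return None
    # Numeric comparison on (major, minor), not a string match.
    major, minor = min(_parse(initial_version), _parse(own_highest))
    return "%d.%d" % (major, minor)


def major_version_gap(initial_version, received_version):
    """True if the response is at least one major version lower."""
    return _parse(received_version)[0] < _parse(initial_version)[0]
```

An initiating entity would consult major_version_gap() after receiving the response stream header and, if interoperation is not possible, generate an <unsupported-version/> stream error.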

4.7.1.
Streams Namespace

The root <stream/> element ("stream header") MUST be qualified by the namespace 'http://etherx.jabber.org/streams' (the "streams namespace"). If this rule is violated, the entity that receives the offending stream header MUST return a stream error to the sending entity, which SHOULD be <invalid-namespace/> (although some existing implementations send <bad-format/> instead).

4.7.2.
Content Namespace

An entity MAY declare a "content namespace" as the default namespace for data sent over the stream (i.e., data other than elements qualified by the streams namespace). If so, (1) the content namespace MUST be other than the streams namespace, and (2) the content namespace MUST be the same for the initial stream and the response stream so that both streams are qualified consistently. The content namespace applies to all first-level child elements sent over the stream unless explicitly qualified by another namespace (i.e., the content namespace is the default namespace).

Alternatively (i.e., instead of declaring the content namespace as the default namespace), an entity MAY explicitly qualify the namespace for each first-level child element of the stream, using so-called "prefix-free canonicalization". These two styles are shown in the following examples.

When a content namespace is declared as the default namespace, in rough outline a stream will look something like the following.

Historically, most XMPP implementations have used the content-namespace-as-default-namespace style rather than the prefix-free canonicalization style for stream headers; however, both styles are acceptable since they are semantically equivalent.
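Implementation Note: The semantic equivalence of the two styles can be checked with a namespace-aware parser. The following non-normative Python sketch parses one stream written in each style and compares the fully qualified element names. The <message/> payload and the names used in the code are illustrative assumptions, and each stream is shown as a complete XML document only so that it can be parsed in one step.

```python
import xml.etree.ElementTree as ET

# Content namespace declared as the default namespace.
DEFAULT_NS_STYLE = (
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams'>"
    "<message><body>foo</body></message>"
    "</stream:stream>"
)

# Prefix-free canonicalization: each first-level child is qualified.
PREFIX_FREE_STYLE = (
    "<stream xmlns='http://etherx.jabber.org/streams'>"
    "<message xmlns='jabber:client'><body>foo</body></message>"
    "</stream>"
)


def qualified_names(xml_text):
    """Return every element name in '{namespace}local' form."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]
```

Both inputs yield the same list of qualified names, which is why a receiving entity that processes namespaces correctly need not care which style its peer uses.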

4.7.4.
Namespace Declarations and Prefixes

Because the content namespace is other than the streams namespace, if a content namespace is declared as the default namespace then the following statements are true:

The stream header needs to contain a namespace declaration for both the content namespace and the streams namespace.

The streams namespace declaration needs to include a namespace prefix for the streams namespace.

Interoperability Note: For historical reasons, an implementation MAY accept only the prefix 'stream' for the streams namespace (resulting in prefixed names such as <stream:stream> and <stream:features>). If an entity receives a stream header with a streams namespace prefix it does not accept, it MUST return a stream error to the sending entity, which SHOULD be <bad-namespace-prefix/> (although some existing implementations send <bad-format/> instead).

An implementation MUST NOT generate namespace prefixes for elements qualified by the content namespace if the content namespace is 'jabber:client' or 'jabber:server'.

Namespaces declared in a stream header MUST apply only to that stream (e.g., the 'jabber:server:dialback' namespace used in Server Dialback [XEP‑0220] (Miller, J., Saint-Andre, P., and P. Hancke, “Server Dialback,” March 2010.)). In particular, because XML stanzas intended for routing or delivery over streams with other entities will lose the namespace context declared in the header of the stream in which those stanzas originated, namespaces for extended content within such stanzas MUST NOT be declared in that stream header (see also Section 8.4 (Extended Content)). If either party to a stream declares such namespaces, the other party to the stream SHOULD close the stream with a stream error of <invalid-namespace/>. In any case, an entity MUST ensure that such namespaces are properly declared (according to this section) when routing or delivering stanzas originating from such a stream over streams with other entities.

4.7.5.
Mandatory-to-Implement Content Namespaces

XMPP as defined in this specification uses two content namespaces: 'jabber:client' and 'jabber:server'. These namespaces are nearly identical but are used in different contexts (client-to-server communication for 'jabber:client' and server-to-server communication for 'jabber:server'). The only difference between the two is that the 'to' and 'from' attributes are OPTIONAL on stanzas sent over XML streams qualified by the 'jabber:client' namespace, whereas they are REQUIRED on stanzas sent over XML streams qualified by the 'jabber:server' namespace. Support for these content namespaces implies support for the common attributes (Common Attributes) and basic semantics (Basic Semantics) of all three core stanza types (message, presence, and IQ).

An implementation MAY support content namespaces other than 'jabber:client' or 'jabber:server'. However, because such namespaces would define applications other than XMPP, they are to be defined in separate specifications.

An implementation MAY refuse to support any other content namespaces as default namespaces. If an entity receives a first-level child element qualified by a content namespace it does not support, it MUST return an <invalid-namespace/> stream error.

Client implementations MUST support the 'jabber:client' content namespace as a default namespace.

Server implementations MUST support as default namespaces both the 'jabber:client' content namespace (when the stream is used for communication between a client and a server) and the 'jabber:server' content namespace (when the stream is used for communication between two servers).

Implementation Note: Because a client sends stanzas over a stream whose content namespace is 'jabber:client', if the server to which the client is connected needs to route a client-generated stanza to another server then it MUST "re-scope" the stanza so that its content namespace is 'jabber:server' (i.e., it MUST NOT send a stanza qualified by the 'jabber:client' namespace over a stream whose content namespace is 'jabber:server'). Similarly, a routing server MUST "re-scope" a stanza received over a server-to-server stream (whose content namespace is 'jabber:server') so that the stanza is qualified by the 'jabber:client' namespace before sending it over a client-to-server stream (whose content namespace is 'jabber:client').
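Implementation Note: Re-scoping as described above amounts to renaming elements from one content namespace to the other while leaving extended content untouched. The following Python sketch is non-normative; the function name rescope is hypothetical, and the sketch assumes the stanza has already been parsed into an ElementTree element.

```python
import xml.etree.ElementTree as ET


def rescope(stanza, from_ns, to_ns):
    """Move elements qualified by from_ns into to_ns, in place.

    Elements in any other namespace (extended content) are left alone.
    """
    old = "{%s}" % from_ns
    for el in stanza.iter():
        if isinstance(el.tag, str) and el.tag.startswith(old):
            el.tag = "{%s}%s" % (to_ns, el.tag[len(old):])
    return stanza
```

A server routing a client-generated stanza to a peer server would call rescope(stanza, "jabber:client", "jabber:server"), and would reverse the arguments when delivering an inbound stanza to a local client.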

4.8.
Stream Errors

The root stream element MAY contain an <error/> child element that is prefixed by the streams namespace prefix. The error child SHALL be sent by a compliant entity if it perceives that a stream-level error has occurred.

4.8.1.
Rules

4.8.1.1.
Stream Errors Are Unrecoverable

Stream-level errors are unrecoverable. Therefore, if an error occurs at the level of the stream, the entity that detects the error MUST send an <error/> element with an appropriate child element that specifies the error condition and immediately close the stream as described under Section 4.4 (Closing a Stream).

4.8.1.2.
Stream Errors Can Occur During Setup

If the error is triggered by the initial stream header, the receiving entity MUST still send the opening <stream> tag, include the <error/> element as a child of the stream element, and send the closing </stream> tag (preferably all at the same time).

4.8.1.3.
Stream Errors When the Host is Unspecified or Unknown

If the initiating entity provides no 'to' attribute or provides an unknown host in the 'to' attribute and the error occurs during stream setup, the value of the 'from' attribute returned by the receiving entity in the stream header sent before closing the stream MUST be either an authoritative hostname for the receiving entity or the empty string.

4.8.1.4.
Where Stream Errors Are Sent

When two TCP connections are used between the initiating entity and the receiving entity (one in each direction) rather than using a single bidirectional connection, the following rules apply:

Stream-level errors related to the initial stream are returned by the receiving entity on the response stream via the same TCP connection.

Stanza errors triggered by outbound stanzas sent from the initiating entity over the initial stream via the same TCP connection are returned by the receiving entity on the response stream via the other, return TCP connection (since they are inbound stanzas from the perspective of the initiating entity).

4.8.2.
Syntax

The syntax for stream errors is as follows, where "defined-condition" is a placeholder for one of the conditions defined under Section 4.8.3 (Defined Stream Error Conditions) and XML data shown within the square brackets '[' and ']' is OPTIONAL.

MAY contain a <text/> child element containing XML character data that describes the error in more detail; this element MUST be qualified by the 'urn:ietf:params:xml:ns:xmpp-streams' namespace and SHOULD possess an 'xml:lang' attribute specifying the natural language of the XML character data.

MAY contain a child element for an application-specific error condition; this element MUST be qualified by an application-defined namespace, and its structure is defined by that namespace (see Section 4.8.4 (Application-Specific Conditions)).

The <text/> element is OPTIONAL. If included, it MUST be used only to provide descriptive or diagnostic information that supplements the meaning of a defined condition or application-specific condition. It MUST NOT be interpreted programmatically by an application. It MUST NOT be used as the error message presented to a human user, but MAY be shown in addition to the error message associated with the defined condition element (and, optionally, the application-specific condition element).
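Implementation Note: A stream error with the structure described above might be serialized as in the following non-normative Python sketch. The function name build_stream_error and its parameters are hypothetical. The sketch assumes the 'stream' prefix was declared in the stream header and does not validate that the condition is one of the defined conditions.

```python
from xml.sax.saxutils import escape, quoteattr

STREAM_ERROR_NS = "urn:ietf:params:xml:ns:xmpp-streams"


def build_stream_error(condition, text=None, lang="en", app_condition=None):
    """Serialize a <stream:error/> element.

    condition:     name of a defined condition, e.g. "policy-violation"
    text:          optional descriptive text for the <text/> child
    app_condition: optional (element_name, namespace) tuple
    """
    parts = [
        "<stream:error>",
        "<%s xmlns='%s'/>" % (condition, STREAM_ERROR_NS),
    ]
    if text is not None:
        parts.append(
            "<text xmlns='%s' xml:lang=%s>%s</text>"
            % (STREAM_ERROR_NS, quoteattr(lang), escape(text))
        )
    if app_condition is not None:
        name, namespace = app_condition
        parts.append("<%s xmlns=%s/>" % (name, quoteattr(namespace)))
    parts.append("</stream:error>")
    return "".join(parts)
```

Because stream errors are unrecoverable, the caller would write the returned string followed by the closing stream tag.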

This error MAY be used instead of the more specific XML-related errors, such as <bad-namespace-prefix/>, <invalid-xml/>, <restricted-xml/>, <unsupported-encoding/>, and <not-well-formed/>. However, the more specific errors are RECOMMENDED.

4.8.3.3.
conflict

The server either (1) is closing the existing stream for this entity because a new stream has been initiated that conflicts with the existing stream, or (2) is refusing a new stream for this entity because allowing the new stream would conflict with an existing stream (e.g., because the server allows only a certain number of connections from the same IP address or allows only one server-to-server stream for a given domain pair as a way of helping to ensure in-order processing as described under Section 10.1 (In-Order Processing)).

If a client receives a <conflict/> stream error, during the resource binding aspect of its reconnection attempt it MUST NOT blindly request the resourcepart it used during the former session but instead MUST choose a different resourcepart; details are provided under Section 7 (Resource Binding).

Interoperability Note: RFC 3920 specified that the <connection-timeout/> stream error is to be used if the peer has not generated any traffic over the stream for some period of time. That behavior is no longer recommended; instead, the error SHOULD be used only if the connected client or peer server has not responded to data sent over the stream.

4.8.3.9.
invalid-from

The JID or hostname provided in a 'from' address is not a valid JID or does not match an authorized JID or validated domain as negotiated between servers via SASL or Server Dialback, or as negotiated between a client and a server via authentication and resource binding.

(In the following example, a peer that has authenticated only as "example.net" attempts to send a stanza from an address at "example.org".)

4.8.3.10.
invalid-namespace

The streams namespace name is something other than "http://etherx.jabber.org/streams" (see Section 11.2 (XML Namespace Names and Prefixes)) or the content namespace is not supported (e.g., something other than "jabber:client" or "jabber:server").

(In the following example, the client specifies a namespace of 'http://wrong.namespace.example.org/' for the stream.)

4.8.3.12.
not-authorized

The entity has attempted to send XML stanzas before the stream has been authenticated, or otherwise is not authorized to perform an action related to stream negotiation; the receiving entity MUST NOT process the offending stanza before sending the stream error.

(In the following example, the client attempts to send XML stanzas before authenticating with the server.)

4.8.3.14.
policy-violation

The entity has violated some local service policy (e.g., the stanza exceeds a configured size limit); the server MAY choose to specify the policy in the <text/> element or in an application-specific condition element.

(In the following example, the client sends an XMPP message that is too large according to the server's local service policy.)

4.8.3.19.
see-other-host

The server will not provide service to the initiating entity but is redirecting traffic to another host; the XML character data of the <see-other-host/> element returned by the server MUST specify the alternate hostname or IP address at which to connect, which MUST be a valid domainpart or a domainpart plus port number (separated by the ':' character in the form "domainpart:port"). If the domainpart is the same as the source domain, derived domain, or resolved IP address to which the initiating entity originally connected (differing only by the port number), then the initiating entity SHOULD simply attempt to reconnect at that address. Otherwise, the initiating entity MUST resolve the hostname specified in the <see-other-host/> element as described under Section 3.2 (Hostname Resolution).
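Implementation Note: The character data of <see-other-host/> can be split into a host and an optional port as in the following non-normative Python sketch. The function name parse_see_other_host is hypothetical. Support for a bracketed IPv6 literal (e.g., "[2001:db8::1]:9090") is an assumption of the sketch rather than something stated in the text above.

```python
def parse_see_other_host(value):
    """Return (host, port); port is None when no port was given."""
    value = value.strip()
    if value.startswith("["):
        # Assumed form for IPv6 literals: "[address]" or "[address]:port".
        host, _, rest = value[1:].partition("]")
        port = int(rest[1:]) if rest.startswith(":") else None
        return host, port
    if value.count(":") == 1:
        host, port = value.split(":")
        return host, int(port)
    return value, None
```

When the returned port is None and the host differs from the one originally contacted, the initiating entity would resolve the hostname as described under Section 3.2 (Hostname Resolution) rather than assume a port.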

4.8.3.24.
unsupported-stanza-type

The initiating entity has sent a first-level child of the stream that is not supported by the server, either because the receiving entity does not understand the namespace or because the receiving entity does not understand the element name for the applicable namespace (which might be the content namespace declared as the default namespace).

(In the following example, the client attempts to send a first-level child element of <pubsub/> qualified by the 'jabber:client' namespace, but the schema for that namespace defines no such element.)

C: <pubsub xmlns='jabber:client'>
     <publish node='princely_musings'>
       <item id='ae890ac52d0df67ed7cfdf51b644e901'>
         <entry xmlns='http://www.w3.org/2005/Atom'>
           <title>Soliloquy</title>
           <summary>
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them?
           </summary>
           <link rel='alternate' type='text/html'
                 href='http://denmark.example/2003/12/13/atom03'/>
           <id>tag:denmark.example,2003:entry-32397</id>
           <published>2003-12-13T18:30:02Z</published>
           <updated>2003-12-13T18:30:02Z</updated>
         </entry>
       </item>
     </publish>
   </pubsub>

S: <stream:error>
     <unsupported-stanza-type
         xmlns='urn:ietf:params:xml:ns:xmpp-streams'/>
   </stream:error>
   </stream:stream>

4.8.4.
Application-Specific Conditions

As noted, an application MAY provide application-specific stream error information by including a properly-namespaced child in the error element. The application-specific element SHOULD supplement or further qualify a defined element. Thus the <error/> element will contain two or three child elements.

5.2.
Support

Support for STARTTLS is REQUIRED in XMPP client and server implementations. An administrator of a given deployment MAY specify that TLS is obligatory for client-to-server communication, server-to-server communication, or both. An initiating entity SHOULD use TLS to secure its stream with the receiving entity before proceeding with SASL authentication.

5.3.
Stream Negotiation Rules

5.3.1.
Mandatory-to-Negotiate

If the receiving entity advertises only the STARTTLS feature or if the receiving entity includes the <required/> child element as explained under Section 5.4.1 (Exchange of Stream Headers and Stream Features), the parties MUST consider TLS as mandatory-to-negotiate. If TLS is mandatory-to-negotiate, the receiving entity SHOULD NOT advertise support for any stream feature except STARTTLS during the initial stage of the stream negotiation process, because further stream features might depend on prior negotiation of TLS given the order of layers in XMPP (e.g., the particular SASL mechanisms offered by the receiving entity will likely depend on whether TLS has been negotiated).

5.3.2.
Restart

5.3.3.
Data Formatting

During STARTTLS negotiation, the entities MUST NOT send any whitespace as separators between XML elements (i.e., from the last character of the <starttls/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace at depth=1 of the stream as sent by the initiating entity, until the last character of the <proceed/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace at depth=1 of the stream as sent by the receiving entity). This prohibition helps to ensure proper security layer byte precision. Any such whitespace shown in the STARTTLS examples provided in this document is included only for the sake of readability.

5.3.4.
Order of TLS and SASL Negotiations

If the initiating entity chooses to use TLS, STARTTLS negotiation MUST be completed before proceeding to SASL negotiation (SASL Negotiation); this order of negotiation is necessary to help safeguard authentication information sent during SASL negotiation, as well as to make it possible to base the use of the SASL EXTERNAL mechanism on a certificate (or other credentials) provided during prior TLS negotiation.

Protecting client credentials by completing server authentication first and then completing client authentication over the protected channel.

Because it is relatively inexpensive to establish streams in XMPP, for the first two cases it is preferable to use an XMPP stream reset (as described under Section 4.8.3.16 (reset)) instead of performing TLS renegotiation.

The third case has improved security characteristics when the TLS client (which might be an XMPP server) presents credentials to the TLS server. If communicating such credentials to an unauthenticated server might leak private information, it can be appropriate to complete TLS negotiation for the purpose of server authentication and then attempt TLS renegotiation for the purpose of client authentication with the TLS server.

However, the third case is sufficiently rare that XMPP entities SHOULD NOT blindly attempt TLS renegotiation.

If an entity that does not support TLS renegotiation detects a renegotiation attempt, then it MUST immediately close the underlying TCP connection without returning a stream error (since the violation has occurred at the TLS layer, not the XMPP layer; see Section 13.3 (Order of Layers)).

5.3.6.
TLS Extensions

5.4.
Process

5.4.1.
Exchange of Stream Headers and Stream Features

The initiating entity resolves the hostname of the receiving entity as specified under Section 3 (TCP Binding), opens a TCP connection to the advertised port at the resolved IP address, and sends an initial stream header to the receiving entity; if the initiating entity is capable of STARTTLS negotiation, it MUST include the 'version' attribute set to a value of at least "1.0" in the initial stream header.

The receiving entity MUST send a response stream header to the initiating entity over the TCP connection opened by the initiating entity; if the receiving entity is capable of STARTTLS negotiation, it MUST include the 'version' attribute set to a value of at least "1.0" in the response stream header.

The receiving entity then MUST send stream features to the initiating entity. If the receiving entity supports TLS, the stream features MUST include an advertisement for support of STARTTLS negotiation, i.e., a <starttls/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace.

If the receiving entity considers STARTTLS negotiation to be mandatory, the <starttls/> element SHOULD contain an empty <required/> child element.
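Implementation Note: An initiating entity can decide whether TLS is offered, and whether it is mandatory-to-negotiate, by inspecting the stream features. The following Python sketch is non-normative. The function name starttls_status is hypothetical, and the sketch assumes the features element is supplied as a string that carries its own declaration of the 'stream' prefix (on the wire, that declaration lives in the stream header).

```python
import xml.etree.ElementTree as ET

TLS_NS = "urn:ietf:params:xml:ns:xmpp-tls"


def starttls_status(features_xml):
    """Return (offered, mandatory) for a <stream:features/> element."""
    features = ET.fromstring(features_xml)
    starttls = features.find("{%s}starttls" % TLS_NS)
    if starttls is None:
        return False, False
    has_required = starttls.find("{%s}required" % TLS_NS) is not None
    only_feature = len(features) == 1   # STARTTLS is the sole feature
    return True, has_required or only_feature
```

TLS is treated as mandatory-to-negotiate either when <required/> is present or when STARTTLS is the only advertised feature, matching the rule under Section 5.3.1 (Mandatory-to-Negotiate).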

5.4.2.
Initiation of STARTTLS Negotiation

5.4.2.1.
STARTTLS Command

In order to begin the STARTTLS negotiation, the initiating entity issues the STARTTLS command (i.e., a <starttls/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace) to instruct the receiving entity that it wishes to begin a STARTTLS negotiation to secure the stream.

I: <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>

The receiving entity MUST reply with either a <proceed/> element (proceed case) or a <failure/> element (failure case) qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace.

5.4.2.3.
Proceed Case

If the proceed case occurs, the receiving entity MUST return a <proceed/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-tls' namespace.

R: <proceed xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>

The receiving entity MUST consider the TLS negotiation to have begun immediately after sending the closing '>' character of the <proceed/> element to the initiating entity. The initiating entity MUST consider the TLS negotiation to have begun immediately after receiving the closing '>' character of the <proceed/> element from the receiving entity.

The entities now proceed to TLS negotiation as explained in the next section.

So that mutual authentication will be possible, the receiving entity SHOULD send a certificate request to the initiating entity and the initiating entity SHOULD send a certificate (if available) to the receiving entity.

5.4.3.2.
TLS Failure

If the TLS negotiation results in failure, the receiving entity MUST terminate the TCP connection.

The receiving entity MUST NOT send a closing </stream> tag before terminating the TCP connection, since the receiving entity and initiating entity MUST consider the original stream to be closed upon failure of the TLS negotiation.

The initiating entity MAY attempt to reconnect as explained under Section 3.3 (Reconnection), with or without attempting TLS negotiation (in accordance with local service policy, user-configured preferences, etc.).

5.4.3.3.
TLS Success

If the TLS negotiation is successful, then the entities MUST proceed as follows.

The initiating entity MUST discard any information transmitted in layers above TCP that it obtained from the receiving entity in an insecure manner before TLS took effect (e.g., the receiving entity's 'from' address or the stream ID and stream features received from the receiving entity).

The receiving entity MUST discard any information transmitted in layers above TCP that it obtained from the initiating entity in an insecure manner before TLS took effect (e.g., the initiating entity's 'from' address).

The initiating entity MUST send a new initial stream header to the receiving entity over the encrypted connection.

Implementation Note: The initiating entity MUST NOT send a closing </stream> tag before sending the new initial stream header, since the receiving entity and initiating entity MUST consider the original stream to be replaced upon success of the TLS negotiation.

The receiving entity MUST respond with a new response stream header over the encrypted connection (for which it MUST generate a new stream ID instead of re-using the old stream ID).

The receiving entity also MUST send stream features to the initiating entity, which MUST NOT include the STARTTLS feature but which SHOULD include the SASL stream feature as described under Section 6 (SASL Negotiation).

6.3.
Stream Negotiation Rules

6.3.1.
Mandatory-to-Negotiate

6.3.2.
Restart

6.3.3.
Mechanism Preferences

Any entity that will act as a SASL client or a SASL server MUST maintain an ordered list of its preferred SASL mechanisms according to the client or server, where the list is ordered according to local policy or user configuration (which SHOULD be in order of perceived strength to enable the strongest authentication possible). A server MUST offer and a client MUST try SASL mechanisms in preference order. For example, if the server offers the ordered list "PLAIN SCRAM-SHA-1 GSSAPI" or "SCRAM-SHA-1 GSSAPI PLAIN" but the client's ordered list is "GSSAPI SCRAM-SHA-1", the client MUST try GSSAPI first and then SCRAM-SHA-1 but MUST NOT try PLAIN (since PLAIN is not on its list).
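The client-side selection rule in this example can be sketched in a few lines of Python. The helper name `mechanisms_to_try` is hypothetical; the sketch assumes mechanism names are compared as exact strings and that the client's list is already sorted by its own preference.

```python
from typing import List, Sequence

def mechanisms_to_try(client_prefs: Sequence[str],
                      server_offer: Sequence[str]) -> List[str]:
    """Return the mechanisms a client attempts, in the client's own order.

    Only mechanisms on the client's list are considered; the server's
    ordering does not change the order in which the client tries them.
    """
    offered = set(server_offer)
    return [mech for mech in client_prefs if mech in offered]

client_prefs = ["GSSAPI", "SCRAM-SHA-1"]
for offer in (["PLAIN", "SCRAM-SHA-1", "GSSAPI"],
              ["SCRAM-SHA-1", "GSSAPI", "PLAIN"]):
    # Both server orderings yield the same attempt order for this client.
    print(mechanisms_to_try(client_prefs, offer))
```

Because the result is derived from the client's list, PLAIN can never appear in it unless the client has explicitly listed PLAIN as acceptable.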

6.3.4.
Mechanism Offers

If the receiving entity considers TLS negotiation (STARTTLS Negotiation) to be mandatory before it will accept authentication with a particular SASL mechanism, it MUST NOT advertise that mechanism in its list of available SASL mechanisms before TLS negotiation has been completed.

The receiving entity SHOULD offer the SASL EXTERNAL mechanism if both of the following conditions hold:

During TLS negotiation the initiating entity presented a certificate that is acceptable to the receiving entity for purposes of strong identity verification in accordance with local service policies (e.g., because said certificate is unexpired, is unrevoked, and is anchored to a root trusted by the receiving entity).

However, the receiving entity MAY offer the SASL EXTERNAL mechanism under other circumstances, as well.

When the receiving entity offers the SASL EXTERNAL mechanism, the receiving entity SHOULD list the EXTERNAL mechanism first among its offered SASL mechanisms and the initiating entity SHOULD attempt SASL negotiation using the EXTERNAL mechanism first (this preference will tend to increase the likelihood that the parties can negotiate mutual authentication).

6.3.5.
Data Formatting

The following data formatting rules apply to the SASL negotiation:

During SASL negotiation, the entities MUST NOT send any whitespace as separators between XML elements (i.e., from the last character of the <auth/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace at depth=1 of the stream as sent by the initiating entity, until the last character of the <success/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace at depth=1 of the stream as sent by the receiving entity). This prohibition helps to ensure proper security layer byte precision. Any such whitespace shown in the SASL examples provided in this document is included only for the sake of readability.

As formally specified in the XML schema for the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace under Appendix A.4 (SASL Namespace), the receiving entity MAY include one or more application-specific child elements inside the <mechanisms/> element to provide information that might be needed by the initiating entity in order to complete successful SASL negotiation using one or more of the offered mechanisms; however, the syntax and semantics of all such elements are out of scope for this specification.

6.3.6.
Security Layers

Upon successful SASL negotiation that involves negotiation of a security layer, both the initiating entity and the receiving entity MUST discard any application-layer state (i.e., state from the XMPP layer, excluding state from the TLS negotiation or SASL negotiation).

6.3.8.
Authorization Identity

An authorization identity is an optional identity specified by the initiating entity; in client-to-server streams it is typically used by an administrator to perform some management task on behalf of another user, whereas in server-to-server streams it is typically used to specify a particular application at a service (e.g., a multi-user chat server at conference.example.com that is hosted by the example.com XMPP service). If the initiating entity wishes to act on behalf of another entity and the selected SASL mechanism supports transmission of an authorization identity, the initiating entity SHOULD provide an authorization identity during SASL negotiation. If the initiating entity does not wish to act on behalf of another entity, it SHOULD NOT provide an authorization identity.

In the case of client-to-server communication, the value of an authorization identity MUST be a bare JID (<localpart@domainpart>) and not a full JID (<localpart@domainpart/resourcepart>).

In the case of server-to-server communication, the value of an authorization identity MUST be a domainpart only (<domainpart>).

If the initiating entity provides an authorization identity during SASL negotiation, the receiving entity is responsible for verifying that the initiating entity is in fact allowed to assume the specified authorization identity; if not, the receiving entity MUST return an <invalid-authzid/> SASL error as described under Section 6.5.6 (invalid-authzid).
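The two format rules for authorization identities can be illustrated with a simplified check. This is only a sketch: the function name `is_valid_authzid` and the stream type labels "c2s" and "s2s" are hypothetical, and the check looks only at the '@' and '/' separators rather than applying full XMPP address validation. Whether the initiating entity is permitted to assume the identity is a separate authorization decision that is not modeled here.

```python
def is_valid_authzid(value: str, stream_type: str) -> bool:
    """Check the shape of an authorization identity.

    stream_type is "c2s" (bare JID expected) or "s2s" (domainpart only).
    """
    # A '/' would introduce a resourcepart, which neither form allows.
    if not value or "/" in value:
        return False
    if stream_type == "c2s":
        local, sep, domain = value.partition("@")
        return bool(sep and local and domain and "@" not in domain)
    if stream_type == "s2s":
        return "@" not in value
    raise ValueError(f"unknown stream type: {stream_type!r}")

print(is_valid_authzid("admin@example.com", "c2s"))
print(is_valid_authzid("admin@example.com/laptop", "c2s"))
print(is_valid_authzid("conference.example.com", "s2s"))
```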

6.3.9.
Realms

The receiving entity MAY include a realm when negotiating certain SASL mechanisms. If the receiving entity does not communicate a realm, the initiating entity MUST NOT assume that any realm exists. The realm MUST be used only for the purpose of authentication; in particular, an initiating entity MUST NOT attempt to derive an XMPP hostname from the realm information provided by the receiving entity.

When the SASL client (the XMPP "initiating entity") requests an authentication exchange, it can include "initial response" data with its request if appropriate for the SASL mechanism in use. In XMPP this is done by including the initial response as the XML character data of the <auth/> element.

At the end of the authentication exchange, the SASL server (the XMPP "receiving entity") can include "additional data with success" if appropriate for the SASL mechanism in use. In XMPP this is done by including the additional data as the XML character data of the <success/> element.

For the sake of protocol efficiency, it is REQUIRED for clients and servers to support these methods and RECOMMENDED to use them; however, clients and servers MUST support the less efficient modes as well.

6.4.
Process

6.4.1.
Exchange of Stream Headers and Stream Features

If SASL negotiation follows successful STARTTLS negotiation (STARTTLS Negotiation), then the SASL negotiation occurs over the encrypted stream that has already been negotiated. If not, the initiating entity resolves the hostname of the receiving entity as specified under Section 3 (TCP Binding), opens a TCP connection to the advertised port at the resolved IP address, and sends an initial stream header to the receiving entity; if the initiating entity is capable of SASL negotiation, it MUST include the 'version' attribute set to a value of at least "1.0" in the initial stream header.

The receiving entity MUST send a response stream header to the initiating entity (for which it MUST generate a new stream ID instead of re-using the old stream ID); if the receiving entity is capable of SASL negotiation, it MUST include the 'version' attribute set to a value of at least "1.0" in the response stream header.

The receiving entity also MUST send stream features to the initiating entity. If the receiving entity supports SASL, the stream features SHOULD include an advertisement for support of SASL negotiation, i.e., a <mechanisms/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace; typically the only case in which support for SASL negotiation would not be advertised here is before STARTTLS negotiation when TLS is required.

The <mechanisms/> element MUST contain one <mechanism/> child element for each authentication mechanism the receiving entity offers to the initiating entity. The order of <mechanism/> elements in the XML indicates the preference order of the SASL mechanisms according to the receiving entity; however the initiating entity MUST maintain its own preference order independent of the preference order of the receiving entity.
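A minimal sketch of reading the advertised mechanisms in document order follows. The helper name `offered_mechanisms` and the sample XML (including the application-specific child element in a made-up namespace) are hypothetical; the sketch assumes the element is available as a complete XML string.

```python
import xml.etree.ElementTree as ET
from typing import List

SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"

def offered_mechanisms(mechanisms_xml: str) -> List[str]:
    """Return mechanism names in the receiving entity's preference order."""
    root = ET.fromstring(mechanisms_xml)
    # findall() preserves document order and skips unrelated child elements.
    return [m.text.strip()
            for m in root.findall(f"{{{SASL_NS}}}mechanism")
            if m.text and m.text.strip()]

example_mechanisms = (
    "<mechanisms xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>"
    "<mechanism>EXTERNAL</mechanism>"
    "<mechanism>SCRAM-SHA-1</mechanism>"
    "<hint xmlns='urn:example:hypothetical'>ignored</hint>"
    "</mechanisms>"
)
print(offered_mechanisms(example_mechanisms))
```

The returned list reflects only the receiving entity's preferences; the initiating entity would still filter and order it against its own list before choosing a mechanism.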

6.4.2.
Initiation

In order to begin the SASL negotiation, the initiating entity sends an <auth/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace and includes an appropriate value for the 'mechanism' attribute, thus starting the handshake for that particular authentication mechanism. This element MAY contain XML character data (in SASL terminology, the "initial response") if the mechanism supports or requires it; if the initiating entity needs to send a zero-length initial response, it MUST transmit the response as a single equals sign character ("="), which indicates that the response is present but contains no data.
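The three cases for the initial response (absent, zero-length, and non-empty) can be sketched as follows. The function name `build_auth` and the sample credentials are hypothetical. The sketch also assumes that non-empty SASL data is carried base64-encoded, a requirement stated elsewhere in the specification rather than in this paragraph.

```python
import base64
from typing import Optional
from xml.sax.saxutils import quoteattr

SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"

def build_auth(mechanism: str, initial_response: Optional[bytes] = None) -> str:
    """Serialize an <auth/> element for the chosen mechanism.

    None  -> no initial response is sent (empty element)
    b""   -> zero-length initial response, sent as "="
    bytes -> base64-encoded initial response (assumed encoding)
    """
    attrs = f"xmlns={quoteattr(SASL_NS)} mechanism={quoteattr(mechanism)}"
    if initial_response is None:
        return f"<auth {attrs}/>"
    if initial_response == b"":
        payload = "="
    else:
        payload = base64.b64encode(initial_response).decode("ascii")
    return f"<auth {attrs}>{payload}</auth>"

print(build_auth("SCRAM-SHA-1"))
print(build_auth("EXTERNAL", b""))
print(build_auth("PLAIN", b"\x00juliet\x00hypothetical-password"))
```

Distinguishing `None` from `b""` in the function signature mirrors the protocol distinction between an absent response and a present but empty one.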

If the initiating entity subsequently sends another <auth/> element (even if the ongoing authentication handshake has not yet completed), the server SHOULD discard the ongoing handshake and begin a new handshake for the subsequently requested SASL mechanism.

6.4.3.
Challenge-Response Sequence

If necessary, the receiving entity challenges the initiating entity by sending a <challenge/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace; this element MAY contain XML character data (which MUST be generated in accordance with the definition of the SASL mechanism chosen by the initiating entity).

The initiating entity responds to the challenge by sending a <response/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace; this element MAY contain XML character data (which MUST be generated in accordance with the definition of the SASL mechanism chosen by the initiating entity).

If necessary, the receiving entity sends more challenges and the initiating entity sends more responses.

This series of challenge/response pairs continues until one of three things happens:

The initiating entity aborts the handshake for this authentication mechanism.

6.4.5.
SASL Failure

The receiving entity reports failure of the handshake for this authentication mechanism by sending a <failure/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace (the particular cause of failure MUST be communicated in an appropriate child element of the <failure/> element as defined under Section 6.5 (SASL Errors)).

Where appropriate for the chosen SASL mechanism, the receiving entity SHOULD allow a configurable but reasonable number of retries (at least 2 and no more than 5); this enables the initiating entity (e.g., an end-user client) to tolerate incorrectly-provided credentials (e.g., a mistyped password) without being forced to reconnect.

If the initiating entity attempts a reasonable number of retries with the same SASL mechanism and all attempts fail, it MAY fall back to the next mechanism in its ordered list by sending a new <auth/> request to the receiving entity, thus starting a new handshake for that authentication mechanism. If all handshakes fail and there are no remaining mechanisms in the initiating entity's list of supported and acceptable mechanisms, the initiating entity SHOULD simply close the stream.

If the initiating entity exceeds the number of retries, the receiving entity MUST return a stream error, which SHOULD be <policy-violation/> (although some existing implementations send <not-authorized/> instead).
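One way a receiving entity might track the retry limit is sketched below. The class name `SaslRetryPolicy` and the default of 3 are hypothetical, and the sketch adopts one possible reading of the rule: the first attempt is not counted as a retry, and the stream error is returned when the last permitted retry has also failed.

```python
class SaslRetryPolicy:
    """Track failed SASL handshakes against a configurable retry limit."""

    def __init__(self, max_retries: int = 3) -> None:
        # The recommended range is at least 2 and no more than 5 retries.
        if not 2 <= max_retries <= 5:
            raise ValueError("max_retries must be between 2 and 5")
        self.max_retries = max_retries
        self.failures = 0

    def record_failure(self) -> str:
        """Return 'retry' if another attempt is allowed, else the
        name of the stream error condition to send."""
        self.failures += 1
        # failures counts the initial attempt plus retries used so far.
        if self.failures > self.max_retries:
            return "policy-violation"
        return "retry"

policy = SaslRetryPolicy(max_retries=2)
print([policy.record_failure() for _ in range(3)])
```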

Implementation Note: For server-to-server streams, if the receiving entity cannot offer the SASL EXTERNAL mechanism or any other SASL mechanism based on the security context established during TLS negotiation, the receiving entity MAY attempt to complete weak identity verification using the Server Dialback protocol [XEP‑0220] (Miller, J., Saint-Andre, P., and P. Hancke, “Server Dialback,” March 2010.); however, if according to local service policies weak identity verification is insufficient then the receiving entity SHOULD instead close the stream with a <policy-violation/> stream error.

6.4.6.
SASL Success

The receiving entity reports success of the handshake by sending a <success/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-sasl' namespace; this element MAY contain XML character data (in SASL terminology, "additional data with success") if the chosen SASL mechanism supports or requires it; if the receiving entity needs to send additional data of zero length, it MUST transmit the data as a single equals sign character ("=").

R: <success xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>

Informational Note: The authorization identity communicated during SASL negotiation is used to determine the canonical address for the initiating client according to the receiving server, as described under Section 4.2.6 (Determination of Addresses).

Upon receiving the <success/> element, the initiating entity MUST initiate a new stream over the existing TCP connection by sending a new initial stream header to the receiving entity.

Implementation Note: The initiating entity MUST NOT send a closing </stream> tag before sending the new initial stream header, since the receiving entity and initiating entity MUST consider the original stream to be replaced upon sending or receiving the <success/> element.

Upon receiving the new initial stream header from the initiating entity, the receiving entity MUST respond by sending a new response stream header to the initiating entity (for which it MUST generate a new stream ID instead of re-using the old stream ID).

6.5.
SASL Errors

The syntax of SASL errors is as follows, where "defined-condition" is one of the SASL-related error conditions defined in the following sections and XML data shown within the square brackets '[' and ']' is OPTIONAL.

Inclusion of the <text/> element is OPTIONAL, and can be used to provide application-specific information about the error condition, which information MAY be displayed to a human but only as a supplement to the defined condition.
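To make the structure concrete, the following sketch extracts the defined condition and the optional text from a SASL failure. The function name `parse_sasl_failure` and the sample element are hypothetical; the sketch assumes the layout described above, namely a <failure/> element containing one defined-condition child and, optionally, a <text/> child.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SASL_NS = "urn:ietf:params:xml:ns:xmpp-sasl"

def parse_sasl_failure(failure_xml: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (defined_condition, text) from a SASL <failure/> element."""
    failure = ET.fromstring(failure_xml)
    if failure.tag != f"{{{SASL_NS}}}failure":
        raise ValueError("not a SASL <failure/> element")
    condition, text = None, None
    for child in failure:
        name = child.tag.split("}")[-1]  # strip the namespace part
        if name == "text":
            text = (child.text or "").strip()
        elif condition is None:
            condition = name
    return condition, text

example_failure = (
    "<failure xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>"
    "<not-authorized/>"
    "<text xml:lang='en'>Hypothetical explanatory text</text>"
    "</failure>"
)
print(parse_sasl_failure(example_failure))
```

A caller would normally branch on the condition name and treat the text purely as supplementary information for a human.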

6.5.6.
invalid-authzid

The authzid provided by the initiating entity is invalid, either because it is incorrectly formatted or because the initiating entity does not have permissions to authorize that ID; sent in reply to a <response/> element or an <auth/> element with initial response data.

6.5.8.
malformed-request

The request is malformed (e.g., the <auth/> element includes initial response data but the mechanism does not allow that, or the data sent violates the syntax for the specified SASL mechanism); sent in reply to an <abort/>, <auth/>, <challenge/>, or <response/> element.

6.5.10.
not-authorized

The authentication failed because the initiating entity did not provide proper credentials or the receiving entity has detected an attack but wishes to disclose as little information as possible to the attacker; sent in reply to a <response/> element or an <auth/> element with initial response data.

Security Note: This error condition includes but is not limited to the case of incorrect credentials or a nonexistent username. In order to discourage directory harvest attacks, no differentiation is made between incorrect credentials and a nonexistent username.

6.5.11.
temporary-auth-failure

The authentication failed because of a temporary error condition within the receiving entity, and it is advisable for the initiating entity to try again later; sent in reply to an <auth/> element or a <response/> element.

6.5.12.
transition-needed

The authentication failed because the mechanism cannot be used until the initiating entity provides (for one time only) a plaintext password so that the receiving entity can build a hashed password for use in future authentication attempts; sent in reply to an <auth/> element with or without initial response data.

Security Note: An XMPP client MUST treat a <transition-needed/> SASL error with extreme caution, SHOULD NOT provide a plaintext password over an XML stream that is not encrypted via Transport Layer Security, and MUST warn a human user before allowing the user to provide a plaintext password over an unencrypted connection. Even so, the attacker could be located on the server, attempting to capture the plaintext password.

After the initiating entity provides an opening XML stream header and the receiving entity replies in kind, the receiving entity provides a list of acceptable authentication methods. The initiating entity chooses one method from the list and sends it to the receiving entity as the value of the 'mechanism' attribute possessed by an <auth/> element, optionally including an initial response to avoid a round trip.

exchange sequence:

Challenges and responses are carried through the exchange of <challenge/> elements from receiving entity to initiating entity and <response/> elements from initiating entity to receiving entity. The receiving entity reports failure by sending a <failure/> element and success by sending a <success/> element; the initiating entity aborts the exchange by sending an <abort/> element. Upon successful negotiation, both sides consider the original XML stream to be closed and new stream headers are sent by both entities.

7.
Resource Binding

7.1.
Fundamentals

After a client authenticates with a server, it MUST bind a specific resource to the stream so that the server can properly address the client. That is, there MUST be an XMPP resource associated with the bare JID (<localpart@domainpart>) of the client, so that the address for use over that stream is a full JID of the form <localpart@domainpart/resource> (including the resourcepart). This ensures that the server can deliver XML stanzas to and receive XML stanzas from the client in relation to entities other than the server itself or the client's account, as explained under Section 10 (Server Rules for Processing XML Stanzas) (the client could exchange stanzas with the server itself or the client's account before binding a resource since the full JID is needed only for addressing outside the context of the stream negotiated between the client and the server, but this is not commonly done).

After a client has bound a resource to the stream, it is referred to as a "connected resource". A server SHOULD allow an entity to maintain multiple connected resources simultaneously, where each connected resource is associated with a distinct XML stream and differentiated from the other connected resources by a distinct resourcepart.

Security Note: A server SHOULD enable the administrator of an XMPP service to limit the number of connected resources in order to prevent certain denial of service attacks as described under Section 13.12 (Denial of Service).

If, before completing the resource binding step, the client attempts to send an XML stanza to an entity other than the server itself or the client's account, the server MUST NOT process the stanza and MUST return a <not-authorized/> stream error to the client.

The XML namespace name for the resource binding extension is 'urn:ietf:params:xml:ns:xmpp-bind'.

7.3.
Stream Negotiation Rules

7.3.1.
Mandatory-to-Negotiate

7.3.2.
Restart

7.4.
Advertising Support

Upon sending a new response stream header to the client after successful SASL negotiation, the server MUST include a <bind/> element qualified by the 'urn:ietf:params:xml:ns:xmpp-bind' namespace in the stream features it presents to the client.

The server MUST NOT include the resource binding stream feature until after the client has authenticated, typically by means of successful SASL negotiation.

Once the server has generated an XMPP resourcepart for the client, it MUST return an IQ stanza of type "result" to the client, which MUST include a <jid/> child element that specifies the full JID for the connected resource as determined by the server.

7.6.2.
Error Cases

When a client asks the server to generate a resourcepart during resource binding, the following stanza error conditions are defined (and others not specified here are possible; see under Section 8.3 (Stanza Errors)):

The account has reached a limit on the number of simultaneous connected resources allowed.

7.7.
Client-Submitted Resource Identifier

7.7.1.
Success Case

A client asks its server to accept a client-submitted resourcepart by sending an IQ stanza of type "set" containing a <bind/> element with a child <resource/> element containing non-zero-length XML character data.

The server SHOULD accept the client-submitted resourcepart. It does so by returning an IQ stanza of type "result" to the client, including a <jid/> child element that specifies the full JID for the connected resource and contains without modification the client-submitted text.
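The request and result described here can be sketched as a pair of helpers. The names `build_bind_request` and `bound_jid`, the stanza identifier, and the sample JID are hypothetical. The sketch leaves the <iq/> element unqualified on the assumption that it inherits the stream's default namespace when written into an open stream.

```python
import xml.etree.ElementTree as ET
from typing import Optional

BIND_NS = "urn:ietf:params:xml:ns:xmpp-bind"

def build_bind_request(stanza_id: str, resource: Optional[str] = None) -> str:
    """Build an IQ-set asking the server to bind a resource.

    With resource=None the <bind/> element is empty, which asks the
    server to generate the resourcepart itself.
    """
    iq = ET.Element("iq", {"id": stanza_id, "type": "set"})
    bind = ET.SubElement(iq, f"{{{BIND_NS}}}bind")
    if resource:
        ET.SubElement(bind, f"{{{BIND_NS}}}resource").text = resource
    return ET.tostring(iq, encoding="unicode")

def bound_jid(result_xml: str) -> Optional[str]:
    """Return the full JID from an IQ-result, or None if not present."""
    iq = ET.fromstring(result_xml)
    if iq.get("type") != "result":
        return None
    jid = iq.find(f"{{{BIND_NS}}}bind/{{{BIND_NS}}}jid")
    return jid.text if jid is not None else None

example_result = (
    "<iq id='bind_1' type='result'>"
    "<bind xmlns='urn:ietf:params:xml:ns:xmpp-bind'>"
    "<jid>juliet@im.example.com/balcony</jid>"
    "</bind></iq>"
)
print(build_bind_request("bind_1", "balcony"))
print(bound_jid(example_result))
```

A client should use the JID returned by the server rather than the one it requested, since the server is free to substitute a different resourcepart.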

7.7.2.
Error Cases

When a client attempts to submit its own XMPP resourcepart during resource binding, the following stanza error conditions are defined in addition to those described under Section 7.6.2 (Error Cases) (and others not specified here are possible; see under Section 8.3 (Stanza Errors)):

7.7.2.2.
Conflict

If there is a currently-connected client whose session has the resourcepart being requested by the newly-connecting client, the server MUST do one of the following (which of these the server does is a matter for implementation or local service policy, although suggestions are provided below).

1. Override the resourcepart provided by the newly-connecting client with a server-generated resourcepart.

This behavior is encouraged, because it simplifies the resource binding process for client implementations.

2. Disallow the resource binding attempt of the newly-connecting client and maintain the session of the currently-connected client.

This behavior is neither encouraged nor discouraged, despite the fact that it was implicitly encouraged in RFC 3920; however, note that handling of the <conflict/> error described below is unevenly supported among existing client implementations, which often treat it as an authentication error and have been observed to discard cached credentials when receiving it.

3. Terminate the session of the currently-connected client and allow the resource binding attempt of the newly-connecting client.

Although this was the traditional behavior of early XMPP server implementations, it is now discouraged because it can lead to a never-ending cycle of two clients effectively disconnecting each other; however, note that this behavior can be appropriate in some deployment scenarios or if the server knows that the currently-connected client has a dead connection or broken stream as described under Section 4.5 (Handling of Silent Peers).

If the server follows behavior #1, it returns an <iq/> stanza of type "result" to the newly-connecting client, where the <jid/> child of the <bind/> element contains XML character data that indicates the full JID of the client, including the resourcepart that was generated by the server.

If the server follows behavior #2, it sends a <conflict/> stanza error in response to the resource binding attempt of the newly-connecting client but maintains the XML stream so that the newly-connecting client has an opportunity to negotiate a non-conflicting resourcepart (i.e., the newly-connecting client needs to choose a different resourcepart before making another attempt to bind a resource).

If the server follows behavior #3, it sends a <conflict/> stream error to the currently-connected client and returns an IQ stanza of type "result" (indicating success) in response to the resource binding attempt of the newly-connecting client.
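The three conflict behaviors can be summarized as a small decision function. This is a sketch only: the names `BindOutcome` and `bind_resource`, the policy labels "override", "reject", and "replace", and the use of a random hexadecimal string as the server-generated resourcepart are all hypothetical choices made for illustration.

```python
import uuid
from typing import NamedTuple, Optional, Set

class BindOutcome(NamedTuple):
    resourcepart: Optional[str]      # bound for the newly-connecting client
    new_client_error: Optional[str]  # stanza error returned to the new client
    old_client_error: Optional[str]  # stream error sent to the current client

def _generate_resourcepart(in_use: Set[str]) -> str:
    while True:
        candidate = uuid.uuid4().hex
        if candidate not in in_use:
            return candidate

def bind_resource(requested: str, in_use: Set[str],
                  policy: str = "override") -> BindOutcome:
    """Decide how to handle a requested resourcepart under a given policy."""
    if requested not in in_use:
        return BindOutcome(requested, None, None)
    if policy == "override":   # behavior #1: substitute a generated value
        return BindOutcome(_generate_resourcepart(in_use), None, None)
    if policy == "reject":     # behavior #2: keep the existing session
        return BindOutcome(None, "conflict", None)
    if policy == "replace":    # behavior #3: terminate the existing session
        return BindOutcome(requested, None, "conflict")
    raise ValueError(f"unknown policy: {policy!r}")

print(bind_resource("balcony", {"balcony"}, policy="reject"))
```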

7.7.3.
Retries

If an error occurs when a client submits a resourcepart, the server SHOULD allow a configurable but reasonable number of retries (at least 5 and no more than 10); this enables the client to tolerate incorrectly-provided resourceparts (e.g., bad data formats or duplicate text strings) without being forced to reconnect.

After the client has reached the retry limit, the server MUST return a <policy-violation/> stream error to the client.

8.
XML Stanzas

After a client and a server (or two servers) have completed stream negotiation, either party can send XML stanzas. Three kinds of XML stanza are defined for the 'jabber:client' and 'jabber:server' namespaces: <message/>, <presence/>, and <iq/>. In addition, there are five common attributes for these stanza types. These common attributes, as well as the basic semantics of the three stanza types, are defined in this specification; more detailed information regarding the syntax of XML stanzas for instant messaging and presence applications is provided in [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.), and for other applications in the relevant XMPP extension specifications.

Support for the XML stanza syntax and semantics defined in this specification is REQUIRED in XMPP client and server implementations.

Security Note: A server MUST NOT process a partial stanza and MUST NOT attach meaning to the transmission timing of any part of a stanza (before receipt of the close tag).

8.1.1.1.
Client-to-Server Streams

The following rules apply to inclusion of the 'to' attribute in stanzas sent from the client to the server over an XML stream qualified by the 'jabber:client' namespace.

A stanza with a specific intended recipient (e.g., a conversation partner, a remote service, the server itself, even another resource associated with the user's bare JID) MUST possess a 'to' attribute whose value is an XMPP address.

The following rules apply to inclusion of the 'to' attribute in stanzas sent from the server to the client over an XML stream qualified by the 'jabber:client' namespace.

If the server has received the stanza from another connected client or from another server, the server MUST NOT modify the 'to' address before delivering the stanza to the client.

If the server has itself generated the stanza (e.g., a response to an IQ stanza of type "get" or "set", even if the stanza did not include a 'to' address), the stanza MAY include a 'to' address, which MUST be the full JID of the client; however, if the stanza does not include a 'to' address then the client MUST treat it as if the 'to' address were included with a value of the client's full JID.

Implementation Note: It is the server's responsibility to deliver only stanzas that are addressed to the client's full JID or the user's bare JID; thus there is no need for the client to check the 'to' address of incoming stanzas. However, if the client does check the 'to' address then it is suggested to check at most the bare JID portion (not the full JID), since the 'to' address might be the user's bare JID, the client's current full JID, or even a full JID with a different resourcepart (e.g., in the case of so-called "offline messages" as described in [XEP‑0160] (Saint-Andre, P., “Best Practices for Handling Offline Messages,” January 2006.)).

When the server generates a stanza from the server itself for delivery to the client, the stanza MUST include a 'from' attribute whose value is the bare JID (i.e., <domainpart>) of the server as agreed upon during stream negotiation (e.g., based on the 'to' attribute of the initial stream header).

When the server generates a stanza from the server for delivery to the client on behalf of the account of the connected client (e.g., in the context of data storage services provided by the server on behalf of the client), the stanza MUST either (a) not include a 'from' attribute or (b) include a 'from' attribute whose value is the account's bare JID (<localpart@domainpart>).

A server MUST NOT send to the client a stanza without a 'from' attribute if the stanza was not generated by the server (e.g., if it was generated by another client or another server); therefore, when a client receives a stanza that does not include a 'from' attribute, it MUST assume that the stanza is from the user's account on the server.
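On the client side, the defaulting rules for absent 'from' and 'to' attributes can be sketched as follows. The function name `effective_addresses` and the sample JIDs are hypothetical; the sketch assumes the stanza attributes have already been parsed into a dictionary and that the client knows the full JID it bound for the stream.

```python
from typing import Dict, Tuple

def effective_addresses(attrs: Dict[str, str], full_jid: str) -> Tuple[str, str]:
    """Return the (from, to) addresses a client should assume for a
    stanza received from its server.

    A missing 'from' is treated as the user's account (bare JID);
    a missing 'to' is treated as the client's own full JID.
    """
    bare_jid = full_jid.split("/", 1)[0]
    return attrs.get("from", bare_jid), attrs.get("to", full_jid)

my_jid = "juliet@im.example.com/balcony"
print(effective_addresses({}, my_jid))
print(effective_addresses({"from": "im.example.com"}, my_jid))
```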

8.1.3.
id

The 'id' attribute is used by the entity that generates a stanza ("the originating entity") to track any response or error stanza that it might receive in relation to the generated stanza from another entity (such as an intermediate server or the intended recipient).

It is up to the originating entity whether the value of the 'id' attribute will be unique only within its current stream or unique globally.

For <message/> and <presence/> stanzas, it is RECOMMENDED for the originating entity to include an 'id' attribute; for <iq/> stanzas, it is REQUIRED.

If the generated stanza includes an 'id' attribute then it is REQUIRED for the response or error stanza to also include an 'id' attribute, where the value of the 'id' attribute MUST match that of the generated stanza.
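An originating entity might use the 'id' attribute to correlate responses along these lines. The class name `PendingRequests` and the "req-N" identifier format are hypothetical; the sketch generates identifiers that are unique only within the current stream, which is one of the two options the text allows.

```python
import itertools
from typing import Dict, Optional

class PendingRequests:
    """Correlate response or error stanzas with generated stanzas by 'id'."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._pending: Dict[str, str] = {}

    def register(self, description: str) -> str:
        """Allocate a stream-unique id and remember what it was for."""
        stanza_id = f"req-{next(self._counter)}"
        self._pending[stanza_id] = description
        return stanza_id

    def resolve(self, response_id: str) -> Optional[str]:
        """Return the matching request (once), or None if the id is unknown."""
        return self._pending.pop(response_id, None)

tracker = PendingRequests()
first = tracker.register("roster fetch")
second = tracker.register("version query")
print(first, second)
print(tracker.resolve(second))
```

Removing the entry on first match means a duplicated or replayed response with the same 'id' is reported as unknown rather than processed twice.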

If an outbound stanza generated by a client does not possess an 'xml:lang' attribute, the client's server SHOULD add an 'xml:lang' attribute whose value is that specified for the stream as defined under Section 4.6.4 (xml:lang).

If an inbound stanza received by a client or server does not possess an 'xml:lang' attribute, an implementation MUST assume that the default language is that specified for the stream as defined under Section 4.6.4 (xml:lang).

8.2.
Basic Semantics

8.2.1.
Message Semantics

The <message/> stanza can be seen as a "push" mechanism whereby one entity pushes information to another entity, similar to the communications that occur in a system such as email. All message stanzas SHOULD possess a 'to' attribute that specifies the intended recipient of the message; upon receiving such a stanza, a server SHOULD route or deliver it to the intended recipient (see Section 10 (Server Rules for Processing XML Stanzas) for general routing and delivery rules related to XML stanzas).

8.2.2.
Presence Semantics

The <presence/> stanza can be seen as a specialized broadcast or "publish-subscribe" mechanism, whereby multiple entities receive information (in this case, network availability information) about an entity to which they have subscribed. In general, a publishing entity (client) SHOULD send a presence stanza with no 'to' attribute, in which case the server to which the entity is connected SHOULD broadcast that stanza to all subscribed entities. However, a publishing entity MAY also send a presence stanza with a 'to' attribute, in which case the server SHOULD route or deliver that stanza to the intended recipient. See Section 10 (Server Rules for Processing XML Stanzas) for general routing and delivery rules related to XML stanzas, and [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.) for rules specific to presence applications.

The 'type' attribute is REQUIRED for IQ stanzas. The value MUST be one of the following (if the value is other than one of the following strings, the recipient or an intermediate router MUST return a stanza error of <bad-request/>):

get -- The stanza requests information, inquires about what data is needed in order to complete further operations, etc.

set -- The stanza provides data that is needed for an operation to be completed, sets new values, replaces existing values, etc.

result -- The stanza is a response to a successful get or set request.

error -- The stanza reports an error that has occurred regarding processing or delivery of a previously-sent get or set request (see Section 8.3 (Stanza Errors)).

An entity that receives an IQ request of type "get" or "set" MUST reply with an IQ response of type "result" or "error". The response MUST preserve the 'id' attribute of the request (or be empty if the generated stanza did not include an 'id' attribute).

An entity that receives a stanza of type "result" or "error" MUST NOT respond to the stanza by sending a further IQ response of type "result" or "error"; however, the requesting entity MAY send another request (e.g., an IQ of type "set" to provide obligatory information discovered through a get/result pair).

An IQ stanza of type "get" or "set" MUST contain exactly one child element, which specifies the semantics of the particular request.

An IQ stanza of type "result" MUST include zero or one child elements.

An IQ stanza of type "error" MAY include the child element contained in the associated "get" or "set" and MUST include an <error/> child; for details, see Section 8.3 (Stanza Errors).
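Implementation Note: The IQ rules above lend themselves to a simple structural check. The following Python sketch is non-normative; the helper name check_iq is hypothetical, and returning <bad-request/> for a wrong number of child elements is an assumption of this sketch (the text above mandates <bad-request/> only for an unrecognized 'type' value).

```python
import xml.etree.ElementTree as ET

IQ_TYPES = {"get", "set", "result", "error"}


def local_name(element):
    """Element name without any '{namespace}' prefix added by ElementTree."""
    return element.tag.rsplit("}", 1)[-1]


def check_iq(iq):
    """Return None if the IQ is structurally acceptable, else an error condition name."""
    iq_type = iq.get("type")
    if iq_type not in IQ_TYPES:
        return "bad-request"  # missing or unrecognized 'type'
    children = list(iq)
    if iq_type in ("get", "set") and len(children) != 1:
        return "bad-request"  # assumption: exactly one payload child is enforced here
    if iq_type == "result" and len(children) > 1:
        return "bad-request"  # assumption: zero or one child is enforced here
    if iq_type == "error" and not any(local_name(c) == "error" for c in children):
        return "bad-request"  # assumption: an <error/> child is enforced here
    return None


roster_get = ET.fromstring(
    "<iq type='get' id='a1'><query xmlns='jabber:iq:roster'/></iq>"
)
print(check_iq(roster_get))
```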

8.3.
Stanza Errors

Stanza-related errors are handled in a manner similar to stream errors (see Section 4.8 (Stream Errors)). Unlike stream errors, stanza errors are recoverable; therefore they do not result in termination of the XML stream and underlying TCP connection. Instead, the entity that discovers the error condition returns an error stanza, which is a stanza that:

is of the same kind (message, presence, or IQ) as the generated stanza that triggered the error

has a 'type' attribute set to a value of "error"

swaps the 'from' and 'to' addresses of the generated stanza

mirrors the 'id' attribute (if any) of the generated stanza that triggered the error

contains an <error/> child element that specifies the error condition and therefore provides a hint regarding actions that the sender can take to remedy the error (if possible)

8.3.1.
Rules

The receiving or processing entity that detects an error condition in relation to a stanza SHOULD return an error stanza (and MUST do so for IQ stanzas).

The error stanza MUST simply swap the 'from' and 'to' addresses from the generated stanza.

If the generated stanza was <message/> or <presence/> and included an 'id' attribute then it is REQUIRED for the error stanza to also include an 'id' attribute. If the generated stanza was <iq/> then the error stanza MUST include an 'id' attribute. In all cases, the value of the 'id' attribute MUST match that of the generated stanza (or be empty if the generated stanza did not include an 'id' attribute).

An error stanza MUST contain an <error/> child element.

The entity that returns an error stanza MAY pass along its JID to the sender of the generated stanza (e.g., for diagnostic or tracking purposes) through the addition of a 'by' attribute to the <error/> child element.

The entity that returns an error stanza MAY include the original XML sent so that the sender can inspect and, if necessary, correct the XML before attempting to resend (however, this is a courtesy only and the originating entity MUST NOT depend on receiving the original payload).

An <error/> child MUST NOT be included if the 'type' attribute has a value other than "error" (or if there is no 'type' attribute).

An entity that receives an error stanza MUST NOT respond to the stanza with a further error stanza; this helps to prevent looping.
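Implementation Note: The rules above can be combined into a single routine that derives an error stanza from the generated stanza. The following Python sketch is non-normative; the function name make_error_reply and its parameters are hypothetical. It assumes that the defined condition is qualified by the 'urn:ietf:params:xml:ns:xmpp-stanzas' namespace and that the 'type' attribute of the <error/> child carries the associated error type (e.g., "cancel", "modify", or "wait").

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def make_error_reply(stanza, condition, error_type, by=None):
    """Build an error stanza for `stanza`, or return None if no reply is allowed."""
    if stanza.get("type") == "error":
        return None  # never answer an error stanza with a further error stanza
    # Same kind (message, presence, or IQ) as the generated stanza.
    reply = ET.Element(stanza.tag, {"type": "error"})
    # Swap 'from' and 'to'; mirror 'id' if the generated stanza had one.
    for source, target in (("to", "from"), ("from", "to"), ("id", "id")):
        if stanza.get(source) is not None:
            reply.set(target, stanza.get(source))
    # Keep <error/> in the same content namespace as the stanza itself.
    tag = stanza.tag
    content_ns = tag[: tag.index("}") + 1] if tag.startswith("{") else ""
    error = ET.SubElement(reply, content_ns + "error", {"type": error_type})
    if by is not None:
        error.set("by", by)  # optional JID of the entity returning the error
    ET.SubElement(error, "{%s}%s" % (STANZAS_NS, condition))
    return reply


request = ET.fromstring(
    "<iq from='juliet@im.example.com/balcony' to='im.example.com' "
    "id='x1' type='get'><query xmlns='urn:example:unknown'/></iq>"
)
reply = make_error_reply(request, "service-unavailable", "cancel", by="im.example.com")
print(ET.tostring(reply, encoding="unicode"))
```

The sketch omits the optional courtesy of echoing the original payload; an implementation that includes it would copy the children of the generated stanza into the reply before appending the <error/> child.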

8.3.2.
Syntax

The syntax for stanza-related errors is as follows, where XML data shown within the square brackets '[' and ']' is OPTIONAL, 'intended-recipient' is the JID of the entity to which the original stanza was addressed, and 'sender' is the JID of the originating entity.

MAY contain a <text/> child element containing XML character data that describes the error in more detail; this element MUST be qualified by the 'urn:ietf:params:xml:ns:xmpp-stanzas' namespace and SHOULD possess an 'xml:lang' attribute specifying the natural language of the XML character data.

MAY contain a child element for an application-specific error condition; this element MUST be qualified by an application-specific namespace that defines the syntax and semantics of the element.

The <text/> element is OPTIONAL. If included, it MUST be used only to provide descriptive or diagnostic information that supplements the meaning of a defined condition or application-specific condition. It MUST NOT be interpreted programmatically by an application. It MUST NOT be used as the error message presented to a human user, but MAY be shown in addition to the error message associated with the defined condition element (and, optionally, the application-specific condition element).

8.3.3.
Defined Conditions

8.3.3.1.
bad-request

The sender has sent a stanza containing XML that does not conform to the appropriate schema or that cannot be processed (e.g., an IQ stanza that includes an unrecognized value of the 'type' attribute, or an element that is qualified by a recognized namespace but that violates the defined syntax for the element); the associated error type SHOULD be "modify".

8.3.3.3.
feature-not-implemented

The feature represented in the XML stanza is not implemented by the intended recipient or an intermediate server and therefore the stanza cannot be processed (e.g., the entity understands the namespace but does not recognize the element name); the associated error type SHOULD be "cancel" or "modify".

Security Note: An application MUST NOT return this error if doing so would provide information about the intended recipient's network availability to an entity that is not authorized to know such information; instead it MUST return a <service-unavailable/> stanza error.

Implementation Note: Enforcement of the format for XMPP localparts is primarily the responsibility of the service at which the associated account or entity is located (e.g., the example.com service is responsible for returning <jid-malformed/> errors related to all JIDs of the form <localpart@example.com>), whereas enforcement of the format for XMPP domainparts is primarily the responsibility of the service that seeks to route a stanza to the service identified by that domainpart (e.g., the example.org service is responsible for returning <jid-malformed/> errors related to stanzas that users of that service have tried to send to JIDs of the form <localpart@example.com>). However, any entity that detects a malformed JID MAY return this error.

8.3.3.9.
not-acceptable

The recipient or server understands the request but cannot process it because the request does not meet criteria defined by the recipient or server (e.g., a request to subscribe to information that does not simultaneously include configuration parameters needed by the recipient); the associated error type SHOULD be "modify".

8.3.3.13.
policy-violation

The entity has violated some local service policy (e.g., a message contains words that are prohibited by the service); the server MAY choose to specify the policy in the <text/> element or in an application-specific condition element; the associated error type SHOULD be "modify" or "wait" depending on the policy being violated.

(In the following example, the client sends an XMPP message that is too large according to the server's local service policy.)

Security Note: An application MUST NOT return this error if doing so would provide information about the intended recipient's network availability to an entity that is not authorized to know such information; instead it MUST return a <service-unavailable/> stanza error.

8.3.3.17.
remote-server-not-found

A remote server or service specified as part or all of the JID of the intended recipient does not exist or cannot be resolved (e.g., there is no _xmpp-server._tcp DNS SRV record, the A or AAAA fallback resolution fails, or A/AAAA lookup succeeds but there is no response on the IANA-registered port 5269); the associated error type SHOULD be "cancel".

8.3.3.18.
remote-server-timeout

A remote server or service specified as part or all of the JID of the intended recipient (or needed to fulfill a request) was resolved but communications could not be established within a reasonable amount of time (e.g., an XML stream cannot be established at the resolved IP address and port, or an XML stream can be established but stream negotiation fails because of problems with TLS, SASL, Server Dialback, etc.); the associated error type SHOULD be "wait".

Security Note: An application MUST return a <service-unavailable/> stanza error instead of <item-not-found/> or <recipient-unavailable/> if sending one of the latter errors would provide information about the intended recipient's network availability to an entity that is not authorized to know such information.

8.3.3.22.
undefined-condition

The error condition is not one of those defined by the other conditions in this list; any error type can be associated with this condition, and it SHOULD be used only in conjunction with an application-specific condition.

8.3.4.
Application-Specific Conditions

As noted, an application MAY provide application-specific stanza error information by including a properly-namespaced child within the error element. Typically, the application-specific element supplements or further qualifies a defined element. Thus, the <error/> element will contain two or three child elements.
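Implementation Note: The following non-normative Python sketch assembles an <error/> element that contains a defined condition, an optional <text/> element, and an optional application-specific condition, in that order. The function name build_error is hypothetical, and the namespace 'urn:example:app-errors' and element name 'too-many-parameters' are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET

STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_error(error_type, condition, text=None, lang="en", app_condition=None):
    """Return an <error/> element holding one, two, or three child elements."""
    error = ET.Element("error", {"type": error_type})
    # Defined condition, qualified by the xmpp-stanzas namespace.
    ET.SubElement(error, "{%s}%s" % (STANZAS_NS, condition))
    if text is not None:
        # Descriptive text only; never to be interpreted programmatically.
        text_el = ET.SubElement(error, "{%s}text" % STANZAS_NS)
        text_el.set(XML_LANG, lang)
        text_el.text = text
    if app_condition is not None:
        # Application-specific condition in its own (here hypothetical) namespace.
        namespace, name = app_condition
        ET.SubElement(error, "{%s}%s" % (namespace, name))
    return error


error = build_error(
    "modify",
    "bad-request",
    text="The request carried more parameters than this service accepts.",
    app_condition=("urn:example:app-errors", "too-many-parameters"),
)
print(ET.tostring(error, encoding="unicode"))
```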

A message or presence stanza MAY contain one or more optional child elements specifying content that extends the meaning of the message (e.g., an XHTML-formatted version of the message body as described in [XEP‑0071] (Saint-Andre, P., “XHTML-IM,” September 2008.)), and an IQ stanza of type "get" or "set" MUST contain one such child element. Such a child element MAY have any name and MUST possess a namespace declaration (other than "jabber:client", "jabber:server", or "http://etherx.jabber.org/streams") that defines the data contained within the child element. Such a child element is called an "extension element". An extension element can be included either at the direct child level of the stanza or in any mix of levels.

Similarly, "extension attributes" are allowed. That is: a stanza itself (i.e., the <iq/>, <message/>, and <presence/> elements qualified by the "jabber:client" or "jabber:server" content namespace) and any child element of such a stanza (whether an extension element or a child element qualified by the content namespace) MAY also include one or more attributes qualified by XML namespaces other than the content namespace or the reserved "http://www.w3.org/XML/1998/namespace" namespace (including the so-called "empty namespace" if the attribute is not prefixed; see [XML‑NAMES] (Thompson, H., Hollander, D., Layman, A., Bray, T., and R. Tobin, “Namespaces in XML 1.0 (Third Edition),” December 2009.)).

Interoperability Note: For the sake of backward compatibility and maximum interoperability, an entity that generates a stanza SHOULD NOT include such attributes in the stanza itself or in child elements of the stanza that are qualified by the content namespaces "jabber:client" or "jabber:server" (e.g., the <body/> child of the <message/> stanza).

An extension element or extension attribute is said to be "extended content" and the namespace name for such an element or attribute is said to be an "extended namespace".

Informational Note: Although extended namespaces for XMPP are commonly defined by the XMPP Standards Foundation (XSF) and by the IETF, no specification or IETF standards action is required to define extended namespaces, and any individual or organization is free to define XMPP extensions.

To illustrate these concepts, several examples follow.

The following stanza contains one direct child element whose extended namespace is 'jabber:iq:roster':

The following stanza contains two child elements, one of which is qualified by the "jabber:client" or "jabber:server" content namespace and one of which is qualified by an extended namespace; the extension element in turn contains a child element that is qualified by a different extended namespace.
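Implementation Note: Stanzas of the two shapes just described can be inspected programmatically. The following non-normative Python sketch embeds two illustrative stanzas (the addresses and 'id' values are made up for this illustration) and lists the extended namespaces each one uses; the helper names namespace_of and extended_namespaces are hypothetical.

```python
import xml.etree.ElementTree as ET

CONTENT_NAMESPACES = {"jabber:client", "jabber:server"}

# One direct child element qualified by the 'jabber:iq:roster' namespace.
ROSTER_GET = """
<iq xmlns='jabber:client' from='juliet@im.example.com/balcony'
    id='rg1' type='get'>
  <query xmlns='jabber:iq:roster'/>
</iq>
"""

# One content-namespace child (<body/>) plus an extension element that
# itself contains a child qualified by a different extended namespace.
XHTML_MESSAGE = """
<message xmlns='jabber:client' to='juliet@im.example.com'>
  <body>Hello?</body>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'><p>Hello?</p></body>
  </html>
</message>
"""


def namespace_of(element):
    """Namespace name of an element as parsed by ElementTree ('' if none)."""
    tag = element.tag
    return tag[1 : tag.index("}")] if tag.startswith("{") else ""


def extended_namespaces(stanza):
    """Extended namespaces used at any level below the stanza, in document order."""
    found = []
    for element in stanza.iter():
        if element is stanza:
            continue
        ns = namespace_of(element)
        if ns and ns not in CONTENT_NAMESPACES and ns not in found:
            found.append(ns)
    return found


print(extended_namespaces(ET.fromstring(ROSTER_GET)))
print(extended_namespaces(ET.fromstring(XHTML_MESSAGE)))
```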

It is conventional in the XMPP community for implementations to not generate namespace prefixes for elements that are qualified by extended namespaces (outside the XMPP community, this convention is sometimes called "prefix-free canonicalization"). However, if an implementation generates such namespace prefixes then it MUST include the namespace declaration in the stanza itself or a child element of the stanza, not in the stream header (see Section 4.7.3 (Other Namespaces)).

Routing entities (typically servers) SHOULD try to maintain prefixes when serializing XML stanzas for processing, but receiving entities MUST NOT depend on the prefix strings to have any particular value.

Support for any given extended namespace is OPTIONAL on the part of any implementation. If an entity does not understand such a namespace, the entity's expected behavior depends on whether the entity is (1) the recipient or (2) a server that is routing or delivering the stanza to the recipient.

If a recipient receives a stanza that contains an element or attribute it does not understand, it MUST NOT attempt to process that XML data and instead MUST proceed as follows.

If an entity receives a message stanza whose only child element is qualified by a namespace it does not understand, then depending on the XMPP application it MUST either ignore the entire stanza or return a stanza error, which SHOULD be <service-unavailable/>.

If an entity receives a presence stanza whose only child element is qualified by a namespace it does not understand, then it MUST ignore the child element by treating the presence stanza as if it contained no child element.

If an entity receives a message or presence stanza that contains XML data qualified by a namespace it does not understand, then it MUST ignore the portion of the stanza qualified by the unknown namespace.

If an entity receives an IQ stanza of type "get" or "set" containing a child element qualified by a namespace it does not understand, then the entity MUST return an IQ stanza of type "error" with an error condition of <service-unavailable/>.

If a server handles a stanza that is intended for delivery to another entity and that contains a child element it does not understand, it MUST route the stanza unmodified to a remote server or deliver the stanza unmodified to a connected client associated with a local account.
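Implementation Note: The recipient-side rules above can be expressed as a small decision function. The following Python sketch is non-normative; the function name handle_as_recipient and the returned action labels are hypothetical, and the treatment of IQ stanzas of type "result" or "error" (which the rules above do not address) is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Namespace name of an element as parsed by ElementTree ('' if none)."""
    tag = element.tag
    return tag[1 : tag.index("}")] if tag.startswith("{") else ""


def handle_as_recipient(stanza, understood):
    """Return (action, detail) for a stanza, given the set of understood namespaces."""
    kind = stanza.tag.rsplit("}", 1)[-1]
    children = list(stanza)
    known = [c for c in children if namespace_of(c) in understood]
    has_unknown = len(known) < len(children)
    if kind == "iq":
        if stanza.get("type") in ("get", "set") and has_unknown:
            return ("error", "service-unavailable")
        return ("process", known)  # assumption: other IQ types are processed as usual
    if kind == "message" and len(children) == 1 and has_unknown:
        # Depending on the application: ignore the stanza or return an error.
        return ("ignore-or-error", "service-unavailable")
    # Presence with only an unknown child, or mixed content: drop the unknown part.
    return ("process", known)


understood = {"jabber:client"}
presence = ET.fromstring(
    "<presence xmlns='jabber:client'><x xmlns='urn:example:unknown'/></presence>"
)
print(handle_as_recipient(presence, understood))
```

A server that merely routes or delivers the stanza does none of this; it passes the stanza along unmodified.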

9.
Examples

9.1.
Client-to-Server Examples

The following examples show the XMPP data flow for a client negotiating an XML stream with a server, exchanging XML stanzas, and closing the negotiated stream. The server is "im.example.com", the server requires use of TLS, the client authenticates via the SASL SCRAM-SHA-1 mechanism as <juliet@im.example.com>, and the client binds a client-submitted resource to the stream. It is assumed that before sending the initial stream header, the client has already resolved an SRV record of _xmpp-client._tcp.im.example.com and has opened a TCP connection to the advertised port at the resolved IP address.

9.2.
Server-to-Server Examples

The following examples show the data flow for a server negotiating an XML stream with another server, exchanging XML stanzas, and closing the negotiated stream. The initiating server ("Server1") is im.example.com; the receiving server ("Server2") is example.net and it requires use of TLS; im.example.com presents a certificate and authenticates via the SASL EXTERNAL mechanism. It is assumed that before sending the initial stream header, Server1 has already resolved an SRV record of _xmpp-server._tcp.example.net and has opened a TCP connection to the advertised port at the resolved IP address. Note how Server1 declares the content namespace "jabber:server" as the default namespace and uses prefixes for stream-related elements, whereas Server2 uses prefix-free canonicalization.

9.2.3.
Stanza Exchange

Now Server1 is allowed to send XML stanzas to Server2 over the negotiated stream from im.example.com to example.net; here we assume that the transferred stanzas are those shown earlier for client-to-server communication, albeit over a server-to-server stream qualified by the 'jabber:server' namespace.

9.2.4.
Close

Desiring to send no further messages, Server1 closes the stream. (In practice, the stream would most likely remain open for some time, since Server1 and Server2 do not immediately know if the stream will be needed for further communication.)

S1: </stream:stream>

Consistent with the recommended stream closing handshake, Server2 closes the stream as well:

10.
Server Rules for Processing XML Stanzas

Each server implementation will contain its own logic for processing stanzas it receives. Such logic determines whether the server needs to route a given stanza to another domain, deliver it to a local entity (typically a connected client associated with a local account), or handle it directly within the server itself. This section provides general rules for processing XML stanzas. However, particular XMPP applications MAY specify delivery rules that modify or supplement the following rules (e.g., a set of delivery rules for instant messaging and presence applications is defined in [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.)).

10.1.
In-Order Processing

An XMPP server MUST ensure in-order processing of the stanzas and other XML elements it receives over a given stream from a connected client or remote server (for purposes of this section we describe such a stream as an "input stream", in contrast to an "output stream" that a server would use to deliver data to a connected client or to route data to a remote server).

In-order processing applies (a) to any XML elements used to negotiate and manage XML streams, and (b) to all uses of XML stanzas, including but not limited to the following:

Stanzas sent by a connected client and intended for delivery to another entity associated with a local domain (e.g., stanzas addressed from <juliet@im.example.com> to <nurse@im.example.com>). The server MUST ensure that it delivers stanzas addressed to the intended recipient in the order it receives them over the input stream from the sending client, treating stanzas addressed to the bare JID and the full JID of the intended recipient as equivalent for delivery purposes.

Stanzas sent by a connected client and intended for delivery to an entity located at a remote domain (e.g., stanzas addressed from <juliet@im.example.com> to <romeo@example.net>). The routing server MUST ensure that it routes stanzas addressed to the intended recipient in the order it receives them over the input stream from the sending client, treating stanzas addressed to the bare JID and the full JID of the intended recipient as equivalent for routing purposes. To help ensure in-order processing, the routing server MUST route such stanzas over a single output stream to the remote domain, rather than sending some stanzas over one server-to-server stream and other stanzas over another server-to-server stream.

Stanzas routed from one server to another server for delivery to an entity associated with the remote domain (e.g., stanzas addressed from <juliet@im.example.com> to <romeo@example.net> and routed by <im.example.com> over a server-to-server stream to <example.net>). The delivering server MUST ensure that it delivers stanzas to the intended recipient in the order it receives them over the input stream from the routing server, treating stanzas addressed to the bare JID and the full JID of the intended recipient as equivalent for delivery purposes.

Stanzas sent by one server to another server for direct processing by the server that is hosting the remote domain (e.g., stanzas addressed from <im.example.com> to <example.net>).

If the server's processing of a particular request could have an effect on its processing of subsequent data it might receive over that input stream (e.g., enforcement of communication policies), it MUST suspend processing of subsequent data until it has processed the request.

In-order processing applies only to a single input stream. Therefore a server is not responsible for ensuring the coherence of data it receives across multiple input streams associated with the same local account (e.g., stanzas received over two different input streams from <juliet@im.example.com/balcony> and <juliet@im.example.com/chamber>) or the same remote domain (e.g., two different input streams negotiated by a remote domain; however, a server MAY return a <conflict/> stream error to a remote server that attempts to negotiate more than one stream, as described under Section 4.8.3.3 (conflict)).

How a server processes stanzas sent to the bare JID <localpart@domainpart> has implications for directory harvesting.

How a server processes stanzas sent to a full JID has implications for presence leaks. However, the attack is less direct here (because the attacker needs to try many different resources in an attempt to find the one resource that matches) so it is of somewhat lesser concern.

10.3.
No 'to' Address

If the stanza possesses no 'to' attribute, the server MUST handle it directly on behalf of the entity that sent it, where the meaning of "handle it directly" depends on whether the stanza is message, presence, or IQ. Because all stanzas received from other servers MUST possess a 'to' attribute, this rule applies only to stanzas received from a local entity (typically a client) that is connected to the server.

10.3.3.
IQ

If the server receives an IQ stanza with no 'to' attribute, it MUST process the stanza on behalf of the account from which it received the stanza, as follows:

If the IQ stanza is of type "get" or "set" and the server understands the namespace that qualifies the payload, the server MUST handle the stanza on behalf of the sending entity or return an appropriate error to the sending entity. Although the meaning of "handle" is determined by the semantics of the qualifying namespace, in general the server will respond to the IQ stanza of type "get" or "set" by returning an appropriate IQ stanza of type "result" or "error", responding as if the server were the bare JID of the sending entity. As an example, if the sending entity sends an IQ stanza of type "get" where the payload is qualified by the 'jabber:iq:roster' namespace (as described in [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.)), then the server will return the roster associated with the sending entity's bare JID to the particular resource of the sending entity that requested the roster.

If the IQ stanza is of type "get" or "set" and the server does not understand the namespace that qualifies the payload, the server MUST return an error to the sending entity, which MUST be <service-unavailable/>.

If the IQ stanza is of type "error" or "result", the server MUST handle the error or result in accordance with the payload of the associated IQ stanza of type "get" or "set" (if there is no such associated stanza, the server MUST ignore the error or result stanza).

10.4.
Remote Domain

If the domainpart of the JID contained in the 'to' attribute does not match one of the configured hostnames of the server, the server SHOULD attempt to route the stanza to the remote domain (subject to local service provisioning and security policies regarding inter-domain communication, since such communication is optional for any given deployment). As described in the following sections, there are two possible cases.

Security Note: These rules apply only to client-to-server streams. As described under Section 8.1.1.2 (Server-to-Server Streams), a server MUST NOT accept a stanza over a server-to-server stream if the domainpart of the JID in the 'to' attribute does not match a hostname serviced by the receiving server.

10.4.3.
Error Handling

If routing of a stanza to the intended recipient's server is unsuccessful, the sender's server MUST return an error to the sender. If resolution of the remote domain is unsuccessful, the stanza error MUST be <remote-server-not-found/>. If resolution succeeds but streams cannot be negotiated, the stanza error MUST be <remote-server-timeout/>.

If stream negotiation with the intended recipient's server is successful but the remote server cannot deliver the stanza to the recipient, the remote server MUST return an appropriate error to the sender by way of the sender's server.
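Implementation Note: The mapping from routing failures to stanza errors can be captured in a lookup table. The following Python sketch is non-normative; the failure labels "resolution-failed" and "negotiation-failed" and the function name routing_error are hypothetical. The error types paired with each condition are those recommended under Section 8.3.3 (Defined Conditions).

```python
# Failure observed by the sender's server -> (stanza error condition, error type).
ROUTING_ERRORS = {
    # The remote domain could not be resolved.
    "resolution-failed": ("remote-server-not-found", "cancel"),
    # Resolution succeeded but streams could not be negotiated.
    "negotiation-failed": ("remote-server-timeout", "wait"),
}


def routing_error(failure):
    """Return the (condition, error type) pair to send back to the sender."""
    return ROUTING_ERRORS[failure]


print(routing_error("resolution-failed"))
```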

10.5.
Local Domain

If the hostname of the domainpart of the JID contained in the 'to' attribute matches one of the configured hostnames of the server, the server MUST first determine if the hostname is serviced by the server itself or by a specialized local service. If the latter, the server MUST route the stanza to that service. If the former, the server MUST proceed as follows.

10.5.1.
Mere Domain

10.5.2.
Domain with Resource

If the JID contained in the 'to' attribute is of the form <domainpart/resourcepart>, then the server MUST either (a) handle the stanza as appropriate for the stanza kind or (b) return an error stanza to the sender.

10.5.3.2.
Bare JID

If the JID contained in the 'to' attribute is of the form <localpart@domainpart>, how the stanza is processed depends on the stanza type.

For a message stanza, if there exists at least one connected resource for the account the server SHOULD deliver it to at least one of the connected resources. If there exists no connected resource, the server MUST either (a) store the message offline for delivery when the account next has a connected resource or (b) return a <service-unavailable/> stanza error.

10.5.3.3.
Full JID

If the JID contained in the 'to' attribute is of the form <localpart@domainpart/resourcepart> and there is no connected resource that exactly matches the full JID, the stanza SHOULD be processed as if the JID were of the form <localpart@domainpart>.

If the JID contained in the 'to' attribute is of the form <localpart@domainpart/resourcepart> and there is a connected resource that exactly matches the full JID, the server MUST deliver the stanza to that connected resource.
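Implementation Note: The local-domain rules for message stanzas addressed to an account can be sketched as follows. This Python illustration is non-normative; the function names split_jid, jid_form, and deliver_message and the returned action labels are hypothetical. The sketch assumes that all JIDs have already been validated and normalized, so that plain string comparison is sufficient.

```python
def split_jid(jid):
    """Split a JID into (localpart, domainpart, resourcepart); absent parts are None."""
    bare, _, resource = jid.partition("/")
    left, at, right = bare.partition("@")
    local, domain = (left, right) if at else (None, bare)
    return local, domain, (resource or None)


def jid_form(jid):
    """Classify the form of the JID found in the 'to' attribute."""
    local, _, resource = split_jid(jid)
    if local is None:
        return "domain-with-resource" if resource else "mere-domain"
    return "full-jid" if resource else "bare-jid"


def deliver_message(to, connected):
    """Decide how to handle a message; `connected` is a set of connected full JIDs."""
    local, domain, resource = split_jid(to)
    if local is None:
        raise ValueError("this sketch only covers JIDs that include a localpart")
    if resource and to in connected:
        return ("deliver", [to])  # exact full-JID match
    # No exact match, or a bare JID: fall back to the account's connected resources.
    bare = "%s@%s" % (local, domain)
    candidates = sorted(j for j in connected if j.startswith(bare + "/"))
    if candidates:
        return ("deliver-to-at-least-one", candidates)
    return ("store-offline-or-error", "service-unavailable")


connected = {"juliet@im.example.com/balcony", "juliet@im.example.com/chamber"}
print(deliver_message("juliet@im.example.com/garden", connected))
```

Which of the candidate resources actually receives the message is a matter of application-level delivery rules and is deliberately left open here.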

An XMPP implementation MUST behave as follows with regard to these features:

An XMPP implementation MUST NOT inject characters matching such features into an XML stream.

If an XMPP implementation receives characters matching such features over an XML stream, it MUST return a stream error, which SHOULD be <restricted-xml/> (although some existing implementations send <bad-format/> instead).

An XMPP entity MUST NOT generate data that is not XML-well-formed. An XMPP entity MUST NOT accept data that is not XML-well-formed; instead it MUST return a <not-well-formed/> stream error and close the stream over which the data was received.

An XMPP entity MUST NOT generate data that is not namespace-well-formed. An XMPP entity MUST NOT accept data that is not namespace-well-formed (in particular, an XMPP server MUST NOT route or deliver data that is not namespace-well-formed); instead it MUST return either a stanza error of <not-acceptable/> or a stream error of <not-well-formed/> (where it is preferable to return a stream error because accepting such data can open an entity to certain denial of service attacks).

Implementation Note: Because it is mandatory for an XMPP implementation to support all and only the UTF-8 encoding and because UTF-8 always has the same byte order, an implementation MUST NOT send a byte order mark ("BOM") at the beginning of the data stream. If an entity receives the [UNICODE] (The Unicode Consortium, “The Unicode Standard, Version 3.2.0,” 2000.) character U+FEFF anywhere in an XML stream (including as the first character of the stream), it MUST interpret that character as a zero width no-break space, not as a byte order mark.
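As a non-normative illustration of the byte order mark rule, the following Python sketch decodes inbound bytes with the plain "utf-8" codec, which (unlike Python's "utf-8-sig" codec) keeps a leading U+FEFF as an ordinary character, and encodes outbound text without prepending a BOM. The function name encode_for_stream is hypothetical.

```python
import codecs

# Incremental decoding copes with multi-byte sequences split across TCP reads.
decoder = codecs.getincrementaldecoder("utf-8")()

first = decoder.decode(b"\xef\xbb")        # incomplete sequence: nothing emitted yet
second = decoder.decode(b"\xbf<presence/>")  # U+FEFF is kept, not stripped


def encode_for_stream(text):
    """Encode outbound data; the plain "utf-8" codec never adds a BOM."""
    return text.encode("utf-8")


print(repr(first), repr(second))
```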

12.
Internationalization Considerations

As specified under Section 4.6 (Stream Attributes), an XML stream SHOULD include an 'xml:lang' attribute specifying the default language for any XML character data that is intended to be presented to a human user. As specified under Section 8.1.5 (xml:lang), an XML stanza SHOULD include an 'xml:lang' attribute if the stanza contains XML character data that is intended to be presented to a human user. A server SHOULD apply the default 'xml:lang' attribute to stanzas it routes or delivers on behalf of connected entities, and MUST NOT modify or delete 'xml:lang' attributes on stanzas it receives from other entities.

13.
Security Considerations

13.1.
Fundamentals

XMPP technologies are typically deployed using a decentralized client-server architecture. As a result, several paths are possible when two XMPP entities need to communicate:

Both entities are servers. In this case, the entities can establish a direct server-to-server stream between themselves.

One entity is a server and the other entity is a client whose account is hosted on that server. In this case, the entities can establish a direct client-to-server stream between themselves.

Both entities are clients whose accounts are hosted on the same server. In this case, the entities cannot establish a direct stream between themselves, but there is only one intermediate entity between them, whose policies they might understand and in which they might have some level of trust (e.g., the server might require the use of Transport Layer Security for all client connections).

Both entities are clients but their accounts are hosted on different servers. In this case, the entities cannot establish a direct stream between themselves and there are two intermediate entities between them; each client might have some trust in the server that hosts its account but might know nothing about the policies of the server to which the other client connects.

This specification covers only the security of a direct XML stream between two servers or between a client and a server (cases #1 and #2), where each stream can be considered a single "hop" along a communication path. The goal of security for a multi-hop path (cases #3 and #4), although very desirable, is out of scope for this specification.

In accordance with [SEC‑GUIDE] (Rescorla, E. and B. Korver, “Guidelines for Writing RFC Text on Security Considerations,” July 2003.), this specification covers communication security (confidentiality, data integrity, and peer entity authentication), non-repudiation, and systems security (unauthorized usage, inappropriate usage, and denial of service). We also discuss common security issues such as information leaks, firewalls, and directory harvesting, as well as best practices related to the re-use of technologies such as base64, DNS, cryptographic hash functions, SASL, TLS, UTF-8, and XML.

13.4.
Confidentiality and Integrity

The use of Transport Layer Security (TLS) with non-null cipher suites provides a reliable mechanism for ensuring the confidentiality and integrity of data exchanged between a client and a server or between two servers. Therefore TLS helps to protect against eavesdropping, password sniffing, man-in-the-middle attacks, and stanza replays, insertion, deletion, and modification over an XML stream. XMPP clients and servers MUST support TLS as defined under Section 5 (STARTTLS Negotiation).

Informational Note: The confidentiality and integrity of a stream can be ensured by methods other than TLS, e.g. by means of a SASL mechanism that involves negotiation of a security layer.

Security Note: The use of TLS in XMPP applies to a single stream. Because XMPP is typically deployed using a distributed client-server architecture (as explained under Section 2.5 (Distributed Network of Clients and Servers)), a stanza might traverse multiple streams, and not all of those streams might be TLS-protected. For example, a stanza sent from a client with a session at one server (e.g., <romeo@example.net/orchard>) and intended for delivery to a client with a session at another server (e.g., <juliet@example.com/balcony>) will traverse three streams: the stream from the sender's client to its server, the stream from the sender's server to the recipient's server, and the stream from the recipient's server to the recipient's client. Furthermore, the stanza will be processed as cleartext within the sender's server and the recipient's server. Therefore, even if the stream from the sender's client to its server is protected, the confidentiality and integrity of a stanza sent over that protected stream cannot be guaranteed when the stanza is processed by the sender's server, sent from the sender's server to the recipient's server, processed by the recipient's server, or sent from the recipient's server to the recipient's client. Only a robust technology for end-to-end encryption could ensure the confidentiality and integrity of a stanza as it traverses all of the "hops" along a communication path (e.g., a technology that meets the requirements defined in [E2E‑REQS] (Saint-Andre, P., “Requirements for End-to-End Encryption in the Extensible Messaging and Presence Protocol (XMPP),” March 2010.)). Unfortunately, the XMPP community has so far failed to produce an end-to-end encryption technology that might be suitable for widespread implementation and deployment, and definition of such a technology is out of scope for this document.

13.5.
Peer Entity Authentication

The use of the Simple Authentication and Security Layer (SASL) for authentication provides a reliable mechanism for peer entity authentication. Therefore SASL helps to protect against user spoofing, unauthorized usage, and man-in-the-middle attacks. XMPP clients and servers MUST support SASL as defined under Section 6 (SASL Negotiation).

13.6.
Strong Security

[STRONGSEC] (Schiller, J., “Strong Security Requirements for Internet Engineering Task Force Standard Protocols,” August 2002.) defines "strong security" and its importance to communication over the Internet. For the purpose of XMPP communication over client-to-server and server-to-server streams, the term "strong security" refers to the use of security technologies that provide both mutual authentication and integrity checking (e.g., a combination of TLS encryption and SASL authentication using appropriate SASL mechanisms). In particular, when using certificate-based authentication to provide strong security, a trust chain SHOULD be established out-of-band, although a shared certification authority signing certificates could allow a previously unknown certificate to establish trust in-band. See the next section regarding certificate validation procedures.

Implementations MUST support strong security. Service provisioning SHOULD use strong security.

The initial stream and the response stream MUST be secured separately, although security in both directions MAY be established via mechanisms that provide mutual authentication.

13.7.
Certificates

Channel encryption of an XML stream using Transport Layer Security as described under Section 5 (STARTTLS Negotiation), and in some cases also authentication as described under Section 6 (SASL Negotiation), is commonly based on a digital certificate presented by the receiving entity (or, in the case of mutual authentication, both the receiving entity and the initiating entity). This section describes best practices regarding the generation of digital certificates to be presented by XMPP entities and the verification of digital certificates presented by XMPP entities.

Support for the XmppAddr identifier type (specified under Section 13.7.1.4 (XmppAddr Identifier Type)) is encouraged in XMPP client and server software implementations for the sake of backward-compatibility, but is no longer encouraged in certificates issued by certification authorities or requested by service providers.

13.7.1.2.2.
Examples

For our first (relatively simple) example, consider a company called "Example Products, Inc." It hosts an XMPP service at "im.example.com" (i.e., user addresses at the service are of the form "user@im.example.com"), and SRV lookups for the xmpp-client and xmpp-server services at "im.example.com" yield one machine, called "x.example.com", as follows:

For our second (more complex) example, consider an ISP called "Example Internet Services". It hosts an XMPP service at "example.net" (i.e., user addresses at the service are of the form "user@example.net"), but SRV lookups for the xmpp-client and xmpp-server services at "example.net" yield two machines ("x1.example.net" and "x2.example.net"), as follows:

Example Internet Services also hosts chatrooms at chat.example.net, and provides an xmpp-server SRV record for that service as well (thus enabling entities from remote domains to access that service). It also might provide other such services in the future, so it wishes to represent a wildcard in its certificate to handle such growth.

The certificate presented by either x1.example.net or x2.example.net contains the following representations:

13.7.1.3.
Client Certificates

In a digital certificate to be presented by an XMPP client controlled by a human user (i.e., a CLIENT CERTIFICATE), it is RECOMMENDED for the certificate to include one or more JIDs associated with an XMPP user. If included, a JID MUST be represented as an XmppAddr as specified under Section 13.7.1.4 (XmppAddr Identifier Type).

13.7.2.
Certificate Validation

When an XMPP entity is presented with a server certificate or client certificate by a peer for the purpose of encryption or authentication of XML streams as described under Section 5 (STARTTLS Negotiation) and Section 6 (SASL Negotiation), the entity MUST attempt to validate the certificate to determine if the certificate will be considered a TRUSTED CERTIFICATE, i.e., a certificate that is acceptable for encryption and/or authentication in accordance with the XMPP entity's local service policies or configured settings.

For both server certificates and client certificates, the validating entity MUST attempt to verify the integrity of the certificate, MUST attempt to verify that the certificate has been properly signed by the issuing Certificate Authority, MUST attempt to validate the full certification path, and MUST support certificate revocation messages. An implementation MUST enable a human user to view information about the certification path.

If these validation attempts fail, either entity MAY choose to unilaterally terminate the session.

The following sections describe certificate validation rules for server-to-server and client-to-server streams.

Sub-Case #1:

The server finds one XmppAddr for which the domainpart of the represented JID matches one of the configured hostnames of the server; the server SHOULD use this represented JID as the validated identity of the client.

Sub-Case #2:

The server finds more than one XmppAddr for which the domainpart of the represented JID matches one of the configured hostnames of the server; the server SHOULD use one of these represented JIDs as the validated identity of the client, choosing among them according to local service policies or based on the 'to' address of the initial stream header.

Sub-Case #3:

The server finds no XmppAddrs, or finds at least one XmppAddr but the domainpart of the represented JID does not match one of the configured hostnames of the server; the server MUST NOT use the represented JID (if any) as the validated identity of the client but instead MUST validate the identity of the client using other means.
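The single-match, multiple-match, and no-match cases above can be sketched as a small selection function. This is only an illustration under stated assumptions: the function names are hypothetical, the XmppAddr values are assumed to have already been extracted from a certificate that passed path validation, domainparts are compared by simple lowercasing rather than full internationalized-name preparation, and "first match" stands in for whatever local service policy a deployment actually applies.

```python
from typing import Optional, Sequence


def domainpart(jid: str) -> str:
    """Return the lowercased domainpart of a JID (hypothetical helper)."""
    # Strip the resourcepart first, because it may itself contain "@".
    without_resource = jid.split("/", 1)[0]
    # The localpart cannot contain "@", so take what follows the first one.
    return without_resource.split("@", 1)[-1].lower()


def select_validated_identity(
    xmpp_addrs: Sequence[str],
    configured_hostnames: Sequence[str],
    stream_to: Optional[str] = None,
) -> Optional[str]:
    """Pick the JID to treat as the client's validated identity.

    Returns None when no XmppAddr matches, which signals that the
    client's identity has to be validated by other means.
    """
    hostnames = {h.lower() for h in configured_hostnames}
    matches = [jid for jid in xmpp_addrs if domainpart(jid) in hostnames]

    if len(matches) == 1:
        # Exactly one matching XmppAddr: use it.
        return matches[0]

    if len(matches) > 1:
        # Several matches: prefer the one whose domainpart equals the
        # 'to' address of the initial stream header, if one was given.
        if stream_to is not None:
            for jid in matches:
                if domainpart(jid) == stream_to.lower():
                    return jid
        # Assumption: the local policy is simply "first match wins".
        return matches[0]

    # No XmppAddr, or none whose domainpart matches a configured hostname.
    return None
```

A caller that receives None would fall back to another means of validating the client, such as password-based SASL authentication.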

13.7.2.2.3.
Case #3

13.7.2.3.
Checking of Certificates in Long-Lived Streams

Because XMPP uses long-lived XML streams, it is possible that a certificate presented during stream negotiation might expire or be revoked while the stream is still live (this is especially relevant in the context of server-to-server streams). Therefore, each party to a long-lived stream SHOULD:

Cache the expiration date of the certificate presented by the other party and any certificates on which that certificate depends (such as a root or intermediate certificate for a certification authority), and close the stream when any such certificate expires, with a stream error of <reset/> (Section 4.8.3.16 (reset)).

After the stream is closed, the initiating entity from the closed stream will need to re-connect and the receiving entity will need to authenticate the initiating entity based on whatever certificate it presents during negotiation of the new stream.
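The caching recommendation above could be approximated as follows. This is a minimal sketch, not a complete validation routine: the function names are hypothetical, the caller is assumed to have already extracted a timezone-aware expiration time for the peer's certificate and for every certificate it depends on, and revocation checking is not covered.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def earliest_expiry(expiration_dates: Iterable[datetime]) -> datetime:
    """Return the soonest expiration among the peer's certificate and
    the certificates it depends on (all values timezone-aware)."""
    return min(expiration_dates)


def should_close_stream(
    expiration_dates: Iterable[datetime],
    now: Optional[datetime] = None,
) -> bool:
    """Report whether any cached certificate has expired.

    When this returns True, the party would close the stream with a
    <reset/> stream error so that the peer has to renegotiate.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= earliest_expiry(expiration_dates)
```

In practice a server might compute `earliest_expiry` once at negotiation time and schedule a timer for that instant rather than polling.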

13.7.2.4.
Use of Certificates in XMPP Extensions

Certificates MAY be used in extensions to XMPP for the purpose of application-layer encryption or authentication above the level of XML streams (e.g., for end-to-end encryption). Such extensions will define their own certificate handling rules, which at a minimum SHOULD be consistent with the rules defined in this specification but MAY specify additional rules.

13.9.
Technology Reuse

13.9.1.
Use of base64 in SASL

Both the client and the server MUST verify any base64 data received during SASL negotiation (Section 6 (SASL Negotiation)). An implementation MUST reject (not ignore) any characters that are not explicitly allowed by the base64 alphabet; this helps to guard against creation of a covert channel that could be used to "leak" information.

An implementation MUST NOT break on invalid input and MUST reject any sequence of base64 characters containing the pad ('=') character if that character is included as something other than the last character of the data (e.g., "=AAA" or "BBBB=CCC"); this helps to guard against buffer overflow attacks and other attacks on the implementation.

While base64 encoding visually hides otherwise easily recognized information (such as passwords), it does not provide any computational confidentiality.
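One way to apply these checks is to validate the shape of the data before decoding it. The sketch below is illustrative only: the function and exception names are hypothetical, and the special handling of a lone "=" reflects the XMPP SASL convention for transmitting zero-length data, which is defined in the SASL negotiation rules rather than in this section. It is stricter than the pad-character rule quoted above, since it also requires the data length to be a multiple of four.

```python
import base64
import binascii
import re

# Whole groups of four alphabet characters, optionally followed by one
# final group that carries one or two pad characters. Padding anywhere
# else, or any character outside the alphabet, fails the match.
_BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


class InvalidBase64(ValueError):
    """Raised when received SASL data is not well-formed base64."""


def decode_sasl_base64(text: str) -> bytes:
    """Decode base64 received during SASL negotiation, rejecting
    (not ignoring) anything that is malformed."""
    if text == "=":
        # XMPP SASL convention: a single "=" means zero-length data.
        return b""
    if not _BASE64_RE.fullmatch(text):
        raise InvalidBase64("malformed base64 data")
    try:
        # validate=True makes the decoder raise instead of silently
        # discarding characters outside the alphabet.
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise InvalidBase64(str(exc)) from exc
```

An implementation that catches `InvalidBase64` would typically fail the SASL exchange rather than continue with partially decoded data.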

13.9.3.
Use of Hash Functions

XMPP itself does not directly mandate the use of any particular hash function. However, technologies on which XMPP depends (e.g., TLS and particular SASL mechanisms), as well as various XMPP extensions, might make use of hash functions. Those who implement XMPP technologies or who develop XMPP extensions are advised to closely monitor the state of the art regarding attacks against cryptographic hashes in Internet protocols as they relate to XMPP. For helpful guidance, refer to [HASHES] (Hoffman, P. and B. Schneier, “Attacks on Cryptographic Hashes in Internet Protocols,” November 2005.).

13.9.4.
Use of SASL

Because the initiating entity chooses an acceptable SASL mechanism from the list presented by the receiving entity, the initiating entity depends on the receiving entity's list for authentication. This dependency introduces the possibility of a downgrade attack if an attacker can gain control of the channel and therefore present a weak list of mechanisms. To help prevent this attack, the parties SHOULD protect the channel using TLS before attempting SASL negotiation.

Most XMPP servers authenticate account connections by means of passwords. It is well-known that most human users choose relatively weak passwords. Although service provisioning is out of scope for this document, XMPP servers that allow password-based authentication SHOULD enforce minimal criteria for password strength to help prevent dictionary attacks.

13.10.
Information Leaks

13.10.1.
IP Addresses

13.10.2.
Presence Information

One of the core aspects of XMPP is presence: information about the network availability of an XMPP entity (i.e., whether the entity is currently online or offline). A "presence leak" occurs when an entity's network availability is inadvertently and involuntarily revealed to a second entity that is not authorized to know the first entity's network availability.

Although presence is discussed more fully in [XMPP‑IM] (Saint-Andre, P., “Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence,” October 2010.), it is important to note that an XMPP server MUST NOT leak presence. In particular, at the core XMPP level, real-time addressing and network availability are associated with a specific connected resource; therefore, any disclosure of a connected resource's full JID constitutes a presence leak. To help prevent such a presence leak, a server MUST NOT return different stanza errors depending on whether a potential attacker sends XML stanzas to the entity's bare JID (<localpart@domainpart>) or full JID (<localpart@domainpart/resourcepart>).

13.12.
Denial of Service

A Denial-of-Service (DoS) attack is an attack in which one or more machines target a victim and attempt to prevent the victim from doing useful work. The victim can be a network server, client or router, a network link or an entire network, an individual Internet user or a company doing business using the Internet, an Internet Service Provider (ISP), country, or any combination of or variant on these.

Some considerations discussed in this document help to prevent denial of service attacks (e.g., the mandate that a server MUST NOT process XML stanzas from clients that have not yet provided appropriate authentication credentials and MUST NOT process XML stanzas from peer servers whose identity it has not either authenticated via SASL or weakly verified via Server Dialback).

A server implementation SHOULD enable a server administrator to limit the number of TCP connections that it will accept from a given IP address at any one time. If an entity attempts to connect but the maximum number of TCP connections has been reached, the receiving server MUST NOT allow the new connection to proceed.

A server implementation SHOULD enable a server administrator to limit the number of TCP connection attempts that it will accept from a given IP address in a given time period. If an entity attempts to connect but the maximum number of connection attempts has been reached, the receiving server MUST NOT allow the new connection to proceed.
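The two per-address limits described above (simultaneous connections and connection attempts per time period) might be combined in a single gatekeeper. The following is a sketch under assumptions: the class name, its parameters, and the choice not to record attempts that are refused by the attempt limit (which keeps memory bounded per address) are all illustrative, not mandated by this specification.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class ConnectionLimiter:
    """Tracks, per IP address, open connections and recent attempts."""

    def __init__(
        self,
        max_concurrent: int,
        max_attempts: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_concurrent = max_concurrent
        self._max_attempts = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._active: Dict[str, int] = defaultdict(int)
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def try_accept(self, ip: str) -> bool:
        """Return True if a new connection from `ip` may proceed."""
        now = self._clock()
        attempts = self._attempts[ip]
        # Forget attempts that have fallen out of the time window.
        while attempts and now - attempts[0] >= self._window:
            attempts.popleft()
        if len(attempts) >= self._max_attempts:
            return False  # too many attempts in this period
        attempts.append(now)
        if self._active[ip] >= self._max_concurrent:
            return False  # too many simultaneous connections
        self._active[ip] += 1
        return True

    def release(self, ip: str) -> None:
        """Record that one accepted connection from `ip` has closed."""
        if self._active[ip] > 0:
            self._active[ip] -= 1
```

A server would call `try_accept` when a TCP connection arrives and `release` when an accepted connection closes; the injectable `clock` exists only to make the behavior easy to test.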

A server implementation SHOULD enable a server administrator to limit the number of connected resources it will allow an account to bind at any one time. If a client attempts to bind a resource but it has already reached the configured number of allowable resources, the receiving server MUST return a <resource-constraint/> stanza error.

A server implementation SHOULD enable a server administrator to limit the size of stanzas it will accept from a connected client or peer server (where "size" is inclusive of all XML markup as defined in Section 2.4 of [XML] (Maler, E., Yergeau, F., Sperberg-McQueen, C., Paoli, J., and T. Bray, “Extensible Markup Language (XML) 1.0 (Fifth Edition),” November 2008.), from the opening "<" character of the stanza to the closing ">" character). An entity's maximum stanza size MUST NOT be smaller than 10000 bytes. If a connected resource or peer server sends a stanza that violates the upper limit, the receiving server MUST either return a <policy-violation/> stanza error (thus allowing the sender to recover) or close the stream with a <policy-violation/> stream error.
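A size policy of this kind can be expressed in a few lines. In this sketch the class name and the default limit of 65536 bytes are assumptions chosen for illustration; only the 10000-byte floor comes from the text above, and the stanza is assumed to be available as the raw bytes received on the wire, from its opening "<" to its closing ">".

```python
MIN_MAX_STANZA_SIZE = 10000  # floor required for any configured maximum


class StanzaSizePolicy:
    """Administrator-configurable upper limit on stanza size."""

    def __init__(self, max_bytes: int = 65536) -> None:
        if max_bytes < MIN_MAX_STANZA_SIZE:
            raise ValueError(
                "maximum stanza size must not be smaller than "
                f"{MIN_MAX_STANZA_SIZE} bytes"
            )
        self.max_bytes = max_bytes

    def allows(self, raw_stanza: bytes) -> bool:
        """Return True if the serialized stanza is within the limit.

        The size counts all XML markup, not just character data.
        """
        return len(raw_stanza) <= self.max_bytes
```

When `allows` returns False, the server would either return a <policy-violation/> stanza error or close the stream with a <policy-violation/> stream error, as described above.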

A server implementation SHOULD enable a server administrator to limit the number of XML stanzas that a connected client is allowed to send to distinct recipients within a given time period. If a connected client sends too many stanzas to distinct recipients in a given time period, the receiving server SHOULD NOT process the stanza and instead SHOULD return a <policy-violation/> stanza error.
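A limit on distinct recipients differs from a plain rate limit in that repeated stanzas to the same recipient do not consume additional quota. The sketch below shows one possible interpretation; the class name, the per-recipient sliding window, and the decision to refresh a recipient's timestamp on every stanza are assumptions rather than requirements of this specification.

```python
import time
from typing import Callable, Dict


class RecipientRateLimiter:
    """Limits how many distinct recipients a sender may address
    within a sliding time window."""

    def __init__(
        self,
        max_recipients: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_recipients
        self._window = window_seconds
        self._clock = clock
        # sender -> {recipient -> time the recipient was last addressed}
        self._seen: Dict[str, Dict[str, float]] = {}

    def allow(self, sender: str, recipient: str) -> bool:
        """Return True if the stanza may be processed."""
        now = self._clock()
        recent = self._seen.setdefault(sender, {})
        # Drop recipients not addressed within the window.
        expired = [r for r, ts in recent.items() if now - ts >= self._window]
        for r in expired:
            del recent[r]
        if recipient in recent:
            recent[recipient] = now  # already counted; refresh only
            return True
        if len(recent) >= self._max:
            return False  # would exceed the distinct-recipient limit
        recent[recipient] = now
        return True
```

When `allow` returns False, the server would decline to process the stanza and return a <policy-violation/> stanza error.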

A server implementation SHOULD enable a server administrator to limit the amount of bandwidth it will allow a connected client or peer server to use in a given time period.

A server implementation MAY enable a server administrator to limit the types of stanzas (based on the extended content "payload") that it will allow a connected resource or peer server to send over an active connection. Such limits and restrictions are a matter of deployment policy.

A server implementation MAY refuse to route or deliver any stanza that it considers to be abusive, with or without returning an error to the sender.

13.13.
Firewalls

Although DNS SRV records can instruct connecting entities to use TCP ports other than 5222 (client-to-server) and 5269 (server-to-server), communication using XMPP typically occurs over those ports, which are registered with the IANA (see Section 14 (IANA Considerations)). Use of these well-known ports allows administrators to easily enable or disable XMPP activity through existing and commonly-deployed firewalls.

13.14.
Interdomain Federation

The term "federation" is commonly used to describe communication between two servers.

Because service provisioning is a matter of policy, it is OPTIONAL for any given server to support federation. If a particular server enables federation, it SHOULD enable strong security as previously described to ensure both authentication and confidentiality; compliant implementations SHOULD support TLS and SASL for this purpose.

Before RFC 3920 defined TLS plus SASL EXTERNAL with certificates for encryption and authentication of server-to-server streams, the only method for weak identity verification of a peer server was Server Dialback as defined in [XEP‑0220] (Miller, J., Saint-Andre, P., and P. Hancke, “Server Dialback,” March 2010.). Even when [DNSSEC] (Arends, R., Austein, R., Larson, M., Massey, D., and S. Rose, “DNS Security Introduction and Requirements,” March 2005.) is used, Server Dialback provides only weak identity verification and provides no confidentiality or integrity. At the time of writing, Server Dialback is still the most widely-used technique for some level of assurance over server-to-server streams. This reality introduces the possibility of a downgrade attack from TLS + SASL EXTERNAL to Server Dialback if an attacker can gain control of the channel and therefore convince the initiating server that the receiving server does not support TLS or does not have an appropriate certificate. To help prevent this attack, the parties SHOULD protect the channel using TLS before proceeding, even if the presented certificates are self-signed or otherwise untrusted.

13.15.
Non-Repudiation

Systems that provide both peer entity authentication and data integrity have the potential to enable an entity to prove to a third party that another entity intended to send particular data. Although XMPP systems can provide both peer entity authentication and data integrity, XMPP was never designed to provide non-repudiation.

15.
Conformance Requirements

This section describes a protocol feature set that summarizes the conformance requirements of this specification. This feature set is appropriate for use in software certification, interoperability testing, and implementation reports. For each feature, this section provides the following information:

A human-readable name

An informational description

A reference to the particular section of this document that normatively defines the feature

Whether the feature applies to the Client role, the Server role, or both (where "N/A" signifies that the feature is not applicable to the specified role)

Correctly process XML data qualified by an unsupported XML namespace, where "correctly process" means to ignore that portion of the stanza in the case of a message or presence stanza and return an error in the case of an IQ stanza (for the intended recipient), and to route or deliver the stanza (for a routing entity such as a server).

Include exactly one child element in an <iq/> stanza of type "get" or "set", zero or one child elements in an <iq/> stanza of type "result", and one or two child elements in an <iq/> stanza of type "error".

Consider the previous stream to be replaced upon negotiation of a stream feature that necessitates a stream restart, and send or receive a new initial stream header after negotiation of such a stream feature.

Appendix B.
Contact Addresses

Consistent with [MAILBOXES] (Crocker, D., “MAILBOX NAMES FOR COMMON SERVICES, ROLES AND FUNCTIONS,” May 1997.), an organization that offers an XMPP service SHOULD provide an Internet mailbox of "XMPP" for inquiries related to that service, where the host portion of the resulting mailto URI MUST be the organization's domain, not the domain of the XMPP service itself (e.g., the XMPP service might be offered at im.example.com but the Internet mailbox would be <xmpp@example.com>).

Specified return of the <restricted-xml/> stream error in response to receipt of prohibited XML features.

Specified that the SASL SCRAM mechanism is a mandatory-to-implement technology for client-to-server streams.

Specified that TLS plus the SASL PLAIN mechanism is a mandatory-to-implement technology for client-to-server streams.

Specified that support for the SASL EXTERNAL mechanism is required for servers but only recommended for clients (since end-user X.509 certificates are difficult to obtain and not yet widely deployed).

Removed the hard two-connection rule for server-to-server streams.

More clearly specified the certificate profile for both public key certificates and issuer certificates.

Added the <reset/> stream error condition to handle expired/revoked certificates or the addition of security-critical features to an existing stream.

Added the <account-disabled/>, <credentials-expired/>, <encryption-required/>, <malformed-request/>, and <transition-needed/> SASL error conditions to handle error flows mistakenly left out of RFC 3920 or discussed in RFC 4422 but not in RFC 2222.

Removed unnecessary requirement for escaping of characters that map to certain predefined entities, which do not need to be escaped in XML.

Clarified the process of DNS SRV lookups and fallbacks.

Clarified the handling of SASL security layers.

Clarified the stream negotiation process and associated flow chart.

Clarified the handling of stream features.

Added a 'by' attribute to the <error/> element for stanza errors so that the entity that has detected the error can include its JID for diagnostic or tracking purposes.

Clarified the handling of data that violates the well-formedness definitions for XML 1.0 and XML namespaces.

Specified the security considerations in more detail, especially with regard to presence leaks and denial of service attacks.

Moved documentation of the Server Dialback protocol from this specification to a separate specification maintained by the XMPP Standards Foundation.

In addition, numerous changes of an editorial nature were made in order to more fully specify and clearly explain XMPP.