Why XMPP?

The main motivation for researching XMPP for use with Kestrel is that virtual machines can be terminated without any cleanup or any signal that the machine is leaving the pool. XMPP provides the infrastructure for determining whether a machine is connected to the pool. It also provides a mechanism for sharing the status of a connected machine (idle, busy, etc.).

The impetus for researching IM protocols came from reading that botnets commonly use IRC to communicate with their command-and-control servers.

What is XMPP?

XMPP stands for Extensible Messaging and Presence Protocol, and it was accepted as a standard by the IETF in 2004 (RFC 3920, RFC 3921).

Connection Implementations

Typically, TCP is used for the underlying connection between XMPP clients and servers, but HTTP can be used as well (BOSH). Since every connection originates with the client and direct client-to-client connections are not used, NAT traversal is not an issue.

Security

XMPP specifies that implementations must support TLS (for encrypting the connection) and SASL (for authentication).

XML

The protocol works by opening two XML streams, one in each direction, for sending and receiving XML stanzas. An XML stanza is one of message, presence, or iq. The message stanza carries user content, and the presence stanza carries information about the user's status (idle, busy, etc.). The iq stanza is used for request/response exchanges, such as querying the XMPP server, using get/set operations.
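
As a rough illustration of the three stanza kinds, the sketch below builds one of each with Python's standard xml.etree.ElementTree module. It is not tied to any XMPP library; the helper names (make_message, make_presence, make_iq_get, serialize) and the address pool.example.org are made up for this example, and the stream setup that would surround these stanzas on a real connection is left out.

```python
import xml.etree.ElementTree as ET


def make_message(to, body):
    """Build a <message/> stanza carrying user content."""
    msg = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(msg, "body").text = body
    return msg


def make_presence(show=None, status=None):
    """Build a <presence/> stanza describing the sender's availability."""
    pres = ET.Element("presence")
    if show is not None:
        # Standard values are "away", "chat", "dnd", and "xa".
        ET.SubElement(pres, "show").text = show
    if status is not None:
        # Free-form, human-readable description.
        ET.SubElement(pres, "status").text = status
    return pres


def make_iq_get(iq_id, to, query_ns):
    """Build an <iq type='get'/> stanza whose payload is a <query/> element."""
    iq = ET.Element("iq", {"type": "get", "id": iq_id, "to": to})
    # Writing xmlns as a plain attribute keeps the serialized form readable.
    ET.SubElement(iq, "query", {"xmlns": query_ns})
    return iq


def serialize(stanza):
    """Return the stanza as the text that would be written to the stream."""
    return ET.tostring(stanza, encoding="unicode")


if __name__ == "__main__":
    print(serialize(make_message("worker1@pool.example.org", "run job 42")))
    print(serialize(make_presence(show="dnd", status="busy")))
    print(serialize(make_iq_get("q1", "pool.example.org", "jabber:iq:roster")))
```

The iq example uses the jabber:iq:roster namespace, which asks the server for the sender's roster; any other query namespace could be passed in the same way.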

The JID

The JID is the identifier for each client in the system and is similar to an email address. A JID can consist of a username, a domain, and a resource. The format is: username@domain/resource. The server itself can be addressed with just the domain. Since a user can have multiple connections to the server open at any given time, the resource component is used to differentiate them. How a message sent to username@domain is delivered is up to the server: it typically goes to that user's highest-priority connection, though a server may deliver it to every connection from that user. However, a message sent to username@domain/resource will be sent only to that particular connection.
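
The username@domain/resource format can be split apart with a few lines of Python. This is only a sketch: the JID class and parse_jid function are names invented here, the addresses are placeholders, and the normalization and character restrictions that the XMPP specifications require are not applied.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    username: Optional[str]   # None when addressing the server itself
    domain: str
    resource: Optional[str]   # None for a bare JID

    @property
    def bare(self):
        """The JID without its resource (username@domain, or just domain)."""
        if self.username is None:
            return self.domain
        return f"{self.username}@{self.domain}"


def parse_jid(text):
    """Split a JID string into its username, domain, and resource parts."""
    # The resource is everything after the first "/"; it may itself
    # contain "/" or "@", so it has to be split off first.
    rest, slash, resource = text.partition("/")
    # The username is everything before the first "@" of what remains.
    username, at, domain = rest.partition("@")
    if not at:
        username, domain = None, rest
    if not domain or username == "" or (slash and not resource):
        raise ValueError(f"not a valid JID: {text!r}")
    return JID(username, domain, resource if slash else None)


if __name__ == "__main__":
    print(parse_jid("worker1@pool.example.org/core0"))
    print(parse_jid("worker1@pool.example.org").bare)
    print(parse_jid("pool.example.org"))
```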

Server Federation

XMPP servers can federate in much the same way that SMTP servers do. A message sent to a JID on a different domain is forwarded to that domain's server to be delivered.

How can we use XMPP?

The basic infrastructure provided by XMPP already matches the setup and needs of our machine pools. We can use JIDs to identify each machine, as well as each core if we use the JID resource identifiers. Handling machines that suddenly drop from the pool is also already provided for us through presence.
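
To make the mapping concrete, here is a minimal sketch of how a pool manager might track machines and cores from incoming presence stanzas. Everything in it is an assumption made for illustration rather than a Kestrel design: the PoolTracker class is hypothetical, one resource per core is only one possible layout, a show value of "dnd" is arbitrarily taken to mean busy, and stanzas are assumed to arrive as plain strings without the namespace they would carry on a real stream. The sketch relies on the server sending a presence stanza of type "unavailable" on a client's behalf when its connection ends.

```python
import xml.etree.ElementTree as ET


class PoolTracker:
    """Tracks which cores (resources) of which machines (bare JIDs) are online."""

    def __init__(self):
        # bare JID -> {resource: "idle" or "busy"}
        self.workers = {}

    def handle_presence(self, xml_text):
        pres = ET.fromstring(xml_text)
        sender = pres.get("from")
        ptype = pres.get("type")
        # Ignore anything that is not plain availability information
        # (subscription requests, probes, errors, and so on).
        if pres.tag != "presence" or sender is None:
            return
        if ptype not in (None, "unavailable"):
            return
        bare, _, resource = sender.partition("/")
        if ptype == "unavailable":
            # The core left the pool, whether cleanly or not.
            cores = self.workers.get(bare, {})
            cores.pop(resource, None)
            if not cores:
                self.workers.pop(bare, None)
            return
        # Assumption: "dnd" marks a busy core, anything else is idle.
        state = "busy" if pres.findtext("show") == "dnd" else "idle"
        self.workers.setdefault(bare, {})[resource] = state

    def idle_cores(self):
        """Return the full JIDs of all idle cores, sorted."""
        return sorted(
            f"{bare}/{resource}"
            for bare, cores in self.workers.items()
            for resource, state in cores.items()
            if state == "idle"
        )


if __name__ == "__main__":
    pool = PoolTracker()
    pool.handle_presence('<presence from="worker1@pool.example.org/core0"/>')
    pool.handle_presence(
        '<presence from="worker1@pool.example.org/core1"><show>dnd</show></presence>'
    )
    print(pool.idle_cores())
    pool.handle_presence(
        '<presence from="worker1@pool.example.org/core0" type="unavailable"/>'
    )
    print(pool.idle_cores())
```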

The next step is creating an architecture for the actual Kestrel application. There are two ways this can be done: implement Kestrel as an XMPP client bot, or implement Kestrel as an XMPP server component. In either case, we will have to create a client bot for use on the machines.

A client bot is simply an XMPP client that is a program instead of a human user. The bot can send and receive messages, and respond to them accordingly.
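
The receive-and-respond loop of a bot can be sketched without committing to a particular XMPP library. In the example below, handle_message and the "status" command are hypothetical names chosen for illustration; the function takes an incoming message stanza as text and returns the reply stanza, and the actual sending and receiving over the connection is assumed to be handled by whichever library is eventually chosen.

```python
import xml.etree.ElementTree as ET


def handle_message(xml_text, commands):
    """Return a reply stanza (as text) for an incoming message, or None.

    commands maps a command word to a zero-argument function that
    returns the text of the reply.
    """
    msg = ET.fromstring(xml_text)
    sender = msg.get("from")
    body = msg.findtext("body")
    # Only message stanzas with a sender and a body are answered.
    if msg.tag != "message" or sender is None or body is None:
        return None
    handler = commands.get(body.strip())
    reply_text = handler() if handler else "unknown command"
    # Address the reply to the full JID that sent the request.
    reply = ET.Element("message", {"to": sender, "type": "chat"})
    ET.SubElement(reply, "body").text = reply_text
    return ET.tostring(reply, encoding="unicode")


if __name__ == "__main__":
    commands = {"status": lambda: "idle"}  # hypothetical command table
    incoming = (
        '<message from="manager@pool.example.org/kestrel" '
        'to="worker1@pool.example.org/core0"><body>status</body></message>'
    )
    print(handle_message(incoming, commands))
```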

Server components come in two flavors: internal and external. Internal components are actual additions to the server code itself using a provided plugin API. Such components can be constrained by the licensing of the server, particularly if it is GPL software. External components are separate programs that communicate with the server using a standard protocol (XEP-0114, the Jabber Component Protocol), and as such do not rely on any particular server implementation.

There is very little difference between client bots and components in terms of the actual business logic. The main difference is scalability: components do not have to deal with rosters, which can become a significant drain on resources as they grow. Since we want Kestrel to be able to manage several thousand machines at once, using components will be the implementation goal. However, we can start with a simple client bot to flesh out the API and later transition to a component once the API is more stable.

Available Implementations

There are several XMPP server implementations available, and nearly all of them are provided under the GPL. The most popular ones are jabberd2, which is written in C, and ejabberd, which is written in Erlang. Of the two, ejabberd is currently the more popular.

There are a few Python libraries for XMPP available (xmpppy, PyXMPP). Xmpppy is provided under the GPL, and PyXMPP is provided under the LGPL.

Also, the Twisted Python framework, together with Wokkel, provides an XMPP server and client library under the MIT license.