Scalable XMPP bots with erlang and exmpp, part I

Introducing EXMPP

In this series of articles we will introduce exmpp, the long-awaited high-performance XMPP library for Erlang, released by ProcessOne a few weeks ago. While doing so, we will learn how to use the library to build a highly scalable XMPP bot. As a library, exmpp is oriented towards high performance and low memory usage, while being general enough to be useful in different scenarios (clients, servers, components).

In this first installment we will learn the basics of exmpp and start on our bot implementation.

Data structures

When working with exmpp, there are two sets of data structures (exmpp uses Erlang records for its data structures) that you will be working with most of the time: the ones that represent XML fragments, and the ones that represent JIDs.

#xmlel{}

This record represents an element node in an XML tree. It contains information on the element’s default namespace, tag name, attributes and child nodes. At the time of this writing, the definition of #xmlel{} is:
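The record definition itself is not reproduced in this text. As a rough, self-contained sketch, a record with the fields this section names (ns, declared_ns, name, attrs, children) could be declared locally as shown here. The module name xmlel_sketch, the default values and the {Name, Value} shape used for attributes are assumptions made for illustration; the authoritative definition is the one in exmpp’s own header file and may differ.

```erlang
-module(xmlel_sketch).
-export([message/0, tag_name/1, has_children/1]).

%% Illustrative stand-in for exmpp's #xmlel{} record. The field names follow
%% the article's description; defaults and value shapes are assumptions.
-record(xmlel, {
    ns          = undefined,  % default namespace of the element
    declared_ns = [],         % namespaces declared on this element
    name,                     % tag name
    attrs       = [],         % attributes
    children    = []          % child nodes, or 'undefined' if not parsed yet
}).

%% Build <message type="chat"><body/></message> by hand.
%% Attributes are plain {Name, Value} pairs here, for brevity only.
message() ->
    Body = #xmlel{ns = 'jabber:client', name = body},
    #xmlel{ns = 'jabber:client',
           name = message,
           attrs = [{type, <<"chat">>}],
           children = [Body]}.

%% Return the tag name of an element.
tag_name(#xmlel{name = Name}) -> Name.

%% True only when the element carries at least one parsed child.
has_children(#xmlel{children = undefined}) -> false;
has_children(#xmlel{children = Children}) -> Children =/= [].
```

In real code you would include exmpp’s header instead of declaring the record yourself, and you would normally go through the library’s API rather than touching the fields directly.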

declared_ns holds the namespace declarations made on the element. It is useful when you need to know whether a namespace is declared at a given element, even if that namespace is not used anywhere. Yes, there are such cases. Most of the time, you won’t care about that field. The rest of the fields are self-documenting.

Basic API

Most of the time, you don’t need to know the exact data representation, as exmpp provides a rich set of APIs that will make your life much easier. Not only do you not need to work directly at the data representation level, you are encouraged not to. By using the API you get all the benefits of data encapsulation, and your code can adapt easily if the physical representation changes in the future.

This is especially true for JIDs (the definition of the #jid{} record even lives in an internal header file that your code isn’t supposed to include). Their use is so pervasive in XMPP entities that a small improvement in their physical representation can lead to big gains for the entire application. The current representation is a compromise between memory and CPU usage that was found to work well in our tests, but it isn’t set in stone.
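To make the point about going through the API concrete, here is a minimal sketch that takes a JID apart using accessor functions only. It assumes exmpp is installed and on the code path, and that the exmpp_jid module exports parse/1, node/1, domain/1, resource/1, bare/1 and to_list/1; these names have changed between exmpp releases, so check them against the version you are using. The module name jid_sketch is made up for this example.

```erlang
-module(jid_sketch).
-export([describe/1]).

%% Split a JID string such as "bot@example.com/echo" into its parts without
%% ever looking inside the #jid{} record. Assumes exmpp is available.
describe(JidString) ->
    %% Start exmpp so that its supporting services (for example stringprep)
    %% are running; the result is ignored if it is already started.
    _ = application:start(exmpp),
    Jid = exmpp_jid:parse(JidString),
    [{node,     exmpp_jid:node(Jid)},
     {domain,   exmpp_jid:domain(Jid)},
     {resource, exmpp_jid:resource(Jid)},
     {bare,     exmpp_jid:to_list(exmpp_jid:bare(Jid))}].
```

Because the code never pattern-matches on the record, it should keep working if the internal representation of JIDs changes.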

If you browse the exmpp source code, you will notice that it is split into a set of directories, according to the intended usage target:

src/core Contains the core functionality of exmpp. Here you will find functions for XML parsing and serialization, for working with the XML tree, with JIDs, etc.

src/client Functions in this directory are targeted at building and interpreting stanzas generated at the client side of the XMPP protocol.

src/server Functions in this directory are targeted at building and interpreting stanzas generated at the server side of the XMPP protocol.

At this point you might be wondering why some values are represented as atoms in the XML structure (for example the ‘jabber:client’ namespace, the ‘type’ attribute or the ‘body’ tag). This is the consequence of an important design decision within exmpp. The fact is that exmpp knows more about XMPP than a general-purpose XML parser does. It knows the ‘jabber:client’ namespace, and it knows that it will be used in all c2s stanzas. So why waste memory repeating it again and again each time?

Exmpp uses erlang atoms to represent those well-known values, thus saving memory and processing time. The set of known tags, namespaces and attributes is taken from the XMPP specification and the published XEPs.
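The memory argument can be checked in plain Erlang, without exmpp. The sketch below compares the heap space taken by the same namespace stored as an atom, as a string and as a binary. It relies on erts_debug:flat_size/1, an undocumented OTP helper that reports the size of a term in machine words; the module name atom_size_sketch is an assumption of this example.

```erlang
-module(atom_size_sketch).
-export([words/1, compare/0]).

%% Heap words needed to store Term. erts_debug is an undocumented OTP
%% module, so treat the numbers as indicative only.
words(Term) ->
    erts_debug:flat_size(Term).

%% The same namespace in three representations.
compare() ->
    [{atom,   words('jabber:client')},
     {string, words("jabber:client")},
     {binary, words(<<"jabber:client">>)}].
```

The atom occupies no heap words at all, because it is an immediate value that points into the global atom table. The string costs two words per character, and a copy is made every time it appears in a new term.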

Non-blocking parser

In our examples so far, we were parsing an entire document in a single step. This isn’t of great help for a streaming protocol like XMPP. But that was only a special case: exmpp is built around stream parsers, and is prepared to receive its input chunk by chunk:
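A minimal sketch of chunked parsing with the default options could look like this. It assumes exmpp is installed, and uses exmpp_xml:start_parser/0, exmpp_xml:parse/2 and exmpp_xml:stop_parser/1; the module name parse_chunks_sketch and the sample chunks are made up for this example.

```erlang
-module(parse_chunks_sketch).
-export([run/0]).

%% Feed one document to a default parser in three arbitrary chunks and
%% collect what the parser returns after each chunk. Assumes exmpp is
%% installed and on the code path.
run() ->
    _ = application:start(exmpp),
    Parser = exmpp_xml:start_parser(),   % default options
    Chunks = [<<"<stream version='1.0'><message><bo">>,
              <<"dy>hello</body></message>">>,
              <<"</stream>">>],
    Results = [exmpp_xml:parse(Parser, Chunk) || Chunk <- Chunks],
    exmpp_xml:stop_parser(Parser),
    Results.
```

With the default options I would expect the first two calls to report that more input is needed, and only the last one, which closes the root element, to return the parsed tree.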

So far, so good. exmpp_xml:start_parser/0 creates a parser using the default options. One of those options is {root_depth, 0}, which instructs exmpp to return the parsed data only after the entire document has been parsed (it returns the element at depth 0, the root element).

We are closer, but still do not have what we need. In XMPP, all stanzas are children of the opening <stream> element, so exmpp can’t wait for the closing </stream>. What we really want is to receive entire stanzas, which are at depth 1:
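As a sketch, the same kind of input can be fed to a parser created with {root_depth, 1}. It assumes exmpp is installed and that exmpp_xml:start_parser/1 accepts a list of options; the module name parse_stanzas_sketch and the sample chunks are assumptions of this example.

```erlang
-module(parse_stanzas_sketch).
-export([run/0]).

%% Parse a stream with root_depth set to 1, so that complete stanzas are
%% returned without waiting for </stream>. Assumes exmpp is installed.
run() ->
    _ = application:start(exmpp),
    Parser = exmpp_xml:start_parser([{root_depth, 1}]),
    %% The opening tag of the stream, on its own.
    Opening = exmpp_xml:parse(Parser, <<"<stream version='1.0'>">>),
    %% A stanza cut in the middle of its closing </body> tag.
    Partial = exmpp_xml:parse(Parser, <<"<message><body>hello</bo">>),
    %% The rest of the stanza.
    Stanza = exmpp_xml:parse(Parser, <<"dy></message>">>),
    exmpp_xml:stop_parser(Parser),
    {Opening, Partial, Stanza}.
```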

In the above example, we got the opening <stream> element as soon as exmpp found it. Note that the children field is set to ‘undefined’, to inform us that at this point the child nodes haven’t been parsed yet.

In contrast, the <message/> stanza is delivered only once it has been parsed completely, because it sits at the configured root_depth. In most cases this is the desired behavior. The root_depth option is not limited to 0 and 1; any other positive integer works too.

Streams

We have seen how to manipulate and parse stanzas. Exmpp also provides helper modules to establish and interact with XMPP streams, in different scenarios. These modules are in charge of establishing the corresponding network connection, handling authentication, initializing and feeding the parser, and all the basic things that are required by any entity communicating with the XMPP network.

In the examples directory of exmpp’s source distribution you will find a module named echo_client.erl. It is a simple bot that connects to the server as a normal user, and echoes back any message you send to it. It is small enough to be included here:
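The listing is not reproduced in this text, so what follows is a condensed sketch of such an echo bot rather than the distributed file verbatim. It assumes exmpp is installed, that a server is listening on localhost:5222, and that an account echo@localhost with the password "password" already exists; all three are placeholders. The function names (exmpp_session:auth_basic_digest/3, connect_TCP/3, login/1, send_packet/2 and so on) are as I remember them and have changed between exmpp releases, so verify them against your version.

```erlang
-module(echo_sketch).
-export([start/0, stop/1, init/0]).

-include_lib("exmpp/include/exmpp.hrl").
-include_lib("exmpp/include/exmpp_client.hrl").

%% Placeholder account details (assumptions for illustration only).
-define(USER, "echo").
-define(SERVER, "localhost").
-define(PASSWORD, "password").

%% Run the bot in its own process.
start() -> spawn(?MODULE, init, []).

stop(Pid) -> Pid ! stop.

init() ->
    _ = application:start(exmpp),
    Session = exmpp_session:start(),
    %% 'random' asks exmpp to generate the resource part of the JID.
    Jid = exmpp_jid:make(?USER, ?SERVER, random),
    exmpp_session:auth_basic_digest(Session, Jid, ?PASSWORD),
    _StreamId = exmpp_session:connect_TCP(Session, ?SERVER, 5222),
    exmpp_session:login(Session),
    %% Announce availability, otherwise nobody sees the bot online.
    Presence = exmpp_presence:set_status(exmpp_presence:available(),
                                         "Echo Ready"),
    exmpp_session:send_packet(Session, Presence),
    loop(Session).

%% Incoming stanzas arrive as #received_packet{} messages.
loop(Session) ->
    receive
        stop ->
            exmpp_session:stop(Session);
        #received_packet{packet_type = message, raw_packet = Packet} ->
            echo(Session, Packet),
            loop(Session);
        _Other ->
            loop(Session)
    end.

%% Swap the from and to attributes, drop the id, and send the stanza back.
echo(Session, Packet) ->
    From = exmpp_xml:get_attribute(Packet, from, <<"unknown">>),
    To = exmpp_xml:get_attribute(Packet, to, <<"unknown">>),
    Swapped = exmpp_xml:set_attribute(
                exmpp_xml:set_attribute(Packet, from, To), to, From),
    exmpp_session:send_packet(Session,
                              exmpp_xml:remove_attribute(Swapped, id)).
```

Calling echo_sketch:start() from an Erlang shell should bring the bot online; stop it by passing the returned pid to echo_sketch:stop/1. Error handling, such as registering the account when login fails, is left out to keep the sketch short.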